lenskit.data.ItemList
=====================

.. py:class:: lenskit.data.ItemList(source: ItemList, *, ordered: bool | None = None, vocabulary: lenskit.data._vocab.Vocabulary | None = None, scores: numpy.typing.NDArray[numpy.generic] | torch.Tensor | numpy.typing.ArrayLike | Literal[False] | numpy.floating | float | None = None, **fields: numpy.typing.NDArray[numpy.generic] | torch.Tensor | numpy.typing.ArrayLike | Literal[False])
              ItemList(source = None, *, item_nums = None, vocabulary = None, ordered = None, scores = None, **fields)
              ItemList(source = None, *, item_ids = None, item_nums = None, vocabulary = None, ordered = None, scores = None, **fields)
   :canonical: lenskit.data._items.ItemList

   Representation of a (usually ordered) list of items, possibly with scores
   and other associated data; many components take and return item lists.  Item
   lists are to be treated as **immutable**: create a new list with modified
   data instead of modifying the list itself, or the arrays or data frames it
   returns, in place.

   To get the length of an item list, use the standard :func:`len` function.

   An item list is logically a list of rows, each of which is an item with
   multiple fields.  A designated field, ``score``, is available through the
   :meth:`scores` method, and is always single-precision floating-point.

   Item lists can be subset as an array (e.g. ``items[selector]``), where
   integer indices (or arrays thereof), boolean arrays, and slices are allowed
   as selectors.

   When an item list is pickled, it is serialized compactly and for the CPU
   only: the vocabulary is dropped (after ensuring both IDs and numbers are
   computed), and all arrays are pickled as NumPy arrays.  This makes item
   lists compact to serialize and transmit, but it does mean that an item list
   whose scores are still on the GPU will be deserialized with its scores on
   the CPU in the receiving process.  This is usually not a problem, because
   item lists are typically used for small lists of items, not large data
   structures that need to remain in shared memory.
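
   The pickling approach described above can be illustrated with a minimal
   sketch.  ``CompactList`` and its dict-based vocabulary are hypothetical
   stand-ins, not LensKit's implementation (which also handles GPU tensors);
   the sketch only shows the general technique of materializing derived
   arrays in ``__getstate__`` and leaving the vocabulary out of the pickled
   state.

   .. code-block:: python

      import pickle

      import numpy as np


      class CompactList:
          """Hypothetical stand-in for an item list (not LensKit's ItemList)."""

          def __init__(self, ids, numbers=None, vocab=None):
              self.ids = np.asarray(ids)
              self.numbers = None if numbers is None else np.asarray(numbers, dtype=np.int32)
              self.vocab = vocab  # hypothetical vocabulary: a dict mapping ID -> number

          def __getstate__(self):
              # Compute item numbers from the vocabulary before it is dropped.
              numbers = self.numbers
              if numbers is None and self.vocab is not None:
                  numbers = np.array([self.vocab[i] for i in self.ids], dtype=np.int32)
              # Only plain NumPy arrays go into the state; the vocabulary is omitted.
              return {"ids": self.ids, "numbers": numbers}

          def __setstate__(self, state):
              self.ids = state["ids"]
              self.numbers = state["numbers"]
              self.vocab = None


      vocab = {"a": 0, "b": 1, "c": 2}
      original = CompactList(["c", "a"], vocab=vocab)
      restored = pickle.loads(pickle.dumps(original))
      print(restored.numbers, restored.vocab)  # [2 0] None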

   In a few places, the “items” in an item list may be other entities, such as
   users, tags, or authors.  The class is still called ``ItemList``, because
   ``EntityList`` would be more confusing: the class has a very different use
   case and feature set than :class:`~lenskit.data.EntitySet`.

   .. note::

       Naming for fields and accessor methods is tricky, because the usual
       convention for a data frame is to use singular column names (e.g.
       “item_id”, “score”) instead of plural (“item_ids”, “scores”) — the data
       frame, like a database table, is a list of instances, and the column
       names are best interpreted as naming attributes of individual instances.

       However, when working with a list of e.g. item IDs, it is more natural —
       at least to this author — to use plural names: ``item_ids``.  Since this
       class is doing somewhat double-duty, representing a list of items along
       with associated data, as well as a data frame of columns representing
       items, the appropriate naming is not entirely clear.  The naming
       convention in this class is therefore as follows:

       * Field names are singular (``item_id``, ``score``).
       * Named accessor methods are plural (:meth:`item_ids`, :meth:`scores`).
       * Both singular and plural forms are accepted for item IDs, item
         numbers, and scores in the keyword arguments.  Other field names
         should be singular.
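
   The accepted keyword spellings can be pictured as a small normalization
   step that maps plural aliases onto singular field names.  The helper
   ``normalize_field_names`` below is hypothetical, written only to illustrate
   the naming convention; it is not part of LensKit's API.

   .. code-block:: python

      # Hypothetical helper illustrating the naming convention; not LensKit's API.
      PLURAL_ALIASES = {"item_ids": "item_id", "item_nums": "item_num", "scores": "score"}


      def normalize_field_names(**kwargs):
          """Map plural keyword aliases to the singular field names."""
          return {PLURAL_ALIASES.get(name, name): value for name, value in kwargs.items()}


      print(normalize_field_names(item_ids=["a", "b"], scores=[1.0, 2.0], rating=[4, 5]))
      # {'item_id': ['a', 'b'], 'score': [1.0, 2.0], 'rating': [4, 5]}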

   .. todo::

       Right now, selection / subsetting only happens on the CPU, and will move
       data to the CPU for the subsetting operation.  There is no reason, in
       principle, why we cannot subset on GPU.  Future revisions may add
       support for this.

   :param source: A source item list. If provided and an :class:`ItemList`, its fields
                  and data are used to initialize any aspects of the item list that
                  are not provided in the other arguments.  Otherwise, it is
                  interpreted as ``item_ids``.
   :param item_ids: A list or array of item identifiers. ``item_id`` is accepted as an
                    alternate name.
   :param item_nums: A list or array of item numbers. ``item_num`` is accepted as an
                     alternate name.
   :param vocabulary: A vocabulary to translate between item IDs and numbers.
   :param ordered: Whether the list has a meaningful order.
   :param scores: An array of scores for the items.  Pass the value ``False`` to
                  remove the scores when copying from a source list.
   :param fields: Additional fields, such as ``score`` or ``rating``.  Field names
                  should generally be singular; the named keyword arguments and
                  accessor methods are plural for readability (“get the list of item
                  IDs”).  Pass the value ``False`` to remove the field when copying
                  from a source list.

   :Stability: Caller
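
   The rule for copying from a source list (explicit arguments override the
   source's fields, and ``False`` removes a field) can be sketched with plain
   dictionaries.  ``merge_fields`` is a hypothetical helper that models only
   this rule; it does not construct a real :class:`ItemList`.

   .. code-block:: python

      import numpy as np


      def merge_fields(source_fields, **overrides):
          """Hypothetical model of how a new list's fields derive from a source list."""
          fields = dict(source_fields)  # start from the source list's fields
          for name, value in overrides.items():
              if value is False:
                  fields.pop(name, None)  # False removes the field from the copy
              else:
                  fields[name] = np.asarray(value)  # anything else replaces or adds it
          return fields


      src = {"score": np.array([0.9, 0.4]), "rating": np.array([5.0, 3.0])}
      derived = merge_fields(src, score=False, count=[2, 7])
      print(sorted(derived))  # ['count', 'rating']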


   .. py:attribute:: ordered
      :type:  bool
      :value: False


      Whether this list has a meaningful order.


   .. py:method:: from_df(df, *, vocabulary = None, keep_user = False)
      :classmethod:


      Create an item list from a Pandas data frame.  The frame should have
      ``item_num`` and/or ``item_id`` columns to identify the items; other
      columns (e.g. ``score`` or ``rating``) are added as fields. If the data
      frame has user columns (``user_id`` or ``user_num``), those are dropped
      by default.

      :param df: The data frame to turn into an item list.
      :param vocabulary: The item vocabulary.
      :param keep_user: If ``True``, keeps user ID/number columns instead of dropping them.
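
      The column handling described above can be sketched with pandas.
      ``split_item_frame`` is a hypothetical helper that only mirrors the
      documented rules (item columns identify the items, user columns are
      dropped unless ``keep_user`` is set, and all other columns become
      fields); it is not LensKit code.

      .. code-block:: python

         import pandas as pd

         ITEM_COLUMNS = ["item_id", "item_num"]
         USER_COLUMNS = ["user_id", "user_num"]


         def split_item_frame(df: pd.DataFrame, keep_user: bool = False):
             """Hypothetical sketch: separate item identifiers from field columns."""
             ids = {c: df[c].to_numpy() for c in ITEM_COLUMNS if c in df.columns}
             if not ids:
                 raise ValueError("frame needs an item_id or item_num column")
             dropped = set(ITEM_COLUMNS)
             if not keep_user:
                 dropped.update(USER_COLUMNS)
             fields = {c: df[c].to_numpy() for c in df.columns if c not in dropped}
             return ids, fields


         frame = pd.DataFrame(
             {"user_id": [7, 7], "item_id": ["a", "b"], "score": [0.5, 0.2], "rating": [4, 2]}
         )
         ids, fields = split_item_frame(frame)
         print(list(ids), list(fields))  # ['item_id'] ['score', 'rating']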


   .. py:method:: from_arrow(tbl, *, vocabulary = None)
      :classmethod:


      Create an item list from an Arrow table or struct array.  The table
      should have ``item_num`` and/or ``item_id`` columns to identify the
      items; other columns (e.g. ``score`` or ``rating``) are added as fields.
      If the table has user columns (``user_id`` or ``user_num``), those are
      dropped by default.

      :param tbl: The Arrow table or array to convert to an item list.
      :param vocabulary: The item vocabulary.


   .. py:method:: from_vocabulary(vocab)
      :classmethod:


   .. py:method:: clone()

      Make a shallow copy of the item list.


   .. py:property:: vocabulary
      :type: lenskit.data._vocab.Vocabulary | None


      Get the item list's vocabulary, if available.


   .. py:method:: ids(*, format: Literal['numpy'] = 'numpy') -> lenskit.data.types.IDArray
                  ids(*, format: Literal['arrow']) -> pyarrow.Array

      Get the item IDs.

      :returns: An array of item identifiers.

      :raises RuntimeError: if the item list was not created with IDs or a :class:`Vocabulary`.


   .. py:method:: numbers(format: Literal['numpy'] = 'numpy', *, vocabulary: lenskit.data._vocab.Vocabulary | None = None, missing: Literal['error', 'negative'] = 'error') -> lenskit.data.types.NPVector[numpy.int32]
                  numbers(format: Literal['torch'], *, vocabulary: lenskit.data._vocab.Vocabulary | None = None, missing: Literal['error', 'negative'] = 'error') -> torch.Tensor
                  numbers(format: Literal['arrow'], *, vocabulary: lenskit.data._vocab.Vocabulary | None = None, missing: Literal['error', 'negative', 'null'] = 'error') -> pyarrow.Array[pyarrow.Int32Scalar]
                  numbers(format: LiteralString = 'numpy', *, vocabulary: lenskit.data._vocab.Vocabulary | None = None, missing: Literal['error', 'negative'] = 'error') -> numpy.typing.ArrayLike

      Get the item numbers.

      :param format: The array format to use.
      :param vocabulary: An alternate vocabulary for mapping IDs to numbers.  If provided,
                         then the item list must have IDs (either stored, or through a
                         vocabulary).

      :returns: An array of item numbers.

      :raises RuntimeError: if the item list was not created with numbers or a
                            :class:`Vocabulary`.


   .. py:method:: scores(format: Literal['numpy'] = 'numpy') -> lenskit.data.types.NPVector | None
                  scores(format: Literal['torch']) -> torch.Tensor | None
                  scores(format: Literal['arrow']) -> pyarrow.Array[pyarrow.FloatScalar] | None
                  scores(format: Literal['pandas'], *, index: Literal['ids', 'numbers'] | None = None) -> pandas.Series | None
                  scores(format: LiteralString = 'numpy') -> numpy.typing.ArrayLike | None

      Get the item scores (if available).


   .. py:method:: ranks(format: Literal['numpy'] = 'numpy') -> numpy.typing.NDArray[numpy.int32] | None
                  ranks(format: Literal['torch']) -> torch.Tensor | None
                  ranks(format: Literal['arrow']) -> pyarrow.Array[pyarrow.Int32Scalar] | None
                  ranks(format: Literal['pandas']) -> pandas.Series[int] | None
                  ranks(format: LiteralString = 'numpy') -> numpy.typing.ArrayLike | None

      Get an array of ranks for the items in this list, if it is ordered.
      Unordered lists have no ranks.  The ranks are based on the order in the
      list, **not** on the score.

      Item ranks start with **1**, for compatibility with common practice in
      mathematically defining information retrieval metrics and operations.

      :returns: An array of item ranks, or ``None`` if the list is unordered.
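
      A minimal NumPy sketch of the rank rule, assuming only what is stated
      above: ranks are 1-based positions in list order, independent of the
      scores, and unordered lists have none.  ``ranks_for`` is a hypothetical
      function, not this method's implementation.

      .. code-block:: python

         import numpy as np


         def ranks_for(n_items: int, ordered: bool):
             """Hypothetical sketch of item list rank semantics."""
             if not ordered:
                 return None  # unordered lists have no ranks
             # Position 0 gets rank 1, position 1 gets rank 2, and so on,
             # regardless of the scores attached to the items.
             return np.arange(1, n_items + 1, dtype=np.int32)


         print(ranks_for(3, ordered=True))   # [1 2 3]
         print(ranks_for(3, ordered=False))  # None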


   .. py:method:: field(name: str, format: Literal['numpy'] = 'numpy') -> numpy.typing.NDArray[numpy.floating] | None
                  field(name: str, format: Literal['torch']) -> torch.Tensor | None
                  field(name: str, format: Literal['arrow']) -> pyarrow.Array | pyarrow.Tensor | None
                  field(name: str, format: Literal['pandas'], *, index: Literal['ids', 'numbers'] | None = None) -> pandas.Series | None
                  field(name: str, format: Literal['multi']) -> lenskit.data._mtarray.MTArray | None
                  field(name: str, format: LiteralString) -> numpy.typing.ArrayLike | None

      Get the values of the named field, or ``None`` if the list has no such field.

   .. py:method:: isin(other)

      Return a boolean mask identifying the items of this list that are in the
      other list.

      This is equivalent to :func:`numpy.isin` applied to the ID arrays, but
      is much more efficient in many cases.
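
      For reference, the :func:`numpy.isin` formulation mentioned above looks
      like this on plain ID arrays (the arrays are made-up examples); the
      result is a Boolean mask aligned with the first array.

      .. code-block:: python

         import numpy as np

         mine = np.array(["a", "b", "c", "d"])
         theirs = np.array(["d", "x", "b"])

         # True where an item of `mine` also appears in `theirs`.
         mask = np.isin(mine, theirs)
         print(mask)        # [False  True False  True]
         print(mine[mask])  # ['b' 'd']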


   .. py:method:: to_df(*, ids = True, numbers = True)

      Convert this item list to a Pandas data frame.  It has the following columns:

      * ``item_id`` — the item IDs (if available and ``ids=True``)
      * ``item_num`` — the item numbers (if available and ``numbers=True``)
      * ``score`` — the item scores
      * ``rank`` — the item ranks (if the list is ordered)
      * all other defined fields, using their field names
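
      The column layout above can be sketched with pandas for an ordered list.
      The ``to_frame`` helper and its inputs are hypothetical, and the column
      order it produces is an assumption; the sketch only mirrors the
      documented columns, with ``rank`` computed as the 1-based list position.

      .. code-block:: python

         import numpy as np
         import pandas as pd


         def to_frame(item_ids, item_nums, scores, ordered, ids=True, numbers=True, **fields):
             """Hypothetical sketch of the documented to_df column layout."""
             cols = {}
             if ids and item_ids is not None:
                 cols["item_id"] = item_ids
             if numbers and item_nums is not None:
                 cols["item_num"] = item_nums
             cols["score"] = scores
             if ordered:
                 cols["rank"] = np.arange(1, len(scores) + 1)
             cols.update(fields)  # remaining fields keep their (singular) names
             return pd.DataFrame(cols)


         df = to_frame(["a", "b"], [10, 11], [0.8, 0.3], ordered=True, numbers=False, rating=[5, 2])
         print(list(df.columns))  # ['item_id', 'score', 'rank', 'rating']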


   .. py:method:: to_arrow(*, ids: bool = True, numbers: bool = False, ranks: bool = True, type: Literal['table'] = 'table', columns: collections.abc.Sequence[str] | None = None) -> pyarrow.Table
                  to_arrow(*, ids: bool = True, numbers: bool = False, ranks: bool = True, type: Literal['array'], columns: collections.abc.Sequence[str] | None = None) -> pyarrow.StructArray

      Convert the item list to an Arrow table (or a struct array, with ``type='array'``).


   .. py:method:: arrow_schema(*, ids = True, numbers = False, ranks = True)

      Get the Arrow schema for this item list.

   .. py:method:: arrow_types(*, ids = True, numbers = False, ranks = True)

      Get the Arrow data types for this item list.


   .. py:method:: top_n(n = None, *, scores = None)

      Get the top *N* items in this list, sorted in decreasing order of score.

      If any scores are undefined (``NaN``), those items are excluded.

      :param n: The number of items.  If ``None`` or negative, returns all items
                sorted by score.
      :param scores: The name of a field containing the scores, or a NumPy vector of
                     scores, for selecting the top *N*.  If ``None``, the item list's
                     scores are used.

      :returns: An ordered item list containing the top ``n`` items.
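
      The selection rule can be sketched in NumPy: drop ``NaN`` scores, sort
      the rest in decreasing order, and keep the first ``n`` (all of them when
      ``n`` is ``None`` or negative).  ``top_n_indices`` is a hypothetical
      helper that returns positions into the original list; tie-breaking is
      not specified above, so the stable sort used here is an assumption.

      .. code-block:: python

         import numpy as np


         def top_n_indices(scores, n=None):
             """Hypothetical sketch: positions of the top-n scores, highest first."""
             scores = np.asarray(scores, dtype=np.float64)
             valid = np.flatnonzero(~np.isnan(scores))  # items with defined scores
             # Stable sort on negated scores gives decreasing order; ties keep list order.
             order = valid[np.argsort(-scores[valid], kind="stable")]
             if n is None or n < 0:
                 return order
             return order[:n]


         s = [0.2, np.nan, 0.9, 0.5]
         print(top_n_indices(s, 2))  # [2 3]
         print(top_n_indices(s))     # [2 3 0]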


   .. py:method:: concat(other)

      Concatenate this item list with another.

      If ``self`` has a vocabulary that accommodates the item IDs in ``other``,
      the returned item list will share the vocabulary.


   .. py:method:: update(other)

      Create a copy of the item list, updated with new data from another list.

      This creates a new item list with the same items as ``self``, but with
      new or modified fields from ``other``.  For any item in both ``self``
      and ``other``, and each field present in ``other``, the values of that
      field in ``other`` are used instead of ``self``.  That is:

      - If a field is only present in ``self``, it is used unchanged.
      - If a field is only present in ``other``, its values are used for the
        items that appear in both lists, and items only in ``self`` have an
        unset / null value for the field (``NaN`` for floating-point arrays).
      - If a field is present in both lists, then its values from ``other``
        are used for items appearing in ``other``, and the values from
        ``self`` are used for items that appear only in ``self``.
      - Items that appear only in ``other`` are ignored.

      .. note::
          Only the *presence or absence* of an item in ``self`` and ``other``
          is considered, not the nullity of individual field values.  If a
          field appears in both ``self`` and ``other``, and some of its values
          in ``other`` are null or ``NaN``, then they will be null or ``NaN``
          in the resulting item list, regardless of the value of that field
          for those items in ``self``.

          That is, this method does not do item-level coalescing of null field
          values.


      :param other: The item list to merge into this one.

      :returns: The updated item list.
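
      The rules above amount to a left join on the items of ``self``.  The
      pandas sketch below models each list as a data frame indexed by item ID
      (an assumption made for illustration; it does not call LensKit) and
      shows that values from ``other`` win for shared items, ``self`` fills in
      the rest, and items only in ``other`` are dropped.

      .. code-block:: python

         import numpy as np
         import pandas as pd

         mine = pd.DataFrame({"score": [1.0, 2.0, 3.0], "rating": [5, 4, 3]}, index=["a", "b", "c"])
         other = pd.DataFrame({"score": [20.0, np.nan, 99.0], "count": [7, 8, 9]}, index=["b", "c", "z"])

         # Keep exactly the items of `mine`; items only in `other` ("z") are ignored.
         aligned = other.reindex(mine.index)
         shared = mine.index.isin(other.index)

         result = mine.copy()
         for col in aligned.columns:
             if col in result.columns:
                 # Values from `other` replace those in `mine` for shared items,
                 # even when the value in `other` is NaN (no per-value coalescing).
                 result.loc[shared, col] = aligned.loc[shared, col]
             else:
                 # New fields are NaN for items that appear only in `mine`.
                 result[col] = aligned[col]

         print(result)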


   .. py:method:: remove(*, ids: lenskit.data.types.IDSequence | None = None) -> ItemList
                  remove(*, numbers: pyarrow.Int32Array | lenskit.data.types.NPVector[numpy.integer] | pandas.Series[int] | None = None) -> ItemList

      Return an item list with the specified items removed.

      The items to remove are not required to be in the list.

      :param ids: The item IDs to remove.
      :param numbers: The item numbers to remove.
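
      Removal by ID behaves like a negated :func:`numpy.isin` mask, which is
      why IDs that are not in the list are simply ignored.  The arrays below
      are made-up examples, and this is a sketch of the semantics rather than
      the method's implementation.

      .. code-block:: python

         import numpy as np

         item_ids = np.array(["a", "b", "c", "d"])
         to_remove = ["b", "zzz"]  # "zzz" is not in the list, which is allowed

         keep = ~np.isin(item_ids, to_remove)
         print(item_ids[keep])  # ['a' 'c' 'd']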


   .. py:method:: __len__()


   .. py:method:: __getitem__(sel)

      Subset the item list.

      .. todo::
          Support on-device masking.

      :param sel: The items to select.  Can be a Boolean array of the same
                  length as the list that is ``True`` to indicate selected items,
                  an array of integer indices of the items to retain (positions
                  in the list, starting from 0), a single integer index, or a slice.
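
      Because an item list stores parallel per-item arrays, a selector has to
      be applied to every array at once so that rows stay aligned.  The NumPy
      sketch below shows the three selector kinds on hypothetical ``ids`` and
      ``scores`` arrays; the ``subset`` helper is illustrative only.

      .. code-block:: python

         import numpy as np

         ids = np.array(["a", "b", "c", "d"])
         scores = np.array([0.4, 0.9, 0.1, 0.7], dtype=np.float32)


         def subset(sel):
             """Apply one selector to all parallel arrays so rows stay aligned."""
             return ids[sel], scores[sel]


         print(subset(np.array([True, False, False, True]))[0])  # ['a' 'd']
         print(subset(np.array([2, 0]))[0])                     # ['c' 'a']
         print(subset(slice(1, 3))[0])                          # ['b' 'c']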


   .. py:method:: __getstate__()


   .. py:method:: __setstate__(state)


   .. py:method:: __str__()


   .. py:method:: __repr__()