lenskit.data.EntitySet
- class lenskit.data.EntitySet(name, schema, vocabulary, table, _sel=None)
Representation of a set of entities from the dataset. Obtained from
Dataset.entities(). Client code does not need to construct this class; obtain instances from a dataset’s
entities() method.
- Parameters:
name (str)
schema (lenskit.data.schema.EntitySchema)
vocabulary (lenskit.data._vocab.Vocabulary)
table (pyarrow.Table)
_sel (pyarrow.Int32Array | None)
- vocabulary: lenskit.data._vocab.Vocabulary
The identifier vocabulary for this schema.
- ids()
Get the identifiers of the entities in this set. They are returned directly as a PyArrow array rather than a NumPy array.
- Return type:
- numbers()
Get the numbers (from the vocabulary) for the entities in this set.
- Return type:
numpy.ndarray[tuple[int], numpy.dtype[numpy.int32]]
- arrow()
Get these entities and their attributes as a PyArrow table.
- Return type:
- pandas()
Get the entities and their attributes as a Pandas data frame.
- Return type:
- attribute(name)
Get the values of an attribute for the entities in this entity set.
- Parameters:
name (str)
- Return type:
- select(*, ids: lenskit.data.types.IDSequence | None = None) EntitySet
- select(*, numbers: numpy.ndarray[tuple[int], numpy.dtype[numpy.integer[Any]]] | pyarrow.IntegerArray[Any] | None = None) EntitySet
Select a subset of the entities in this set.
Note
The vocabulary is unchanged, so numbers in the resulting set will be entity numbers in the dataset’s vocabulary. They are not rearranged to be relative to this entity set.
- Parameters:
ids – The entity identifiers to select.
numbers – The entity numbers to select.
- Returns:
The entity subset.
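The note above can be illustrated with a small toy model. This is a sketch only and not LensKit's implementation: `ToyVocabulary` and `ToyEntitySet` are hypothetical stand-ins written with the standard library, and details such as requiring exactly one of `ids` or `numbers`, or silently dropping entities outside the current set, are assumptions of the toy rather than documented behavior. What it shows is that a subset shares the parent's vocabulary, so its numbers are not renumbered from zero.

```python
# Toy illustration of "the vocabulary is unchanged" after select().
# NOT LensKit code: ToyVocabulary and ToyEntitySet are hypothetical names.
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyVocabulary:
    """Maps entity identifiers to dense numbers (their positions in `terms`)."""

    terms: tuple

    def number(self, term):
        return self.terms.index(term)

    def term(self, number):
        return self.terms[number]


@dataclass(frozen=True)
class ToyEntitySet:
    vocabulary: ToyVocabulary
    selected: tuple  # vocabulary numbers of the entities in this set

    def ids(self):
        return [self.vocabulary.term(n) for n in self.selected]

    def numbers(self):
        return list(self.selected)

    def select(self, *, ids=None, numbers=None):
        # Assumption of this toy: exactly one keyword is given,
        # mirroring the two overloads in the signature above.
        if (ids is None) == (numbers is None):
            raise ValueError("specify exactly one of ids or numbers")
        if ids is not None:
            numbers = [self.vocabulary.number(i) for i in ids]
        # Assumption of this toy: entities outside the current set are dropped.
        current = set(self.selected)
        kept = tuple(n for n in numbers if n in current)
        # The subset shares the same vocabulary object; numbers are not remapped.
        return ToyEntitySet(self.vocabulary, kept)

    def __len__(self):
        return len(self.selected)


vocab = ToyVocabulary(("a", "b", "c", "d"))
everything = ToyEntitySet(vocab, (0, 1, 2, 3))
subset = everything.select(ids=["c", "d"])

subset_ids = subset.ids()          # ["c", "d"]
subset_numbers = subset.numbers()  # [2, 3], not [0, 1]
```

In this toy, the subset has two entities, yet their numbers are 2 and 3 because they still index into the full vocabulary. Code that uses numbers from a subset to index an array should therefore size that array by the vocabulary, not by the length of the subset.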
- __len__()