lenskit.data.EntitySet

class lenskit.data.EntitySet(name, schema, vocabulary, table, _sel=None)

Representation of a set of entities from the dataset.

Client code does not need to construct this class; obtain instances from Dataset.entities().

Parameters:
name: str

The name of the entity class for these entities.

schema: lenskit.data.schema.EntitySchema
vocabulary: lenskit.data._vocab.Vocabulary

The identifier vocabulary for this schema.

property attributes: list[str]

Get the attribute names for this entity class.

Return type:

list[str]

count()

Return the number of entities in this entity set.

Return type:

int
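As a minimal sketch of how these members fit together, the helper below summarizes one entity class. It assumes `dataset` is an already-loaded LensKit dataset and that its `entities()` method accepts the entity class name; the helper name `describe_entities` and the `"item"` default are hypothetical choices made for illustration.

```python
def describe_entities(dataset, entity_class="item"):
    """Summarize one entity class using members documented on this page."""
    # Assumption: Dataset.entities() takes the name of the entity class.
    entities = dataset.entities(entity_class)
    return {
        "name": entities.name,                    # entity class name
        "attributes": list(entities.attributes),  # attribute names
        "count": entities.count(),                # number of entities
    }
```

The function only reads `name`, `attributes`, and `count()`, so it does not materialize the entity table.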

ids()

Get the identifiers of the entities in this set. The result is returned directly as a PyArrow array rather than a NumPy array.

Return type:

lenskit.data.types.IDArray

numbers()

Get the numbers (from the vocabulary) for the entities in this set.

Return type:

numpy.ndarray[tuple[int], numpy.dtype[numpy.int32]]
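The identifiers and numbers can be paired to build a lookup table. The sketch below assumes that `ids()` and `numbers()` list the same entities in the same order, which this page does not state explicitly; the helper name `number_to_id` is hypothetical.

```python
def number_to_id(entities):
    """Build an {entity number: entity identifier} lookup for an entity set."""
    # Assumption: ids() and numbers() are aligned position by position.
    ids = entities.ids().to_pylist()       # PyArrow array -> Python list
    numbers = entities.numbers().tolist()  # NumPy array -> Python list
    return dict(zip(numbers, ids))
```

Converting both arrays to Python lists is convenient for small sets; for large sets it is likely better to keep working with the arrays directly.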

arrow()

Get these entities and their attributes as a PyArrow table.

Return type:

pyarrow.Table

pandas()

Get the entities and their attributes as a Pandas data frame.

Return type:

pandas.DataFrame

attribute(name)

Get values of an attribute for the entities in this entity set.

Parameters:

name (str)

Return type:

lenskit.data._attributes.EntityAttribute

select(*, ids: lenskit.data.types.IDSequence | None = None) -> EntitySet
select(*, numbers: numpy.ndarray[tuple[int], numpy.dtype[numpy.integer[Any]]] | pyarrow.IntegerArray[Any] | None = None) -> EntitySet

Select a subset of the entities in this set.

Note

The vocabulary is unchanged, so numbers in the resulting set will be entity numbers in the dataset’s vocabulary. They are not rearranged to be relative to this entity set.

Parameters:
  • ids – The entity identifiers to select.

  • numbers – The entity numbers to select.

Returns:

The entity subset.
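The note about numbering can be illustrated with a toy model. The class below is not LensKit's implementation: `ToyEntitySet` is a hypothetical stand-in that assumes an entity's number is its position in a shared vocabulary list, and it silently drops requested entities that are not in the current set (a toy choice, not documented behavior).

```python
class ToyEntitySet:
    """Toy stand-in for an entity set; not LensKit's implementation."""

    def __init__(self, vocabulary, numbers=None):
        # Assumption: an entity's number is its index in the vocabulary.
        self.vocabulary = vocabulary
        if numbers is None:
            numbers = range(len(vocabulary))
        self._numbers = list(numbers)

    def ids(self):
        return [self.vocabulary[n] for n in self._numbers]

    def numbers(self):
        return list(self._numbers)

    def select(self, *, ids=None, numbers=None):
        if (ids is None) == (numbers is None):
            raise ValueError("pass exactly one of ids or numbers")
        if ids is not None:
            numbers = [self.vocabulary.index(i) for i in ids]
        current = set(self._numbers)
        # Toy choice: ignore requested entities that are not in this set.
        kept = [n for n in numbers if n in current]
        # The vocabulary is shared, so numbers are not renumbered from zero.
        return ToyEntitySet(self.vocabulary, kept)


items = ToyEntitySet(["a", "b", "c", "d"])
subset = items.select(ids=["b", "d"])
print(subset.ids())      # ['b', 'd']
print(subset.numbers())  # [1, 3]
```

The subset reports numbers 1 and 3 rather than 0 and 1, so they can still be used to index structures built against the full dataset vocabulary.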

__len__()