lenskit.data.EntityAttribute#

class lenskit.data.EntityAttribute(name, spec, table, vocab, rows)#

Bases: abc.ABC

Base class for an attribute associated with an entity class. This class effectively represents a _column_ of a data table of entities: the attribute values for one or more entities. In that regard, it is similar to a Pandas series: it records entity IDs/numbers, like an index, and associated attribute values.
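The Series analogy can be made concrete with plain Pandas. The sketch below does not call LensKit at all: the item IDs, the values, and the names `column` and `numbers` are invented for illustration, and treating entity numbers as zero-based positions is an assumption rather than something this page states.

```python
import numpy as np
import pandas as pd

# Hypothetical entity IDs and attribute values (not from any real dataset).
ids = np.array([101, 205, 317])
values = np.array([4.5, np.nan, 3.0])

# A Series pairs each ID (the index) with a value, much as an attribute
# pairs entity IDs with attribute values.
column = pd.Series(values, index=pd.Index(ids, name="item_id"), name="avg_rating")

# Assumption: entity numbers behave like zero-based row positions.
numbers = np.arange(len(column), dtype=np.int32)

by_id = column.loc[317]                    # look up a value by entity ID
by_number = column.iloc[int(numbers[0])]   # look up a value by position
```

Here `by_id` and `by_number` read the same column through its two kinds of key, which is the role that `ids()` and `numbers()` appear to play for an attribute.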

This is the general interface for all entity attributes. Not all access methods are supported for all layouts.

Stability:
Caller (see Stability Levels).
Parameters:
name: str#

The name of the attribute.

layout: lenskit.data.schema.AttrLayout#

The attribute layout.

abstract property data_type: pyarrow.DataType#

Get the data type of this attribute set.

Return type:

pyarrow.DataType

ids()#

Get the entity IDs for this collection of entities.

Return type:

lenskit.data.types.IDArray

id_index()#

Get the entity IDs as a Pandas index.

Return type:

pandas.Index

numbers()#

Get the entity numbers for the attributes.

Return type:

numpy.ndarray[tuple[int], numpy.dtype[numpy.int32]]

abstractmethod cat_matrix(*, normalize=None)#

Compute a categorical matrix representation of the attribute.

Parameters:

normalize (Literal['unit', 'distribution'] | None) – Optional normalization method:

“unit”: Normalize each row to unit length.

“distribution”: Normalize each row so its elements sum to 1.

Returns:

A tuple containing:

matrix (numpy.ndarray or scipy.sparse.csr_array): The categorical matrix.

vocab (Vocabulary or None): The vocabulary associated with the categories.

Return type:

tuple
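The two normalization modes can be illustrated with NumPy alone. This is a sketch of the arithmetic on a made-up dense matrix, not LensKit's implementation; it assumes "unit length" means the Euclidean (L2) norm, which this page does not spell out.

```python
import numpy as np

# Hypothetical category-count matrix: 2 entities (rows) by 3 categories.
matrix = np.array([[3.0, 4.0, 0.0],
                   [1.0, 1.0, 2.0]])

# "unit": divide each row by its Euclidean norm (assumed meaning of unit length).
unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)

# "distribution": divide each row by its sum so the entries add up to 1.
distribution = matrix / matrix.sum(axis=1, keepdims=True)
```

The first row becomes `[0.6, 0.8, 0.0]` under "unit" and the second row becomes `[0.25, 0.25, 0.5]` under "distribution". A row of all zeros would divide by zero in this sketch, so real code needs to guard against that case.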

property dim_names: list[str] | None#

Get the names attached to this attribute’s dimensions.

Note

Only applicable to vector and sparse attributes.

Return type:

list[str] | None

property is_scalar: bool#

Query whether this attribute is scalar.

Return type:

bool

property is_list: bool#

Query whether this attribute is a list.

Return type:

bool

property is_vector: bool#

Query whether this attribute is a dense vector.

Return type:

bool

property is_sparse: bool#

Query whether this attribute is a sparse vector.

Return type:

bool
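Because not every access method is supported for every layout, calling code may want to check the four flags before choosing one. The helper below is a hypothetical sketch: `layout_name` and `fake_attr` are invented names, the stand-in object only mimics the documented `is_*` properties, and the sketch assumes that at most one flag is true at a time, which this page does not state.

```python
from types import SimpleNamespace


def layout_name(attr):
    """Name the layout of any object exposing the four is_* flags."""
    if attr.is_scalar:
        return "scalar"
    if attr.is_list:
        return "list"
    if attr.is_vector:
        return "vector"
    if attr.is_sparse:
        return "sparse"
    return "unknown"


# Stand-in for demonstration only; a real attribute would come from a dataset.
fake_attr = SimpleNamespace(
    is_scalar=False, is_list=False, is_vector=False, is_sparse=True
)
print(layout_name(fake_attr))  # prints "sparse"
```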

abstractmethod pandas(*, missing='null')#
Parameters:

missing (Literal['null', 'omit'])

Return type:

pandas.Series | pandas.DataFrame

numpy()#

Get the attribute values as a NumPy array.

Note

Undefined attribute values may have undefined contents; they will _usually_ be NaN or similar, but this is not fully guaranteed.

Return type:

numpy.typing.NDArray[Any]
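Given the note above, it is safer to mask undefined entries before aggregating. The sketch below uses a hand-built array as a stand-in for the result of `numpy()` and only covers the usual case where undefined values show up as NaN; when that cannot be relied on, `drop_null()` is the documented way to restrict the attribute to defined entities.

```python
import numpy as np

# Stand-in for an array returned by numpy(); the second entry plays the
# role of an undefined attribute value.
values = np.array([4.5, np.nan, 3.0])

# Keep only the entries that are not NaN before computing statistics.
defined = ~np.isnan(values)
mean_defined = values[defined].mean()
```

Without the mask, `values.mean()` would itself be NaN.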

arrow()#

Get the attribute values as an Arrow array.

Return type:

pyarrow.Array[Any] | pyarrow.ChunkedArray[Any]

scipy()#

Get this attribute as a SciPy sparse array (if it is sparse), or a NumPy array if it is dense.

Return type:

numpy.typing.NDArray[Any] | scipy.sparse.csr_array
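Since the return type depends on the layout, code that consumes the result may need to handle both cases. The following sketch shows one way to do that with `scipy.sparse.issparse`; the helper name `to_dense` and the sample inputs are illustrative and not part of LensKit.

```python
import numpy as np
from scipy import sparse


def to_dense(values):
    """Return a dense NumPy array from either a dense or a sparse input."""
    if sparse.issparse(values):
        return values.toarray()
    return np.asarray(values)


# Sample inputs standing in for the two possible return types.
dense_in = np.array([[1.0, 0.0], [0.0, 2.0]])
sparse_in = sparse.csr_array(dense_in)

same = np.array_equal(to_dense(dense_in), to_dense(sparse_in))
```

Densifying is only sensible for small matrices; for large sparse attributes it is usually better to keep the `csr_array` and use sparse operations directly.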

torch()#
Return type:

torch.Tensor

drop_null()#

Subset this attribute set to only entities for which it is defined.

__len__()#