lenskit.data.schema#

Pydantic models for LensKit data schemas. These models define the data schema in memory and also define how schemas are serialized to and from configuration files. See Data Model for details.

Note

The schema does not specify data types directly — data types are inferred from the underlying Arrow data structures. This reduces duplication of type information and the opportunity for inconsistency.

Attributes#

Classes#

AllowableTroolean

Records both whether a feature (such as repeated interactions) is allowed and whether it is actually used in the data.

AttrLayout

Possible layouts for entity attributes.

DataSchema

Description of the entities and layout of a dataset.

EntitySchema

Entity class definitions in the dataset schema.

RelationshipSchema

Relationship class definitions in the dataset schema.

ColumnSpec

Functions#

id_col_name(name)

num_col_name(name)

check_name(name)

Check if a name is valid.

Module Contents#

lenskit.data.schema.CURRENT_VERSION = '2025.3'#
lenskit.data.schema.OLDEST_VERSION = '2025.1'#
lenskit.data.schema.LOAD_CONTEXT#
lenskit.data.schema.NAME_PATTERN#
type lenskit.data.schema.Name = Annotated[str, StringConstraints(pattern=NAME_PATTERN)]#
lenskit.data.schema.id_col_name(name)#
Parameters:

name (str)

Return type:

str

lenskit.data.schema.num_col_name(name)#
Parameters:

name (str)

Return type:

str
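
This reference gives no description for id_col_name and num_col_name. Judging only from their names, they presumably derive the identifier and number column names for an entity class. The sketch below uses stand-in functions to illustrate that reading; the "_id" and "_num" suffixes are assumptions, not behavior confirmed here.

```python
# Hypothetical stand-ins: the "_id" / "_num" suffixes are an assumption
# based on the function names, not something this reference confirms.
def id_col_name_sketch(name: str) -> str:
    """Column assumed to hold the identifiers for entity class `name`."""
    return f"{name}_id"


def num_col_name_sketch(name: str) -> str:
    """Column assumed to hold the entity numbers for entity class `name`."""
    return f"{name}_num"


print(id_col_name_sketch("user"))
print(num_col_name_sketch("item"))
```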

lenskit.data.schema.check_name(name)#

Check if a name is valid.

Raises:

ValueError – when the name is invalid.

Parameters:

name (str)

Return type:

None
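
The contract described above (return None for a valid name, raise ValueError otherwise) can be sketched with the standard library alone. The value of NAME_PATTERN is not shown in this reference, so the regular expression below is a purely illustrative assumption, and check_name_sketch is a stand-in rather than the library function.

```python
import re

# Hypothetical pattern: the real NAME_PATTERN value is not shown in this
# reference, so this regular expression is only an illustrative assumption.
NAME_PATTERN_SKETCH = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_name_sketch(name: str) -> None:
    """Raise ValueError when `name` does not fully match the pattern."""
    if NAME_PATTERN_SKETCH.fullmatch(name) is None:
        raise ValueError(f"invalid name: {name!r}")


check_name_sketch("user")  # valid under the assumed pattern; returns None

try:
    check_name_sketch("not a name")
except ValueError as err:
    print(err)
```

Raising instead of returning a boolean lets callers validate names inline without having to inspect a result.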

class lenskit.data.schema.AllowableTroolean#

Bases: enum.Enum

Records both whether a feature (such as repeated interactions) is allowed and whether it is actually used in the data. For convenience, serialized data and configuration files may specify these values either as strings or as booleans; False is read as FORBIDDEN and True as ALLOWED. They are always serialized as strings.

FORBIDDEN = 'forbidden'#

The feature is forbidden.

ALLOWED = 'allowed'#

The feature is allowed, but no records using it are present.

PRESENT = 'present'#

The feature is used by instances in the data.

property is_allowed: bool#

Query whether the feature is allowed.

Return type:

bool

property is_forbidden: bool#

Query whether the feature is forbidden.

Return type:

bool

property is_present: bool#

Query whether the feature is present (used in recorded instances).

Return type:

bool
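
The three states and the boolean shorthand can be illustrated with a small stand-in enum. This is a sketch, not the library class: the coerce helper is hypothetical, and treating PRESENT as also allowed is an assumption drawn from the member descriptions.

```python
from enum import Enum


class TrooleanSketch(Enum):
    """Stand-in mirroring the three documented states."""

    FORBIDDEN = "forbidden"
    ALLOWED = "allowed"
    PRESENT = "present"

    @classmethod
    def coerce(cls, value):
        """Hypothetical helper: accept a member, a bool, or a string."""
        if isinstance(value, cls):
            return value
        if isinstance(value, bool):
            # Documented shorthand: False is FORBIDDEN, True is ALLOWED.
            return cls.ALLOWED if value else cls.FORBIDDEN
        return cls(value)

    @property
    def is_forbidden(self) -> bool:
        return self is TrooleanSketch.FORBIDDEN

    @property
    def is_present(self) -> bool:
        return self is TrooleanSketch.PRESENT

    @property
    def is_allowed(self) -> bool:
        # Assumption: a feature that is present is necessarily allowed.
        return self is not TrooleanSketch.FORBIDDEN


for raw in (False, True, "present"):
    state = TrooleanSketch.coerce(raw)
    print(raw, "->", state.value, state.is_allowed, state.is_present)
```

Note that a boolean can never produce PRESENT; that state can only be written as a string.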

class lenskit.data.schema.AttrLayout#

Bases: enum.Enum

Possible layouts for entity attributes.

SCALAR = 'scalar'#

Scalar (non-list, non-vector) attribute value.

LIST = 'list'#

Homogeneous, variable-length list of attribute values.

VECTOR = 'vector'#

Homogeneous, fixed-length vector of numeric attribute values.

SPARSE = 'sparse'#

Homogeneous, fixed-length sparse vector of numeric attribute values.
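
To make the four layouts concrete, the sketch below pairs each layout name with a plain Python value of the matching shape. These are illustrative stand-ins only: the actual data lives in Arrow structures, and the dictionary used for the sparse case is a made-up representation.

```python
# Illustrative values only: plain Python stand-ins for the four layouts.
examples = {
    "scalar": 1995,                 # a single value per entity
    "list": ["comedy", "drama"],    # variable length, one element type
    "vector": [0.12, -0.40, 0.88],  # fixed length, dense, numeric
    # fixed length, mostly zero: store only the nonzero positions
    "sparse": {"size": 5, "values": {1: 0.5, 4: 2.0}},
}


def densify(sparse: dict) -> list:
    """Expand the hypothetical sparse stand-in into a dense list."""
    dense = [0.0] * sparse["size"]
    for index, value in sparse["values"].items():
        dense[index] = value
    return dense


print(densify(examples["sparse"]))
```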

class lenskit.data.schema.DataSchema#

Bases: pydantic.BaseModel

Description of the entities and layout of a dataset.

version: str = '2025.3'#

The data layout version.

Note

When a new schema model is created, this defaults to the current version instead of the oldest version.

name: str | None = None#

The dataset name.

default_interaction: Name | None = None#

The default interaction type.

entities: dict[Name, EntitySchema]#

Entity classes defined for this dataset.

relationships: dict[Name, RelationshipSchema]#

Relationship classes defined for this dataset.

classmethod model_validate_json(json_data, *, context=None, **kwargs)#

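
As a rough picture of what a serialized schema might look like, the sketch below assembles a JSON document from the field names in this reference and parses it with the standard json module. The entity, relationship, and attribute names are invented, and the exact serialized form is an assumption; with LensKit installed, text of this shape is presumably what DataSchema.model_validate_json would be given.

```python
import json

# Hypothetical serialized schema. The field names follow this reference;
# the entity, relationship, and attribute names are invented for illustration.
schema_text = """
{
  "version": "2025.3",
  "name": "example",
  "default_interaction": "rating",
  "entities": {
    "user": {"id_type": "int", "attributes": {}},
    "item": {
      "id_type": "int",
      "attributes": {
        "title": {"layout": "scalar"},
        "genres": {"layout": "list"},
        "embedding": {"layout": "vector", "vector_size": 32}
      }
    }
  },
  "relationships": {
    "rating": {
      "entities": {"user": null, "item": null},
      "interaction": true,
      "repeats": "forbidden",
      "attributes": {"rating": {"layout": "scalar"}}
    }
  }
}
"""

# Only checks that the text is well-formed JSON; no schema validation happens.
schema = json.loads(schema_text)
for name, rel in schema["relationships"].items():
    print(name, sorted(rel["entities"]), rel["repeats"])
```

No data types appear anywhere in the document apart from id_type, which matches the note that types are inferred from the underlying Arrow data.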
class lenskit.data.schema.EntitySchema#

Bases: pydantic.BaseModel

Entity class definitions in the dataset schema.

id_type: Literal['int', 'str'] | None = None#

The data type for identifiers in this entity class.

attributes: dict[Name, ColumnSpec]#

Entity attribute definitions.

class lenskit.data.schema.RelationshipSchema#

Bases: pydantic.BaseModel

Relationship class definitions in the dataset schema.

entities: dict[Name, Name | None]#

Define the entity classes participating in the relationship. For aliased entity classes (necessary for self-relationships), the key is the alias, and the value is the original entity class name.

interaction: bool = False#

Whether this relationship class records interactions.

repeats: AllowableTroolean#

Whether this relationship supports repeated interactions.

attributes: dict[Name, ColumnSpec]#

Relationship attribute definitions.

property entity_class_names: list[str]#
Return type:

list[str]
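
The aliasing rule for entities is easiest to see with a self-relationship. The sketch below uses a hypothetical "follows" relationship between users and a helper that maps each key to its entity class. Reading a None value as "the key is already the class name" is an assumption based on the Name | None value type, and whether entity_class_names resolves aliases this way is not stated in this reference.

```python
# Hypothetical self-relationship: users following other users. Both aliases
# point at the same "user" entity class.
follows_entities = {"follower": "user", "followee": "user"}

# An ordinary relationship: a None value is assumed to mean that the key
# is already the entity class name.
rating_entities = {"user": None, "item": None}


def resolve_class_names(entities: dict) -> list:
    """Map each key to its entity class, falling back to the key itself."""
    return [
        target if target is not None else alias
        for alias, target in entities.items()
    ]


print(resolve_class_names(follows_entities))
print(resolve_class_names(rating_entities))
```

Without aliases, the two sides of a self-relationship would collide on the same dictionary key.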

class lenskit.data.schema.ColumnSpec#

Bases: pydantic.BaseModel

layout: AttrLayout#

The attribute layout (whether and how multiple values are supported).

vector_size: int | None = None#

The dimensionality of the vector, for sparse and vector columns.
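
Since vector_size is described as applying to sparse and vector columns, a caller might want a quick consistency check between the two fields. The sketch below uses a stand-in dataclass with a hypothetical problems method; the library may or may not enforce any of these rules itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnSpecSketch:
    """Stand-in with the two documented fields (layout as a plain string)."""

    layout: str
    vector_size: Optional[int] = None

    def problems(self) -> list:
        """Hypothetical check; not a rule this reference says is enforced."""
        found = []
        if self.layout not in ("scalar", "list", "vector", "sparse"):
            found.append(f"unknown layout {self.layout!r}")
        if self.vector_size is not None:
            if self.layout not in ("vector", "sparse"):
                found.append("vector_size only applies to vector and sparse layouts")
            if self.vector_size <= 0:
                found.append("vector_size must be positive")
        return found


print(ColumnSpecSketch("vector", 32).problems())
print(ColumnSpecSketch("scalar", 32).problems())
```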