lenskit.data.schema
Pydantic models for LensKit data schemas. These models define the data schema in memory and how schemas are serialized to and from configuration files. See Data Model for details.
Note
The schema does not specify data types directly — data types are inferred from the underlying Arrow data structures. This reduces duplication of type information and the opportunity for inconsistency.
Functions
| check_name(name) | Check if a name is valid. |
Classes
| AllowableTroolean | Stores both whether a feature (such as repeats) is allowed and whether it is used. |
| AttrLayout | |
| ColumnSpec | |
| DataSchema | Description of the entities and layout of a dataset. |
| EntitySchema | Entity class definitions in the dataset schema. |
| RelationshipSchema | Relationship class definitions in the dataset schema. |
- lenskit.data.schema.check_name(name)
- Check if a name is valid.
- Raises:
- ValueError – when the name is invalid. 
- Parameters:
- name (str) 
- Return type:
- None 
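As a rough illustration of the kind of check described here, the sketch below validates names against the `^[\w_]+$` pattern that the schema models on this page attach to their name fields. Whether `check_name` applies exactly this pattern is an assumption, and `check_name_sketch` is a hypothetical stand-in written with the standard library, not the library function itself.

```python
import re

# Pattern copied from the name constraints shown in the field types on this page.
# Assumption: check_name applies the same rule; the source does not state this.
NAME_PATTERN = re.compile(r"^[\w_]+$")


def check_name_sketch(name: str) -> None:
    """Hypothetical stand-in: raise ValueError when the name is invalid."""
    if not NAME_PATTERN.match(name):
        raise ValueError(f"invalid name: {name!r}")


check_name_sketch("user_id")  # passes silently and returns None

try:
    check_name_sketch("user id")  # contains a space, so it is rejected
except ValueError as err:
    message = str(err)
```

Raising instead of returning a boolean matches the documented contract (return type None, ValueError on failure), so callers can validate a name in a single statement.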
 
- class lenskit.data.schema.AllowableTroolean(*values)
- Bases: Enum
- Stores both whether a feature (such as repeats) is allowed and whether it is used. For convenience, in serialized data or configuration files these values may be specified either as strings or as booleans, in which case False is FORBIDDEN and True is ALLOWED. They are always serialized as strings.
 - FORBIDDEN = 'forbidden'
- The feature is forbidden. 
 - ALLOWED = 'allowed'
- The feature is allowed, but no records using it are present. 
 - PRESENT = 'present'
- The feature is used by instances in the data. 
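The string-or-boolean convention described for this enum can be sketched with a plain standard-library `Enum`. `TrooleanSketch`, `coerce_troolean`, and `serialize_troolean` are hypothetical names for illustration only; the real class does its coercion through Pydantic, and its internals are not shown in this page.

```python
from __future__ import annotations

from enum import Enum


class TrooleanSketch(Enum):
    """Hypothetical stand-in mirroring the documented member values."""

    FORBIDDEN = "forbidden"
    ALLOWED = "allowed"
    PRESENT = "present"


def coerce_troolean(value: bool | str | TrooleanSketch) -> TrooleanSketch:
    """Accept a member, a boolean, or a string, as the description allows."""
    if isinstance(value, TrooleanSketch):
        return value
    if isinstance(value, bool):
        # False maps to FORBIDDEN and True to ALLOWED; PRESENT has no boolean form.
        return TrooleanSketch.ALLOWED if value else TrooleanSketch.FORBIDDEN
    # Strings are looked up by value; an unknown string raises ValueError.
    return TrooleanSketch(value)


def serialize_troolean(value: TrooleanSketch) -> str:
    """Values are always written out as strings, never as booleans."""
    return value.value
```

Under this sketch, a configuration file that says `repeats: true` and one that says `repeats: "allowed"` load to the same member, and both are written back as `"allowed"`.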
 
- class lenskit.data.schema.AttrLayout(*values)
- Bases: Enum
 - SCALAR = 'scalar'
- Scalar (non-list, non-vector) attribute value. 
 - LIST = 'list'
- Homogeneous, variable-length list of attribute values. 
 - VECTOR = 'vector'
- Homogeneous, fixed-length vector of numeric attribute values. 
 - SPARSE = 'sparse'
- Homogeneous, fixed-length sparse vector of numeric attribute values. 
 
- class lenskit.data.schema.DataSchema(**data)
- Bases: BaseModel
- Description of the entities and layout of a dataset.
- Parameters:
- data (Any) 
- version (str) 
- name (str | None) 
- default_interaction (Name | None) 
- entities (dict[Name, EntitySchema]) 
- relationships (dict[Name, RelationshipSchema]) 
- Here Name is Annotated[str, StringConstraints(pattern=re.compile('^[\\w_]+$'))].
 
 - version: str
- The data layout version.
- Note
- When a new schema model is created, this defaults to the current version instead of the oldest version. 
 - entities: dict[Name, EntitySchema]
- Entity classes defined for this dataset. 
 - relationships: dict[Name, RelationshipSchema]
- Relationship classes defined for this dataset. 
 - classmethod model_validate_json(json_data, *, context=None, **kwargs)
- Validate the given JSON data against the Pydantic model.
- Parameters:
- json_data (str | bytes | bytearray) – The JSON data to validate. 
- strict – Whether to enforce types strictly. 
- context (Any) – Extra variables to pass to the validator. 
- by_alias – Whether to use the field’s alias when validating against the provided input data. 
- by_name – Whether to use the field’s name when validating against the provided input data. 
- kwargs (Any) 
 
- Returns:
- The validated Pydantic model. 
- Raises:
- ValidationError – If json_data is not a JSON string or the object could not be validated. 
 
 - model_config: ClassVar[ConfigDict] = {}
- Configuration for the model; this should be a dictionary conforming to pydantic.config.ConfigDict. 
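To make the shape of a serialized schema concrete, the sketch below builds a small JSON document using only the field names listed on this page and reads it back with the standard `json` module. Every value in it (the version string, the `user`, `item`, and `rating` names, the `title` attribute) is invented for illustration, and reading `null` in `entities` as "no alias" is an assumption. It has not been validated against the real models. With LensKit installed, text like this would presumably be handed to the documented `DataSchema.model_validate_json` classmethod.

```python
import json

# Hypothetical schema document. Field names follow this page; every value
# (version string, class names, attribute names) is made up for illustration.
schema_text = """
{
  "version": "EXAMPLE-VERSION",
  "name": "toy-ratings",
  "default_interaction": "rating",
  "entities": {
    "user": {"id_type": "int", "attributes": {}},
    "item": {
      "id_type": "int",
      "attributes": {"title": {"layout": "scalar"}}
    }
  },
  "relationships": {
    "rating": {
      "entities": {"user": null, "item": null},
      "interaction": true,
      "repeats": "forbidden",
      "attributes": {"rating": {"layout": "scalar"}}
    }
  }
}
"""

schema = json.loads(schema_text)

# Collect the attribute layouts declared for each entity class.
entity_layouts = {
    entity: {attr: spec["layout"] for attr, spec in body["attributes"].items()}
    for entity, body in schema["entities"].items()
}
```

Note that the document carries layouts but no data types, in line with the note at the top of this page that types are inferred from the underlying Arrow data.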
 
- class lenskit.data.schema.EntitySchema(**data)
- Bases: BaseModel
- Entity class definitions in the dataset schema.
- Parameters:
- data (Any) 
- id_type (Literal['int', 'str'] | None) 
- attributes (dict[Name, ColumnSpec]) 
 
 - model_config: ClassVar[ConfigDict] = {}
- Configuration for the model; this should be a dictionary conforming to pydantic.config.ConfigDict. 
 - attributes: dict[Name, ColumnSpec]
- Entity attribute definitions. 
 
- class lenskit.data.schema.RelationshipSchema(**data)
- Bases: BaseModel
- Relationship class definitions in the dataset schema.
- Parameters:
- data (Any) 
- entities (dict[Name, Name | None]) 
- interaction (bool) 
- repeats (AllowableTroolean) 
- attributes (dict[Name, ColumnSpec]) 
 
 - model_config: ClassVar[ConfigDict] = {}
- Configuration for the model; this should be a dictionary conforming to pydantic.config.ConfigDict. 
 - entities: dict[Name, Name | None]
- Defines the entity classes participating in the relationship. For aliased entity classes (necessary for self-relationships), the key is the alias, and the value is the original entity class name. 
 - repeats: AllowableTroolean
- Whether this relationship supports repeated interactions. 
 - attributes: dict[Name, ColumnSpec]
- Relationship attribute definitions. 
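The alias convention for the `entities` mapping can be illustrated with plain dictionaries. In this sketch, `resolve_entity_classes` and the `follower`/`followee` names are hypothetical, and treating a `None` value as "the key is itself the entity class name" is an assumption drawn from the `Name | None` value type, not something the source states.

```python
from __future__ import annotations


def resolve_entity_classes(entities: dict[str, str | None]) -> dict[str, str]:
    """Map each key of an ``entities`` mapping to the entity class it refers to.

    Assumption: a None value means the key is itself the entity class name.
    """
    return {
        key: (target if target is not None else key)
        for key, target in entities.items()
    }


# Ordinary relationship between two different entity classes.
rating_entities = {"user": None, "item": None}

# Self-relationship: both sides are users, so each side gets its own alias.
follows_entities = {"follower": "user", "followee": "user"}

resolved_rating = resolve_entity_classes(rating_entities)
resolved_follows = resolve_entity_classes(follows_entities)
```

The aliases are what keep the two sides of a self-relationship distinguishable: without them, both keys would have to be `user`, which a dictionary cannot hold twice.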
 
- class lenskit.data.schema.ColumnSpec(*, layout=AttrLayout.SCALAR, vector_size=None)
- Bases: BaseModel
- Parameters:
- layout (AttrLayout) 
- vector_size (int | None) 
 
 - model_config: ClassVar[ConfigDict] = {}
- Configuration for the model; this should be a dictionary conforming to pydantic.config.ConfigDict. 
 - layout: AttrLayout
- The attribute layout (whether and how multiple values are supported). 
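To give a feel for how the four layouts differ, the sketch below pairs a plain-dictionary column spec with an example value of matching shape. All of it is illustrative: the example values, the index/value form used for the sparse case, and the reading of `vector_size` as the fixed length of a vector attribute are assumptions, and `dense_length_ok` is a hypothetical helper, not part of LensKit.

```python
# Hypothetical column specs paired with example values of a matching shape.
examples = {
    "scalar": ({"layout": "scalar"}, 3.5),
    "list": ({"layout": "list"}, ["comedy", "drama"]),
    "vector": ({"layout": "vector", "vector_size": 3}, [0.1, 0.0, 0.7]),
    # Assumed sparse form: positions of the nonzero entries plus their values.
    "sparse": (
        {"layout": "sparse", "vector_size": 5},
        {"indices": [1, 4], "values": [0.5, 2.0]},
    ),
}


def dense_length_ok(spec: dict, value: list) -> bool:
    """Check a dense vector value against the spec's vector_size, if one is given."""
    size = spec.get("vector_size")
    return size is None or len(value) == size


vector_spec, vector_value = examples["vector"]
vector_ok = dense_length_ok(vector_spec, vector_value)
```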
 
