lenskit.data.RelationshipSet
- class lenskit.data.RelationshipSet(name, vocabularies, schema, table)
Bases: object
Representation for a set of relationship records. This is the class for accessing general relationships, with arbitrarily many entity classes involved and repeated relationships allowed.
For two-entity relationships without duplicates (including relationships formed by coalescing repeated relationships or interactions), MatrixRelationshipSet extends this with additional capabilities.
Relationship sets can be pickled or serialized, and will not save the entire dataset with them. They are therefore safe to save as component elements during training processes.
Note
Client code does not need to construct this class; obtain instances from a dataset’s relationships() or interactions() method.
- Stability:
Caller (see Stability Levels).
- Parameters:
name (str)
vocabularies (dict[str, Vocabulary])
schema (RelationshipSchema)
table (pa.Table)
- __init__(name, vocabularies, schema, table)
- Parameters:
name (str)
vocabularies (dict[str, Vocabulary])
schema (RelationshipSchema)
table (Table)
Methods
__init__(name, vocabularies, schema, table)
arrow(*[, attributes, ids]) – Get these relationships and their attributes as a PyArrow table.
co_occurrences(entity, *[, group, order]) – Count co-occurrences of the specified entity.
count()
item_lists() – Get a view of this relationship set as an item list collection.
matrix(*[, row_entity, col_entity]) – Convert this relationship set into a matrix, coalescing duplicate observations.
pandas(*[, attributes, ids]) – Get these relationships and their attributes as a Pandas data frame.
Attributes
attribute_names
entities
Query whether these relationships represent interactions.
name – The name of the relationship class for these relationships.
schema
- item_lists()
Get a view of this relationship set as an item list collection.
Currently only implemented for MatrixRelationshipSet; call matrix() first.
- Return type:
- co_occurrences(entity, *, group=None, order=None)
Count co-occurrences of the specified entity. This is useful for counting item co-occurrences for association rules and probabilities, but has other uses as well.
This method supports both ordered and unordered co-occurrences. Unordered co-occurrences just count the number of times the two items appear together, and the resulting matrix is symmetric.
For ordered co-occurrences, the interactions are ordered by the attribute specified by order, and the resulting matrix M may not be symmetric. M[i,j] counts the number of times item j has appeared after item i. The order does not need to be global; an attribute recording order within a group is sufficient.
If group is specified, it controls the grouping for counting co-occurrences. For example, if a relationship connects the user, session, and item classes, then:
- rs.co_occurrences("item") counts the number of times each pair of items appears together in a session.
- rs.co_occurrences("item", group="user") counts the number of times each pair of items was interacted with by the same user, regardless of session.
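The counting rules above can be illustrated with a small pure-Python sketch. It does not use LensKit: the record layout, the co_occurrence_counts helper, and the choices to count each distinct pair once per group and to skip the diagonal are assumptions made for illustration, not a description of how the library computes its result.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy records: (user, session, item, time).
# The time values are only meaningful within one user's history.
FIELDS = ("user", "session", "item", "time")
RECORDS = [
    ("u1", "s1", "a", 1),
    ("u1", "s1", "b", 2),
    ("u1", "s2", "c", 3),
    ("u2", "s3", "b", 1),
    ("u2", "s3", "a", 2),
]


def co_occurrence_counts(records, entity, group, order=None):
    """Count co-occurrences of `entity` values within each `group` (sketch)."""
    e_idx = FIELDS.index(entity)
    g_idx = [FIELDS.index(g) for g in group]
    o_idx = FIELDS.index(order) if order is not None else None

    # Collect the records that share each grouping key.
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[i] for i in g_idx)].append(rec)

    counts = Counter()
    for rows in groups.values():
        if o_idx is None:
            # Unordered: each distinct pair counts once, in both directions,
            # so the result is symmetric.
            items = sorted({r[e_idx] for r in rows})
            for i, j in combinations(items, 2):
                counts[i, j] += 1
                counts[j, i] += 1
        else:
            # Ordered: counts[i, j] grows when j appears after i in the group.
            seq = [r[e_idx] for r in sorted(rows, key=lambda r: r[o_idx])]
            for i, j in combinations(seq, 2):
                if i != j:
                    counts[i, j] += 1
    return counts


# Group by every entity that is not being counted (user and session).
by_session = co_occurrence_counts(RECORDS, "item", group=("user", "session"))
# Group by user only, ignoring session boundaries.
by_user = co_occurrence_counts(RECORDS, "item", group=("user",))
# Sequential co-occurrences within each user's history.
ordered = co_occurrence_counts(RECORDS, "item", group=("user",), order="time")
```

In this toy data, items a and c never share a session, so by_session["a", "c"] is 0, while by_user["a", "c"] is 1 because user u1 interacted with both. The ordered counts are not symmetric: ordered["a", "c"] is 1 but ordered["c", "a"] is 0.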
- Parameters:
entity (str) – The name of the entity to count.
group (str | list[str] | None) – The names of grouping entity classes for counting co-occurrences. The default is to use all entities that are not being counted.
order (str | None) – The name of an attribute to use for ordering interactions to compute sequential co-occurrences.
- Returns:
A sparse matrix with the co-occurrence counts.
- Return type:
- arrow(*, attributes=None, ids=False)
Get these relationships and their attributes as a PyArrow table.
- pandas(*, attributes=None, ids=False)
Get these relationships and their attributes as a Pandas data frame.
- matrix(*, row_entity=None, col_entity=None)
Convert this relationship set into a matrix, coalescing duplicate observations.
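As a rough illustration of what coalescing means, the stand-alone sketch below merges repeated (row, column) records into a single matrix cell. It does not call LensKit; the coalesce helper and the toy data are hypothetical, and merging by counting is only one plausible rule. How the library combines duplicates and their attributes is not stated here.

```python
from collections import Counter

# Hypothetical interaction log in which ("u1", "a") appears twice.
PAIRS = [("u1", "a"), ("u1", "b"), ("u1", "a"), ("u2", "b")]


def coalesce(pairs):
    """Merge repeated (row, col) pairs into one cell holding their count."""
    cells = Counter(pairs)
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    # Dense layout for readability; a real implementation would stay sparse.
    dense = [[cells[r, c] for c in cols] for r in rows]
    return rows, cols, dense


rows, cols, dense = coalesce(PAIRS)
```

Here the four records collapse into three non-zero cells, with the repeated pair stored once as a count of 2.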
Changed in version 2025.6: Removed the fixed defaults for row_entity and col_entity.
- Parameters:
- Return type: