lenskit.data.RelationshipSet

class lenskit.data.RelationshipSet(name, vocabularies, schema, table)

Bases: object

Representation for a set of relationship records. This is the class for accessing general relationships, with arbitrarily many entity classes involved and repeated relationships allowed.

For two-entity relationships without duplicates (including relationships formed by coalescing repeated relationships or interactions), MatrixRelationshipSet extends this with additional capabilities.

Relationship sets can be pickled or serialized, and will not save the entire dataset with them. They are therefore safe to save as component elements during training processes.

Note

Client code does not need to construct this class; obtain instances from a dataset’s relationships() or interactions() method.

Stability:
Caller (see Stability Levels).

__init__(name, vocabularies, schema, table)

Methods

__init__(name, vocabularies, schema, table)

arrow(*[, attributes, ids])

Get these relationships and their attributes as a PyArrow table.

co_occurrences(entity, *[, group, order])

Count co-occurrences of the specified entity.

count()

item_lists()

Get a view of this relationship set as an item list collection.

matrix(*[, row_entity, col_entity])

Convert this relationship set into a matrix, coalescing duplicate observations.

pandas(*[, attributes, ids])

Get these relationships and their attributes as a Pandas data frame.

Attributes

attribute_names

entities

is_interaction

Query whether these relationships represent interactions.

name

The name of the relationship class for these relationships.

schema

name: str

The name of the relationship class for these relationships.

property is_interaction: bool

Query whether these relationships represent interactions.

item_lists()

Get a view of this relationship set as an item list collection.

Currently only implemented for MatrixRelationshipSet; call matrix() first.

Return type:

ItemListCollection
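Conceptually, the item-list view groups the column entities of the matrix by row entity, which is why it needs the coalesced matrix form. The sketch below shows that grouping on hypothetical (user, item) cells with an illustrative item_lists function; the shape of the result is an assumption for illustration, not the actual ItemListCollection API.

```python
from collections import defaultdict

# Hypothetical coalesced (user, item) cells, as a matrix view might hold them.
cells = [("u1", "a"), ("u2", "c"), ("u1", "b")]


def item_lists(cells):
    """Illustrative only: one list of column entities per row entity."""
    lists = defaultdict(list)
    for row, col in cells:
        lists[row].append(col)
    return dict(lists)


print(item_lists(cells))  # {'u1': ['a', 'b'], 'u2': ['c']}
```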

co_occurrences(entity, *, group=None, order=None)

Count co-occurrences of the specified entity. This is useful for counting item co-occurrences for association rules and probabilities, but it has other uses as well.

This method supports both ordered and unordered co-occurrences. Unordered co-occurrences count the number of times two items appear together, so the resulting matrix is symmetric.

For ordered co-occurrences, the interactions are sorted by the attribute named by order, and the resulting matrix M may not be symmetric. M[i,j] counts the number of times item j has appeared after item i. The order does not need to be global; an attribute recording order within a group is sufficient.
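The following is a minimal plain-Python sketch of the unordered and ordered counting rules described above, not LensKit's implementation. It assumes each group is a list of hypothetical (item, order_value) records, that "after" means anywhere later in the group rather than immediately next, and that an item paired with itself is not counted; the co_occurrences function here is illustrative only, and a Counter keyed by (i, j) stands in for the sparse coo_array the real method returns.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: each group (e.g. one session) is a list of
# (item, order_value) records; order values only need to be comparable
# within a single group.
Group = list[tuple[str, int]]


def co_occurrences(groups: list[Group], ordered: bool = False) -> Counter:
    """Illustrative sketch: count item pairs that share a group.

    Unordered: each pair is counted as both (i, j) and (j, i), so the
    counts are symmetric. Ordered: records are sorted by order value and
    (i, j) is counted only when j comes later than i (anywhere later,
    which is an assumption about what "after" means).
    """
    counts: Counter = Counter()
    for group in groups:
        # Sort within the group only; no global ordering is needed.
        records = sorted(group, key=lambda r: r[1]) if ordered else group
        items = [item for item, _ in records]
        # combinations() yields pairs (a, b) where a's position precedes b's.
        for a, b in combinations(items, 2):
            if a == b:
                continue  # assumption: self-pairs are skipped
            counts[(a, b)] += 1
            if not ordered:
                counts[(b, a)] += 1
    return counts


sessions = [
    [("a", 1), ("b", 2), ("c", 3)],
    [("c", 1), ("a", 2)],
]
unordered = co_occurrences(sessions)
ordered = co_occurrences(sessions, ordered=True)
print(unordered[("a", "c")])  # 2: a and c share both sessions
print(ordered[("a", "c")])    # 1: only the first session has c after a
```

In the ordered result, ("a", "c") and ("c", "a") each get one count from a different session, while ("b", "a") gets none, which is the asymmetry the paragraph above describes.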

If group is specified, it controls the grouping for counting co-occurrences. For example, if a relationship connects the user, session, and item classes, then:

  • rs.co_occurrences("item") counts the number of times each pair of items appears together in a session.

  • rs.co_occurrences("item", group="user") counts the number of times each pair of items were interacted with by the same user, regardless of session.
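To illustrate how the group argument changes the counts, here is a small sketch using hypothetical (user, session, item) tuples and an illustrative count_pairs helper that is not part of LensKit. Grouping by (user, session) plays the role of the default; grouping by user alone corresponds to group="user".

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical records for a relationship over user, session, and item.
records = [
    ("u1", "s1", "a"), ("u1", "s1", "b"),
    ("u1", "s2", "c"),
    ("u2", "s3", "a"), ("u2", "s3", "c"),
]


def count_pairs(records, key):
    """Illustrative helper: unordered item co-occurrences per group key."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec[2])  # rec[2] is the item
    counts = Counter()
    for items in groups.values():
        for a, b in combinations(items, 2):
            if a != b:
                counts[(a, b)] += 1
                counts[(b, a)] += 1
    return counts


# Like the default: group by every entity except the counted one.
by_session = count_pairs(records, key=lambda r: (r[0], r[1]))
# Like group="user": any two items the same user interacted with.
by_user = count_pairs(records, key=lambda r: r[0])
print(by_session[("b", "c")], by_user[("b", "c")])  # 0 1
```

User u1 saw b and c in different sessions, so that pair only co-occurs under user-level grouping.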

Parameters:
  • entity (str) – The name of the entity to count.

  • group (str | list[str] | None) – The names of grouping entity classes for counting co-occurrences. The default is to use all entities that are not being counted.

  • order (str | None) – The name of an attribute to use for ordering interactions to compute sequential co-occurrences.

Returns:

A sparse matrix with the co-occurrence counts.

Return type:

coo_array

arrow(*, attributes=None, ids=False)

Get these relationships and their attributes as a PyArrow table.

Parameters:
  • attributes (str | list[str] | None) – The attributes to select.

  • ids – If True, include ID columns for the entities, instead of just the number columns.

Return type:

Table
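The ids option is easiest to see with a toy vocabulary. In this plain-Python sketch (not the PyArrow code path), an entity's number is its position in a hypothetical vocabulary list, and ids=True corresponds to adding the looked-up ID next to each number column; the column names user_num, item_num, user_id, and item_id are assumptions for illustration.

```python
# Hypothetical vocabularies: list position is the entity number,
# the string is the external entity ID.
user_vocab = ["u17", "u42", "u99"]
item_vocab = ["i3", "i8"]

# Relationship rows holding only number columns (column names assumed).
rows = [
    {"user_num": 0, "item_num": 1},
    {"user_num": 2, "item_num": 0},
]


def with_ids(rows):
    """Return copies of rows with ID columns added next to the numbers."""
    return [
        {
            **row,
            "user_id": user_vocab[row["user_num"]],
            "item_id": item_vocab[row["item_num"]],
        }
        for row in rows
    ]


table = with_ids(rows)
print(table[0])
# {'user_num': 0, 'item_num': 1, 'user_id': 'u17', 'item_id': 'i8'}
```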

pandas(*, attributes=None, ids=False)

Get these relationships and their attributes as a Pandas data frame.

Parameters:
  • attributes (str | list[str] | None) – The attributes to include in the resulting table.

  • ids – If True, include ID columns for the entities, instead of just the number columns.

Return type:

DataFrame

matrix(*, row_entity=None, col_entity=None)

Convert this relationship set into a matrix, coalescing duplicate observations.

Changed in version 2025.6: Removed the fixed defaults for row_entity and col_entity.

Parameters:
  • row_entity (str | None) – The entity to use for the matrix rows. Defaults to the first entity in the relationship’s list of involved entities.

  • col_entity (str | None) – The entity to use for the matrix columns. Defaults to the last entity in the relationship’s list of involved entities.

Return type:

MatrixRelationshipSet
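As a rough illustration of coalescing and of the default row and column entities, this sketch collapses hypothetical (user, session, item) records into cell counts. Counting duplicates is an assumption made for the example; this page does not say how the real MatrixRelationshipSet aggregates repeated observations or their attributes, and coalesce is a hypothetical helper.

```python
from collections import Counter

# Hypothetical relationship over (user, session, item), in declared order.
entities = ["user", "session", "item"]
records = [
    ("u1", "s1", "a"),
    ("u1", "s2", "a"),  # repeats the (u1, a) pair in another session
    ("u1", "s2", "b"),
    ("u2", "s3", "a"),
]


def coalesce(records, entities, row_entity=None, col_entity=None):
    """Hypothetical helper: collapse records into {(row, col): count}."""
    # Documented defaults: first involved entity for rows, last for columns.
    row = entities.index(row_entity) if row_entity is not None else 0
    col = entities.index(col_entity) if col_entity is not None else len(entities) - 1
    # Assumption for illustration: duplicates are merged by counting them.
    return Counter((rec[row], rec[col]) for rec in records)


cells = coalesce(records, entities)  # user x item
session_cells = coalesce(records, entities, row_entity="session")  # session x item
print(cells[("u1", "a")], len(cells))  # 2 3
```

The two (u1, a) records collapse into one cell in the default user-by-item view, while the session-by-item view keeps them apart.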