lenskit.knn#

k-NN recommender models.

class lenskit.knn.EASEConfig(*, regularization=1)#

Bases: BaseModel

Configuration for EASEScorer.

Parameters:

regularization (Annotated[float, Gt(gt=0)])

regularization: Annotated[float, Gt(gt=0)]#

Regularization term for EASE.

model_config = {}#

Configuration for the model; should be a dictionary conforming to pydantic.ConfigDict.

class lenskit.knn.EASEScorer(config=None, **kwargs)#

Bases: Component[ItemList, …], Trainable

Embarrassingly shallow autoencoder [Ste19].

In addition to its configuration, this component also uses a training environment variable:

Parameters:
LK_EASE_SOLVER#

Specify the solver used to invert the Gram matrix for EASE. It can be either "torch" (works on both CPU and CUDA, and is faster on CPU than SciPy) or "scipy" (uses LAPACK, and may take less memory).

The default behavior is to first try to allocate enough memory to train with PyTorch, and to fall back to SciPy with in-place solving if the Torch allocation fails.

Note

This component requires SciPy 1.17 or later.

items: Vocabulary#

Items known at training time.

weights: ndarray[tuple[int, int], dtype[float32]]#

Item interpolation weight matrix.
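The weight matrix has a closed form [Ste19]: invert the regularized Gram matrix and rescale each column by its diagonal entry, with the self-weights constrained to zero. A minimal NumPy sketch on toy data (variable names are illustrative, not LensKit's internals):

```python
import numpy as np

# Toy implicit-feedback matrix: 4 users x 3 items (1 = interaction).
X = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]], dtype=np.float32)

reg = 1.0  # corresponds to EASEConfig.regularization
G = X.T @ X + reg * np.eye(X.shape[1], dtype=np.float32)
P = np.linalg.inv(G)
B = -P / np.diag(P)       # divide column j by P[j, j]
np.fill_diagonal(B, 0.0)  # EASE forbids an item predicting itself

scores = X @ B  # item scores for each user
```

This is the algorithm the component trains, but the real implementation works on the dataset's sparse interaction matrix and picks a solver as described above.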

train(data, options=None)#

Train the model to learn its parameters from a training dataset.

Parameters:

  • data (Dataset)

  • options (TrainingOptions | None)
class lenskit.knn.ItemKNNConfig(*, max_nbrs=20, min_nbrs=1, min_sim=1e-06, save_nbrs=None, feedback='explicit', block_size=250)#

Bases: BaseModel

Configuration for ItemKNNScorer.

max_nbrs: PositiveInt#

The maximum number of neighbors for scoring each item.

min_nbrs: PositiveInt#

The minimum number of neighbors for scoring each item.

min_sim: PositiveFloat#

Minimum similarity threshold for considering a neighbor. Must be positive; values less than the smallest 32-bit normal (\(1.175 \times 10^{-38}\)) are clamped to that value.

save_nbrs: PositiveInt | None#

The number of neighbors to save per item in the trained model (None for unlimited).

feedback: FeedbackType#

The type of input data to use (explicit or implicit). This affects data pre-processing and aggregation.

block_size: int#

The block size for computing item similarity blocks in parallel. Only affects performance, not behavior.

property explicit: bool#

Query whether this is in explicit-feedback mode.

model_config = {'extra': 'forbid'}#

Configuration for the model; should be a dictionary conforming to pydantic.ConfigDict.

class lenskit.knn.ItemKNNScorer(config=None, **kwargs)#

Bases: Component[ItemList, …], Trainable

Item-item nearest-neighbor collaborative filtering with ratings. This item-item implementation is based on the description of item-based CF by Deshpande and Karypis [DK04] and hard-codes several design decisions found to work well in the previous Java-based LensKit code [ELKR11]. In explicit-feedback mode, its output is equivalent to that of the Java version.

Note

This component must be used with queries containing the user’s history, either directly in the input or by wiring its query input to the output of a user history component (e.g., UserTrainingHistoryLookup).

Stability:
Caller (see Stability Levels).
items: Vocabulary#

Vocabulary of item IDs.

item_means: ndarray[tuple[int], dtype[float32]] | None#

Mean rating for each known item.

item_counts: ndarray[tuple[int], dtype[int32]]#

Number of saved neighbors for each item.

sim_matrix: SparseRowArray#

Similarity matrix (sparse CSR tensor).

train(data, options=TrainingOptions(retrain=True, device=None, rng=None, environment={}))#

Train a model.

The model-training process depends on save_nbrs and min_sim, but not on other algorithm parameters.

Parameters:
  • data (Dataset) – (user, item, rating) data for computing item similarities.

  • options (TrainingOptions)
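To illustrate how save_nbrs and min_sim shape the trained model, here is a NumPy sketch of the item-item training steps for explicit feedback (mean-center each item, compute cosine similarities, drop weak similarities, keep the strongest neighbors). It mirrors the configuration fields but is not LensKit's actual sparse implementation:

```python
import numpy as np

# Toy explicit-feedback matrix: 4 users x 3 items (0 = unrated).
R = np.array([[4.0, 3.0, 0.0],
              [5.0, 0.0, 2.0],
              [0.0, 4.0, 5.0],
              [3.0, 2.0, 4.0]])

min_sim, save_nbrs = 1e-6, 1  # mirror ItemKNNConfig fields

# Mean-center each item's ratings (explicit-feedback pre-processing).
mask = R > 0
means = R.sum(axis=0) / mask.sum(axis=0)
C = np.where(mask, R - means, 0.0)

# Cosine similarity between item columns.
norms = np.linalg.norm(C, axis=0)
S = (C.T @ C) / np.outer(norms, norms)
np.fill_diagonal(S, 0.0)
S[S < min_sim] = 0.0  # discard weak and negative similarities

# Keep only the save_nbrs strongest neighbors per item.
for j in range(S.shape[1]):
    order = np.argsort(S[:, j])[::-1]
    S[order[save_nbrs:], j] = 0.0
```

The resulting truncated matrix plays the role of sim_matrix; at query time, max_nbrs and min_nbrs then govern how many of these saved neighbors are actually used to score each item.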

class lenskit.knn.UserKNNConfig(*, max_nbrs=20, min_nbrs=1, min_sim=1e-06, feedback='explicit')#

Bases: BaseModel

Configuration for UserKNNScorer.

max_nbrs: PositiveInt#

The maximum number of neighbors for scoring each item.

min_nbrs: PositiveInt#

The minimum number of neighbors for scoring each item.

min_sim: PositiveFloat#

Minimum similarity threshold for considering a neighbor. Must be positive; values less than the smallest 32-bit normal (\(1.175 \times 10^{-38}\)) are clamped to that value.

feedback: FeedbackType#

The type of input data to use (explicit or implicit). This affects data pre-processing and aggregation.

property explicit: bool#

Query whether this is in explicit-feedback mode.

model_config = {'extra': 'forbid'}#

Configuration for the model; should be a dictionary conforming to pydantic.ConfigDict.

class lenskit.knn.UserKNNScorer(config=None, **kwargs)#

Bases: Component[ItemList, …], Trainable

User-user nearest-neighbor collaborative filtering with ratings. This user-user implementation is not terribly configurable; it hard-codes design decisions found to work well in the previous Java-based LensKit code.

Note

This component must be used with queries containing the user’s history, either directly in the input or by wiring its query input to the output of a user history component (e.g., UserTrainingHistoryLookup).

Stability:
Caller (see Stability Levels).
users: Vocabulary#

The index of user IDs.

items: Vocabulary#

The index of item IDs.

user_means: ndarray[tuple[int], dtype[float32]] | None#

Mean rating for each known user.

user_vectors: csr_array#

Normalized rating matrix (CSR) to find neighbors at prediction time.

user_ratings: SparseRowArray#

Centered but unnormalized rating matrix (sparse row format) used to look up neighbor ratings.
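The split between normalized vectors (for finding neighbors) and centered-only ratings (for aggregating neighbor ratings) can be sketched with dense NumPy arrays; this is an illustrative analogue of the scoring math, not the sparse implementation:

```python
import numpy as np

# Toy rating matrix: 3 training users x 4 items (0 = unrated).
R = np.array([[4.0, 0.0, 3.0, 5.0],
              [0.0, 2.0, 4.0, 0.0],
              [5.0, 4.0, 0.0, 4.0]])

mask = R > 0
means = R.sum(axis=1) / mask.sum(axis=1)            # per-user mean ratings
centered = np.where(mask, R - means[:, None], 0.0)  # analogous to user_ratings
norms = np.linalg.norm(centered, axis=1)
normalized = centered / norms[:, None]              # analogous to user_vectors

# Score item 1 for a query user who rated items 0 and 3.
q = np.zeros(4)
q[[0, 3]] = [5.0, 4.0]
q_mean = q[q > 0].mean()
qc = np.where(q > 0, q - q_mean, 0.0)
qn = qc / np.linalg.norm(qc)

sims = normalized @ qn                          # cosine similarity to each user
nbrs = np.flatnonzero((sims > 0) & mask[:, 1])  # positive neighbors who rated item 1
pred = q_mean + (sims[nbrs] @ centered[nbrs, 1]) / sims[nbrs].sum()
```

The prediction re-adds the query user's mean to a similarity-weighted average of the neighbors' centered ratings, which is why both the normalized and the centered matrices are retained by train().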

train(data, options=TrainingOptions(retrain=True, device=None, rng=None, environment={}))#

“Train” a user-user CF model. This memorizes the rating data in a format that is usable for future computations.

Parameters:

  • data (Dataset)

  • options (TrainingOptions)

Modules

ease

EASE scoring model.

item

Item-based k-NN collaborative filtering.

user

User-based k-NN collaborative filtering.