k-NN Collaborative Filtering
LKPY provides user- and item-based classical k-NN collaborative filtering implementations. These lightly configurable implementations are intended to capture the behavior of the Java-based LensKit implementations, providing a good upgrade path and enabling basic experiments out of the box.
Item-based k-NN
-
class lenskit.algorithms.item_knn.ItemItem(nnbrs, min_nbrs=1, min_sim=1e-06, save_nbrs=None, center=True, aggregate='weighted-average')
Bases: lenskit.algorithms.Predictor
Item-item nearest-neighbor collaborative filtering with ratings. This item-item implementation is not terribly configurable; it hard-codes design decisions found to work well in the previous Java-based LensKit code.
- Parameters
nnbrs (int) – the maximum number of neighbors for scoring each item (None for unlimited)
min_nbrs (int) – the minimum number of neighbors for scoring each item
min_sim (double) – minimum similarity threshold for considering a neighbor
save_nbrs (double) – the number of neighbors to save per item in the trained model (None for unlimited)
center (bool) – whether to normalize (mean-center) rating vectors. Turn this off when working with unary data and other data types that don’t respond well to centering.
aggregate – the type of aggregation to do. Can be weighted-average or sum.
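How nnbrs, min_nbrs, min_sim, and aggregate interact when scoring a single item can be sketched in plain Python. This is an illustration only, not LKPY's implementation. The function name `aggregate_scores`, the inclusive `>=` threshold, and the reading of `sum` as "add up the neighbor similarities" are assumptions made for the example.

```python
def aggregate_scores(neighbors, nnbrs=20, min_nbrs=1, min_sim=1e-6,
                     aggregate='weighted-average'):
    """Combine (similarity, rating) pairs for one target item into a score.

    Hypothetical helper for illustration; returns None when the item
    cannot be scored.
    """
    # Keep neighbors that meet the similarity threshold, most similar first.
    usable = sorted((n for n in neighbors if n[0] >= min_sim),
                    key=lambda n: n[0], reverse=True)
    # nnbrs=None means "use every usable neighbor".
    if nnbrs is not None:
        usable = usable[:nnbrs]
    # Too few neighbors: no score for this item.
    if not usable or len(usable) < min_nbrs:
        return None

    total_sim = sum(sim for sim, _ in usable)
    if aggregate == 'weighted-average':
        if total_sim <= 0:
            return None
        # Similarity-weighted mean of the neighbor ratings.
        return sum(sim * rating for sim, rating in usable) / total_sim
    if aggregate == 'sum':
        # Assumption: 'sum' ignores rating values and adds similarities,
        # a common choice for unary (implicit-feedback) data.
        return total_sim
    raise ValueError('unknown aggregate: %r' % (aggregate,))
```

With mean-centered data, the ratings passed in would be offsets from a mean, and that mean would be added back to the result; the sketch leaves that step out.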
-
item_index_
the index of item IDs.
- Type
-
item_means_
the mean rating for each known item.
- Type
-
item_counts_
the number of saved neighbors for each item.
- Type
-
sim_matrix_
the similarity matrix.
- Type
-
user_index_
the index of known user IDs for the rating matrix.
- Type
-
rating_matrix_
the user-item rating matrix for looking up users’ ratings.
- Type
-
fit(ratings, **kwargs)
Train a model.
The model-training process depends on save_nbrs and min_sim, but not on other algorithm parameters.
- Parameters
ratings (pandas.DataFrame) – (user, item, rating) data for computing item similarities.
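The role of min_sim and save_nbrs during training can be illustrated with a small pure-Python sketch. The source does not state which similarity function is used; the example assumes cosine similarity over item-mean-centered rating vectors. The function name `item_similarities` and the plain list-of-triples input are also assumptions; the real fit takes a pandas.DataFrame.

```python
import math


def item_similarities(ratings, min_sim=1e-6, save_nbrs=None):
    """Build {item: [(other_item, similarity), ...]} from rating triples.

    ratings is an iterable of (user, item, rating) triples.
    Illustrative sketch only, not LKPY's training code.
    """
    # Group ratings into one sparse vector per item: {item: {user: rating}}.
    by_item = {}
    for user, item, rating in ratings:
        by_item.setdefault(item, {})[user] = float(rating)

    # Assumption: subtract each item's mean rating from its vector.
    for vec in by_item.values():
        mean = sum(vec.values()) / len(vec)
        for user in vec:
            vec[user] -= mean

    norms = {item: math.sqrt(sum(v * v for v in vec.values()))
             for item, vec in by_item.items()}

    sims = {}
    for i, vec_i in by_item.items():
        row = []
        for j, vec_j in by_item.items():
            if i == j or norms[i] == 0 or norms[j] == 0:
                continue
            # Cosine similarity over the users who rated both items.
            dot = sum(v * vec_j[u] for u, v in vec_i.items() if u in vec_j)
            sim = dot / (norms[i] * norms[j])
            # min_sim drops weak or negative neighbors at training time.
            if sim >= min_sim:
                row.append((j, sim))
        row.sort(key=lambda pair: pair[1], reverse=True)
        # save_nbrs caps how many neighbors are stored per item.
        sims[i] = row if save_nbrs is None else row[:save_nbrs]
    return sims
```

Because both thresholds are applied while the table is built, changing either one requires retraining, whereas a scoring-time setting such as nnbrs does not.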
-
predict_for_user(user, items, ratings=None)
Compute predictions for a user and items.
- Parameters
user – the user ID
items (array-like) – the items to predict
ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.
- Returns
scores for the items, indexed by item id.
- Return type
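The fit-then-predict call sequence documented above can be wrapped in a small helper. This is a sketch: `fit_and_score` is a hypothetical name, and it relies only on the fit and predict_for_user signatures listed on this page. The column names in the commented usage are an assumption based on the "(user, item, rating)" description.

```python
def fit_and_score(algo, ratings, user, items, user_ratings=None):
    """Train `algo` on `ratings`, then score `items` for `user`.

    `algo` is any object exposing fit(ratings) and
    predict_for_user(user, items, ratings=None), such as ItemItem.
    """
    algo.fit(ratings)
    return algo.predict_for_user(user, items, ratings=user_ratings)


# Intended use with LKPY and pandas installed (comments only, not run here):
#   import pandas as pd
#   from lenskit.algorithms.item_knn import ItemItem
#   ratings = pd.DataFrame({'user': [1, 1, 2],
#                           'item': [10, 20, 10],
#                           'rating': [4.0, 3.0, 5.0]})
#   scores = fit_and_score(ItemItem(20), ratings, user=1, items=[10, 20])
```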
User-based k-NN
-
class lenskit.algorithms.user_knn.UserUser(nnbrs, min_nbrs=1, min_sim=0, center=True, aggregate='weighted-average')
Bases: lenskit.algorithms.Predictor
User-user nearest-neighbor collaborative filtering with ratings. This user-user implementation is not terribly configurable; it hard-codes design decisions found to work well in the previous Java-based LensKit code.
- Parameters
nnbrs (int) – the maximum number of neighbors for scoring each item (None for unlimited)
min_nbrs (int) – the minimum number of neighbors for scoring each item
min_sim (double) – minimum similarity threshold for considering a neighbor
center (bool) – whether to normalize (mean-center) rating vectors. Turn this off when working with unary data and other data types that don’t respond well to centering.
aggregate – the type of aggregation to do. Can be weighted-average or sum.
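For the user-based case, the effect of center=True with weighted-average aggregation can be sketched as the textbook mean-centered formula: the target user's mean plus the similarity-weighted average of neighbors' offsets from their own means. This is an assumption about the general technique, not a transcription of LKPY's code, and `predict_user_user` is a hypothetical name.

```python
def predict_user_user(target_mean, neighbors, nnbrs=20, min_nbrs=1, min_sim=0.0):
    """Score one item for one user from that item's rating neighbors.

    neighbors is a list of (similarity, neighbor_rating, neighbor_mean)
    tuples for users who rated the item. Returns None if unscorable.
    """
    # Keep neighbors at or above the threshold, most similar first.
    usable = sorted((n for n in neighbors if n[0] >= min_sim),
                    key=lambda n: n[0], reverse=True)
    if nnbrs is not None:
        usable = usable[:nnbrs]
    if not usable or len(usable) < min_nbrs:
        return None

    total_sim = sum(sim for sim, _, _ in usable)
    if total_sim <= 0:
        # Nothing to weight by (for example, all similarities are zero).
        return None

    # Weighted average of how far each neighbor rated above or below
    # their own mean, shifted back onto the target user's scale.
    offset = sum(sim * (rating - mean) for sim, rating, mean in usable) / total_sim
    return target_mean + offset
```

Centering keeps a generous rater's 5 and a harsh rater's 3 from being averaged as if they meant the same thing, which is why it is unhelpful for unary data where every value is identical.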
-
user_index_
User index.
- Type
-
item_index_
Item index.
- Type
-
user_means_
User mean ratings.
- Type
-
rating_matrix_
Normalized user-item rating matrix.
- Type
-
transpose_matrix_
Transposed un-normalized rating matrix.
- Type
-
fit(ratings, **kwargs)
“Train” a user-user CF model. This memorizes the rating data in a format that is usable for future computations.
- Parameters
ratings (pandas.DataFrame) – (user, item, rating) data for collaborative filtering.
- Returns
a memorized model for efficient user-based CF computation.
- Return type
UUModel
-
predict_for_user(user, items, ratings=None)
Compute predictions for a user and items.
- Parameters
user – the user ID
items (array-like) – the items to predict
ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, will be used to recompute the user’s bias at prediction time.
- Returns
scores for the items, indexed by item id.
- Return type
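Recomputing a user's bias from ratings supplied at prediction time might look like the following sketch. It assumes that "bias" means the user's mean rating and uses a plain dict in place of the pandas.Series the method accepts; `center_user_ratings` is a hypothetical helper, not part of LKPY.

```python
def center_user_ratings(user_ratings):
    """Return (mean, centered) for a mapping of item id -> rating.

    The mean stands in for the user's bias; centered holds each
    rating's offset from that mean.
    """
    if not user_ratings:
        raise ValueError('need at least one rating to compute a mean')
    mean = sum(user_ratings.values()) / len(user_ratings)
    centered = {item: rating - mean for item, rating in user_ratings.items()}
    return mean, centered
```

The recomputed mean would replace the value stored at training time, so a user with new ratings can be scored without refitting the model.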