Batch-Running Pipelines#

Offline recommendation experiments require batch-running a pipeline over a set of test users, sessions, or other recommendation requests. LensKit supports this through the facilities in the lenskit.batch module.

By default, the batch facilities operate in parallel over the test users; this can be controlled by environment variables (see Configuring Parallelism) or through an n_jobs keyword argument to the various functions and classes.

Import Protection

Scripts using batch pipeline operations must protect their entry points (e.g. with an if __name__ == '__main__': guard) so that worker processes can import them safely; see parallel-protecting.
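A minimal sketch of such a guard (this is standard Python multiprocessing practice, not a LensKit-specific API; the main() function is just a placeholder):

    from lenskit.batch import recommend

    def main():
        # load data, build and train the pipeline, then call recommend() here
        ...

    if __name__ == '__main__':
        main()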

Simple Runs#

If you have a pipeline and simply want to generate recommendations for a batch of test users, you can do so with the recommend() function.

For an example, let’s start by importing what we need for a quick batch run:

>>> from lenskit.basic import PopScorer
>>> from lenskit.pipeline import topn_pipeline
>>> from lenskit.batch import recommend
>>> from lenskit.data import load_movielens
>>> from lenskit.splitting import sample_users, SampleN
>>> from lenskit.metrics import RunAnalysis, RBP

Load and split some data:

>>> data = load_movielens('data/ml-100k.zip')
>>> split = sample_users(data, 150, SampleN(5, rng=1024), rng=42)

Configure and train the model:

>>> model = PopScorer()
>>> pop_pipe = topn_pipeline(model, n=20)
>>> pop_pipe.train(split.train)

Generate recommendations:

>>> recs = recommend(pop_pipe, split.test.keys(), n_jobs=1)
>>> recs.to_df()
          user_id  item_id     score  rank
0 ...                                    1
...
[3000 rows x 4 columns]

And measure their results:

>>> ra = RunAnalysis()
>>> ra.add_metric(RBP())
>>> scores = ra.measure(recs, split.test)
>>> scores.list_summary()
          mean    median     std
metric
RBP    0.06...   0.02... 0.07...

The predict() function works similarly, but for rating predictions. Instead of a simple list of user IDs, it takes the test items for each user: an ItemListCollection keyed by user ID, or (deprecated) a dictionary mapping user IDs to ItemLists; see Batch Queries below for the full set of accepted inputs.
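For example, a rough sketch of a batch prediction run, assuming a hypothetical pipeline pred_pipe whose scorer produces rating predictions, and reusing split.test from above as the per-user test items:

>>> from lenskit.batch import predict
>>> # pred_pipe is a hypothetical rating-prediction pipeline (not built above)
>>> preds = predict(pred_pipe, split.test, n_jobs=1)
>>> pred_df = preds.to_df()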

General Batch Pipeline Runs#

The recommend() and predict() functions are convenience wrappers around a more general facility, the BatchPipelineRunner.
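As a rough sketch of how the runner might be used (the recommend(), run(), and output() names below are assumptions patterned after the convenience functions above, not verified API; pop_pipe and split come from the earlier example):

>>> from lenskit.batch import BatchPipelineRunner
>>> runner = BatchPipelineRunner(n_jobs=1)
>>> runner.recommend()                       # request top-N recommendation output
>>> results = runner.run(pop_pipe, split.test.keys())
>>> recs = results.output('recommendations')

The point of the runner is that a single pass over the test data can be configured to collect several outputs (for example, both recommendations and rating predictions), rather than calling the convenience functions separately.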

Batch Queries#

The batch inference functions and methods (recommend(), run(), etc.) accept multiple types of input to specify the set of users or test items.

  • An iterable (e.g. list) of recommendation queries (as RecQuery objects). The queries must have at least one of RecQuery.query_id and RecQuery.user_id set, so that the output can be properly indexed. Queries should all use the same identification method (i.e., all queries have a query_id, or all queries have only a user_id).

  • An iterable of 2-element (query, items) tuples. The query is a RecQuery as in the previous form, and items is an ItemList containing the candidate items (for recommendation) or the items to score (for prediction and scoring). This is the most general form of input; see the sketch at the end of this section.

  • An iterable (e.g. list) of user IDs. These are passed as RecQuery.user_id, and the resulting outputs are indexed by user ID.

  • An ItemListCollection. At least one field of the collection key should be user_id, and these user IDs are used as the query user IDs. The item lists themselves are used as in the tuple method above. Results are indexed by the entire key.

  • A mapping (dictionary) of IDs to item lists. This behaves like the item list collection; the IDs are taken to be user IDs.

  • A pandas.DataFrame, which is converted to an item list collection.

Deprecated since version 2025.6: Mappings and data frames are deprecated in favor of other input types.
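For illustration, a sketch of the (query, items) tuple form referenced above (the RecQuery and ItemList keyword arguments here are assumptions based on the attribute names in this section, and the user and item IDs are arbitrary placeholders):

>>> from lenskit.data import ItemList, RecQuery
>>> # one (query, candidate items) pair per request; the IDs below are placeholders
>>> requests = [
...     (RecQuery(user_id=u), ItemList(item_ids=[10, 20, 30]))
...     for u in [1, 2, 3]
... ]
>>> recs = recommend(pop_pipe, requests, n_jobs=1)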