Evaluating Top-N Rankings

The lenskit.metrics.ranking module contains the core top-N ranking accuracy metrics (including rank-oblivious list metrics like precision, recall, and hit rate).

Ranking metrics extend the RankingMetricBase class, often in addition to ListMetric. Each metric returns a score given a recommendation list and a test rating list, both as item lists; most metrics require the recommendation list to be ordered.

All LensKit ranking metrics take n as a constructor argument to control the length of the list that is considered; this allows multiple measurements (e.g. HR@5 and HR@10) to be computed from a single set of rankings.
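Because n belongs to the metric rather than to the ranking, one ranked list can be scored at several cutoffs. The sketch below illustrates the idea in plain Python. `truncate` and `hits_at` are hypothetical helpers written for illustration and are not LensKit functions.

```python
from typing import Iterable, Optional, Sequence


def truncate(recs: Sequence[str], n: Optional[int]) -> list[str]:
    """Keep only the first n recommendations (the whole list if n is None)."""
    return list(recs) if n is None else list(recs[:n])


def hits_at(recs: Sequence[str], test: Iterable[str], n: Optional[int]) -> int:
    """Count how many of the first n recommendations are test items."""
    relevant = set(test)
    return sum(1 for item in truncate(recs, n) if item in relevant)


# A single ranking (best first), scored at several cutoffs.
ranking = [f"i{k}" for k in range(1, 21)]  # i1, i2, ..., i20
relevant = {"i7", "i15"}
counts = {n: hits_at(ranking, relevant, n) for n in (5, 10, None)}
print(counts)  # {5: 0, 10: 1, None: 2}
```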

Metrics can be used on their own, but it is usually easiest to use them with MeasurementCollector. It handles some of the edge cases around data availability and supports metric-specific aggregation (see Collecting and Aggregating Metrics for more details).

Changed in version 2026.1: The argument for the list length has changed from k to n, for consistency across LensKit. k is kept as a deprecated alias until 2027.1.

Changed in version 2025.1: The top-N accuracy metric interface has changed to use item lists and to be simpler to implement.

Included Effectiveness Metrics

List and Set Metrics

These metrics just look at the recommendation list and do not consider the rank positions of items within it.

Hit

Compute whether or not a list is a hit; any list with at least one relevant item in the first n positions is a hit.
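The following is a minimal sketch of the hit computation in plain Python, not the LensKit implementation. It assumes recommendations are a best-first list of item IDs and the test data is a set of relevant item IDs. Averaging the per-list hits over users gives the hit rate.

```python
from statistics import mean
from typing import Optional


def hit(recs: list[str], test: set[str], n: Optional[int] = None) -> float:
    """1.0 if any of the first n recommendations is relevant, else 0.0."""
    top = recs if n is None else recs[:n]
    return 1.0 if any(item in test for item in top) else 0.0


# Three users' (recommendations, relevant items).
lists = [
    (["a", "b", "c"], {"c"}),  # relevant item at rank 3
    (["d", "e", "f"], {"d"}),  # relevant item at rank 1
    (["g", "h", "i"], {"z"}),  # no relevant item recommended
]
hr2 = mean(hit(recs, test, n=2) for recs, test in lists)  # HR@2 = 1/3
hr3 = mean(hit(recs, test, n=3) for recs, test in lists)  # HR@3 = 2/3
```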

Precision

Compute recommendation precision: the fraction of the recommended items (up to n) that are relevant.
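Precision at n is usually written as |L ∩ T| / |L|, where L is the recommendation list truncated to n items and T is the set of relevant test items. The sketch below is a plain-Python version under those assumptions. Two of its choices are its own: it divides by the actual list length when fewer than n items are recommended, and it returns NaN for an empty list. Check the Precision API reference for how LensKit handles these cases.

```python
import math
from typing import Optional


def precision(recs: list[str], test: set[str], n: Optional[int] = None) -> float:
    """Fraction of the first n recommendations that are relevant."""
    top = recs if n is None else recs[:n]
    if not top:
        return math.nan  # this sketch's choice: undefined for an empty list
    hits = sum(1 for item in top if item in test)
    return hits / len(top)


p3 = precision(["a", "b", "c", "d"], {"a", "c", "x"}, n=3)  # 2 of 3 relevant
```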

Recall

Compute recommendation recall, based on how many of the relevant test items appear in the recommendation list.
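Recall is commonly defined as |L ∩ T| / |T|, the fraction of the relevant test items T that appear in the truncated list L. Some definitions divide by min(|T|, n) instead, so that a perfect score is reachable when there are more relevant items than list slots. This sketch uses the textbook |T| denominator; that is an assumption, not necessarily what LensKit's Recall does.

```python
import math
from typing import Optional


def recall(recs: list[str], test: set[str], n: Optional[int] = None) -> float:
    """Fraction of relevant test items found in the first n recommendations."""
    if not test:
        return math.nan  # this sketch's choice: undefined with no relevant items
    top = recs if n is None else recs[:n]
    # Use a set so a duplicated recommendation is not counted twice.
    hits = sum(1 for item in set(top) if item in test)
    return hits / len(test)


r = recall(["a", "b", "c"], {"a", "x", "y", "z"})  # 1 of 4 relevant items found
```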

Ranked List Metrics

These metrics treat the recommendation list as a ranked list of items that may or may not be relevant; some also support different item utilities (e.g. ratings or graded relevance scores).

RecipRank

Compute the reciprocal rank [KV97] of the first relevant item in a list of recommendations.
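Here is a plain-Python sketch of reciprocal rank, using the same assumptions as above: an ordered list of item IDs and a set of relevant IDs. The score is 1/k for the rank k of the first relevant item, and 0 if no relevant item appears in the first n positions. Averaging it over users gives mean reciprocal rank (MRR).

```python
from typing import Optional


def recip_rank(recs: list[str], test: set[str], n: Optional[int] = None) -> float:
    """1/rank of the first relevant item within the cutoff, else 0."""
    top = recs if n is None else recs[:n]
    for rank, item in enumerate(top, start=1):
        if item in test:
            return 1.0 / rank  # only the first relevant item matters
    return 0.0  # no relevant item within the cutoff


rr = recip_rank(["a", "b", "c"], {"c", "b"})      # first relevant at rank 2 -> 0.5
rr_cut = recip_rank(["a", "b", "c"], {"c"}, n=2)  # cut off before rank 3 -> 0.0
```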

RBP

Evaluate recommendations with rank-biased precision [MZ08].
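Rank-biased precision models a user who moves on to each next item with probability p (the patience), so the item at rank k is examined with probability p^(k-1). The sketch below implements the usual formula, (1 - p) Σ_k p^(k-1) rel_k, with binary relevance. Its default patience of 0.5 is an arbitrary choice for this sketch. Take LensKit's actual default and any normalization options from the RBP API reference.

```python
from typing import Optional


def rbp(recs: list[str], test: set[str], patience: float = 0.5,
        n: Optional[int] = None) -> float:
    """Rank-biased precision with binary relevance."""
    if not 0.0 < patience < 1.0:
        raise ValueError("patience must be in (0, 1)")
    top = recs if n is None else recs[:n]
    # Each relevant item contributes the probability its rank is reached.
    weight = sum(patience ** (rank - 1)
                 for rank, item in enumerate(top, start=1) if item in test)
    return (1.0 - patience) * weight


score = rbp(["a", "b", "c"], {"a", "c"}, patience=0.5)  # 0.5 * (1 + 0.25) = 0.625
```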

NDCG

Compute the normalized discounted cumulative gain [JarvelinKekalainen02].
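NDCG divides a list's DCG by the DCG of an ideal ordering, so scores fall in [0, 1] and can be compared across users. The sketch below uses binary relevance and the common 1/log2(k + 1) discount. Both the discount and the ideal-list construction (all relevant items first, truncated to n) are assumptions of this sketch; LensKit's defaults may differ.

```python
import math
from typing import Optional


def dcg_of_gains(gains: list[float]) -> float:
    """Sum of gains, each discounted by log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(recs: list[str], test: set[str], n: Optional[int] = None) -> float:
    top = recs if n is None else recs[:n]
    actual = dcg_of_gains([1.0 if item in test else 0.0 for item in top])
    # Ideal ranking: every relevant item at the front, up to the cutoff.
    ideal_len = len(test) if n is None else min(len(test), n)
    ideal = dcg_of_gains([1.0] * ideal_len)
    return actual / ideal if ideal > 0 else math.nan


score = ndcg(["a", "b", "c"], {"b", "c"})  # about 0.693
```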

DCG

Compute the _unnormalized_ discounted cumulative gain [JarvelinKekalainen02].
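Unnormalized DCG is the same discounted sum without dividing by the ideal value, so it grows with the number and grade of relevant items a user has. The sketch below also shows graded relevance. Using each item's test rating as its gain is an assumption for illustration, and unrated items contribute 0. The 1/log2(k + 1) discount is again a common convention, not necessarily LensKit's default.

```python
import math
from typing import Optional


def dcg(recs: list[str], gains: dict[str, float], n: Optional[int] = None) -> float:
    """Discounted cumulative gain using per-item gains (e.g. test ratings)."""
    top = recs if n is None else recs[:n]
    return sum(gains.get(item, 0.0) / math.log2(rank + 1)
               for rank, item in enumerate(top, start=1))


ratings = {"a": 5.0, "c": 3.0}  # hypothetical test ratings
score = dcg(["a", "b", "c"], ratings)  # 5/log2(2) + 0 + 3/log2(4) = 6.5
```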

Beyond Accuracy

These metrics measure non-accuracy properties of recommendation lists, such as popularity/obscurity or diversity.

MeanPopRank

Compute the _obscurity_ (mean popularity rank) of the recommendations.
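The sketch below shows the general idea. It ranks items by popularity in training data, scales the ranks to (0, 1], and averages the scaled rank over the recommended items. Here a larger value means more obscure recommendations. That direction, the scaling, and the arbitrary tie-breaking are assumptions of this sketch, and `popularity_ranks` and `mean_pop_rank` are hypothetical helpers. Consult the MeanPopRank API reference for LensKit's exact definition.

```python
import math
from collections import Counter
from statistics import mean
from typing import Optional


def popularity_ranks(train: list[tuple[str, str]]) -> dict[str, float]:
    """Map each item to its popularity rank scaled to (0, 1]; 1.0 = least popular."""
    counts = Counter(item for _user, item in train)  # interactions per item
    # Most popular first; ties keep first-seen order (arbitrary).
    ordered = sorted(counts, key=lambda item: counts[item], reverse=True)
    return {item: (pos + 1) / len(ordered) for pos, item in enumerate(ordered)}


def mean_pop_rank(recs: list[str], ranks: dict[str, float],
                  n: Optional[int] = None) -> float:
    """Average scaled popularity rank of the (known) recommended items."""
    top = recs if n is None else recs[:n]
    known = [ranks[item] for item in top if item in ranks]
    return mean(known) if known else math.nan


train = [("u1", "a"), ("u2", "a"), ("u3", "a"),
         ("u1", "b"), ("u2", "b"), ("u1", "c")]
ranks = popularity_ranks(train)  # a: 1/3, b: 2/3, c: 1.0
obscurity = mean_pop_rank(["a", "c"], ranks)  # (1/3 + 1) / 2
```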