lenskit.metrics.ranking#
LensKit ranking (and list) metrics.
Classes#
- RankingMetricBase: Base class for most ranking metrics, implementing an n parameter for truncation.
- DCG: Compute the _unnormalized_ discounted cumulative gain [JarvelinKekalainen02].
- NDCG: Compute the normalized discounted cumulative gain [JarvelinKekalainen02].
- Entropy: Evaluate diversity using Shannon entropy over item categories.
- RankBiasedEntropy: Evaluate diversity using rank-biased Shannon entropy over item categories.
- ExposureGini: Measure exposure distribution of recommendations with the Gini coefficient.
- ListGini: Measure item diversity of recommendations with the Gini coefficient.
- Hit: Compute whether or not a list is a hit; any list with at least one relevant item is scored as 1.
- ILS: Evaluate recommendation diversity using intra-list similarity (ILS).
- AveragePrecision: Compute Average Precision (AP) for a single user's recommendations.
- MeanPopRank: Compute the _obscurity_ (mean popularity rank) of the recommendations.
- Precision: Compute recommendation precision.
- Recall: Compute recommendation recall.
- RBP: Evaluate recommendations with rank-biased precision [MZ08].
- RecipRank: Compute the reciprocal rank [KV97] of the first relevant item.
- GeometricRankWeight: Geometric cascade weighting for result ranks.
- LogRankWeight: Logarithmic weighting for result ranks, as used in NDCG.
- RankWeight: Base class for rank weighting models.
Functions#
- rank_biased_precision: Compute rank-biased precision given explicit weights.
Package Contents#
- class lenskit.metrics.ranking.RankingMetricBase(n=None, *, k=None)#
Bases:
lenskit.metrics._base.Metric
Base class for most ranking metrics, implementing an n parameter for truncation.
- Parameters:
- Stability:
- Caller (see Stability Levels).
- property k#
- property label#
Default name — class name, optionally @N.
- truncate(items)#
Truncate an item list if it is longer than n.
- Parameters:
items (lenskit.data.ItemList)
- class lenskit.metrics.ranking.DCG(n=None, *, k=None, weight=LogRankWeight(), discount=None, gain=None)#
Bases:
lenskit.metrics.ranking._base.ListMetric, lenskit.metrics.ranking._base.RankingMetricBase
Compute the _unnormalized_ discounted cumulative gain [JarvelinKekalainen02].
Discounted cumulative gain is computed as:
\[\begin{align*} \mathrm{DCG}(L,u) & = \sum_{i=1}^{|L|} \frac{r_{ui}}{d(i)} \end{align*}\]
Unrated items are assumed to have a utility of 0; if no rating values are provided in the truth frame, item ratings are assumed to be 1.
This metric does not normalize by ideal DCG. For that, use NDCG. See Jeunen et al. [JPU24] for an argument for using the unnormalized version.
- Parameters:
n (int | None) – The maximum recommendation list length to consider (longer lists are truncated).
discount (Discount | None) – The discount function to use. The default, base-2 logarithm, is the original function used by Järvelin and Kekäläinen [JarvelinKekalainen02].
gain (str | None) – The field on the test data to use for gain values. If None (the default), all items present in the test data have a gain of 1. If set to a string, it is the name of a field (e.g. 'rating'). In all cases, items not present in the truth data have a gain of 0.
k (int | None)
- Stability:
- Caller (see Stability Levels).
- property label#
The metric’s default label in output. The base implementation returns the class name by default.
- measure_list(recs, test)#
Compute measurements for a single list.
- Returns:
A float for simple metrics
Intermediate data for decomposed metrics
A dict mapping metric names to values for multi-metric classes
- Parameters:
recs (lenskit.data.ItemList)
test (lenskit.data.ItemList)
- Return type:
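As an illustration of the formula above (this is a sketch of the math, not the LensKit API; `dcg` is a hypothetical helper), unnormalized DCG with the standard base-2 log discount can be computed by clipping ranks so the first two positions are undiscounted:

```python
import numpy as np

def dcg(gains):
    """Unnormalized DCG: each gain divided by the clipped log2 rank discount."""
    ranks = np.arange(1, len(gains) + 1)
    discounts = np.log2(np.maximum(ranks, 2))  # ranks 1 and 2 both get discount lg 2 = 1
    return float(np.sum(np.asarray(gains, dtype=float) / discounts))

# Binary gains: relevant items at ranks 1 and 3.
print(dcg([1.0, 0.0, 1.0]))  # 1/1 + 1/log2(3) ≈ 1.631
```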
- class lenskit.metrics.ranking.NDCG(n=None, *, k=None, weight=LogRankWeight(), discount=None, gain=None)#
Bases:
lenskit.metrics.ranking._base.ListMetric, lenskit.metrics.ranking._base.RankingMetricBase
Compute the normalized discounted cumulative gain [JarvelinKekalainen02].
Discounted cumulative gain is computed as:
\[\begin{align*} \mathrm{DCG}(L,u) & = \sum_{i=1}^{|L|} \frac{r_{ui}}{d(i)} \end{align*}\]
Unrated items are assumed to have a utility of 0; if no rating values are provided in the truth frame, item ratings are assumed to be 1.
This is then normalized as follows:
\[\begin{align*} \mathrm{nDCG}(L, u) & = \frac{\mathrm{DCG}(L,u)}{\mathrm{DCG}(L_{\mathrm{ideal}}, u)} \end{align*}\]
Note
Negative gains are clipped to zero before computing NDCG. This keeps the metric bounded between 0 and 1 and prevents cases where negative gains can lead to misleading positive scores due to cancellation effects.
- Parameters:
n (int | None) – The maximum recommendation list length to consider (longer lists are truncated).
weight (lenskit.metrics.ranking._weighting.RankWeight) – The rank weighting to use.
discount (Discount | None) – The discount function to use. The default, base-2 logarithm, is the original function used by Järvelin and Kekäläinen [JarvelinKekalainen02]. It is deprecated in favor of the weight option.
gain (str | None) – The field on the test data to use for gain values. If None (the default), all items present in the test data have a gain of 1. If set to a string, it is the name of a field (e.g. 'rating'). In all cases, items not present in the truth data have a gain of 0.
k (int | None)
- Stability:
- Caller (see Stability Levels).
- property label#
The metric’s default label in output. The base implementation returns the class name by default.
- measure_list(recs, test)#
Compute measurements for a single list.
- Returns:
A float for simple metrics
Intermediate data for decomposed metrics
A dict mapping metric names to values for multi-metric classes
- Parameters:
recs (lenskit.data.ItemList)
test (lenskit.data.ItemList)
- Return type:
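The normalization step can be sketched as follows (again a hypothetical helper, not the LensKit API): the ideal DCG is the DCG of the same gains sorted in descending order, and negative gains are clipped first, as the note above describes.

```python
import numpy as np

def dcg(gains):
    ranks = np.arange(1, len(gains) + 1)
    return float(np.sum(gains / np.log2(np.maximum(ranks, 2))))

def ndcg(gains):
    """DCG of the list as ranked, divided by the ideal (descending-gain) DCG."""
    gains = np.maximum(np.asarray(gains, dtype=float), 0.0)  # clip negative gains
    ideal = dcg(np.sort(gains)[::-1])
    return dcg(gains) / ideal if ideal > 0 else 0.0

print(ndcg([0.0, 1.0, 1.0]))  # ≈ 0.815: relevant items at ranks 2 and 3
print(ndcg([1.0, 1.0, 0.0]))  # 1.0: the ideal ordering
```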
- class lenskit.metrics.ranking.Entropy(dataset, attribute, n=None)#
Bases:
lenskit.metrics.ranking._base.ListMetric, lenskit.metrics.ranking._base.RankingMetricBase
Evaluate diversity using Shannon entropy over item categories.
This metric measures the diversity of categories in a recommendation list. Higher entropy indicates a more diverse category distribution.
- Parameters:
dataset (lenskit.data.Dataset) – The LensKit dataset containing item entities and their attributes.
attribute (str) – Name of the attribute to use for categories (e.g., ‘genre’, ‘tag’)
n (int | None) – Recommendation list length to evaluate
- Stability:
- Caller (see Stability Levels).
- property label#
The metric’s default label in output. The base implementation returns the class name by default.
- measure_list(recs, test)#
Compute measurements for a single list.
- Returns:
A float for simple metrics
Intermediate data for decomposed metrics
A dict mapping metric names to values for multi-metric classes
- Parameters:
recs (lenskit.data.ItemList)
test (lenskit.data.ItemList)
- Return type:
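The underlying calculation is plain Shannon entropy over the category counts in the list; the class itself extracts the categories from the named dataset attribute. A minimal sketch (hypothetical helper, not the LensKit API):

```python
import numpy as np
from collections import Counter

def category_entropy(categories):
    """Shannon entropy (in bits) of the category distribution of a list."""
    counts = np.array(list(Counter(categories).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

print(category_entropy(["action", "action", "comedy", "drama"]))  # 1.5 bits
```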
- class lenskit.metrics.ranking.RankBiasedEntropy(dataset, attribute, n=None, *, weight=None)#
Bases:
lenskit.metrics.ranking._base.ListMetric, lenskit.metrics.ranking._base.RankingMetricBase
Evaluate diversity using rank-biased Shannon entropy over item categories.
This metric measures the diversity of categories in a recommendation list with rank-based weighting, giving more importance to items at the top of the recommendation list.
- Parameters:
dataset (lenskit.data.Dataset) – The LensKit dataset containing item entities and their attributes.
attribute (str) – Name of the attribute to use for categories (e.g., ‘genre’, ‘tag’)
n (int | None) – Recommendation list length to evaluate
weight (lenskit.metrics.ranking._weighting.RankWeight | None) – Rank weighting model. Defaults to GeometricRankWeight(0.85)
- Stability:
- Caller (see Stability Levels).
- property label#
The metric’s default label in output. The base implementation returns the class name by default.
- measure_list(recs, test)#
Compute measurements for a single list.
- Returns:
A float for simple metrics
Intermediate data for decomposed metrics
A dict mapping metric names to values for multi-metric classes
- Parameters:
recs (lenskit.data.ItemList)
test (lenskit.data.ItemList)
- Return type:
- class lenskit.metrics.ranking.ExposureGini(n=None, *, k=None, items, weight=GeometricRankWeight())#
Bases:
GiniBase
Measure exposure distribution of recommendations with the Gini coefficient.
This uses a weighting model to compute the exposure of each item in each list, and computes the Gini coefficient of the total exposure.
- Parameters:
n (int | None) – The maximum recommendation list length.
items (lenskit.data.Vocabulary | lenskit.data.Dataset) – The item vocabulary or a dataset from which to extract the items.
weight (lenskit.metrics.ranking._weighting.RankWeight) – The rank weighting model to use. Defaults to GeometricRankWeight with the specified patience parameter.
k (int | None)
- Stability:
- Caller (see Stability Levels).
- measure_list(output, test)#
Compute measurements for a single list.
- Returns:
A float for simple metrics
Intermediate data for decomposed metrics
A dict mapping metric names to values for multi-metric classes
- Parameters:
output (lenskit.data.ItemList)
- Return type:
tuple[numpy.typing.NDArray[numpy.int32], numpy.typing.NDArray[numpy.float64]]
- class lenskit.metrics.ranking.ListGini(n=None, *, k=None, items)#
Bases:
GiniBase
Measure item diversity of recommendations with the Gini coefficient.
This computes the Gini coefficient of the number of lists that each item appears in.
- Parameters:
n (int | None) – The maximum recommendation list length.
items (lenskit.data.Vocabulary | lenskit.data.Dataset) – The item vocabulary or a dataset from which to extract the items.
k (int | None)
- Stability:
- Caller (see Stability Levels).
- measure_list(output, test)#
Compute measurements for a single list.
- Returns:
A float for simple metrics
Intermediate data for decomposed metrics
A dict mapping metric names to values for multi-metric classes
- Parameters:
output (lenskit.data.ItemList)
- Return type:
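Both ExposureGini and ListGini reduce a vector of per-item totals (weighted exposure, or list-appearance counts) to a Gini coefficient. A sketch of the coefficient itself, using the standard sorted-rank formula (a hypothetical helper; the LensKit classes handle the aggregation step):

```python
import numpy as np

def gini(x):
    """Gini coefficient of a nonnegative array: 0 = perfectly even, near 1 = concentrated."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    ranks = np.arange(1, n + 1)
    return float(2 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1) / n)

print(gini([5, 5, 5, 5]))   # 0.0: exposure spread evenly over items
print(gini([0, 0, 0, 20]))  # 0.75: all exposure concentrated on one item
```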
- class lenskit.metrics.ranking.Hit(n=None, *, k=None)#
Bases:
lenskit.metrics.ranking._base.ListMetric, lenskit.metrics.ranking._base.RankingMetricBase
Compute whether or not a list is a hit; any list with at least one relevant item in the first \(k\) positions (\(L_{\le k} \cap I_u^{\mathrm{test}} \ne \emptyset\)) is scored as 1, and lists with no relevant items as 0. When averaged over the recommendation lists, this computes the hit rate [DK04].
- Stability:
- Caller (see Stability Levels).
- Parameters:
- property label#
The metric’s default label in output. The base implementation returns the class name by default.
- measure_list(recs, test)#
Compute measurements for a single list.
- Returns:
A float for simple metrics
Intermediate data for decomposed metrics
A dict mapping metric names to values for multi-metric classes
- Parameters:
recs (lenskit.data.ItemList)
test (lenskit.data.ItemList)
- Return type:
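The hit criterion reduces to a set intersection on the truncated list. A sketch (hypothetical helper, not the LensKit API):

```python
def hit(recs, test, n=None):
    """1.0 if any of the first n recommended items appears in the test items, else 0.0."""
    topn = recs[:n] if n is not None else recs
    return 1.0 if set(topn) & set(test) else 0.0

print(hit(["a", "b", "c"], ["c", "d"]))       # 1.0: "c" is a hit
print(hit(["a", "b", "c"], ["c", "d"], n=2))  # 0.0: the hit falls outside the top 2
```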
- class lenskit.metrics.ranking.ILS(dataset, attribute, n=None)#
Bases:
lenskit.metrics.ranking._base.ListMetric, lenskit.metrics.ranking._base.RankingMetricBase
Evaluate recommendation diversity using intra-list similarity (ILS).
This metric measures the average pairwise cosine similarity between item vectors in a recommendation list. Lower values indicate more diverse recommendations, while higher values indicate less diverse recommendations.
- Parameters:
dataset (lenskit.data.Dataset) – The LensKit dataset containing item entities and their attributes.
attribute (str) – Name of the attribute or vector source (e.g., ‘genre’, ‘tag’).
n (int | None) – Recommendation list length to evaluate.
- Stability:
- Caller (see Stability Levels).
- property label#
The metric’s default label in output. The base implementation returns the class name by default.
- measure_list(recs, test)#
Compute measurements for a single list.
- Returns:
A float for simple metrics
Intermediate data for decomposed metrics
A dict mapping metric names to values for multi-metric classes
- Parameters:
recs (lenskit.data.ItemList)
test (lenskit.data.ItemList)
- Return type:
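Average pairwise cosine similarity can be sketched with normalized attribute vectors (hypothetical helper; the class itself builds the vectors from the dataset attribute):

```python
import numpy as np

def ils(vectors):
    """Mean pairwise cosine similarity between item attribute vectors."""
    v = np.asarray(vectors, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    sims = v @ v.T
    iu = np.triu_indices(len(v), k=1)  # count each unordered pair once
    return float(sims[iu].mean())

print(ils([[1, 0], [1, 0]]))  # 1.0: identical genre vectors, no diversity
print(ils([[1, 0], [0, 1]]))  # 0.0: orthogonal vectors, maximal diversity
```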
- class lenskit.metrics.ranking.AveragePrecision(n=None, *, k=None)#
Bases:
lenskit.metrics.ranking._base.ListMetric, lenskit.metrics.ranking._base.RankingMetricBase
Compute Average Precision (AP) for a single user’s recommendations. This is the average of the precision at each relevant item in the ranked list.
- property label#
The metric’s default label in output. The base implementation returns the class name by default.
- measure_list(recs, test)#
Compute measurements for a single list.
- Returns:
A float for simple metrics
Intermediate data for decomposed metrics
A dict mapping metric names to values for multi-metric classes
- Parameters:
recs (lenskit.data.ItemList)
test (lenskit.data.ItemList)
- Return type:
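A sketch of the AP calculation, assuming the common convention of dividing by the number of relevant test items (a hypothetical helper; check the implementation for the exact denominator):

```python
def average_precision(recs, relevant):
    """Mean of precision@i over the ranks i where a relevant item appears."""
    rel = set(relevant)
    hits, total = 0, 0.0
    for i, item in enumerate(recs, start=1):
        if item in rel:
            hits += 1
            total += hits / i  # precision at this rank
    return total / len(rel) if rel else 0.0

print(average_precision(["a", "b", "c"], ["a", "c"]))  # (1/1 + 2/3) / 2 ≈ 0.833
```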
- class lenskit.metrics.ranking.MeanPopRank(data, *, n=None, k=None, count='users')#
Bases:
lenskit.metrics.ranking._base.ListMetric, lenskit.metrics.ranking._base.RankingMetricBase
Compute the _obscurity_ (mean popularity rank) of the recommendations.
Unlike other metrics, this metric requires access to the training dataset in order to compute item popularity metrics. Supply this as a constructor parameter.
This metric represents the popularity rank as a quantile, based on either the number of distinct users who have interacted with the item or the total number of interactions (depending on the options; distinct users is the default).
Let \(q_i\) be the _popularity rank_, represented as a quantile, of item \(i\). \(q_i = 1\) for the most-popular item; \(q_i = 0\) for an item with no users or interactions (the quantiles are min-max scaled). This metric computes the mean of the quantile popularity ranks for the recommended items:
\[\mathcal{M}(L) = \frac{1}{|L|} \sum_{i \in L} q_i\]
This metric is based on the _obscurity_ metric of Ekstrand and Mahant [EM17] and the popularity-based item novelty metric of Vargas and Castells [VC11].
- Stability:
- Caller (see Stability Levels).
- Parameters:
data (lenskit.data.Dataset)
n (int | None)
k (int | None)
count (Literal['users', 'interactions'])
- item_ranks: pandas.Series[float]#
- measure_list(recs, test)#
Compute measurements for a single list.
- Returns:
A float for simple metrics
Intermediate data for decomposed metrics
A dict mapping metric names to values for multi-metric classes
- Parameters:
recs (lenskit.data.ItemList)
test (lenskit.data.ItemList)
- Return type:
- class lenskit.metrics.ranking.Precision(n=None, *, k=None)#
Bases:
lenskit.metrics.ranking._base.ListMetric, lenskit.metrics.ranking._base.RankingMetricBase
Compute recommendation precision. This is computed as:
\[\frac{|L \cap I_u^{\mathrm{test}}|}{|L|}\]
In the uncommon case that k is specified and len(recs) < k, this metric uses len(recs) as the denominator.
- Stability:
- Caller (see Stability Levels).
- Parameters:
- property label#
The metric’s default label in output. The base implementation returns the class name by default.
- measure_list(recs, test)#
Compute measurements for a single list.
- Returns:
A float for simple metrics
Intermediate data for decomposed metrics
A dict mapping metric names to values for multi-metric classes
- Parameters:
recs (lenskit.data.ItemList)
test (lenskit.data.ItemList)
- Return type:
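The precision formula above reduces to a set intersection over the list. A sketch (hypothetical helper, not the LensKit API):

```python
def precision(recs, test):
    """|L ∩ I_test| / |L|: the fraction of recommended items that are relevant."""
    return len(set(recs) & set(test)) / len(recs)

print(precision(["a", "b", "c", "d"], ["b", "d", "e"]))  # 2 hits / 4 recs = 0.5
```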
- class lenskit.metrics.ranking.Recall(n=None, *, k=None)#
Bases:
lenskit.metrics.ranking._base.ListMetric, lenskit.metrics.ranking._base.RankingMetricBase
Compute recommendation recall. This is computed as:
\[\frac{|L \cap I_u^{\mathrm{test}}|}{\operatorname{min}\{|I_u^{\mathrm{test}}|, k\}}\]
- property label#
The metric’s default label in output. The base implementation returns the class name by default.
- measure_list(recs, test)#
Compute measurements for a single list.
- Returns:
A float for simple metrics
Intermediate data for decomposed metrics
A dict mapping metric names to values for multi-metric classes
- Parameters:
recs (lenskit.data.ItemList)
test (lenskit.data.ItemList)
- Return type:
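The recall formula caps the denominator at \(k\), so a perfect top-\(k\) list scores 1 even when the user has more than \(k\) test items. A sketch (hypothetical helper, not the LensKit API):

```python
def recall(recs, test, k=None):
    """|L ∩ I_test| divided by min(|I_test|, k); without k, plain recall."""
    denom = min(len(set(test)), k) if k is not None else len(set(test))
    return len(set(recs) & set(test)) / denom

print(recall(["a", "b"], ["a", "c", "d"]))       # 1/3 of test items retrieved
print(recall(["a", "b"], ["a", "c", "d"], k=2))  # 0.5: denominator capped at k=2
```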
- class lenskit.metrics.ranking.RBP(n=None, *, k=None, weight=None, patience=0.85, normalize=False, weight_field=None)#
Bases:
lenskit.metrics.ranking._base.ListMetric, lenskit.metrics.ranking._base.RankingMetricBase
Evaluate recommendations with rank-biased precision [MZ08].
If \(r_{ui} \in \{0, 1\}\) is a binary implicit rating, and the weighting is the default geometric weight with patience \(p\), the RBP is computed by:
\[\begin{align*} \operatorname{RBP}_p(L, u) & =(1 - p) \sum_i r_{ui} p^i \end{align*}\]
The original RBP metric depends on the idea that the rank-biased sum of binary relevance scores in an infinitely-long, perfectly-precise list is \(1/(1 - p)\). If RBP is used with a non-standard weighting that does not have a defined infinite series sum, then this metric will normalize by the sum of the discounts for the recommendation list.
Moffat and Zobel [MZ08] provide an extended discussion on choosing the patience parameter \(p\). This metric defaults to \(p=0.85\), to provide a relatively shallow curve and reward good items on the first few pages of results (in a 10-per-page setting). Recommendation systems data has no pooling, so the variance of this estimator may be high, as they note in the paper; however, RBP with high patience should be no worse than nDCG (and perhaps even better) in this regard.
In recommender evaluation, we usually have a small test set, so the maximum achievable RBP is significantly less than the theoretical maximum, and is a function of the number of test items. With normalize=True, the RBP metric will be normalized by the maximum achievable with the provided test data, like NDCG.
Warning
The additional normalization is experimental, and should not yet be used for published research results.
- Parameters:
n (int | None) – The maximum recommendation list length.
weight (lenskit.metrics.ranking._weighting.RankWeight | None) – The rank weighting model to use. Defaults to GeometricRankWeight with the specified patience parameter.
patience (float) – The patience parameter \(p\), the probability that the user continues browsing at each point. The default is 0.85.
normalize (bool) – Whether to normalize the RBP scores; if True, divides the RBP score by the maximum achievable with the test data (as in nDCG).
weight_field (str | None) – Name of a field in the item list to use as weights. If provided, weights are read from this field instead of being computed from the rank model.
k (int | None)
- Stability:
- Caller (see Stability Levels).
- property label#
The metric’s default label in output. The base implementation returns the class name by default.
- measure_list(recs, test)#
Compute measurements for a single list.
- Returns:
A float for simple metrics
Intermediate data for decomposed metrics
A dict mapping metric names to values for multi-metric classes
- Parameters:
recs (lenskit.data.ItemList)
test (lenskit.data.ItemList)
- Return type:
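A sketch of the geometric-weighted RBP sum (hypothetical helper, not the LensKit API), using weight \(p^{i-1}\) at 1-based rank \(i\), consistent with GeometricRankWeight's \(p^{k-1}\) discount:

```python
import numpy as np

def rbp(rel, patience=0.85):
    """Rank-biased precision with geometric weights p^(i-1) for 1-based rank i."""
    rel = np.asarray(rel, dtype=float)
    weights = patience ** np.arange(len(rel))  # p^0, p^1, p^2, ...
    return float((1 - patience) * np.sum(rel * weights))

print(rbp([1, 0, 0]))                # ≈ 0.15: one relevant item at rank 1
print(rbp([1, 1, 1], patience=0.5))  # 0.5 * (1 + 0.5 + 0.25) = 0.875
```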
- lenskit.metrics.ranking.rank_biased_precision(good, weights, normalization=1.0)#
Compute rank-biased precision given explicit weights.
- Parameters:
good (numpy.ndarray) – Boolean array indicating relevant items at each position.
weights (numpy.ndarray) – Weight for each item position (same length as good).
normalization (float) – Optional normalization factor, defaults to 1.0.
- Returns:
RBP score
- Return type:
- class lenskit.metrics.ranking.RecipRank(n=None, *, k=None)#
Bases:
lenskit.metrics.ranking._base.ListMetric, lenskit.metrics.ranking._base.RankingMetricBase
Compute the reciprocal rank [KV97] of the first relevant item in a list of recommendations. Taking the mean of this metric over the recommendation lists in a run yields the MRR (mean reciprocal rank).
Let \(\kappa\) denote the 1-based rank of the first relevant item in \(L\), with \(\kappa=\infty\) if none of the first \(k\) items in \(L\) are relevant; then the reciprocal rank is \(1 / \kappa\). If no elements are relevant, the reciprocal rank is therefore 0. Deshpande and Karypis [DK04] call this the “reciprocal hit rate”.
- Stability:
- Caller (see Stability Levels).
- Parameters:
- property label#
The metric’s default label in output. The base implementation returns the class name by default.
- measure_list(recs, test)#
Compute measurements for a single list.
- Returns:
A float for simple metrics
Intermediate data for decomposed metrics
A dict mapping metric names to values for multi-metric classes
- Parameters:
recs (lenskit.data.ItemList)
test (lenskit.data.ItemList)
- Return type:
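The reciprocal-rank definition above can be sketched as a linear scan (hypothetical helper, not the LensKit API):

```python
def recip_rank(recs, test):
    """1 / rank of the first relevant item; 0.0 if no item is relevant."""
    rel = set(test)
    for i, item in enumerate(recs, start=1):
        if item in rel:
            return 1.0 / i
    return 0.0

print(recip_rank(["a", "b", "c"], ["b", "z"]))  # 0.5: first hit at rank 2
print(recip_rank(["a", "b"], ["z"]))            # 0.0: no relevant items
```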
- class lenskit.metrics.ranking.GeometricRankWeight(patience=0.85)#
Bases:
RankWeight
Geometric cascade weighting for result ranks.
This is the ranking model used by RBP [MZ08].
For patience \(p\), the discount is given by \(p^{k-1}\). The sum of this infinite series is \(\frac{1}{1 - p}\).
- Parameters:
patience (Annotated[float, Gt(0.0), Lt(1.0)]) – The patience parameter \(p\).
- Stability:
- Caller (see Stability Levels).
- weight(ranks)#
Compute the discount for the specified ranks.
Ranks must start with 1.
- Return type:
lenskit.data.types.NPVector[numpy.float64]
- log_weight(ranks)#
Compute the (natural) log of the discount for the specified ranks.
Ranks must start with 1.
- Return type:
lenskit.data.types.NPVector[numpy.float64]
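The \(p^{k-1}\) discount and its infinite-series sum \(1/(1-p)\) can be checked numerically (a sketch with a hypothetical helper, not the LensKit API):

```python
import numpy as np

def geometric_weight(ranks, patience=0.85):
    """Geometric cascade weight p^(k-1) for 1-based ranks k."""
    return patience ** (np.asarray(ranks) - 1)

w = geometric_weight(np.arange(1, 1001))
print(w[:3])    # approximately [1.0, 0.85, 0.7225]
print(w.sum())  # ≈ 6.667, approaching 1 / (1 - 0.85)
```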
- class lenskit.metrics.ranking.LogRankWeight(*, base=2, offset=0)#
Bases:
RankWeight
Logarithmic weighting for result ranks, as used in NDCG.
This is the ranking model typically used for DCG and NDCG.
Since \(\operatorname{lg} 1 = 0\), simply taking the log will result in division by 0 when weights are applied. The correction for this in the original NDCG paper [JarvelinKekalainen02] is to clip the ranks, so that both of the first two positions have discount \(\operatorname{lg} 2\). A different correction sometimes seen is to compute \(\operatorname{lg} (k+1)\). This discount supports both; the default is to clip, but if the offset option is set to a positive number, it is added to the ranks instead.
- Parameters:
base (pydantic.PositiveFloat) – The log base to use.
offset (pydantic.NonNegativeInt) – An offset to add to ranks before computing logs.
- weight(ranks)#
Compute the discount for the specified ranks.
Ranks must start with 1.
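The two corrections described above (clip vs. offset) can be sketched as follows; this is a hypothetical helper, not the LensKit API, and the clip-at-2 behavior for non-binary bases is an assumption:

```python
import numpy as np

def log_weight(ranks, base=2, offset=0):
    """Reciprocal log-rank weight; clips ranks at 2 by default, or adds an offset."""
    ranks = np.asarray(ranks, dtype=float)
    vals = ranks + offset if offset > 0 else np.maximum(ranks, 2)
    return np.log(base) / np.log(vals)  # = 1 / log_base(vals)

print(log_weight([1, 2, 4]))            # [1.0, 1.0, 0.5]: first two ranks undiscounted
print(log_weight([1, 2, 3], offset=1))  # 1 / lg(k + 1) for each rank k
```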
- class lenskit.metrics.ranking.RankWeight#
Bases:
abc.ABC
Base class for rank weighting models.
This returns multiplicative weights, such that scores should be multiplied by the weights in order to produce weighted scores.
- Stability:
- Caller (see Stability Levels).
- abstractmethod weight(ranks)#
Compute the discount for the specified ranks.
Ranks must start with 1.
- Parameters:
ranks (lenskit.data.types.NPVector[numpy.int32])
- Return type:
lenskit.data.types.NPVector[numpy.float64]
- log_weight(ranks)#
Compute the (natural) log of the discount for the specified ranks.
Ranks must start with 1.
- Parameters:
ranks (lenskit.data.types.NPVector[numpy.int32])
- Return type:
lenskit.data.types.NPVector[numpy.float64]