Matrix Utilities
LensKit provides several matrix utilities, since matrices are used heavily in recommendation algorithms.
Building Ratings Matrices
-
lenskit.matrix.sparse_ratings(ratings, scipy=False)
Convert a rating table to a sparse matrix of ratings.
Parameters:
- ratings (pandas.DataFrame) – a data table of (user, item, rating) triples.
- scipy – if True, return a SciPy matrix instead of CSR.
Returns: a named tuple containing the sparse matrix, user index, and item index.
Return type: RatingMatrix
-
class lenskit.matrix.RatingMatrix
A rating matrix with associated indices.
-
matrix
The rating matrix, with users on rows and items on columns.
Type: CSR or scipy.sparse.csr_matrix
-
users
Mapping from user IDs to row numbers.
Type: pandas.Index
-
items
Mapping from item IDs to column numbers.
Type: pandas.Index
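To make the relationship between the rating table and the two indices concrete, here is a minimal sketch in plain Python. It is not the LensKit implementation: the function name index_ratings is hypothetical, and ordering the IDs by sorting them is an assumption made for the example. It only shows how (user, item, rating) triples become row numbers, column numbers, and values.

```python
def index_ratings(triples):
    """Map (user, item, rating) triples to row/column positions.

    Illustrative stand-in only; ID ordering (sorted) is an assumption.
    """
    # Position i in each list is the row/column number of that ID.
    users = sorted({u for u, _, _ in triples})
    items = sorted({i for _, i, _ in triples})
    user_pos = {u: r for r, u in enumerate(users)}
    item_pos = {i: c for c, i in enumerate(items)}

    # COO-style arrays: one entry per rating.
    rows = [user_pos[u] for u, _, _ in triples]
    cols = [item_pos[i] for _, i, _ in triples]
    vals = [float(r) for _, _, r in triples]
    return users, items, rows, cols, vals


ratings = [(10, "b", 4.0), (10, "a", 3.5), (42, "b", 5.0)]
users, items, rows, cols, vals = index_ratings(ratings)
```

The users and items lists play the role of the two indices: looking up an ID gives its row or column number, and indexing by position gives the ID back.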
Compressed Sparse Row Matrices
We use CSR-format sparse matrices in quite a few places. Since SciPy’s sparse matrices are not directly usable from Numba, we have implemented a Numba-compiled CSR representation that can be used from accelerated algorithm implementations.
-
lenskit.matrix.csr_from_coo(rows, cols, vals, shape=None)
Create a CSR matrix from data in COO format.
Parameters:
- rows (array-like) – the row indices.
- cols (array-like) – the column indices.
- vals (array-like) – the data values; can be None.
- shape (tuple) – the array shape, or None to infer from row & column indices.
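The COO-to-CSR conversion can be sketched in plain Python as a counting pass followed by a placement pass. This is an illustration of the general technique, not the code LensKit runs: the function name coo_to_csr is hypothetical, it requires the row count to be passed explicitly, and it does not handle the vals=None case.

```python
def coo_to_csr(rows, cols, vals, nrows):
    """Build CSR arrays (rowptrs, colinds, values) from COO triples."""
    # Count the entries in each row.
    counts = [0] * nrows
    for r in rows:
        counts[r] += 1

    # Prefix sums give the start offset of each row.
    rowptrs = [0] * (nrows + 1)
    for r in range(nrows):
        rowptrs[r + 1] = rowptrs[r] + counts[r]

    # Place each entry into the next free slot of its row.
    next_slot = rowptrs[:-1].copy()
    colinds = [0] * len(rows)
    values = [0.0] * len(rows)
    for r, c, v in zip(rows, cols, vals):
        pos = next_slot[r]
        colinds[pos] = c
        values[pos] = v
        next_slot[r] += 1
    return rowptrs, colinds, values


rowptrs, colinds, values = coo_to_csr(
    [1, 0, 1], [2, 0, 0], [5.0, 3.0, 4.0], nrows=2
)
```

Entries within a row keep their input order in this sketch, so column indices are not necessarily sorted.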
-
lenskit.matrix.csr_from_scipy(mat, copy=True)
Convert a SciPy sparse matrix to an internal CSR.
Parameters:
- mat (scipy.sparse.spmatrix) – a SciPy sparse matrix.
- copy (bool) – if False, reuse the SciPy storage if possible.
Returns: a CSR matrix.
Return type: CSR
-
lenskit.matrix.csr_to_scipy(mat)
Convert a CSR matrix to a SciPy scipy.sparse.csr_matrix.
Parameters: mat (CSR) – a CSR matrix.
Returns: a SciPy sparse matrix with the same data. It shares storage with mat.
Return type: scipy.sparse.csr_matrix
-
lenskit.matrix.csr_rowinds(csr)
Get the row indices for a CSR matrix.
Parameters: csr (CSR) – a CSR matrix.
Returns: the row index array for the CSR matrix.
Return type: np.ndarray
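A row index array has one entry per stored value, giving the row that value belongs to. The following plain-Python sketch shows one way to expand row pointers into such an array; the function name row_indices is hypothetical and this is not necessarily how csr_rowinds computes it.

```python
def row_indices(rowptrs):
    """Expand CSR row pointers into one row number per stored entry."""
    rowinds = []
    for row in range(len(rowptrs) - 1):
        # Row `row` owns the entries in [rowptrs[row], rowptrs[row + 1]).
        count = rowptrs[row + 1] - rowptrs[row]
        rowinds.extend([row] * count)
    return rowinds


# Four rows: row 0 has 1 entry, row 1 has 2, row 2 is empty, row 3 has 1.
rowinds = row_indices([0, 1, 3, 3, 4])
```

Paired with the column index array, the result gives the matrix back in COO form.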
-
lenskit.matrix.csr_save(csr: numba.jitclass.base.CSR, prefix=None)
Extract data needed to save a CSR matrix. This is intended to be used with, for example, numpy.savez() to save a matrix:
    np.savez_compressed('file.npz', **csr_save(csr))
The prefix allows multiple matrices to be saved in a single file:
    data = {}
    data.update(csr_save(m1, prefix='m1'))
    data.update(csr_save(m2, prefix='m2'))
    np.savez_compressed('file.npz', **data)
Parameters:
- csr (CSR) – the matrix to save.
- prefix (str) – the prefix for the data keys.
Returns: a dictionary of data to save the matrix.
Return type: dict
-
lenskit.matrix.csr_load(data, prefix=None)
Rematerialize a CSR matrix from loaded data. The inverse of csr_save().
Parameters:
- data (dict-like) – the input data.
- prefix (str) – the prefix for the data keys.
Returns: the matrix described by data.
Return type: CSR
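The role of the prefix in a save/load round trip can be sketched with ordinary dictionaries. Everything here is hypothetical: the helper names pack and unpack, the field names, and the key scheme (prefix, underscore, field name) are assumptions for illustration, and the actual keys used by csr_save() and csr_load() may differ.

```python
FIELDS = ("rowptrs", "colinds", "values")  # assumed field names


def pack(fields, prefix=None):
    """Return a copy of `fields` with every key prefixed (hypothetical scheme)."""
    if prefix is None:
        return dict(fields)
    return {f"{prefix}_{name}": value for name, value in fields.items()}


def unpack(data, prefix=None):
    """Pull one matrix's fields back out of a shared dictionary."""
    if prefix is None:
        return {name: data[name] for name in FIELDS}
    return {name: data[f"{prefix}_{name}"] for name in FIELDS}


m1 = {"rowptrs": [0, 1], "colinds": [0], "values": [2.0]}
m2 = {"rowptrs": [0, 2], "colinds": [0, 1], "values": [1.0, 3.0]}

# Both matrices share one dictionary without key collisions.
data = {}
data.update(pack(m1, prefix="m1"))
data.update(pack(m2, prefix="m2"))

restored = unpack(data, prefix="m2")
```

Because each matrix's keys carry its own prefix, the combined dictionary can be written to a single file and each matrix recovered independently.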
-
class lenskit.matrix.CSR(nrows, ncols, nnz, ptrs, inds, vals)
Simple compressed sparse row matrix. This is like scipy.sparse.csr_matrix, with a few useful differences:
- It is a Numba jitclass, so it can be directly used from Numba-optimized functions.
- The value array is optional, for cases in which only the matrix structure is required.
- The value array, if present, is always double-precision.
You generally don’t want to create this class yourself. Instead, use one of the related utility functions.
-
rowptrs
The row pointers.
Type: numpy.ndarray
-
colinds
The column indices.
Type: numpy.ndarray
-
values
The values.
Type: numpy.ndarray
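To show how the three arrays fit together, here is a small plain-Python stand-in. It is not the Numba jitclass: the class name TinyCSR and its row method are hypothetical, and it uses lists where the real class holds NumPy arrays. It does mirror the optional value array described above.

```python
class TinyCSR:
    """Plain-Python stand-in for a CSR matrix (illustration only)."""

    def __init__(self, nrows, ncols, rowptrs, colinds, values=None):
        self.nrows = nrows
        self.ncols = ncols
        self.rowptrs = rowptrs    # length nrows + 1
        self.colinds = colinds    # length nnz
        self.values = values      # length nnz, or None for structure only
        self.nnz = rowptrs[-1]

    def row(self, r):
        """Return (column indices, values) for row r; values may be None."""
        start, end = self.rowptrs[r], self.rowptrs[r + 1]
        cols = self.colinds[start:end]
        vals = None if self.values is None else self.values[start:end]
        return cols, vals


full = TinyCSR(2, 3, [0, 1, 3], [0, 2, 0], [3.0, 5.0, 4.0])
structure_only = TinyCSR(2, 3, [0, 1, 3], [0, 2, 0])

cols, vals = full.row(1)
```

Row r occupies the half-open slice from rowptrs[r] to rowptrs[r + 1] in both colinds and values, which is why a structure-only matrix can simply omit the value array.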