lenskit.data.matrix#

Classes for working with matrix data.

Attributes#

Classes#

CSRStructure

Representation of the compressed sparse row structure of a sparse matrix, without any data values.

COOStructure

Representation of the coordinate structure of a sparse matrix, without any data values.

SparseIndexType

Data type for the index field of a sparse row. Indexes are stored as ``int32`` values.

SparseIndexListType

Sparse index lists. These are the row type for structure-only sparse matrices.

SparseRowType

Data type for sparse rows stored in Arrow. Sparse rows are stored as lists of structs with index and column fields.

SparseRowArray

An array of sparse rows (a compressed sparse row matrix).

Functions#

fast_col_cooc(…)

Compute column co-occurrences (\(M^{\mathrm{T}}M\)) efficiently.

normalize_matrix(matrix, normalize)

Normalize rows of a matrix.

Module Contents#

lenskit.data.matrix.t#
lenskit.data.matrix.M#
lenskit.data.matrix.SPARSE_IDX_EXT_NAME = 'lenskit.sparse_index'#
lenskit.data.matrix.SPARSE_IDX_LIST_EXT_NAME = 'lenskit.sparse_index_list'#
lenskit.data.matrix.SPARSE_ROW_EXT_NAME = 'lenskit.sparse_row'#
class lenskit.data.matrix.CSRStructure#

Bases: NamedTuple

Representation of the compressed sparse row structure of a sparse matrix, without any data values.

Stability:
Caller (see Stability Levels).
rowptrs: numpy.ndarray#
colinds: numpy.ndarray#
shape: tuple[int, int]#
property nrows#
property ncols#
property nnz#
property row_nnzs: lenskit.data.types.NPVector[numpy.int32]#

Array of row sizes (number of nonzeros in each row).

Return type:

lenskit.data.types.NPVector[numpy.int32]

extent(row)#
Parameters:

row (int)

Return type:

tuple[int, int]

row_cs(row)#
Parameters:

row (int)

Return type:

numpy.ndarray
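The rowptrs and colinds layout can be illustrated with a small standalone sketch. It uses plain Python lists rather than NumPy arrays, and the class name CSRSketch is hypothetical. The behavior of extent and row_cs is not described above, so the sketch assumes that extent returns the start and end positions of a row within colinds and that row_cs returns that row's column indices.

```python
from typing import NamedTuple


class CSRSketch(NamedTuple):
    """Hypothetical stand-in for a structure-only CSR matrix (lists, not NumPy)."""

    rowptrs: list  # length nrows + 1; row i occupies rowptrs[i]:rowptrs[i + 1]
    colinds: list  # column index of each stored entry, laid out row by row
    shape: tuple

    @property
    def nnz(self) -> int:
        # The last row pointer equals the total number of stored entries.
        return self.rowptrs[-1]

    @property
    def row_nnzs(self) -> list:
        # Differences of consecutive row pointers give per-row entry counts.
        return [end - start for start, end in zip(self.rowptrs, self.rowptrs[1:])]

    def extent(self, row: int) -> tuple:
        # Assumed meaning: (start, end) positions of the row in colinds.
        return self.rowptrs[row], self.rowptrs[row + 1]

    def row_cs(self, row: int) -> list:
        # Assumed meaning: the column indices stored for the row.
        start, end = self.extent(row)
        return self.colinds[start:end]


# A 3x4 matrix with entries at (0,1), (0,3), (2,0), (2,1), (2,2); row 1 is empty.
csr = CSRSketch(rowptrs=[0, 2, 2, 5], colinds=[1, 3, 0, 1, 2], shape=(3, 4))
```

Note that an empty row shows up as two equal consecutive row pointers, so no sentinel values are needed.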

class lenskit.data.matrix.COOStructure#

Bases: NamedTuple

Representation of the coordinate structure of a sparse matrix, without any data values.

Stability:
Caller (see Stability Levels).
row_numbers: numpy.ndarray#
col_numbers: numpy.ndarray#
shape: tuple[int, int]#
property nrows#
property ncols#
property nnz#
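To show how the coordinate layout relates to the compressed sparse row layout, the following sketch derives CSR row pointers from COO row numbers. The helper name coo_to_rowptrs is hypothetical and is not part of this module; it uses plain lists and assumes the coordinates are already sorted by row.

```python
def coo_to_rowptrs(row_numbers, nrows):
    """Build CSR row pointers from row-sorted COO row numbers (illustrative only)."""
    counts = [0] * nrows
    for r in row_numbers:
        counts[r] += 1
    # Cumulative sum of the per-row counts, starting from 0.
    rowptrs = [0]
    for c in counts:
        rowptrs.append(rowptrs[-1] + c)
    return rowptrs


# Coordinates of a 3x4 matrix: one (row, column) pair per stored entry.
row_numbers = [0, 0, 2, 2, 2]
col_numbers = [1, 3, 0, 1, 2]
shape = (3, 4)

nnz = len(row_numbers)
rowptrs = coo_to_rowptrs(row_numbers, shape[0])
```

With row-sorted input, col_numbers can be reused unchanged as the CSR column index array.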
class lenskit.data.matrix.SparseIndexType(dimension)#

Bases: pyarrow.ExtensionType

Data type for the index field of a sparse row. Indexes are stored as ``int32`` values; the extension type attaches the row’s dimensionality to the index field, which makes it easier to pass to and from Rust, since arrays are often passed rather than entire fields.

Stability: Internal

This API is at the internal or experimental stability level: it may change at any time, and breaking changes will not necessarily be described in the release notes. See Stability Levels for details.

Parameters:

dimension (int)

dimension: int#
check_dimension(expected)#

Check that this index type has the expected dimension.

Returns:

The dimension of the index type.

Raises:

ValueError – If the type’s dimension does not match the expected dimension.

Parameters:

expected (int | None)

Return type:

int
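The documented contract of check_dimension (return the dimension, raise ValueError on a mismatch) can be sketched as a free function. The name check_dimension_sketch is hypothetical, and treating an expected value of None as "no check" is an assumption based only on the int | None parameter type.

```python
from typing import Optional


def check_dimension_sketch(actual: int, expected: Optional[int]) -> int:
    """Return `actual`, raising ValueError if it differs from `expected`."""
    # Assumption: None means the caller has no expectation to enforce.
    if expected is not None and expected != actual:
        raise ValueError(f"dimension mismatch: expected {expected}, found {actual}")
    return actual
```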

__arrow_ext_serialize__()#
Return type:

bytes

classmethod __arrow_ext_deserialize__(storage_type, serialized)#
class lenskit.data.matrix.SparseIndexListType(dimension, large=False)#

Bases: pyarrow.ExtensionType

Sparse index lists. These are the row type for structure-only sparse matrices.

Parameters:
  • dimension (int)

  • large (bool)

value_type: None = None#
index_type: SparseIndexType#
classmethod from_type(data_type, dimension=None)#

Create a sparse index list type from an Arrow data type, handling legacy struct layouts without the extension types.

Parameters:
  • data_type (pyarrow.DataType) – The Arrow data type to interpret as a row type.

  • dimension (int | None) – The row dimension, if known from an external source. If provided and the data type also includes the dimensionality, both dimensions must match.

Raises:
  • TypeError – If the data type is not a valid sparse row type.

  • ValueError – If there is another error, such as mismatched dimensions.

Return type:

SparseIndexListType

property dimension: int#
Return type:

int

__arrow_ext_serialize__()#
Return type:

bytes

classmethod __arrow_ext_deserialize__(storage_type, serialized)#
__arrow_ext_class__()#
class lenskit.data.matrix.SparseRowType(dimension, value_type=pa.float32(), large=False)#

Bases: pyarrow.ExtensionType

Data type for sparse rows stored in Arrow. Sparse rows are stored as lists of structs with index and column fields.

Stability: Internal

This API is at the internal or experimental stability level: it may change at any time, and breaking changes will not necessarily be described in the release notes. See Stability Levels for details.

Parameters:
  • dimension (int)

  • value_type (pyarrow.DataType | None)

  • large (bool)

value_type: pyarrow.DataType | None#
index_type: SparseIndexType#
classmethod from_type(data_type, dimension=None)#

Create a sparse row type from an Arrow data type, handling legacy struct layouts without the extension types.

Parameters:
  • data_type (pyarrow.DataType) – The Arrow data type to interpret as a row type.

  • dimension (int | None) – The row dimension, if known from an external source. If provided and the data type also includes the dimensionality, both dimensions must match.

Raises:
  • TypeError – If the data type is not a valid sparse row type.

  • ValueError – If there is another error, such as mismatched dimensions.

Return type:

SparseRowType

property dimension: int#
Return type:

int

__arrow_ext_serialize__()#
Return type:

bytes

classmethod __arrow_ext_deserialize__(storage_type, serialized)#
__arrow_ext_class__()#
class lenskit.data.matrix.SparseRowArray#

Bases: pyarrow.ExtensionArray

An array of sparse rows (a compressed sparse row matrix).

Stability: Internal

This API is at the internal or experimental stability level: it may change at any time, and breaking changes will not necessarily be described in the release notes. See Stability Levels for details.

type: SparseRowType | SparseIndexListType#
classmethod from_arrays(offsets, indices, values=None, *, shape=None)#
Parameters:
  • offsets (numpy.typing.ArrayLike)

  • indices (numpy.typing.ArrayLike)

  • values (numpy.typing.ArrayLike | None)

  • shape (tuple[int, int] | None)

Return type:

SparseRowArray

classmethod from_array(array, dimension=None)#

Interpret an Arrow array as a sparse row array, if possible. Handles legacy layouts without the extension types.

Parameters:
  • array (pyarrow.Array) – The array to convert.

  • dimension (int | None) – The dimensionality of the sparse rows, if known from an external source.

Return type:

SparseRowArray

classmethod from_scipy(matrix, *, values=True, large=None)#

Create a sparse row array from a SciPy sparse matrix.

Parameters:
  • matrix (scipy.sparse.sparray) – The SciPy sparse matrix (in CSR format).

  • values (bool) – Whether to include the values or create a structure-only array.

  • large (bool | None) – True to force creation of a pa.LargeListArray.

Returns:

The sparse row array.

Return type:

SparseRowArray

to_scipy()#

Convert this sparse row array to a SciPy sparse array.

Return type:

scipy.sparse.csr_array[Any, tuple[int, int]]

to_torch()#

Convert this sparse row array to a Torch sparse tensor.

Return type:

torch.Tensor

to_coo()#

Convert this array to a table representing it in COO format.

Return type:

pyarrow.Table

property dimension: int#

Get the number of columns in the sparse matrix.

Return type:

int

property shape: tuple[int, int]#
Return type:

tuple[int, int]

property has_values: bool#
Return type:

bool

property offsets: pyarrow.Int32Array#
Return type:

pyarrow.Int32Array

property indices: pyarrow.Int32Array#
Return type:

pyarrow.Int32Array

property values: pyarrow.Array | None#
Return type:

pyarrow.Array | None

property nnz: int#
Return type:

int

structure()#

Get the structure of this matrix (without values).

Return type:

SparseRowArray

transpose()#

Get the transpose of this sparse matrix.

Return type:

SparseRowArray
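As a rough illustration of what transposing a compressed sparse row matrix involves, the sketch below performs a counting sort on column indices. It is a generic technique written over plain lists, not LensKit's implementation, and the function name transpose_csr is hypothetical.

```python
def transpose_csr(offsets, indices, values, ncols):
    """Transpose a CSR matrix given as offsets, indices, and values lists."""
    # Count entries per column; these become the row sizes of the transpose.
    counts = [0] * ncols
    for c in indices:
        counts[c] += 1
    t_offsets = [0]
    for n in counts:
        t_offsets.append(t_offsets[-1] + n)

    # cursor[c] tracks the next free slot in transposed row c.
    cursor = t_offsets[:-1].copy()
    t_indices = [0] * len(indices)
    t_values = [0.0] * len(values)
    for row in range(len(offsets) - 1):
        for pos in range(offsets[row], offsets[row + 1]):
            col = indices[pos]
            dest = cursor[col]
            t_indices[dest] = row
            t_values[dest] = values[pos]
            cursor[col] += 1
    return t_offsets, t_indices, t_values


# 3x4 input with entries at (0,1), (0,3), (2,0), (2,1), (2,2).
t_offsets, t_indices, t_values = transpose_csr(
    [0, 2, 2, 5], [1, 3, 0, 1, 2], [1.0, 2.0, 3.0, 4.0, 5.0], ncols=4
)
```

Because input rows are visited in increasing order, the indices within each transposed row come out sorted.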

row_extent(row)#

Get the start and end of a row.

Parameters:

row (int)

Return type:

tuple[int, int]

row_indices(row)#

Get the index array for a compressed sparse row.

Parameters:

row (int)

Return type:

pyarrow.Int32Array

lenskit.data.matrix.fast_col_cooc(rows: lenskit.data.types.NPVector[numpy.int32] | pyarrow.Int32Array, cols: lenskit.data.types.NPVector[numpy.int32] | pyarrow.Int32Array, shape: tuple[int, int], *, progress: lenskit.logging.Progress | None = None, include_diagonal: bool = True, ordered: bool = False, dense: Literal[True]) → numpy.ndarray[tuple[int, int], numpy.dtype[numpy.float32]]#
lenskit.data.matrix.fast_col_cooc(rows: lenskit.data.types.NPVector[numpy.int32] | pyarrow.Int32Array, cols: lenskit.data.types.NPVector[numpy.int32] | pyarrow.Int32Array, shape: tuple[int, int], *, progress: lenskit.logging.Progress | None = None, include_diagonal: bool = True, ordered: bool = False, dense: Literal[False] = False) → scipy.sparse.coo_array
lenskit.data.matrix.fast_col_cooc(rows: lenskit.data.types.NPVector[numpy.int32] | pyarrow.Int32Array, cols: lenskit.data.types.NPVector[numpy.int32] | pyarrow.Int32Array, shape: tuple[int, int], *, progress: lenskit.logging.Progress | None = None, include_diagonal: bool = True, ordered: bool = False, dense: bool = False) → Any

Compute column co-occurrences (\(M^{\mathrm{T}}M\)) efficiently.
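For a binary matrix \(M\), entry \((i, j)\) of \(M^{\mathrm{T}}M\) is \(\sum_{u} M_{ui} M_{uj}\), the number of rows in which columns \(i\) and \(j\) both appear. The following naive sketch computes those counts from coordinate pairs using only the standard library. It is not the library's algorithm, the name col_cooc_sketch is hypothetical, and the include_diagonal, ordered, dense, and progress options are not modelled. It also assumes duplicate (row, column) pairs should count once.

```python
from collections import Counter, defaultdict
from itertools import product


def col_cooc_sketch(rows, cols):
    """Count, for each column pair (i, j), the rows containing both columns."""
    # Group the column indices present in each row (duplicates collapse).
    by_row = defaultdict(set)
    for r, c in zip(rows, cols):
        by_row[r].add(c)

    counts = Counter()
    for row_cols in by_row.values():
        # Every ordered pair, including (i, i), contributes 1 for this row.
        for i, j in product(row_cols, repeat=2):
            counts[(i, j)] += 1
    return counts


# Row 0 contains columns {0, 1}; row 1 contains columns {0, 1, 2}.
cooc = col_cooc_sketch(rows=[0, 0, 1, 1, 1], cols=[0, 1, 0, 1, 2])
```

The diagonal entries equal the number of rows in which each column appears, and the result is symmetric.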

lenskit.data.matrix.normalize_matrix(matrix, normalize)#

Normalize rows of a matrix.

Parameters:
  • matrix (scipy.sparse.csr_array | numpy.typing.NDArray[numpy.floating[Any]]) – Sparse or dense matrix to normalize

  • normalize (Literal['unit', 'distribution'] | None) – Normalization mode (“unit” for L2, “distribution” for L1)

Returns:

Normalized matrix

Return type:

scipy.sparse.csr_array | numpy.typing.NDArray[numpy.floating[Any]]
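The two normalization modes can be illustrated on a small dense matrix of plain Python lists. This is a sketch of L2 and L1 row normalization in general, not the library's code: the name normalize_rows_sketch is hypothetical, and both the pass-through behavior for None and the handling of all-zero rows are assumptions.

```python
import math


def normalize_rows_sketch(matrix, normalize):
    """Scale each row to unit L2 norm ("unit") or unit L1 norm ("distribution")."""
    if normalize is None:
        # Assumption: None leaves the values unchanged.
        return [list(row) for row in matrix]

    out = []
    for row in matrix:
        if normalize == "unit":
            norm = math.sqrt(sum(x * x for x in row))
        elif normalize == "distribution":
            norm = sum(abs(x) for x in row)
        else:
            raise ValueError(f"unknown normalization mode: {normalize!r}")
        # Assumption: all-zero rows are left as zeros rather than divided by 0.
        out.append([x / norm if norm > 0 else 0.0 for x in row])
    return out


data = [[3.0, 4.0], [1.0, 3.0]]
unit_rows = normalize_rows_sketch(data, "unit")
dist_rows = normalize_rows_sketch(data, "distribution")
```

After "unit" normalization each row has Euclidean length 1; after "distribution" normalization the absolute values in each row sum to 1.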

Exported Aliases#

class lenskit.data.matrix.Progress#

Re-exported alias for lenskit.logging.Progress.

lenskit.data.matrix.NPVector#

Re-exported alias for lenskit.data.types.NPVector.