lenskit.data.matrix#
Classes for working with matrix data.
Attributes#
Classes#
CSRStructure | Representation of the compressed sparse row structure of a sparse matrix, without any data values.
COOStructure | Representation of the coordinate structure of a sparse matrix, without any data values.
SparseIndexType | Data type for the index field of a sparse row.
SparseIndexListType | Sparse index lists.
SparseRowType | Data type for sparse rows stored in Arrow.
SparseRowArray | An array of sparse rows (a compressed sparse row matrix).
Functions#
fast_col_cooc | Compute column co-occurrences (\(M^{\mathrm{T}}M\)) efficiently.
normalize_matrix | Normalize rows of a matrix.
Module Contents#
- lenskit.data.matrix.t#
- lenskit.data.matrix.M#
- lenskit.data.matrix.SPARSE_IDX_EXT_NAME = 'lenskit.sparse_index'#
- lenskit.data.matrix.SPARSE_IDX_LIST_EXT_NAME = 'lenskit.sparse_index_list'#
- lenskit.data.matrix.SPARSE_ROW_EXT_NAME = 'lenskit.sparse_row'#
- class lenskit.data.matrix.CSRStructure#
Bases: NamedTuple
Representation of the compressed sparse row structure of a sparse matrix, without any data values.
- Stability:
- Caller (see Stability Levels).
- rowptrs: numpy.ndarray#
- colinds: numpy.ndarray#
- property nrows#
- property ncols#
- property nnz#
- property row_nnzs: lenskit.data.types.NPVector[numpy.int32]#
Array of row sizes (number of nonzeros in each row).
- Return type:
lenskit.data.types.NPVector[numpy.int32]
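To make the roles of rowptrs and colinds concrete, the following is a minimal sketch of a CSR structure in plain Python. ToyCSRStructure is a hypothetical stand-in written for illustration only; it is not LensKit's class, it uses lists instead of NumPy arrays, and storing ncols as a field is an assumption of this sketch.

```python
from typing import NamedTuple


class ToyCSRStructure(NamedTuple):
    """Hypothetical stand-in for illustration; not LensKit's CSRStructure."""

    rowptrs: list[int]  # length nrows + 1; row i occupies colinds[rowptrs[i]:rowptrs[i + 1]]
    colinds: list[int]  # column index of each stored entry, row by row
    ncols: int  # assumption: stored explicitly in this sketch

    @property
    def nrows(self) -> int:
        return len(self.rowptrs) - 1

    @property
    def nnz(self) -> int:
        # The final row pointer equals the total number of stored entries.
        return self.rowptrs[-1]

    @property
    def row_nnzs(self) -> list[int]:
        # Row sizes are the differences between consecutive row pointers.
        return [hi - lo for lo, hi in zip(self.rowptrs, self.rowptrs[1:])]


# A 3x4 matrix with entries at (0, 1), (0, 3), and (2, 0); row 1 is empty.
csr = ToyCSRStructure(rowptrs=[0, 2, 2, 3], colinds=[1, 3, 0], ncols=4)
row_sizes = csr.row_nnzs
```

Because only the structure is stored, the same layout can describe an implicit-feedback matrix where the presence of an entry is the only information.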
- class lenskit.data.matrix.COOStructure#
Bases: NamedTuple
Representation of the coordinate structure of a sparse matrix, without any data values.
- Stability:
- Caller (see Stability Levels).
- row_numbers: numpy.ndarray#
- col_numbers: numpy.ndarray#
- property nrows#
- property ncols#
- property nnz#
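The coordinate layout stores one (row, column) pair per entry, in any order. As a rough illustration of how it relates to the compressed sparse row layout, the sketch below converts coordinate lists into row pointers and column indices. The function coo_to_csr is a hypothetical helper written for this example, not part of LensKit, and it works on plain lists rather than NumPy arrays.

```python
def coo_to_csr(row_numbers, col_numbers, nrows):
    """Convert COO coordinates to CSR (rowptrs, colinds). Illustrative only."""
    # Count the entries in each row.
    counts = [0] * nrows
    for r in row_numbers:
        counts[r] += 1

    # Row pointers are the running sum of the row counts.
    rowptrs = [0]
    for c in counts:
        rowptrs.append(rowptrs[-1] + c)

    # A stable sort by row number groups each row's columns together
    # while preserving their original relative order.
    order = sorted(range(len(row_numbers)), key=lambda k: row_numbers[k])
    colinds = [col_numbers[k] for k in order]
    return rowptrs, colinds


# Entries (2, 0), (0, 1), (0, 3) of a matrix with 3 rows.
rowptrs, colinds = coo_to_csr([2, 0, 0], [0, 1, 3], nrows=3)
```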
- class lenskit.data.matrix.SparseIndexType(dimension)#
Bases: pyarrow.ExtensionType
Data type for the index field of a sparse row. Indexes are just stored as int32 values; the extension type attaches the row’s dimensionality to the index field (making it easier to pass it to/from Rust, since we often pass arrays and not entire fields).
Stability: Internal
This API is at the internal or experimental stability level: it may change at any time, and breaking changes will not necessarily be described in the release notes. See Stability Levels for details.
- Parameters:
dimension (int)
- check_dimension(expected)#
Check that this index type has the expected dimension.
- Returns:
The dimension of the index type.
- Raises:
ValueError – If the type’s dimension does not match the expected dimension.
- Parameters:
expected (int | None)
- Return type:
- classmethod __arrow_ext_deserialize__(storage_type, serialized)#
- class lenskit.data.matrix.SparseIndexListType(dimension, large=False)#
Bases: pyarrow.ExtensionType
Sparse index lists. These are the row type for structure-only sparse matrices.
- index_type: SparseIndexType#
- classmethod from_type(data_type, dimension=None)#
Create a sparse index list type from an Arrow data type, handling legacy struct layouts without the extension types.
- Parameters:
data_type (pyarrow.DataType) – The Arrow data type to interpret as a row type.
dimension (int | None) – The row dimension, if known from an external source. If provided and the data type also includes the dimensionality, both dimensions must match.
- Raises:
TypeError – If the data type is not a valid sparse row type.
ValueError – If there is another error, such as mismatched dimensions.
- Return type:
- classmethod __arrow_ext_deserialize__(storage_type, serialized)#
- __arrow_ext_class__()#
- class lenskit.data.matrix.SparseRowType(dimension, value_type=pa.float32(), large=False)#
Bases: pyarrow.ExtensionType
Data type for sparse rows stored in Arrow. Sparse rows are stored as lists of structs with index and column fields.
Stability: Internal
This API is at the internal or experimental stability level: it may change at any time, and breaking changes will not necessarily be described in the release notes. See Stability Levels for details.
- Parameters:
dimension (int)
value_type (pyarrow.DataType | None)
large (bool)
- value_type: pyarrow.DataType | None#
- index_type: SparseIndexType#
- classmethod from_type(data_type, dimension=None)#
Create a sparse row type from an Arrow data type, handling legacy struct layouts without the extension types.
- Parameters:
data_type (pyarrow.DataType) – The Arrow data type to interpret as a row type.
dimension (int | None) – The row dimension, if known from an external source. If provided and the data type also includes the dimensionality, both dimensions must match.
- Raises:
TypeError – If the data type is not a valid sparse row type.
ValueError – If there is another error, such as mismatched dimensions.
- Return type:
- classmethod __arrow_ext_deserialize__(storage_type, serialized)#
- __arrow_ext_class__()#
- class lenskit.data.matrix.SparseRowArray#
Bases: pyarrow.ExtensionArray
An array of sparse rows (a compressed sparse row matrix).
Stability: Internal
This API is at the internal or experimental stability level: it may change at any time, and breaking changes will not necessarily be described in the release notes. See Stability Levels for details.
- type: SparseRowType | SparseIndexListType#
- classmethod from_arrays(offsets, indices, values=None, *, shape=None)#
- Parameters:
- Return type:
- classmethod from_array(array, dimension=None)#
Interpret an Arrow array as a sparse row array, if possible. Handles legacy layouts without the extension types.
- Parameters:
array (pyarrow.Array) – The array to convert.
dimension (int | None) – The dimensionality of the sparse rows, if known from an external source.
- Return type:
- classmethod from_scipy(matrix, *, values=True, large=None)#
Create a sparse row array from a SciPy sparse matrix.
- Parameters:
matrix (scipy.sparse.sparray) – The SciPy sparse matrix (in CSR format).
values (bool) – Whether to include the values or create a structure-only array.
large (bool | None) – True to force creation of a pa.LargeListArray.
- Returns:
The sparse row array.
- Return type:
- to_scipy()#
Convert this sparse row array to a SciPy sparse array.
- Return type:
scipy.sparse.csr_array[Any, tuple[int, int]]
- to_torch()#
Convert this sparse row array to a Torch sparse tensor.
- Return type:
- to_coo()#
Convert this array to a table representing the array in COO format.
- Return type:
- property offsets: pyarrow.Int32Array#
- Return type:
- property indices: pyarrow.Int32Array#
- Return type:
- property values: pyarrow.Array | None#
- Return type:
pyarrow.Array | None
- structure()#
Get the structure of this matrix (without values).
- Return type:
- transpose()#
Get the transpose of this sparse matrix.
- Return type:
- lenskit.data.matrix.fast_col_cooc(rows: lenskit.data.types.NPVector[numpy.int32] | pyarrow.Int32Array, cols: lenskit.data.types.NPVector[numpy.int32] | pyarrow.Int32Array, shape: tuple[int, int], *, progress: lenskit.logging.Progress | None = None, include_diagonal: bool = True, ordered: bool = False, dense: Literal[True]) → numpy.ndarray[tuple[int, int], numpy.dtype[numpy.float32]]#
- lenskit.data.matrix.fast_col_cooc(rows: lenskit.data.types.NPVector[numpy.int32] | pyarrow.Int32Array, cols: lenskit.data.types.NPVector[numpy.int32] | pyarrow.Int32Array, shape: tuple[int, int], *, progress: lenskit.logging.Progress | None = None, include_diagonal: bool = True, ordered: bool = False, dense: Literal[False] = False) → scipy.sparse.coo_array
- lenskit.data.matrix.fast_col_cooc(rows: lenskit.data.types.NPVector[numpy.int32] | pyarrow.Int32Array, cols: lenskit.data.types.NPVector[numpy.int32] | pyarrow.Int32Array, shape: tuple[int, int], *, progress: lenskit.logging.Progress | None = None, include_diagonal: bool = True, ordered: bool = False, dense: bool = False) → Any
Compute column co-occurrences (\(M^{\mathrm{T}}M\)) efficiently.
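For a binary matrix \(M\) with at most one entry per (row, column) pair, entry \((i, j)\) of \(M^{\mathrm{T}}M\) counts the rows that contain both column \(i\) and column \(j\). The sketch below is a naive pure-Python reference for that quantity, intended only to show what is being computed. naive_col_cooc is a hypothetical helper, not LensKit's implementation; it returns a dictionary rather than an array, and its include_diagonal behaviour (dropping the \(i = j\) entries when false) is an assumption based on the parameter name.

```python
from collections import defaultdict
from itertools import product


def naive_col_cooc(rows, cols, include_diagonal=True):
    """Count column co-occurrences from COO coordinates. Illustrative only."""
    # Group the column indices by the row they appear in.
    by_row = defaultdict(list)
    for r, c in zip(rows, cols):
        by_row[r].append(c)

    # Every pair of columns in the same row co-occurs once.
    counts = defaultdict(int)
    for items in by_row.values():
        for i, j in product(items, repeat=2):
            if i == j and not include_diagonal:
                continue
            counts[(i, j)] += 1
    return dict(counts)


# Row 0 contains columns {0, 1}; row 1 contains columns {0, 1, 2}.
rows = [0, 0, 1, 1, 1]
cols = [0, 1, 0, 1, 2]
cooc = naive_col_cooc(rows, cols)
cooc_off_diag = naive_col_cooc(rows, cols, include_diagonal=False)
```

This reference does work quadratic in the length of each row, which is the same pair enumeration an efficient implementation has to account for; it is useful mainly for checking results on small inputs.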
- lenskit.data.matrix.normalize_matrix(matrix, normalize)#
Normalize rows of a matrix.
- Parameters:
matrix (scipy.sparse.csr_array | numpy.typing.NDArray[numpy.floating[Any]]) – Sparse or dense matrix to normalize
normalize (Literal['unit', 'distribution'] | None) – Normalization mode (“unit” for L2, “distribution” for L1)
- Returns:
Normalized matrix
- Return type:
scipy.sparse.csr_array | numpy.typing.NDArray[numpy.floating[Any]]
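The two normalization modes can be illustrated on a small dense matrix. The sketch below is a pure-Python approximation of the documented behaviour, not LensKit's code: "unit" divides each row by its L2 norm and "distribution" divides each row by its L1 norm. The helper name normalize_rows, the treatment of None as "return the rows unchanged", and the choice to leave all-zero rows as zeros are assumptions of this example.

```python
import math


def normalize_rows(matrix, normalize):
    """Normalize each row of a list-of-lists matrix. Illustrative only."""
    if normalize is None:
        # Assumption: None means no normalization.
        return [list(row) for row in matrix]

    out = []
    for row in matrix:
        if normalize == "unit":
            norm = math.sqrt(sum(v * v for v in row))  # L2 norm
        elif normalize == "distribution":
            norm = sum(abs(v) for v in row)  # L1 norm
        else:
            raise ValueError(f"unknown normalization mode: {normalize!r}")
        # Assumption: rows with zero norm are left as zeros.
        out.append([v / norm if norm > 0 else 0.0 for v in row])
    return out


unit_rows = normalize_rows([[3.0, 4.0], [0.0, 0.0]], "unit")
dist_rows = normalize_rows([[1.0, 3.0]], "distribution")
```

After "unit" normalization each nonzero row has Euclidean length 1; after "distribution" normalization the absolute values in each nonzero row sum to 1.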
Exported Aliases#
- class lenskit.data.matrix.Progress#
Re-exported alias for lenskit.logging.Progress.
- lenskit.data.matrix.NPVector#
Re-exported alias for lenskit.data.types.NPVector.