lenskit.data.movielens

Code to import MovieLens data sets into LensKit.

Attributes

LOC

Classes

MLData

Internal class representing an open ML data set.

ML100KLoader

Loader for the ML100K data set.

MLMLoader

Loader for the ML 1M and 10M data sets.

MLModernLoader

Loader for modern MovieLens data sets (20M and later).

Functions

load_movielens(path)

Load a MovieLens dataset. The appropriate MovieLens format is detected based on the file contents.

load_movielens_df(path)

Load the ratings from a MovieLens dataset as a raw data frame. The appropriate MovieLens format is detected based on the file contents.

Module Contents

type lenskit.data.movielens.LOC = Path | tuple[ZipFile, str]
class lenskit.data.movielens.MLData(version, source, prefix='')

Internal class representing an open ML data set.

Stability: Internal

This API is at the internal or experimental stability level: it may change at any time, and breaking changes will not necessarily be described in the release notes. See Stability Levels for details.

Parameters:
version: str
source: pathlib.Path | zipfile.ZipFile
prefix: str = ''

static version_impl(version)
Parameters:

version (str)

Return type:

collections.abc.Callable[Ellipsis, MLData]

__enter__()
__exit__(*args)
open_file(name, encoding='utf8')

abstractmethod dataset()

Load the full dataset.

Return type:

lenskit.data._dataset.Dataset

abstractmethod ratings_df()

Load the ratings data frame.

Return type:

pandas.DataFrame

class lenskit.data.movielens.ML100KLoader(version, source, prefix='')

Bases: MLData

Loader for the ML100K data set.

dataset()

Load the full dataset.

Return type:

lenskit.data._dataset.Dataset

genres()
Return type:

pandas.Series

movies_df(genres=None)
Parameters:

genres (list[str] | None)

Return type:

pandas.DataFrame

users_df()
Return type:

pandas.DataFrame

ratings_df()

Load the ratings data frame.

Return type:

pandas.DataFrame

class lenskit.data.movielens.MLMLoader(version, source, prefix='')

Bases: MLData

Loader for the ML 1M and 10M data sets.

dataset()

Load the full dataset.

Return type:

lenskit.data._dataset.Dataset

movies_df()
users_df()
ratings_df()

Load the ratings data frame.

tagging_df()

class lenskit.data.movielens.MLModernLoader(version, source, prefix='')

Bases: MLData

Loader for modern MovieLens data sets (20M and later).

dataset()

Load the full dataset.

Return type:

lenskit.data._dataset.Dataset

movies_df()
tagging_df()
genome_df()
ratings_df()

Load the ratings data frame.

lenskit.data.movielens.load_movielens(path)

Load a MovieLens dataset. The appropriate MovieLens format is detected based on the file contents.

Stability:
Caller (see Stability Levels).
Parameters:

path (str | pathlib.Path) – The path to the dataset, either as an unpacked directory or a zip file.

Returns:

The dataset.

Return type:

lenskit.data._dataset.Dataset
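
To illustrate what detecting the format from the file contents can look like, here is a minimal sketch that classifies an unpacked directory by the ratings file it contains. The function `guess_ml_format` and its return labels are hypothetical. The file names are those used by the public MovieLens distributions (`u.data` for 100K, `ratings.dat` for 1M and 10M, `ratings.csv` for 20M and later). LensKit's actual detection logic may differ.

```python
from pathlib import Path

# Ratings file name -> loader family (labels are illustrative, not LensKit identifiers).
RATING_FILES = {
    "u.data": "ML100K",
    "ratings.dat": "ML 1M/10M",
    "ratings.csv": "modern (20M and later)",
}


def guess_ml_format(directory: Path) -> str:
    """Hypothetical helper: guess the MovieLens family of an unpacked directory."""
    for filename, family in RATING_FILES.items():
        if (directory / filename).is_file():
            return family
    raise ValueError(f"no known MovieLens ratings file in {directory}")
```

The three families correspond to the three loader classes documented above, each of which reads a different on-disk layout.
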

lenskit.data.movielens.load_movielens_df(path)

Load the ratings from a MovieLens dataset as a raw data frame. The appropriate MovieLens format is detected based on the file contents.

Stability:
Caller (see Stability Levels).
Parameters:

path (str | pathlib.Path) – The path to the dataset, either as an unpacked directory or a zip file.

Returns:

The ratings, with columns user_id, item_id, rating, and timestamp.

Return type:

pandas.DataFrame
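
As a usage sketch, the returned frame can be summarized with ordinary pandas operations on the documented columns. The function name `most_rated_items` is illustrative, and the import is placed inside the function only so that the definition can be loaded without LensKit installed.

```python
def most_rated_items(path, n=10):
    """Return rating count and mean rating for the n most-rated items."""
    # Imported lazily so this sketch can be defined without LensKit present.
    from lenskit.data.movielens import load_movielens_df

    # Documented columns: user_id, item_id, rating, timestamp.
    ratings = load_movielens_df(path)
    stats = ratings.groupby("item_id")["rating"].agg(["count", "mean"])
    return stats.sort_values("count", ascending=False).head(n)
```

Calling it with the path to a locally downloaded MovieLens archive or unpacked directory (the location is an assumption about the caller's setup) would give one row per item, indexed by item_id.
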

Exported Aliases

lenskit.data.movielens.get_logger()

Re-exported alias for lenskit.logging.get_logger().

class lenskit.data.movielens.DatasetBuilder

Re-exported alias for lenskit.data._builder.DatasetBuilder.

class lenskit.data.movielens.Dataset

Re-exported alias for lenskit.data._dataset.Dataset.