lenskit.pipeline.PipelineBuilder
================================

.. py:class:: lenskit.pipeline.PipelineBuilder(name = None, version = None)
   :canonical: lenskit.pipeline._builder.PipelineBuilder

   Builder for LensKit recommendation pipelines.

   :ref:`Pipelines ` are the core abstraction for using LensKit models and
   other components to produce recommendations in a useful way. They allow you
   to wire together components in (mostly) arbitrary graphs, train them on
   data, and serialize the resulting pipelines to disk for use elsewhere. The
   builder configures and builds pipelines that can then be run.

   If you have a scoring model and just want to generate recommendations with
   a default setup and minimal configuration, see
   :func:`~lenskit.pipeline.topn_pipeline` or
   :class:`~lenskit.pipeline.RecPipelineBuilder`.

   :param name: A name for the pipeline.
   :param version: A numeric version for the pipeline.

   :Stability: Caller

   .. py:attribute:: name
      :type: str | None
      :value: None

      The pipeline name.

   .. py:attribute:: version
      :type: str | None
      :value: None

      The pipeline version string.

   .. py:method:: from_pipeline(pipeline)
      :classmethod:

      Create a builder initialized with a pipeline's internal state. See
      :meth:`Pipeline.modify` for details; that method is the main entry
      point, and this one exists to implement it.

   .. py:method:: meta(*, include_hash = True)

      Get the metadata (name, version, hash, etc.) for this pipeline without
      returning the whole config.

      :param include_hash: Whether to include a configuration hash in the
          metadata.

   .. py:method:: nodes()

      Get the nodes in the pipeline graph.

   .. py:method:: node(node: str, *, missing: Literal['error'] = 'error') -> lenskit.pipeline.nodes.Node[object]
                  node(node: str, *, missing: Literal['none'] | None) -> lenskit.pipeline.nodes.Node[object] | None
                  node(node: lenskit.pipeline.nodes.Node[T]) -> lenskit.pipeline.nodes.Node[T]

      Get the pipeline node with the specified name.
      If passed a node, it returns the node or fails if the node is not a
      member of the pipeline.

      :param node: The name of the pipeline node to look up, or a node to
          check for membership.
      :returns: The pipeline node, if it exists.
      :raises KeyError: The specified node does not exist.

   .. py:property:: default_node
      :type: lenskit.pipeline.nodes.Node[Any] | None

      Get the default node for this pipeline.

   .. py:method:: create_input[T](name: str, *types: TypeForm[T]) -> lenskit.pipeline.nodes.Node[T]
                  create_input(name: str, *types: Any) -> lenskit.pipeline.nodes.Node[Any]

      Create an input node for the pipeline. Pipelines expect their inputs to
      be provided when they are run.

      :param name: The name of the input. The name must be unique in the
          pipeline (among both components and inputs).
      :param types: The allowable types of the input; input data can be of any
          specified type. If ``None`` is among the allowed types, the input
          can be omitted.
      :returns: A pipeline node representing this input.
      :raises ValueError: A node with the specified ``name`` already exists.

   .. py:method:: literal[T](value, *, name = None)

      Create a literal node (a node with a fixed value).

      .. note::

         Literal nodes cannot be serialized with :meth:`get_config` or
         :meth:`save_config`.

   .. py:method:: default_connection(name, node)

      Set the default wiring for a component input. Components that declare
      an input parameter with the specified ``name`` but no configured input
      will be wired to this node.

      This is intended to be used for things like wiring up ``user``
      parameters to semi-automatically receive the target user's identity and
      history.

      .. important::

         Defaults are a feature of the builder only, and are resolved in
         :meth:`build`. They are not included in the serialized configuration
         or the resulting pipeline.

      :param name: The name of the parameter to set a default for.
      :param node: The node or literal value to wire to this parameter.

   .. py:method:: default_component(node)

      Set the default node for the pipeline.
      If :meth:`Pipeline.run` is called without a node, then it will run this
      node (and all of its dependencies).

   .. py:method:: remove_alias(alias, *, exist_ok = False)

      Remove an alias from the builder.

   .. py:method:: alias(alias, node, *, replace = False)

      Create an alias for a node. After aliasing, the node can be retrieved
      from :meth:`node` using either its original name or its alias.

      :param alias: The alias to add to the node.
      :param node: The node (or node name) to alias.
      :param replace: If ``True``, replace the alias if one already exists.
      :raises ValueError: If the alias is already used as an alias or node
          name.

   .. py:method:: add_component[CFG, T](name: str, cls: lenskit.pipeline.components.ComponentConstructor[CFG, T], config: CFG = None, /, **inputs: lenskit.pipeline.nodes.Node[Any]) -> lenskit.pipeline.nodes.Node[T]
                  add_component(name: str, instance: lenskit.pipeline.components.Component[T] | lenskit.pipeline.components.PipelineFunction[T], /, **inputs: lenskit.pipeline.nodes.Node[Any] | object) -> lenskit.pipeline.nodes.Node[T]

      Add a component and connect it into the graph.

      :param name: The name of the component in the pipeline. The name must be
          unique in the pipeline (among both components and inputs).
      :param cls: A component class.
      :param config: The configuration object for the component class.
      :param instance: A raw function or pre-instantiated component.
      :param inputs: The component's input wiring. See
          :ref:`pipeline-connections` for details.
      :returns: The node representing this component in the pipeline.

   .. py:method:: replace_component[CFG, T](name: str | lenskit.pipeline.nodes.Node[T], cls: lenskit.pipeline.components.ComponentConstructor[CFG, T], config: CFG = None, /, **inputs: lenskit.pipeline.nodes.Node[Any]) -> lenskit.pipeline.nodes.Node[T]
                  replace_component(name: str | lenskit.pipeline.nodes.Node[T], instance: lenskit.pipeline.components.Component[T] | lenskit.pipeline.components.PipelineFunction[T], /, **inputs: lenskit.pipeline.nodes.Node[Any] | object) -> lenskit.pipeline.nodes.Node[T]

      Replace a component in the graph. The new component must have a type
      that is compatible with the old component. Both input and output
      connections are retained, except for those overridden with keyword
      arguments.

      :param name: The name or node to replace.
      :param comp: The component or constructor to use instead of the current
          node's component.
      :param config: A configuration for the component (if passed as a class
          or constructor).
      :param inputs: New input wiring(s) for the new component.

   .. py:method:: connect(obj, **inputs)

      Provide additional input connections for a component that has already
      been added. See :ref:`pipeline-connections` for details.

      :param obj: The name or node of the component to wire.
      :param inputs: The component's input wiring. For each keyword argument
          in the component's function signature, that argument can be provided
          here with an input that the pipeline will provide to that argument
          of the component when the pipeline is run.

   .. py:method:: clear_inputs(node)

      Remove input wirings for a node.

      :param node: The node whose input wiring should be removed.

   .. py:method:: add_run_hook(name, hook, *, priority = 1)

      Add a hook to be called when the pipeline is run (see
      :ref:`pipeline-hooks`).

      :param name: The name of the hook to add a handler for.
      :param hook: The hook function to run.
      :param priority: The hook priority. Hooks are run in ascending priority,
          and hooks with the same priority are run in the order they are added.
          LensKit's built-in hooks run at priority 0.

   .. py:method:: validate()

      Check the built pipeline for errors.

   .. py:method:: clone()

      Clone the pipeline builder. The resulting builder starts as a copy of
      this builder, and any subsequent modifications affect only the copy to
      which they are applied.

   .. py:method:: build_config(*, include_hash = True)

      Get this pipeline's configuration for serialization. The configuration
      consists of all inputs and components along with their configurations
      and input connections. It can be serialized to disk (in JSON, YAML, or a
      similar format) to save a pipeline.

      The configuration does **not** include any trained parameter values,
      although the configuration may include things such as paths to
      checkpoints to load such parameters, depending on the design of the
      components in the pipeline.

      .. note::

         Literal nodes (from :meth:`literal`, or literal values wired to
         inputs) cannot be serialized, and this method will fail if they are
         present in the pipeline.

   .. py:method:: config_hash()

      Get a hash of the pipeline's configuration to uniquely identify it for
      logging, version control, or other purposes.

      The hash format and algorithm are not guaranteed, but hashes are stable
      within a LensKit version. For the same version of LensKit and component
      code, the same configuration will produce the same hash, so long as
      there are no literal nodes. Literal nodes will *usually* hash
      consistently, but since literals other than basic JSON values are hashed
      by pickling, hash stability depends on the stability of the pickle
      bytestream.

      In LensKit 2025.1, the configuration hash is computed by taking the JSON
      serialization of the pipeline configuration *without* a hash and
      returning the hex-encoded SHA-256 hash of that serialization.

   .. py:method:: load_config(cfg_file)
      :classmethod:

      Load a pipeline from a saved configuration file.

      :param cfg_file: The path to a TOML, YAML, or JSON file containing the
          pipeline configuration.
      :returns: The constructed pipeline.

      .. seealso::

         :meth:`from_config` for the actual pipeline instantiation logic.

   .. py:method:: from_config(config, *, file_path = None)
      :classmethod:

      Reconstruct a pipeline builder from a serialized configuration.

      :param config: The configuration object, as loaded from JSON, TOML,
          YAML, or similar. Will be validated into a :class:`PipelineConfig`.
      :returns: The configured (but not trained) pipeline.
      :raises PipelineError: If there is a configuration error reconstructing
          the pipeline.

      :Warns: **PipelineWarning** -- If the configuration is unusual but still
          usable; for example, the configuration includes a hash but the
          constructed pipeline does not have a matching hash.

   .. py:method:: apply_config(config, *, extend = False, check_hash = False)

      Apply a configuration to this builder.

      :param config: The pipeline configuration to apply.
      :param extend: Whether the configuration should extend the current
          pipeline, or fail when there are conflicting definitions.

   .. py:method:: use_first_of[T](name, primary, fallback)

      Ergonomic method to create a new node that returns the result of its
      ``primary`` input if it is provided and not ``None``, and otherwise
      returns the result of ``fallback``. This method is used for things like
      filling in optional pipeline inputs. For example, if you want the
      pipeline to take candidate items through an ``items`` input, but look
      them up from the user's history and the training data if ``items`` is
      not supplied, you would do:

      .. code:: python

         # EntityId, TrainingItemsCandidateSelector, and the ``history`` node
         # are assumed to be imported or created earlier.
         pipe = PipelineBuilder()
         # allow candidate items to be optionally specified
         items = pipe.create_input('items', list[EntityId], None)
         # find candidates from the training data (optional)
         lookup_candidates = pipe.add_component(
             'select-candidates',
             TrainingItemsCandidateSelector(),
             user=history,
         )
         # if the client provided items as a pipeline input, use those; otherwise
         # use the candidate selector we just configured.
         candidates = pipe.use_first_of('candidates', items, lookup_candidates)

      .. note::

         This method does not distinguish between an input being unspecified
         and explicitly specified as ``None``.

      .. note::

         This method does *not* implement item-level fallbacks, only fallbacks
         at the level of entire results. For item-level score fallbacks, see
         :class:`~lenskit.basic.FallbackScorer`.

      .. note::

         If one of the fallback elements is a component ``A`` that depends on
         another component or input ``B``, and ``B`` is missing or returns
         ``None`` such that ``A`` would usually fail, then ``A`` will be
         skipped and the fallback will move on to the next node. This works
         with arbitrarily deep transitive chains.

      :param name: The name of the node.
      :param primary: The node to use as the primary input, if it is
          available.
      :param fallback: The node to use if the primary input does not provide a
          value.

   .. py:method:: build(cache = None)

      Build the pipeline.

      :param cache: The pipeline cache to use.
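
The result-level fallback semantics described for :meth:`use_first_of` can be illustrated outside of LensKit with a minimal sketch. This is plain Python and is not LensKit's implementation: ``first_of``, ``make_items_input``, and ``lookup_candidates`` are hypothetical names introduced only for illustration, and each source is modeled as a zero-argument callable that returns either a value or ``None``.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def first_of(*sources: Callable[[], Optional[T]]) -> Optional[T]:
    """Return the first non-None result, evaluating sources lazily in order."""
    for source in sources:
        value = source()
        if value is not None:
            return value
    # Every source returned None, so there is no result to provide.
    return None


def make_items_input(items: Optional[list]) -> Callable[[], Optional[list]]:
    """Hypothetical stand-in for an optional pipeline input."""
    return lambda: items


def lookup_candidates() -> list:
    """Hypothetical stand-in for a candidate selector component."""
    return ["i1", "i2", "i3"]


# The caller supplied items, so the selector is never consulted.
explicit = first_of(make_items_input(["i9"]), lookup_candidates)

# The input is None (unspecified and explicit None look the same here),
# so the result comes from the fallback.
fallback = first_of(make_items_input(None), lookup_candidates)

# An empty list is not None, so it is used as-is: the fallback applies to
# whole results, not to individual items.
empty = first_of(make_items_input([]), lookup_candidates)
```

In this sketch the sources are evaluated lazily, so a later source only runs when every earlier one produced ``None``.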