itwinai.components

This module provides the base classes to define modular and reproducible ML workflows. The base component classes provide a template to follow for extending existing components or creating new ones.

There are two ways of creating workflows: simple and advanced workflows.

Simple workflows are built by wrapping a sequence of components in a Pipeline object, which executes them in cascade, passing the output of each component as the input of the next. It is the responsibility of the user to prevent mismatches between the outputs and inputs of consecutive components. The pipeline can be configured in terms of both parameters and structure, with a single configuration file representing the whole pipeline. This configuration file can be executed using the itwinai CLI, without the need for Python files.

Example:

>>> from itwinai.components import DataGetter, Saver
>>> from itwinai.pipeline import Pipeline
>>>
>>> my_pipe = Pipeline({"getter": DataGetter(...), "data_saver": Saver(...)})
>>> my_pipe.execute()
>>> my_pipe.to_yaml("training_pipe.yaml")
>>>
>>> # The pipeline can be parsed back to Python with:
>>> from itwinai.parser import PipeParser
>>> my_pipe = PipeParser("training_pipe.yaml")
>>> my_pipe.execute()
>>>
>>> # Run the pipeline from configuration file with dynamic override
>>> itwinai exec-pipeline --config training_pipe.yaml \
>>>     --override pipeline.init_args.steps.data_saver.some_param 42
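
The cascade execution described for simple workflows can be sketched without itwinai. The `run_in_cascade` helper and the toy steps below are hypothetical stand-ins, not the library's `Pipeline` implementation. The sketch assumes that each step is a callable whose return value (a tuple or a single object) becomes the positional arguments of the next step.

```python
from typing import Any, Callable, Dict


def run_in_cascade(steps: Dict[str, Callable[..., Any]], *args: Any) -> Any:
    """Run steps in insertion order, feeding each output to the next step."""
    result: Any = args
    for step in steps.values():
        # Normalise to a tuple so single values and tuples are handled alike.
        if not isinstance(result, tuple):
            result = (result,)
        result = step(*result)
    return result


# Hypothetical toy steps standing in for real components.
def get_data():
    return [3, 1, 2]


def sort_data(data):
    return sorted(data)


def head_and_tail(data):
    return data[0], data[-1]


cascade_output = run_in_cascade(
    {"getter": get_data, "sorter": sort_data, "summary": head_and_tail}
)
```

Nothing in this sketch checks that the output of one step matches the signature of the next, which is why preventing such mismatches is left to the user.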

Advanced workflows involve more complicated connections between components, which makes it hard to define a structure beforehand without the risk of over-constraining the user. Therefore, advanced workflows are defined by explicitly connecting component outputs to the inputs of other components, without a wrapper Pipeline object. In this case, configuration files let the user persist the parameters passed to the argument parser, so that they can be reused later, with the possibility of dynamic parameter overrides.

Example:

>>> from jsonargparse import ArgumentParser, ActionConfigFile
>>>
>>> parser = ArgumentParser(description='PyTorch MNIST Example')
>>> parser.add_argument('--batch-size', type=int, default=64,
>>>                     help='input batch size for training (default: 64)')
>>> parser.add_argument('--epochs', type=int, default=10,
>>>                     help='number of epochs to train (default: 10)')
>>> parser.add_argument('--lr', type=float, default=0.01,
>>>                     help='learning rate (default: 0.01)')
>>> parser.add_argument(
>>>     "-c", "--config", action=ActionConfigFile,
>>>     required=True,
>>>     help="Path to a configuration file in json or yaml format."
>>> )
>>> args = parser.parse_args()
>>>
>>> from itwinai.components import (
>>>     DataGetter, Saver, DataSplitter, Trainer
>>> )
>>> getter = DataGetter(...)
>>> splitter = DataSplitter(...)
>>> data_saver = Saver(...)
>>> model_saver = Saver(...)
>>> trainer = Trainer(
>>>     batch_size=args.batch_size, lr=args.lr, epochs=args.epochs
>>> )
>>>
>>> # Compose workflow
>>> my_dataset = getter.execute()
>>> train_set, valid_set, test_set = splitter.execute(my_dataset)
>>> data_saver.execute("train_dataset.pkl", train_set)
>>> _, _, _, trained_model = trainer.execute(train_set, valid_set, test_set)
>>> model_saver.execute(trained_model)
>>>
>>> # Run the script using a previous configuration with dynamic override
>>> python my_train.py --config training_pipe.yaml --lr 0.002

itwinai.components.monitor_exec(method: Callable) → Callable[source]: Decorator for BaseComponent’s methods. Prints when the component starts and ends executing, indicating its execution time.
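
A decorator with this kind of behaviour can be sketched as follows. This is a minimal illustration, not the source of `monitor_exec`: the name `timed_exec`, the message format, and the toy `Doubler` class are all assumptions made for the example.

```python
import functools
import time
from typing import Any, Callable


def timed_exec(method: Callable) -> Callable:
    """Hypothetical stand-in for a monitoring decorator (not itwinai's code)."""

    @functools.wraps(method)
    def wrapper(self, *args: Any, **kwargs: Any) -> Any:
        label = type(self).__name__
        print(f"Starting execution of '{label}'...")
        start = time.perf_counter()
        try:
            return method(self, *args, **kwargs)
        finally:
            # Report the elapsed time even if the method raises.
            elapsed = time.perf_counter() - start
            print(f"'{label}' executed in {elapsed:.3f}s")

    return wrapper


class Doubler:
    """Toy class used only to exercise the decorator."""

    @timed_exec
    def execute(self, values):
        return [2 * v for v in values]


doubled = Doubler().execute([1, 2, 3])
```

Using `functools.wraps` keeps the name and docstring of the decorated method, so introspection and generated documentation still show `execute` rather than `wrapper`.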

class itwinai.components.BaseComponent(name: str | None = None)[source]

Bases: ABC, Serializable

Base component class. Each component provides a simple interface to foster modularity in machine learning code. Each component class implements the execute method, which receives some input ML artifacts (e.g., datasets), performs some operations, and returns new artifacts. The components are meant to be assembled into complex ML workflows, represented as pipelines.

Args:

name (Optional[str], optional): unique identifier for a step. Defaults to None.

parameters: Dict[Any, Any] = None: Dictionary storing constructor arguments. Needed to serialize the class to dictionary. Set by self.save_parameters() method.

property name: str: Name of current component. Defaults to self.__class__.__name__.

abstract execute(*args, **kwargs) → Any[source]: Execute some operations.

cleanup()[source]: Cleanup resources allocated by this component.
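
The interface documented above can be mirrored in a small self-contained sketch. `SketchComponent` and `Normalizer` are hypothetical names that do not exist in itwinai. The sketch leaves out `Serializable`, the `parameters` dictionary, and `save_parameters()`, and only illustrates the name fallback, the abstract `execute` method, and the `cleanup` hook.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class SketchComponent(ABC):
    """Hypothetical stand-in mirroring the documented interface."""

    def __init__(self, name: Optional[str] = None) -> None:
        self._name = name

    @property
    def name(self) -> str:
        # Fall back to the class name when no explicit name was given.
        return self._name if self._name is not None else type(self).__name__

    @abstractmethod
    def execute(self, *args: Any, **kwargs: Any) -> Any:
        """Execute some operations."""

    def cleanup(self) -> None:
        """Release resources; nothing to release in this sketch."""


class Normalizer(SketchComponent):
    """Toy component: scales values by the largest one."""

    def execute(self, values):
        peak = max(values)
        return [v / peak for v in values]


step = Normalizer()
normalized = step.execute([1, 2, 4])
```

Because `execute` is abstract, instantiating `SketchComponent` directly raises a `TypeError`; only subclasses that implement it can be created.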

class itwinai.components.DataGetter(name: str | None = None)[source]

Bases: BaseComponent

Retrieves a dataset.

abstract execute() → MLDataset[source]

Retrieves a dataset.

Returns: retrieved dataset.
Return type: MLDataset

class itwinai.components.DataProcessor(name: str | None = None)[source]

Bases: BaseComponent

Performs dataset pre-processing.

abstract execute(train_dataset: MLDataset, validation_dataset: MLDataset, test_dataset: MLDataset) → Tuple[MLDataset, MLDataset, MLDataset][source]

Pre-processes the training, validation, and test datasets.

Parameters:

train_dataset (MLDataset) – training dataset.
validation_dataset (MLDataset) – validation dataset.
test_dataset (MLDataset) – test dataset.

Returns:

preprocessed training dataset, validation dataset, test dataset.

Return type:

Tuple[MLDataset, MLDataset, MLDataset]

class itwinai.components.DataSplitter(train_proportion: int | float, validation_proportion: int | float, test_proportion: int | float, name: str | None = None)[source]

Bases: BaseComponent

Splits a dataset into train, validation, and test splits.

property train_proportion: int | float: Training set proportion.

property validation_proportion: int | float: Validation set proportion.

property test_proportion: int | float: Test set proportion.

abstract execute(dataset: MLDataset) → Tuple[MLDataset, MLDataset, MLDataset][source]

Splits a dataset into train, validation and test splits.

Parameters: dataset (MLDataset) – input dataset.
Returns: tuple of train, validation and test splits.
Return type: Tuple[MLDataset, MLDataset, MLDataset]
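
Since `execute` is abstract here, the splitting logic is left to subclasses. The function below is one possible sketch of such logic, not itwinai code. It assumes that the dataset is a plain sequence, that the three proportions are relative weights (fractions or integers), and that the splits are contiguous and unshuffled. The name `split_by_proportion` is hypothetical.

```python
from typing import List, Sequence, Tuple


def split_by_proportion(
    dataset: Sequence,
    train_proportion: float,
    validation_proportion: float,
    test_proportion: float,
) -> Tuple[List, List, List]:
    """Split a sequence into contiguous train/validation/test parts."""
    total = train_proportion + validation_proportion + test_proportion
    if total <= 0:
        raise ValueError("Proportions must sum to a positive number.")
    n = len(dataset)
    # Normalising by the total lets callers pass fractions or integer weights.
    n_train = int(n * train_proportion / total)
    n_valid = int(n * validation_proportion / total)
    items = list(dataset)
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]  # remainder, so no sample is dropped
    return train, valid, test


train_set, valid_set, test_set = split_by_proportion(list(range(10)), 6, 2, 2)
```

Assigning the remainder to the test split means that truncation in the first two sizes never drops samples, at the cost of a test split that can be slightly larger than requested.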

class itwinai.components.Trainer(name: str | None = None)[source]

Bases: BaseComponent

Trains a machine learning model.

abstract execute(train_dataset: MLDataset, validation_dataset: MLDataset, test_dataset: MLDataset) → Tuple[MLDataset, MLDataset, MLDataset, MLModel][source]

Trains a machine learning model.

Parameters:

train_dataset (MLDataset) – training dataset.
validation_dataset (MLDataset) – validation dataset.
test_dataset (MLDataset) – test dataset.

Returns:

training dataset, validation dataset, test dataset, trained model.

Return type:

Tuple[MLDataset, MLDataset, MLDataset, MLModel]

class itwinai.components.Predictor(model: MLModel | ModelLoader, name: str | None = None)[source]

Bases: BaseComponent

Applies a pre-trained machine learning model to unseen data.

model: MLModel: Pre-trained ML model used to make predictions.

abstract execute(predict_dataset: MLDataset, model: MLModel | None = None) → MLDataset[source]

Applies a machine learning model on a dataset of samples.

Parameters:

predict_dataset (MLDataset) – dataset for inference.
model (Optional[MLModel], optional) – overrides the internal model, if given. Defaults to None.

Returns:

predictions with the same cardinality as the input dataset.

Return type:

MLDataset

class itwinai.components.Saver(name: str | None = None)[source]

Bases: BaseComponent

Saves artifact to disk.

abstract execute(artifact: MLArtifact) → MLArtifact[source]

Saves an ML artifact to disk.

Parameters: artifact (MLArtifact) – artifact to save.
Returns: the same input artifact, after saving it.
Return type: MLArtifact

class itwinai.components.Adapter(policy: List[Any], name: str | None = None)[source]

Bases: BaseComponent

Connects two components in a sequential pipeline, allowing finer control over how intermediate results are propagated between them.

Parameters:

policy (List[Any]) – list of the same length as the output of this component, describing how to map the input args to the output.
name (Optional[str], optional) – name of the component. Defaults to None.

The adapter lets you define a policy by which inputs are re-arranged before being propagated to the next component. Some examples, in the form [policy]: (input) -> (output):

["INPUT_ARG#2", "INPUT_ARG#1", "INPUT_ARG#0"]: (11, 22, 33) -> (33, 22, 11)
["INPUT_ARG#0", "INPUT_ARG#2", None]: (11, 22, 33) -> (11, 33, None)
[]: (11, 22, 33) -> ()
[42, "INPUT_ARG#2", "hello"]: (11, 22, 33, 44, 55) -> (42, 33, "hello")
[None, 33, 3.14]: () -> (None, 33, 3.14)
[None, 33, 3.14]: ("double", 44, None, True) -> (None, 33, 3.14)

INPUT_PREFIX: str = 'INPUT_ARG#'

policy: List[Any]: Adapter policy.

execute(*args) → Tuple[source]

Produces an output tuple by arranging input arguments according to the policy specified in the constructor.

Parameters: args (Tuple) – input arguments.
Returns: input args arranged according to some policy.
Return type: Tuple
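
The policy semantics shown in the examples above can be reproduced with a short standalone function. This is a sketch inferred from those examples, not the source of `Adapter.execute`. The name `apply_policy` is hypothetical, and the sketch assumes that any policy entry which is not a string starting with `INPUT_ARG#` is passed through unchanged as a constant.

```python
from typing import Any, List, Tuple

INPUT_PREFIX = "INPUT_ARG#"


def apply_policy(policy: List[Any], *args: Any) -> Tuple:
    """Arrange input args according to the policy, as in the examples above."""
    output = []
    for item in policy:
        if isinstance(item, str) and item.startswith(INPUT_PREFIX):
            # "INPUT_ARG#i" selects the i-th positional input.
            index = int(item[len(INPUT_PREFIX):])
            output.append(args[index])
        else:
            # Anything else is emitted as a constant.
            output.append(item)
    return tuple(output)


reversed_args = apply_policy(
    ["INPUT_ARG#2", "INPUT_ARG#1", "INPUT_ARG#0"], 11, 22, 33
)
mixed_args = apply_policy([42, "INPUT_ARG#2", "hello"], 11, 22, 33, 44, 55)
```

In this sketch a policy that refers to an index beyond the number of inputs raises an `IndexError`; how the real component handles that case is not stated in this page.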