itwinai.components
This module provides the base classes to define modular and reproducible ML workflows. The base component classes provide a template to follow for extending existing components or creating new ones.
There are two ways of creating workflows: simple and advanced workflows.
Simple workflows are built by wrapping a sequence of components in a Pipeline object, which executes them in cascade, passing the output of each component as the input of the next. It is the responsibility of the user to prevent mismatches between the outputs and inputs of consecutive components. The pipeline can be configured in terms of both parameters and structure through a configuration file that represents the whole pipeline. This configuration file can be executed with the itwinai CLI, without the need for Python files.
Example:
>>> from itwinai.components import DataGetter, Saver
>>> from itwinai.pipeline import Pipeline
>>>
>>> my_pipe = Pipeline({"getter": DataGetter(...), "data_saver": Saver(...)})
>>> my_pipe.execute()
>>> my_pipe.to_yaml("training_pipe.yaml")
>>>
>>> # The pipeline can be parsed back to Python with:
>>> from itwinai.parser import PipeParser
>>> my_pipe = PipeParser("training_pipe.yaml")
>>> my_pipe.execute()
>>>
>>> # Run the pipeline from the configuration file with a dynamic override
>>> # (shell command, not Python):
$ itwinai exec-pipeline --config training_pipe.yaml \
    --override pipeline.init_args.steps.data_saver.some_param 42
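The cascading execution described above can be illustrated with a minimal, self-contained sketch. It does not use itwinai: `run_in_cascade` and the two step functions are hypothetical stand-ins, and the way tuple outputs are unpacked is an assumption made for illustration only.

```python
from typing import Any, Callable, Dict, Tuple


def run_in_cascade(steps: Dict[str, Callable[..., Any]], *args: Any) -> Tuple[Any, ...]:
    """Run the steps in insertion order, feeding each output to the next step."""
    for step in steps.values():
        result = step(*args)
        # Assumption: a tuple is unpacked into the positional arguments of the
        # next step, while any other value is passed as a single argument.
        args = result if isinstance(result, tuple) else (result,)
    return args


def get_data() -> list:
    """Hypothetical stand-in for a DataGetter-like step."""
    return [3, 1, 2]


def sort_data(data: list) -> list:
    """Hypothetical stand-in for a DataProcessor-like step."""
    return sorted(data)


cascade_output = run_in_cascade({"getter": get_data, "processor": sort_data})
```

Nothing in this sketch checks that the output of one step matches the signature of the next, which mirrors the note above that preventing such mismatches is left to the user.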
Advanced workflows involve more complicated connections between components, which makes it hard to define a structure beforehand without over-constraining the user. Therefore, advanced workflows are defined by explicitly connecting component outputs to the inputs of other components, without a wrapper Pipeline object. In this case, configuration files let the user persist the parameters passed to the argument parser so that they can be reused, with the possibility of dynamic parameter overrides.
Example:
>>> from jsonargparse import ArgumentParser, ActionConfigFile
>>>
>>> parser = ArgumentParser(description='PyTorch MNIST Example')
>>> parser.add_argument('--batch-size', type=int, default=64,
...                     help='input batch size for training (default: 64)')
>>> parser.add_argument('--epochs', type=int, default=10,
...                     help='number of epochs to train (default: 10)')
>>> parser.add_argument('--lr', type=float, default=0.01,
...                     help='learning rate (default: 0.01)')
>>> parser.add_argument(
...     "-c", "--config", action=ActionConfigFile,
...     required=True,
...     help="Path to a configuration file in json or yaml format."
... )
>>> args = parser.parse_args()
>>>
>>> from itwinai.components import (
...     DataGetter, Saver, DataSplitter, Trainer
... )
>>> getter = DataGetter(...)
>>> splitter = DataSplitter(...)
>>> data_saver = Saver(...)
>>> model_saver = Saver(...)
>>> trainer = Trainer(
...     batch_size=args.batch_size, lr=args.lr, epochs=args.epochs
... )
>>>
>>> # Compose workflow
>>> my_dataset = getter.execute()
>>> train_set, valid_set, test_set = splitter.execute(my_dataset)
>>> data_saver.execute("train_dataset.pkl", test_set)
>>> _, _, _, trained_model = trainer.execute(train_set, valid_set)
>>> model_saver.execute(trained_model)
>>>
>>> # Run the script using a previous configuration with a dynamic override
>>> # (shell command, not Python):
$ python my_train.py --config training_pipe.yaml --lr 0.002
- itwinai.components.monitor_exec(method: Callable) → Callable
Decorator for BaseComponent's methods. Prints when the component starts and ends executing, indicating its execution time.
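A decorator of this kind can be sketched as follows. This is not the itwinai implementation: `timed_exec` and `Doubler` are hypothetical names, and the message format is an assumption.

```python
import functools
import time
from typing import Any, Callable


def timed_exec(method: Callable) -> Callable:
    """Hypothetical stand-in for a decorator such as monitor_exec."""

    @functools.wraps(method)
    def wrapper(self, *args: Any, **kwargs: Any) -> Any:
        label = type(self).__name__
        print(f"Starting {label}...")
        start = time.perf_counter()
        try:
            return method(self, *args, **kwargs)
        finally:
            # Runs on success and on failure, so the time is always reported.
            elapsed = time.perf_counter() - start
            print(f"{label} finished in {elapsed:.3f} s")

    return wrapper


class Doubler:
    """Toy component used only to exercise the decorator."""

    @timed_exec
    def execute(self, values: list) -> list:
        return [2 * v for v in values]


doubled = Doubler().execute([1, 2, 3])
```

`functools.wraps` keeps the name and docstring of the decorated method, which helps when the decorated methods are later inspected or documented.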
- class itwinai.components.BaseComponent(name: str | None = None)
Bases: ABC, Serializable
Base component class. Each component provides a simple interface to foster modularity in machine learning code. Each component class implements the execute method, which receives some input ML artifacts (e.g., datasets), performs some operations, and returns new artifacts. Components are meant to be assembled into complex ML workflows, represented as pipelines.
- Args:
- name (Optional[str], optional): unique identifier for a step. Defaults to None.
- parameters: Dict[Any, Any] = None
Dictionary storing constructor arguments. Needed to serialize the class to a dictionary. Set by the self.save_parameters() method.
- property name: str
Name of the current component. Defaults to self.__class__.__name__.
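The interface described above (an abstract execute method, a name that defaults to the class name, and a dictionary of constructor arguments) can be sketched without itwinai. `StepBase` and `Normalizer` are hypothetical classes written for illustration; they do not reproduce BaseComponent or its serialization logic.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class StepBase(ABC):
    """Hypothetical stand-in mimicking the interface described above."""

    def __init__(self, name: Optional[str] = None) -> None:
        self._name = name
        # Constructor arguments are kept so the step can be dumped to a dict.
        self.parameters: Dict[str, Any] = {"name": name}

    @property
    def name(self) -> str:
        # Fall back to the class name when no explicit name was given.
        return self._name if self._name is not None else type(self).__name__

    @abstractmethod
    def execute(self, *args: Any) -> Any:
        """Consume input artifacts and return new ones."""


class Normalizer(StepBase):
    """Toy step: rescales a list of numbers to the [0, 1] range."""

    def execute(self, values: list) -> list:
        low, high = min(values), max(values)
        span = (high - low) or 1  # avoid division by zero on constant input
        return [(v - low) / span for v in values]


default_named = Normalizer()
custom_named = Normalizer(name="scaler")
normalized = default_named.execute([2, 4, 6])
```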
- class itwinai.components.DataGetter(name: str | None = None)
Bases: BaseComponent
Retrieves a dataset.
- class itwinai.components.DataProcessor(name: str | None = None)
Bases: BaseComponent
Performs dataset pre-processing.
- class itwinai.components.DataSplitter(train_proportion: int | float, validation_proportion: int | float, test_proportion: int | float, name: str | None = None)
Bases: BaseComponent
Splits a dataset into train, validation, and test splits.
- property train_proportion: int | float
Training set proportion.
- property validation_proportion: int | float
Validation set proportion.
- property test_proportion: int | float
Test set proportion.
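As a rough illustration of a three-way split driven by these proportions, the sketch below assumes that the proportions are fractions summing to 1 and that the split is contiguous. The meaning of integer proportions is not stated here and is not covered. `split_by_proportion` is a hypothetical helper, not part of itwinai.

```python
from typing import List, Sequence, Tuple


def split_by_proportion(
    items: Sequence,
    train_proportion: float,
    validation_proportion: float,
    test_proportion: float,
) -> Tuple[List, List, List]:
    """Hypothetical helper: contiguous split according to three fractions."""
    total = train_proportion + validation_proportion + test_proportion
    if abs(total - 1.0) > 1e-9:
        raise ValueError("Proportions must sum to 1.")
    n_train = round(len(items) * train_proportion)
    n_valid = round(len(items) * validation_proportion)
    train = list(items[:n_train])
    valid = list(items[n_train:n_train + n_valid])
    # The test split takes the remainder, so no item is dropped by rounding.
    test = list(items[n_train + n_valid:])
    return train, valid, test


train_set, valid_set, test_set = split_by_proportion(range(10), 0.6, 0.2, 0.2)
```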
- class itwinai.components.Trainer(name: str | None = None)
Bases: BaseComponent
Trains a machine learning model.
- class itwinai.components.Predictor(model: MLModel | ModelLoader, name: str | None = None)
Bases: BaseComponent
Applies a pre-trained machine learning model to unseen data.
- class itwinai.components.Saver(name: str | None = None)
Bases: BaseComponent
Saves an artifact to disk.
- abstract execute(artifact: MLArtifact) → MLArtifact
Saves an ML artifact to disk.
- Parameters:
artifact (MLArtifact) – artifact to save.
- Returns:
the same input artifact, after saving it.
- Return type:
MLArtifact
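The contract above (save the artifact, then return it unchanged) can be sketched with a pickle-based saver. `PickleSaver` and its `path` argument are hypothetical and are not part of itwinai; a concrete Saver subclass may store artifacts in a different way.

```python
import os
import pickle
import tempfile
from typing import Any


class PickleSaver:
    """Hypothetical saver: pickles an artifact and returns it unchanged."""

    def __init__(self, path: str) -> None:
        self.path = path

    def execute(self, artifact: Any) -> Any:
        with open(self.path, "wb") as handle:
            pickle.dump(artifact, handle)
        # Returning the input lets the next step keep working on it.
        return artifact


target = os.path.join(tempfile.mkdtemp(), "artifact.pkl")
saved = PickleSaver(target).execute({"weights": [0.1, 0.2]})
with open(target, "rb") as handle:
    restored = pickle.load(handle)
```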
- class itwinai.components.Adapter(policy: List[Any], name: str | None = None)
Bases: BaseComponent
Connects components in a sequential pipeline, allowing finer control over how intermediate results are propagated among them.
- Parameters:
policy (List[Any]) – list with the same length as the output of this component, describing how to map the input args to the output.
name (Optional[str], optional) – name of the component. Defaults to None.
The adapter allows defining a policy by which inputs are rearranged before being propagated to the next component. Some examples, in the form [policy]: (input) -> (output):
["INPUT_ARG#2", "INPUT_ARG#1", "INPUT_ARG#0"]: (11, 22, 33) -> (33, 22, 11)
["INPUT_ARG#0", "INPUT_ARG#2", None]: (11, 22, 33) -> (11, 33, None)
[]: (11, 22, 33) -> ()
[42, "INPUT_ARG#2", "hello"]: (11, 22, 33, 44, 55) -> (42, 33, "hello")
[None, 33, 3.14]: () -> (None, 33, 3.14)
[None, 33, 3.14]: ("double", 44, None, True) -> (None, 33, 3.14)
- INPUT_PREFIX: str = 'INPUT_ARG#'
- policy: List[Any]
Adapter policy.
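The policy examples listed for the Adapter can be reproduced with a small standalone function. This is a sketch of the mapping as inferred from those examples, not the Adapter's actual code; `apply_policy` is a hypothetical name, and the handling of malformed or out-of-range indices is left out.

```python
from typing import Any, List, Tuple

INPUT_PREFIX = "INPUT_ARG#"


def apply_policy(policy: List[Any], *args: Any) -> Tuple[Any, ...]:
    """Hypothetical re-implementation of the mapping shown in the examples."""
    output = []
    for item in policy:
        if isinstance(item, str) and item.startswith(INPUT_PREFIX):
            # "INPUT_ARG#<i>" selects the i-th positional input.
            index = int(item[len(INPUT_PREFIX):])
            output.append(args[index])
        else:
            # Any other entry is copied to the output as a constant.
            output.append(item)
    return tuple(output)


reversed_args = apply_policy(["INPUT_ARG#2", "INPUT_ARG#1", "INPUT_ARG#0"], 11, 22, 33)
with_constants = apply_policy([42, "INPUT_ARG#2", "hello"], 11, 22, 33, 44, 55)
```

Because the output is built only from the policy, its length always equals the length of the policy, regardless of how many inputs are received.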