Data Loading

Utilities

from_feather(file_name, nuisance_filter=None)

Load a (theta, x) pair from a feather file.

Parameters:

file_name (Path) – Path to the .feather file.
nuisance_filter (Tensor | ndarray | None) – fnmatch-style patterns naming the parameters to exclude from theta. If None, all parameters are returned.

Return type:

tuple[ndarray[tuple[Any, ...], dtype[float32]], ndarray[tuple[Any, ...], dtype[float32]]]

Returns:

Tuple of (theta, x) as float32 numpy arrays.

Raises:

FileNotFoundError – If file_name does not exist.
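As a rough illustration of how fnmatch-style exclusion behaves, the sketch below filters a list of parameter names with the standard-library fnmatch module. The helper drop_nuisance and the column names are hypothetical, and how from_feather applies the patterns internally is not shown in this reference.

```python
from fnmatch import fnmatchcase


def drop_nuisance(names, patterns=None):
    """Return the names that match none of the fnmatch patterns.

    Hypothetical helper: it only illustrates the pattern semantics,
    not the actual implementation behind from_feather.
    """
    if patterns is None:
        # No filter given: keep every parameter.
        return list(names)
    return [n for n in names if not any(fnmatchcase(n, p) for p in patterns)]


# Assumed column names, for illustration only.
columns = ["mu", "sigma", "nuis_jes", "nuis_lumi"]
kept = drop_nuisance(columns, ["nuis_*"])
```

With these assumed names, the pattern nuis_* removes both nuisance columns and leaves mu and sigma in their original order.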

to_feather(file_name, theta_values, x_values)

Write a (theta, x) pair to a feather file.

Parameters:

file_name (Path) – Destination path. Must end in .feather.
theta_values (ndarray[tuple[Any, ...], dtype[float32]]) – Parameter array of shape (n_samples, n_params).
x_values (ndarray[tuple[Any, ...], dtype[float32]]) – Observable array of shape (n_samples, n_bins).

Raises:

ValueError – If file_name does not have a .feather suffix.

Return type:

None
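The suffix requirement can be pictured with a small check based on pathlib. This is only a sketch of the documented behaviour (a ValueError for a non-.feather path); the helper name check_feather_path is an assumption and does not come from the library.

```python
from pathlib import Path


def check_feather_path(file_name):
    """Return file_name as a Path, or raise ValueError on a wrong suffix.

    Hypothetical helper mirroring the documented to_feather contract.
    """
    path = Path(file_name)
    if path.suffix != ".feather":
        raise ValueError(f"expected a .feather file, got {path.name!r}")
    return path


ok = check_feather_path("runs/sim_001.feather")

try:
    check_feather_path("runs/sim_001.csv")
    rejected = False
except ValueError:
    rejected = True
```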

Lightning Module

class SBIDataModule(dataset, config)

Lightning data module over a pre-loaded (theta, x) dataset.

The dataset is expected to have been pre-loaded into CPU RAM via to_tensor_dataset() before this module is constructed.

Under DDP, Lightning automatically wraps each DataLoader’s sampler in a DistributedSampler, which partitions the index space across ranks. Because the underlying TensorDataset tensors are kept in CPU shared memory (no .to(device) call on the dataset itself), each rank reads only its own slice — no data is copied between processes.
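To make the partitioning concrete, here is a simplified pure-Python model of how a distributed sampler hands out indices when shuffling is off: the index list is padded to a multiple of the world size, then each rank takes every world_size-th entry starting at its own rank. The function rank_indices is hypothetical and leaves out shuffling and epoch handling.

```python
import math


def rank_indices(n_samples, world_size, rank):
    """Indices assigned to one rank, without shuffling (simplified model)."""
    per_rank = math.ceil(n_samples / world_size)
    total = per_rank * world_size
    indices = list(range(n_samples))
    # Pad by repeating the first indices so every rank gets per_rank items.
    indices += indices[: total - n_samples]
    # Strided slice: rank r takes positions r, r + world_size, ...
    return indices[rank:total:world_size]


# 10 samples over 4 ranks: each rank gets 3 indices, two of them padded.
shards = [rank_indices(10, 4, r) for r in range(4)]
```

Every rank holds the same number of indices, and together the shards cover the whole dataset; the padded entries are the only duplicates.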

Note

The random split uses a fixed seed of 42 so that all DDP ranks produce identical train / validation index sets. If you change this seed, change it consistently across all ranks.
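The reason a shared seed matters can be shown with a standard-library analogue: two processes that shuffle the same index range with the same seed arrive at the same split without communicating. The function split_indices and the val_fraction argument are assumptions made for illustration; the module's actual split routine and ratio are not given in this reference.

```python
import random


def split_indices(n_samples, val_fraction, seed=42):
    """Deterministically split range(n_samples) into (train, val) indices."""
    indices = list(range(n_samples))
    # A private Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(indices)
    n_val = round(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]


# Simulate two ranks computing the split independently.
rank0_train, rank0_val = split_indices(100, 0.2)
rank1_train, rank1_val = split_indices(100, 0.2)
```

If one rank used a different seed, its validation indices would overlap another rank's training indices, which is the failure the note warns about.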

Parameters:

dataset (TensorDataset)
config (TrainingConfig)

setup(stage=None)

Called at the beginning of fit (train + validate), validate, test, or predict. This is a good hook when you need to build models dynamically or adjust something about them. This hook is called on every process when using DDP.

Parameters:

stage (str | None) – One of 'fit', 'validate', 'test', or 'predict'.

Return type:

None

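Example:

class LitModel(...):
    def __init__(self):
        super().__init__()
        self.l1 = None

    def prepare_data(self):
        download_data()
        tokenize()

        # don't assign state here: prepare_data is not called on every process
        self.something = ...

    def setup(self, stage):
        # setup runs on every process, so state assigned here is safe
        data = load_data(...)
        self.l1 = nn.Linear(28, data.num_classes)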

train_dataloader()

Training data loader.

Return type:

DataLoader

val_dataloader()

Validation data loader.

Return type:

DataLoader