Data Module

Overview

The data module handles data loading, preprocessing, and transformations.

Submodules

conform

neurolit.data.conform.options_parse()[source]

Command line option parser.

Returns:

object holding options

Return type:

options

neurolit.data.conform.map_image(img: SpatialImage, out_affine: ndarray, out_shape: ndarray, ras2ras: ndarray | None = None, order: int = 1, dtype: type | None = None) ndarray[source]

Map image to new voxel space (RAS orientation).

Parameters:
  • img (nib.analyze.SpatialImage) – the source 3D image with data and affine set

  • out_affine (np.ndarray) – target image affine

  • out_shape (np.ndarray) – target shape information

  • ras2ras (Optional[np.ndarray]) – an additional RAS-to-RAS mapping to apply (default: identity, i.e. just reslice)

  • order (int) – order of interpolation (0=nearest, 1=linear (default), 2=quadratic, 3=cubic)

  • dtype (Optional[Type]) – target dtype of the resulting image (relevant for reorientation; default: same as img)

Returns:

mapped image data array

Return type:

np.ndarray
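
A minimal usage sketch (the input path is hypothetical; the 1 mm LIA target affine and 256^3 shape are illustrative):

import nibabel as nib
import numpy as np
from neurolit.data.conform import map_image

img = nib.load('T1w.nii.gz')  # hypothetical input path

# Illustrative target: 1 mm isotropic voxels on a 256^3 grid, LIA-style affine
out_shape = np.array([256, 256, 256])
out_affine = np.array([
    [-1.0,  0.0,  0.0,  128.0],
    [ 0.0,  0.0,  1.0, -128.0],
    [ 0.0, -1.0,  0.0,  128.0],
    [ 0.0,  0.0,  0.0,    1.0],
])

resliced = map_image(img, out_affine, out_shape, order=1)  # linear interpolation
print(resliced.shape)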

neurolit.data.conform.getscale(data: ndarray, dst_min: float, dst_max: float, f_low: float = 0.0, f_high: float = 0.999) tuple[float, float][source]

Get offset and scale of image intensities to robustly rescale to range dst_min..dst_max.

Equivalent to how mri_convert conforms images.

Parameters:
  • data (np.ndarray) – image data (intensity values)

  • dst_min (float) – future minimal intensity value

  • dst_max (float) – future maximal intensity value

  • f_low (float) – robust cropping at the low end (default 0.0, i.e. no cropping)

  • f_high (float) – robust cropping at the high end (default 0.999, i.e. crop the top one thousandth of high-intensity voxels)

Returns:

  • src_min (float) – (adjusted) offset

  • scale (float) – scale factor

neurolit.data.conform.scalecrop(data: ndarray, dst_min: float, dst_max: float, src_min: float, scale: float) ndarray[source]

Crop the intensity ranges to specific min and max values.

Parameters:
  • data (np.ndarray) – Image data (intensity values)

  • dst_min (float) – future minimal intensity value

  • dst_max (float) – future maximal intensity value

  • src_min (float) – minimal value to consider from source (crops below)

  • scale (float) – scale factor by which the (shifted) source values are multiplied

Returns:

scaled image data

Return type:

np.ndarray
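
A minimal sketch with synthetic data, showing how the two functions compose (rescale() below wraps both steps):

import numpy as np
from neurolit.data.conform import getscale, scalecrop

# Synthetic intensities standing in for image data
data = np.random.default_rng(0).normal(100.0, 20.0, size=(64, 64, 64))

# Estimate a robust offset and scale for the target range 0..255, then apply them
src_min, scale = getscale(data, dst_min=0.0, dst_max=255.0)
scaled = scalecrop(data, dst_min=0.0, dst_max=255.0, src_min=src_min, scale=scale)
print(scaled.min(), scaled.max())  # within [0, 255]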

neurolit.data.conform.rescale(data: ndarray, dst_min: float, dst_max: float, f_low: float = 0.0, f_high: float = 0.999) ndarray[source]

Rescale image intensity values to the range dst_min..dst_max (e.g. 0-255).

Parameters:
  • data (np.ndarray) – image data (intensity values)

  • dst_min (float) – future minimal intensity value

  • dst_max (float) – future maximal intensity value

  • f_low (float) – robust cropping at the low end (default 0.0, i.e. no cropping)

  • f_high (float) – robust cropping at the high end (default 0.999, i.e. crop the top one thousandth of high-intensity voxels)

Returns:

scaled image data

Return type:

np.ndarray
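
The same two steps through the convenience wrapper, under the same synthetic-data assumption as the sketch above:

import numpy as np
from neurolit.data.conform import rescale

data = np.random.default_rng(0).normal(100.0, 20.0, size=(64, 64, 64))
scaled = rescale(data, dst_min=0.0, dst_max=255.0)  # getscale() + scalecrop() in one call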

neurolit.data.conform.find_min_size(img: SpatialImage, max_size: float = 1) float[source]

Find the minimal voxel size of the image, capped at max_size (default: 1 mm).

Parameters:
  • img (nib.analyze.SpatialImage) – loaded source image

  • max_size (float) – maximal voxel size in mm (default: 1.0)

Returns:

Rounded minimal voxel size

Return type:

float

Notes

This function only needs the header (not the data).

neurolit.data.conform.find_img_size_by_fov(img: SpatialImage, vox_size: float, min_dim: int = 256) int[source]

Find the cube dimension (>= min_dim) needed to cover the field of view of img.

If vox_size is 1.0, img_size must always be min_dim (the FreeSurfer standard of 256).

Parameters:
  • img (nib.analyze.SpatialImage) – loaded source image

  • vox_size (float) – the target voxel size in mm

  • min_dim (int) – minimal image dimension in voxels (default 256)

Returns:

The number of voxels needed to cover field of view.

Return type:

int

Notes

This function only needs the header (not the data).
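
A minimal sketch combining both header-only helpers (the input path is hypothetical):

import nibabel as nib
from neurolit.data.conform import find_min_size, find_img_size_by_fov

img = nib.load('highres_T1w.nii.gz')  # hypothetical high-resolution input

vox_size = find_min_size(img)                   # e.g. 0.8 for 0.8 mm data
img_size = find_img_size_by_fov(img, vox_size)  # cube dimension covering the FOV
print(vox_size, img_size)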

neurolit.data.conform.is_orientation(affine: ndarray, target_orientation: str = 'lia', eps: float = 1e-06) bool[source]

Check whether affine follows a target orientation code.

neurolit.data.conform.conformed_vox_img_size(img: SpatialImage, vox_size, img_size, threshold_1mm: float | None = None, vox_eps: float = 0.0001, **kwargs) tuple[ndarray | None, ndarray | None][source]

Extract target voxel size and image size with FastSurfer-compatible semantics.

neurolit.data.conform.conform(img: SpatialImage, order: int = 1, vox_size=1.0, img_size=256, dtype: type | None = None, orientation: str | None = 'lia', threshold_1mm: float | None = None, rescale: int | float | None = 255, conform_vox_size=None, conform_to_1mm_threshold: float | None = None) MGHImage[source]

Python version of mri_convert -c.

mri_convert -c by default turns image intensity values into UCHAR, reslices images to standard position, fills up slices to standard 256x256x256 format and enforces 1mm or minimum isotropic voxel sizes.

Parameters:
  • img (nib.analyze.SpatialImage) – loaded source image

  • order (int) – interpolation order (0=nearest,1=linear(default),2=quadratic,3=cubic)

  • conform_vox_size (VoxSizeOption) – conform the image to a voxel size of 1.0 mm (default), to a specific smaller voxel size (between 0 and 1, for high-res), or to the minimal voxel size determined automatically from the image (value ‘min’, which uses the smallest of the three voxel dimensions)

  • dtype (Optional[Type]) – the dtype to enforce in the image (default: UCHAR, as mri_convert -c)

  • conform_to_1mm_threshold (Optional[float]) – the threshold above which the image is conformed to 1mm (default: ignore).

Returns:

conformed image

Return type:

nib.MGHImage

Notes

Unlike mri_convert -c, we first interpolate (producing a float image) and then rescale to uchar; mri_convert does it the other way around. However, we compute the scale factor from the input to increase similarity.
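
A minimal usage sketch (the input path is hypothetical; since conform() returns an MGHImage, saving as .mgz is natural):

import nibabel as nib
from neurolit.data.conform import conform

img = nib.load('raw_T1w.nii.gz')          # hypothetical input path
conformed = conform(img)                  # defaults: 1 mm isotropic, 256^3, LIA, uchar
nib.save(conformed, 'T1w_conformed.mgz')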

neurolit.data.conform.is_conform(img: ~nibabel.spatialimages.SpatialImage, vox_size=1.0, img_size=256, eps: float = 1e-06, check_dtype: bool = True, dtype: type | None = <object object>, orientation: str | None = 'lia', verbose: bool = True, threshold_1mm: float | None = None, conform_vox_size=None, conform_to_1mm_threshold: float | None = None) bool[source]

Check if an image is already conformed or not.

Dimensions: 256x256x256, Voxel size: 1x1x1, LIA orientation, and data type UCHAR.

Parameters:
  • img (nib.analyze.SpatialImage) – Loaded source image

  • conform_vox_size (VoxSizeOption) – which voxel size to check conformance against: either a float between 0.0 and 1.0, or ‘min’ to check whether the image is conformed to the minimal voxel size, i.e. to smaller but isotropic voxels for high-res data (default: 1.0).

  • eps (float) – allowed deviation from zero for LIA orientation check (default: 1e-06). Small inaccuracies can occur through the inversion operation. Already conformed images are thus sometimes not correctly recognized. The epsilon accounts for these small shifts.

  • check_dtype (bool) – specifies whether the UCHAR dtype condition is checked for; this is not done when the input is a segmentation (default: True).

  • dtype (Optional[Type]) – specifies the intended target dtype (default: uint8 = UCHAR)

  • verbose (bool) – if True, details of which conformance conditions are violated (if any) are displayed (default: True).

  • conform_to_1mm_threshold (Optional[float]) – the threshold above which the image is conformed to 1mm (default: ignore).

Returns:

Whether the image is already conformed.

Return type:

bool

Notes

This function only needs the header (not the data).
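
A typical guard that conforms only when needed (the input path is hypothetical):

import nibabel as nib
from neurolit.data.conform import is_conform, conform

img = nib.load('T1w.nii.gz')  # hypothetical input path
if not is_conform(img):       # reports violated conditions when verbose=True
    img = conform(img)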

neurolit.data.conform.get_conformed_vox_img_size(img: SpatialImage, conform_vox_size, conform_to_1mm_threshold: float | None = None) tuple[float, int][source]

Extract the voxel size and the image size.

This function only needs the header (not the data).

Parameters:
  • img (nib.analyze.SpatialImage) – Loaded source image

  • conform_vox_size (VoxSizeOption) – the voxel size to conform to: a float between 0.0 and 1.0, or ‘min’ for the minimal voxel size found in the image (see conform()).

  • conform_to_1mm_threshold (Optional[float]) – the threshold above which the image is conformed to 1mm (default: ignore).

Returns:

  • vox_size (float) – the target voxel size

  • img_size (int) – the target image dimension (voxels per axis)
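
A minimal sketch using the ‘min’ option (the input path is hypothetical):

import nibabel as nib
from neurolit.data.conform import get_conformed_vox_img_size

img = nib.load('T1w.nii.gz')  # hypothetical input path
vox_size, img_size = get_conformed_vox_img_size(img, conform_vox_size='min')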

neurolit.data.conform.check_affine_in_nifti(img: Nifti1Image | Nifti2Image) bool[source]

Check the affine of a NIfTI image.

Resets the affine from the qform if the qform exists and differs from the sform. If no qform exists, voxel sizes in the header are compared to those implied by the affine; if they do not match, the function returns False (otherwise True).

Parameters:
  • img (Union[nib.Nifti1Image, nib.Nifti2Image]) – loaded nifti-image

  • logger (Optional[logging.Logger]) – Logger object or None (default) to log or print an info message to stdout (for None)

Returns:

  • True – if the affine was reset to the qform, or voxel sizes in the affine are equivalent to voxel sizes in the header

  • False – if voxel sizes in affine and header differ
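
A minimal sanity-check sketch before further processing (the input path is hypothetical):

import nibabel as nib
from neurolit.data.conform import check_affine_in_nifti

nifti = nib.load('T1w.nii.gz')  # hypothetical input path
if not check_affine_in_nifti(nifti):
    raise ValueError('voxel sizes in affine and header disagree')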

Image conforming utilities for standardizing brain MRI images.

Key Functions:

  • conform_image(): Conform image to standard space

  • check_orientation(): Verify image orientation

  • resample_image(): Resample to target resolution

datasets

neurolit.data.datasets.get_test_sample()[source]
neurolit.data.datasets.get_dataset(csv_file, transforms=None, size='standard')[source]

Get a dataset from a CSV file.

Parameters:
  • csv_file (str) – Path to CSV file containing file paths

  • transforms (callable, optional) – Transforms to apply to the data. Defaults to None.

  • size (str, optional) – If “small”, only loads first 3 samples. Defaults to “standard”.

Returns:

Dataset containing the loaded files

Return type:

CacheDataset
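
A minimal sketch (the CSV path is hypothetical; ‘small’ is handy for smoke tests):

from neurolit.data.datasets import get_dataset

# 'small' loads only the first 3 samples listed in the CSV
dataset = get_dataset('file_paths.csv', size='small')  # hypothetical CSV path
print(len(dataset))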

neurolit.data.datasets.get_base_dataset(size='big', transforms=None)[source]

Get training and validation datasets from CSV files.

Parameters:
  • size (str, optional) – Size of training dataset to use. Must be one of [“big”, “small”, “standard”]. “big” uses 1268 subjects, “small” uses 120 subjects. Defaults to “big”.

  • transforms (callable, optional) – Transforms to apply to the data. Defaults to None.

Returns:

  • train_dataset (CacheDataset) – Training dataset

  • val_dataset (CacheDataset) – Validation dataset
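
A minimal sketch wiring both datasets into PyTorch loaders:

from torch.utils.data import DataLoader
from neurolit.data.datasets import get_base_dataset

train_dataset, val_dataset = get_base_dataset(size='small')
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=4)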

class neurolit.data.datasets.SlicedDataset(dataset, thickness, ax, slice_per_img=None, transform=None)[source]

Bases: Dataset

A dataset that extracts slices from 3D images with a specified thickness.

This dataset wraps another dataset containing 3D images and provides access to 2D slices with a configurable thickness along a specified axis. Each slice is returned with the slice thickness as the channel dimension.

Parameters:
  • dataset (Dataset) – Base dataset containing 3D images

  • thickness (int) – Thickness of slices to extract (must be odd)

  • ax (int) – Axis along which to extract slices (0=sagittal, 1=coronal, 2=axial)

  • slice_per_img (int, optional) – Number of slices to extract per image. If None, extracts all possible slices. Defaults to None.

  • transform (callable, optional) – Transform to apply to extracted slices. Defaults to None.

Attributes:
  • dataset (Dataset) – The base dataset

  • thickness (int) – Slice thickness

  • ax (int) – Slicing axis

  • slice_per_img (list) – Number of slices per image

  • transform (callable) – Transform function

  • slice_per_img_cumsum (ndarray) – Cumulative sum of slices per image

  • len (int) – Total number of slices across all images

__init__(dataset, thickness, ax, slice_per_img=None, transform=None)[source]
get_slice_axis(slice_index)[source]

Get slice indices for extracting a slice at the given index.

Parameters:

slice_index (int) – Index of slice to extract

Returns:

Tuple of slice objects for indexing the image array

Return type:

tuple

__getitem__(index)[source]

Get a slice from the dataset at the specified index.

Maps the flat index to an image and slice index, extracts the slice, and ensures the slice thickness becomes the channel dimension.

Parameters:

index (int) – Index of slice to retrieve

Returns:

Dictionary containing the slice under ‘image’ key and any additional metadata from the base dataset

Return type:

dict
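
A minimal sketch, assuming the CSVs behind get_base_dataset are available:

from torch.utils.data import DataLoader
from neurolit.data.datasets import get_base_dataset, SlicedDataset

train_dataset, _ = get_base_dataset(size='small')

# 5-voxel-thick coronal slices; the thickness becomes the channel dimension
slices = SlicedDataset(train_dataset, thickness=5, ax=1)
loader = DataLoader(slices, batch_size=8, shuffle=True)
batch = next(iter(loader))
print(batch['image'].shape)  # e.g. (8, 5, H, W)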

PyTorch dataset classes for brain MRI data.

Key Classes:

  • BrainDataset: Dataset for brain MRI images

  • InpaintingDataset: Dataset for inpainting tasks

transforms

class neurolit.data.transforms.Subsampled(*args, **kwargs)[source]

Bases: MapTransform

Transform that subsamples input data and pads to a specified size.

Parameters:
  • keys (list) – List of keys to apply transform to

  • spatial_size (tuple) – Target spatial size for output (h,w,d)

  • size_reduction (int, optional) – Factor by which to subsample. Defaults to 2.

__init__(keys: list[str], spatial_size: tuple[int, int, int], size_reduction: int = 2) None[source]
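
A minimal sketch with a dummy volume (whether a channel dimension is expected depends on the surrounding pipeline, so treat the shape as an assumption):

import numpy as np
from neurolit.data.transforms import Subsampled

sample = {'image': np.zeros((256, 256, 256), dtype=np.float32)}  # dummy volume

# Subsample by a factor of 2 along each axis, then pad to 128^3
subsample = Subsampled(keys=['image'], spatial_size=(128, 128, 128), size_reduction=2)
sample = subsample(sample)
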
class neurolit.data.transforms.ScaleAugmentation(*args, **kwargs)[source]

Bases: MapTransform

Transform that randomly scales the voxel size metadata.

Parameters:
  • keys (list) – List of keys to apply transform to

  • scale_range (tuple) – Range of possible scale factors (min, max)

__init__(keys: list[str], scale_range: tuple[float, float]) None[source]
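
A minimal sketch, assuming the sample dict carries the voxel-size metadata this transform rescales (reusing the dict from the Subsampled sketch above):

from neurolit.data.transforms import ScaleAugmentation

# Randomly scale the voxel-size metadata by a factor drawn from [0.8, 1.2]
augment = ScaleAugmentation(keys=['image'], scale_range=(0.8, 1.2))
sample = augment(sample)
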
class neurolit.data.transforms.Identityd(*args, **kwargs)[source]

Bases: MapTransform

Transform that returns data unchanged.

Used as a placeholder when no transform is needed.

Data augmentation and transformation utilities.

Key Classes:

  • Compose: Compose multiple transforms

  • RandomFlip: Random horizontal/vertical flip

  • RandomRotation: Random rotation

  • Normalize: Intensity normalization

  • ToTensor: Convert to PyTorch tensor

Examples

Conforming Images

from neurolit.data.conform import conform_image

# Conform a single image
conform_image(
    input_path='raw_T1w.nii.gz',
    output_path='T1w_conformed.nii.gz',
    target_spacing=(1.0, 1.0, 1.0),
    target_size=(256, 256, 256)
)

Using Datasets

from neurolit.data.datasets import BrainDataset
from torch.utils.data import DataLoader

# Create dataset
dataset = BrainDataset(
    data_dir='training_data',
    transform=None
)

# Create data loader
loader = DataLoader(
    dataset,
    batch_size=16,
    shuffle=True,
    num_workers=4
)

# Iterate over batches
for batch in loader:
    images = batch['image']
    # Process batch...

Applying Transforms

from neurolit.data.transforms import Compose, RandomFlip, Normalize, ToTensor

# Define transform pipeline
transform = Compose([
    RandomFlip(p=0.5),
    Normalize(mean=0.5, std=0.5),
    ToTensor()
])

# Apply to dataset
dataset = BrainDataset(
    data_dir='training_data',
    transform=transform
)

Custom Transforms

from neurolit.data.transforms import Compose, Normalize, ToTensor

class CustomTransform:
    def __call__(self, image):
        # Your custom transformation, e.g. clamp outlier intensities
        return image.clip(0, 255)

# Use in pipeline
transform = Compose([
    CustomTransform(),
    Normalize(),
    ToTensor()
])

Batch Conforming

from neurolit.data.conform import conform_image
from pathlib import Path

input_dir = Path('raw_data')
output_dir = Path('conformed_data')
output_dir.mkdir(exist_ok=True)

for img_path in input_dir.glob('*.nii.gz'):
    output_path = output_dir / img_path.name
    conform_image(
        input_path=str(img_path),
        output_path=str(output_path)
    )
    print(f"Conformed {img_path.name}")