grain.experimental.BestFitPackIterDataset
- class grain.experimental.BestFitPackIterDataset(parent, *, length_struct, num_packing_bins, seed=0, shuffle_bins=True, shuffle_bins_group_by_feature=None, meta_features=(), pack_alignment_struct=None, padding_struct=None, max_sequences_per_bin=None)
Implements best-fit packing of sequences.
The best-fit algorithm attempts to pack elements more efficiently than first-fit by placing each new element into the bin that will leave the smallest remaining space (i.e., the “tightest” fit). This can lead to less overall padding compared to the simpler first-fit approach, especially when element sizes vary significantly.
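The difference between the two strategies can be illustrated with a small standalone sketch. This is not Grain's implementation: the `pack` and `padding` helpers are hypothetical, operate on plain integer lengths for a single feature, and open a new bin whenever nothing fits instead of working with a fixed `num_packing_bins`.

```python
def pack(lengths, capacity, strategy):
    """Packs sequence lengths into bins of `capacity` (illustrative only)."""
    bins = []  # lengths placed in each bin
    free = []  # remaining space in each bin
    for n in lengths:
        if n > capacity:
            raise ValueError(f"sequence of length {n} exceeds capacity {capacity}")
        # Indices of bins that still have room for this sequence.
        candidates = [i for i, f in enumerate(free) if f >= n]
        if not candidates:
            # Nothing fits: open a new bin.
            bins.append([n])
            free.append(capacity - n)
            continue
        if strategy == "first_fit":
            # First-fit: take the earliest bin with enough room.
            i = candidates[0]
        else:
            # Best-fit: take the bin that leaves the least space afterwards.
            i = min(candidates, key=lambda j: free[j] - n)
        bins[i].append(n)
        free[i] -= n
    return bins


def padding(bins, capacity):
    """Total number of unused (padded) positions across all bins."""
    return sum(capacity - sum(b) for b in bins)


lengths = [4, 7, 3, 6]
first = pack(lengths, capacity=10, strategy="first_fit")
best = pack(lengths, capacity=10, strategy="best_fit")
print(first, padding(first, 10))  # [[4, 3], [7], [6]] 10
print(best, padding(best, 10))    # [[4, 6], [7, 3]] 0
```

In this example first-fit places the length-3 sequence in the first bin that has room, which later forces a third bin for the length-6 sequence. Best-fit places it in the bin it fills exactly, so all four sequences fit in two bins with no padding.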
- Parameters:
parent (IterDataset)
length_struct (Any)
num_packing_bins (int)
seed (int)
shuffle_bins (bool)
shuffle_bins_group_by_feature (str | None)
meta_features (Sequence[str])
pack_alignment_struct (Any)
padding_struct (Any)
max_sequences_per_bin (int | None)
- __init__(parent, *, length_struct, num_packing_bins, seed=0, shuffle_bins=True, shuffle_bins_group_by_feature=None, meta_features=(), pack_alignment_struct=None, padding_struct=None, max_sequences_per_bin=None)
Creates a dataset that packs sequences using the best-fit strategy.
- Parameters:
parent (IterDataset) – Parent dataset with variable length sequences.
length_struct (Any) – Target sequence length for each feature.
num_packing_bins (int) – Number of bins to pack sequences into.
seed (int) – Random seed for shuffling bins.
shuffle_bins (bool) – Whether to shuffle bins after packing.
shuffle_bins_group_by_feature (str | None) – Feature to group by for shuffling.
meta_features (Sequence[str]) – Meta features that do not need packing logic.
pack_alignment_struct (Any) – Optional per-feature alignment values.
padding_struct (Any) – Optional per-feature padding values.
max_sequences_per_bin (int | None) – Optional maximum number of input sequences that can be packed into a bin.
Methods
__init__(parent, *, length_struct, ...[, ...]) – Creates a dataset that packs sequences using the best-fit strategy.
apply(transformations) – Returns a dataset with the given transformation(s) applied.
batch(batch_size, *[, drop_remainder, batch_fn]) – Returns a dataset of elements batched along a new first dimension.
filter(transform) – Returns a dataset containing only the elements that match the filter.
map(transform) – Returns a dataset containing the elements transformed by transform.
map_with_index(transform) – Returns a dataset of the elements transformed by the transform.
mp_prefetch([options, worker_init_fn, ...]) – Returns a dataset prefetching elements in multiple processes.
pipe(func, /, *args, **kwargs) – Syntactic sugar for applying a callable to this dataset.
prefetch(multiprocessing_options) – Deprecated, use mp_prefetch instead.
random_map(transform, *[, seed]) – Returns a dataset containing the elements transformed by transform.
seed(seed) – Returns a dataset that uses the seed for default seed generation.
Attributes
parents