grain.experimental.BestFitPackIterDataset


class grain.experimental.BestFitPackIterDataset(parent, *, length_struct, num_packing_bins, seed=0, shuffle_bins=True, shuffle_bins_group_by_feature=None, meta_features=(), pack_alignment_struct=None, padding_struct=None, max_sequences_per_bin=None)

Implements best-fit packing of sequences.

The best-fit algorithm attempts to pack elements more efficiently than first-fit. Where first-fit places each new element into the first bin with enough room, best-fit places it into the bin that will have the least space left over after the element is added (the "tightest" fit). This can reduce overall padding compared to first-fit, especially when element sizes vary significantly.
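The difference between the two strategies can be illustrated with a minimal, library-independent sketch. The functions `first_fit`, `best_fit`, and `padding` below are hypothetical helpers written for this illustration only; they operate on plain integer lengths with a single capacity and are not Grain's implementation, which works on structured features and a fixed number of bins.

```python
from typing import List, Sequence


def first_fit(lengths: Sequence[int], capacity: int) -> List[List[int]]:
    """Places each length into the first bin that still has room.

    Assumes every length is <= capacity (illustration only).
    """
    bins: List[List[int]] = []
    for n in lengths:
        for b in bins:
            if sum(b) + n <= capacity:
                b.append(n)
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([n])
    return bins


def best_fit(lengths: Sequence[int], capacity: int) -> List[List[int]]:
    """Places each length into the bin that leaves the least free space.

    Assumes every length is <= capacity (illustration only).
    """
    bins: List[List[int]] = []
    for n in lengths:
        best_bin = None
        best_left = None
        for b in bins:
            left = capacity - sum(b) - n  # Space remaining if n goes here.
            if left >= 0 and (best_left is None or left < best_left):
                best_bin, best_left = b, left
        if best_bin is None:
            # No existing bin had room, so open a new one.
            bins.append([n])
        else:
            best_bin.append(n)
    return bins


def padding(bins: List[List[int]], capacity: int) -> int:
    """Total unused space across all bins."""
    return sum(capacity - sum(b) for b in bins)


lengths = [5, 7, 3, 5]
ff = first_fit(lengths, capacity=10)
bf = best_fit(lengths, capacity=10)
print(ff, padding(ff, 10))  # [[5, 3], [7], [5]] 10
print(bf, padding(bf, 10))  # [[5, 5], [7, 3]] 0
```

In this input, first-fit puts the 3 next to the first 5 because that bin comes first, which strands the final 5 in a bin of its own. Best-fit puts the 3 next to the 7, where it fills the bin exactly, and so leaves room for the two 5s to share a bin.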


__init__(parent, *, length_struct, num_packing_bins, seed=0, shuffle_bins=True, shuffle_bins_group_by_feature=None, meta_features=(), pack_alignment_struct=None, padding_struct=None, max_sequences_per_bin=None)

Creates a dataset that packs sequences using the best-fit strategy.

Parameters:

parent (IterDataset) – Parent dataset with variable length sequences.
length_struct (Any) – Target sequence length for each feature.
num_packing_bins (int) – Number of bins to pack sequences into.
seed (int) – Random seed for shuffling bins.
shuffle_bins (bool) – Whether to shuffle bins after packing.
shuffle_bins_group_by_feature (str | None) – Feature to group by for shuffling.
meta_features (Sequence[str]) – Meta features that do not need packing logic.
pack_alignment_struct (Any) – Optional per-feature alignment values.
padding_struct (Any) – Optional per-feature padding values.
max_sequences_per_bin (int | None) – Optional maximum number of input sequences that can be packed into a bin.

Methods

`__init__`(parent, *, length_struct, ...[, ...])	Creates a dataset that packs sequences using the best-fit strategy.
`apply`(transformations)	Returns a dataset with the given transformation(s) applied.
`batch`(batch_size, *[, drop_remainder, batch_fn])	Returns a dataset of elements batched along a new first dimension.
`filter`(transform)	Returns a dataset containing only the elements that match the filter.
`map`(transform)	Returns a dataset containing the elements transformed by `transform`.
`map_with_index`(transform)	Returns a dataset of the elements transformed by the `transform`.
`mp_prefetch`([options, worker_init_fn, ...])	Returns a dataset prefetching elements in multiple processes.
`pipe`(func, /, *args, **kwargs)	Syntactic sugar for applying a callable to this dataset.
`prefetch`(multiprocessing_options)	Deprecated, use `mp_prefetch` instead.
`random_map`(transform, *[, seed])	Returns a dataset containing the elements transformed by `transform`.
`seed`(seed)	Returns a dataset that uses the seed for default seed generation.

Attributes

parents
