grain.experimental.multithread_prefetch

grain.experimental.multithread_prefetch(ds, num_threads, buffer_size, sequential_slice=False)

Uses a pool of threads to prefetch elements ahead of time.

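This is a thread-based alternative to multiprocess_prefetch, intended for free-threaded Python (CPython built without the global interpreter lock, as described in PEP 703), where threads can execute Python code in parallel.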

It works by sharding the input dataset into num_threads shards and interleaving them; each shard is read by a separate thread inside InterleaveIterDataset.
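The following sketch illustrates the shard-and-interleave scheme using only the standard library. It is not Grain's implementation: the function name `threaded_prefetch_sketch` is hypothetical, it operates on a plain list rather than an `IterDataset`, and it assumes strided sharding and round-robin interleaving. Each thread fills a bounded `queue.Queue` of size `buffer_size`, which stands in for the per-thread prefetch buffer.

```python
import queue
import threading
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

_END = object()  # sentinel marking an exhausted shard


def _producer(shard: Iterable[T], buf: "queue.Queue[object]") -> None:
    # Read one shard and push its elements into that shard's bounded buffer;
    # put() blocks once the buffer already holds buffer_size elements.
    for item in shard:
        buf.put(item)
    buf.put(_END)


def threaded_prefetch_sketch(
    source: List[T], num_threads: int, buffer_size: int
) -> Iterator[T]:
    """Hypothetical stand-in for multithread_prefetch over a plain list."""
    if num_threads == 0:
        # Mirrors the documented no-op behavior for num_threads == 0.
        yield from source
        return
    # Assumed strided sharding: shard i holds source[i], source[i + n], ...
    shards = [source[i::num_threads] for i in range(num_threads)]
    buffers = [queue.Queue(maxsize=buffer_size) for _ in range(num_threads)]
    threads = [
        threading.Thread(target=_producer, args=(s, b), daemon=True)
        for s, b in zip(shards, buffers)
    ]
    for t in threads:
        t.start()
    # Assumed round-robin interleave: one element from each live shard in turn.
    live = list(range(num_threads))
    while live:
        still_live = []
        for i in live:
            item = buffers[i].get()
            if item is _END:
                continue  # this shard is exhausted; drop it from the cycle
            yield item
            still_live.append(i)
        live = still_live
    for t in threads:
        t.join()
```

Because shard i holds every num_threads-th element starting at index i, the round-robin reader in this sketch reproduces the source order: `list(threaded_prefetch_sketch(list(range(10)), num_threads=3, buffer_size=2))` yields 0 through 9.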

Parameters:
  • ds (IterDataset[T]) – The parent dataset to prefetch from.

  • num_threads (int) – The number of threads to use for prefetching. If 0, prefetching is disabled and this is a no-op.

  • buffer_size (int) – The size of the prefetch buffer for each thread.

  • sequential_slice (bool) – Whether to use sequential slicing.
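The docstring does not define sequential slicing. One plausible reading, which is an assumption rather than documented behavior, is that it controls whether each thread's shard is a strided subset or a contiguous block of the source. The hypothetical helpers `shard_indices` and `round_robin` below contrast the two strategies and show how each affects the interleaved output order.

```python
from typing import List, Sequence


def shard_indices(
    num_elements: int, num_shards: int, shard_index: int, sequential: bool
) -> List[int]:
    """Hypothetical: indices read by one shard under each slicing strategy."""
    if not sequential:
        # Strided: shard i reads i, i + num_shards, i + 2 * num_shards, ...
        return list(range(shard_index, num_elements, num_shards))
    # Contiguous: split into num_shards nearly equal blocks; the first
    # `extra` shards each take one additional element.
    base, extra = divmod(num_elements, num_shards)
    start = shard_index * base + min(shard_index, extra)
    stop = start + base + (1 if shard_index < extra else 0)
    return list(range(start, stop))


def round_robin(shards: Sequence[Sequence[int]]) -> List[int]:
    """Take one element from each non-exhausted shard in turn."""
    out: List[int] = []
    longest = max((len(s) for s in shards), default=0)
    for pos in range(longest):
        for shard in shards:
            if pos < len(shard):
                out.append(shard[pos])
    return out


strided = [shard_indices(10, 3, i, sequential=False) for i in range(3)]
contiguous = [shard_indices(10, 3, i, sequential=True) for i in range(3)]
print(strided)                  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
print(contiguous)               # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(round_robin(strided))     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(round_robin(contiguous))  # [0, 4, 7, 1, 5, 8, 2, 6, 9, 3]
```

Under this reading, contiguous shards let each thread read a consecutive range of the source, but the interleaved output no longer follows the source order.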

Returns:

An IterDataset that prefetches elements from ds using multiple threads.

Return type:

IterDataset[T]