Template Class ChunkDataset

Inheritance Relationships

Base Type

  • public torch::data::datasets::StatefulDataset< ChunkDataset< ChunkReader, ChunkSampler, ExampleSampler >, ChunkReader::BatchType, size_t > (Template Class StatefulDataset)

Class Documentation

template<typename ChunkReader, typename ChunkSampler = samplers::RandomSampler, typename ExampleSampler = samplers::RandomSampler>
class torch::data::datasets::ChunkDataset : public torch::data::datasets::StatefulDataset<ChunkDataset<ChunkReader, ChunkSampler, ExampleSampler>, ChunkReader::BatchType, size_t>

A stateful dataset that supports hierarchical sampling and prefetching of entire chunks.

Unlike a regular dataset, a chunk dataset requires two samplers to operate and keeps an internal state. ChunkSampler selects which chunk to load next, while ExampleSampler determines the order of examples returned by each get_batch call. The hierarchical sampling approach used here is inspired by this paper.
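The two-level idea can be illustrated without libtorch. The standalone sketch below is not the ChunkDataset implementation: the names hierarchical_batches, Chunk and chunks_per_window are hypothetical, chunks are plain std::vector<int> values instead of ChunkReader output, and both samplers are modeled as seeded std::shuffle calls. A chunk-level shuffle picks which chunks are loaded together. An example-level shuffle then reorders examples only within that loaded window.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical stand-in for one chunk produced by a ChunkReader.
using Chunk = std::vector<int>;

// Two-level (hierarchical) sampling sketch:
//  1. "chunk sampler": a random permutation of chunk indices decides load order;
//  2. "example sampler": examples are shuffled only inside the currently
//     loaded window of `chunks_per_window` chunks, then sliced into batches.
std::vector<std::vector<int>> hierarchical_batches(const std::vector<Chunk>& chunks,
                                                   std::size_t chunks_per_window,
                                                   std::size_t batch_size, unsigned seed) {
  assert(chunks_per_window > 0 && batch_size > 0);
  std::mt19937 rng(seed);

  // Level 1: shuffle chunk indices.
  std::vector<std::size_t> chunk_order(chunks.size());
  std::iota(chunk_order.begin(), chunk_order.end(), std::size_t{0});
  std::shuffle(chunk_order.begin(), chunk_order.end(), rng);

  std::vector<std::vector<int>> batches;
  for (std::size_t start = 0; start < chunk_order.size(); start += chunks_per_window) {
    // "Load" a window of chunks into an in-memory buffer.
    const std::size_t end = std::min(start + chunks_per_window, chunk_order.size());
    std::vector<int> buffer;
    for (std::size_t i = start; i < end; ++i) {
      const Chunk& chunk = chunks[chunk_order[i]];
      buffer.insert(buffer.end(), chunk.begin(), chunk.end());
    }
    // Level 2: shuffle examples within the loaded window only.
    std::shuffle(buffer.begin(), buffer.end(), rng);
    // Slice into batches; the last batch of a window may be smaller.
    for (std::size_t b = 0; b < buffer.size(); b += batch_size) {
      const std::size_t e = std::min(b + batch_size, buffer.size());
      batches.emplace_back(buffer.begin() + b, buffer.begin() + e);
    }
  }
  return batches;
}
```

In this simplification each window is batched independently, so a batch never mixes windows. Only a bounded number of chunks has to be held in memory at once, which is the motivation for shuffling at two levels instead of globally.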

Public Types

using BatchType = torch::optional<typename ChunkReader::BatchType>
using UnwrappedBatchType = typename ChunkReader::BatchType
using BatchRequestType = size_t
using ChunkSamplerType = ChunkSampler
using ExampleSamplerType = ExampleSampler
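BatchType wraps the reader's batch in an optional, so a consumer can tell "here is a batch" apart from "no more data". The sketch below shows that consumption pattern with std::optional standing in for torch::optional; ToyBatchSource and count_examples are hypothetical names, not part of libtorch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Stand-ins for the aliases above (illustrative only):
//   UnwrappedBatch ~ UnwrappedBatchType, what the reader produces
//   Batch          ~ BatchType, optional<UnwrappedBatch>; nullopt means "no more data"
using UnwrappedBatch = std::vector<int>;
using Batch = std::optional<UnwrappedBatch>;

// Hypothetical source that hands out the values [0, total) in batches.
class ToyBatchSource {
 public:
  ToyBatchSource(int total, std::size_t batch_size) : total_(total), batch_size_(batch_size) {
    assert(batch_size_ > 0);
  }

  Batch get_batch() {
    if (next_ >= total_) return std::nullopt;  // exhausted: signal end of data
    UnwrappedBatch batch;
    while (next_ < total_ && batch.size() < batch_size_) batch.push_back(next_++);
    return batch;
  }

 private:
  int total_;
  std::size_t batch_size_;
  int next_ = 0;
};

// Consumer loop: keep pulling until the returned optional is empty.
std::size_t count_examples(ToyBatchSource& source) {
  std::size_t n = 0;
  while (Batch batch = source.get_batch()) {
    n += batch->size();
  }
  return n;
}
```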

Public Functions

ChunkDataset(ChunkReader chunk_reader, ChunkSampler chunk_sampler, ExampleSampler example_sampler, ChunkDatasetOptions options, std::function<void(UnwrappedBatchType&)> preprocessing_policy = std::function<void(UnwrappedBatchType&)>())
~ChunkDataset() override
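The constructor's preprocessing_policy defaults to a default-constructed std::function, which is empty. The sketch below shows one plausible way such an optional in-place hook can be applied to a freshly read chunk. load_chunk and drop_negatives are hypothetical helpers, and the exact point at which ChunkDataset invokes the policy is not taken from its source.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

using UnwrappedBatch = std::vector<int>;
using PreprocessingPolicy = std::function<void(UnwrappedBatch&)>;

// Hypothetical step where a freshly read chunk is passed to the user policy
// before its examples become available for sampling. A default-constructed
// std::function is empty, so it is checked before being called.
UnwrappedBatch load_chunk(UnwrappedBatch chunk, const PreprocessingPolicy& policy) {
  if (policy) {
    policy(chunk);  // the policy mutates the chunk in place
  }
  return chunk;
}

// Example policy: drop negative values (e.g. invalid records) from the chunk.
void drop_negatives(UnwrappedBatch& chunk) {
  chunk.erase(std::remove_if(chunk.begin(), chunk.end(), [](int x) { return x < 0; }),
              chunk.end());
}
```

Because the policy receives the chunk by reference, it can filter, reorder or transform examples without an extra copy.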
BatchType get_batch(size_t batch_size) override

Default get_batch method of BatchDataset.

This method returns Example batches created from the preloaded chunks. The implementation is dataset-agnostic and does not need to be overridden in different chunk datasets.
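Serving batches from preloaded chunks means examples from consecutive chunks are pooled, so one batch can span a chunk boundary. ToyBatchBuffer below is a hypothetical, single-threaded sketch of that buffering, not libtorch's internal buffer. Where this sketch returns nullopt because too few examples are buffered, a concurrent version would instead wait for loaders to supply more data.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <vector>

// Hypothetical single-threaded buffer between chunk loading and get_batch().
// Examples from consecutive chunks are queued together.
class ToyBatchBuffer {
 public:
  explicit ToyBatchBuffer(std::size_t batch_size) : batch_size_(batch_size) {
    assert(batch_size_ > 0);
  }

  void add_chunk(const std::vector<int>& chunk) {
    examples_.insert(examples_.end(), chunk.begin(), chunk.end());
  }

  // Call once no further chunks will arrive.
  void finish() { finished_ = true; }

  // Returns a full batch when enough examples are queued, a final partial
  // batch after finish(), and nullopt otherwise (empty, or more data needed).
  std::optional<std::vector<int>> get_batch() {
    if (examples_.empty()) return std::nullopt;
    if (examples_.size() < batch_size_ && !finished_) return std::nullopt;
    std::vector<int> batch;
    while (!examples_.empty() && batch.size() < batch_size_) {
      batch.push_back(examples_.front());
      examples_.pop_front();
    }
    return batch;
  }

 private:
  std::size_t batch_size_;
  std::deque<int> examples_;
  bool finished_ = false;
};
```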

BatchType get_batch()

Helper overload of get_batch, since batch_size is not strictly necessary.

void reset() override

Clears any internal state and starts the internal prefetching mechanism for the chunk dataset.
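A rough sketch of what "clear internal state, then start prefetching" can look like with background threads follows. ToyPrefetcher, next_chunk and num_loaders are hypothetical, and the real ChunkDataset's threading and shutdown logic are not reproduced. Here, reset() joins the previous epoch's loaders and clears the ready queue. It then launches loader threads that claim chunk indices under a mutex and push the data into a queue guarded by a condition variable.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical prefetcher: reset() clears state and starts background loaders.
class ToyPrefetcher {
 public:
  ToyPrefetcher(std::vector<std::vector<int>> chunks, std::size_t num_loaders)
      : chunks_(std::move(chunks)), num_loaders_(num_loaders) {
    assert(num_loaders_ > 0);
  }

  ~ToyPrefetcher() { join_all(); }

  void reset() {
    join_all();  // wait for the previous epoch's loaders (they always run to completion here)
    {
      std::lock_guard<std::mutex> lock(mu_);
      ready_.clear();
      next_chunk_ = 0;
      active_loaders_ = num_loaders_;
    }
    for (std::size_t i = 0; i < num_loaders_; ++i) {
      loaders_.emplace_back([this] { load_loop(); });
    }
  }

  // Blocks until a prefetched chunk is ready; nullopt once all loaders are done
  // and the queue is empty (also returned immediately if reset() was never called).
  std::optional<std::vector<int>> next_chunk() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !ready_.empty() || active_loaders_ == 0; });
    if (ready_.empty()) return std::nullopt;
    std::vector<int> chunk = std::move(ready_.front());
    ready_.pop_front();
    return chunk;
  }

 private:
  void load_loop() {
    for (;;) {
      std::size_t idx;
      {
        std::lock_guard<std::mutex> lock(mu_);
        if (next_chunk_ >= chunks_.size()) {
          --active_loaders_;
          cv_.notify_all();  // wake the consumer so it can observe completion
          return;
        }
        idx = next_chunk_++;
      }
      std::vector<int> data = chunks_[idx];  // stands in for an expensive chunk read
      {
        std::lock_guard<std::mutex> lock(mu_);
        ready_.push_back(std::move(data));
      }
      cv_.notify_one();
    }
  }

  void join_all() {
    for (auto& t : loaders_) t.join();
    loaders_.clear();
  }

  std::vector<std::vector<int>> chunks_;
  std::size_t num_loaders_;
  std::vector<std::thread> loaders_;
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::vector<int>> ready_;
  std::size_t next_chunk_ = 0;
  std::size_t active_loaders_ = 0;
};
```

The ready queue is unbounded to keep the sketch short. A production prefetcher would typically cap how much data is buffered and let reset() stop in-flight loaders early instead of waiting for them.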

optional<size_t> size() const override

size is not used for chunk datasets.

ChunkSamplerType &chunk_sampler()
void save(serialize::OutputArchive &archive) const override

Saves the StatefulDataset's state to an OutputArchive.

void load(serialize::InputArchive &archive) override

Deserializes the StatefulDataset's state from the archive.
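The value of save and load for a stateful dataset is that sampling can resume where it stopped. The sketch below shows one way sampler state can be made resumable. It uses a hypothetical ToyChunkSampler whose full state is (num_chunks, seed, position), with std::ostream/std::istream in place of serialize::OutputArchive/InputArchive. It does not claim to reflect what ChunkDataset actually writes to the archive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <numeric>
#include <ostream>
#include <random>
#include <sstream>  // std::stringstream, convenient for round-trip checks
#include <vector>

// Hypothetical deterministic chunk sampler. Its permutation is regenerated from
// the seed, so saving (num_chunks, seed, position) is enough to resume.
class ToyChunkSampler {
 public:
  ToyChunkSampler(std::size_t num_chunks, unsigned seed) : num_chunks_(num_chunks), seed_(seed) {
    rebuild();
  }

  bool has_next() const { return pos_ < order_.size(); }

  std::size_t next() {
    assert(has_next());
    return order_[pos_++];
  }

  void save(std::ostream& out) const { out << num_chunks_ << ' ' << seed_ << ' ' << pos_; }

  void load(std::istream& in) {
    in >> num_chunks_ >> seed_ >> pos_;
    rebuild();  // regenerate the same permutation; the restored position is kept
  }

 private:
  void rebuild() {
    order_.resize(num_chunks_);
    std::iota(order_.begin(), order_.end(), std::size_t{0});
    std::mt19937 rng(seed_);
    std::shuffle(order_.begin(), order_.end(), rng);
  }

  std::size_t num_chunks_;
  unsigned seed_;
  std::size_t pos_ = 0;
  std::vector<std::size_t> order_;
};
```

std::mt19937 is fully specified, but std::shuffle's exact output may differ between standard library implementations. So this resume-by-seed trick is only reliable when saving and loading use the same library build.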
