Template Class ChunkDataset

Inheritance Relationships

Base Type

  • public torch::data::datasets::StatefulDataset< ChunkDataset< ChunkReader, samplers::RandomSampler, samplers::RandomSampler >, ChunkReader::BatchType, size_t > (Template Class StatefulDataset)

Class Documentation

template<typename ChunkReader, typename ChunkSampler = samplers::RandomSampler, typename ExampleSampler = samplers::RandomSampler>
class ChunkDataset : public torch::data::datasets::StatefulDataset<ChunkDataset<ChunkReader, samplers::RandomSampler, samplers::RandomSampler>, ChunkReader::BatchType, size_t>

A stateful dataset that supports hierarchical sampling and prefetching of entire chunks.

Unlike a regular dataset, a chunk dataset requires two samplers to operate and keeps internal state. The ChunkSampler selects which chunk to load next, while the ExampleSampler determines the order of the Examples returned by each get_batch call. The hierarchical sampling approach used here is inspired by this paper: http://martin.zinkevich.org/publications/nips2010.pdf
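The two-level sampling described above can be sketched without libtorch. The snippet below is a toy model only, not the ChunkDataset implementation: the function name hierarchical_order, the chunk_sizes parameter, and the seed are hypothetical, and it assumes one chunk is consumed at a time with no prefetching.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Toy model of hierarchical sampling (an assumption for illustration,
// not the libtorch code). Returns (chunk index, example index) pairs
// in the order in which they would be visited.
std::vector<std::pair<std::size_t, std::size_t>> hierarchical_order(
    const std::vector<std::size_t>& chunk_sizes, unsigned seed) {
  std::mt19937 rng(seed);

  // Level 1: the "chunk sampler" decides which chunk is loaded next.
  std::vector<std::size_t> chunk_order(chunk_sizes.size());
  std::iota(chunk_order.begin(), chunk_order.end(), std::size_t{0});
  std::shuffle(chunk_order.begin(), chunk_order.end(), rng);

  std::vector<std::pair<std::size_t, std::size_t>> visit_order;
  for (std::size_t chunk : chunk_order) {
    // Level 2: the "example sampler" orders the examples inside that chunk.
    std::vector<std::size_t> example_order(chunk_sizes[chunk]);
    std::iota(example_order.begin(), example_order.end(), std::size_t{0});
    std::shuffle(example_order.begin(), example_order.end(), rng);
    for (std::size_t example : example_order) {
      visit_order.emplace_back(chunk, example);
    }
  }
  return visit_order;
}
```

In this sketch every example is visited exactly once, and all examples of a chunk appear as one contiguous run, so only one chunk needs to be held in memory at a time.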

Public Types

using BatchType = std::optional<typename ChunkReader::BatchType>
using UnwrappedBatchType = typename ChunkReader::BatchType
using BatchRequestType = size_t
using ChunkSamplerType = ChunkSampler
using ExampleSamplerType = ExampleSampler

Public Functions

inline ChunkDataset(ChunkReader chunk_reader, ChunkSampler chunk_sampler, ExampleSampler example_sampler, ChunkDatasetOptions options, std::function<void(UnwrappedBatchType&)> preprocessing_policy = std::function<void(UnwrappedBatchType&)>())
inline ~ChunkDataset() override
inline BatchType get_batch(size_t batch_size) override

Default get_batch method of BatchDataset.

This method returns Example batches created from the preloaded chunks. The implementation is dataset-agnostic and does not need to be overridden in different chunk datasets.

inline BatchType get_batch()

Convenience overload of get_batch, since passing batch_size is not strictly necessary.

inline virtual void reset() override

Clears any internal state and starts the internal prefetching mechanism for the chunk dataset.
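Because BatchType is a std::optional, a consumer can call reset() and then keep calling get_batch until it receives an empty optional. The sketch below illustrates that pattern with a hypothetical ToyStatefulDataset and a hypothetical count_batches helper. Neither is part of libtorch, and the toy does no prefetching; it only assumes that an empty optional marks the end of the data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stateful dataset; not a libtorch class.
class ToyStatefulDataset {
 public:
  explicit ToyStatefulDataset(std::vector<int> examples)
      : examples_(std::move(examples)) {}

  // Returns up to batch_size examples, or an empty optional once the data
  // is exhausted. A batch_size of 0 also yields an empty optional so that
  // callers looping on the result cannot spin forever.
  std::optional<std::vector<int>> get_batch(std::size_t batch_size) {
    if (cursor_ >= examples_.size() || batch_size == 0) return std::nullopt;
    const std::size_t end = std::min(cursor_ + batch_size, examples_.size());
    std::vector<int> batch(examples_.begin() + cursor_,
                           examples_.begin() + end);
    cursor_ = end;
    return batch;
  }

  // Clears the internal state so the next pass starts from the beginning.
  void reset() { cursor_ = 0; }

 private:
  std::vector<int> examples_;
  std::size_t cursor_ = 0;
};

// Resets the dataset, drains it, and returns how many batches it produced.
std::size_t count_batches(ToyStatefulDataset& dataset, std::size_t batch_size) {
  dataset.reset();
  std::size_t batches = 0;
  while (auto batch = dataset.get_batch(batch_size)) {
    ++batches;
  }
  return batches;
}
```

The last batch may be smaller than batch_size in this toy. Whether the real dataset behaves the same way depends on its options and is not stated here.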

inline virtual std::optional<size_t> size() const override

size is not used for chunk datasets.

inline ChunkSamplerType &chunk_sampler()
inline virtual void save(serialize::OutputArchive &archive) const override

Saves the StatefulDataset’s state to the OutputArchive.

inline virtual void load(serialize::InputArchive &archive) override

Deserializes the StatefulDataset’s state from the InputArchive.
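The save/load pair follows the usual checkpointing pattern: write the state out, then restore it into a fresh object to resume where the first one stopped. The sketch below shows only that pattern. ToySequentialSampler is hypothetical, standard streams stand in for serialize::OutputArchive and serialize::InputArchive, and which fields ChunkDataset actually persists is not stated on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical sampler whose only state is its position in a sequence.
// Not a libtorch class; std streams stand in for the archive types.
class ToySequentialSampler {
 public:
  // Returns the current index and advances the position by one.
  std::size_t next() { return index_++; }

  // Writes the current position to the stream.
  void save(std::ostream& out) const { out << index_; }

  // Restores the position from the stream.
  void load(std::istream& in) { in >> index_; }

 private:
  std::size_t index_ = 0;
};
```

A sampler restored with load continues from the saved position instead of starting again at index 0.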
