Template Class DataLoaderBase
Defined in File base.h
Class Documentation
-
template<typename Dataset, typename Batch, typename BatchRequest>
class DataLoaderBase
Public Functions
-
inline DataLoaderBase(DataLoaderOptions options, std::unique_ptr<Dataset> main_thread_dataset = nullptr)
Constructs a new DataLoader from a dataset to sample from, options to configure the DataLoader with, and a sampler that specifies the sampling strategy.
-
DataLoaderBase(const DataLoaderBase&) = delete
-
DataLoaderBase(DataLoaderBase&&) = delete
-
DataLoaderBase &operator=(const DataLoaderBase&) = delete
-
DataLoaderBase &operator=(DataLoaderBase&&) = delete
-
inline Iterator<Batch> begin()
Returns an iterator into the DataLoader.
The lifetime of the iterator is bound to the DataLoader. In C++ standards language, the category of the iterator is OutputIterator. See https://en.cppreference.com/w/cpp/named_req/OutputIterator for what this means. In short: you may increment the iterator and dereference it, but cannot go back, or step forward more than one position at a time. When the DataLoader is exhausted, the iterator will compare equal with the special “sentinel” iterator returned by DataLoader::end(). Most of the time, you should only use range-for loops to loop over the DataLoader, but standard algorithms like std::copy(dataloader.begin(), dataloader.end(), output_iterator) are supported too.
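The single-pass iteration and sentinel comparison described above can be sketched without libtorch. This is a minimal illustration under stated assumptions: ToyLoader, its Iterator, and batch_sizes are hypothetical names invented for this example, and the batches are plain index vectors rather than real data.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a data loader; not the libtorch class.
class ToyLoader {
 public:
  using Batch = std::vector<std::size_t>;

  ToyLoader(std::size_t dataset_size, std::size_t batch_size)
      : dataset_size_(dataset_size), batch_size_(batch_size) {
    assert(batch_size > 0);
  }

  class Iterator {
   public:
    Iterator() = default;  // sentinel: never holds a batch
    explicit Iterator(ToyLoader* loader) : loader_(loader) { advance(); }

    Batch& operator*() { return *current_; }
    Iterator& operator++() {
      advance();  // forward only, one step at a time
      return *this;
    }
    // Unequal while either side still holds a batch, so an exhausted
    // iterator compares equal to the sentinel.
    bool operator!=(const Iterator& other) const {
      return current_.has_value() || other.current_.has_value();
    }

   private:
    void advance() { current_ = loader_->next(); }
    ToyLoader* loader_ = nullptr;
    std::optional<Batch> current_;
  };

  Iterator begin() {
    position_ = 0;
    return Iterator(this);
  }
  Iterator end() { return Iterator(); }

 private:
  // Returns the next batch of indices, or nothing once exhausted.
  std::optional<Batch> next() {
    if (position_ >= dataset_size_) return std::nullopt;
    Batch batch;
    while (batch.size() < batch_size_ && position_ < dataset_size_) {
      batch.push_back(position_++);
    }
    return batch;
  }

  std::size_t dataset_size_;
  std::size_t batch_size_;
  std::size_t position_ = 0;
};

// Collects the size of every batch using a range-for loop.
std::vector<std::size_t> batch_sizes(ToyLoader& loader) {
  std::vector<std::size_t> sizes;
  for (auto& batch : loader) sizes.push_back(batch.size());
  return sizes;
}
```

The iterator only borrows a pointer to the loader, which is why its lifetime is bound to the loader in this sketch as well.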
-
inline Iterator<Batch> end()
Returns a special “sentinel” iterator that compares equal with a non-sentinel iterator once the DataLoader is exhausted.
-
inline void join()
Joins the DataLoader’s worker threads and drains internal queues.
This function may only be invoked from the main thread (in which the DataLoader lives).
-
inline const FullDataLoaderOptions &options() const noexcept
Returns the options with which the DataLoader was configured.
Protected Functions
-
virtual std::optional<BatchRequestType> get_batch_request() = 0
Subclass hook for getting the next batch request.
The stateless case will ask the sampler for a new batch request (e.g. a vector of indices), while the stateful one will simply return the batch size.
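The difference between the two cases can be illustrated with a simplified hook. This is a sketch, not the libtorch implementation: ToyLoaderBase, ToySequentialSampler, ToyStatelessLoader, and ToyStatefulLoader are hypothetical names, and the hook is public here only so that it can be called directly (in DataLoaderBase it is protected).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical base class with the same kind of hook.
template <typename BatchRequest>
class ToyLoaderBase {
 public:
  virtual ~ToyLoaderBase() = default;
  virtual std::optional<BatchRequest> get_batch_request() = 0;
};

// Hypothetical sampler that hands out indices 0, 1, 2, ... in order.
class ToySequentialSampler {
 public:
  explicit ToySequentialSampler(std::size_t size) : size_(size) {}

  std::optional<std::vector<std::size_t>> next(std::size_t batch_size) {
    if (index_ >= size_) return std::nullopt;  // sampler exhausted
    std::vector<std::size_t> indices;
    while (indices.size() < batch_size && index_ < size_) {
      indices.push_back(index_++);
    }
    return indices;
  }

 private:
  std::size_t size_;
  std::size_t index_ = 0;
};

// Stateless case: the request is a vector of indices from the sampler.
class ToyStatelessLoader : public ToyLoaderBase<std::vector<std::size_t>> {
 public:
  ToyStatelessLoader(std::size_t dataset_size, std::size_t batch_size)
      : sampler_(dataset_size), batch_size_(batch_size) {
    assert(batch_size > 0);
  }
  std::optional<std::vector<std::size_t>> get_batch_request() override {
    return sampler_.next(batch_size_);
  }

 private:
  ToySequentialSampler sampler_;
  std::size_t batch_size_;
};

// Stateful case: the request is simply the batch size.
class ToyStatefulLoader : public ToyLoaderBase<std::size_t> {
 public:
  explicit ToyStatefulLoader(std::size_t batch_size)
      : batch_size_(batch_size) {
    assert(batch_size > 0);
  }
  std::optional<std::size_t> get_batch_request() override {
    return batch_size_;
  }

 private:
  std::size_t batch_size_;
};
```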
-
inline virtual void reset()
Resets the internal state of the DataLoader, optionally pre-fetching new jobs.
-
inline void prefetch(size_t requested_jobs)
Schedules requested_jobs many new batches to be fetched. The actual number of jobs scheduled may be less if the DataLoader exhausts.
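One way to picture why fewer jobs than requested may be scheduled is a loop that stops as soon as no further batch request is available. The sketch below assumes that behavior. toy_prefetch and IndexRequest are hypothetical names, and a plain vector stands in for the job queue.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

using IndexRequest = std::vector<std::size_t>;

// Tries to schedule `requested_jobs` requests and returns how many were
// actually scheduled. Stops early once the request source is exhausted.
std::size_t toy_prefetch(
    std::size_t requested_jobs,
    const std::function<std::optional<IndexRequest>()>& get_batch_request,
    std::vector<IndexRequest>& scheduled) {
  std::size_t count = 0;
  while (count < requested_jobs) {
    std::optional<IndexRequest> request = get_batch_request();
    if (!request) break;  // nothing left to schedule
    scheduled.push_back(std::move(*request));
    ++count;
  }
  assert(count <= requested_jobs);
  return count;
}
```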
-
inline std::optional<BatchType> next()
Returns the next batch of data, or an empty optional if the DataLoader is exhausted. This operation will block until a batch is available if one is still expected.
-
inline void worker_thread(Dataset &dataset)
The function that worker threads run.
-
template<typename T>
inline void push_job(T value)
Convenience method that calls shuttle_.push_job() with the next sequence number.
-
inline std::optional<Result> pop_result()
Convenience method that gets the next result from the sequencer.
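Sequence numbers make it possible to hand results back in the order the jobs were issued, even when workers finish out of order. The sketch below assumes such in-order delivery. ToyResult and ToyOrderedSequencer are hypothetical names, and unlike a real loader this version never blocks: it simply reports that the next result has not arrived yet.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical result type carrying a sequence number.
struct ToyResult {
  std::size_t sequence_number;
  std::string batch;
};

// Buffers out-of-order results and releases them in sequence order.
class ToyOrderedSequencer {
 public:
  void push(ToyResult result) {
    const std::size_t key = result.sequence_number;
    assert(key >= next_);  // already delivered numbers must not reappear
    pending_.emplace(key, std::move(result));
  }

  // Returns the next in-order result if it has arrived, else nothing.
  std::optional<ToyResult> try_pop() {
    auto it = pending_.find(next_);
    if (it == pending_.end()) return std::nullopt;
    ToyResult result = std::move(it->second);
    pending_.erase(it);
    ++next_;
    return result;
  }

 private:
  std::size_t next_ = 0;  // sequence number expected next
  std::map<std::size_t, ToyResult> pending_;
};
```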
Protected Attributes
-
const FullDataLoaderOptions options_
The options the DataLoader was configured with.
-
std::unique_ptr<Dataset> main_thread_dataset_
The dataset for the main thread. It only has a value if the number of worker threads was configured as zero, meaning the main thread has to do all the work (synchronously).
NOTE: Really want this to be on the heap when empty, therefore unique_ptr and not optional.
-
size_t sequence_number_ = 0
The sequence number for the next batch to be retrieved from the dataset.
-
std::vector<std::thread> workers_
The worker threads, running the worker_thread() method.
-
detail::DataShuttle<Job, Result> shuttle_
The DataShuttle which takes care of the life cycle of a job.
-
struct Job : public torch::data::DataLoaderBase<Dataset, Batch, BatchRequest>::Sequenced
A Job is either a BatchRequest (new indices to fetch data at) or a QuitWorker object, to indicate the worker should shut down.
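The "either a batch request or a quit signal" shape of a job can be modeled with std::variant. The following is a single-threaded sketch under stated assumptions: ToyJob, ToyQuitWorker, ToyBatchRequest, and run_worker are hypothetical names, and a std::queue stands in for the shuttle, so no real threads or data fetching are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <variant>
#include <vector>

struct ToyQuitWorker {};  // marker telling the worker to stop
using ToyBatchRequest = std::vector<std::size_t>;

// Hypothetical job: a sequence number plus one of the two payloads.
struct ToyJob {
  std::size_t sequence_number;
  std::variant<ToyBatchRequest, ToyQuitWorker> payload;
};

// Handles jobs until a quit marker is seen or the queue runs empty.
// Returns the sequence numbers of the batch requests that were handled.
std::vector<std::size_t> run_worker(std::queue<ToyJob>& jobs) {
  std::vector<std::size_t> handled;
  while (!jobs.empty()) {
    ToyJob job = std::move(jobs.front());
    jobs.pop();
    if (std::holds_alternative<ToyQuitWorker>(job.payload)) break;
    const auto& indices = std::get<ToyBatchRequest>(job.payload);
    assert(!indices.empty());  // this sketch expects non-empty requests
    // A real worker would fetch the data at `indices` here.
    handled.push_back(job.sequence_number);
  }
  return handled;
}
```

Jobs queued after the quit marker are left untouched, which mirrors the idea that a worker stops as soon as it is told to shut down.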
-
struct Result : public torch::data::DataLoaderBase<Dataset, Batch, BatchRequest>::Sequenced
The finished result of a job.
Public Functions
-
inline Result(std::optional<Batch> &&b, size_t sqn)
-
struct Sequenced
Simple mix-in to give something a sequence number.
Subclassed by torch::data::DataLoaderBase< Dataset, Batch, BatchRequest >::Job, torch::data::DataLoaderBase< Dataset, Batch, BatchRequest >::Result