CSVParser

class torchdata.datapipes.iter.CSVParser(source_datapipe: IterDataPipe[Tuple[str, IO]], *, skip_lines: int = 0, decode: bool = True, encoding: str = 'utf-8', errors: str = 'ignore', return_path: bool = False, as_tuple: bool = False, **fmtparams)

Accepts a DataPipe consisting of tuples of file name and CSV data stream, reads the CSV files, and yields their contents one row at a time (functional name: parse_csv). Each output row is a list by default, although the result depends on fmtparams.

Parameters:
  • source_datapipe – source DataPipe with tuples of file name and CSV data stream

  • skip_lines – number of lines to skip at the beginning of each file

  • strip_newline – if True, the new line character will be stripped

  • decode – if True, this will decode the contents of the file based on the specified encoding

  • encoding – the character encoding of the files (default='utf-8')

  • errors – the error handling scheme used while decoding

  • return_path – if True, each row is yielded as a tuple of path and contents, rather than just the contents

  • as_tuple – if True, each row is yielded as a tuple instead of a list
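
The way these parameters interact can be sketched with the standard library alone. The function below, parse_csv_sketch, is a hypothetical stand-in written for illustration; it is not torchdata's implementation, and it assumes that rows are parsed with Python's csv.reader and that binary streams yield bytes lines.

```python
import csv
import io
from typing import IO, Iterable, Iterator, Tuple


def parse_csv_sketch(
    source: Iterable[Tuple[str, IO]],
    *,
    skip_lines: int = 0,
    decode: bool = True,
    encoding: str = "utf-8",
    errors: str = "ignore",
    return_path: bool = False,
    as_tuple: bool = False,
    **fmtparams,
) -> Iterator:
    """Hypothetical stand-in for CSVParser (assumption: not the library's code)."""
    for path, stream in source:
        # With decode=True, bytes lines are decoded using encoding/errors.
        lines = (
            line.decode(encoding, errors) if decode and isinstance(line, bytes) else line
            for line in stream
        )
        # Skip the first skip_lines lines of each file.
        for _ in range(skip_lines):
            next(lines, None)
        # Assumption: fmtparams are passed straight to csv.reader.
        for row in csv.reader(lines, **fmtparams):
            if as_tuple:
                row = tuple(row)
            # return_path pairs every row with the file name it came from.
            yield (path, row) if return_path else row


files = [("1.csv", io.BytesIO(b"key,item\na,1\nb,2\n"))]
rows = list(parse_csv_sketch(files, skip_lines=1, return_path=True, as_tuple=True))
print(rows)  # [('1.csv', ('a', '1')), ('1.csv', ('b', '2'))]
```

Here skip_lines=1 drops the header row, as_tuple=True turns each row into a tuple, and return_path=True wraps it together with the file name.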

Example

>>> from torchdata.datapipes.iter import IterableWrapper, FileOpener
>>> import os
>>> def get_name(path_and_stream):
...     return os.path.basename(path_and_stream[0]), path_and_stream[1]
>>> datapipe1 = IterableWrapper(["1.csv", "empty.csv", "empty2.csv"])
>>> datapipe2 = FileOpener(datapipe1, mode="b")
>>> datapipe3 = datapipe2.map(get_name)
>>> csv_parser_dp = datapipe3.parse_csv()
>>> list(csv_parser_dp)
[['key', 'item'], ['a', '1'], ['b', '2'], []]
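
The effect of fmtparams can be illustrated with Python's csv module directly. This is a minimal sketch that assumes the extra keyword arguments are forwarded to csv.reader; the delimiter and quotechar values and the sample data are made up for illustration.

```python
import csv
import io

# Assumption: CSVParser hands **fmtparams to csv.reader, so standard
# csv formatting parameters such as delimiter and quotechar would apply.
fmtparams = {"delimiter": ";", "quotechar": "'"}

# Hypothetical semicolon-separated data with a quoted field containing ';'.
stream = io.StringIO("key;item\n'a;b';1\n")

parsed = list(csv.reader(stream, **fmtparams))
print(parsed)  # [['key', 'item'], ['a;b', '1']]
```

Under that assumption, the equivalent DataPipe call would pass the same keywords, for example datapipe3.parse_csv(delimiter=";", quotechar="'").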
