CSVDictParser
class torchdata.datapipes.iter.CSVDictParser(source_datapipe: IterDataPipe[Tuple[str, IO]], *, skip_lines: int = 0, decode: bool = True, encoding: str = 'utf-8', errors: str = 'ignore', return_path: bool = False, **fmtparams)
Accepts a DataPipe consisting of tuples of file name and CSV data stream, then reads and returns the contents of the CSV files one row at a time (functional name: parse_csv_as_dict).
Each output is a Dict by default, though the exact form depends on fmtparams. The first row of each file, unless skipped, is used as the header; the contents of the header row become the keys of the Dicts generated from the remaining rows.
Parameters:
source_datapipe – source DataPipe with tuples of file name and CSV data stream
skip_lines – number of lines to skip at the beginning of each file
strip_newline – if True, the new line character will be stripped
decode – if True, this will decode the contents of the file based on the specified encoding
encoding – the character encoding of the files (default='utf-8')
errors – the error handling scheme used while decoding
return_path – if True, each line will return a tuple of path and contents, rather than just the contents
Example
>>> from torchdata.datapipes.iter import FileLister, FileOpener
>>> import os
>>> def get_name(path_and_stream):
...     return os.path.basename(path_and_stream[0]), path_and_stream[1]
>>> datapipe1 = FileLister(".", "*.csv")
>>> datapipe2 = FileOpener(datapipe1, mode="b")
>>> datapipe3 = datapipe2.map(get_name)
>>> csv_dict_parser_dp = datapipe3.parse_csv_as_dict()
>>> list(csv_dict_parser_dp)
[{'key': 'a', 'item': '1'}, {'key': 'b', 'item': '2'}]
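
The behaviour described above (header row used as keys, skip_lines, decode, return_path, and fmtparams passed through to the CSV reader) can be approximated with the standard library alone. The sketch below is not the torchdata implementation: parse_csv_as_dict_sketch is a hypothetical helper built on csv.DictReader, and it assumes each stream yields one line per iteration, as io.BytesIO does.

```python
import csv
import io


def parse_csv_as_dict_sketch(path_and_stream_pairs, skip_lines=0, decode=True,
                             encoding="utf-8", errors="ignore",
                             return_path=False, **fmtparams):
    """Hypothetical stdlib-only approximation of the behaviour documented above."""
    for path, stream in path_and_stream_pairs:
        # Decode byte lines lazily when decode=True; pass text lines through.
        lines = (
            line.decode(encoding, errors) if decode and isinstance(line, bytes) else line
            for line in stream
        )
        # Drop the requested number of leading lines before the header row.
        for _ in range(skip_lines):
            next(lines, None)
        # The first remaining row becomes the header, i.e. the dict keys.
        for row in csv.DictReader(lines, **fmtparams):
            yield (path, row) if return_path else row


# One in-memory "file" with a leading comment line to skip.
files = [("a.csv", io.BytesIO(b"# comment\nkey,item\na,1\nb,2\n"))]
rows = list(parse_csv_as_dict_sketch(files, skip_lines=1))
print(rows)
```

Extra keyword arguments such as delimiter=";" are forwarded to csv.DictReader in this sketch, which mirrors the role the documentation assigns to fmtparams.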