LineReader

class torchdata.datapipes.iter.LineReader(source_datapipe: IterDataPipe[Tuple[str, IO]], *, skip_lines: int = 0, strip_newline: bool = True, decode: bool = False, encoding='utf-8', errors: str = 'ignore', return_path: bool = True)

Accepts a DataPipe of tuples of file name and data stream and, for each line in each stream, yields a tuple of file name and line, or just the line when return_path is False (functional name: readlines).

Parameters:

source_datapipe – a DataPipe with tuples of file name and string data stream
skip_lines – number of lines to skip at the beginning of each file
strip_newline – if True, the newline character is stripped from each line
decode – if True, the contents of the file are decoded using the specified encoding
encoding – the character encoding of the files (default='utf-8')
errors – the error handling scheme used while decoding
return_path – if True, yields a tuple of path and line for each line, rather than just the line
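
To make the parameter descriptions above concrete, here is a minimal plain-Python sketch of the documented behavior of skip_lines, strip_newline, and return_path. It does not use torchdata and is not the library's implementation; `read_lines_sketch` and `make_source` are hypothetical names introduced only for illustration, and the sketch assumes text streams such as io.StringIO.

```python
import io
from typing import IO, Iterable, Iterator, Tuple, Union


def read_lines_sketch(
    source: Iterable[Tuple[str, IO]],
    *,
    skip_lines: int = 0,
    strip_newline: bool = True,
    return_path: bool = True,
) -> Iterator[Union[str, Tuple[str, str]]]:
    """Hypothetical stand-in that mimics the documented LineReader options."""
    for path, stream in source:
        for index, line in enumerate(stream):
            if index < skip_lines:
                continue  # drop the first `skip_lines` lines of every file
            if strip_newline:
                line = line.rstrip("\r\n")  # remove trailing newline characters
            yield (path, line) if return_path else line


def make_source():
    # Build fresh streams on every call, because a stream can only be read once.
    return [
        ("file1", io.StringIO("header\nLine1\nLine2")),
        ("file2", io.StringIO("header\r\nLine2,1\r\nLine2,2")),
    ]


# Skip one header line per file and keep the file name with each line.
with_paths = list(read_lines_sketch(make_source(), skip_lines=1))
# Same, but yield only the line contents.
lines_only = list(read_lines_sketch(make_source(), skip_lines=1, return_path=False))
# Keep the original line endings.
raw_lines = list(read_lines_sketch(make_source(), strip_newline=False, return_path=False))

print(with_paths)
print(lines_only)
print(raw_lines)
```

Note that skip_lines applies to each file separately, so the header is dropped from both file1 and file2.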

Example

>>> from torchdata.datapipes.iter import IterableWrapper
>>> import io
>>> text1 = "Line1\nLine2"
>>> text2 = "Line2,1\r\nLine2,2\r\nLine2,3"
>>> source_dp = IterableWrapper([("file1", io.StringIO(text1)), ("file2", io.StringIO(text2))])
>>> line_reader_dp = source_dp.readlines()
>>> list(line_reader_dp)
[('file1', 'Line1'), ('file1', 'Line2'), ('file2', 'Line2,1'), ('file2', 'Line2,2'), ('file2', 'Line2,3')]
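
The example above reads text streams, so decode, encoding, and errors play no role in it. For binary streams, the sketch below uses only Python's built-in bytes.decode to show what the default error scheme 'ignore' means compared with 'strict'. It is an illustration of the decoding semantics under the signature's default values, not a call into LineReader itself; the sample bytes are made up.

```python
import io

# Raw bytes as a binary stream would yield them; \xff is not valid UTF-8.
raw_stream = io.BytesIO(b"caf\xc3\xa9\nbad\xffbyte\n")

# Strip the line ending, then decode with the defaults from the signature:
# encoding='utf-8', errors='ignore'. Invalid bytes are silently dropped.
decoded = [line.rstrip(b"\r\n").decode("utf-8", "ignore") for line in raw_stream]
print(decoded)

# With errors='strict', the same invalid byte raises an exception instead.
strict_error = None
try:
    b"bad\xffbyte".decode("utf-8", "strict")
except UnicodeDecodeError as exc:
    strict_error = exc
print(type(strict_error).__name__)
```

Because 'ignore' drops undecodable bytes without any warning, a stricter scheme may be preferable when corrupted input should be detected rather than hidden.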
