LineReader

class torchdata.datapipes.iter.LineReader(source_datapipe: IterDataPipe[Tuple[str, IO]], *, skip_lines: int = 0, strip_newline: bool = True, decode: bool = False, encoding='utf-8', errors: str = 'ignore', return_path: bool = True)

Accepts a DataPipe of (file name, data stream) tuples and, for each line in a stream, yields a tuple of the file name and that line (functional name: readlines).

Parameters:
  • source_datapipe – a DataPipe with tuples of file name and string data stream

  • skip_lines – number of lines to skip at the beginning of each file

  • strip_newline – if True, newline characters are stripped from each line

  • decode – if True, the contents of the file are decoded using the specified encoding

  • encoding – the character encoding of the files (default='utf-8')

  • errors – the error handling scheme used while decoding

  • return_path – if True, each line is yielded as a tuple of path and contents, rather than just the contents

Example

>>> from torchdata.datapipes.iter import IterableWrapper
>>> import io
>>> text1 = "Line1\nLine2"
>>> text2 = "Line2,1\r\nLine2,2\r\nLine2,3"
>>> source_dp = IterableWrapper([("file1", io.StringIO(text1)), ("file2", io.StringIO(text2))])
>>> line_reader_dp = source_dp.readlines()
>>> list(line_reader_dp)
[('file1', 'Line1'), ('file1', 'Line2'), ('file2', 'Line2,1'), ('file2', 'Line2,2'), ('file2', 'Line2,3')]
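
The example uses only the defaults. To illustrate how the remaining parameters interact, here is a minimal pure-Python sketch of the behaviour described in the parameter list. It is not torchdata's implementation: read_lines is a hypothetical helper written for illustration, and details such as the order in which stripping and decoding are applied are assumptions.

```python
import io
from typing import IO, Iterable, Iterator, Tuple, Union

Line = Union[str, bytes]


def read_lines(
    source: Iterable[Tuple[str, IO]],
    *,
    skip_lines: int = 0,
    strip_newline: bool = True,
    decode: bool = False,
    encoding: str = "utf-8",
    errors: str = "ignore",
    return_path: bool = True,
) -> Iterator[Union[Line, Tuple[str, Line]]]:
    """Hypothetical stand-in that mimics the documented parameters."""
    for path, stream in source:
        for index, line in enumerate(stream):
            # Skip the first `skip_lines` lines of every file.
            if index < skip_lines:
                continue
            # Drop trailing newline characters, for text or bytes lines.
            if strip_newline:
                line = line.rstrip("\r\n" if isinstance(line, str) else b"\r\n")
            # Decode only when the stream produced bytes.
            if decode and isinstance(line, bytes):
                line = line.decode(encoding, errors)
            # Yield either (path, line) or the bare line.
            yield (path, line) if return_path else line


# A binary stream with a header row and mixed line endings.
source = [("data.csv", io.BytesIO(b"header\nrow1\r\nrow2"))]
lines = list(read_lines(source, skip_lines=1, decode=True, return_path=False))
print(lines)
```

With skip_lines=1 the header row is dropped, decode=True turns the remaining bytes into str, and return_path=False yields bare lines instead of (path, line) tuples.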
