LineReader
class torchdata.datapipes.iter.LineReader(source_datapipe: IterDataPipe[Tuple[str, IO]], *, skip_lines: int = 0, strip_newline: bool = True, decode: bool = False, encoding='utf-8', errors: str = 'ignore', return_path: bool = True)
Accepts a DataPipe consisting of tuples of file name and string data stream, and for each line in the stream, yields a tuple of file name and the line (functional name: readlines).
Parameters:
source_datapipe – a DataPipe with tuples of file name and string data stream
skip_lines – number of lines to skip at the beginning of each file
strip_newline – if True, the newline character will be stripped
decode – if True, the contents of the file will be decoded based on the specified encoding
encoding – the character encoding of the files (default='utf-8')
errors – the error handling scheme used while decoding
return_path – if True, each yielded item is a tuple of path and line contents, rather than just the contents
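The way these parameters combine can be illustrated with a small pure-Python sketch. It is not the torchdata implementation: the function name read_lines_sketch is hypothetical, and the order of the steps (skip, strip, decode, attach path) as well as the use of rstrip are assumptions made for illustration.

```python
import io
from typing import IO, Iterable, Iterator, Tuple, Union

Line = Union[str, bytes]


def read_lines_sketch(
    source: Iterable[Tuple[str, IO]],
    *,
    skip_lines: int = 0,
    strip_newline: bool = True,
    decode: bool = False,
    encoding: str = "utf-8",
    errors: str = "ignore",
    return_path: bool = True,
) -> Iterator[Union[Line, Tuple[str, Line]]]:
    """Hypothetical stand-in that mirrors the documented parameters."""
    for path, stream in source:
        for index, line in enumerate(stream):
            # skip_lines applies per file, counted from the start of each stream
            if index < skip_lines:
                continue
            # Strip trailing newline characters (assumed: both "\r" and "\n")
            if strip_newline:
                line = line.rstrip(b"\r\n" if isinstance(line, bytes) else "\r\n")
            # Decoding only makes sense for lines read from a binary stream
            if decode and isinstance(line, bytes):
                line = line.decode(encoding, errors)
            # return_path toggles between (path, line) tuples and bare lines
            yield (path, line) if return_path else line


source = [
    ("file1", io.StringIO("header\nLine1\nLine2")),
    ("file2", io.BytesIO(b"header\r\nLine2,1\r\nLine2,2")),
]
result = list(read_lines_sketch(source, skip_lines=1, decode=True))
print(result)
```

In this sketch the header line of each file is dropped, and the bytes lines of the second file come out as str because decode=True is set.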
Example
>>> from torchdata.datapipes.iter import IterableWrapper
>>> import io
>>> text1 = "Line1\nLine2"
>>> text2 = "Line2,1\r\nLine2,2\r\nLine2,3"
>>> source_dp = IterableWrapper([("file1", io.StringIO(text1)), ("file2", io.StringIO(text2))])
>>> line_reader_dp = source_dp.readlines()
>>> list(line_reader_dp)
[('file1', 'Line1'), ('file1', 'Line2'), ('file2', 'Line2,1'), ('file2', 'Line2,2'), ('file2', 'Line2,3')]
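The example uses text streams, so no decoding is involved. For binary streams, the encoding and errors parameters determine what happens to bytes that are not valid in the chosen encoding. The following stdlib-only sketch does not call torchdata; it only shows how Python's bytes.decode behaves under the "ignore" and "replace" error handlers, on the assumption that the errors value is passed through to the decoder.

```python
import io

# A binary stream whose second line contains a byte (0xFF) that is invalid in UTF-8
raw = io.BytesIO("café\n".encode("utf-8") + b"bad\xffbyte\n")

# Iterating a binary stream yields bytes lines; strip the trailing newline
lines = [line.rstrip(b"\r\n") for line in raw]

# "ignore" silently drops the invalid byte
ignored = [line.decode("utf-8", "ignore") for line in lines]

# "replace" substitutes U+FFFD (the replacement character) for it
replaced = [line.decode("utf-8", "replace") for line in lines]

print(ignored)
print(replaced)
```

With "ignore", corrupted input passes through silently, so a stricter handler may be preferable when data integrity matters.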