torcharrow.DataFrame

torcharrow.DataFrame is a Python DataFrame library (built on the Apache Arrow columnar memory format) for loading, joining, aggregating, filtering, and otherwise manipulating data. It also provides a pandas-like API that fits naturally into the Python ML ecosystem and will be familiar to data scientists and ML engineers. They can use it to express tabular data workflows in ML, such as feature engineering and training and inference preprocessing.

DataFrame Class and General APIs

class torcharrow.DataFrame
DataFrame.columns

The column labels of the DataFrame.

DataFrame.dtype

The data type of the DataFrame (for a DataFrame, a struct type with one field per column).

DataFrame.device

The device on which the DataFrame is or will be allocated.

DataFrame.length

Return the number of rows, including rows with null values.

DataFrame.head

Return the first n rows.

DataFrame.tail

Return the last n rows.

DataFrame.describe

Generate descriptive statistics.

DataFrame.drop

Return the DataFrame without the dropped columns.

DataFrame.rename

Return the DataFrame with its column names remapped.

DataFrame.reorder

(EXPERIMENTAL API) Return the DataFrame with its columns in the prescribed order.

DataFrame.append

Return a column or DataFrame with the given values appended.

DataFrame.isin

Check whether each element in the DataFrame is contained in the given values.
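The following sketch illustrates the row and column semantics of these methods without requiring a torcharrow installation. It is plain Python, not torcharrow. It assumes a DataFrame modeled as a dict that maps each column name to an equal-length list, with None standing in for null. The function names mirror the methods above, but they are hypothetical stand-ins, and their signatures are assumptions.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical model (not torcharrow): column name -> list of values, None marks a null.
Frame = Dict[str, List[Any]]


def length(df: Frame) -> int:
    # Rows are counted whether or not they contain nulls.
    return len(next(iter(df.values()), []))


def head(df: Frame, n: int) -> Frame:
    # First n rows of every column.
    return {name: col[:n] for name, col in df.items()}


def tail(df: Frame, n: int) -> Frame:
    # Last n rows of every column; max() clamps n larger than the column length.
    return {name: col[max(len(col) - n, 0):] for name, col in df.items()}


def drop(df: Frame, columns: Iterable[str]) -> Frame:
    # Keep every column except the named ones.
    removed = set(columns)
    return {name: col for name, col in df.items() if name not in removed}


def rename(df: Frame, mapping: Dict[str, str]) -> Frame:
    # Remap column names; unmapped columns keep their name and position.
    return {mapping.get(name, name): col for name, col in df.items()}


def isin(df: Frame, values: Iterable[Any]) -> Dict[str, List[bool]]:
    # Element-wise membership test; the result has the same shape as df.
    # None is compared like any other value here; torcharrow's null handling may differ.
    pool = set(values)
    return {name: [v in pool for v in col] for name, col in df.items()}


df = {"a": [1, 2, None, 4], "b": ["x", "y", "z", "w"]}
print(length(df))          # 4: the row with a null in "a" still counts
print(head(df, 2))         # {'a': [1, 2], 'b': ['x', 'y']}
print(isin(df, [1, "z"]))  # {'a': [True, False, False, False], 'b': [False, False, True, False]}
```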

Functional API

DataFrame.map

Map rows according to the input correspondence (a dict or a callable).

DataFrame.filter

Select the rows for which the predicate is True.

DataFrame.flatmap

Map each row to a list of rows according to the input correspondence. dtype is required if the result type differs from the item type.

DataFrame.transform

Like map(), but invokes the callable on mini-batches of rows at a time.
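The following plain-Python sketch contrasts the per-row methods (map, filter, flatmap) with the batched transform. It is not torcharrow code. It assumes rows are represented as dicts and uses hypothetical helper names. In particular, the batch_size parameter and the convention that the batch callable returns one result per row are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]  # hypothetical row representation, not torcharrow's


def map_rows(rows: List[Row], fn: Callable[[Row], Any]) -> List[Any]:
    # Exactly one output value per input row.
    return [fn(row) for row in rows]


def filter_rows(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    # Keep the rows for which the predicate is True.
    return [row for row in rows if predicate(row)]


def flatmap_rows(rows: List[Row], fn: Callable[[Row], List[Any]]) -> List[Any]:
    # fn returns a list per row and the lists are concatenated,
    # so the result may be longer or shorter than the input.
    return [out for row in rows for out in fn(row)]


def transform_rows(rows: List[Row], fn: Callable[[List[Row]], List[Any]], batch_size: int) -> List[Any]:
    # Like map_rows, but fn is called once per mini-batch of rows.
    results: List[Any] = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        out = fn(batch)
        if len(out) != len(batch):
            raise ValueError("fn must return one result per row in the batch")
        results.extend(out)
    return results


rows = [{"id": 1, "clicks": 3}, {"id": 2, "clicks": 0}, {"id": 3, "clicks": 5}]
print(map_rows(rows, lambda r: r["clicks"] * 10))            # [30, 0, 50]
print(filter_rows(rows, lambda r: r["clicks"] > 0))           # the rows with id 1 and 3
print(flatmap_rows(rows, lambda r: [r["id"]] * r["clicks"]))  # [1, 1, 1, 3, 3, 3, 3, 3]
print(transform_rows(rows, lambda b: [len(b)] * len(b), 2))   # [2, 2, 1]: the batch sizes
```

Batching matters when the callable has per-call overhead, such as a model invocation, which is paid once per batch rather than once per row.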

Relational API

DataFrame.select

Analogous to SQL's SELECT.

DataFrame.where

Analogous to SQL's WHERE clause (not pandas' where).

DataFrame.sort

Sort a column or DataFrame in ascending or descending order.
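Following SQL rather than pandas has a practical consequence. SQL's WHERE removes non-matching rows, while pandas' where keeps every row and masks the values that fail the condition. The following plain-Python sketch, which is not torcharrow code, makes the contrast concrete and also models select and sort. It assumes a frame modeled as a dict of equal-length column lists and a precomputed boolean mask. All function names and parameters here are hypothetical.

```python
from typing import Any, Dict, List

Frame = Dict[str, List[Any]]  # hypothetical model: column name -> values


def select(df: Frame, *columns: str) -> Frame:
    # Projection, like SQL SELECT: keep only the named columns, in the given order.
    return {name: df[name] for name in columns}


def where(df: Frame, mask: List[bool]) -> Frame:
    # SQL-style WHERE: rows whose mask entry is False are removed,
    # so the result can be shorter than df.
    return {name: [v for v, keep in zip(col, mask) if keep] for name, col in df.items()}


def pandas_style_where(df: Frame, mask: List[bool]) -> Frame:
    # For contrast only: keep every row and replace masked-out entries (here with None).
    return {name: [v if keep else None for v, keep in zip(col, mask)] for name, col in df.items()}


def sort(df: Frame, by: str, ascending: bool = True) -> Frame:
    # Stable sort of whole rows by column `by`; nulls are not handled in this sketch.
    order = sorted(range(len(df[by])), key=lambda i: df[by][i], reverse=not ascending)
    return {name: [col[i] for i in order] for name, col in df.items()}


df = {"user": ["a", "b", "c"], "age": [34, 17, 25]}
adult = [age >= 18 for age in df["age"]]
print(where(df, adult))               # {'user': ['a', 'c'], 'age': [34, 25]}
print(pandas_style_where(df, adult))  # {'user': ['a', None, 'c'], 'age': [34, None, 25]}
print(sort(df, "age", ascending=False))
```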

Data Cleaning

DataFrame.fill_null

Fill null values using the specified method.

DataFrame.drop_null

Return a column or DataFrame with the rows removed that contain any (or, optionally, only) null values.

DataFrame.drop_duplicates

(EXPERIMENTAL API) Remove duplicate rows from the column or DataFrame, keeping the first occurrence, the last occurrence, or none of them.
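The following plain-Python sketch, which is not torcharrow code, models the three cleaning operations. It assumes a frame modeled as a dict of equal-length column lists, with None as null. The parameter names and values are assumptions chosen to match the descriptions above, not confirmed torcharrow signatures: a scalar or per-column dict for fill_null, how="any"/"all" for drop_null, and keep="first"/"last"/"none" for drop_duplicates.

```python
from collections import Counter
from typing import Any, Dict, List

Frame = Dict[str, List[Any]]  # hypothetical model: column name -> values, None marks a null


def fill_null(df: Frame, value: Any) -> Frame:
    out = {}
    for name, col in df.items():
        # A dict supplies per-column fill values; columns absent from it stay unchanged.
        fill = value.get(name) if isinstance(value, dict) else value
        out[name] = [fill if v is None else v for v in col]
    return out


def drop_null(df: Frame, how: str = "any") -> Frame:
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    test = any if how == "any" else all
    names = list(df)
    rows = list(zip(*(df[name] for name in names)))
    # Drop a row if any (or all) of its values are null.
    keep = [not test(v is None for v in row) for row in rows]
    return {name: [v for v, k in zip(df[name], keep) if k] for name in names}


def drop_duplicates(df: Frame, keep: str = "first") -> Frame:
    names = list(df)
    rows = list(zip(*(df[name] for name in names)))  # rows must be hashable
    if keep == "none":
        # Drop every row that occurs more than once.
        counts = Counter(rows)
        kept = [i for i, row in enumerate(rows) if counts[row] == 1]
    elif keep in ("first", "last"):
        # Scan forward for "first" or backward for "last", keeping unseen rows.
        order = range(len(rows)) if keep == "first" else range(len(rows) - 1, -1, -1)
        seen, kept = set(), []
        for i in order:
            if rows[i] not in seen:
                seen.add(rows[i])
                kept.append(i)
        kept.sort()  # restore the original row order
    else:
        raise ValueError("keep must be 'first', 'last' or 'none'")
    return {name: [df[name][i] for i in kept] for name in names}


df = {"k": [1, None, None], "v": ["a", "b", None]}
print(fill_null(df, {"k": 0}))   # {'k': [1, 0, 0], 'v': ['a', 'b', None]}
print(drop_null(df, how="any"))  # {'k': [1], 'v': ['a']}
print(drop_null(df, how="all"))  # {'k': [1, None], 'v': ['a', 'b']}

dup = {"k": [1, 2, 1, 3], "v": ["a", "b", "a", "c"]}
print(drop_duplicates(dup, keep="last"))  # {'k': [2, 1, 3], 'v': ['b', 'a', 'c']}
```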

Conversions

DataFrame.to_arrow

Convert self to an Arrow table.

DataFrame.to_tensor

Convert to PyTorch containers (Tensor, PackedList, PackedMap, etc.).

DataFrame.to_pylist

Convert to plain Python containers (a list of scalars or containers).

DataFrame.to_pandas

Convert self to a pandas DataFrame.
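Because the DataFrame uses a columnar memory format, converting it to row-oriented Python containers is essentially a transpose. The following plain-Python sketch, which is not torcharrow code, models that transpose and its inverse. It assumes a frame held as a dict of equal-length column lists. The tuple-per-row output and the helper names are assumptions for illustration, not a description of exactly what to_pylist returns.

```python
from typing import Any, Dict, List, Sequence, Tuple

Frame = Dict[str, List[Any]]  # hypothetical columnar model: column name -> values


def to_rows(df: Frame) -> List[Tuple[Any, ...]]:
    # Transpose column lists into one tuple per row, preserving column order.
    return list(zip(*df.values()))


def from_rows(rows: Sequence[Tuple[Any, ...]], names: Sequence[str]) -> Frame:
    # Inverse transpose; an empty row list yields empty columns.
    columns = list(zip(*rows)) if rows else [() for _ in names]
    return {name: list(col) for name, col in zip(names, columns)}


df = {"id": [1, 2], "tag": ["x", None]}
rows = to_rows(df)
print(rows)                                  # [(1, 'x'), (2, None)]
print(from_rows(rows, ["id", "tag"]) == df)  # True
```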

Statistics

DataFrame.min

Return the minimum of the non-null values for each column.

DataFrame.max

Return the maximum of the non-null values for each column.

DataFrame.sum

Return the sum of the non-null values for each column.

DataFrame.mean

Return the mean of the non-null values for each column.

DataFrame.std

Return the standard deviation of the non-null values for each column.

DataFrame.median

Return the median of the non-null values for each column.

DataFrame.all

Return whether all non-null elements are True.

DataFrame.any

Return whether any non-null element is True.
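The phrase "non-null values for each column" can be modeled in plain Python by filtering out nulls before each per-column reduction. This sketch is not torcharrow code. It assumes a frame modeled as a dict of column lists, with None as null. Returning None for a column that has no non-null values is also an assumption. std is left out because the description above does not state which degrees-of-freedom convention it uses.

```python
import statistics
from typing import Any, Callable, Dict, List, Optional

Frame = Dict[str, List[Any]]  # hypothetical model: column name -> values, None marks a null


def reduce_non_null(df: Frame, fn: Callable[[List[Any]], Any]) -> Dict[str, Optional[Any]]:
    out = {}
    for name, col in df.items():
        values = [v for v in col if v is not None]  # drop nulls first
        # Assumption: a column with no non-null values reduces to None.
        out[name] = fn(values) if values else None
    return out


nums = {"a": [4, None, 1, 7], "b": [2.0, 3.0, None, None]}
print(reduce_non_null(nums, min))                # {'a': 1, 'b': 2.0}
print(reduce_non_null(nums, max))                # {'a': 7, 'b': 3.0}
print(reduce_non_null(nums, sum))                # {'a': 12, 'b': 5.0}
print(reduce_non_null(nums, statistics.mean))    # a: mean of 4, 1, 7; b: mean of 2.0, 3.0
print(reduce_non_null(nums, statistics.median))  # {'a': 4, 'b': 2.5}

flags = {"x": [True, None, True], "y": [False, None, True]}
print(reduce_non_null(flags, all))  # {'x': True, 'y': False}
print(reduce_non_null(flags, any))  # {'x': True, 'y': True}
```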

Arithmetic Operations

DataFrame.log

Return a DataFrame with the natural logarithm of each element.
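The following plain-Python sketch applies a natural logarithm element-wise while leaving nulls in place. It is not torcharrow code: both the dict-of-columns model and the null-propagation rule are assumptions. Note that math.log raises ValueError for zero or negative inputs, and this sketch does not model what torcharrow returns in that case.

```python
import math
from typing import Dict, List, Optional

Frame = Dict[str, List[Optional[float]]]  # hypothetical model: column name -> values, None marks a null


def log_frame(df: Frame) -> Frame:
    # Natural log (base e) of each non-null element; nulls stay null.
    # math.log raises ValueError for values <= 0.
    return {name: [None if v is None else math.log(v) for v in col] for name, col in df.items()}


df = {"x": [1.0, None, math.e ** 2]}
print(log_frame(df))  # first value 0.0, second None, third approximately 2.0
```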
