torcharrow.DataFrame
torcharrow.DataFrame is a Python DataFrame library (built on the Apache Arrow columnar memory format) for loading, joining, aggregating, filtering, and otherwise manipulating data.
torcharrow.DataFrame also provides a Pandas-like API that fits naturally into the Python ML ecosystem and will be familiar to data scientists and ML engineers, who can use it to express tabular data workflows in ML, such as feature engineering and training and inference preprocessing.
DataFrame Class and General APIs
- DataFrame.dtype: the data type of a torcharrow.Column.
- DataFrame.device: the device on which a torcharrow.Column is or will be allocated.
- Return the first n rows.
- Return the last n rows.
- Generate descriptive statistics.
- Returns DataFrame without the removed columns.
- Returns DataFrame with column names remapped.
- (EXPERIMENTAL API) Returns DataFrame with the columns in the prescribed order.
- Returns column/dataframe with values appended.
- Check whether each element in the dataframe is contained in values.
Functional API
- Maps rows according to input correspondence.
- Select rows where predicate is True.
- Maps rows to a list of rows according to input correspondence; dtype is required if the result type differs from the item type.
- Like map() but invokes the callable on mini-batches of rows at a time.
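The four operations above differ mainly in what the callable receives and returns. The following is a minimal plain-Python sketch of those semantics only. It does not use torcharrow, and the helper names (`map_rows`, `filter_rows`, `flatmap_rows`, `transform_rows`), the `batch_size` parameter, and the row shape are all hypothetical, chosen for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical row shape for illustration: (id, label).
Row = Tuple[int, str]

rows: List[Row] = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]


def map_rows(fn: Callable[[Row], Row], data: List[Row]) -> List[Row]:
    """One output row per input row."""
    return [fn(r) for r in data]


def filter_rows(pred: Callable[[Row], bool], data: List[Row]) -> List[Row]:
    """Keep only the rows for which the predicate is True."""
    return [r for r in data if pred(r)]


def flatmap_rows(fn: Callable[[Row], List[Row]], data: List[Row]) -> List[Row]:
    """Each input row yields a list of rows; the lists are concatenated."""
    return [out for r in data for out in fn(r)]


def transform_rows(
    fn: Callable[[List[Row]], List[Row]], data: List[Row], batch_size: int
) -> List[Row]:
    """Like map_rows, but the callable sees a mini-batch of rows at a time."""
    result: List[Row] = []
    for start in range(0, len(data), batch_size):
        result.extend(fn(data[start:start + batch_size]))
    return result


mapped = map_rows(lambda r: (r[0] * 10, r[1]), rows)
kept = filter_rows(lambda r: r[0] % 2 == 0, rows)
expanded = flatmap_rows(lambda r: [r, r], rows)
batched = transform_rows(
    lambda batch: [(i * 10, s) for i, s in batch], rows, batch_size=2
)
```

In this sketch the batched variant produces the same rows as the row-wise map; the difference is only that the callable is invoked once per mini-batch rather than once per row.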
Relational API
- Analogous to SQL's SELECT.
- Analogous to SQL's WHERE (not the Pandas where).
- Sort a column/a dataframe in ascending or descending order.
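To make the SQL analogy concrete, here is a small plain-Python sketch over a list of records. It illustrates the semantics only and is not torcharrow code: the helper names, their parameters, and the sample table are hypothetical. The `mask_where` helper is included purely to show the contrast with the Pandas-style `where`, which keeps the original shape and blanks out non-matching entries instead of dropping rows.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical sample data.
table: List[Record] = [
    {"name": "ada", "age": 36, "team": "x"},
    {"name": "bob", "age": 25, "team": "y"},
    {"name": "cy", "age": 41, "team": "x"},
]


def select(data: List[Record], columns: List[str]) -> List[Record]:
    """SQL-style SELECT: keep only the named columns."""
    return [{c: row[c] for c in columns} for row in data]


def where(data: List[Record], predicate: Callable[[Record], bool]) -> List[Record]:
    """SQL-style WHERE: drop rows that fail the predicate."""
    return [row for row in data if predicate(row)]


def sort(data: List[Record], by: str, ascending: bool = True) -> List[Record]:
    """Order rows by one column, ascending or descending."""
    return sorted(data, key=lambda row: row[by], reverse=not ascending)


def mask_where(data: List[Record], predicate: Callable[[Record], bool]) -> List[Record]:
    """Pandas-style where, for contrast: same row count, failing rows nulled."""
    return [row if predicate(row) else {k: None for k in row} for row in data]


def is_team_x(row: Record) -> bool:
    return row["team"] == "x"


# SELECT name, age FROM table WHERE team = 'x' ORDER BY age DESC
result = sort(select(where(table, is_team_x), ["name", "age"]), by="age", ascending=False)

# Same predicate, Pandas-style: no rows are removed.
masked = mask_where(table, is_team_x)
```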
Data Cleaning
- Fill null values using the specified method.
- Return a column/frame with rows removed where a row has any or all nulls.
- (EXPERIMENTAL API) Remove duplicate values from a row/frame, keeping the first occurrence, the last occurrence, or none.
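The "any or all" and "first, last, none" options are easiest to see on a tiny example. The sketch below uses plain Python tuples with `None` standing in for null. It is an illustration of the semantics under stated assumptions, not torcharrow's implementation: the function names, the `how` and `keep` parameters, and the choice of a constant fill value as the fill method are all hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

Row = Tuple[Optional[int], Optional[int]]

# Hypothetical sample data; None stands in for a null value.
rows: List[Row] = [(1, None), (None, None), (1, None), (2, 5)]


def fill_null(data: List[Row], value: int) -> List[Tuple[int, ...]]:
    """Replace every null with a constant (one possible fill method)."""
    return [tuple(value if v is None else v for v in row) for row in data]


def drop_null(data: List[Row], how: str = "any") -> List[Row]:
    """Drop rows containing any null (how='any') or only nulls (how='all')."""
    check = any if how == "any" else all
    return [row for row in data if not check(v is None for v in row)]


def drop_duplicates(data: List[Row], keep: str = "first") -> List[Row]:
    """Remove duplicate rows, keeping the first, the last, or none of them."""
    if keep == "none":
        counts = Counter(data)
        return [row for row in data if counts[row] == 1]
    source = data if keep == "first" else list(reversed(data))
    seen = set()
    out: List[Row] = []
    for row in source:
        if row not in seen:
            seen.add(row)
            out.append(row)
    # Restore original ordering when we scanned from the end.
    return out if keep == "first" else list(reversed(out))


filled = fill_null(rows, 0)
no_any_null = drop_null(rows, how="any")
no_all_null = drop_null(rows, how="all")
first_kept = drop_duplicates(rows, keep="first")
last_kept = drop_duplicates(rows, keep="last")
none_kept = drop_duplicates(rows, keep="none")
```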
Conversions
- Convert self to an Arrow table.
- Convert to PyTorch containers (Tensor, PackedList, PackedMap, etc.).
- Convert to a plain Python container (list of scalars or containers).
- Convert self to a Pandas DataFrame.
Statistics
- Return the minimum of the non-null values for each column.
- Return the maximum of the non-null values for each column.
- Return the sum of the non-null values for each column.
- Return the mean of the non-null values for each column.
- Return the standard deviation of the non-null values for each column.
- Return the median of the non-null values for each column.
- Return whether all non-null elements are True.
- Return whether any non-null element is True.
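All of the statistics above share one rule: nulls are skipped before the aggregate is computed, and the result is reported per column. A minimal sketch of that rule in plain Python follows. It is not torcharrow code; the helpers `non_null` and `per_column` and the sample data are hypothetical. Note also that it uses the sample standard deviation (`statistics.stdev`) as an assumption, since the text above does not say whether the sample or population form is meant.

```python
import statistics
from typing import Any, Callable, Dict, List, Optional

# Hypothetical columnar data; None stands in for a null value.
columns: Dict[str, List[Optional[float]]] = {
    "a": [1.0, None, 3.0, 5.0],
    "b": [2.0, 4.0, None, None],
}
flags: Dict[str, List[Optional[bool]]] = {
    "f": [True, None, True],
    "g": [False, None, True],
}


def non_null(values: List[Any]) -> List[Any]:
    """Drop nulls so they do not take part in the aggregate."""
    return [v for v in values if v is not None]


def per_column(fn: Callable[[List[Any]], Any], frame: Dict[str, List[Any]]) -> Dict[str, Any]:
    """Apply an aggregate to the non-null values of each column."""
    return {name: fn(non_null(vals)) for name, vals in frame.items()}


mins = per_column(min, columns)
maxs = per_column(max, columns)
sums = per_column(sum, columns)
means = per_column(statistics.mean, columns)
medians = per_column(statistics.median, columns)
stds = per_column(statistics.stdev, columns)  # assumption: sample std
all_true = per_column(all, flags)
any_true = per_column(any, flags)
```

Here column `b` has only two non-null values, so its mean and median are both computed from `[2.0, 4.0]`; the two nulls neither count toward the denominator nor act as zeros.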
Arithmetic Operations
- Return a DataFrame with the natural logarithm of each element.