torcharrow.DataFrame
torcharrow.DataFrame is a Python DataFrame library, built on the Apache Arrow columnar memory format, for loading, joining, aggregating, filtering, and otherwise manipulating data.
torcharrow.DataFrame also provides a Pandas-like API that fits naturally into the Python ML ecosystem and will be familiar to data scientists and ML engineers, who can use it to express tabular data workflows in ML such as feature engineering and training and inference preprocessing.
DataFrame Class and General APIs
- class torcharrow.DataFrame
- DataFrame.columns
The column labels of the DataFrame.
- DataFrame.dtype
The data type of a torcharrow.Column.
- DataFrame.device
The device on which a torcharrow.Column is or will be allocated.
- DataFrame.length
Return the number of rows, including null values.
- Return the first n rows.
- Return the last n rows.
- Generate descriptive statistics.
- Returns DataFrame without the removed columns.
- Returns DataFrame with column names remapped.
- (EXPERIMENTAL API) Returns DataFrame with the columns in the prescribed order.
- Returns column/dataframe with values appended.
- Check whether each element in the dataframe is contained in values.
Functional API
- Maps rows according to the input correspondence.
- Select rows where the predicate is True.
- Maps each row to a list of rows according to the input correspondence. dtype is required if the result type != item type.
- Like map() but invokes the callable on mini-batches of rows at a time.
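The four functional operations described above differ mainly in how many output rows each input row produces and in whether the callable sees one row or a mini-batch. The following is a minimal plain-Python sketch of those semantics. It does not call torcharrow, and the helper names (`map_rows`, `filter_rows`, `flatmap_rows`, `transform_rows`) and the `batch_size` parameter are hypothetical, chosen only for illustration.

```python
from typing import Any, Callable, List

Row = Any  # a row may be a scalar or a tuple of column values


def map_rows(rows: List[Row], fn: Callable[[Row], Row]) -> List[Row]:
    """Apply fn to each row: exactly one output row per input row."""
    return [fn(row) for row in rows]


def filter_rows(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Keep only the rows for which the predicate is True."""
    return [row for row in rows if predicate(row)]


def flatmap_rows(rows: List[Row], fn: Callable[[Row], List[Row]]) -> List[Row]:
    """fn returns a list of rows per input row; the lists are concatenated."""
    out: List[Row] = []
    for row in rows:
        out.extend(fn(row))
    return out


def transform_rows(
    rows: List[Row], fn: Callable[[List[Row]], List[Row]], batch_size: int
) -> List[Row]:
    """Like map_rows, but fn receives a mini-batch (a list of rows) at a time."""
    out: List[Row] = []
    for start in range(0, len(rows), batch_size):
        out.extend(fn(rows[start:start + batch_size]))
    return out


rows = [1, 2, 3, 4, 5]
doubled = map_rows(rows, lambda x: x * 2)
evens = filter_rows(rows, lambda x: x % 2 == 0)
repeated = flatmap_rows(rows[:2], lambda x: [x, x])
# Subtract each batch's minimum: the result depends on how rows are batched.
centered = transform_rows(
    rows, lambda batch: [x - min(batch) for x in batch], batch_size=2
)
```

The batched variant is useful when the callable is vectorized or has per-call overhead. Note that a batch-dependent callable, like the one used for `centered`, gives results that vary with the batch size.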
Relational API
- Analogous to SQL's SELECT.
- Analogous to SQL's WHERE (not Pandas' where).
- Sort a column/a dataframe in ascending or descending order.
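The distinction between SQL's WHERE and Pandas' where is easy to miss: SQL's WHERE removes rows that fail the predicate, whereas Pandas' where keeps the original shape and replaces failing entries with nulls. The sketch below models both behaviors, plus SELECT-style projection and sorting, over plain Python dictionaries. It is not torcharrow code, and all function names are hypothetical.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

rows: List[Record] = [
    {"id": 1, "score": 0.9},
    {"id": 2, "score": 0.4},
    {"id": 3, "score": 0.7},
]


def select(rows: List[Record], *columns: str) -> List[Record]:
    """SQL-style SELECT: project each row onto the named columns."""
    return [{c: row[c] for c in columns} for row in rows]


def sql_where(rows: List[Record], predicate: Callable[[Record], bool]) -> List[Record]:
    """SQL-style WHERE: rows failing the predicate are dropped."""
    return [row for row in rows if predicate(row)]


def pandas_style_where(
    rows: List[Record], predicate: Callable[[Record], bool]
) -> List[Record]:
    """Pandas-style where: shape is preserved, failing rows become nulls."""
    return [row if predicate(row) else {k: None for k in row} for row in rows]


def sort_rows(rows: List[Record], by: str, ascending: bool = True) -> List[Record]:
    """Sort rows by one column in ascending or descending order."""
    return sorted(rows, key=lambda row: row[by], reverse=not ascending)


def passing(row: Record) -> bool:
    return row["score"] > 0.5


ids_only = select(rows, "id")
kept = sql_where(rows, passing)            # 2 rows remain
masked = pandas_style_where(rows, passing)  # still 3 rows, one nulled out
ordered = sort_rows(rows, by="score", ascending=False)
```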
Data Cleaning
- Fill null values using the specified method.
- Return a column/frame with rows removed where a row has any or all nulls.
- (EXPERIMENTAL API) Remove duplicate values from row/frame but keep the first, the last, or none.
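The "any or all" and "first, last, none" options above are clearer with a concrete case. The following plain-Python sketch models them over a list of tuples, with `None` standing in for null. The function names and the `how` and `keep` parameters are hypothetical, not torcharrow signatures, and the fill shown is a constant-value fill, which is only one possible fill method.

```python
from collections import Counter
from typing import Any, List, Tuple

Row = Tuple[Any, ...]


def fill_nulls(rows: List[Row], value: Any) -> List[Row]:
    """Replace every null (None) with a constant value."""
    return [tuple(value if v is None else v for v in row) for row in rows]


def drop_nulls(rows: List[Row], how: str = "any") -> List[Row]:
    """Drop rows containing any null (how='any') or only nulls (how='all')."""
    if how == "any":
        return [row for row in rows if all(v is not None for v in row)]
    if how == "all":
        return [row for row in rows if any(v is not None for v in row)]
    raise ValueError(f"unknown how: {how!r}")


def drop_duplicates(rows: List[Row], keep: str = "first") -> List[Row]:
    """Remove duplicate rows, keeping the first, the last, or no occurrence."""
    if keep == "first":
        seen = set()
        out = []
        for row in rows:
            if row not in seen:
                seen.add(row)
                out.append(row)
        return out
    if keep == "last":
        # Deduplicate the reversed list, then restore the original order.
        return drop_duplicates(rows[::-1], keep="first")[::-1]
    if keep == "none":
        counts = Counter(rows)
        return [row for row in rows if counts[row] == 1]
    raise ValueError(f"unknown keep: {keep!r}")


rows = [(1, "a"), (None, "b"), (1, "a"), (None, None), (2, "c")]
filled = fill_nulls(rows, 0)
no_partial_nulls = drop_nulls(rows, how="any")
no_empty_rows = drop_nulls(rows, how="all")
keep_first = drop_duplicates(rows, keep="first")
keep_last = drop_duplicates(rows, keep="last")
keep_none = drop_duplicates(rows, keep="none")
```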
Conversions
- Convert self to an Arrow table.
- Convert to PyTorch containers (Tensor, PackedList, PackedMap, etc.).
- Convert to a plain Python container (list of scalars or containers).
- Convert self to a Pandas DataFrame.
Statistics
- Return the minimum of the non-null values for each column.
- Return the maximum of the non-null values for each column.
- Return the sum of the non-null values for each column.
- Return the mean of the non-null values for each column.
- Return the standard deviation of the non-null values for each column.
- Return the median of the non-null values for each column.
- Return whether all non-null elements are True.
- Return whether any non-null element is True.
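Every statistic above is defined over the non-null values only, so nulls are skipped rather than propagated. A minimal sketch of that convention for a single column, using Python's standard `statistics` module, might look as follows. The helper names are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text above does not say whether the sample or population form is meant.

```python
import statistics
from typing import Any, Dict, List, Optional


def non_null(values: List[Optional[Any]]) -> List[Any]:
    """Drop nulls (None) before aggregating."""
    return [v for v in values if v is not None]


def column_stats(values: List[Optional[float]]) -> Dict[str, float]:
    """Aggregate one column over its non-null values only."""
    present = non_null(values)
    return {
        "min": min(present),
        "max": max(present),
        "sum": sum(present),
        "mean": statistics.mean(present),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "std": statistics.stdev(present),
        "median": statistics.median(present),
    }


def all_true(values: List[Optional[bool]]) -> bool:
    """True if every non-null element is True."""
    return all(non_null(values))


def any_true(values: List[Optional[bool]]) -> bool:
    """True if at least one non-null element is True."""
    return any(non_null(values))


stats = column_stats([1.0, None, 3.0, 5.0])  # the null is ignored
flags_all = all_true([True, None, True])
flags_any = any_true([False, None])
```

In this sketch a column with fewer than two non-null values would raise an error in `statistics.stdev`; a real library has to define its own result for that case.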
Arithmetic Operations
- Return a DataFrame with the natural logarithm of each element.