torcharrow.functional

Velox Core Functions

Velox core functions are included in torcharrow.functional.

Here is an example of the Velox string function lpad:

>>> import torcharrow as ta
>>> from torcharrow import functional
>>> col = ta.column(["abc", "x", "yz"])
# Velox's lpad function: https://facebookincubator.github.io/velox/functions/string.html#lpad
>>> functional.lpad(col, 5, "123")
0  '12abc'
1  '1231x'
2  '123yz'
dtype: String(nullable=True), length: 3, null_count: 0, device: cpu
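
To make the padding behavior above easier to follow, here is a minimal pure-Python sketch of what lpad does to a single string. It is not the Velox or TorchArrow implementation; the name `lpad_sketch` is hypothetical, and the sketch assumes a non-negative size and a non-empty pad string, with truncation when the input is already longer than the target size (as described in the Velox documentation linked above).

```python
def lpad_sketch(s: str, size: int, pad: str) -> str:
    """Left-pad `s` to exactly `size` characters using `pad` (illustrative only)."""
    # Assumption: negative sizes and empty pad strings are rejected.
    if size < 0 or not pad:
        raise ValueError("size must be >= 0 and pad must be non-empty")
    # If the string is already long enough, truncate it to `size` characters.
    if len(s) >= size:
        return s[:size]
    # Repeat the pad string enough times, then cut it to the missing length.
    missing = size - len(s)
    repeated = pad * (missing // len(pad) + 1)
    return repeated[:missing] + s


print(lpad_sketch("abc", 5, "123"))  # 12abc
print(lpad_sketch("x", 5, "123"))    # 1231x
print(lpad_sketch("yz", 5, "123"))   # 123yz
```

The three printed values match the rows of the column example above.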

Here is another example, using the Velox array function array_except:

>>> col1 = ta.column([[1, 2, 3], [1, 2, 3], [1, 2, 2], [1, 2, 2]])
>>> col2 = ta.column([[4, 5, 6], [1, 2], [1, 1, 2], [1, 3, 4]])
# Velox's array_except function: https://facebookincubator.github.io/velox/functions/array.html#array_except
>>> functional.array_except(col1, col2)
0  [1, 2, 3]
1  [3]
2  []
3  [2]
dtype: List(Int64(nullable=True), nullable=True), length: 4, null_count: 0
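
The row-wise result above can be reproduced with a short pure-Python sketch. This is only an illustration of the semantics (elements of the first list that are absent from the second, without duplicates), not the Velox implementation. The name `array_except_sketch` is hypothetical, and preserving first-occurrence order is an assumption made for readability.

```python
from typing import Hashable, List, Sequence


def array_except_sketch(x: Sequence[Hashable], y: Sequence[Hashable]) -> List[Hashable]:
    """Return the distinct elements of `x` that do not appear in `y` (illustrative only)."""
    excluded = set(y)
    seen = set()
    result = []
    for value in x:
        # Skip values that are in `y` or that were already emitted.
        if value not in excluded and value not in seen:
            seen.add(value)
            result.append(value)
    return result


rows1 = [[1, 2, 3], [1, 2, 3], [1, 2, 2], [1, 2, 2]]
rows2 = [[4, 5, 6], [1, 2], [1, 1, 2], [1, 3, 4]]
print([array_except_sketch(a, b) for a, b in zip(rows1, rows2)])
# [[1, 2, 3], [3], [], [2]]
```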

Text Operations

add_tokens

Append or prepend a list of tokens/indices to a column.
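
As a rough illustration of the idea, the sketch below prepends or appends a fixed list of tokens to every row of a list-of-lists. It uses plain Python lists rather than TorchArrow columns; the function name `add_tokens_sketch` and the `begin` flag are hypothetical and are not taken from the add_tokens signature.

```python
from typing import List, Sequence


def add_tokens_sketch(
    rows: Sequence[Sequence[int]], tokens: Sequence[int], begin: bool = True
) -> List[List[int]]:
    """Prepend (begin=True) or append (begin=False) `tokens` to each row (illustrative only)."""
    if begin:
        return [list(tokens) + list(row) for row in rows]
    return [list(row) + list(tokens) for row in rows]


print(add_tokens_sketch([[1, 2], [3]], [0]))               # [[0, 1, 2], [0, 3]]
print(add_tokens_sketch([[1, 2], [3]], [0], begin=False))  # [[1, 2, 0], [3, 0]]
```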

Recommendation Operations

`bucketize`	Apply bucketization to the input feature.
`sigrid_hash`	Apply hashing to an index, or a list of indices.
`firstx`	Returns the first x values of the head of the input column.
`has_id_overlap`	Returns 1.0 if the two input columns overlap, otherwise 0.0.
`id_overlap_count`	Returns the number of overlaps between two lists of ids.
`get_max_count`	If any ids overlap between input_ids and matching_ids, contribute the maximum number of instances of the overlapping ids to the max count.
`get_jaccard_similarity`	Return the Jaccard similarity between input_ids and matching_ids.
`get_cosine_similarity`	Return the cosine similarity between the vector defined by input_ids weighted by input_id_scores and the vector defined by matching_ids weighted by matching_id_scores.
`get_score_sum`	Return the sum of all the scores in matching_id_scores whose corresponding id in matching_ids is also in input_ids.
`get_score_min`	Return the minimum of all the scores in matching_id_scores whose corresponding id in matching_ids is also in input_ids.
`get_score_max`	Return the maximum of all the scores in matching_id_scores whose corresponding id in matching_ids is also in input_ids.
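
The id-matching operations listed above can be sketched in plain Python for a single row. This is a sketch under stated assumptions, not the library's implementation: every function name here is hypothetical, ids are treated as sets (duplicate ids are ignored, which may differ from the library's handling), and empty inputs yield 0.0.

```python
import math
from typing import List, Sequence


def has_id_overlap_sketch(input_ids: Sequence[int], matching_ids: Sequence[int]) -> float:
    # 1.0 when the two id lists share at least one id, otherwise 0.0.
    return 1.0 if set(input_ids) & set(matching_ids) else 0.0


def jaccard_sketch(input_ids: Sequence[int], matching_ids: Sequence[int]) -> float:
    # |A intersect B| / |A union B|; assumption: 0.0 when both lists are empty.
    a, b = set(input_ids), set(matching_ids)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine_sketch(
    input_ids: Sequence[int],
    input_id_scores: Sequence[float],
    matching_ids: Sequence[int],
    matching_id_scores: Sequence[float],
) -> float:
    # Treat each (ids, scores) pair as a sparse vector keyed by id.
    u = dict(zip(input_ids, input_id_scores))
    v = dict(zip(matching_ids, matching_id_scores))
    dot = sum(w * v[i] for i, w in u.items() if i in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    # Assumption: 0.0 when either vector has zero norm.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def matched_scores_sketch(
    input_ids: Sequence[int],
    matching_ids: Sequence[int],
    matching_id_scores: Sequence[float],
) -> List[float]:
    # Scores from matching_id_scores whose id also appears in input_ids.
    wanted = set(input_ids)
    return [s for i, s in zip(matching_ids, matching_id_scores) if i in wanted]


input_ids, matching_ids = [1, 2, 3], [2, 3, 4]
scores = matched_scores_sketch(input_ids, matching_ids, [0.5, 1.5, 2.0])
print(has_id_overlap_sketch(input_ids, matching_ids))  # 1.0
print(jaccard_sketch(input_ids, matching_ids))         # 0.5
print(sum(scores), min(scores), max(scores))           # 2.0 0.5 1.5
```

The sum, minimum, and maximum in the last line correspond to the score-sum, score-min, and score-max descriptions, applied to the same filtered list of scores.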

High-level Operations

scale_to_0_1

Return the column data scaled to range [0,1].
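
Scaling to [0,1] is commonly done with min-max normalization, (x - min) / (max - min). The sketch below shows that formula on a plain Python list. It is an assumption that scale_to_0_1 follows this formula; the name `scale_to_0_1_sketch` is hypothetical, and returning zeros for a constant column is a choice made here to avoid division by zero.

```python
from typing import List, Sequence


def scale_to_0_1_sketch(values: Sequence[float]) -> List[float]:
    """Min-max scale `values` into the range [0, 1] (illustrative only)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    span = hi - lo
    # Assumption: a constant column maps to all zeros.
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


print(scale_to_0_1_sketch([1, 2, 3, 5]))  # [0.0, 0.25, 0.5, 1.0]
```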
