title	titleSuffix	description	author	ms.author	ms.service	ms.subservice	ms.custom	ms.topic	ms.date
Details and usage for all mapping data flow functions	Azure Data Factory & Azure Synapse	Learn about details of usage and functionality for all expression functions in mapping data flow.	kromerm	makromer	data-factory	data-flows	synapse	conceptual	03/05/2022

Data transformation expression usage in mapping data flow

[!INCLUDEappliesto-adf-asa-md]

[!INCLUDEdata-flow-preamble]

The following articles provide details about usage of all expressions and functions supported by Azure Data Factory and Azure Synapse Analytics in mapping data flows. For summaries of each type of function supported, reference the following articles:

Aggregate functions
Array functions
Cached lookup functions
Conversion functions
Date and time functions
Expression functions
Map functions
Metafunctions
Window functions

Alphabetical listing of all functions

Following is an alphabetical listing of all functions available in mapping data flows.

A

`abs`

abs(<value1> : number) => number

Absolute value of a number.

abs(-20) -> 20
abs(10) -> 10

`acos`

acos(<value1> : number) => double

Calculates a cosine inverse value.

acos(1) -> 0.0

`add`

add(<value1> : any, <value2> : any) => any

Adds a pair of strings or numbers. Adds a date to a number of days. Adds a duration to a timestamp. Appends one array of similar type to another. Same as the + operator.

add(10, 20) -> 30
10 + 20 -> 30
add('ice', 'cream') -> 'icecream'
'ice' + 'cream' + ' cone' -> 'icecream cone'
add(toDate('2012-12-12'), 3) -> toDate('2012-12-15')
toDate('2012-12-12') + 3 -> toDate('2012-12-15')
[10, 20] + [30, 40] -> [10, 20, 30, 40]
toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS') + (days(1) + hours(2) - seconds(10)) -> toTimestamp('2019-02-04 07:19:18.871', 'yyyy-MM-dd HH:mm:ss.SSS')

`addDays`

addDays(<date/timestamp> : datetime, <days to add> : integral) => datetime

Add days to a date or timestamp. Same as the + operator for date.

addDays(toDate('2016-08-08'), 1) -> toDate('2016-08-09')

`addMonths`

addMonths(<date/timestamp> : datetime, <months to add> : integral, [<value3> : string]) => datetime

Add months to a date or timestamp. You can optionally pass a timezone.

addMonths(toDate('2016-08-31'), 1) -> toDate('2016-09-30')
addMonths(toTimestamp('2016-09-30 10:10:10'), -1) -> toTimestamp('2016-08-31 10:10:10')

`and`

and(<value1> : boolean, <value2> : boolean) => boolean

Logical AND operator. Same as &&.

and(true, false) -> false
true && false -> false

`approxDistinctCount`

approxDistinctCount(<value1> : any, [ <value2> : double ]) => long

Gets the approximate aggregate count of distinct values for a column. The optional second parameter is to control the estimation error.

approxDistinctCount(ProductID, .05) => long

`array`

array([<value1> : any], ...) => array

Creates an array of items. All items should be of the same type. If no items are specified, an empty string array is the default. Same as a [] creation operator.

array('Seattle', 'Washington')
['Seattle', 'Washington']
['Seattle', 'Washington'][1]
'Washington'

`assertErrorMessages`

assertErrorMessages() => map

Returns a map of all error messages for the row with assert ID as the key.

Examples

assertErrorMessages() => ['assert1': 'This row failed on assert1.', 'assert2': 'This row failed on assert2.']. In this example, at(assertErrorMessages(), 'assert1') would return 'This row failed on assert1.'

`asin`

asin(<value1> : number) => double

Calculates an inverse sine value.

asin(0) -> 0.0

`associate`

reassociate(<value1> : map, <value2> : binaryFunction) => map

Creates a map of key/values. All the keys & values should be of the same type. If no items are specified, it's defaulted to a map of string to string type. Same as a [ -> ] creation operator. Keys and values should alternate with each other.

associate('fruit', 'apple', 'vegetable', 'carrot' )=> ['fruit' -> 'apple', 'vegetable' -> 'carrot']

`at`

at(<value1> : array/map, <value2> : integer/key type) => array

Finds the element at an array index. The index is 1-based. Out of bounds index results in a null value. Finds a value in a map given a key. If the key is not found, it returns null.

at(['apples', 'pears'], 1) => 'apples'
at(['fruit' -> 'apples', 'vegetable' -> 'carrot'], 'fruit') => 'apples'

`atan`

atan(<value1> : number) => double

Calculates an inverse tangent value.

atan(0) -> 0.0

`atan2`

atan2(<value1> : number, <value2> : number) => double

Returns the angle in radians between the positive x-axis of a plane and the point given by the coordinates.

atan2(0, 0) -> 0.0

`avg`

avg(<value1> : number) => number

Gets the average of values of a column.

avg(sales)

`avgIf`

avgIf(<value1> : boolean, <value2> : number) => number

Based on a criteria gets the average of values of a column.

avgIf(region == 'West', sales)

B

`between`

between(<value1> : any, <value2> : any, <value3> : any) => boolean

Checks if the first value is in between two other values inclusively. Numeric, string and datetime values can be compared

between(10, 5, 24)
true
between(currentDate(), currentDate() + 10, currentDate() + 20)
false

`bitwiseAnd`

bitwiseAnd(<value1> : integral, <value2> : integral) => integral

Bitwise And operator across integral types. Same as & operator

bitwiseAnd(0xf4, 0xef)
0xe4
(0xf4 & 0xef)
0xe4

`bitwiseOr`

bitwiseOr(<value1> : integral, <value2> : integral) => integral

Bitwise Or operator across integral types. Same as | operator

bitwiseOr(0xf4, 0xef)
0xff
(0xf4 | 0xef)
0xff

`bitwiseXor`

bitwiseXor(<value1> : any, <value2> : any) => any

Bitwise Or operator across integral types. Same as | operator

bitwiseXor(0xf4, 0xef)
0x1b
(0xf4 ^ 0xef)
0x1b
(true ^ false)
true
(true ^ true)
false

`blake2b`

blake2b(<value1> : integer, <value2> : any, ...) => string

Calculates the Blake2 digest of set of column of varying primitive datatypes given a bit length, which can only be multiples of 8 between 8 & 512. It can be used to calculate a fingerprint for a row

blake2b(256, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4'))
'c9521a5080d8da30dffb430c50ce253c345cc4c4effc315dab2162dac974711d'

`blake2bBinary`

blake2bBinary(<value1> : integer, <value2> : any, ...) => binary

Calculates the Blake2 digest of set of column of varying primitive datatypes given a bit length, which can only be multiples of 8 between 8 & 512. It can be used to calculate a fingerprint for a row

blake2bBinary(256, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4'))
unHex('c9521a5080d8da30dffb430c50ce253c345cc4c4effc315dab2162dac974711d')

`byItem`

byItem(<parent column> : any, <column name> : string) => any

Find a sub item within a structure or array of structure. If there are multiple matches, the first match is returned. If no match it returns a NULL value. The returned value has to be type converted by one of the type conversion actions(? date, ? string ...). Column names known at design time should be addressed just by their name. Computed inputs aren't supported but you can use parameter substitutions

byItem( byName('customer'), 'orderItems') ? (itemName as string, itemQty as integer)
byItem( byItem( byName('customer'), 'orderItems'), 'itemName') ? string

`byName`

byName(<column name> : string, [<stream name> : string]) => any

Selects a column value by name in the stream. You can pass an optional stream name as the second argument. If there are multiple matches, the first match is returned. If no match it returns a NULL value. The returned value has to be type converted by one of the type conversion functions(TO_DATE, TO_STRING ...). Column names known at design time should be addressed just by their name. Computed inputs aren't supported but you can use parameter substitutions.

toString(byName('parent'))
toLong(byName('income'))
toBoolean(byName('foster'))
toLong(byName($debtCol))
toString(byName('Bogus Column'))
toString(byName('Bogus Column', 'DeriveStream'))

`byNames`

byNames(<column names> : array, [<stream name> : string]) => any

Select an array of columns by name in the stream. You can pass an optional stream name as the second argument. If there are multiple matches, the first match is returned. If there are no matches for a column, the entire output is a NULL value. The returned value requires a type conversion function (toDate, toString, ...). Column names known at design time should be addressed just by their name. Computed inputs aren't supported but you can use parameter substitutions.

toString(byNames(['parent', 'child']))
byNames(['parent']) ? string
toLong(byNames(['income']))
byNames(['income']) ? long
toBoolean(byNames(['foster']))
toLong(byNames($debtCols))
toString(byNames(['a Column']))
toString(byNames(['a Column'], 'DeriveStream'))
byNames(['orderItem']) ? (itemName as string, itemQty as integer)

`byOrigin`

byOrigin(<column name> : string, [<origin stream name> : string]) => any

Selects a column value by name in the origin stream. The second argument is the origin stream name. If there are multiple matches, the first match is returned. If no match it returns a NULL value. The returned value has to be type converted by one of the type conversion functions(TO_DATE, TO_STRING ...). Column names known at design time should be addressed just by their name. Computed inputs aren't supported but you can use parameter substitutions.

toString(byOrigin('ancestor', 'ancestorStream'))

`byOrigins`

byOrigins(<column names> : array, [<origin stream name> : string]) => any

Selects an array of columns by name in the stream. The second argument is the stream where it originated from. If there are multiple matches, the first match is returned. If no match it returns a NULL value. The returned value has to be type converted by one of the type conversion functions(TO_DATE, TO_STRING ...) Column names known at design time should be addressed just by their name. Computed inputs aren't supported but you can use parameter substitutions.

toString(byOrigins(['ancestor1', 'ancestor2'], 'ancestorStream'))

`byPath`

byPath(<value1> : string, [<streamName> : string]) => any

Finds a hierarchical path by name in the stream. You can pass an optional stream name as the second argument. If no such path is found, it returns null. Column names/paths known at design time should be addressed just by their name or dot notation path. Computed inputs aren't supported but you can use parameter substitutions.

byPath('grandpa.parent.child') => column

`byPosition`

byPosition(<position> : integer) => any

Selects a column value by its relative position(1 based) in the stream. If the position is out of bounds, it returns a NULL value. The returned value has to be type converted by one of the type conversion functions(TO_DATE, TO_STRING ...) Computed inputs aren't supported but you can use parameter substitutions.

toString(byPosition(1))
toDecimal(byPosition(2), 10, 2)
toBoolean(byName(4))
toString(byName($colName))
toString(byPosition(1234))

C

`case`

case(<condition> : boolean, <true_expression> : any, <false_expression> : any, ...) => any

Based on alternating conditions applies one value or the other. If the number of inputs are even, the other is defaulted to NULL for last condition.

case(10 + 20 == 30, 'dumbo', 'gumbo') -> 'dumbo'
case(10 + 20 == 25, 'bojjus', 'do' < 'go', 'gunchus') -> 'gunchus'
isNull(case(10 + 20 == 25, 'bojjus', 'do' > 'go', 'gunchus')) -> true
case(10 + 20 == 25, 'bojjus', 'do' > 'go', 'gunchus', 'dumbo') -> 'dumbo'

`cbrt`

cbrt(<value1> : number) => double

Calculates the cube root of a number.

cbrt(8) -> 2.0

`ceil`

ceil(<value1> : number) => number

Returns the smallest integer not smaller than the number.

ceil(-0.1) -> 0

`coalesce`

coalesce(<value1> : any, ...) => any

Returns the first not null value from a set of inputs. All inputs should be of the same type.

coalesce(10, 20) -> 10
coalesce(toString(null), toString(null), 'dumbo', 'bo', 'go') -> 'dumbo'

`collect`

collect(<value1> : any) => array

Collects all values of the expression in the aggregated group into an array. Structures can be collected and transformed to alternate structures during this process. The number of items will be equal to the number of rows in that group and can contain null values. The number of collected items should be small.

collect(salesPerson)
collect(firstName + lastName))
collect(@(name = salesPerson, sales = salesAmount) )

`columnNames`

columnNames(<value1> : string, i><value1> : boolean) => array

Gets the names of all output columns for a stream. You can pass an optional stream name as the first argument. The second argument is also optional, with false as the default. If you set the second argument to true(), ADF will return only columns that are drifted via schema drift.

columnNames()
columnNames('DeriveStream')
columnNames('DeriveStream', true())
columnNames('', true())

`columns`

columns([<stream name> : string]) => any

Gets the values of all output columns for a stream. You can pass an optional stream name as the second argument.

columns()
columns('DeriveStream')

`compare`

compare(<value1> : any, <value2> : any) => integer

Compares two values of the same type. Returns negative integer if value1 < value2, 0 if value1 == value2, positive value if value1 > value2.

(compare(12, 24) < 1) -> true
(compare('dumbo', 'dum') > 0) -> true

`concat`

concat(<this> : string, <that> : string, ...) => string

Concatenates a variable number of strings together. Same as the + operator with strings.

concat('dataflow', 'is', 'awesome') -> 'dataflowisawesome'
'dataflow' + 'is' + 'awesome' -> 'dataflowisawesome'
isNull('sql' + null) -> true

`concatWS`

concatWS(<separator> : string, <this> : string, <that> : string, ...) => string

Concatenates a variable number of strings together with a separator. The first parameter is the separator.

concatWS(' ', 'dataflow', 'is', 'awesome') -> 'dataflow is awesome'
isNull(concatWS(null, 'dataflow', 'is', 'awesome')) -> true
concatWS(' is ', 'dataflow', 'awesome') -> 'dataflow is awesome'

`contains`

contains(<value1> : array, <value2> : unaryfunction) => boolean

Returns true if any element in the provided array evaluates as true in the provided predicate. Contains expects a reference to one element in the predicate function as #item.

contains([1, 2, 3, 4], #item == 3) -> true
contains([1, 2, 3, 4], #item > 5) -> false

`cos`

cos(<value1> : number) => double

Calculates a cosine value.

cos(10) -> -0.8390715290764524

`cosh`

cosh(<value1> : number) => double

Calculates a hyperbolic cosine of a value.

cosh(0) -> 1.0

`count`

count([<value1> : any]) => long

Gets the aggregate count of values. If the optional column(s) is specified, it ignores NULL values in the count.

count(custId)
count(custId, custName)
count()
count(iif(isNull(custId), 1, NULL))

`countAll`

countAll([<value1> : any]) => long

Gets the aggregate count of values including nulls.

countAll(custId)
countAll()

`countDistinct`

countDistinct(<value1> : any, [<value2> : any], ...) => long

Gets the aggregate count of distinct values of a set of columns.

countDistinct(custId, custName)

`countAllDistinct`

countAllDistinct(<value1> : any, [<value2> : any], ...) => long

Gets the aggregate count of distinct values of a set of columns including nulls.

countAllDistinct(custId, custName)

`countIf`

countIf(<value1> : boolean, [<value2> : any]) => long

Based on a criteria gets the aggregate count of values. If the optional column is specified, it ignores NULL values in the count.

countIf(state == 'CA' && commission < 10000, name)

`covariancePopulation`

covariancePopulation(<value1> : number, <value2> : number) => double

Gets the population covariance between two columns.

covariancePopulation(sales, profit)

`covariancePopulationIf`

covariancePopulationIf(<value1> : boolean, <value2> : number, <value3> : number) => double

Based on a criteria, gets the population covariance of two columns.

covariancePopulationIf(region == 'West', sales)

`covarianceSample`

covarianceSample(<value1> : number, <value2> : number) => double

Gets the sample covariance of two columns.

covarianceSample(sales, profit)

`covarianceSampleIf`

covarianceSampleIf(<value1> : boolean, <value2> : number, <value3> : number) => double

Based on a criteria, gets the sample covariance of two columns.

covarianceSampleIf(region == 'West', sales, profit)

`crc32`

crc32(<value1> : any, ...) => long

Calculates the CRC32 hash of set of column of varying primitive datatypes given a bit length, which can only be of values 0(256), 224, 256, 384, 512. It can be used to calculate a fingerprint for a row.

crc32(256, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4')) -> 3630253689L

`cumeDist`

cumeDist() => integer

The CumeDist function computes the position of a value relative to all values in the partition. The result is the number of rows preceding or equal to the current row in the ordering of the partition divided by the total number of rows in the window partition. Any tie values in the ordering will evaluate to the same position.

cumeDist()

`currentDate`

currentDate([<value1> : string]) => date

Gets the current date when this job starts to run. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer to Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

currentDate() == toDate('2250-12-31') -> false
currentDate('PST') == toDate('2250-12-31') -> false
currentDate('America/New_York') == toDate('2250-12-31') -> false

`currentTimestamp`

currentTimestamp() => timestamp

Gets the current timestamp when the job starts to run with local time zone.

currentTimestamp() == toTimestamp('2250-12-31 12:12:12') -> false

`currentUTC`

currentUTC([<value1> : string]) => timestamp

Gets the current timestamp as UTC. If you want your current time to be interpreted in a different timezone than your cluster time zone, you can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. It's defaulted to the current timezone. Refer to Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html. To convert the UTC time to a different timezone use fromUTC().

currentUTC() == toTimestamp('2050-12-12 19:18:12') -> false
currentUTC() != toTimestamp('2050-12-12 19:18:12') -> true
fromUTC(currentUTC(), 'Asia/Seoul') != toTimestamp('2050-12-12 19:18:12') -> true

D

`dayOfMonth`

dayOfMonth(<value1> : datetime) => integer

Gets the day of the month given a date.

dayOfMonth(toDate('2018-06-08')) -> 8

`dayOfWeek`

dayOfWeek(<value1> : datetime) => integer

Gets the day of the week given a date. 1 - Sunday, 2 - Monday ..., 7 - Saturday.

dayOfWeek(toDate('2018-06-08')) -> 6

`dayOfYear`

dayOfYear(<value1> : datetime) => integer

Gets the day of the year given a date.

dayOfYear(toDate('2016-04-09')) -> 100

`days`

days(<value1> : integer) => long

Duration in milliseconds for number of days.

days(2) -> 172800000L

`degrees`

degrees(<value1> : number) => double

Converts radians to degrees.

degrees(3.141592653589793) -> 180

`denseRank`

denseRank() => integer

Computes the rank of a value in a group of values specified in a window's order by clause. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values won't produce gaps in the sequence. Dense Rank works even when data isn't sorted and looks for change in values.

denseRank()

`distinct`

distinct(<value1> : array) => array

Returns a distinct set of items from an array.

distinct([10, 20, 30, 10]) => [10, 20, 30]

`divide`

divide(<value1> : any, <value2> : any) => any

Divides pair of numbers. Same as the / operator.

divide(20, 10) -> 2
20 / 10 -> 2

`dropLeft`

dropLeft(<value1> : string, <value2> : integer) => string

Removes as many characters from the left of the string. If the drop requested exceeds the length of the string, an empty string is returned.

dropLeft('bojjus', 2) => 'jjus'
dropLeft('cake', 10) => ''

`dropRight`

dropRight(<value1> : string, <value2> : integer) => string

Removes as many characters from the right of the string. If the drop requested exceeds the length of the string, an empty string is returned.

dropRight('bojjus', 2) => 'bojj'
dropRight('cake', 10) => ''

E

`endsWith`

endsWith(<string> : string, <substring to check> : string) => boolean

Checks if the string ends with the supplied string.

endsWith('dumbo', 'mbo') -> true

`equals`

equals(<value1> : any, <value2> : any) => boolean

Comparison equals operator. Same as == operator.

equals(12, 24) -> false
12 == 24 -> false
'bad' == 'bad' -> true
isNull('good' == toString(null)) -> true
isNull(null == null) -> true

`equalsIgnoreCase`

equalsIgnoreCase(<value1> : string, <value2> : string) => boolean

Comparison equals operator ignoring case. Same as <=> operator.

'abc'<=>'Abc' -> true
equalsIgnoreCase('abc', 'Abc') -> true

`escape`

escape(<string_to_escape> : string, <format> : string) => string

Escapes a string according to a format. Literal values for acceptable format are 'json', 'xml', 'ecmascript', 'html', 'java'.

`except`

except(<value1> : array, <value2> : array) => array

Returns a difference set of one array from another dropping duplicates.

except([10, 20, 30], [20, 40]) => [10, 30]

`expr`

expr(<expr> : string) => any

Results in an expression from a string. This is the same as writing this expression in a non-literal form. This can be used to pass parameters as string representations.

expr('price * discount') => any

F

`factorial`

factorial(<value1> : number) => long

Calculates the factorial of a number.

factorial(5) -> 120

`false`

false() => boolean

Always returns a false value. Use the function syntax(false()) if there's a column named 'false'.

(10 + 20 > 30) -> false
(10 + 20 > 30) -> false()

`filter`

filter(<value1> : array, <value2> : unaryfunction) => array

Filters elements out of the array that don't meet the provided predicate. Filter expects a reference to one element in the predicate function as #item.

filter([1, 2, 3, 4], #item > 2) -> [3, 4]
filter(['a', 'b', 'c', 'd'], #item == 'a' || #item == 'b') -> ['a', 'b']

`find`

find(<value1> : array, <value2> : unaryfunction) => any

Find the first item from an array that matches the condition. It takes a filter function where you can address the item in the array as #item. For deeply nested maps you can refer to the parent maps using the #item_n(#item_1, #item_2...) notation.

find([10, 20, 30], #item > 10) -> 20
find(['azure', 'data', 'factory'], length(#item) > 4) -> 'azure'
find([ @( name = 'Daniel', types = [ @(mood = 'jovial', behavior = 'terrific'), @(mood = 'grumpy', behavior = 'bad') ] ), @( name = 'Mark', types = [ @(mood = 'happy', behavior = 'awesome'), @(mood = 'calm', behavior = 'reclusive') ] ) ], contains(#item.types, #item.mood=='happy') /*Filter out the happy kid*/ )
@( name = 'Mark', types = [ @(mood = 'happy', behavior = 'awesome'), @(mood = 'calm', behavior = 'reclusive') ] )

`first`

first(<value1> : any, [<value2> : boolean]) => any

Gets the first value of a column group. If the second parameter ignoreNulls is omitted, it's assumed false.

first(sales)
first(sales, false)

`flatten`

flatten(<array> : array, <value2> : array ..., <value2> : boolean) => array

Flattens array or arrays into a single array. Arrays of atomic items are returned unaltered. The last argument is optional and is defaulted to false to flatten recursively more than one level deep.

flatten([['bojjus', 'girl'], ['gunchus', 'boy']]) => ['bojjus', 'girl', 'gunchus', 'boy']
flatten([[['bojjus', 'gunchus']]] , true) => ['bojjus', 'gunchus']

`floor`

floor(<value1> : number) => number

Returns the largest integer not greater than the number.

floor(-0.1) -> -1

`fromBase64`

fromBase64(<value1> : string, <encoding type> : string) => string

Decodes the given base64-encoded string. You can optionally pass the encoding type.

fromBase64('Z3VuY2h1cw==') -> 'gunchus'
fromBase64('SGVsbG8gV29ybGQ=', 'Windows-1252') -> 'Hello World'

`fromUTC`

fromUTC(<value1> : timestamp, [<value2> : string]) => timestamp

Converts to the timestamp from UTC. You can optionally pass the timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. It's defaulted to the current timezone. Refer to Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

fromUTC(currentTimestamp()) == toTimestamp('2050-12-12 19:18:12') -> false
fromUTC(currentTimestamp(), 'Asia/Seoul') != toTimestamp('2050-12-12 19:18:12') -> true

G

`greater`

greater(<value1> : any, <value2> : any) => boolean

Comparison greater operator. Same as > operator.

greater(12, 24) -> false
('dumbo' > 'dum') -> true
(toTimestamp('2019-02-05 08:21:34.890', 'yyyy-MM-dd HH:mm:ss.SSS') > toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS')) -> true

`greaterOrEqual`

greaterOrEqual(<value1> : any, <value2> : any) => boolean

Comparison greater than or equal operator. Same as >= operator.

greaterOrEqual(12, 12) -> true
('dumbo' >= 'dum') -> true

`greatest`

greatest(<value1> : any, ...) => any

Returns the greatest value among the list of values as input skipping null values. Returns null if all inputs are null.

greatest(10, 30, 15, 20) -> 30
greatest(10, toInteger(null), 20) -> 20
greatest(toDate('2010-12-12'), toDate('2011-12-12'), toDate('2000-12-12')) -> toDate('2011-12-12')
greatest(toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS'), toTimestamp('2019-02-05 08:21:34.890', 'yyyy-MM-dd HH:mm:ss.SSS')) -> toTimestamp('2019-02-05 08:21:34.890', 'yyyy-MM-dd HH:mm:ss.SSS')

H

`hasColumn`

hasColumn(<column name> : string, [<stream name> : string]) => boolean

Checks for a column value by name in the stream. You can pass an optional stream name as the second argument. Column names known at design time should be addressed just by their name. Computed inputs aren't supported but you can use parameter substitutions.

hasColumn('parent')

`hasError`

hasError([<value1> : string]) => boolean

Checks if the assert with provided ID is marked as error.

Examples

hasError('assert1')
hasError('assert2')

`hasPath`

hasPath(<value1> : string, [<streamName> : string]) => boolean

Checks if a certain hierarchical path exists by name in the stream. You can pass an optional stream name as the second argument. Column names/paths known at design time should be addressed just by their name or dot notation path. Computed inputs aren't supported but you can use parameter substitutions.

hasPath('grandpa.parent.child') => boolean

`hex`

hex(<value1>: binary) => string

Returns a hex string representation of a binary value

hex(toBinary([toByte(0x1f), toByte(0xad), toByte(0xbe)])) -> '1fadbe'

`hour`

hour(<value1> : timestamp, [<value2> : string]) => integer

Gets the hour value of a timestamp. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer to Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

hour(toTimestamp('2009-07-30 12:58:59')) -> 12
hour(toTimestamp('2009-07-30 12:58:59'), 'PST') -> 12

`hours`

hours(<value1> : integer) => long

Duration in milliseconds for number of hours.

hours(2) -> 7200000L

I

`iif`

iif(<condition> : boolean, <true_expression> : any, [<false_expression> : any]) => any

Based on a condition applies one value or the other. If other is unspecified, it's considered NULL. Both the values must be compatible(numeric, string...).

iif(10 + 20 == 30, 'dumbo', 'gumbo') -> 'dumbo'
iif(10 > 30, 'dumbo', 'gumbo') -> 'gumbo'
iif(month(toDate('2018-12-01')) == 12, 345.12, 102.67) -> 345.12

`iifNull`

iifNull(<value1> : any, [<value2> : any], ...) => any

Given two or more inputs, returns the first not null item. This function is equivalent to coalesce.

iifNull(10, 20) -> 10
iifNull(null, 20, 40) -> 20
iifNull('azure', 'data', 'factory') -> 'azure'
iifNull(null, 'data', 'factory') -> 'data'

`in`

in(<array of items> : array, <item to find> : any) => boolean

Checks if an item is in the array.

in([10, 20, 30], 10) -> true
in(['good', 'kid'], 'bad') -> false

`initCap`

initCap(<value1> : string) => string

Converts the first letter of every word to uppercase. Words are identified as separated by whitespace.

initCap('cool iceCREAM') -> 'Cool Icecream'

`instr`

instr(<string> : string, <substring to find> : string) => integer

Finds the position(1 based) of the substring within a string. 0 is returned if not found.

instr('dumbo', 'mbo') -> 3
instr('microsoft', 'o') -> 5
instr('good', 'bad') -> 0

`intersect`

intersect(<value1> : array, <value2> : array) => array

Returns an intersection set of distinct items from 2 arrays.

intersect([10, 20, 30], [20, 40]) => [20]

`isBitSet`

isBitSet (<value1> : array, <value2>:integer ) => boolean

Checks if a bit position is set in this bitset

isBitSet(toBitSet([10, 32, 98]), 10) => true

`isBoolean`

isBoolean(<value1>: string) => boolean

Checks if the string value is a boolean value according to the rules of toBoolean()

isBoolean('true') -> true
isBoolean('no') -> true
isBoolean('microsoft') -> false

`isByte`

isByte(<value1> : string) => boolean

Checks if the string value is a byte value given an optional format according to the rules of toByte()

isByte('123') -> true
isByte('chocolate') -> false

`isDate`

isDate (<value1> : string, [<format>: string]) => boolean

Checks if the input date string is a date using an optional input date format. Refer to Java's SimpleDateFormat for available formats. If the input date format is omitted, default format is yyyy-[M]M-[d]d. Accepted formats are [ yyyy, yyyy-[M]M, yyyy-[M]M-[d]d, yyyy-[M]M-[d]dT* ]

isDate('2012-8-18') -> true
isDate('12/18--234234' -> 'MM/dd/yyyy') -> false

`isDecimal`

isDecimal (<value1> : string) => boolean

Checks if the string value is a decimal value given an optional format according to the rules of toDecimal()

isDecimal('123.45') -> true
isDecimal('12/12/2000') -> false

`isDelete`

isDelete([<value1> : integer]) => boolean

Checks if the row is marked for delete. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.

isDelete()
isDelete(1)

`isDistinct`

isDistinct(<value1> : any , <value1> : any) => boolean

Finds if a column or set of columns is distinct. It doesn't count null as a distinct value

isDistinct(custId, custName) => boolean

`isDouble`

isDouble (<value1> : string, [<format>: string]) => boolean

Checks if the string value is a double value given an optional format according to the rules of toDouble()

isDouble('123') -> true
isDouble('$123.45' -> '$###.00') -> true
isDouble('icecream') -> false

`isError`

isError([<value1> : integer]) => boolean

Checks if the row is marked as error. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.

isError()
isError(1)

`isFloat`

isFloat (<value1> : string, [<format>: string]) => boolean

Checks if the string value is a float value given an optional format according to the rules of toFloat()

isFloat('123') -> true
isFloat('$123.45' -> '$###.00') -> true
isFloat('icecream') -> false

`isIgnore`

isIgnore([<value1> : integer]) => boolean

Checks if the row is marked to be ignored. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.

isIgnore()
isIgnore(1)

`isInsert`

isInsert([<value1> : integer]) => boolean

Checks if the row is marked for insert. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.

isInsert()
isInsert(1)

`isInteger`

isInteger (<value1> : string, [<format>: string]) => boolean

Checks if the string value is an integer value given an optional format according to the rules of toInteger()

isInteger('123') -> true
isInteger('$123' -> '$###') -> true
isInteger('microsoft') -> false

`isLong`

isLong (<value1> : string, [<format>: string]) => boolean

Checks if the string value is a long value given an optional format according to the rules of toLong()

isLong('123') -> true
isLong('$123' -> '$###') -> true
isLong('gunchus') -> false

`isMatch`

isMatch([<value1> : integer]) => boolean

Checks if the row is matched at lookup. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.

isMatch()
isMatch(1)

`isNan`

isNan (<value1> : integral) => boolean

Check if this isn't a number.

isNan(10.2) => false

`isNull`

isNull(<value1> : any) => boolean

Checks if the value is NULL.

isNull(NULL()) -> true
isNull('') -> false

`isShort`

isShort (<value1> : string, [<format>: string]) => boolean

Checks if the string value is a short value given an optional format according to the rules of toShort()

isShort('123') -> true
isShort('$123' -> '$###') -> true
isShort('microsoft') -> false

`isTimestamp`

isTimestamp (<value1> : string, [<format>: string]) => boolean

Checks if the input date string is a timestamp using an optional input timestamp format. Refer to Java's SimpleDateFormat for available formats. If the timestamp is omitted the default pattern yyyy-[M]M-[d]d hh:mm:ss[.f...] is used. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. Timestamp supports up to millisecond accuracy with value of 999 Refer to Java's SimpleDateFormat for available formats.

isTimestamp('2016-12-31 00:12:00') -> true
isTimestamp('2016-12-31T00:12:00' -> 'yyyy-MM-dd\\'T\\'HH:mm:ss' -> 'PST') -> true
isTimestamp('2012-8222.18') -> false

`isUpdate`

isUpdate([<value1> : integer]) => boolean

Checks if the row is marked for update. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.

isUpdate()
isUpdate(1)

`isUpsert`

isUpsert([<value1> : integer]) => boolean

Checks if the row is marked for insert. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.

isUpsert()
isUpsert(1)

J

`jaroWinkler`

jaroWinkler(<value1> : string, <value2> : string) => double

Gets the JaroWinkler distance between two strings.

jaroWinkler('frog', 'frog') => 1.0

K

`keyValues`

keyValues(<value1> : array, <value2> : array) => map

Creates a map of key/values. The first parameter is an array of keys and second is the array of values. Both arrays should have equal length.

keyValues(['bojjus', 'appa'], ['gunchus', 'ammi']) => ['bojjus' -> 'gunchus', 'appa' -> 'ammi']

`kurtosis`

kurtosis(<value1> : number) => double

Gets the kurtosis of a column.

kurtosis(sales)

`kurtosisIf`

kurtosisIf(<value1> : boolean, <value2> : number) => double

Based on a criteria, gets the kurtosis of a column.

kurtosisIf(region == 'West', sales)

L

`lag`

lag(<value> : any, [<number of rows to look before> : number], [<default value> : any]) => any

Gets the value of the first parameter evaluated n rows before the current row. The second parameter is the number of rows to look back and the default value is 1. If there aren't as many rows a value of null is returned unless a default value is specified.

lag(amount, 2)
lag(amount, 2000, 100)

`last`

last(<value1> : any, [<value2> : boolean]) => any

Gets the last value of a column group. If the second parameter ignoreNulls is omitted, it's assumed false.

last(sales)
last(sales, false)

`lastDayOfMonth`

lastDayOfMonth(<value1> : datetime) => date

Gets the last date of the month given a date.

lastDayOfMonth(toDate('2009-01-12')) -> toDate('2009-01-31')

`lead`

lead(<value> : any, [<number of rows to look after> : number], [<default value> : any]) => any

Gets the value of the first parameter evaluated n rows after the current row. The second parameter is the number of rows to look forward and the default value is 1. If there aren't as many rows a value of null is returned unless a default value is specified.

lead(amount, 2)
lead(amount, 2000, 100)

`least`

least(<value1> : any, ...) => any

Comparison lesser than or equal operator. Same as <= operator.

least(10, 30, 15, 20) -> 10
least(toDate('2010-12-12'), toDate('2011-12-12'), toDate('2000-12-12')) -> toDate('2000-12-12')

`left`

left(<string to subset> : string, <number of characters> : integral) => string

Extracts a substring start at index 1 with number of characters. Same as SUBSTRING(str, 1, n).

left('bojjus', 2) -> 'bo'
left('bojjus', 20) -> 'bojjus'

`length`

length(<value1> : string) => integer

Returns the length of the string.

length('dumbo') -> 5

`lesser`

lesser(<value1> : any, <value2> : any) => boolean

Comparison less operator. Same as < operator.

lesser(12, 24) -> true
('abcd' < 'abc') -> false
(toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS') < toTimestamp('2019-02-05 08:21:34.890', 'yyyy-MM-dd HH:mm:ss.SSS')) -> true

`lesserOrEqual`

lesserOrEqual(<value1> : any, <value2> : any) => boolean

Comparison lesser than or equal operator. Same as <= operator.

lesserOrEqual(12, 12) -> true
('dumbo' <= 'dum') -> false

`levenshtein`

levenshtein(<from string> : string, <to string> : string) => integer

Gets the levenshtein distance between two strings.

levenshtein('boys', 'girls') -> 4

`like`

like(<string> : string, <pattern match> : string) => boolean

The pattern is a string that is matched literally. The exceptions are the following special symbols: _ matches any one character in the input (similar to . in posix regular expressions) % matches zero or more characters in the input (similar to .* in posix regular expressions). The escape character is ''. If an escape character precedes a special symbol or another escape character, the following character is matched literally. It's invalid to escape any other character.

like('icecream', 'ice%') -> true

`locate`

locate(<substring to find> : string, <string> : string, [<from index - 1-based> : integral]) => integer

Finds the position(1 based) of the substring within a string starting a certain position. If the position is omitted, it's considered from the beginning of the string. 0 is returned if not found.

locate('mbo', 'dumbo') -> 3
locate('o', 'microsoft', 6) -> 7
locate('bad', 'good') -> 0

`log`

log(<value1> : number, [<value2> : number]) => double

Calculates log value. An optional base can be supplied else a Euler number if used.

log(100, 10) -> 2

`log10`

log10(<value1> : number) => double

Calculates log value based on 10 base.

log10(100) -> 2

`lookup`

lookup(key, key2, ...) => complex[]

Looks up the first row from the cached sink using the specified keys that match the keys from the cached sink.

cacheSink#lookup(movieId)

`lower`

lower(<value1> : string) => string

Lowercases a string.

lower('GunChus') -> 'gunchus'

`lpad`

lpad(<string to pad> : string, <final padded length> : integral, <padding> : string) => string

Left pads the string by the supplied padding until it is of a certain length. If the string is equal to or greater than the length, then it's trimmed to the length.

lpad('dumbo', 10, '-') -> '-----dumbo'
lpad('dumbo', 4, '-') -> 'dumb'

`ltrim`

ltrim(<string to trim> : string, [<trim characters> : string]) => string

Left trims a string of leading characters. If second parameter is unspecified, it trims whitespace. Else it trims any character specified in the second parameter.

ltrim(' dumbo ') -> 'dumbo '
ltrim('!--!du!mbo!', '-!') -> 'du!mbo!'

M

`map`

map(<value1> : array, <value2> : unaryfunction) => any

Maps each element of the array to a new element using the provided expression. Map expects a reference to one element in the expression function as #item.

map([1, 2, 3, 4], #item + 2) -> [3, 4, 5, 6]
map(['a', 'b', 'c', 'd'], #item + '_processed') -> ['a_processed', 'b_processed', 'c_processed', 'd_processed']

`mapAssociation`

mapAssociation(<value1> : map, <value2> : binaryFunction) => array

Transforms a map by associating the keys to new values. Returns an array. It takes a mapping function where you can address the item as #key and current value as #value.

mapAssociation(['bojjus' -> 'gunchus', 'appa' -> 'ammi'], @(key = #key, value = #value)) => [@(key = 'bojjus', value = 'gunchus'), @(key = 'appa', value = 'ammi')]

`mapIf`

mapIf (<value1> : array, <value2> : binaryfunction, <value3>: binaryFunction) => any

Conditionally maps an array to another array of same or smaller length. The values can be of any datatype including structTypes. It takes a mapping function where you can address the item in the array as #item and current index as #index. For deeply nested maps you can refer to the parent maps using the #item_[n](#item_1, #index_1...) notation.

mapIf([10, 20, 30], #item > 10, #item + 5) -> [25, 35]
mapIf(['icecream', 'cake', 'soda'], length(#item) > 4, upper(#item)) -> ['ICECREAM', 'CAKE']

`mapIndex`

mapIndex(<value1> : array, <value2> : binaryfunction) => any

Maps each element of the array to a new element using the provided expression. Map expects a reference to one element in the expression function as #item and a reference to the element index as #index.

mapIndex([1, 2, 3, 4], #item + 2 + #index) -> [4, 6, 8, 10]

`mapLoop`

mapLoop(<value1> : integer, <value2> : unaryfunction) => any

Loops through from 1 to length to create an array of that length. It takes a mapping function where you can address the index in the array as #index. For deeply nested maps you can refer to the parent maps using the #index_n(#index_1, #index_2...) notation.

mapLoop(3, #index * 10) -> [10, 20, 30]

`max`

max(<value1> : any) => any

Gets the maximum value of a column.

max(sales)

`maxIf`

maxIf(<value1> : boolean, <value2> : any) => any

Based on a criteria, gets the maximum value of a column.

maxIf(region == 'West', sales)

`md5`

md5(<value1> : any, ...) => string

Calculates the MD5 digest of set of column of varying primitive datatypes and returns a 32-character hex string. It can be used to calculate a fingerprint for a row.

md5(5, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4')) -> '4ce8a880bd621a1ffad0bca905e1bc5a'

`mean`

mean(<value1> : number) => number

Gets the mean of values of a column. Same as AVG.

mean(sales)

`meanIf`

meanIf(<value1> : boolean, <value2> : number) => number

Based on a criteria gets the mean of values of a column. Same as avgIf.

meanIf(region == 'West', sales)

`millisecond`

millisecond(<value1> : timestamp, [<value2> : string]) => integer

Gets the millisecond value of a date. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer to Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

millisecond(toTimestamp('2009-07-30 12:58:59.871', 'yyyy-MM-dd HH:mm:ss.SSS')) -> 871

`milliseconds`

milliseconds(<value1> : integer) => long

Duration in milliseconds for number of milliseconds.

milliseconds(2) -> 2L

`min`

min(<value1> : any) => any

Gets the minimum value of a column.

min(sales)

`minIf`

minIf(<value1> : boolean, <value2> : any) => any

Based on a criteria, gets the minimum value of a column.

minIf(region == 'West', sales)

`minus`

minus(<value1> : any, <value2> : any) => any

Subtracts numbers. Subtract number of days from a date. Subtract duration from a timestamp. Subtract two timestamps to get difference in milliseconds. Same as the - operator.

minus(20, 10) -> 10
20 - 10 -> 10
minus(toDate('2012-12-15'), 3) -> toDate('2012-12-12')
toDate('2012-12-15') - 3 -> toDate('2012-12-12')
toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS') + (days(1) + hours(2) - seconds(10)) -> toTimestamp('2019-02-04 07:19:18.871', 'yyyy-MM-dd HH:mm:ss.SSS')
toTimestamp('2019-02-03 05:21:34.851', 'yyyy-MM-dd HH:mm:ss.SSS') - toTimestamp('2019-02-03 05:21:36.923', 'yyyy-MM-dd HH:mm:ss.SSS') -> -2072

`minute`

minute(<value1> : timestamp, [<value2> : string]) => integer

Gets the minute value of a timestamp. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer to Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

minute(toTimestamp('2009-07-30 12:58:59')) -> 58
minute(toTimestamp('2009-07-30 12:58:59'), 'PST') -> 58

`minutes`

minutes(<value1> : integer) => long

Duration in milliseconds for number of minutes.

minutes(2) -> 120000L

`mlookup`

mlookup(key, key2, ...) => complex[]

Looks up the all matching rows from the cached sink using the specified keys that match the keys from the cached sink.

cacheSink#mlookup(movieId)

`mod`

mod(<value1> : any, <value2> : any) => any

Modulus of pair of numbers. Same as the % operator.

mod(20, 8) -> 4
20 % 8 -> 4

`month`

month(<value1> : datetime) => integer

Gets the month value of a date or timestamp.

month(toDate('2012-8-8')) -> 8

`monthsBetween`

monthsBetween(<from date/timestamp> : datetime, <to date/timestamp> : datetime, [<roundoff> : boolean], [<time zone> : string]) => double

Gets the number of months between two dates. You can round off the calculation. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer to Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

monthsBetween(toTimestamp('1997-02-28 10:30:00'), toDate('1996-10-30')) -> 3.94959677

`multiply`

multiply(<value1> : any, <value2> : any) => any

Multiplies pair of numbers. Same as the * operator.

multiply(20, 10) -> 200
20 * 10 -> 200

N

`negate`

negate(<value1> : number) => number

Negates a number. Turns positive numbers to negative and vice versa.

negate(13) -> -13

`nextSequence`

nextSequence() => long

Returns the next unique sequence. The number is consecutive only within a partition and is prefixed by the partitionId.

nextSequence() == 12313112 -> false

`normalize`

normalize(<String to normalize> : string) => string

Normalizes the string value to separate accented unicode characters.

regexReplace(normalize('bo²s'), `\p{M}`, '') -> 'boys'

`not`

not(<value1> : boolean) => boolean

Logical negation operator.

not(true) -> false
not(10 == 20) -> true

`notEquals`

notEquals(<value1> : any, <value2> : any) => boolean

Comparison not equals operator. Same as != operator.

12 != 24 -> true
'bojjus' != 'bo' + 'jjus' -> false

`notNull`

notNull(<value1> : any) => boolean

Checks if the value isn't NULL.

notNull(NULL()) -> false
notNull('') -> true

`nTile`

nTile([<value1> : integer]) => integer

The NTile function divides the rows for each window partition into n buckets ranging from 1 to at most n. Bucket values will differ by at most 1. If the number of rows in the partition doesn't divide evenly into the number of buckets, then the remainder values are distributed one per bucket, starting with the first bucket. The NTile function is useful for the calculation of tertiles, quartiles, deciles, and other common summary statistics. The function calculates two variables during initialization: The size of a regular bucket will have one extra row added to it. Both variables are based on the size of the current partition. During the calculation process the function keeps track of the current row number, the current bucket number, and the row number at which the bucket will change (bucketThreshold). When the current row number reaches bucket threshold, the bucket value is increased by one and the threshold is increased by the bucket size (plus one extra if the current bucket is padded).

nTile()
nTile(numOfBuckets)

`null`

null() => null

Returns a NULL value. Use the function syntax(null()) if there's a column named 'null'. Any operation that uses will result in a NULL.

isNull('dumbo' + null) -> true
isNull(10 * null) -> true
isNull('') -> false
isNull(10 + 20) -> false
isNull(10/0) -> true

O

`or`

or(<value1> : boolean, <value2> : boolean) => boolean

Logical OR operator. Same as ||.

or(true, false) -> true
true || false -> true

`originColumns`

originColumns(<streamName> : string) => any

Gets all output columns for an origin stream where columns were created. Must be enclosed in another function.

array(toString(originColumns('source1')))

`output`

output() => any

Returns the first row of the results of the cache sink

cacheSink#output()

`outputs`

output() => any

Returns the entire output row set of the results of the cache sink

cacheSink#outputs()

P

`partitionId`

partitionId() => integer

Returns the current partition ID the input row is in.

partitionId()

`pMod`

pMod(<value1> : any, <value2> : any) => any

Positive Modulus of pair of numbers.

pmod(-20, 8) -> 4

`power`

power(<value1> : number, <value2> : number) => double

Raises one number to the power of another.

power(10, 2) -> 100

R

`radians`

radians(<value1> : number) => double

Converts degrees to radians

radians(180) => 3.141592653589793

`random`

random(<value1> : integral) => long

Returns a random number given an optional seed within a partition. The seed should be a fixed value and is used with the partitionId to produce random values

random(1) == 1 -> false

`rank`

rank() => integer

Computes the rank of a value in a group of values specified in a window's order by clause. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values will produce gaps in the sequence. Rank works even when data isn't sorted and looks for change in values.

rank()

`reassociate`

reassociate(<value1> : map, <value2> : binaryFunction) => map

Transforms a map by associating the keys to new values. It takes a mapping function where you can address the item as #key and current value as #value.

reassociate(['fruit' -> 'apple', 'vegetable' -> 'tomato'], substring(#key, 1, 1) + substring(#value, 1, 1)) => ['fruit' -> 'fa', 'vegetable' -> 'vt']

`reduce`

reduce(<value1> : array, <value2> : any, <value3> : binaryfunction, <value4> : unaryfunction) => any

Accumulates elements in an array. Reduce expects a reference to an accumulator and one element in the first expression function as #acc and #item and it expects the resulting value as #result to be used in the second expression function.

toString(reduce(['1', '2', '3', '4'], '0', #acc + #item, #result)) -> '01234'

`regexExtract`

regexExtract(<string> : string, <regex to find> : string, [<match group 1-based index> : integral]) => string

Extract a matching substring for a given regex pattern. The last parameter identifies the match group and is defaulted to 1 if omitted. Use <regex>(back quote) to match a string without escaping.

regexExtract('Cost is between 600 and 800 dollars', '(\\d+) and (\\d+)', 2) -> '800'
regexExtract('Cost is between 600 and 800 dollars', `(\d+) and (\d+)`, 2) -> '800'

`regexMatch`

regexMatch(<string> : string, <regex to match> : string) => boolean

Checks if the string matches the given regex pattern. Use <regex>(back quote) to match a string without escaping.

regexMatch('200.50', '(\\d+).(\\d+)') -> true
regexMatch('200.50', `(\d+).(\d+)`) -> true

`regexReplace`

regexReplace(<string> : string, <regex to find> : string, <substring to replace> : string) => string

Replace all occurrences of a regex pattern with another substring in the given string Use <regex>(back quote) to match a string without escaping.

regexReplace('100 and 200', '(\\d+)', 'bojjus') -> 'bojjus and bojjus'
regexReplace('100 and 200', `(\d+)`, 'gunchus') -> 'gunchus and gunchus'

`regexSplit`

regexSplit(<string to split> : string, <regex expression> : string) => array

Splits a string based on a delimiter based on regex and returns an array of strings.

regexSplit('bojjusAgunchusBdumbo', `[CAB]`) -> ['bojjus', 'gunchus', 'dumbo']
regexSplit('bojjusAgunchusBdumboC', `[CAB]`) -> ['bojjus', 'gunchus', 'dumbo', '']
(regexSplit('bojjusAgunchusBdumboC', `[CAB]`)[1]) -> 'bojjus'
isNull(regexSplit('bojjusAgunchusBdumboC', `[CAB]`)[20]) -> true

`replace`

replace(<string> : string, <substring to find> : string, [<substring to replace> : string]) => string

Replace all occurrences of a substring with another substring in the given string. If the last parameter is omitted, it's default to empty string.

replace('doggie dog', 'dog', 'cat') -> 'catgie cat'
replace('doggie dog', 'dog', '') -> 'gie '
replace('doggie dog', 'dog') -> 'gie '

`reverse`

reverse(<value1> : string) => string

Reverses a string.

reverse('gunchus') -> 'suhcnug'

`right`

right(<string to subset> : string, <number of characters> : integral) => string

Extracts a substring with number of characters from the right. Same as SUBSTRING(str, LENGTH(str) - n, n).

right('bojjus', 2) -> 'us'
right('bojjus', 20) -> 'bojjus'

`rlike`

rlike(<string> : string, <pattern match> : string) => boolean

Checks if the string matches the given regex pattern.

rlike('200.50', `(\d+).(\d+)`) -> true
rlike('bogus', `M[0-9]+.*`) -> false

`round`

round(<number> : number, [<scale to round> : number], [<rounding option> : integral]) => double

Rounds a number given an optional scale and an optional rounding mode. If the scale is omitted, it's defaulted to 0. If the mode is omitted, it's defaulted to ROUND_HALF_UP(5). The values for rounding include 1 - ROUND_UP 2 - ROUND_DOWN 3 - ROUND_CEILING 4 - ROUND_FLOOR 5 - ROUND_HALF_UP 6 - ROUND_HALF_DOWN 7 - ROUND_HALF_EVEN 8 - ROUND_UNNECESSARY.

round(100.123) -> 100.0
round(2.5, 0) -> 3.0
round(5.3999999999999995, 2, 7) -> 5.40

`rowNumber`

rowNumber() => integer

Assigns a sequential row numbering for rows in a window starting with 1.

rowNumber()

`rpad`

rpad(<string to pad> : string, <final padded length> : integral, <padding> : string) => string

Right pads the string by the supplied padding until it is of a certain length. If the string is equal to or greater than the length, then it's trimmed to the length.

rpad('dumbo', 10, '-') -> 'dumbo-----'
rpad('dumbo', 4, '-') -> 'dumb'
rpad('dumbo', 8, '<>') -> 'dumbo<><'

`rtrim`

rtrim(<string to trim> : string, [<trim characters> : string]) => string

Right trims a string of trailing characters. If second parameter is unspecified, it trims whitespace. Else it trims any character specified in the second parameter.

rtrim(' dumbo ') -> ' dumbo'
rtrim('!--!du!mbo!', '-!') -> '!--!du!mbo'

S

`second`

second(<value1> : timestamp, [<value2> : string]) => integer

Gets the second value of a date. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer to Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

second(toTimestamp('2009-07-30 12:58:59')) -> 59

`seconds`

seconds(<value1> : integer) => long

Duration in milliseconds for number of seconds.

seconds(2) -> 2000L

`setBitSet`

setBitSet (<value1>: array, <value2>:array) => array

Sets bit positions in this bitset

setBitSet(toBitSet([10, 32]), [98]) => [4294968320L, 17179869184L]

`sha1`

sha1(<value1> : any, ...) => string

Calculates the SHA-1 digest of set of column of varying primitive datatypes and returns a 40-character hex string. It can be used to calculate a fingerprint for a row.

sha1(5, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4')) -> '46d3b478e8ec4e1f3b453ac3d8e59d5854e282bb'

`sha2`

sha2(<value1> : integer, <value2> : any, ...) => string

Calculates the SHA-2 digest of set of column of varying primitive datatypes given a bit length, which can only be of values 0(256), 224, 256, 384, 512. It can be used to calculate a fingerprint for a row.

sha2(256, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4')) -> 'afe8a553b1761c67d76f8c31ceef7f71b66a1ee6f4e6d3b5478bf68b47d06bd3'

`sin`

sin(<value1> : number) => double

Calculates a sine value.

sin(2) -> 0.9092974268256817

`sinh`

sinh(<value1> : number) => double

Calculates a hyperbolic sine value.

sinh(0) -> 0.0

`size`

size(<value1> : any) => integer

Finds the size of an array or map type

size(['element1', 'element2']) -> 2
size([1,2,3]) -> 3

`skewness`

skewness(<value1> : number) => double

Gets the skewness of a column.

skewness(sales)

`skewnessIf`

skewnessIf(<value1> : boolean, <value2> : number) => double

Based on a criteria, gets the skewness of a column.

skewnessIf(region == 'West', sales)

`slice`

slice(<array to slice> : array, <from 1-based index> : integral, [<number of items> : integral]) => array

Extracts a subset of an array from a position. Position is 1 based. If the length is omitted, it's defaulted to end of the string.

slice([10, 20, 30, 40], 1, 2) -> [10, 20]
slice([10, 20, 30, 40], 2) -> [20, 30, 40]
slice([10, 20, 30, 40], 2)[1] -> 20
isNull(slice([10, 20, 30, 40], 2)[0]) -> true
isNull(slice([10, 20, 30, 40], 2)[20]) -> true
slice(['a', 'b', 'c', 'd'], 8) -> []

`sort`

sort(<value1> : array, <value2> : binaryfunction) => array

Sorts the array using the provided predicate function. Sort expects a reference to two consecutive elements in the expression function as #item1 and #item2.

sort([4, 8, 2, 3], compare(#item1, #item2)) -> [2, 3, 4, 8]
sort(['a3', 'b2', 'c1'], iif(right(#item1, 1) >= right(#item2, 1), 1, -1)) -> ['c1', 'b2', 'a3']

`soundex`

soundex(<value1> : string) => string

Gets the soundex code for the string.

soundex('genius') -> 'G520'

`split`

split(<string to split> : string, <split characters> : string) => array

Splits a string based on a delimiter and returns an array of strings.

split('bojjus,guchus,dumbo', ',') -> ['bojjus', 'guchus', 'dumbo']
split('bojjus,guchus,dumbo', '|') -> ['bojjus,guchus,dumbo']
split('bojjus, guchus, dumbo', ', ') -> ['bojjus', 'guchus', 'dumbo']
split('bojjus, guchus, dumbo', ', ')[1] -> 'bojjus'
isNull(split('bojjus, guchus, dumbo', ', ')[0]) -> true
isNull(split('bojjus, guchus, dumbo', ', ')[20]) -> true
split('bojjusguchusdumbo', ',') -> ['bojjusguchusdumbo']

`sqrt`

sqrt(<value1> : number) => double

Calculates the square root of a number.

sqrt(9) -> 3

`startsWith`

startsWith(<string> : string, <substring to check> : string) => boolean

Checks if the string starts with the supplied string.

startsWith('dumbo', 'du') -> true

`stddev`

stddev(<value1> : number) => double

Gets the standard deviation of a column.

stdDev(sales)

`stddevIf`

stddevIf(<value1> : boolean, <value2> : number) => double

Based on a criteria, gets the standard deviation of a column.

stddevIf(region == 'West', sales)

`stddevPopulation`

stddevPopulation(<value1> : number) => double

Gets the population standard deviation of a column.

stddevPopulation(sales)

`stddevPopulationIf`

stddevPopulationIf(<value1> : boolean, <value2> : number) => double

Based on a criteria, gets the population standard deviation of a column.

stddevPopulationIf(region == 'West', sales)

`stddevSample`

stddevSample(<value1> : number) => double

Gets the sample standard deviation of a column.

stddevSample(sales)

`stddevSampleIf`

stddevSampleIf(<value1> : boolean, <value2> : number) => double

Based on a criteria, gets the sample standard deviation of a column.

stddevSampleIf(region == 'West', sales)

`subDays`

subDays(<date/timestamp> : datetime, <days to subtract> : integral) => datetime

Subtract days from a date or timestamp. Same as the - operator for date.

subDays(toDate('2016-08-08'), 1) -> toDate('2016-08-07')

`subMonths`

subMonths(<date/timestamp> : datetime, <months to subtract> : integral) => datetime

Subtract months from a date or timestamp.

subMonths(toDate('2016-09-30'), 1) -> toDate('2016-08-31')

`substring`

substring(<string to subset> : string, <from 1-based index> : integral, [<number of characters> : integral]) => string

Extracts a substring of a certain length from a position. Position is 1 based. If the length is omitted, it's defaulted to end of the string.

substring('Cat in the hat', 5, 2) -> 'in'
substring('Cat in the hat', 5, 100) -> 'in the hat'
substring('Cat in the hat', 5) -> 'in the hat'
substring('Cat in the hat', 100, 100) -> ''

`sum`

sum(<value1> : number) => number

Gets the aggregate sum of a numeric column.

sum(col)

`sumDistinct`

sumDistinct(<value1> : number) => number

Gets the aggregate sum of distinct values of a numeric column.

sumDistinct(col)

`sumDistinctIf`

sumDistinctIf(<value1> : boolean, <value2> : number) => number

Based on criteria gets the aggregate sum of a numeric column. The condition can be based on any column.

sumDistinctIf(state == 'CA' && commission < 10000, sales)
sumDistinctIf(true, sales)

`sumIf`

sumIf(<value1> : boolean, <value2> : number) => number

Based on criteria gets the aggregate sum of a numeric column. The condition can be based on any column.

sumIf(state == 'CA' && commission < 10000, sales)
sumIf(true, sales)

T

`tan`

tan(<value1> : number) => double

Calculates a tangent value.

tan(0) -> 0.0

`tanh`

tanh(<value1> : number) => double

Calculates a hyperbolic tangent value.

tanh(0) -> 0.0

`toBase64`

toBase64(<value1> : string, <encoding type> : string]) => string

Encodes the given string in base64. You can optionally pass the encoding type

toBase64('bojjus') -> 'Ym9qanVz'
toBase64('± 25000, € 5.000,- |', 'Windows-1252') -> 'sSAyNTAwMCwggCA1LjAwMCwtIHw='

`toBinary`

toBinary(<value1> : any) => binary

Converts any numeric/date/timestamp/string to binary representation.

toBinary(3) -> [0x11]

`toBoolean`

toBoolean(<value1> : string) => boolean

Converts a value of ('t', 'true', 'y', 'yes', '1') to true and ('f', 'false', 'n', 'no', '0') to false and NULL for any other value.

toBoolean('true') -> true
toBoolean('n') -> false
isNull(toBoolean('truthy')) -> true

`toByte`

toByte(<value> : any, [<format> : string], [<locale> : string]) => byte

Converts any numeric or string to a byte value. An optional Java decimal format can be used for the conversion.

toByte(123)
123
toByte(0xFF)
-1
toByte('123')
123

`toDate`

toDate(<string> : any, [<date format> : string]) => date

Converts input date string to date using an optional input date format. Refer to Java's SimpleDateFormat class for available formats. If the input date format is omitted, default format is yyyy-[M]M-[d]d. Accepted formats are :[ yyyy, yyyy-[M]M, yyyy-[M]M-[d]d, yyyy-[M]M-[d]dT* ].

toDate('2012-8-18') -> toDate('2012-08-18')
toDate('12/18/2012', 'MM/dd/yyyy') -> toDate('2012-12-18')

`toDecimal`

toDecimal(<value> : any, [<precision> : integral], [<scale> : integral], [<format> : string], [<locale> : string]) => decimal(10,0)

Converts any numeric or string to a decimal value. If precision and scale aren't specified, it's defaulted to (10,2). An optional Java decimal format can be used for the conversion. An optional locale format in the form of BCP47 language like en-US, de, zh-CN.

toDecimal(123.45) -> 123.45
toDecimal('123.45', 8, 4) -> 123.4500
toDecimal('$123.45', 8, 4,'$###.00') -> 123.4500
toDecimal('Ç123,45', 10, 2, 'Ç###,##', 'de') -> 123.45

`toDouble`

toDouble(<value> : any, [<format> : string], [<locale> : string]) => double

Converts any numeric or string to a double value. An optional Java decimal format can be used for the conversion. An optional locale format in the form of BCP47 language like en-US, de, zh-CN.

toDouble(123.45) -> 123.45
toDouble('123.45') -> 123.45
toDouble('$123.45', '$###.00') -> 123.45
toDouble('Ç123,45', 'Ç###,##', 'de') -> 123.45

`toFloat`

toFloat(<value> : any, [<format> : string], [<locale> : string]) => float

Converts any numeric or string to a float value. An optional Java decimal format can be used for the conversion. Truncates any double.

toFloat(123.45) -> 123.45f
toFloat('123.45') -> 123.45f
toFloat('$123.45', '$###.00') -> 123.45f

`toInteger`

toInteger(<value> : any, [<format> : string], [<locale> : string]) => integer

Converts any numeric or string to an integer value. An optional Java decimal format can be used for the conversion. Truncates any long, float, double.

toInteger(123) -> 123
toInteger('123') -> 123
toInteger('$123', '$###') -> 123

`toLong`

toLong(<value> : any, [<format> : string], [<locale> : string]) => long

Converts any numeric or string to a long value. An optional Java decimal format can be used for the conversion. Truncates any float, double.

toLong(123) -> 123
toLong('123') -> 123
toLong('$123', '$###') -> 123

`toShort`

toShort(<value> : any, [<format> : string], [<locale> : string]) => short

Converts any numeric or string to a short value. An optional Java decimal format can be used for the conversion. Truncates any integer, long, float, double.

toShort(123) -> 123
toShort('123') -> 123
toShort('$123', '$###') -> 123

`toString`

toString(<value> : any, [<number format/date format> : string], [<date locale> : string]) => string

Converts a primitive datatype to a string. For numbers and date a format can be specified. If unspecified the system default is picked.Java decimal format is used for numbers. Refer to Java SimpleDateFormat for all possible date formats; the default format is yyyy-MM-dd. For date or timestamp a locale can be optionally specified.

toString(10) -> '10'
toString('engineer') -> 'engineer'
toString(123456.789, '##,###.##') -> '123,456.79'
toString(123.78, '000000.000') -> '000123.780'
toString(12345, '##0.#####E0') -> '12.345E3'
toString(toDate('2018-12-31')) -> '2018-12-31'
isNull(toString(toDate('2018-12-31', 'MM/dd/yy'))) -> true
toString(4 == 20) -> 'false'
toString(toDate('12/31/18', 'MM/dd/yy', 'es-ES'), 'MM/dd/yy', 'de-DE')

`toTimestamp`

toTimestamp(<string> : any, [<timestamp format> : string], [<time zone> : string]) => timestamp

Converts a string to a timestamp given an optional timestamp format. If the timestamp is omitted the default pattern yyyy-[M]M-[d]d hh:mm:ss[.f...] is used. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. Timestamp supports up to millisecond accuracy with value of 999. Refer to Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

toTimestamp('2016-12-31 00:12:00') -> toTimestamp('2016-12-31 00:12:00')
toTimestamp('2016-12-31T00:12:00', 'yyyy-MM-dd\'T\'HH:mm:ss', 'PST') -> toTimestamp('2016-12-31 00:12:00')
toTimestamp('12/31/2016T00:12:00', 'MM/dd/yyyy\'T\'HH:mm:ss') -> toTimestamp('2016-12-31 00:12:00')
millisecond(toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS')) -> 871

`toUTC`

toUTC(<value1> : timestamp, [<value2> : string]) => timestamp

Converts the timestamp to UTC. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. It's defaulted to the current timezone. Refer to Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

toUTC(currentTimestamp()) == toTimestamp('2050-12-12 19:18:12') -> false
toUTC(currentTimestamp(), 'Asia/Seoul') != toTimestamp('2050-12-12 19:18:12') -> true

`translate`

translate(<string to translate> : string, <lookup characters> : string, <replace characters> : string) => string

Replace one set of characters by another set of characters in the string. Characters have 1 to 1 replacement.

translate('(bojjus)', '()', '[]') -> '[bojjus]'
translate('(gunchus)', '()', '[') -> '[gunchus'

`trim`

trim(<string to trim> : string, [<trim characters> : string]) => string

Trims a string of leading and trailing characters. If second parameter is unspecified, it trims whitespace. Else it trims any character specified in the second parameter.

trim(' dumbo ') -> 'dumbo'
trim('!--!du!mbo!', '-!') -> 'du!mbo'

`true`

true() => boolean

Always returns a true value. Use the function syntax(true()) if there's a column named 'true'.

(10 + 20 == 30) -> true
(10 + 20 == 30) -> true()

`typeMatch`

typeMatch(<type> : string, <base type> : string) => boolean

Matches the type of the column. Can only be used in pattern expressions.number matches short, integer, long, double, float or decimal, integral matches short, integer, long, fractional matches double, float, decimal and datetime matches date or timestamp type.

typeMatch(type, 'number')
typeMatch('date', 'datetime')

U

`unescape`

unescape(<string_to_escape> : string, <format> : string) => string

Unescapes a string according to a format. Literal values for acceptable format are 'json', 'xml', 'ecmascript', 'html', 'java'.

unescape('{\\\\\"value\\\\\": 10}', 'json')
'{\\\"value\\\": 10}'

`unfold`

unfold (<value1>: array) => any

Unfolds an array into a set of rows and repeats the values for the remaining columns in every row.

unfold(addresses) => any
unfold( @(name = salesPerson, sales = salesAmount) ) => any

`unhex`

unhex(<value1>: string) => binary

Unhexes a binary value from its string representation. This can be used with sha2, md5 to convert from string to binary representation

unhex('1fadbe') -> toBinary([toByte(0x1f), toByte(0xad), toByte(0xbe)])
unhex(md5(5, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4'))) -> toBinary([toByte(0x4c),toByte(0xe8),toByte(0xa8),toByte(0x80),toByte(0xbd),toByte(0x62),toByte(0x1a),toByte(0x1f),toByte(0xfa),toByte(0xd0),toByte(0xbc),toByte(0xa9),toByte(0x05),toByte(0xe1),toByte(0xbc),toByte(0x5a)])

`union`

union(<value1>: array, <value2> : array) => array

Returns a union set of distinct items from 2 arrays.

union([10, 20, 30], [20, 40]) => [10, 20, 30, 40]

`upper`

upper(<value1> : string) => string

Uppercases a string.

upper('bojjus') -> 'BOJJUS'

`uuid`

uuid() => string

Returns the generated UUID.

uuid()

V

`variance`

variance(<value1> : number) => double

Gets the variance of a column.

variance(sales)

`varianceIf`

varianceIf(<value1> : boolean, <value2> : number) => double

Based on a criteria, gets the variance of a column.

varianceIf(region == 'West', sales)

`variancePopulation`

variancePopulation(<value1> : number) => double

Gets the population variance of a column.

variancePopulation(sales)

`variancePopulationIf`

variancePopulationIf(<value1> : boolean, <value2> : number) => double

Based on a criteria, gets the population variance of a column.

variancePopulationIf(region == 'West', sales)

`varianceSample`

varianceSample(<value1> : number) => double

Gets the unbiased variance of a column.

varianceSample(sales)

`varianceSampleIf`

varianceSampleIf(<value1> : boolean, <value2> : number) => double

Based on a criteria, gets the unbiased variance of a column.

varianceSampleIf(region == 'West', sales)

W

`weekOfYear`

weekOfYear(<value1> : datetime) => integer

Gets the week of the year given a date.

weekOfYear(toDate('2008-02-20')) -> 8

`weeks`

weeks(<value1> : integer) => long

Duration in milliseconds for number of weeks.

weeks(2) -> 1209600000L

X

`xor`

xor(<value1> : boolean, <value2> : boolean) => boolean

Logical XOR operator. Same as ^ operator.

xor(true, false) -> true
xor(true, true) -> false
true ^ false -> true

Y

`year`

year(<value1> : datetime) => integer

Gets the year value of a date.

year(toDate('2012-8-8')) -> 2012

Next steps

List of all aggregate functions.
List of all array functions.
List of all cached lookup functions.
List of all conversion functions.
List of all date and time functions.
List of all expression functions.
List of all map functions.
List of all metafunctions.
List of all window functions.
Learn how to use Expression Builder.

Files

data-flow-expressions-usage.md

Latest commit

History

data-flow-expressions-usage.md

File metadata and controls

Data transformation expression usage in mapping data flow

Alphabetical listing of all functions

A

abs

acos

add

addDays

addMonths

and

approxDistinctCount

array

assertErrorMessages

asin

associate

at

atan

atan2

avg

avgIf

B

between

bitwiseAnd

bitwiseOr

bitwiseXor

blake2b

blake2bBinary

byItem

byName

byNames

byOrigin

byOrigins

byPath

byPosition

C

case

cbrt

ceil

coalesce

collect

columnNames

columns

compare

concat

concatWS

contains

cos

cosh

count

countAll

countDistinct

countAllDistinct

countIf

covariancePopulation

covariancePopulationIf

covarianceSample

covarianceSampleIf

crc32

cumeDist

currentDate

currentTimestamp

currentUTC

D

dayOfMonth

dayOfWeek

dayOfYear

days

degrees

denseRank

distinct

divide

dropLeft

dropRight

E

endsWith