Window Functions
A window function performs a calculation across a frame of data rows around the current row and finds a result for each row.
PARTITION BY
and ORDER BY
define the frame of data over which the calculation is made.
You can use window functions in the following three categories:
- Aggregate Functions:
COUNT()
,MIN()
,MAX()
,AVG()
, andSUM()
. - Ranking Functions:
ROW_NUMBER()
,RANK()
,DENSE_RANK()
,PERCENT_RANK()
, andNTILE()
. - Analytic Functions:
CUME_DIST()
,LAG()
, andLEAD()
.
The syntax of a window function is as follows:
function_name (expression [, expression...])
OVER (
PARTITION BY expression [, expression...]
ORDER BY expression [ASC | DESC] [, ...]
)
The PARTITION BY
and ORDER BY
clauses are optional.
To use window functions, enable the new SQL engine:
PUT _cluster/settings
{
"persistent": {
"opendistro.sql.engine.new.enabled" : "true"
}
}
Ranking functions
Ranking functions assign an incremental ranking value to each row in the frame.
The increase in the ranking value depends on how the ranking function is implemented. The ranking value is mostly determined by the field values in the ORDER BY
clause. If the PARTITION BY
clause is also present, the ranking function resets its state, while maintaining the incremental ranking value.
If you use the ranking function without the ORDER BY
clause, the result is undetermined. Without the ORDER BY
clause, ROW_NUMBER
assigns a random number to each data row while RANK
and DENSE_RANK
assign a ranking value of 1 to each data row.
RANK
The RANK
function assigns a ranking value to each row of a result set. It assigns the same ranking value for the same field values specified in the ORDER BY
list.
SELECT gender, RANK()
OVER (
ORDER BY gender DESC
)
AS rnk FROM accounts;
gender | rank |
---|---|
M | 1 |
M | 1 |
M | 1 |
F | 4 |
In this case, the next few ranks are skipped depending on the number of ties that occur.
ROW_NUMBER
ROW_NUMBER
assigns a number to each data row of the result set sequentially. The row number increases by 1 regardless of the fields specified in the ORDER BY
list.
SELECT gender, balance, ROW_NUMBER()
OVER (
PARTITION BY gender ORDER BY balance
)
AS num FROM accounts;
gender | balance | num |
---|---|---|
F | 32838 | 1 |
M | 4180 | 1 |
M | 5686 | 2 |
M | 39225 | 3 |
DENSE_RANK
Similar to the RANK
function, DENSE_RANK
also assigns a ranking value to each row but without any gaps between the ranking values.
SELECT gender, DENSE_RANK()
OVER (
ORDER BY gender DESC
)
AS rnk FROM accounts;
gender | rank |
---|---|
M | 1 |
M | 1 |
M | 1 |
F | 2 |