Window functions

This section explains the window functions that you can use in your SIGNAL queries.

Window functions are aggregate-like functions that you can perform over a selection of the rows which are selected by a query. You can use window functions to perform calculations on a set of table rows that are related to the current row in a table. In window functions, each row remains separate to the query output. Window functions have access to all the rows that are part of the current row's group which is determined by the PARTITION BY list of the window function.

Below is the general form of a window function:

{aggregation function} OVER ([PARTITION BY {partition expressions}] [ORDER BY {order expressions}] [[ROWS | RANGE] BETWEEN {window frame}])
  • The {aggregation function} is the function which groups the values of multiple rows to create a single summary value.

  • The ORDER BY clause sorts the data set in the PARTITION BY clause in ascending or descending order.

  • The PARTITION BY clause is a subclause of the OVER clause and groups a data set into partitions.

  • ROWS|RANGE modes define the scope of the {window frame}.

  • The {window frame} is the set of rows related to the current row where the window function is used to calculate the values of the defined window. You can define the window frame by using the ROW and RANGE modes.

Limitations

Below is a list of current limitations that apply while using window functions in SIGNAL queries:

  • Window functions can only be used on flat data, not on event level or nested data.

  • You can't create an empty window frame when using window functions. The following are examples of empty window frames:

    • ROWS BETWEEN 1 PRECEDING AND 2 PRECEDING

    • ROWS BETWEEN 1 FOLLOWING AND 1 PRECEDING

    • ROWS BETWEEN 1 FOLLOWING AND CURRENT ROW

Considerations

Below is a list of considerations when using window functions in SIGNAL queries:

  • Window functions are evaluated after aggregate functions in a SELECT clause.

  • Window functions can only occur in SELECT clauses and on flat data. Nested data structures aren’t supported. If you have a nested data structure like an event log, you can use window functions on case level. If you want to use window functions on event level you can use the FLATTEN operator to create a flat table and then use PARTITION BY case_id.

  • In ORDER BY clauses, only use expressions such as column names and not numeric values for indices.

Functions

In window functions you can use aggregate and non-aggregate functions.

Aggregate functions

You can use the following aggregate functions:

  • SUM

  • COUNT

  • MIN

  • MAX

  • AVG

  • FIRST

  • LAST

  • BOOL_OR

  • BOOL_AND

Non-aggregate functions

You can use the following non-aggregate functions:

  • LAG

  • LEAD

  • ROW_NUMBER

ORDER BY

The ORDER BY clause sorts the data set in the PARTITION BY clause in ascending or descending order.

Syntax:

SELECT column name, SUM(column name) OVER (ORDER BY column name ASC)
Parameter Description
column name The column name in your table you want to include in the function.

Example:

Consider the following table:

City Value
Berlin 1000
Paris 3000
London 2500
Rome 1500
SELECT city, SUM(value) OVER (ORDER BY value ASC)

This query returns the cumulative sum:

City Value
Berlin 1000
Rome 2500
London 5000
Paris 8000

PARTITION BY

The PARTITION BY clause defines the group of rows which the window function operates with. You can add multiple expressions after the PARTITION BY. For example:

  • PARTITION BY attr, ...

  • PARTITION BY attr, … ORDER BY attr, ...

Syntax:

SELECT column name, SUM(column name) OVER (PARTITION BY column name)
Parameter Description
column name The column name in your table you want to include in the function.
SELECT column name, SUM(column name) OVER (PARTITION BY column name ORDER BY column name)
Parameter Description
column name The column name in your table you want to include in the function.

Example 1:

Consider the following table:

City Value
Berlin 1000
Berlin 1800
Paris 3000
London 2500
Paris 1500
London 1200
Berlin 1300
SELECT city, SUM(value) OVER (PARTITION BY city)

A grouped sum is returned:

City Value
Berlin 4100
Berlin 4100
Berlin 4100
Paris 4500
Paris 4500
London 3700
London 3700

Example 2:

Consider the following table:

City Value
Berlin 1000
Berlin 1800
Paris 3000
London 2500
Paris 1500
London 1200
Berlin 1300

Now an ORDER BY is added to the function:

SELECT city, SUM(value) OVER (PARTITION BY city ORDER BY value)

The cumulative sums per city are returned:

City Value
Berlin 1000
Berlin 2300
Berlin 4100
London 1200
London 3700
Paris 1500
Paris 4500

ROWS

With the ROWS mode, you can define the start and end of the window frame in terms of rows relevant to the current row. You can define the window frame with the ROWS mode in following ways:

  • ROWS BETWEEN 10 PRECEDING AND CURRENT ROW

  • ROWS BETWEEN 10 PRECEDING AND 10 FOLLOWING

  • ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

The UNBOUNDED keyword refers to the first or last row in a column or partition.

  • UNBOUNDED PRECEDING is the first row

  • UNBOUNDED FOLLOWING is the last row

If there is no ORDER BY clause, the returned results are undefined and the order in which the rows are processed isn't uniform.

An offset of 0 refers to the current row.

Syntax:

SELECT column name, SUM(column name)
OVER (ORDER BY column name ASC ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
Parameter Description
column name The column name in your table you want to include in the function.

Example:

Consider the following table:

City Value
Berlin 1000
Paris 3000
London 2500
Paris 1500
SELECT city, SUM(value) OVER (ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)

The following is returned:

City Value
Berlin 4000
Paris 6500
London 7000
Paris 4000

RANGE

With the RANGE mode, you can define where the window frame starts and ends in window functions. When using the RANGE keyword the ORDER BY clause is required and you must specify one column name by which the window frame is ordered. Using the RANGE keyword is useful when working time series and when there are many gaps or duplicate data in your tables.

With the RANGE keyword, you can define the window frame by the maximum difference between the value of the column in the current row and its value in the preceding or following rows of the scope.

The RANGE mode only works with data types that are of the type interval, which are numbers and timestamps. Choice and Boolean data types aren't supported.

You can define the window frame with the RANGE mode in the following ways:

  • ORDER BY attr RANGE BETWEEN 10 PRECEDING AND CURRENT ROW

  • ORDER BY attr RANGE BETWEEN 1.0 PRECEDING AND 1.0 FOLLOWING

  • ORDER BY attr RANGE BETWEEN DURATION "1day" PRECEDING AND DURATION "1hour" FOLLOWING

  • ORDER BY attr RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

Syntax:

SELECT column name , SUM(column name)
OVER (ORDER BY column name RANGE BETWEEN interval type PRECEDING AND interval type FOLLOWING)
Parameter Description
column name The column name in your table you want to include in the function.
interval type Data type which are of type interval such as numbers (1.0) and timestamps ('10days').

Example:

Consider the following table:

City Value
Berlin 1000
Paris 3000
London 3000
Paris 2000
SELECT city, SUM(value) OVER (ORDER BY value RANGE BETWEEN 1000.0 PRECEDING AND 1000.0 FOLLOWING)

The following is returned:

City Value
Berlin 3000.0
Paris 9000.0
London 8000.0
Paris 8000.0

Window frame

The window frame is the set of rows related to the current row where the window function is used to calculate the values of the defined window. You can define the window frame by using the ROW and RANGE modes.

The window frame contains a frame_start and a frame_end. These frames are the start and end of your window frame.

In the frame_start you can add the following keywords:

  • CURRENT ROW

  • UNBOUND PRECEDING

  • offset PRECEDING

In the frame_end you can add the following keywords:

  • CURRENT ROW

  • UNBOUNDED FOLLOWING

  • offset FOLLOWING

The frame_start of CURRENT ROW means the frame starts with the current row's first peer row (a row that the window's ORDER BY clause sorts as equivalent to the current row). The frame_end of CURRENT ROW means the frame ends with the current row's last peer row.

The UNBOUNDED keyword is the first or last row of the peer group with the partition.

The offset expression's data type can vary depending on the data type of the ordering column. If you use numeric ordering columns, the type is the same as the ordering column.

If the ordering column is of the type timestamp ('10 days'), you can have the following RANGE BETWEEN '1 day' PRECEDING AND '10 days' FOLLOWING. The offset expression must be a non-null and non-negative value.