Linear regression functions

With the linear regression functions, you can calculate the relationship between two variables using the least squares regression method, which is a standard approach for calculating a linear relationship. The least squares regression line, also called the line of best fit, can be visualized as a straight line drawn through a set of data points that represents the relationship between them.

For the regression line calculation, you need two parameters, slope and intercept. The slope and intercept show how two variables are related according to an average rate of change. This works well with scatter plots because scatter plots show two variables. The regression line is plotted on a scatter plot of the same data to show the general data trend.

The purpose of the linear regression function is to find the relationship between the explanatory variable, X and the dependent variable, Y. It predicts the value of Y when the value of X is known.

Where to use linear regression

  • The primary purpose of the linear regression function is to facilitate the plotting of regression lines in Correlation widget.

  • You can also use the linear regression function within custom SQL statements to calculate trend or regression values. This is useful for making predictions that extend beyond the provided data. For example, you have order value plotted against quantity and you want to know the value of an order size that is 10 times the maximum order size within your dataset.

Linear regression formula and functions

A linear regression line has an equation of the form: Y = a + bX

X is the explanatory variable and Y is the dependent variable.

a = intercept (the value of Y when X = 0) and b = slope.

Following are the two functions implemented to support the linear regression:

  • regr_slope(Y, X) function: slope of the least-squares-fit linear equation determined by the (X, Y) pairs

  • regr_intercept(Y, X) function: Y-intercept of the least-squares-fit linear equation determined by the (X, Y) pairs

The slope indicates the steepness of the regression line, whereas the intercept indicates its intersection with the Y axis. The Y-intercept is the point at which the graph intersects the Y-axis. Based on the slope of each X-axis unit, the subsequent point on the Y-axis is determined.

Syntax

REGR_INTERCEPT(Y attribute,X attribute),
REGR_SLOPE(Y attribute,X attribute)
Parameter Description
Y attribute

Y is the dependent variable and it is plotted along the Y axis of a scatterplot.

X attribute

X is the explanatory variable and it is plotted along the X axis of a scatterplot.

Examples

Example 1

The following query correlates the order amount with order quantity and return values for intercept and slope.

SELECT 
    REGR_INTERCEPT("Order Amount" , "Order Quantity") as "Intercept", 
    REGR_SLOPE("Order Amount" , "Order Quantity") as "Slope"
FROM THIS_PROCESS      

Result:

The query returns the following result.

Following is the scatter plot generated using Correlation widget in a process. It shows the scatter plot and regression line for order amount plotted against order quantity.

Example 2

The following query calculates the order value along the regression line relative to an order size that is not included in your data set.

SELECT 300 as "Quantity","intercept" + 300 * "slope" as "Amount"
FROM 
(
  SELECT 
    REGR_INTERCEPT("Order Amount" , "Order Quantity") as "intercept", 
    REGR_SLOPE("Order Amount" , "Order Quantity") as "slope"
  FROM THIS_PROCESS
) as sub      

Result:

The query returns the following result.

Example 3

The following query calculates the order value along the regression line relative to an order size that is not included in your data set and grouped by City.

SELECT "City",
  300 as "Quantity",
  "intercept" + 300 * "slope" as "Amount"
FROM
(
  SELECT "City",
    REGR_INTERCEPT("Order Amount" , "Order Quantity") as "intercept", 
    REGR_SLOPE("Order Amount" , "Order Quantity") as "slope"
  FROM THIS_PROCESS
  GROUP BY 1
) as sub	
	

Result:

The query returns the following result.