Lead, Lag and others are SQL analytic functions. Just another set of tools to help with querying different reporting tables. The syntax is similar to other window functions:
<analytic function> OVER (...)
The difference is that while window functions aggregate a relative window of rows from the home row. Analytics functions lookup a specific row from the home row.
LEAD & LAG
These two return the preceding (LAG) or following (LEAD) row value that you specify.
SELECT
date,
metric,
LEAD(metric) OVER(ORDER BY date)
FROM table_1;
The line LEAD(col_1) OVER(ORDER BY col_2)
column_1 | column_2 | lead |
---|---|---|
date_1 | 1 | 2 |
date_2 | 2 | 3 |
date_3 | 3 | null |
For example LAG can be used for calculating difference to previous month:
select
date,
metric,
metric - LAG(metric) OVER(ORDER BY date)
as difference_to_prev
FROM
table_1;
date | metric | difference_to_prev |
---|---|---|
date_1 | 1 | NULL |
date_2 | 2 | 1 |
date_3 | 3 | 1 |
LEAD(x,y,z) and LAG(x,y,z)
Both LEAD() and LAG() can also take arguments. These work like this:
LEAD(x,y,z)
x = the column you want to return
y = the offset, or how many rows forward or backward you want to go to return a value
z = value to return if nothing is found. For example when just using LEAD(x) the query would return NULL for the last row in the table. Because there is no rows after the last row.
Note that, for value z, our replacement value must be of the same data type as the column. So assuming the revenue column is a integer type, if we want to fix the last row for LEAD(revenue) we must write LEAD(revenue, 1, 0) and NOT LEAD(revenue,1,’nothing to compare’).
SELECT
date,
metric,
LAG(metric,3,0) OVER(ORDER BY date)
FROM table_1;
date | metric | difference_to_prev_3 |
---|---|---|
date_1 | 100 | 0 |
date_2 | 110 | 0 |
date_3 | 90 | -10 |
date_4 | 140 | 40 |
date_5 | 150 | 50 |
Note that we can actually use either LEAD or LAG to lookup in both directions. For this we need to specify the ORDER BY argument to either ASC or DESC depending on the direction we want to lookup.
LAG(…) OVER(ORDER BY …) is the same as LEAD(…) OVER (ORDER BY … ASC)
FIRST_VALUE, LAST_VALUE and NTH_VALUE
What if we want to look up a specifc row from the table? This is where these guys come in:
FIRST_VALUE- returns the first value in the partition
LAST_VALUE- returns the last value in the partition
NTH_VALUE- retursn the nth value n the partition
Seems pretty self explanatory. So how would we use them?
Let’s say we want to compare each row to the overall maximum and minimum values of that colum
SELECT
date,
metric,
FIRST_VALUE(metric)
OVER(ORDER BY metric ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as min_metric,
LAST_VALUE(column_2)
OVER(ORDER BY metric ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as max_metric
FROM
table_1;
date | metric | min_metric | max_metric |
---|---|---|---|
date_1 | 100 | 90 | 150 |
date_2 | 110 | 90 | 150 |
date_3 | 90 | 90 | 150 |
date_4 | 150 | 90 | 150 |
date_5 | 140 | 90 | 150 |
Let’s see whats happening here
FIRST_VALUE(column_2)
OVER(ORDER BY column_2 ASC
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
We are telling the database to get the first value of metric when it is ordered in an ascending order. This would return us the smallest value for metric. Note that we could also order metric by another column ( date ). This would return the value from metric that is linked to the smallest value in date. (the metric value on the earliest date).
We are also specifying the window of the calculation:
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
This tells the database to use the whole table as a windows. If we wanted we could fine tune the window to use on N number of rows in each direction.
NTH Value
Maybe you want to show the highest value in a partition. OR the 3rd highest, or the 5th lowest.
SELECT
date,
metric,
LAST_VALUE(metric)
OVER(ORDER BY metric ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as top_1_metric,
NTH_VALUE(metric,2)
OVER(ORDER BY metric DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as top_2_metric,
NTH_VALUE(metric,3)
OVER(ORDER BY metric DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as top_3_metric
FROM
table_1;
date | metric | top_1_metric | top_2_metric | top_3_metric |
---|---|---|---|---|
date_1 | 100 | 150 | 140 | 110 |
date_2 | 110 | 150 | 140 | 110 |
date_3 | 90 | 150 | 140 | 110 |
date_4 | 140 | 150 | 140 | 110 |
date_5 | 150 | 150 | 140 | 110 |
There is much more to analytic functions in SQL and each database has some variations and special functions implemented. However, these above are the base analytic functions that should you help grasp other more specialized ones as well.