(re)Learning SQL – Leading and Lagging

Lead, Lag and others are SQL analytic functions. Just another set of tools to help with querying different reporting tables. The syntax is similar to other window functions:

<analytic function> OVER (...)

The difference is that while window functions aggregate a relative window of rows from the home row. Analytics functions lookup a specific row from the home row.

LEAD & LAG

These two return the preceding (LAG) or following (LEAD) row value that you specify.

SELECT
  date, 
  metric,
  LEAD(metric) OVER(ORDER BY date)
FROM table_1;

The line LEAD(col_1) OVER(ORDER BY col_2)

column_1 column_2 lead
date_1 1 2
date_2 2 3
date_3 3 null

For example LAG can be used for calculating difference to previous month:

select
  date,
  metric,
  metric - LAG(metric) OVER(ORDER BY date) 
  as difference_to_prev
FROM
table_1;
date metric difference_to_prev
date_1 1 NULL
date_2 2 1
date_3 3 1

LEAD(x,y,z) and LAG(x,y,z)

Both LEAD() and LAG() can also take arguments. These work like this:

LEAD(x,y,z)
x = the column you want to return
y = the offset, or how many rows forward or backward you want to go to return a value
z = value to return if nothing is found. For example when just using LEAD(x) the query would return NULL for the last row in the table. Because there is no rows after the last row.

Note that, for value z, our replacement value must be of the same data type as the column. So assuming the revenue column is a integer type, if we want to fix the last row for LEAD(revenue) we must write LEAD(revenue, 1, 0) and NOT LEAD(revenue,1,’nothing to compare’). 

SELECT
  date,
  metric,
  LAG(metric,3,0) OVER(ORDER BY date)
FROM table_1;
date metric difference_to_prev_3
date_1 100 0
date_2 110 0
date_3 90 -10
date_4 140 40
date_5 150 50

Note that we can actually use either LEAD or LAG to lookup in both directions. For this we need to specify the ORDER BY argument to either ASC or DESC depending on the direction we want to lookup.

LAG(…) OVER(ORDER BY …) is the same as LEAD(…) OVER (ORDER BY … ASC)

FIRST_VALUE, LAST_VALUE and NTH_VALUE

What if we want to look up a specifc row from the table? This is where these guys come in:

FIRST_VALUE- returns the first value in the partition
LAST_VALUE- returns the last value in the partition
NTH_VALUE- retursn the nth value n the partition

Seems pretty self explanatory. So how would we use them?

Let’s say we want to compare each row to the overall maximum and minimum values of that colum

SELECT
   date,
   metric,
FIRST_VALUE(metric) 
   OVER(ORDER BY metric ASC
   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
   ) as min_metric,
LAST_VALUE(column_2) 
   OVER(ORDER BY metric ASC
   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
   ) as max_metric
FROM
table_1;
date metric min_metric max_metric
date_1 100 90 150
date_2 110 90 150
date_3 90 90 150
date_4 150 90 150
date_5 140 90 150

Let’s see whats happening here

FIRST_VALUE(column_2) 
   OVER(ORDER BY column_2 ASC
   ROWS BETWEEN UNBOUNDED PRECEDING 
   AND UNBOUNDED FOLLOWING)

We are telling the database to get the first value of metric when it is ordered in an ascending order. This would return us the smallest value for metric. Note that we could also order metric by another column ( date ). This would return the value from metric that is linked to the smallest value in date. (the metric value on the earliest date).

We are also specifying the window of the calculation: 
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

This tells the database to use the whole table as a windows. If we wanted we could fine tune the window to use on N number of rows in each direction.

NTH Value

Maybe you want to show the highest value in a partition. OR the 3rd highest, or the 5th lowest.

SELECT
   date,
   metric,
LAST_VALUE(metric) 
   OVER(ORDER BY metric ASC
   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
   ) as top_1_metric,
NTH_VALUE(metric,2) 
   OVER(ORDER BY metric DESC
   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
   ) as top_2_metric,
NTH_VALUE(metric,3) 
   OVER(ORDER BY metric DESC
   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
   ) as top_3_metric
FROM
table_1;
date metric top_1_metric top_2_metric top_3_metric
date_1 100 150 140 110
date_2 110 150 140 110
date_3 90 150 140 110
date_4 140 150 140 110
date_5 150 150 140 110

There is much more to analytic functions in SQL and each database has some variations and special functions implemented. However, these above are the base analytic functions that should you help grasp other more specialized ones as well.

Leave a comment

Design a site like this with WordPress.com
Get started