Some time last year I felt that I needed to brush up on SQL. I took a few evenings to skim some courses and online resources. (Stackoverflow, Vertabello Academy, db-fiddle.com)
Below are some of my thoughts and notes from going over the material.
* Window functions
* Rows and range
* Partition by
Window Functions
I have been using window functions for calculating simple cross table totals and rolling totals.
This is pretty simple and can be achieved just with SUM(column) OVER()
for getting the total for the whole table.
We can use something like:SUM(colum) OVER(ORDER BY date_column ROWS BETWEEN UNBOUNDED PREVIOUS AND CURRENT ROW)
for the row by row rolling total. Lets get into the last one a bit more.
CREATE TABLE orders (
placed_dt DATE,
total_price NUMERIC
);
INSERT INTO orders(placed_dt, total_price)
VALUES
('2020-01-01',1.23),
('2020-01-02',2.50),
('2020-01-03',5.23),
('2020-01-04',1.50),
('2020-01-04',9.53),
('2020-01-06',8.90),
('2020-01-07',0.23),
('2020-01-08',5.50),
('2020-01-09',1.23),
('2020-01-10',2.60);
SELECT
placed_dt,
SUM(total_price) OVER(
ORDER BY placed_dt
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
from orders
We are calculating a rolling sum of total_price. For each row the sum of current row and all preceding rows. Here OVERdefines the start of the window calculation. That means the sub selection (window) of rows we want the SUM calculation to work on. ORDER BY defines the order of the rows in the window calculation. ROWS can use the following arguments to define the size of the window for the calculation.
UNBOUNDED PRECEDING
CURRENT ROW
UNBOUNDED FOLLOWING
BETWEEN [number] PRECEDING AND [number] FOLLOWING
These can be abbreviated by ommiting either the FOLLOWING or PRECEDING if all we want is to calculate in one direction (until the start or end of the table) in relation to the current row
ROWS UNBOUNDED PRECEDING
meansBETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
ROWS [number] PRECEDING
meansBETWEEN [number] PRECEDING AND CURRENT ROW
ROWS CURRENT ROW
meansBETWEEN CURRENT ROW AND CURRENT ROW
Window functions nested in calculated fields
Window functions can also be used inside calculated fields. Below an example of calculating the ratio of a purchase to the average purchase price in a 3 day window:
SELECT
placed_dt,
total_price,
total_price / (AVG(total_price) OVER (
ORDER BY placed_dt
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)) as ratio
FROM orders
*remember that ROW constraints (i.e. what you write after ORDER BY) must be in logical order, from preceding to following row constraint. So we need to write:
ORDER BY year ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
NOT
ORDER BY year ROWS BETWEEN CURRENT ROW AND 2 PRECEDING
Rows and Range
We could also use RANGE instead of ROWS in the window function.
The difference between ROWS
and RANGE
is that while ROWS
will apply the window function to all the rows specified, RANGE
will limit the window function to any rows that have the same value in the column which we order by.
ROWS applies the window function across any rows that are specified
SELECT
placed_dt,
total_price,
SUM(total_price) OVER (
ORDER BY placed_dt
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as window_rolling_sum
FROM orders
Output:
| placed_dt | total_price | window_rolling_sum |
| ---------- | ----------- | ------------------ |
| 01-01-2020 | 1.23 | 1.23 |
| 02-01-2020 | 2.50 | 3.73 |
| 03-01-2020 | 5.23 | 8.96 |
| 04-01-2020 | 1.50 | 10.46 |<- grouped row by row
| 04-01-2020 | 9.53 | 19.99 |<- grouped row by row
RANGE groups the window function by the ORDER BY columns
SELECT
placed_dt,
total_price,
SUM(total_price) OVER (
ORDER BY placed_dt
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as window_rolling_sum
FROM orders
Output:
| placed_dt | total_price | window_rolling_sum |
| ---------- | ----------- | ------------------ |
| 01-01-2020 | 1.23 | 1.23 |
| 02-01-2020 | 2.50 | 3.73 |
| 03-01-2020 | 5.23 | 8.96 |
| 04-01-2020 | 1.50 | 19.99 |<- grouped by placed_dt
| 04-01-2020 | 9.53 | 19.99 |<- grouped by placed_dt
Using Partition By
We can do similar grouping by forgetting about ORDER BY and using PARTITION BY
SELECT
placed_dt,
total_price,
SUM(total_price) OVER(
PARTITION BY placed_dt)
FROM orders
Output:
| placed_dt | total_price | window_sum |
| ---------- | ----------- | ---------- |
| 01-01-2020 | 1.23 | 1.23 |
| 01-02-2020 | 2.50 | 2.50 |
| 01-03-2020 | 5.23 | 5.23 |
| 01-04-2020 | 1.50 | 11.03 |<- grouped by placed_dt
| 01-04-2020 | 9.53 | 11.03 |<- grouped by placed_dt
Order of operations
This is a good spot to refresh general rules for SQL order of operations. That is in what order the database processes your query. Starting from the first process:
FROM
– the database gets the data from tables inFROM
clause and if necessary performs theJOIN
s,WHERE
– the data are filtered with conditions specified inWHERE
clause,GROUP BY
– the data are grouped by with conditions specified inWHERE
clause,- aggregate functions — the aggregate functions are applied to the groups created in the
GROUP BY
phase, HAVING
– the groups are filtered with the given condition,- window functions,
SELECT
– the database selects the given columns,DISTINCT
– repeated values are removed,UNION
/INTERSECT
/EXCEPT
– the database applies set operations,ORDER BY
– the results are sorted,OFFSET
– the first rows are skipped,LIMIT
/FETCH
/TOP
– only the first rows are selected
As we can see window functions get executed after WHERE
, GROUP BY
or HAVING
. Practically, this order means that you can’t put window functions anywhere in the FROM
, WHERE
, GROUP BY
or HAVING
clauses.
To recap
- You can define a window frame within
OVER(...)
. The syntax is:[ROWS|RANGE] <window frame definition>
. <window frame definition>
is defined withBETWEEN <lower bound> AND <upper bound>
, where the bounds may be defined with:UNBOUNDED PRECEDING
,n PRECEDING
(ROWS
only),CURRENT ROW
,n FOLLOWING
(ROWS
only),UNBOUNDED FOLLOWING
ROWS
always treats rows individually (like theROW_NUMBER()
function),RANGE
also considers rows which share the same value in the column we order by.- Window functions can only appear in the
SELECT
andORDER BY
clauses. - If you need window functions in other parts of the query, use a subquery.