Detect value changes between lines with PostgreSQL

by Florent Lebreton (fle)
in development

tagged postgresql

A few days ago, in a Django project, I had to solve a SQL problem that I had never met yet. Something like : "The last time that this column value has changed between a row and the next one". Crap...How?

By requesting help of regilero, who told me about PostgreSQL window functions.

Window Functions

To solve this, or any SQL query where you have to compare similar rows, PostgreSQL provides a usefull functionnality: Window Functions. PostgreSQL 9.1.13 Documentation introduces this feature by saying:

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.

Different built-in window functions allows to compute rank of a row in a partition, get previous or next row value, etc. This kind of function must be invoked using window function syntax (i.e. with an OVER clause).

A simple example

Let's take a quite useless but simple example:

  • Each day, I take note on weather (How warm is it ? Is it rainy ?).
  • I want to extract some information like:
    • On What day did the weather change?
    • When did it start to rain for the last time ?
    • ...

In a database, a very simple representation looks like this:

db=> \d weather
 Table « public.weather »
 Column | Type | Modifiers
 ------------+---------+---------------
 day | date | non NULL
 temperature | integer | non NULL
 rainy | boolean | non NULL

and with sample data:

db=> SELECT * FROM weather ORDER BY day DESC;
 day | temperature | rainy
 -----------+-------------+-------
 2014年04月08日 | 22 | f
 2014年04月07日 | 20 | f
 2014年04月06日 | 16 | t
 2014年04月05日 | 18 | t
 2014年04月04日 | 19 | t
 2014年04月03日 | 22 | f
 2014年04月02日 | 20 | f
 2014年04月01日 | 18 | t

The very intersting part is here : thanks to window functions lag and lead, I can select for each row the column values of the previous and next rows:

SELECT
 day,
 rainy,
 lead(rainy) OVER (ORDER BY day DESC) as prev_rainy,
 lag(rainy) OVER (ORDER BY day DESC) as next_rainy
FROM
 weather
ORDER BY
 day DESC
 day | rainy | prev_rainy | next_rainy
------------+-------+------------+------------
 2014年04月08日 | f | f |
 2014年04月07日 | f | t | f
 2014年04月06日 | t | t | f
 2014年04月05日 | t | t | t
 2014年04月04日 | t | f | t
 2014年04月03日 | f | f | t
 2014年04月02日 | f | t | f
 2014年04月01日 | t | | f

Note: Obviously, the ORDER BY clause is very important here.

By nesting this in an other query, I can detect value changes between rows of my table. For example, the query below gives "each day on which the weather changed" (switch of the rainy boolean):

SELECT
 w1.day, w1.rainy
FROM
 (SELECT
 w2.day,
 w2.rainy,
 lead(w2.rainy) OVER (ORDER BY w2.day DESC) as prev_rainy
 FROM
 weather w2
 ORDER BY
 w2.day DESC) as w1
WHERE
 w1.rainy IS DISTINCT FROM w1.prev_rainy
ORDER BY
 w1.day DESC;
 day | rainy
-----------+-------
2014年04月07日 | f
2014年04月04日 | t
2014年04月02日 | f
2014年04月01日 | t

Based on this first selection, I can easily extract some other information like "the last time the weather began to be nice":

SELECT
 w1.day, w1.rainy
FROM
 (SELECT
 w2.day,
 w2.rainy,
 lead(w2.rainy) OVER (ORDER BY w2.day DESC) as prev_rainy
 FROM
 weather w2
 ORDER BY
 w2.day DESC) as w1
WHERE
 w1.rainy IS DISTINCT FROM w1.prev_rainy
AND
 w1.rainy IS FALSE
ORDER BY
 w1.day DESC
LIMIT 1;
 day | rainy
-----------+-------
2014年04月07日 | f

Go further

Use case above is just an example focused on window functions lag and lead (I have no idea how to solve this kind of problem without them) but PostgreSQL provides other usefull builtin window functions.

In addition, it's possible to call any built-in or user-defined aggregate function as a window function!

Stay tuned

Keep in touch on twitter, through this blog feed or by commenting this article below!

[FR] Ce billet en français sur le blog de Makina Corpus : Détecter un changement de valeurs entre deux lignes avec PostgreSQL !

Comments !