3
\$\begingroup\$

The Finnish personal identity code is defined as:

  • Day, month, and year of birth, in ddmmyy form (six digits, zero padded if necessary)
  • A separator
    • For those born >= 2000, one of A, B, C, D, E, or F
    • For those born in the 1900s, one of Y, X, W, V, U, or -
    • For those born in the 1800s, a plus sign (+)
  • A three-digit individual number, odd for male, even for female. In practice this is between 002 and 899 (900-999 is used for temporary identity codes).
  • A control character (the nine digits modulo 31, table lookup 0-Y)

For example, Anna Finlander's personal identity code is 131052-308T. She is thus born October 13th, 1952. For official information, see here and here.

I wanted to calculate and store the date of birth from the personal identity code in Postgresql. I came up with the following, using a generated column.

create table people (
 id char(11),
 birthday date generated always as (
 case
 when substring(id from 7 for 1) in ('A', 'B', 'C', 'D', 'E', 'F') then make_date((2000 + substring(id from 5 for 2)::int), substring(id from 3 for 2)::int, substring(id from 1 for 2)::int)
 when substring(id from 7 for 1) in ('Y', 'X', 'W', 'V', 'U', '-') then make_date((1900 + substring(id from 5 for 2)::int), substring(id from 3 for 2)::int, substring(id from 1 for 2)::int)
 else make_date(1800 + substring(id from 5 for 2)::int, substring(id from 3 for 2)::int, substring(id from 1 for 2)::int)
 end
 ) stored
);

I generate a bunch of PICs to test with. Generator here (not mine).

insert into people values
('200118E3687'),
('290555X6906'),
('290366W021U'),
('020896X648D'),
('060113E330N'),
('140293-3254'),
('240445Y5535'),
('271208W637P'),
('200712F334D'),
('180330Y745T'),
('010275X898K'),
('290365X369T'),
('020326Y5549'),
('210316F447H'),
('240643V002F'),
('270814B294Y'),
('020437-011D'),
('270693U601M'),
('180841U181F'),
('220311D245B'),
('030791+8405'),
('070192+523X'),
('041138U469C'),
('120902F564P'),
('210948W637R'),
('220288V496H'),
('291193+769C'),
('121020C279Y'),
('160595+647R'),
('250601-547V'),
('230533-381N'),
('271159U448Y'),
('011218D368U'),
('290391+659V'),
('221219-0457'),
('130590X252S'),
('280717V179V'),
('270798+6743'),
('230990V8624'),
('240895W718L'),
('140748Y270S'),
('130940V611N'),
('131119D0603'),
('070796+2278'),
('241098U148L'),
('110656X072P'),
('241052U730H'),
('160185+507D'),
('011002E6761'),
('090380X6594'),
('010594Y9032'),
('010594Y9021'),
('020594X903P'),
('020594X902N'),
('030594W903B'),
('030694W9024'),
('040594V9030'),
('040594V902Y'),
('050594U903M'),
('050594U902L'),
('010516B903X'),
('010516B902W'),
('020516C903K'),
('020516C902J'),
('030516D9037'),
('030516D9026'),
('010501E9032'),
('020502E902X'),
('020503F9037'),
('020504A902E'),
('020504B904H');
select
 id, birthday,
 age(birthday) as age
from people;
 id | birthday | age 
-------------+------------+---------------------------
 200118E3687 | 2018年01月20日 | 6 years 5 mons 1 day
 290555X6906 | 1955年05月29日 | 69 years 23 days
 290366W021U | 1966年03月29日 | 58 years 2 mons 23 days
 020896X648D | 1996年08月02日 | 27 years 10 mons 19 days
 060113E330N | 2013年01月06日 | 11 years 5 mons 15 days
 140293-3254 | 1993年02月14日 | 31 years 4 mons 7 days
 240445Y5535 | 1945年04月24日 | 79 years 1 mon 27 days
 271208W637P | 1908年12月27日 | 115 years 5 mons 25 days
 200712F334D | 2012年07月20日 | 11 years 11 mons 1 day
 180330Y745T | 1930年03月18日 | 94 years 3 mons 3 days
 010275X898K | 1975年02月01日 | 49 years 4 mons 20 days
 290365X369T | 1965年03月29日 | 59 years 2 mons 23 days
 020326Y5549 | 1926年03月02日 | 98 years 3 mons 19 days
 210316F447H | 2016年03月21日 | 8 years 3 mons
 240643V002F | 1943年06月24日 | 80 years 11 mons 27 days
 270814B294Y | 2014年08月27日 | 9 years 9 mons 25 days
 020437-011D | 1937年04月02日 | 87 years 2 mons 19 days
 270693U601M | 1993年06月27日 | 30 years 11 mons 24 days
 180841U181F | 1941年08月18日 | 82 years 10 mons 3 days
 220311D245B | 2011年03月22日 | 13 years 2 mons 30 days
 030791+8405 | 1891年07月03日 | 132 years 11 mons 18 days
 070192+523X | 1892年01月07日 | 132 years 5 mons 14 days
 041138U469C | 1938年11月04日 | 85 years 7 mons 17 days
 120902F564P | 2002年09月12日 | 21 years 9 mons 9 days
 210948W637R | 1948年09月21日 | 75 years 9 mons
 220288V496H | 1988年02月22日 | 36 years 3 mons 28 days
 291193+769C | 1893年11月29日 | 130 years 6 mons 22 days
 121020C279Y | 2020年10月12日 | 3 years 8 mons 9 days
 160595+647R | 1895年05月16日 | 129 years 1 mon 5 days
 250601-547V | 1901年06月25日 | 122 years 11 mons 26 days
 230533-381N | 1933年05月23日 | 91 years 29 days
 271159U448Y | 1959年11月27日 | 64 years 6 mons 24 days
 011218D368U | 2018年12月01日 | 5 years 6 mons 20 days
 290391+659V | 1891年03月29日 | 133 years 2 mons 23 days
 221219-0457 | 1919年12月22日 | 104 years 5 mons 30 days
 130590X252S | 1990年05月13日 | 34 years 1 mon 8 days
 280717V179V | 1917年07月28日 | 106 years 10 mons 24 days
 270798+6743 | 1898年07月27日 | 125 years 10 mons 25 days
 230990V8624 | 1990年09月23日 | 33 years 8 mons 28 days
 240895W718L | 1995年08月24日 | 28 years 9 mons 28 days
 140748Y270S | 1948年07月14日 | 75 years 11 mons 7 days
 130940V611N | 1940年09月13日 | 83 years 9 mons 8 days
 131119D0603 | 2019年11月13日 | 4 years 7 mons 8 days
 070796+2278 | 1896年07月07日 | 127 years 11 mons 14 days
 241098U148L | 1998年10月24日 | 25 years 7 mons 28 days
 110656X072P | 1956年06月11日 | 68 years 10 days
 241052U730H | 1952年10月24日 | 71 years 7 mons 28 days
 160185+507D | 1885年01月16日 | 139 years 5 mons 5 days
 011002E6761 | 2002年10月01日 | 21 years 8 mons 20 days
 090380X6594 | 1980年03月09日 | 44 years 3 mons 12 days
 010594Y9032 | 1994年05月01日 | 30 years 1 mon 20 days
 010594Y9021 | 1994年05月01日 | 30 years 1 mon 20 days
 020594X903P | 1994年05月02日 | 30 years 1 mon 19 days
 020594X902N | 1994年05月02日 | 30 years 1 mon 19 days
 030594W903B | 1994年05月03日 | 30 years 1 mon 18 days
 030694W9024 | 1994年06月03日 | 30 years 18 days
 040594V9030 | 1994年05月04日 | 30 years 1 mon 17 days
 040594V902Y | 1994年05月04日 | 30 years 1 mon 17 days
 050594U903M | 1994年05月05日 | 30 years 1 mon 16 days
 050594U902L | 1994年05月05日 | 30 years 1 mon 16 days
 010516B903X | 2016年05月01日 | 8 years 1 mon 20 days
 010516B902W | 2016年05月01日 | 8 years 1 mon 20 days
 020516C903K | 2016年05月02日 | 8 years 1 mon 19 days
 020516C902J | 2016年05月02日 | 8 years 1 mon 19 days
 030516D9037 | 2016年05月03日 | 8 years 1 mon 18 days
 030516D9026 | 2016年05月03日 | 8 years 1 mon 18 days
 010501E9032 | 2001年05月01日 | 23 years 1 mon 20 days
 020502E902X | 2002年05月02日 | 22 years 1 mon 19 days
 020503F9037 | 2003年05月02日 | 21 years 1 mon 19 days
 020504A902E | 2004年05月02日 | 20 years 1 mon 19 days
 020504B904H | 2004年05月02日 | 20 years 1 mon 19 days
Sᴀᴍ Onᴇᴌᴀ
29.5k16 gold badges45 silver badges201 bronze badges
asked Jun 21, 2024 at 17:37
\$\endgroup\$

1 Answer 1

2
\$\begingroup\$

short ID

As I read the generated clause for the "birthday" column, I worry that we might occasionally accept a short ID and then mangle it. If possible, I would like to see a fatal error raised, or at least see zero such rows actually INSERTed and committed.

Background: All Americans know that Boston is an east coast city that has U.S. postal ZIP codes like 02108. And yet modern data processing systems get it wrong an astonishing amount of the time, truncating it down to 2108, with databases and Excel spreadsheets turning it into an integer. One could use a %05d format string, but then that risks turning a "dropped keystroke" typo into a legitimate looking entry, plus it risks running afoul of ZIP+4 entries. It's the sort of thing that requires constant vigilance, always preserving a ZIP as a string, and remembering to add Boston test records to midwest datasets. And it looks like Anna Suomalainen could suffer a similar fate. Fortunately nearly a third of Finns are born in the first nine days of months, so it would be harder for such errors to slip by undetected.

A simple length check would be a good start. For extra credit you might compute the mod 31 checksum, which would probably involve writing a stored procedure. And repeatedly flipping around DDMMYY to YYYYMMDD is tedious enough that, while you're at it, may as well write a trivial user func for that as well. Alternatively, exploit date parse format strings to make it a little more compact.

Instead of defaulting to else make_date(1800 + ..., I'd like to see an explicit check for a '+' separator, causing short or otherwise mangled inputs to fail.

I like that bad dates such as February 30th produce fatal error -- good.

mixed case

I imagine that IDs sometimes get entered by hand, without the CAPS LOCK key on. So if a separator of 'A' arrives as 'a', the OP code defaults to interpreting it as '+' from centuries ago.

Similarly if a checksum of 'B' arrives as 'b', we should signal fatal error or perhaps just upcase it. We could range check for 002 .. 899, but perhaps the checksum suffices.

You may find initial validation via regexp to be helpful.

answered Jun 25, 2024 at 1:42
\$\endgroup\$
2
  • \$\begingroup\$ Thanks @J_H. Regarding date parse format strings, do you mean using to_date instead of make_date? That's not possible in a generated column, as functions used there have to be immutable, and to_date is only stable. \$\endgroup\$ Commented Jun 25, 2024 at 7:03
  • \$\begingroup\$ Awww! Sad face. Perhaps a user func or stored procedure may be a better fit. Or perhaps we should be more trusting that some well-tested application library is feeding us only well-formatted inputs. \$\endgroup\$ Commented Jun 25, 2024 at 15:34

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.