I'm trying to calculate the maximum length of the NUMERIC columns in a Postgres database. There are a number of tables in the database, and most of them contain several numeric columns.
I'm importing a fairly large amount of JSON data into the database. SQLModel/pydantic fails to insert numeric fields if the destination column's precision/scale is smaller than that of the input. For now I'm seeding the data into generic NUMERIC(16,5)
columns, but I'd like to reduce storage space by right-sizing the columns. (Mine is a semi-read-only dataset; the column sizes won't change much in the future.)
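Once I know the real maximums, my plan is to shrink each column with something along these lines (my_table, amount, and the numeric(9,2) target are just placeholder names for illustration):

-- Shrink a column once its true precision/scale are known
-- (my_table / amount / numeric(9,2) are placeholders)
ALTER TABLE my_table
    ALTER COLUMN amount TYPE numeric(9, 2);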
For reference, the following is my abortive stab at solving the problem:
SELECT
table_schema,
TABLE_NAME,
COLUMN_NAME,
(
xpath (
'/row/max/text()',
query_to_xml (
format (
'SELECT LENGTH ( CAST ( MAX ( %I ) AS CHARACTER VARYING ( 40 ) ) ) from %I.%I',
COLUMN_NAME,
table_schema,
TABLE_NAME
),
TRUE,
TRUE,
''
)
)
) [ 1 ] :: TEXT :: INT AS max_length
FROM
information_schema.COLUMNS
WHERE
table_schema = 'public'
AND data_type = 'numeric'
ORDER BY
table_schema,
TABLE_NAME,
COLUMN_NAME;
Even better would be to split the max column lengths into precision & scale.
2 Answers
Since the numbers come from an external source, the best place to check them is outside of the database. Use any language, scripting or compiled, to scan the JSON files. You can also use a tool like jq to extract the field and find the maximum value.
PostgreSQL by itself can handle numeric values up to ridiculous sizes:
up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
But if numeric(16,5) works for you now, then you can replace it with the bigint datatype and store the value as a scaled integer. Just do not forget to divide it by 10^5 (100,000) on the client before displaying it. Speed-wise you will get an improvement: a bigint is just 8 bytes and is supported natively by 64-bit processors.
Splitting the value into two integers would also be possible, but if your values really require (16,5), the plain integer data type may not be enough for the integer part: a 4-byte integer cannot hold values up to 10^11, so you will need to go to bigint anyway.
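If you do go the scaled-bigint route, the conversion could look roughly like this (just a sketch; my_table and amount are made-up names, and it assumes the column currently holds numeric(16,5) values):

-- Convert numeric(16,5) into a scaled bigint: shift 5 decimal places
-- (my_table / amount are placeholder names)
ALTER TABLE my_table
    ALTER COLUMN amount TYPE bigint
    USING (round(amount * 100000))::bigint;

The client then divides by 100000 again before displaying the value.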
In the sample code you showed you are working with XML, but earlier you stated that the data comes as JSON. So which one is it? If you want to parse the JSON blobs in the database, you can use the JSON functions. Or was it a mistake and your data actually comes in XML format?
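For example, if the raw rows were first staged in a jsonb column, something like this sketch could measure the digits straight from the JSON text (staging and price are made-up names):

-- Hypothetical staging table: staging(doc jsonb) with a numeric field "price"
-- Longest integer part (sign stripped) and longest fractional part seen so far
SELECT
    max(length(ltrim(split_part(doc ->> 'price', '.', 1), '-'))) AS max_int_digits,
    max(length(split_part(doc ->> 'price', '.', 2)))             AS max_frac_digits
FROM staging
WHERE doc ? 'price';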
The input data is JSON. The code shown above is stolen from dba.stackexchange.com/a/215809/273417 ... unfortunately I'm not familiar with the XML functions and wasn't able to adapt it for NUMERIC data types. – masroore, May 20, 2023 at 3:36
Alright, I got it working. Fixed the column name in the XPath (length):
SELECT
table_schema,
TABLE_NAME,
COLUMN_NAME,
(
-- query_to_xml() runs the per-column query and returns its one-row result as XML;
-- the inner query's output column is named "length" (after the length() function),
-- so the XPath has to read /row/length, not /row/max.
xpath (
'/row/length/text()',
query_to_xml (
format (
'SELECT LENGTH ( CAST ( MAX ( %I ) AS CHARACTER VARYING ( 40 ) ) ) from %I.%I',
COLUMN_NAME,
table_schema,
TABLE_NAME
),
TRUE,
TRUE,
''
)
)
) [ 1 ] :: TEXT :: INT AS max_length
FROM
information_schema.COLUMNS
WHERE
table_schema = 'public'
AND data_type = 'numeric'
ORDER BY
table_schema,
TABLE_NAME,
COLUMN_NAME;
Is there a way to split the above values into numeric scale and precision?
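The closest I have so far is a per-column sketch like the one below (my_table / my_col are placeholders; the text round-trip strips the trailing zeros that NUMERIC(16,5) pads onto every value). On PostgreSQL 13+ the min_scale() function would give the needed fractional digits directly.

-- max_scale      : longest fractional part with trailing zeros stripped
-- max_int_digits : longest integer part, minus sign ignored
-- a candidate declaration is numeric(max_int_digits + max_scale, max_scale)
SELECT
    max(length(rtrim(split_part(my_col::text, '.', 2), '0'))) AS max_scale,
    max(length(ltrim(split_part(my_col::text, '.', 1), '-'))) AS max_int_digits
FROM my_table;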
Whether you declare numeric(3,2) or numeric(100,50) doesn't matter space-wise. The more digits a value actually has, the more space it takes.
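If you want to check that on your own values, pg_column_size() reports how many bytes a given value occupies, e.g.:

-- Compare the stored size of the same value under different declarations
SELECT pg_column_size(1.5::numeric)       AS plain_numeric,
       pg_column_size(1.5::numeric(16,5)) AS numeric_16_5,
       pg_column_size(150000::bigint)     AS scaled_bigint;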