I am trying to dump some data as json with:
\copy (SELECT json_build_object('areaSqft',area_sqft, 'titleNos',title_nos, 'buildingIds',building_ids, 'geometry',geom_json) FROM my_data) to my_data.csv with delimiter ',' csv header
what I am expecting is a valid json per row, but what I get is:
"{""areaSqft"": 214.394254595041, ""geometry"": {""type"": ""MultiPolygon"", ""coordinates"": [[[[0.000015, 51.449107], [0.000154, 51.441108], [0.000238, 51.44111], [0.00024, 51.441052], [0.000137, 51.441051], [0.000041, 51.441049], [0.000015, 51.441107]]]]}, ""titleNos"": [""ZB78669""], ""buildingIds"": [7521141, 9530393, 7530394]}"
There is an extra " as the first and last character, and "" appears instead of a single ". How can I get valid JSON, stripping the unnecessary quotes?
The quotes around fields and inside fields are part of the CSV format, and they're required here, because, according to the CSV spec:
Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes
If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote
I think that you don't want or need CSV in your case. Just take the output of SELECT with the unaligned format of psql:
=# \pset format unaligned
Output format is unaligned.
=# select json_build_object('foo', 1, bar, 2) AS myjson
from (values (E'xy\zt'), ('ab,cd')) as b(bar);
myjson
{"foo" : 1, "xyzt" : 2}
{"foo" : 1, "ab,cd" : 2}
(2 rows)
You may also use \g output.json instead of the semicolon at the end of the query to have psql redirect the results of that query to a file, and \pset tuples_only to remove headers and footers.
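A minimal sketch of the whole thing applied to the query from the question (assuming the my_data table and columns shown there; the output file name is just an example):
=# \pset format unaligned
=# \pset tuples_only on
=# SELECT json_build_object('areaSqft', area_sqft,
                            'titleNos', title_nos,
                            'buildingIds', building_ids,
                            'geometry', geom_json)
   FROM my_data \g my_data.json
Each output line is then one JSON object, with no CSV quoting applied.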
- damn, the process doesn't get to an end but exits with a mysterious "Killed" :( – Randomize, May 31, 2018 at 13:25
- could your previous answer here: dba.stackexchange.com/questions/101471/… be a possible solution? – Randomize, May 31, 2018 at 13:27
- @Randomize: not sure, but "Killed" smells like OOM indeed. There would be more details in the system logs. If the dataset is huge, using FETCH_COUNT does help. – Daniel Vérité, May 31, 2018 at 19:57
- BTW, merging your two answers I got the thing working, thanks. Yes, it's very likely an OOM problem, but I am running Postgres on AWS/RDS so I have limited control over the system. – Randomize, May 31, 2018 at 19:59
You can just add one more parameter to row_to_json() in the query:
COPY (
    SELECT row_to_json(
        fruit_data,
        TRUE -- add this parameter for pretty-printed output
    )
    FROM (
        SELECT
            name AS fruit_name,
            quantity
        FROM data
    ) fruit_data
) TO 'a.file';
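For the data in the question, a client-side equivalent could be sketched like this (assuming the my_data table and columns from the question; \copy writes the file on the client and has to stay on a single line):
\copy (SELECT json_build_object('areaSqft', area_sqft, 'titleNos', title_nos, 'buildingIds', building_ids, 'geometry', geom_json) FROM my_data) TO 'my_data.json'
Note that COPY's default text format backslash-escapes some characters, so if your JSON can contain backslashes, the unaligned-psql approach above is the safer option.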
See also:
- Export Postgres table as json
- Create Quick JSON Data Dumps From PostgreSQL by Josh Branchaud
- Exporting data from Postgres to json and renaming the columns on the fly on Stack Overflow
- same as previous display? – Steve Ruben, May 31, 2018 at 9:48
- yes, the same unfortunately – Randomize, May 31, 2018 at 9:50
Putting this here for future reference, as I've had to deal with a particular column containing JSON in a file that was already being exported as CSV.
You can get away with it by using these options in your copy with FORMAT CSV:
QUOTE ''''
(using the single-quote character instead of the default double quote), along with
DELIMITER '|'
(or anything other than a comma, to avoid having to escape all the commas in your JSON, which is error-prone and hurts compatibility downstream when you try to load the file again with copy from).
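A minimal sketch of a full statement with those options, assuming a hypothetical table my_table with an id column and a JSON column called payload:
COPY (SELECT id, payload FROM my_table)
TO '/tmp/my_table.csv'
WITH (FORMAT CSV, HEADER, QUOTE '''', DELIMITER '|');
The idea is simply to pick a quote character and a delimiter that are unlikely to appear inside the JSON itself.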
The problem for me was actually that I was previously using cursor.copy_to() from Python's psycopg2 library and needed to replace it with cursor.copy_expert() for better/more control (quoting/unquoting of the column names passed). And apparently the .copy_to() method uses the QUOTE '''' option without it being documented anywhere I could find...