I am trying to dump some data as json with:
\copy (SELECT json_build_object('areaSqft',area_sqft, 'titleNos',title_nos, 'buildingIds',building_ids, 'geometry',geom_json) FROM my_data) to my_data.csv with delimiter ',' csv header
what I am expecting is a valid json per row, but what I get is:
"{""areaSqft"": 214.394254595041, ""geometry"": {""type"": ""MultiPolygon"", ""coordinates"": [[[[0.000015, 51.449107], [0.000154, 51.441108], [0.000238, 51.44111], [0.00024, 51.441052], [0.000137, 51.441051], [0.000041, 51.441049], [0.000015, 51.441107]]]]}, ""titleNos"": [""ZB78669""], ""buildingIds"": [7521141, 9530393, 7530394]}"
There is an extra " as the first and last character, and "" appears instead of a single ". How can I get valid JSON, stripping the unnecessary quotes?
The quotes around fields and inside fields are part of the CSV format, and they're required here, because, according to the CSV spec:
Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes
If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote
I think that you don't want or need CSV in your case. Just take the output of SELECT with the unaligned format of psql:
=# \pset format unaligned
Output format is unaligned.
=# select json_build_object('foo', 1, bar, 2) AS myjson
from (values (E'xy\zt'), ('ab,cd')) as b(bar);
myjson
{"foo" : 1, "xyzt" : 2}
{"foo" : 1, "ab,cd" : 2}
(2 rows)
You may also use \g output.json instead of the semicolon at the end of the query to have psql redirect the results of that query to a file, and \pset tuples_only to remove headers and footers.
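A minimal sketch of the whole thing applied to the query from the question (assuming the my_data table and columns shown there; the output file name is just an example):
=# \pset format unaligned
=# \pset tuples_only on
=# SELECT json_build_object('areaSqft', area_sqft,
                            'titleNos', title_nos,
                            'buildingIds', building_ids,
                            'geometry', geom_json)
   FROM my_data \g my_data.json
Each output line is then one JSON object, with no CSV quoting applied.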
- damn, the process doesn't get to an end but exits with a mysterious "Killed" :( – Randomize, May 31, 2018 at 13:25
- could your previous answer here: dba.stackexchange.com/questions/101471/… be a possible solution? – Randomize, May 31, 2018 at 13:27
- @Randomize: not sure, but "Killed" smells like OOM indeed. There would be more details in the system logs. If the dataset is huge, using FETCH_COUNT does help. – Daniel Vérité, May 31, 2018 at 19:57
- BTW, merging your two answers I got the thing working, thanks. Yes, it's very likely an OOM problem, but I am running Postgres on AWS/RDS so I have limited control over the system. – Randomize, May 31, 2018 at 19:59
You can just add one more parameter to row_to_json() in the query:
COPY (
    SELECT row_to_json(
        fruit_data,
        TRUE -- add this parameter for pretty-printed output
    )
    FROM (
        SELECT
            name AS fruit_name,
            quantity
        FROM data
    ) fruit_data
) TO 'a.file';
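For the data in the question, a client-side equivalent could be sketched like this (assuming the my_data table and columns from the question; \copy writes the file on the client and has to stay on a single line):
\copy (SELECT json_build_object('areaSqft', area_sqft, 'titleNos', title_nos, 'buildingIds', building_ids, 'geometry', geom_json) FROM my_data) TO 'my_data.json'
Note that COPY's default text format backslash-escapes some characters, so if your JSON can contain backslashes, the unaligned-psql approach above is the safer option.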
See also:
- Export Postgres table as json
- Create Quick JSON Data Dumps From PostgreSQL by Josh Branchaud
- Exporting data from Postgres to json and renaming the columns on the fly on Stack Overflow
- same as previous display? – Steve Ruben, May 31, 2018 at 9:48
- yes, the same unfortunately – Randomize, May 31, 2018 at 9:50
Putting this here for future reference, as I've had to deal with a particular column containing JSON in a file that was already being exported as CSV.
You can get away with it by using these options in your copy with FORMAT CSV:
QUOTE ''''
(using the single-quote character instead of the default double quote), along with
DELIMITER '|'
(or anything other than a comma, to avoid having to escape all the commas in your JSON, which is error-prone and hurts compatibility downstream when you try to load the file again with copy from).
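A minimal sketch of a full statement with those options, assuming a hypothetical table my_table with an id column and a JSON column called payload:
COPY (SELECT id, payload FROM my_table)
TO '/tmp/my_table.csv'
WITH (FORMAT CSV, HEADER, QUOTE '''', DELIMITER '|');
The idea is simply to pick a quote character and a delimiter that are unlikely to appear inside the JSON itself.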
The problem for me was actually that I was previously using cursor.copy_to() from Python's psycopg2 library and needed to replace it with cursor.copy_expert() for better/more control (quoting/unquoting of the column names passed). And apparently the .copy_to() method uses the QUOTE '''' option without it being documented anywhere I could find...