The columns of the CSV are of the format text
(can include empty lines), date
and a JSON array of strings (something like ['a', 'b', 'c']
. I've been trying to copy that CSV to a PostgreSQL table (using psycopg2
's copy_expert
, which simply executes the given SQL COPY
command, if that matters)
Table is created with
CREATE TABLE posts(
id SERIAL PRIMARY KEY,
text TEXT NOT NULL,
created_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
rubrics jsonb[] NOT NULL
);
and the copy command is
COPY posts(text, created_date, rubrics)
FROM STDIN
WITH CSV HEADER
where STDIN
is the CSV file.
The error I get is
malformed array literal: "['a', 'b', 'c']"
DETAIL: "[" must introduce explicitly-specified array dimensions.
CONTEXT: COPY posts, line 15, column rubrics: "['a', 'b', 'c']"
I've tried all 4 JSON-related data types (json
and jsonb
with and without []
or [3]
), including square brackets produces the error above, while omitting them (rubrics jsonb NOT NULL
when creating) gives a new one:
invalid input syntax for type json
DETAIL: Token "'" is invalid.
CONTEXT: JSON data, line 1: ['...
COPY posts, line 15, column rubrics: "['a', 'b', 'c']"
Do I have any recourse besides manually fixing the .csv to use {}
instead of []
before copying it? It feels like I do, yet I haven't really been able to found anything besides a couple somewhat, but not fully relevant questions.
Update regarding the comments
I've changed the table creation to
CREATE TABLE posts(
id SERIAL PRIMARY KEY,
text TEXT NOT NULL,
created_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
rubrics jsonb NOT NULL
);
and left a single entry in the CSV to test, so it looks like this
text,created_date,rubrics
"Lorem
Ipsum
Test",2019年07月25日 12:42:13,'["f", "o", "o"]'
Things I've tried and errors I got
Using the CSV as is:
extra data after last expected column
CONTEXT: COPY posts, line 6: ""Lorem
Ipsum
Test",2019年07月25日 12:42:13,'["f", "o", "o"]'"
An additional pair of ""
around the array ("'["f", "o", "o"]'"
)
invalid input syntax for type json
DETAIL: Token "'" is invalid.
CONTEXT: JSON data, line 1: '...
COPY posts, line 6, column rubrics: "'[f, o, o]'"
No quotes at all (just the ["f", "o", "o"]
)
extra data after last expected column
CONTEXT: COPY posts, line 6: ""Lorem
Ipsum
Test",2019年07月25日 12:42:13,["f", "o", "o"]"
Double quotes instead of single quotes ("["f", "o", "o"]"
)
invalid input syntax for type json
DETAIL: Token "f" is invalid.
CONTEXT: JSON data, line 1: [f...
COPY posts, line 6, column rubrics: "[f, o, o]"
Double quotes outside, single quotes inside ("['f', 'o', 'o']"
)
invalid input syntax for type json
DETAIL: Token "'" is invalid.
CONTEXT: JSON data, line 1: ['...
COPY posts, line 6, column rubrics: "['f', 'o', 'o']"
Could it be an issue with the Python library I'm using after all?
1 Answer 1
JSON is a layer of encapsulation with a standardized syntax that the JSON field in your input must respect.
CSV is another layer of encapsulation that applies on top, with a syntax that is more of less standardized in rfc 4180. As mentioned in the RFC, programs that read or write CSV do not always conform strictly to all these rules, but concerning the quoting rules, Postgres is conformant.
JSON uses double quotes around string literals, so all the attempts in the question to use single quotes around strings are invalid for the JSON parser
CSV requires that double quotes are doubled, see rules #5 and #7 of rfc 4180, so all attempts to use double quotes in JSON without doubling them for CSV or without enclosing the entire field in double quotes are invalid for the CSV parser.
- Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields
- If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote
Your example slightly edited to conform to both CSV and JSON does work:
CREATE TABLE posts(
id SERIAL PRIMARY KEY,
text TEXT NOT NULL,
created_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
rubrics jsonb NOT NULL
);
COPY posts(text, created_date, rubrics)
FROM STDIN WITH CSV;
foo,2022年05月17日,"[""a"", ""b"", ""c""]"
\.
select * from posts;
id | text | created_date | rubrics
----+------+---------------------+-----------------
1 | foo | 2022年05月17日 00:00:00 | ["a", "b", "c"]
If you wanted a Postgres array of JSON objects in rubrics
, that would be more complicated, as expressing the Postgres array as text requires another layer of encapsulation with its own syntax rules, so the production of valid data to import with COPY would have to do
JSON quoting -> postgres array quoting -> CSV quoting
But as seen in your example, it seems that you don't need a Postgres array of JSON objects as a column, but rather a single JSON value that contains a JSON array.
-
Thank you!! Made a quick
re.sub
and it worked!bqback– bqback2022年05月17日 20:23:38 +00:00Commented May 17, 2022 at 20:23
['a', 'b', 'c']
is not valid JSON, because strings must be enclosed in double quotes, not single quotes.rubrics jsonb[]
and notrubrics jsonb
?"['a','b','c']"
if I view the CSV through a text editor (so one entry is"Lorem Ipsum",2019年07月25日 12:42:13,"['a', 'b', 'c']"
), I just had the file open in Excel['a', 'b', 'c']
and it's not a valid JSON or a valid array of JSON. Any JSON parser will reject that.