I'm trying to figure out the engineering to get a CSV file into PostgreSQL. Requirements:
- The CSV has X number of predetermined required fields
- The CSV file has Y number of dynamic fields
Example:
- TenantCode,DomainCode,PersonCode,ExtraField1,ExtraField2,ExtraFieldN
- "CUST1","STUD","C0001","Donald","Duck","M"
- "CUST1","STUD","C0002","Diana","Duck","F"
into this format (in PostgreSQL 10)
TenantCode---DomainCode---PersonCode---ExtendedFields(JSONB)
- CUST1---STUD---C0001--- {"ExtraField1":"Donald","ExtraField2":"Duck","ExtraFieldN":"M"}
- CUST1---STUD---C0002--- {"ExtraField1":"Diana","ExtraField2":"Duck","ExtraFieldN":"F"}
My original thought was to use Pandas (in Python) to convert the entire file to JSON, then use PostgreSQL's COPY to load it into a staging table, then use SQL to parse out the required fields and insert them into the destination table.
1 Answer
I wouldn't do all that. I would just use Python to transform the CSV file into another CSV in which all the variable fields are collapsed into a single string of JSON.
Create the right table with the jsonb column and load directly into it. To do that, use psql and \copy. You can even create a program that reads from STDIN, does the reformatting, writes to STDOUT, and sits in the pipeline. Or you can write a quick Perl script to get this done,
perl reformat.pl stupid1.csv stupid2.csv |
psql -d test -c '\copy mymastertable FROM stdin WITH (FORMAT csv)'
Or whatever.
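Since the question mentions Python, here is a minimal sketch of the same reformatting step in Python rather than Perl. It assumes the three required columns are named as in the example (TenantCode, DomainCode, PersonCode) and that every other column belongs in the JSON string; the function name reformat and the REQUIRED list are made up for illustration. It writes no header row, on the assumption that the data is loaded without the HEADER option.

```python
import csv
import io
import json

# Assumption: these are the predetermined required fields from the example.
REQUIRED = ["TenantCode", "DomainCode", "PersonCode"]


def reformat(src, dst, required=REQUIRED):
    """Pass the required columns through and fold every other column
    into one JSON string, written as the last CSV field."""
    reader = csv.DictReader(src)
    writer = csv.writer(dst, lineterminator="\n")
    # Any column that is not required is treated as a dynamic field.
    extras = [name for name in (reader.fieldnames or []) if name not in required]
    for row in reader:
        extended = {name: row[name] for name in extras}
        writer.writerow([row[name] for name in required] + [json.dumps(extended)])


# Demo on the sample rows from the question.
sample = io.StringIO(
    "TenantCode,DomainCode,PersonCode,ExtraField1,ExtraField2,ExtraFieldN\n"
    '"CUST1","STUD","C0001","Donald","Duck","M"\n'
    '"CUST1","STUD","C0002","Diana","Duck","F"\n'
)
result = io.StringIO()
reformat(sample, result)
print(result.getvalue(), end="")
```

To make it sit in a pipeline, replace the demo with a call to reformat(sys.stdin, sys.stdout) and pipe its output into psql in place of the Perl script. The csv module quotes the JSON field and doubles the embedded quotes, which is the escaping PostgreSQL's CSV format expects.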
- So, in my research over the past day, I had to adjust my thinking on what the solution is. The solution is, in and of itself, not a JSON document. It's a stream of JSON documents, with each document a series of CSV fields and a JSON string. I can iterate over the CSV lines and then use json.dump(row, output.file). Thoughts? – David Crumb, Aug 17, 2018 at 18:14
- I don't understand what you're talking about when you say stream, or what you're trying to do. I'd prefer you reduce it to a question, ask it on a site, and send me a link (I'm kind of tapped out); perhaps others could help out. – Evan Carroll, Aug 17, 2018 at 18:27