
I am trying to import a csv file into postgres using this statement:

COPY T00_import FROM E'C:\\temp\\CSV\\Aberdeen City 9051_20150212_C_01.csv' CSV;

The table T00_import is a simple table with 17 fields, all set to varchar(100). The CSV does not have a header, and the first row has only 9 values; rows further down have up to 17 values (columns). PostgreSQL is giving me this error:

 ERROR: missing data for column "field10"
 SQL state: 22P04
 CONTEXT: COPY t00_import, line 1: "10,"Aberdeen City Council",9051,12/02/2015,1,12/02/2015,111245,1.0,"C""

I have tried specifying the NULL string, but this did nothing. It should be easy to fix, but I can't find out how.

PolyGeo
asked Aug 7, 2015 at 14:26
  • Every row should have the right number of columns. How else would it be possible to guess which columns are missing? For empty columns there are only delimiters: attr1,,,attr4 means that the second and third columns are NULL. Commented Aug 7, 2015 at 16:25

4 Answers


Your .csv file is mis-formatted. From the documentation:

Note: Many programs produce strange and occasionally perverse CSV files, so the file format is more a convention than a standard. Thus you might encounter some files that cannot be imported using this mechanism, and COPY might produce files that other programs cannot process.

I don't know where your CSV came from, but if rows can be up to 17 fields long, the first row should have extra commas:

 10,"Aberdeen City Council",9051,12/02/2015,1,12/02/2015,111245,1.0,"C",,,,,,,,

To me this appears to be your problem. PostgreSQL has no indicator that there are more columns in that row, but the table that you're copying to obviously thinks that there should be. Adding headers to the file fixed the output for all subsequent rows. You may have to reprocess each file.
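For illustration, here is a minimal way to reproduce the error (the table name and path are invented):

 CREATE TABLE short_row_demo (a varchar(100), b varchar(100), c varchar(100));
 -- A file whose first line is just  1,2  (only two values) fails with
 -- ERROR: missing data for column "c", exactly like your import.
 -- A line reading  1,2,  (trailing comma, empty third value) loads fine.
 COPY short_row_demo FROM 'C:\temp\short_rows.csv' CSV;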

answered Aug 7, 2015 at 16:31

Have you tried using DELIMITER ','?

COPY t00_import FROM 'C:\temp\CSV\Aberdeen City 9051_20150212_C_01.csv' DELIMITER ',' CSV;

Also, you might try naming the columns in the statement:

COPY t00_import (column1,column2,etc...,column17) FROM 'C:\temp\CSV\Aberdeen City 9051_20150212_C_01.csv' DELIMITER ',' CSV;

Also, are any of your columns set to not allow NULL values?

I was able to recreate your error by creating an improperly formatted csv file. You could be missing the commas where there are empty values.


example:

WRONG:

 name,city,state
 Peter,Cedar Rapids
 Bob,San Diego,CA
 Jane,,CA

Instead it should be:

 name,city,state
 Peter,Cedar Rapids,     <-------- SEE THE COMMA
 Bob,San Diego,CA
 Jane,,CA

Notice the missing comma at the end of the first data line in the WRONG version.
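A minimal sketch of the corrected case, assuming a throwaway table and file path:

 CREATE TABLE copy_demo (name varchar(100), city varchar(100), state varchar(100));
 -- The corrected file loads: the empty trailing field on the Peter row comes in
 -- as NULL (use FORCE_NOT_NULL if you want an empty string instead).
 COPY copy_demo FROM 'C:\temp\people.csv' WITH (FORMAT csv, HEADER true);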

answered Aug 7, 2015 at 15:06
  • I tried making the changes you suggested: COPY T00_import(field1,field2,field3,field4,field5,field6,field7,field8,field9,field10,field11,field12,field13,field14,field15,field16,field17) FROM E'C:\\temp\\CSV\\Aberdeen City 9051_20150212_C_01.csv' DELIMITER ',' CSV; Same error. Commented Aug 7, 2015 at 15:12
  • what about looking at your csv to see if it is missing the commas? Commented Aug 7, 2015 at 15:15
  • So far the only thing that has solved this issue is to add a header to the CSV (field1,field2,...,field17). Not ideal, as I will be importing many CSVs. Commented Aug 7, 2015 at 15:17

This often happens when the original dataset contains too many commas as part of the free text. I usually load it into the DB and then edit the data.

  1. Create a temp table with one column that contains the entire row data.
  2. Add extra columns to good_data_table: orig_col_1, ..., orig_col_m, extra_col_1, ..., extra_col_n.
  3. Insert from a select into the new table:

    INSERT INTO good_data_table
    (orig_col_1, ..., orig_col_m, extra_col_1, ..., extra_col_n)
    SELECT
      split_part(data, ',', 1) AS col1,
      split_part(data, ',', 2) AS col2,
      ...
      split_part(data, ',', k) AS colk
    FROM temp_data_table;

  4. Locate rows with extra columns, i.e. rows where extra_col_1 is not '' (empty string); see the sketch below.
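A sketch of step 4, assuming the table and column names used above (good_data_table and extra_col_1 are placeholders):

 -- Rows where the free text contained stray commas: data spilled past the
 -- expected columns, so the first "extra" column is non-empty.
 SELECT *
 FROM good_data_table
 WHERE coalesce(extra_col_1, '') <> '';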

Oto Kaláb
answered May 4, 2017 at 10:48

Yep Oto, that's it. Well done. This should work in PostgreSQL (there may be some typos).

 DROP TABLE IF EXISTS mydata; 
 CREATE TABLE mydata (bigfield TEXT );
 COPY mydata FROM '\path\file.csv' WITH delimiter '§';

Use a delimiter character that does not appear anywhere in the CSV file, so that each whole line is loaded into the single column instead of being split at the delimiter.

If row 1 in the CSV file is a valid CSV header with meaningful names, then you can use those names directly in the query below instead of column1, column2, etc.

 DROP TABLE IF EXISTS mydata2;
 SELECT
 split_part(bigfield, ',', 1) AS column1,
 split_part(bigfield, ',', 2) AS column2,
 ...etc...
 split_part(bigfield, ',', k) AS columnk
 INTO mydata2
 FROM mydata;
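As a rough check before splitting, you can count the commas in each staged line (16 separators for a 17-column row; this also counts commas inside quoted text, so treat it only as a filter):

 SELECT bigfield
 FROM mydata
 WHERE length(bigfield) - length(replace(bigfield, ',', '')) <> 16;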
answered Jun 29, 2017 at 14:57
  • ERROR: COPY delimiter must be a single one-byte character. Using % instead fixed this. Commented May 31, 2018 at 21:12
