Copying CSV file to temp table with dynamic number of columns?

Question 1

I am wondering if there is a way to copy a csv file into a temp table where the number of columns is unknown in the csv file. The DB software I'm using is PgAdmin III. I figured out that if I do know the number of columns then I can create a temp table with that amount of columns and then copy the csv file in like so:

 CREATE TEMPORARY TABLE temp
 (
 col1 VARCHAR(80),
 col2 VARCHAR(80),
 ....
 coln VARCHAR(80)
 );
COPY temp FROM 'C:/Users/postgres/Boost.txt' CSV HEADER DELIMITER E' '

However if I try to just simply copy the csv file to a temp table with no columns in the temp table, Postgresql (version 8.4) complains that I'm working with a table that has less columns than in the csv file. I've been researching and can't seem to find anything in the Postgresql docs about this. Does anyone know if copying a csv file into a temp table with an arbitrary number of columns decided at run time is possible in Postgresql? Once the temp table is loaded with the csv file I plan on doing some comparisons to other tables with the temp table before it is destroyed. Also the first row in the csv file contains headers.

Question 2

You'll want to use an ETL tool or script. Take a look at CloverETL, Talend Studio, Pentaho Kettle. There are probably also a bunch of simple "CSV loader" scripts out there for your preferred scripting language.

Question 3

Basics:

The 1st row of the CSV file has column names of defined format.
The PROGRAM clause of COPY and GET DIAGNOSTICS after COPY require Postgres 9.3+.
format() requires Postgres 9.1+
This works with pure standard Postgres - except for the head command that the shell is expected to provide. For Windows versions consider:
How to do what head, tail, more, less, sed do in Powershell?

Full automation

This function copies any table structure completely dynamically:

CREATE OR REPLACE FUNCTION f_dynamic_copy(_file text
 , _tbl text = 'tmp1'
 , _delim text = E'\t'
 , _nodelim text = chr(127)) -- see below!
 RETURNS text
 LANGUAGE plpgsql AS
$func$
DECLARE
 row_ct int;
BEGIN
 -- create staging table for 1st row as single text column 
 CREATE TEMP TABLE tmp0(cols text) ON COMMIT DROP;
 -- fetch 1st row
 EXECUTE format($$COPY tmp0 FROM PROGRAM 'head -n1 %I' WITH (DELIMITER %L)$$ -- impossible delimiter
 , _file, _nodelim);
 -- create actual temp table with all columns text
 EXECUTE (
 SELECT format('CREATE TEMP TABLE %I(', _tbl)
 || string_agg(quote_ident(col) || ' text', ',')
 || ')'
 FROM (SELECT cols FROM tmp0 LIMIT 1) t
 , unnest(string_to_array(t.cols, E'\t')) col
 );
 -- Import data
 EXECUTE format($$COPY %I FROM %L WITH (FORMAT csv, HEADER, NULL '\N', DELIMITER %L)$$
 , _tbl, _file, _delim);
 GET DIAGNOSTICS row_ct = ROW_COUNT;
 RETURN format('Created table %I with %s rows.', _tbl, row_ct);
END
$func$;

Call variants:

SELECT f_dynamic_copy('/path/to/file.csv');
SELECT f_dynamic_copy('/path/to/file2.csv', 'tmp_file2');
SELECT f_dynamic_copy(_file => '/path/to/file2.csv'
 , _tbl => 'tmp_file2');
 , _delim => E'\t'); -- using assignment operator since pg 9.5

Answer:

Created table tmp_file2 with 123 rows.

Before the main COPY, run a preliminary COPY ... TO tmp0 to fetch the first row with column names, which are expected to be unquoted, case-sensitive strings like COPY ... TO ... (FORMAT csv, HEADER) would export them.

The structure of the actual target table is derived from it, all columns with data type text. The default name of the resulting table is tmp1 - or provide your own as 2nd function parameter.

Then COPY is executed. The default delimiter is a tab character - or provide your delimiter as 3rd function parameter.

Use any single-byte character for the non-delimiter _nodelim which does not appear in the first line of your CSV file. I am arbitrarily picking the control character "Delete" (ASCII 127). That character would be swallowed here on SO, so I generate with chr(127) instead, which is also valid. Assuming the character won't pop up - or provide your non-delimiter as 4th function parameter.

The function returns table name and number of imported rows.
Remember, a temporary table dies with the end of the session.

The manual:

Executing a command with PROGRAM might be restricted by the operating system's access control mechanisms, such as SELinux.

Postgres 8.4

Postgres 8.4 is too old, not back-porting. Some hints:

GET DIAGNOSTICS is an optional feature. You can just leave it away or replace it with a full count on the table
A primitive (expensive) alternative for the PROGRAM clause of COPY in pg 9.3 would be to import the complete table instead:
```
 EXECUTE format($$COPY tmp0 FROM %L WITH (DELIMITER %L)$,ドル _file, _delim);
```

Or you prepare a second input file, or you can make it work by piping from the shell: COPY tablename FROM STDIN is available in pg 8.4.

format() can be replaced with plain string concatenation. Be wary of SQL injection though!
SQL injection in Postgres functions vs prepared queries

Question 4

Well I don't have the option of upgrading at this time. So the line of code COPY tablename FROM STDIN is what I would use to replace all instances of EXECUTE format... or do I need it for just fetching the first row as the column names and the rest of the rows as data?

Question 5

@John: Only for the first instance, the point of which is to fetch the first row only. You could even COPY the complete table another time, it just would be twice the cost. The second instance is a plain COPY (dynamically using input parameters). I also added info on format(). You should urgently consider upgrading to a current version.

Question 6

@ErwinBrandstetter you're a literal god

Question 7

This program handles creating PostgreSQL tables based on most any CSV or JSON with one line:

https://github.com/lukasmartinelli/pgfutter

Question 8

I used the code example from the comment above and tweaked it a little for Windows and for the case when the column names have Russian characters (maybe it will be useful for someone)

-- transliteration function
CREATE OR REPLACE FUNCTION cyrillic_transliterate(p_string text)
RETURNS character varying AS $BODY$
SELECT replace( replace( replace( replace(
replace( replace( replace( replace( translate(lower(1ドル),
'абвгдеёзийклмнопрстуфхцэы', 'abvgdeezijklmnoprstufхcey'), 'ж', 'zh'),
'ч', 'ch'), 'ш', 'sh'), 'щ', 'shh'), 'ъ', ''), 'ю', 'yu'), 'я', 'ya'), 'ь', '');
$BODY$
LANGUAGE SQL IMMUTABLE COST 100;
--ALTER FUNCTION cyrillic_transliterate(p_string text) OWNER TO blog;
--Function import
CREATE OR REPLACE FUNCTION f_dynamic_copy(_file text
 , _tbl text = 'tmp1'
 , _delim text = E'\t'
 , _nodelim text = chr(127)) -- see below!
 RETURNS text
 LANGUAGE plpgsql AS
$func$
DECLARE
 row_ct int;
BEGIN
 -- create staging table for 1st row as single text column 
 DO $$
 BEGIN
 DROP TABLE tmp0;
 EXCEPTION WHEN OTHERS
 THEN
 END;$$;
 
 CREATE TABLE tmp0(cols text);-- ON COMMIT DROP;
 -- fetch 1st row.
 EXECUTE format($$COPY tmp0 FROM PROGRAM 'powershell -command "& {get-content %s|select-object -first 1}" & chcp 1251' DELIMITER %L ENCODING 'WIN'$$ -- impossible delimiter
 , _file, _nodelim);
 UPDATE tmp0 SET cols=cyrillic_transliterate(cols);
 
 EXECUTE format('
 DO $$
 BEGIN
 DROP TABLE %I;
 EXCEPTION WHEN OTHERS
 THEN
 END;$$',_tbl);
 -- create actual temp table with all columns text
 EXECUTE (
 SELECT format('CREATE TABLE %I(', _tbl)
 || string_agg(quote_ident(col) || ' text', ',')
 || ')'
 FROM (SELECT cols FROM tmp0 LIMIT 1) t
 , unnest(string_to_array(t.cols, E';')) col
 );
 -- Import data
 EXECUTE format($$COPY %I FROM %L ENCODING 'WIN' HEADER NULL '\N' DELIMITER %L CSV$$
 , _tbl, _file, _delim);
 GET DIAGNOSTICS row_ct = ROW_COUNT;
 RETURN format('Created table %I with %s rows.', _tbl, row_ct);
END
$func$;
SELECT f_dynamic_copy('C:\...\file.csv','tmpp1',';');
SELECT * FROM tmpp1

score 6 · Answer 1 · 2015-07-01 02:27:29Z

Basics:

The 1st row of the CSV file has column names of defined format.
The PROGRAM clause of COPY and GET DIAGNOSTICS after COPY require Postgres 9.3+.
format() requires Postgres 9.1+
This works with pure standard Postgres - except for the head command that the shell is expected to provide. For Windows versions consider:
How to do what head, tail, more, less, sed do in Powershell?

Full automation

This function copies any table structure completely dynamically:

CREATE OR REPLACE FUNCTION f_dynamic_copy(_file text
 , _tbl text = 'tmp1'
 , _delim text = E'\t'
 , _nodelim text = chr(127)) -- see below!
 RETURNS text
 LANGUAGE plpgsql AS
$func$
DECLARE
 row_ct int;
BEGIN
 -- create staging table for 1st row as single text column 
 CREATE TEMP TABLE tmp0(cols text) ON COMMIT DROP;
 -- fetch 1st row
 EXECUTE format($$COPY tmp0 FROM PROGRAM 'head -n1 %I' WITH (DELIMITER %L)$$ -- impossible delimiter
 , _file, _nodelim);
 -- create actual temp table with all columns text
 EXECUTE (
 SELECT format('CREATE TEMP TABLE %I(', _tbl)
 || string_agg(quote_ident(col) || ' text', ',')
 || ')'
 FROM (SELECT cols FROM tmp0 LIMIT 1) t
 , unnest(string_to_array(t.cols, E'\t')) col
 );
 -- Import data
 EXECUTE format($$COPY %I FROM %L WITH (FORMAT csv, HEADER, NULL '\N', DELIMITER %L)$$
 , _tbl, _file, _delim);
 GET DIAGNOSTICS row_ct = ROW_COUNT;
 RETURN format('Created table %I with %s rows.', _tbl, row_ct);
END
$func$;

Call variants:

SELECT f_dynamic_copy('/path/to/file.csv');
SELECT f_dynamic_copy('/path/to/file2.csv', 'tmp_file2');
SELECT f_dynamic_copy(_file => '/path/to/file2.csv'
 , _tbl => 'tmp_file2');
 , _delim => E'\t'); -- using assignment operator since pg 9.5

Answer:

Created table tmp_file2 with 123 rows.

Before the main COPY, run a preliminary COPY ... TO tmp0 to fetch the first row with column names, which are expected to be unquoted, case-sensitive strings like COPY ... TO ... (FORMAT csv, HEADER) would export them.

The structure of the actual target table is derived from it, all columns with data type text. The default name of the resulting table is tmp1 - or provide your own as 2nd function parameter.

Then COPY is executed. The default delimiter is a tab character - or provide your delimiter as 3rd function parameter.

Use any single-byte character for the non-delimiter _nodelim which does not appear in the first line of your CSV file. I am arbitrarily picking the control character "Delete" (ASCII 127). That character would be swallowed here on SO, so I generate with chr(127) instead, which is also valid. Assuming the character won't pop up - or provide your non-delimiter as 4th function parameter.

The function returns table name and number of imported rows.
Remember, a temporary table dies with the end of the session.

The manual:

Executing a command with PROGRAM might be restricted by the operating system's access control mechanisms, such as SELinux.

Postgres 8.4

Postgres 8.4 is too old, not back-porting. Some hints:

GET DIAGNOSTICS is an optional feature. You can just leave it away or replace it with a full count on the table
A primitive (expensive) alternative for the PROGRAM clause of COPY in pg 9.3 would be to import the complete table instead:
```
 EXECUTE format($$COPY tmp0 FROM %L WITH (DELIMITER %L)$,ドル _file, _delim);
```

Or you prepare a second input file, or you can make it work by piping from the shell: COPY tablename FROM STDIN is available in pg 8.4.

format() can be replaced with plain string concatenation. Be wary of SQL injection though!
SQL injection in Postgres functions vs prepared queries

Well I don't have the option of upgrading at this time. So the line of code COPY tablename FROM STDIN is what I would use to replace all instances of EXECUTE format... or do I need it for just fetching the first row as the column names and the rest of the rows as data?
@John: Only for the first instance, the point of which is to fetch the first row only. You could even COPY the complete table another time, it just would be twice the cost. The second instance is a plain COPY (dynamically using input parameters). I also added info on format(). You should urgently consider upgrading to a current version.

CalZ CalZ 9344 silver badges10 bronze badges · Answer 2 · 2017-06-29 19:45:23Z

1

This program handles creating PostgreSQL tables based on most any CSV or JSON with one line:

https://github.com/lukasmartinelli/pgfutter

Share

Improve this answer

answered Jun 29, 2017 at 19:45

CalZ's user avatar

CalZ CalZ

9344 silver badges10 bronze badges

Add a comment |

user303118 user303118 1 · Answer 3 · 2024-08-27 19:10:32Z

I used the code example from the comment above and tweaked it a little for Windows and for the case when the column names have Russian characters (maybe it will be useful for someone)

-- transliteration function
CREATE OR REPLACE FUNCTION cyrillic_transliterate(p_string text)
RETURNS character varying AS $BODY$
SELECT replace( replace( replace( replace(
replace( replace( replace( replace( translate(lower(1ドル),
'абвгдеёзийклмнопрстуфхцэы', 'abvgdeezijklmnoprstufхcey'), 'ж', 'zh'),
'ч', 'ch'), 'ш', 'sh'), 'щ', 'shh'), 'ъ', ''), 'ю', 'yu'), 'я', 'ya'), 'ь', '');
$BODY$
LANGUAGE SQL IMMUTABLE COST 100;
--ALTER FUNCTION cyrillic_transliterate(p_string text) OWNER TO blog;
--Function import
CREATE OR REPLACE FUNCTION f_dynamic_copy(_file text
 , _tbl text = 'tmp1'
 , _delim text = E'\t'
 , _nodelim text = chr(127)) -- see below!
 RETURNS text
 LANGUAGE plpgsql AS
$func$
DECLARE
 row_ct int;
BEGIN
 -- create staging table for 1st row as single text column 
 DO $$
 BEGIN
 DROP TABLE tmp0;
 EXCEPTION WHEN OTHERS
 THEN
 END;$$;
 
 CREATE TABLE tmp0(cols text);-- ON COMMIT DROP;
 -- fetch 1st row.
 EXECUTE format($$COPY tmp0 FROM PROGRAM 'powershell -command "& {get-content %s|select-object -first 1}" & chcp 1251' DELIMITER %L ENCODING 'WIN'$$ -- impossible delimiter
 , _file, _nodelim);
 UPDATE tmp0 SET cols=cyrillic_transliterate(cols);
 
 EXECUTE format('
 DO $$
 BEGIN
 DROP TABLE %I;
 EXCEPTION WHEN OTHERS
 THEN
 END;$$',_tbl);
 -- create actual temp table with all columns text
 EXECUTE (
 SELECT format('CREATE TABLE %I(', _tbl)
 || string_agg(quote_ident(col) || ' text', ',')
 || ')'
 FROM (SELECT cols FROM tmp0 LIMIT 1) t
 , unnest(string_to_array(t.cols, E';')) col
 );
 -- Import data
 EXECUTE format($$COPY %I FROM %L ENCODING 'WIN' HEADER NULL '\N' DELIMITER %L CSV$$
 , _tbl, _file, _delim);
 GET DIAGNOSTICS row_ct = ROW_COUNT;
 RETURN format('Created table %I with %s rows.', _tbl, row_ct);
END
$func$;
SELECT f_dynamic_copy('C:\...\file.csv','tmpp1',';');
SELECT * FROM tmpp1

Stack Exchange Network

Copying CSV file to temp table with dynamic number of columns?

3 Answers 3

Full automation

Postgres 8.4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Copying CSV file to temp table with dynamic number of columns?

3 Answers 3

Full automation

Postgres 8.4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions