In postgres, we can do COPY to export the data into file. But all the data will be saved into a file. But I want to chunk that file into small files. We can do this after the export.
But is there a way to export it during the COPY process?
OR in psycopg2 can we do this?
-
You could provide the table DDL so that an optimal field can be chosen?Vérace– Vérace2020年12月01日 13:58:47 +00:00Commented Dec 1, 2020 at 13:58
3 Answers 3
You can use a WHERE
condition:
COPY (SELECT * FROM tab WHERE id % 3 = 0) TO '/dir/file1.csv' (FORMAT 'csv');
COPY (SELECT * FROM tab WHERE id % 3 = 1) TO '/dir/file2.csv' (FORMAT 'csv');
COPY (SELECT * FROM tab WHERE id % 3 = 2) TO '/dir/file3.csv' (FORMAT 'csv');
Here id
stands for any numerical column. You can also use other data types and split up the data in other ways with an appropriate WHERE
condition.
-
WHERE id % 3 = 0
will this work only if i have the id column?TheDataGuy– TheDataGuy2020年12月01日 12:26:32 +00:00Commented Dec 1, 2020 at 12:26 -
I don't quite understand your question.Laurenz Albe– Laurenz Albe2020年12月01日 12:55:53 +00:00Commented Dec 1, 2020 at 12:55
-
in the where condition you mentioned that
id
, if I don't have this column in the table, then it wont run right?TheDataGuy– TheDataGuy2020年12月01日 13:09:29 +00:00Commented Dec 1, 2020 at 13:09 -
Ah, I see. I have added an explanation to the answer.Laurenz Albe– Laurenz Albe2020年12月01日 13:11:27 +00:00Commented Dec 1, 2020 at 13:11
But is there a way to export it during the COPY process?
Streaming the COPY data into a program is supported through the PROGRAM
clause.
The Unix command split
can do the actual splitting. For instance:
COPY (<your query>) TO PROGRAM 'split -l 10000 -d - /path/file-';
This will split the output into files numbered sequentially that do not exceed 10000 lines each.
The PROGRAM
clause is also supported by psql's \copy
; in that case the program is executed client-side.
Caveat: if the export format is CSV and there are multi-line text fields, it comes with the risk of splitting a field across multiple files.
If you already have a big file and have enough disc space you can use the method below.
Suppose that you have created a big file with copy command;
COPY (SELECT ... FROM ... WHERE ...) TO '/path/to/file';
Split files into chunks (Linux only);
split --number=l/10 --numeric-suffixes /path/to/file file_