I am using Python with pandas to import a CSV file into a Postgres table:
import pandas as pd
import psycopg2
from sqlalchemy import create_engine
df = pd.read_csv('products.csv', sep=';', low_memory=False)
engine = create_engine('postgresql://myuser:mypass@server/postgres')
df.to_sql('new_table', con=engine, if_exists='append', index=False, chunksize=20000)
The CSV file is ~10 GB. I left the script running for 15 hours and it is nowhere near finished. What faster way is there to load this data into the database?
I can't import a dump of the database on the server directly because the compressed file is larger than the allowed size.
1 Answer
I used psql to push the CSV file to the table, as suggested by @a_horse_with_no_name:
psql -h host -p port -d db -U user -c "\copy products from 'products.csv' with delimiter as ';' csv header;"
It took only a couple of minutes to copy the table, compared to the 15+ hours the Python script ran without finishing.
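If you would rather keep the load in pandas, to_sql also accepts a method callable, and the same COPY mechanism can be driven through it via psycopg2's copy_expert. A minimal sketch, closely following the COPY-based insertion-method example in the pandas documentation and reusing the file, connection string, and table name from the question; the helper name psql_insert_copy and the chunk size are illustrative:

import csv
from io import StringIO

import pandas as pd
from sqlalchemy import create_engine


def psql_insert_copy(table, conn, keys, data_iter):
    """Write each chunk with COPY ... FROM STDIN instead of row-by-row INSERTs."""
    # conn is a SQLAlchemy connection; .connection exposes the underlying
    # psycopg2 connection, which provides copy_expert()
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        buf = StringIO()
        csv.writer(buf).writerows(data_iter)
        buf.seek(0)
        columns = ', '.join('"{}"'.format(k) for k in keys)
        table_name = '{}.{}'.format(table.schema, table.name) if table.schema else table.name
        cur.copy_expert('COPY {} ({}) FROM STDIN WITH CSV'.format(table_name, columns), buf)


engine = create_engine('postgresql://myuser:mypass@server/postgres')

# Stream the 10 GB file in chunks so it never has to fit in memory at once
for chunk in pd.read_csv('products.csv', sep=';', chunksize=100_000):
    chunk.to_sql('new_table', con=engine, if_exists='append',
                 index=False, method=psql_insert_copy)

This keeps the convenience of pandas (type handling, chunking) while avoiding the per-row INSERT statements that make the plain to_sql call so slow.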
Comment from @a_horse_with_no_name: have you tried copy ... from stdin? That would be much faster. copy from stdin streams the file from the client to the server; it does not load the whole file into memory - at least with psql and Java this is the case. I don't know about Python, though.
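To that last point: the same streaming COPY is available from Python through psycopg2's copy_expert, which sends the file to the server in small blocks rather than loading it into memory. A rough sketch, assuming the connection details and table name from the question:

import psycopg2

# Assumed connection parameters; adjust to match your server
conn = psycopg2.connect(host='server', dbname='postgres',
                        user='myuser', password='mypass')

with conn, conn.cursor() as cur, open('products.csv', 'r') as f:
    # COPY ... FROM STDIN streams the file from the client to the server,
    # so the 10 GB file is never held in memory on the client side
    cur.copy_expert(
        "COPY new_table FROM STDIN WITH (FORMAT csv, DELIMITER ';', HEADER true)",
        f,
    )

Using the connection as a context manager commits the transaction when the block exits, so the whole file is loaded atomically.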