I'm looking for a way to generate a new MySQL table based solely on the contents of a given CSV file. The CSV files I'll be using have the following properties:
- "|" delimited.
- First row specifies the column names (headers), also "|" delimited.
- Column names & order are not fixed.
- The number of columns is not fixed.
- Files are of a large size (1 mil rows / 50 columns).
In Excel this is all rather simple, but with MySQL it does not appear to be (no luck with Google). Any suggestions on what I should be looking at?
3 Answers
You can use csvsql, which is part of csvkit
(a suite of utilities for converting to and working with CSV files):
- Linux or Mac OS X
- free and open source

Install it with:

sudo pip install csvkit

Example:

csvsql --dialect mysql --snifflimit 100000 datawithheaders.csv > mytabledef.sql

Since your files are pipe-delimited, you will likely also need to pass -d '|'. The command creates a CREATE TABLE statement based on the file contents. Column names are taken from the first line of the CSV file.
To extend on ivansabik's answer using pandas, see How to insert pandas dataframe via mysqldb into database?.
- csvsql is too slow for a reasonably large file. In my case, a 7.8M CSV file takes 4+ minutes to finish. – Gang Liang, Sep 30, 2020 at 20:26
If you're OK with using Python, pandas worked great for me (csvsql hung forever in my case, with fewer columns and rows than yours). Something like:

from sqlalchemy import create_engine
import pandas as pd

df = pd.read_csv('/PATH/TO/FILE.csv', sep='|')

# Optional: set your indexes to get primary keys
# (if you do this, pass index=True to to_sql below so the columns are kept)
df = df.set_index(['COL A', 'COL B'])

engine = create_engine('mysql://user:pass@host/db', echo=False)
df.to_sql('table_name', engine, index=False)
- Where do you define dwh_engine? Is this a typo and you meant engine? – joanolo, Mar 28, 2017 at 7:00
- Yes, it should be engine! Corrected the answer, thanks for spotting. – ivansabik, Mar 28, 2017 at 21:37
- to_sql takes up too much time if the number of rows is high. For us, around 36000 rows took around 90 mins. A direct load statement was done in 3 seconds. – mvinayakam, Dec 3, 2018 at 10:32
You need to generate a CREATE TABLE statement based on the datatypes, sizes, etc. of the various columns.

Then you use LOAD DATA INFILE ... FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' IGNORE 1 LINES ...; (see the manual page for details).

Do likewise for each CSV file --> table.