I am trying to load a text file into a database. My text file is about 1.6GB. I need to write a python script to load the text file with all the headers into a database
Any guidelines on how I go about doing this?
thanks
-
1you need to provide some more information; surely python can do it! Post how the text-file is structured and the format the db expects the data in.Fredrik Pihl– Fredrik Pihl2011年09月07日 08:16:51 +00:00Commented Sep 7, 2011 at 8:16
-
2we need more information. Like what kind of text file do you have? CSV? What kind of database? MySQL? ...naeg– naeg2011年09月07日 08:18:07 +00:00Commented Sep 7, 2011 at 8:18
-
hi guys i am using MySQL database and a normal text file .txt,Clayton– Clayton2011年09月07日 08:30:57 +00:00Commented Sep 7, 2011 at 8:30
-
hi guys i am using MySQL database and a normal text file .txt, the format of my ext file is like this.... <==============================================================================> ,"start of article 1. some text 2. some text 3. some text MAINO " <==============================================================================> thats a sample of the format it is in the text file. its simply a\n article header and a summary in point form. The articles are seperated using ======>Clayton– Clayton2011年09月07日 08:43:03 +00:00Commented Sep 7, 2011 at 8:43
1 Answer 1
Using python is certainly possible. If you're reading into mysql, you might check out mysql-python. Reading the text file you can use file = open('filename','r'), and file.readline() to get each line and parse it.
However... there is an added overhead of using python. If the text file is orderly (that is, one reacord per row, each row having the same number of columns with a consistent delimiter such as a comma, tab, semi-colon, etc), then the most efficient way is to load it directly. In mysql, you'd do this something like:
LOAD DATA INFILE '/folder/input.dat' INTO TABLE destination_table
FIELDS TERMINATED BY ',' (id,field_1,field_2,field_3);
If you need some minor modifications to the file, such as changing commas or things at the beginning or end of the line you might use a command line sed (if you're on *nix or osx... you'll have to install if you're on windows).
Update
LOAD DATA INFILE will be quickest: http://dev.mysql.com/doc/refman/5.5/en/load-data.html
When you say "start of article 1. some text 2. some text 3. some text MAINO", are 1., 2., 3. and MAINO DIFFERENT FIELDS? If you had 2 fields such as a header and article, you might format your text document to look something like:
This is my header of the article.;;And here is my article, it's a nice
story about programming.
My article goes on for many lines.
And many more.
<==============================================================================>
This is my second article header.;;And here is article 2. It is.
And is.
Etc.
<==============================================================================>
Then, you could use LOAD DATA INFILE like:
LOAD DATA INFILE '/folder/filename.txt' INTO TABLE destination_table
LINES TERMINATED BY '<==============================================================================>'
FIELDS TERMINATED BY ';;' (header,article);