I have a tab-delimited file with one sentence per line, in the format:

sentenceID (sid) documentID (scid) sentenceText (sent)

E.g.

100004 100 即便您喜爱流连酒吧,也定然在这轻松安闲的一隅,来一场甜蜜沉醉的约会。
100005 100 您可以慢慢探究菜单上所有的秘密惊喜。

I want to put it into sqlite3 with the following schema:

CREATE TABLE sent (
 sid INTEGER PRIMARY KEY,
 scid INTEGER,
 sent TEXT
 );

Is there a quick way to use the Python API for sqlite (http://docs.python.org/2/library/sqlite3.html) to load the rows into a table?

I've been doing it like this:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import sqlite3 as lite
import codecs
con = lite.connect('mycorpus.db')
with con:
    cur = con.cursor()
    cur.execute("CREATE TABLE Corpus(sid INT, scid INT, sent TEXT, PRIMARY KEY (sid))")
    # codecs.open (not codecs.read) returns a file object that decodes UTF-8
    for line in codecs.open('corpus.tab', 'r', 'utf8'):
        sid, scid, sent = line.strip().split("\t")
        # placeholders keep the SQL valid when the text contains quotes
        cur.execute("INSERT INTO Corpus VALUES (?, ?, ?)", (sid, scid, sent))
asked May 15, 2013 at 7:19

2 Answers

Here's an example using the unicodecsv module, which reads the whole file and inserts all rows with a single executemany call:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import sqlite3
import unicodecsv
con = sqlite3.connect('mycorpus.db')
cur = con.cursor()
cur.execute("CREATE TABLE Corpus(sid INT, scid INT, sent TEXT, PRIMARY KEY (sid))")
with open('corpus.tab', 'rb') as input_file:
    reader = unicodecsv.reader(input_file, delimiter="\t")
    data = [row for row in reader]
cur.executemany("INSERT INTO Corpus (sid, scid, sent) VALUES (?, ?, ?);", data)
con.commit()
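
unicodecsv mainly exists to work around the Python 2 csv module not handling unicode. On Python 3 the standard csv module reads text directly, so the same idea can be sketched without the extra dependency. This is a minimal sketch, assuming Python 3; the in-memory database and the io.StringIO stand-in for corpus.tab are assumptions made so the example runs on its own.

```python
import csv
import io
import sqlite3

# Stand-in for open('corpus.tab', encoding='utf-8', newline='');
# the rows are the two sample sentences from the question.
sample = io.StringIO(
    "100004\t100\t即便您喜爱流连酒吧,也定然在这轻松安闲的一隅,来一场甜蜜沉醉的约会。\n"
    "100005\t100\t您可以慢慢探究菜单上所有的秘密惊喜。\n"
)

con = sqlite3.connect(":memory:")  # assumption: use 'mycorpus.db' for a file on disk
con.execute("CREATE TABLE Corpus(sid INT, scid INT, sent TEXT, PRIMARY KEY (sid))")

reader = csv.reader(sample, delimiter="\t")
with con:  # commits on success, rolls back if an insert fails
    # executemany accepts any iterable of rows, so the reader is passed directly
    con.executemany("INSERT INTO Corpus (sid, scid, sent) VALUES (?, ?, ?)", reader)

rows = con.execute("SELECT sid, scid, sent FROM Corpus ORDER BY sid").fetchall()
```

Because the columns are declared INT, SQLite stores the numeric strings as integers, so no explicit int() conversion is needed here.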

Hope that helps.

answered May 15, 2013 at 8:15

2 Comments

Quotes? I don't see any quotes in your example. Anyway, you can pass an appropriate quotechar to the reader object or set quoting. See the docs.
=) No worries, your code works for input without " or '. It's just that some of the other lines in my file have crazy quotation marks.
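
To illustrate the quoting point from the comments: the sketch below uses the standard csv module (unicodecsv is designed as a drop-in replacement, so it should accept the same quoting argument, but that is an assumption here). The sample line with an unbalanced double quote is hypothetical.

```python
import csv
import io

# Hypothetical input: the first sentence starts with an unbalanced double quote.
text = '100006\t100\t"Hello, she said\n100007\t100\tSecond sentence\n'

# Default dialect: a leading " opens a quoted field, so the parser keeps
# reading past the line break while looking for the closing quote.
default_rows = list(csv.reader(io.StringIO(text), delimiter="\t"))

# QUOTE_NONE: quote characters are treated as ordinary text.
literal_rows = list(
    csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
)
```

With quoting=csv.QUOTE_NONE each input line stays one row of three fields and the quotation mark is kept as part of the sentence.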
#!/usr/bin/python
# -*- coding: utf-8 -*-
import io
import sqlite3 as lite
con = lite.connect('myCorpus.db')
cur = con.cursor()
cur.execute("CREATE TABLE Corpus(sid INT, scid INT, sent TEXT, PRIMARY KEY (sid))")
# io.open decodes UTF-8 on Python 2 and 3; rstrip drops the trailing newline from sent
with io.open('myfile.tab', 'r', encoding='utf-8') as f:
    data = [row.rstrip('\n').split('\t') for row in f]
cur.executemany("INSERT INTO Corpus (sid, scid, sent) VALUES (?, ?, ?);", data)
con.commit()
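
The list comprehension above holds the whole file in memory before inserting. For a large corpus, a generator can feed executemany one row at a time instead. This is only a sketch: iter_rows is a hypothetical helper, and the temporary file and in-memory database are there so the example is self-contained.

```python
import io
import os
import sqlite3
import tempfile


def iter_rows(path):
    """Yield (sid, scid, sent) tuples from a UTF-8, tab-delimited file."""
    with io.open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            # maxsplit=2 keeps any further tabs inside the sentence text
            sid, scid, sent = line.split("\t", 2)
            yield int(sid), int(scid), sent


# Throwaway input file (hypothetical contents) so the sketch runs on its own.
fd, path = tempfile.mkstemp(suffix=".tab")
with io.open(fd, "w", encoding="utf-8") as out:
    out.write(u"100004\t100\tfirst sentence\n100005\t100\tsecond\twith tab\n")

con = sqlite3.connect(":memory:")  # assumption: swap in 'myCorpus.db' for real use
con.execute("CREATE TABLE Corpus(sid INT, scid INT, sent TEXT, PRIMARY KEY (sid))")
with con:  # one transaction for all inserts
    con.executemany(
        "INSERT INTO Corpus (sid, scid, sent) VALUES (?, ?, ?)", iter_rows(path)
    )
os.remove(path)

count = con.execute("SELECT COUNT(*) FROM Corpus").fetchone()[0]
second = con.execute("SELECT sent FROM Corpus WHERE sid = 100005").fetchone()[0]
```

Wrapping the inserts in a single transaction matters as much as executemany for speed, since committing after every row forces SQLite to sync to disk each time.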
answered May 28, 2013 at 18:03
