I have problems splitting the values of bulk-insert because the idea is to make 1 insert every 10 values at a time and reading the entire contents of CSV file
The code already inserts in a single line reading the entire CSV file but I am unable to perform the division of VALUES in the case in the future perform an insert of 10 thousand values at a time.
def bulk_insert(table_name, **kwargs):
mysqlConnection = MySqlHook(mysql_conn_id='id_db')
a = mysqlConnection.get_conn()
c = a.cursor()
with open('/pasta/arquivo.csv') as f:
reader = csv.reader(f, delimiter='\t')
sql ="""INSERT INTO user (id,user_name) VALUES"""
for row in reader:
sql +="(" + row[0] + " , '" + row[1] + "'),"
c.execute(sql[:-1])
a.commit()
1 Answer 1
Something like this ought to work. The batch_csv function is a generator that yields a list of rows of size size on each iteration.
The bulk_insert function is amended to use parameter substitution and the cursor's executemany method. Parameter substitution is safer than manually constructing SQL.
cursor.executemany may batch SQL inserts as in the original function, though this is implementation-dependent and should be tested.
def batch_csv(size=10):
with open('/pasta/arquivo.csv') as f:
reader = csv.reader(f, delimiter='\t')
batch = []
for row in reader:
batch.append(row)
if len(row) == size:
yield batch
del batch[:]
yield batch
def bulk_insert(table_name, **kwargs):
mysqlConnection = MySqlHook(mysql_conn_id='id_db')
a = mysqlConnection.get_conn()
c = a.cursor()
sql ="""INSERT INTO user (id,user_name) VALUES (%s, %s)"""
batcher = batch_csv()
for batch in batcher:
c.executemany(sql, [row[0:2] for row in batch])
a.commit()
1 Comment
INSERT INTO user (id,username) VALUES (1,'name'),(2,'name) INSERT INTO user (id,username) VALUES (3,'name'),(4,'name) ... It's not like that ... INSERT INTO user (id,username) VALUES (1,'name'),(2,'name),(3,'name'),(4,'name')... I think I should mess with a.commit () to do INSERT in batches.
LOAD DATAbulk insert tool. No need in re-inventing the wheel and trying to manually do this from a Python script.