Chunk of a report: below is a piece of the report obtained. As you can see, the datetime is unfortunately rounded to the second, but it is clear that migrating each record takes less than one second; most of the time is spent preparing the record to be migrated.
2016-06-23 03:41:35 | SOURCE DB QUERY : SELECT resource_id, page, version, content, iv, NULL AS loid FROM repository_secview WHERE resource_id=1445333482345622 AND page=0 AND version=1
2016-06-23 03:41:35 | DESTINATION DB QUERY : INSERT INTO file_secview (resource_id, version, page, file_oid, iv) VALUES (1445333482345622, 1, 1, 2095944, NULL)
2016-06-23 03:41:38 | SOURCE DB QUERY : SELECT resource_id, page, version, content, iv, NULL AS loid FROM repository_secview WHERE resource_id=1445333501859624 AND page=0 AND version=1
2016-06-23 03:41:38 | DESTINATION DB QUERY : INSERT INTO file_secview (resource_id, version, page, file_oid, iv) VALUES (1445333501859624, 1, 1, 2095945, NULL)
2016-06-23 03:41:40 | SOURCE DB QUERY : SELECT resource_id, page, version, content, iv, NULL AS loid FROM repository_secview WHERE resource_id=1457424635012030 AND page=0 AND version=1
2016-06-23 03:41:40 | DESTINATION DB QUERY : INSERT INTO file_secview (resource_id, version, page, file_oid, iv) VALUES (1457424635012030, 1, 1, 2095946, NULL)
Currently my problem is that when I have to deal with a big database (between 12 and 30 GB), at a certain point the script slows down very much, and eventually crashes. The current run is still going (Jun 10, 15:30) and I started it on Jun 08 at 2:29, i.e. 2 days and 13 hours of activity if my calculation is correct. That's too much for 30 GB; with the 30 GB database a run took 4 days of activity and then it just got killed by the system.
Note: with smaller communities I used to open a buffered MySQL connection, which seemed to perform better, but with this 30 GB migration the script really froze with a buffered connection, so I removed that option; the connection is now in streaming mode (the default).
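Between the two extremes of a fully buffered cursor (whole result set in memory) and a raw streaming cursor, the DB-API `fetchmany()` pattern reads the source in fixed-size chunks so memory stays bounded. A minimal sketch in Python 3 syntax, using the stdlib `sqlite3` as a stand-in for `mysql.connector` (the cursor interface is the same; table and column names are taken from the report above):

```python
import sqlite3

def iter_rows(cursor, chunk_size=1000):
    """Yield rows in fixed-size chunks so memory stays bounded
    regardless of the total result-set size."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        for row in rows:
            yield row

# Demo against an in-memory stand-in database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repository_secview (resource_id INTEGER)")
conn.executemany("INSERT INTO repository_secview VALUES (?)",
                 [(i,) for i in range(10)])
cur = conn.execute("SELECT resource_id FROM repository_secview ORDER BY resource_id")
ids = [r[0] for r in iter_rows(cur, chunk_size=3)]
print(ids)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The same loop works unchanged on a `mysql.connector` streaming cursor, since `fetchmany()` is part of the DB-API.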
### EXECUTE QUERIES ON DBs
def execute_psql(cnx_psql, cursor, query, dataset = None, direct_commit = False):
    if dataset:
        try:
            cursor.execute( query + dataset )
            print "EXECUTED QUERY : " + query + dataset
        except psycopg2.Error as err:
            print "/!\ Cannot execute the query on " + query + dataset, err.pgerror
            cnx_psql.rollback()
            sys.exit( "Rolled back! And leaving early this lucky script, find out what is wrong" )
    else:
        print "The dataset for " + query + " is empty, skipping..."
    return cursor
def execute_msql(cursor, query):
    try:
        cursor.execute( query )
    except mysql.connector.Error as err:
        print "/!\ Cannot execute the following query:" , query
        print "/!\ Error:", err
        sys.exit( "leaving early this lucky script, find out what is wrong" )
    return cursor
### FILES migration
def convert_iv(iv):
    if iv:
        # decode iv itself (the original referenced an undefined name `string`)
        iv = bytearray(base64.b64decode(iv))
    return iv
def lobject_direct_migration(cnx_psql, blob):
    bytearray_to_string = str(bytearray(blob))
    loid = cnx_psql.lobject().oid
    try:
        cnx_psql.lobject(loid, mode='w').write( bytearray_to_string )
        print "lo_migration | new oid : "+str(loid)
        del bytearray_to_string
    except psycopg2.Error as err:
        print "/!\ Cannot insert large_object " + str(loid), err.pgerror
        cnx_psql.rollback()
        sys.exit( "Rollback! And leaving early this lucky script, find out what is wrong" )
    return loid
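The `str(bytearray(blob))` round-trip above copies the blob twice before it is written. A lighter sketch (assuming the driver accepts a bytes-like object, as psycopg2's large objects do in binary mode) is to hand the payload over as `bytes` or as a zero-copy `memoryview`:

```python
# Sketch: avoid the double copy of str(bytearray(blob)).
blob = b"\x00\x01\x02" * 1000   # stand-in for the MySQL BLOB column value

as_bytes = bytes(blob)          # at most one copy of the payload
view = memoryview(blob)         # zero-copy read-only view of the same bytes

# Both carry exactly the payload the large-object write needs:
assert bytes(view) == as_bytes
assert len(view) == len(blob)

# Hypothetical usage against psycopg2 (not executed here):
#   lobj = cnx_psql.lobject(0, mode='wb')   # oid=0 lets the server pick an OID
#   lobj.write(as_bytes)
#   lobj.close()
```

Closing the large object explicitly (rather than writing through an unnamed temporary as in the snippet above) also releases the server-side handle promptly, which matters over a multi-day run.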
Logging and monitoring: additionally, in order to check how many system resources are being consumed, I've implemented the following function and called it after each query, but the memory usage was very stable and stayed under a warning limit.
def get_log():
    now = datetime.datetime.now()
    usage = resource.getrusage(resource.RUSAGE_SELF)
    print '\n*** BEGIN LOGGING ***'
    print 'DATETIME: '+str(now)
    print 'MEMORY USAGE: '+str(usage.ru_maxrss)+' Kilobyte'
    print 'TIME IN USER MODE (FLOAT): '+str(usage.ru_utime)
    print 'TIME IN SYSTEM MODE (FLOAT): '+str(usage.ru_stime)
    print 'SHARED MEMORY USAGE: '+str(usage.ru_ixrss)+' Kilobyte'
    print 'UNSHARED MEMORY USAGE: '+str(usage.ru_idrss)+' Kilobyte'
    print 'PAGE FAULTS NOT REQUIRING I/O: '+str(usage.ru_minflt)
    print 'PAGE FAULTS REQUIRING I/O: '+str(usage.ru_majflt)
    print 'NUMBER OF SWAP OUTS: '+str(usage.ru_nswap)
    print 'BLOCK INPUT OPERATIONS: '+str(usage.ru_inblock)
    print 'BLOCK OUTPUT OPERATIONS: '+str(usage.ru_oublock)
    print 'MESSAGES SENT: '+str(usage.ru_msgsnd)
    print 'MESSAGES RECEIVED: '+str(usage.ru_msgrcv)
    print 'SIGNALS RECEIVED: '+str(usage.ru_nsignals)
    print 'VOLUNTARY CONTEXT SWITCHES: '+str(usage.ru_nvcsw)
    print 'INVOLUNTARY CONTEXT SWITCHES: '+str(usage.ru_nivcsw)
    ## LIMITS ON RESOURCES USAGE
    #print str(resource.getrlimit(resource.RLIMIT_CORE))
    #print str(resource.getrlimit(resource.RLIMIT_CPU))
    #print str(resource.getrlimit(resource.RLIMIT_FSIZE))
    print '*** END LOGGING ***\n'
    return True
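One caveat when reading those numbers: the unit of `ru_maxrss` is platform-dependent, so the hard-coded " Kilobyte" label only holds on some systems. A quick check in Python 3 syntax (the platform assumption is noted in the comment):

```python
import resource

# ru_maxrss is reported in kilobytes on Linux but in *bytes* on macOS,
# so the same value can differ by a factor of 1024 between platforms.
usage = resource.getrusage(resource.RUSAGE_SELF)
print(usage.ru_maxrss)  # peak resident set size, platform-dependent unit
```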
Does anyone see a bottleneck? In my opinion the issue could lie in the following conversion to string, located in the lobject_direct_migration() function, but I don't see any other way to perform the same action:
bytearray_to_string = str(bytearray(blob))
Is there any way to tune this snippet for better performance? Any improvement tip is welcome. I would like to decrease the execution time to 1 day maximum : )
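Beyond that one line, a common lever for long migrations is transaction granularity: committing once per batch instead of once per record removes most per-row round-trip and fsync overhead. A hedged sketch in Python 3 syntax, with the stdlib `sqlite3` standing in for both drivers (the table names come from the report above, reduced to a single id column; `migrate_in_batches` and `batch_size` are illustrative names, not part of the original script):

```python
import sqlite3

def migrate_in_batches(src, dst, batch_size=500):
    """Copy rows from the source to the destination, committing
    once per batch instead of once per record."""
    cur = src.execute("SELECT resource_id FROM repository_secview")
    migrated = 0
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        dst.executemany("INSERT INTO file_secview (resource_id) VALUES (?)", rows)
        dst.commit()          # one commit per batch, not per row
        migrated += len(rows)
    return migrated

# Demo against in-memory stand-in databases
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.execute("CREATE TABLE repository_secview (resource_id INTEGER)")
src.executemany("INSERT INTO repository_secview VALUES (?)",
                [(i,) for i in range(1234)])
src.commit()
dst.execute("CREATE TABLE file_secview (resource_id INTEGER)")
count = migrate_in_batches(src, dst, batch_size=500)
print(count)  # 1234
```

The same shape applies with a `mysql.connector` cursor on the source and a psycopg2 cursor on the destination; the batch size trades memory for fewer commits.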