Report excerpt: the following is a piece of the report obtained. As you can see, the datetime is unfortunately rounded to the second, but it is clear that migrating each record takes less than one second; most of the time is spent preparing the record to be migrated.

2016-06-23 03:41:35 | SOURCE DB QUERY : SELECT resource_id, page, version, content, iv, NULL AS loid FROM repository_secview WHERE resource_id=1445333482345622 AND page=0 AND version=1
2016-06-23 03:41:35 | DESTINATION DB QUERY : INSERT INTO file_secview (resource_id, version, page, file_oid, iv) VALUES (1445333482345622, 1, 1, 2095944, NULL)
2016-06-23 03:41:38 | SOURCE DB QUERY : SELECT resource_id, page, version, content, iv, NULL AS loid FROM repository_secview WHERE resource_id=1445333501859624 AND page=0 AND version=1
2016-06-23 03:41:38 | DESTINATION DB QUERY : INSERT INTO file_secview (resource_id, version, page, file_oid, iv) VALUES (1445333501859624, 1, 1, 2095945, NULL)
2016-06-23 03:41:40 | SOURCE DB QUERY : SELECT resource_id, page, version, content, iv, NULL AS loid FROM repository_secview WHERE resource_id=1457424635012030 AND page=0 AND version=1
2016-06-23 03:41:40 | DESTINATION DB QUERY : INSERT INTO file_secview (resource_id, version, page, file_oid, iv) VALUES (1457424635012030, 1, 1, 2095946, NULL)
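As a side note, these timestamps could carry sub-second precision, since datetime.datetime.now() already includes microseconds; the seconds-only values above come from the report's formatting. A minimal sketch (the query text is purely illustrative):

import datetime

# str() on datetime.now() keeps the microseconds, e.g. "2016-06-23 03:41:35.123456"
print str(datetime.datetime.now()) + " | SOURCE DB QUERY : SELECT ..."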


Currently my problem is that when I have to deal with a big database (between 12 and 30GB), at a certain point the script crashes. With the 30GB database it ran for 4 days and then it simply got killed by the system.

Note: with smaller communities I used to open a buffered MySQL connection, which seemed to perform better, but with this 30GB migration the script froze completely when using a buffered connection, so I removed that option; the connection now runs in streaming mode (the default).
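For reference, here is a rough sketch of the kind of streaming read meant above (the connection handle cnx_msql and the per-record helper migrate_row are placeholder names, not identifiers from the script). Pulling rows in fixed-size batches with fetchmany() keeps memory bounded even on the 30GB database:

import mysql.connector

def stream_rows(cnx_msql, query, batch_size=500):
    # Unbuffered (streaming) cursor: rows are fetched from the server on demand
    # instead of being loaded into memory all at once.
    cursor = cnx_msql.cursor()
    cursor.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row
    cursor.close()

# Hypothetical usage:
# for row in stream_rows(cnx_msql, "SELECT resource_id, page, version, content, iv FROM repository_secview"):
#     migrate_row(row)   # placeholder for the per-record migration work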

### Imports needed by the helpers below
import sys
import base64
import psycopg2
import mysql.connector

### EXECUTE QUERIES ON DBs
def execute_psql(cnx_psql, cursor, query, dataset = None, direct_commit = False):
    ## Run a statement on PostgreSQL; `query` is the statement prefix and `dataset`
    ## the already-formatted VALUES part. Roll back and abort on any error.
    if dataset:
        try:
            cursor.execute( query + dataset )
            print "EXECUTED QUERY : " + query + dataset
        except psycopg2.Error as err:
            print "/!\ Cannot execute the query on " + query + dataset, err.pgerror
            cnx_psql.rollback()
            sys.exit( "Rolled back! And leaving early this lucky script, find out what is wrong" )
    else:
        print "The dataset for " + query + " is empty, skipping..."
    return cursor

def execute_msql(cursor, query):
    ## Run a query on MySQL; abort the whole script on any error.
    try:
        cursor.execute( query )
    except mysql.connector.Error as err:
        print "/!\ Cannot execute the following query:", query
        print "/!\ Error:", err
        sys.exit( "Leaving early this lucky script, find out what is wrong" )
    return cursor

### FILES migration
def convert_iv(iv):
    ## Decode the base64-encoded initialization vector into raw bytes.
    if iv:
        iv = bytearray(base64.b64decode(iv))
    return iv

def lobject_direct_migration(cnx_psql, blob):
    ## Copy one BLOB from MySQL into a new PostgreSQL large object and return its oid.
    bytearray_to_string = str(bytearray(blob))
    loid = cnx_psql.lobject().oid
    try:
        cnx_psql.lobject(loid, mode='w').write( bytearray_to_string )
        print "lo_migration | new oid : " + str(loid)
        del bytearray_to_string
    except psycopg2.Error as err:
        print "/!\ Cannot insert large_object " + str(loid), err.pgerror
        cnx_psql.rollback()
        sys.exit( "Rollback! And leaving early this lucky script, find out what is wrong" )
    return loid

Logging and monitoring: additionally, in order to keep an eye on the system resources consumed, I implemented the following function and called it after each query, but the system memory usage was very stable and stayed under a warning threshold.

import datetime
import resource

def get_log():
    ## Print a snapshot of the resource usage of this process.
    now = datetime.datetime.now()
    usage = resource.getrusage(resource.RUSAGE_SELF)
    print '\n*** BEGIN LOGGING ***'
    print 'DATETIME: ' + str(now)
    print 'MEMORY USAGE: ' + str(usage.ru_maxrss) + ' Kilobyte'
    print 'TIME IN USER MODE (FLOAT): ' + str(usage.ru_utime)
    print 'TIME IN SYSTEM MODE (FLOAT): ' + str(usage.ru_stime)
    print 'SHARED MEMORY USAGE: ' + str(usage.ru_ixrss) + ' Kilobyte'
    print 'UNSHARED MEMORY USAGE: ' + str(usage.ru_idrss) + ' Kilobyte'
    print 'PAGE FAULTS NOT REQUIRING I/O: ' + str(usage.ru_minflt)
    print 'PAGE FAULTS REQUIRING I/O: ' + str(usage.ru_majflt)
    print 'NUMBER OF SWAP OUTS: ' + str(usage.ru_nswap)
    print 'BLOCK INPUT OPERATIONS: ' + str(usage.ru_inblock)
    print 'BLOCK OUTPUT OPERATIONS: ' + str(usage.ru_oublock)
    print 'MESSAGES SENT: ' + str(usage.ru_msgsnd)
    print 'MESSAGES RECEIVED: ' + str(usage.ru_msgrcv)
    print 'SIGNALS RECEIVED: ' + str(usage.ru_nsignals)
    print 'VOLUNTARY CONTEXT SWITCHES: ' + str(usage.ru_nvcsw)
    print 'INVOLUNTARY CONTEXT SWITCHES: ' + str(usage.ru_nivcsw)
    ## LIMITS ON RESOURCES USAGE
    #print str(resource.getrlimit(resource.RLIMIT_CORE))
    #print str(resource.getrlimit(resource.RLIMIT_CPU))
    #print str(resource.getrlimit(resource.RLIMIT_FSIZE))
    print '*** END LOGGING ***\n'
    return True


Does anyone see a bottleneck? In my opinion the issue could lie in the following conversion to a string, located in the lobject_direct_migration() function, but I don't see any other way to perform the same action:

bytearray_to_string = str(bytearray(blob))

Is there any way to tune this snippet for better performance? Any improvement tip is welcome; I would like to bring the execution time down to one day at most. :)
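For the sake of comparison only, here is a hedged sketch of what a single-copy variant could look like. It assumes that, under Python 2, mysql.connector hands the BLOB column back as a bytearray or str, and whether it actually changes the timings would have to be measured; opening the large object with oid=0 also avoids the second lobject() call used above:

def lobject_single_copy_migration(cnx_psql, blob):
    # Assumption: under Python 2, mysql.connector returns the BLOB column as a
    # bytearray (or already as a str); str() on a bytearray yields its raw bytes,
    # so the extra bytearray(...) wrapper makes one more full copy of the data.
    data = blob if isinstance(blob, str) else str(blob)
    lobj = cnx_psql.lobject(0, 'wb')   # oid=0: let PostgreSQL assign a fresh oid
    try:
        lobj.write(data)
        return lobj.oid
    finally:
        lobj.close()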
