[SQL] Pick random rows from SELECT?
Peter Otten
__peter__ at web.de
Mon Sep 21 06:52:58 EDT 2009
Gilles Ganault wrote:
> I have a working Python script that SELECTs rows from a database to
> fetch a company's name from a web-based database.
>
> Since this list is quite big and the site is the bottleneck, I'd like
> to run multiple instances of this script, and figured a solution would
> be to pick rows at random from the dataset, check in my local database
> if this item has already been taken care of, and if not, download
> details from the remote web site.
>
> If someone's done this before, should I perform the randomization in
> the SQL query (SQLite using the APSW wrapper
> http://code.google.com/p/apsw/), or in Python?
>
> Thank you.
>
> Here's some simplified code:
>
> sql = 'SELECT id,label FROM companies WHERE activity=1'
> rows=list(cursor.execute(sql))
> for row in rows:
>     id = row[0]
>     label = row[1]
>
>     print strftime("%H:%M")
>     url = "http://www.acme.com/details.php?id=%s" % id
>     req = urllib2.Request(url, None, headers)
>     response = urllib2.urlopen(req).read()
>
>     name = re_name.search(response)
>     if name:
>         name = name.group(1)
>         sql = 'UPDATE companies SET name=? WHERE id=?'
>         cursor.execute(sql, (name,id) )

I don't think you need to randomize the requests. Instead, you could control
a pool of worker processes using the multiprocessing module:
http://docs.python.org/library/multiprocessing.html
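
A rough sketch of what that could look like, written for Python 3 (so
urllib.request rather than urllib2). The RE_NAME pattern and the function
names are made up for illustration, since the quoted script does not show
re_name; the URL is the one from the quoted code. The pool hands each id to
exactly one worker, so there is no need to pick rows at random or to check
whether another instance has already taken a row.

```python
import re
from multiprocessing import Pool
from urllib.request import urlopen

# Hypothetical pattern: the quoted script does not show what re_name matches.
RE_NAME = re.compile(r"<h1>(.*?)</h1>")

# URL taken from the quoted script.
URL_TEMPLATE = "http://www.acme.com/details.php?id=%s"


def extract_name(page):
    """Return the company name found in the page text, or None."""
    match = RE_NAME.search(page)
    return match.group(1) if match else None


def fetch_name(company_id):
    """Worker: download one details page and return (id, name or None)."""
    with urlopen(URL_TEMPLATE % company_id) as response:
        page = response.read().decode("utf-8", "replace")
    return company_id, extract_name(page)


def fetch_all(company_ids, processes=4):
    """Spread the downloads over a pool of worker processes.

    Yields (id, name) pairs for pages where a name was found. The caller,
    running in the parent process, is expected to do the UPDATEs itself so
    that only one process writes to the SQLite database.
    """
    with Pool(processes) as pool:
        # imap_unordered yields each result as soon as a worker finishes it.
        for company_id, name in pool.imap_unordered(fetch_name, company_ids):
            if name is not None:
                yield company_id, name
```

The parent would SELECT the ids once, loop over fetch_all(ids) and run the
UPDATE for each pair it gets back. On platforms that start workers by
spawning a fresh interpreter, that loop has to live under an
`if __name__ == "__main__":` guard.
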
Peter