I need to extract all the URLs for each IP in a list. I wrote this Python script, but the same IP gets extracted multiple times (several threads are started with the same IP). Could anyone improve my solution using multithreading?
Sorry for my English. Thanks, all.
import urllib2, os, re, sys, os, time, httplib, thread, argparse, random

try:
    ListaIP = open(sys.argv[1], "r").readlines()
except(IOError):
    print "Error: Check your IP list path\n"
    sys.exit(1)

def getIP():
    if len(ListaIP) != 0:
        value = random.sample(ListaIP, 1)
        ListaIP.remove(value[0])
        return value
    else:
        print "\nListaIPs sa terminat\n"
        sys.exit(1)

def extractURL(ip):
    print ip + '\n'
    page = urllib2.urlopen('http://sameip.org/ip/' + ip)
    html = page.read()
    links = re.findall(r'href=[\'"]?([^\'" >]+)', html)
    outfile = open('2.log', 'a')
    outfile.write("\n".join(links))
    outfile.close()

def start():
    while True:
        if len(ListaIP) != 0:
            test = getIP()
            IP = ''.join(test).replace('\n', '')
            extractURL(IP)
        else:
            break

for x in range(0, 10):
    thread.start_new_thread( start, () )

while 1:
    pass
1 Answer
Use a threading.Lock. The lock should be global, and created at the beginning, when you create the IP list.
Call lock.acquire() at the start of getIP(), and call lock.release() before you leave the function, on every path out of it.
What you are seeing is this: thread 1 executes value = random.sample(...), and then thread 2 also executes value = random.sample(...) before thread 1 gets to the remove. So the item is still in the list at the time thread 2 gets there.
Therefore both threads have a chance of getting the same IP.
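A minimal sketch of that suggestion, under a few assumptions: the names get_ip, ip_lock, worker and run are illustrative and not from the original script, the IP list is a hypothetical in-memory list instead of a file, and the download step is replaced by collecting the IPs so the result can be checked. It also swaps thread.start_new_thread and the busy while 1: pass loop for threading.Thread with join(), which is a different way of waiting than the question uses.

```python
import random
import threading

# Hypothetical sample data; the original script reads these lines from a file.
ip_list = ["10.0.0.%d" % i for i in range(1, 51)]

# Global lock, created together with the list it protects.
ip_lock = threading.Lock()


def get_ip():
    """Pick one random IP and remove it from the list; return None when empty."""
    ip_lock.acquire()
    try:
        if not ip_list:
            return None
        value = random.choice(ip_list)
        ip_list.remove(value)  # pick and remove happen while holding the lock
        return value
    finally:
        ip_lock.release()      # released on every path out of the function


def worker(results):
    """Keep taking IPs until the list is empty; results is private to this thread."""
    while True:
        ip = get_ip()
        if ip is None:
            break
        results.append(ip)     # stand-in for extractURL(ip)


def run(num_threads=10):
    per_thread = [[] for _ in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(r,)) for r in per_thread]
    for t in threads:
        t.start()
    for t in threads:
        t.join()               # wait for the workers instead of spinning
    return [ip for r in per_thread for ip in r]


handed_out = run()
```

Because the pick and the remove now happen inside one locked region, no two threads can be handed the same entry. Returning None when the list is empty, rather than calling sys.exit() inside a worker thread, lets each worker stop cleanly.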
2 Comments
Use with lock: statements (using context managers).
Import os once; no need to import it twice.
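The with lock: form mentioned in the comments could look roughly like this. It is only a sketch: take_ip, list_lock, results_lock and drain are made-up names, and the addresses are hypothetical placeholders for the lines read from the IP file.

```python
import random
import threading

# Hypothetical addresses standing in for the lines read from the IP file.
addresses = ["192.0.2.%d" % i for i in range(1, 21)]
expected = sorted(addresses)

list_lock = threading.Lock()     # protects addresses
results_lock = threading.Lock()  # protects seen
seen = []


def take_ip():
    """Return and remove one random address, or None when none are left."""
    with list_lock:              # acquired here, released when the block exits
        if not addresses:
            return None
        value = random.choice(addresses)
        addresses.remove(value)
        return value


def drain():
    """Worker loop: record every address this thread was handed."""
    while True:
        ip = take_ip()
        if ip is None:
            return
        with results_lock:
            seen.append(ip)      # stand-in for the real per-IP work


workers = [threading.Thread(target=drain) for _ in range(5)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

The with statement releases the lock even if the block returns early or raises, so there is no release call to forget.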