A couple of things I'd try:
- switch to the `requests` module and reuse a `requests.Session()` so that the same TCP connection is reused:
  "..if you're making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase"
- use the "HEAD" HTTP method (with `requests` you may need `allow_redirects=True`, since `HEAD` requests do not follow redirects by default)
- try out the `Scrapy` web-scraping framework, which is asynchronous in nature and is based on the `twisted` network library. You would also move the CSV output part to an output pipeline.
- another thing to try is the `grequests` library (`requests` on `gevent`)
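For the first two points, a rough sketch could look like the following. The names `resolve_url` and `resolve_all` are made up for illustration, and the 10-second timeout is an arbitrary assumption, so adapt them to your code:

```python
def resolve_url(session, url, timeout=10):
    """Send a HEAD request through the shared session and follow redirects.

    Returns a (status_code, final_url) tuple.
    """
    # HEAD skips the response body; allow_redirects=True is needed because
    # requests does not follow redirects for HEAD by default.
    response = session.head(url, allow_redirects=True, timeout=timeout)
    return response.status_code, response.url


def resolve_all(urls):
    """Resolve every URL with one Session so connections to the same host are reused."""
    import requests  # third-party: pip install requests

    with requests.Session() as session:
        return [resolve_url(session, url) for url in urls]
```

Keep in mind that some servers answer `HEAD` differently from `GET` (or not at all), so it is worth checking a few of your URLs by hand first.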
Some micro-optimization ideas:
- move the `hdr` dictionary definition to the module level to avoid redefining it every time `urlResolution()` is called (and, since it is a constant, use upper case and pick a more readable variable name, e.g. `HEADERS`)
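Something along these lines, as a sketch only. The header value is a placeholder rather than your actual `hdr` contents, and `url_resolution` with a `session` argument is an assumed signature, not your original one:

```python
# Defined once at import time instead of on every call.
HEADERS = {
    "User-Agent": "Mozilla/5.0",  # placeholder; copy the entries from your hdr dict
}


def url_resolution(session, url):
    """Return the final URL after redirects, sending the shared HEADERS."""
    response = session.head(url, headers=HEADERS, allow_redirects=True)
    return response.url
```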