Skip to main content
Code Review

Return to Answer

replaced http://stackoverflow.com/ with https://stackoverflow.com/
Source Link

Also take a look at this Stackoverflow post this Stackoverflow post to deal with the insert statement a bit better.

Also take a look at this Stackoverflow post to deal with the insert statement a bit better.

Also take a look at this Stackoverflow post to deal with the insert statement a bit better.

Source Link
ferada
  • 11.4k
  • 25
  • 65

For performance questions you should really take a profiler to it and make sure that the Python code isn't accidentally the culprit first.

The numbers you quote for this sound way too high, though of course we can't see the actual data.

Now for the code.

Even though this is running Python 2 it should still be made compatible with Python 3, in particular:

  • Use print as a function, that is, print(...), for consistency.
  • Avoid zip for big lists, instead use izip from itertools.
  • Use consistent syntax, also read PEP8 regarding naming, because Python names should be consistenly lower case with underscores - since your names are at least consistent that is less of an issue.
  • A program should have a if __name__ == "__main__": ... block, assuming this not part of a bigger script where such a block is already in place.`

Apart from that more generally:

  • Use more descriptive names. zz, dp, fp, mep, dat tell the reader absolutely nothing about the meaning.
  • Random float literals should be named, or create a separate
  • Use functions to reuse functionality and to decompose the whole program into clear steps.
  • Related things should be place together, e.g. the cursor call can be moved to the end where the database functionality is located instead of at the start of the loop. Btw. are you sure that you acquire a new transaction automatically? It would help to know which particular library is being used.

Also:

  • The replace call can be done in one step, ",ドル".replace("", "").

  • The XML extraction looks pretty verbose. Unfortunately I'm not so sure about the types, though I'd imagine this could be written a bit more clearly with at least a helper function for two of the steps, quantity and prime:

     def empty_or_text(thing):
     return '' if thing is None else thing.text
    

    It also makes sense to cache things instead of recomputing it all over again, in this case that applies to the findAll calls for "offersummary".

  • dat is simply zz[0:4].

  • Since strings are immutable the assignment of the empty string can be done in one line, e.g. a = b = "".

And finally as has been said, it might make sense to group inserts into larger batches unless you require the transactional behaviour for some reason, or you can't simply rerun a batch.

Also take a look at this Stackoverflow post to deal with the insert statement a bit better.

That's it. I'm afraid without a bit more information about the data structures it's harder to infer how it should be structured instead.

At this I point I have the following for the modified code:

url = getSignedUrl(params)
resp = requests.get(url)
responseSoup = BeautifulSoup(resp.text)
def empty_or_text(thing):
 return '' if thing is None else thing.text
# EXTRACT FROM XML
offerSummaries = responseSoup.findAll("offersummary")
quantity = [empty_or_text(product.amount)
 for product in offerSummaries]
prime = [empty_or_text(product.iseligibleforprime)
 for product in responseSoup("offer")]
price = [product.lowestnewprice is None else product.lowestnewprice.formattedprice.text
 for product in offerSummaries]
from itertools import izip
# UPDATE DB
for zz in izip(asins.split(","), price, quantity, prime):
 if zz[1] == "Too low to display":
 print("Scraping...")
 zz = scrapeTooLow(zz[0])
 priceFormat = mep = dp = fp = ""
 if zz[1] != "":
 priceFormat = float(zz[1].replace(",ドル", ""))
 mep = round(priceFormat * 1.35, 2)
 dp = round(priceFormat * 1.38, 2)
 fp = round(priceFormat * 1.17, 2)
 dat = zz[0:4] + [mep, dp, fp]
 print(dat)
 try:
 conn.cursor().execute("INSERT INTO data VALUES (%s, %s, %s, %s, %s, %s, %s)", dat)
 conn.commit()
 except:
 conn.rollback()
print(count, i)
default

AltStyle によって変換されたページ (->オリジナル) /