
I've written a script which parses the name and price of different items from Craigslist. Such a script usually throws an error when the name or the price is missing (None). I've fixed that, and it now fetches results successfully. I hope I did it flawlessly.

```python
import requests
from lxml import html
page = requests.get('http://bangalore.craigslist.co.in/search/rea?s=120').text
tree = html.fromstring(page)
rows = tree.xpath('//li[@class="result-row"]')
for row in rows:
    link = row.xpath('.//a[contains(@class,"hdrlnk")]/text()')[0] if len(row.xpath('.//a[contains(@class,"hdrlnk")]/text()'))>0 else ""
    price = row.xpath('.//span[@class="result-price"]/text()')[0] if len(row.xpath('.//span[@class="result-price"]/text()'))>0 else ""
    print (link,price)
```
Peilonrayz
asked May 26, 2017 at 9:54

2 Answers


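It is usually easier to ask forgiveness than permission (EAFP). Since xpath() always returns a list, indexing an empty result with [0] raises an IndexError, so you could just surround each lookup with a try..except IndexError block: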

```python
import requests
from lxml import html
page = requests.get('http://bangalore.craigslist.co.in/search/rea?s=120').text
tree = html.fromstring(page)
for row in tree.xpath('//li[@class="result-row"]'):
    try:
        link = row.xpath('.//a[contains(@class,"hdrlnk")]/text()')[0]
    except IndexError:
        link = ""
    try:
        price = row.xpath('.//span[@class="result-price"]/text()')[0]
    except IndexError:
        price = ""
    print(link, price)
```
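To check the behaviour without hitting the network, here is a minimal offline sketch of the same EAFP loop. It assumes a hand-written HTML snippet (`SAMPLE_HTML`, hypothetical, only loosely mimicking the Craigslist markup) and wraps the loop in a hypothetical `parse_rows` function so the result can be inspected. The second row has no price element, so its `xpath()` call returns an empty list and `[0]` raises the `IndexError` that the `except` clause turns into `""`.

```python
from lxml import html

# Hypothetical markup standing in for the live Craigslist page;
# the second row deliberately has no price element.
SAMPLE_HTML = """
<ul>
  <li class="result-row">
    <a class="result-title hdrlnk" href="#">2BHK flat</a>
    <span class="result-price">15000</span>
  </li>
  <li class="result-row">
    <a class="result-title hdrlnk" href="#">Studio</a>
  </li>
</ul>
"""


def parse_rows(page):
    """Return a list of (link, price) tuples, using "" for missing fields."""
    tree = html.fromstring(page)
    results = []
    for row in tree.xpath('//li[@class="result-row"]'):
        # xpath() returns a list, possibly empty; [0] on an empty list
        # raises IndexError, which is the only failure expected here.
        try:
            link = row.xpath('.//a[contains(@class,"hdrlnk")]/text()')[0]
        except IndexError:
            link = ""
        try:
            price = row.xpath('.//span[@class="result-price"]/text()')[0]
        except IndexError:
            price = ""
        results.append((link, price))
    return results


for link, price in parse_rows(SAMPLE_HTML):
    print(link, price)
```

Catching only `IndexError` (rather than a bare `except`) keeps genuine mistakes, such as a malformed XPath expression, visible.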

If you have many such actions, you could put it into a function:

```python
def get_if_exists(row, path, index=0, default=""):
    """
    Gets the object at `index` from the xpath `path` from `row`.
    Returns the `default` if it does not exist.
    """
    try:
        return row.xpath(path)[index]
    except IndexError:
        return default
```

You could then use it here like this:

```python
for row in tree.xpath('//li[@class="result-row"]'):
    # Using the defined default values for index and default:
    link = get_if_exists(row, './/a[contains(@class,"hdrlnk")]/text()')
    # Manually setting them instead:
    price = get_if_exists(row, './/span[@class="result-price"]/text()', 0, "")
    print(link, price)
```
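If there are many fields to extract, one possible way to avoid repeating a line per field is to keep the XPath expressions in a dictionary and build one dict per row. This is a sketch under stated assumptions: `FIELDS` and `extract` are hypothetical names, `get_if_exists` is repeated from above so the block runs on its own, and an inline HTML snippet replaces the live page.

```python
from lxml import html


def get_if_exists(row, path, index=0, default=""):
    """Return row.xpath(path)[index], or `default` if there is no such item."""
    try:
        return row.xpath(path)[index]
    except IndexError:
        return default


# Hypothetical field table: one XPath expression per column to extract.
FIELDS = {
    "link": './/a[contains(@class,"hdrlnk")]/text()',
    "price": './/span[@class="result-price"]/text()',
}


def extract(tree):
    """Return one dict per result row, with "" for any missing field."""
    return [
        {name: get_if_exists(row, path) for name, path in FIELDS.items()}
        for row in tree.xpath('//li[@class="result-row"]')
    ]


# Offline sample standing in for the real page; the first row has no title link.
tree = html.fromstring("""
<ul>
  <li class="result-row"><span class="result-price">9000</span></li>
  <li class="result-row">
    <a class="result-title hdrlnk" href="#">Plot</a>
    <span class="result-price">50000</span>
  </li>
</ul>
""")

for item in extract(tree):
    print(item["link"], item["price"])
```

Adding a new column then only means adding one entry to `FIELDS`; the loop itself stays unchanged.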
answered May 26, 2017 at 10:28
  • Thanks, sir Graipher, for your suggestion. If I follow your second method, the one for many similar actions, it will save me a lot of hard work, because, as you know, the way I have written the for loop in my script is tedious. I can't dovetail the script with your function, though! Commented May 26, 2017 at 12:07
  • @SMth80 I don't understand what you mean by "I can't dovetail the script". Do you mean run? It works fine for me; does it throw any error? Commented May 26, 2017 at 12:33
  • Nope, sir. I meant that I can't rework my script to use your function. I'm a little behind in applying the function you suggested. Commented May 26, 2017 at 12:52
  • You just replace the for loop with the one I wrote? If you have more code where you use this, then you should have included it in the question. You can always ask a new question with more context included. Commented May 26, 2017 at 13:24
  • @SMth80 Yes, it is. It also seems to work for me; does it not work for you? Commented May 26, 2017 at 13:34

I only recently learned about lxml's findtext method, which makes it easy to get the text content of an element without the index-and-check dance. Its most appealing feature is that it returns None (its default) instead of raising an error when the expected element is not present. It also makes the code concise and clean. Note that findtext takes an ElementPath expression, a limited subset of XPath, rather than full XPath. If anyone stumbles across the same problem, they may want to give it a try as well.

```python
import requests
from lxml import html
page = requests.get('http://bangalore.craigslist.co.in/search/rea?s=120').text
tree = html.fromstring(page)
for row in tree.xpath('//li[@class="result-row"]'):
    link = row.findtext('.//a[@data-id]')
    price = row.findtext('.//span[@class="result-price"]')
    print(link, price)
```
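A minimal offline sketch of the findtext behaviour, on a hypothetical inline snippet rather than the live page: it returns `None` when no element matches, an empty string when the element exists but has no text, and the `default` argument lets you choose the fallback (here `""`, to match the other answer's output). Since the path argument is ElementPath, XPath functions such as `contains()` are not accepted there.

```python
from lxml import html

# Hypothetical sample rows: the first is complete, the second has an empty
# price element, the third has no price element at all.
tree = html.fromstring("""
<ul>
  <li class="result-row"><a data-id="1" href="#">Villa</a><span class="result-price">80000</span></li>
  <li class="result-row"><a data-id="2" href="#">Room</a><span class="result-price"></span></li>
  <li class="result-row"><a data-id="3" href="#">Shop</a></li>
</ul>
""")

for row in tree.xpath('//li[@class="result-row"]'):
    link = row.findtext('.//a[@data-id]')
    # None when no <span> matches, "" when the <span> exists but is empty.
    price = row.findtext('.//span[@class="result-price"]')
    # Passing default="" gives "" in the missing-element case as well.
    price_or_blank = row.findtext('.//span[@class="result-price"]', default="")
    print(link, repr(price), repr(price_or_blank))
```

Keep in mind that findtext returns only the matched element's own `.text`, so text inside nested child elements is not included.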
answered Jul 10, 2017 at 20:11