\$\begingroup\$

I've written a script that performs a reverse search on a website using the Name and Lid values from a predefined CSV file. Once the search is done, it writes the resulting Address and Phone Number next to the corresponding Name and Lid in a new CSV file. It currently works without errors, and I tried to keep the whole process clean. Any suggestions for improving this script would be highly appreciated. Here is the code I have tried:

import csv
import requests
from lxml import html

with open("predefined.csv", "r") as f, open('newly_created.csv', 'w', newline='') as g:
    reader = csv.DictReader(f)
    newfieldnames = reader.fieldnames + ['Address', 'Phone']

    writer = csv.writer = csv.DictWriter(g, fieldnames = newfieldnames)
    writer.writeheader()
    for entry in reader:
        Page = "https://www.yellowpages.com/los-angeles-ca/mip/{}-{}".format(entry["Name"].replace(" ","-"), entry["Lid"])
        response = requests.get(Page)
        tree = html.fromstring(response.text)

        titles = tree.xpath('//article[contains(@class,"business-card")]')
        for title in tree.xpath('//article[contains(@class,"business-card")]'):
            Address= title.xpath('.//p[@class="address"]/span/text()')[0]
            Contact = title.xpath('.//p[@class="phone"]/text()')[0]

            print(Address,Contact)

            new_row = entry
            new_row['Address'] = Address
            new_row['Phone'] = Contact
            writer.writerow(new_row)

Here is the link to the search criteria of "predefined.csv" file.

Here is the link to the results.

alecxe
asked Jul 9, 2017 at 12:36
\$\endgroup\$

1 Answer

\$\begingroup\$

There are multiple things we can do to improve the code:

  • variable naming - try to be consistent with PEP8 naming suggestions - for instance:
    • Page should probably be page - or even better url
    • Address would be address
    • Contact would be contact
    • f can be input_file
    • g can be output_file
  • the titles variable is never used
  • move the URL format string into a constant
  • you don't need writer = csv.writer = csv.DictWriter(...) - the chained assignment also rebinds csv.writer on the module, so just assign the DictWriter instance to writer directly
  • since you are crawling the same domain, reusing a requests.Session() instance should have a positive impact on performance
  • use the .findtext() method instead of calling xpath() and then taking the first item
  • I would also create a separate crawl function to keep the web-scraping logic apart from the CSV handling
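
The .findtext() suggestion also changes what happens when an element is missing: indexing [0] into an empty result list raises IndexError, while .findtext() returns None. Here is a minimal sketch of that difference. It assumes the standard library's xml.etree.ElementTree as a stand-in so that it runs without lxml, and the sample markup and helper names are invented for illustration; lxml elements provide a findtext() method that should behave the same way for simple paths like these.

```python
import xml.etree.ElementTree as ET

# Invented sample markup: the second card has no phone paragraph.
SAMPLE = """
<root>
  <article class="business-card">
    <p class="address"><span>123 Main St</span></p>
    <p class="phone">(555) 010-0000</p>
  </article>
  <article class="business-card">
    <p class="address"><span>456 Oak Ave</span></p>
  </article>
</root>
"""


def first_text_by_index(card, path):
    """Mimic the 'find everything, take [0]' style; fails when nothing matches."""
    return card.findall(path)[0].text


def extract(card):
    """Use findtext(), which returns None for a missing element."""
    return (
        card.findtext('.//p[@class="address"]/span'),
        card.findtext('.//p[@class="phone"]'),
    )


cards = ET.fromstring(SAMPLE).findall('.//article[@class="business-card"]')
results = [extract(card) for card in cards]
```

With the index-based helper, the second card would abort the whole run; with findtext() the row can still be written with an empty phone cell.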

Here is the modified code with the above and other improvements combined:

import csv

import requests
from lxml import html


URL_TEMPLATE = "https://www.yellowpages.com/los-angeles-ca/mip/{}-{}"


def crawl(entries):
    with requests.Session() as session:
        for entry in entries:
            url = URL_TEMPLATE.format(entry["Name"].replace(" ", "-"), entry["Lid"])
            response = session.get(url)
            tree = html.fromstring(response.text)

            titles = tree.xpath('//article[contains(@class,"business-card")]')
            for title in titles:
                address = title.findtext('.//p[@class="address"]/span')
                contact = title.findtext('.//p[@class="phone"]')

                print(address, contact)

                entry['Address'] = address
                entry['Phone'] = contact
                yield entry


if __name__ == '__main__':
    with open("predefined.csv", "r") as input_file, open('newly_created.csv', 'w', newline='') as output_file:
        reader = csv.DictReader(input_file)
        field_names = reader.fieldnames + ['Address', 'Phone']

        writer = csv.DictWriter(output_file, fieldnames=field_names)
        writer.writeheader()

        for entry in crawl(reader):
            writer.writerow(entry)

(not tested)
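
One benefit of having crawl as a generator that takes rows and yields enriched rows is that the CSV plumbing can be exercised without touching the network. Below is a rough sketch of that idea using in-memory files. The fake_crawl stand-in, the enrich_csv wrapper and the sample rows are all hypothetical and not part of the original script.

```python
import csv
import io


def fake_crawl(entries):
    """Hypothetical stand-in for crawl(): adds fixed values, no HTTP requests."""
    for entry in entries:
        entry['Address'] = 'addr for ' + entry['Lid']
        entry['Phone'] = 'phone for ' + entry['Lid']
        yield entry


def enrich_csv(input_file, output_file, crawler):
    """Copy rows from input_file to output_file, adding Address and Phone."""
    reader = csv.DictReader(input_file)
    field_names = reader.fieldnames + ['Address', 'Phone']

    writer = csv.DictWriter(output_file, fieldnames=field_names)
    writer.writeheader()

    for entry in crawler(reader):
        writer.writerow(entry)


# Invented sample input, held in memory instead of predefined.csv.
source = io.StringIO("Name,Lid\nAcme Plumbing,101\nBest Bakery,202\n")
target = io.StringIO(newline='')

enrich_csv(source, target, fake_crawl)

# Read the result back to inspect it.
rows = list(csv.DictReader(io.StringIO(target.getvalue())))
```

Passing the real crawl function and real file objects to enrich_csv would give the same flow as the main block above.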

answered Jul 9, 2017 at 18:54
\$\endgroup\$
  • \$\begingroup\$ Thanks sir alecxe, for your elaborative review and the epic code. I tested it just now and found it working like as your code always does. Btw, is it a good idea to write the results creating another csv file other than the existing one? \$\endgroup\$ Commented Jul 9, 2017 at 19:25
  • \$\begingroup\$ @SMth80 you can do either of them technically, but I would probably keep input and output files as separate files just in case there is something wrong in the logic of the program and I don't want to have my file in an intermediate state. Thanks! \$\endgroup\$ Commented Jul 9, 2017 at 19:36
  • \$\begingroup\$ @Shahin yup, I've already seen this post - nice question. And you've got really good reviews - actually don't have anything valuable to add. Thanks for heads up! \$\endgroup\$ Commented Aug 16, 2017 at 21:20
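
The concern raised in the comments about leaving a file in an intermediate state can also be handled when overwriting the input is required: write to a temporary file first and swap it in only after everything has succeeded. This is a minimal sketch under that assumption; the rewrite_csv_safely helper and its transform callback are hypothetical names invented for illustration.

```python
import csv
import os
import tempfile


def rewrite_csv_safely(path, extra_fields, transform):
    """Rewrite the CSV at path via a temp file, replacing it only on success."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', newline='') as dst, \
                open(path, 'r', newline='') as src:
            reader = csv.DictReader(src)
            writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + extra_fields)
            writer.writeheader()
            for row in reader:
                writer.writerow(transform(row))
        os.replace(tmp_path, path)  # swap in the finished file
    except BaseException:
        os.remove(tmp_path)  # discard the partial output, keep the original
        raise
```

If transform raises part-way through, the original file is left untouched and the partial temporary file is deleted.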
