
I've written a script in Python Scrapy to harvest product names and prices from books.toscrape. I'm submitting this small piece of code to Code Review because, in Python 3, when I use Scrapy to parse data from a website, the CSV output looks awkward if it is produced by the default command (scrapy crawl toscrapesp -o items.csv -t csv): there is a blank line between every two rows. I've worked around this with the script below. Instead of relying on the default command for the CSV output, I've added a few lines of CSV-writing code to the spider, and that gives me the output I want.
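A likely explanation, consistent with the Windows-specific diagnosis in the comments under the answer: the csv module ends every row with \r\n, and a file opened in text mode on Windows without newline='' translates each \n into \r\n, so every row ends in \r\r\n, which many CSV viewers show as an extra blank row. The sketch below reproduces this on any OS by using io.TextIOWrapper with newline="\r\n" as a stand-in for Windows text mode (that substitution is an assumption of the demo, not a description of Scrapy's internals); csv_bytes is a hypothetical helper name.

```python
import csv
import io


def csv_bytes(newline):
    """Write two rows through csv.writer and return the raw bytes.

    newline="\r\n" makes TextIOWrapper translate every '\n' into '\r\n',
    which is what text mode does by default on Windows; newline="" turns
    translation off, as the csv module documentation recommends.
    """
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    writer = csv.writer(text)  # csv.writer ends rows with '\r\n' by default
    writer.writerow(["Name", "Price"])
    writer.writerow(["Tipping the Velvet", "£53.74"])
    text.flush()
    return raw.getvalue()


# Simulated Windows text mode: each row ends in '\r\r\n', which spreadsheet
# programs tend to display as an extra blank row.
print(csv_bytes("\r\n"))
# newline="": the bytes written are exactly what csv.writer produced.
print(csv_bytes(""))
```

This is why the spider in the question opens its file with newline='': it keeps the csv module's own \r\n terminator from being translated a second time.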

Although it runs smoothly, I'm not sure it is the ideal way to do this. I'd welcome any suggestions on how to improve this script.

"items.py" includes:

```python
import scrapy


class ToscrapeItem(scrapy.Item):
    Name = scrapy.Field()
    Price = scrapy.Field()
```

The spider contains:

```python
import csv
import scrapy

outfile = open("various_pro.csv", "w", newline='')
writer = csv.writer(outfile)


class ToscrapeSpider(scrapy.Spider):
    name = "toscrapesp"
    start_urls = ["http://books.toscrape.com/"]

    def parse(self, response):
        for link in response.css('.nav-list a::attr(href)').extract():
            yield scrapy.Request(url=response.urljoin(link), callback=self.collect_data)

    def collect_data(self, response):
        global writer
        for item in response.css('.product_pod'):
            product = item.css('h3 a::text').extract_first()
            value = item.css('.price_color::text').extract_first()
            yield {'Name': product, 'Price': value}
            writer.writerow([product, value])
```

Please click this link to see the output I was getting earlier. With the script above, the CSV output has no blank rows or line gaps.
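The same idea as the spider above (open the file with newline='' and write a row per item) can live in a Scrapy item pipeline instead of a module-level global. Scrapy calls a pipeline's open_spider and close_spider hooks once per crawl and process_item for every yielded item, so the file is closed when the crawl ends. A minimal sketch; the class name CsvWriterPipeline, the default file name, and the header row are illustrative assumptions, not part of the original code:

```python
import csv


class CsvWriterPipeline:
    """Hypothetical item pipeline writing Name/Price rows to a CSV file."""

    def __init__(self, path="various_pro.csv"):
        self.path = path
        self.outfile = None
        self.writer = None

    def open_spider(self, spider):
        # newline='' stops text mode from turning csv's '\r\n' row
        # terminator into '\r\r\n' on Windows.
        self.outfile = open(self.path, "w", newline="", encoding="utf-8")
        self.writer = csv.writer(self.outfile)
        self.writer.writerow(["Name", "Price"])

    def close_spider(self, spider):
        # Called once when the crawl finishes, so the handle is not leaked.
        self.outfile.close()

    def process_item(self, item, spider):
        self.writer.writerow([item["Name"], item["Price"]])
        # Returning the item passes it on to later pipelines and feed exports.
        return item
```

It would be enabled in settings.py with something like ITEM_PIPELINES = {"myproject.pipelines.CsvWriterPipeline": 300}, where myproject is a placeholder for the project package; collect_data would then only yield the dicts.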

Jamal
asked Sep 16, 2017 at 18:16

1 Answer


I don't think you should reinvent the wheel by writing your own CSV export. The following works for me as is (note the added .strip() calls, though I don't think they are strictly necessary):

```python
import scrapy


class ToscrapeSpider(scrapy.Spider):
    name = "toscrapesp"
    start_urls = ["http://books.toscrape.com/"]

    def parse(self, response):
        for link in response.css('.nav-list a::attr(href)').extract():
            yield scrapy.Request(url=response.urljoin(link), callback=self.collect_data)

    def collect_data(self, response):
        for item in response.css('.product_pod'):
            # default='' avoids calling .strip() on None if a selector matches nothing
            product = item.css('h3 a::text').extract_first(default='').strip()
            value = item.css('.price_color::text').extract_first(default='').strip()
            yield {'Name': product, 'Price': value}
```

Running it with scrapy runspider spider.py -o output.csv -t csv produces a CSV file with no blank lines:

Price,Name
£53.74,Tipping the Velvet
£29.69,Forever and Forever: The ...
£55.53,A Flight of Arrows ...
£36.95,The House by the ...
£30.25,Mrs. Houdini
£28.08,The Marriage of Opposites
...
answered Sep 18, 2017 at 2:15
  • Thanks, sir, for your kind reply. The thing is, I have been suffering from this "line gap" issue in the CSV output for the last two years. I tried several different ways with no luck until I used the approach shown above. I just ran the script with the .strip() change you suggested and got output with the same issue again. I don't know whether this happens only in my case or for other people using Python 3.5 as well. Anyway, this is why I used that customized portion in my spider. It looks a bit awkward, but it works. Thanks, sir. Commented Sep 18, 2017 at 6:22
  • @Mithu got it. Okay, but can you be sure the problem is not with your CSV editor? If you open the CSV file in a plain text editor, do you still see these blank lines? Thanks. Commented Sep 18, 2017 at 13:36
  • Sorry, alecxe, for the delayed response; I was away. For your reference, I've uploaded a CSV file produced by Scrapy using your suggested command. What you said is a little tricky for me, which is why I uploaded it; perhaps you can tell what the problem is. Thanks, sir. Here is the link: dropbox.com/s/xv7wfnzivshlu5m/items.csv?dl=0 Commented Sep 18, 2017 at 17:04
  • @Mithu ah, of course, this is that Windows-specific problem. You should probably patch an item exporter as suggested here. Hope that helps. Commented Sep 18, 2017 at 20:07
  • One last thing, sir: do I have to create this myself, I mean scrapy.exporters, or is it located somewhere within the Scrapy project, like settings.py, middleware.py, etc.? Thanks, sir. Commented Sep 18, 2017 at 20:44
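The check suggested in the comments, looking at the file itself rather than at how a spreadsheet displays it, can also be scripted. A minimal sketch, assuming the exported file is readable from the working directory; inspect_csv is a hypothetical helper that reads the file in binary mode (so no newline translation happens on the way in) and counts doubled \r\r\n row endings:

```python
def inspect_csv(path):
    """Report how the rows of a CSV file end, based on its raw bytes.

    Normal csv module output ends rows with CR LF; the Windows
    double translation ends them with CR CR LF, which spreadsheet
    programs tend to show as blank rows.
    """
    with open(path, "rb") as f:
        data = f.read()
    doubled = data.count(b"\r\r\n")
    return {
        "lines": data.count(b"\n"),
        "doubled_endings": doubled,
        "looks_double_spaced": doubled > 0,
    }
```

For example, print(inspect_csv("items.csv")) on the file from the question; a non-zero doubled_endings count points at line-ending translation rather than the CSV editor.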
