I have just started learning web scraping with Python. My aim is to scrape the real-time news for Bajaj Auto Ltd. from http://money.rediff.com/companies/Bajaj-Auto-Ltd/10540026.
The problem: I'm unable to extract the contents (i.e. the news).
from urllib.request import urlopen
from bs4 import BeautifulSoup
url = 'http://money.rediff.com/companies/Bajaj-Auto-Ltd/10540026'
data = urlopen(url)
soup = BeautifulSoup(data)
te=soup.find('a',attrs={'target':'_jbpinter'})
lis=te.find_all_next('a',attrs={'target':'_jbpinter'})
#print(lis)
for li in lis:
    print(li.find('a').contents[0])
I am getting the error "AttributeError: 'NoneType' object has no attribute 'contents'" and I do not get the desired result.
Any input will be appreciated.
DisappointedByUnaccountableMod
1 Answer
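You are trying to get the a tag twice. Each element of lis is already an a tag, so li.find('a') searches inside it for a nested a tag, finds none, and returns None.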
Replace
for li in lis:
    print(li.find('a').contents[0])
with
for li in lis:
    print(li.get_text())
and you get this output:
Need Different Rates For Different Products: Rahul Bajaj on GST
Reforms irrespective of Bihar results: Bajaj
Auto shares in focus; Tata Motors up over 5%
We believe new Avenger will stimulate the market: Bajaj Auto's Eric Vas
BHP Billiton pins future of Indonesian coal mine on new...
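To see why the original loop fails without hitting the live site, here is a minimal sketch. The SAMPLE_HTML markup and its headlines are made up for illustration (the real page's structure may differ); only the target="_jbpinter" attribute comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the headlines are invented.
SAMPLE_HTML = """
<div>
  <a target="_jbpinter" href="/news/1">First headline</a>
  <a target="_jbpinter" href="/news/2">Second headline</a>
  <a target="_jbpinter" href="/news/3">Third headline</a>
</div>
"""

# Naming the parser explicitly avoids the "no parser specified" warning.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns every matching <a> tag, including the first one.
links = soup.find_all("a", attrs={"target": "_jbpinter"})

# Each element is already an <a> tag with no <a> nested inside it,
# so a second find('a') returns None for every link.
nested = [link.find("a") for link in links]

# get_text() reads the text straight from the tag we already have.
headlines = [link.get_text(strip=True) for link in links]
print(headlines)
```

Note that find followed by find_all_next, as in the question, only returns the links after the first match, so the first headline is skipped; a single find_all call collects all of them.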
answered Nov 4, 2015 at 16:52
dstudeba
Comments
li and see if there is actually an a in there