This seems similar to my previous post (I'll link it at the bottom), but this is a different URL and it uses tables. When I run the following code, I can extract all of the data within that div:
import requests
from bs4 import BeautifulSoup

url = "https://www.nascar.com/wp-content/plugins/raw-feed/raw-feed.php"
r = requests.get(url)
soup = BeautifulSoup(r.text, "lxml")
try:
    data = soup.find('div', class_='div-col1')
    print(data)
except:
    print("You Get Nothing!")
I then change the try block to:
try:
    data = soup.find_all('td', class_='car')
    print(data)
except:
    print("You Get Nothing!")
and I am only getting the info pulled from the thead and not the tbody.
Is there something I'm missing or doing wrong? The further in I try to drill down, the more I either get an error or just get back an empty list [ ].
Also, this webpage is dynamic. I tried what was given to me in my previous thread (Old Post), and I understand that the layout and code of the two pages are different, but my concern is that loading Chrome every time I run the script will be a lot, since it will probably need to be refreshed every 30 seconds to 1 minute, 300-400 times.
2 Answers
Why don't you go directly to the source? If you look at the page source of that link, it gets its data from https://www.nascar.com/live/feeds/live-feed.json. With that you can easily get the data in JSON format and parse it as you like.
import requests
import json  # not required: res.json() already parses the response

url = "https://www.nascar.com/live/feeds/live-feed.json"
res = requests.get(url)
print(res.json())
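Since the question mentions refreshing every 30 seconds to a minute, 300-400 times, here is a minimal sketch of a polling loop that could wrap the JSON request. The names `poll`, `fetch`, `handle`, `interval`, and `max_polls` are made up for illustration, and skipping a failed request instead of stopping is just one possible policy, not something the feed requires.

```python
import time


def poll(fetch, handle, interval=30.0, max_polls=400, sleep=time.sleep):
    """Call fetch() up to max_polls times and pass each result to handle().

    A failed fetch is reported and skipped so one bad request does not
    end the whole run. Returns the number of successful fetches.
    """
    succeeded = 0
    for attempt in range(max_polls):
        try:
            payload = fetch()
        except Exception as exc:  # assumed policy: log and keep going
            print(f"poll {attempt + 1} failed: {exc}")
        else:
            handle(payload)
            succeeded += 1
        # No need to wait after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval)
    return succeeded
```

With the URL above this could be started as `poll(lambda: requests.get(url, timeout=10).json(), print)`. Because only a small JSON document is requested each time, no browser has to be launched per refresh.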
5 Comments
The import json line is not needed for response.json() and may confuse others.
response.json() is way faster than any other approach that uses bs4.
There is no need for the json module while using requests, which has its own built-in response.json() parser.
The data you wish to fetch from that page is generated dynamically, so a plain HTTP request made with the requests library can't see it. However, you can try requests-html, a newer library from the same author, which is capable of handling dynamically generated content. This is how you can go about it with that library:
import requests_html

URL = "https://www.nascar.com/wp-content/plugins/raw-feed/raw-feed.php"

with requests_html.HTMLSession() as session:
    r = session.get(URL)
    r.html.render(sleep=5)  # run the page's JavaScript, then wait 5 seconds
    for items in r.html.find('#pqrStatistic tr'):
        data = [item.text for item in items.find("th,td")]
        print(data)
Partial results:
['pos', 'car', 'driver', 'manuf', 'delta', 'laps', 'last lap', 'best time', 'best speed', 'best lap']
['1', '54', 'Kyle Benjamin(i)', '', '--', '161', '36.474', '20.198', '93.752', '8']
['2', '98', 'Grant Enfinger', '', '0.761', '161', '36.402', '20.144', '94.003', '157']
['3', '4', 'Todd Gilliland #', '', '1.407', '161', '36.359', '20.142', '94.013', '158']
['4', '8', 'John H. Nemechek(i)', '', '2.177', '161', '36.304', '20.234', '93.585', '31']
['5', '16', 'Brett Moffitt', '', '3.268', '161', '36.145', '20.359', '93.010', '8']
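If the printed lists were collected into one list instead, the header row could be used to key each following row, which makes individual fields easier to pick out. This is only a sketch: `rows_to_records` is a hypothetical helper, and the sample `rows` are copied from the partial results above.

```python
def rows_to_records(rows):
    """Turn [header, row, row, ...] into a list of dicts keyed by the header."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Sample data taken from the partial results; a real run would build this
# list by appending each `data` list inside the scraping loop.
rows = [
    ['pos', 'car', 'driver', 'manuf', 'delta', 'laps', 'last lap', 'best time', 'best speed', 'best lap'],
    ['1', '54', 'Kyle Benjamin(i)', '', '--', '161', '36.474', '20.198', '93.752', '8'],
    ['2', '98', 'Grant Enfinger', '', '0.761', '161', '36.402', '20.144', '94.003', '157'],
]

records = rows_to_records(rows)
for rec in records:
    print(rec['pos'], rec['car'], rec['driver'], rec['best speed'])
# prints:
# 1 54 Kyle Benjamin(i) 93.752
# 2 98 Grant Enfinger 94.003
```

All values stay strings, exactly as scraped; converting lap times or speeds to numbers is left out because placeholders such as '--' would need a rule of their own.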