Python table scrape returning no data

Question 1

This seems similar to my previous post (I'll link it at the bottom), but this is a different URL and it uses tables. When I run the following code, I can extract all of the data inside that div:

import requests
from bs4 import BeautifulSoup

url = "https://www.nascar.com/wp-content/plugins/raw-feed/raw-feed.php"
r = requests.get(url)
soup = BeautifulSoup(r.text, "lxml")
try:
    # find() returns None (it does not raise) when nothing matches
    data = soup.find('div', class_='div-col1')
    print(data)
except Exception:
    print("You Get Nothing!")

I then change the try block to

try:
    data = soup.find_all('td', class_='car')
    print(data)
except Exception:
    print("You Get Nothing!")

and I only get the cells from the thead, not the tbody.

Is there something I'm missing or doing wrong? The further in I try to drill down, the more I either get an error or just an empty list [].

Also, this webpage is dynamic. I tried what was suggested in my previous thread (Old Post), and I understand that the layout and code of the two pages differ. My concern with that approach is that launching Chrome every time the script runs will be heavy, since the data will probably need to be refreshed every 30 seconds to 1 minute, some 300-400 times.

Question 2

Why don't you go directly to the source? If you view the page source of that link, you will see it gets its data from https://www.nascar.com/live/feeds/live-feed.json. From there you can easily get the data in JSON format and parse it as you like.

import requests
import json

url = "https://www.nascar.com/live/feeds/live-feed.json"
res = requests.get(url)
# res.json() decodes the response body into Python dicts and lists
print(res.json())
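As a rough illustration of "parse it as you like", here is a minimal sketch that pulls driver names out of the decoded feed. The key names `vehicles`, `driver` and `full_name` are assumptions about the feed's layout and are not confirmed anywhere in this thread, and `fetch_feed` and `driver_names` are hypothetical helper names; print the real JSON first and adjust the keys to match.

```python
from typing import Any

FEED_URL = "https://www.nascar.com/live/feeds/live-feed.json"


def fetch_feed(url: str = FEED_URL) -> dict[str, Any]:
    """Download the feed and decode it with the built-in response.json()."""
    import requests  # third-party; imported here so the parser below works offline

    res = requests.get(url, timeout=10)
    res.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return res.json()


def driver_names(feed: dict[str, Any]) -> list[str]:
    """Return driver names in feed order.

    ASSUMPTION: every entry of feed["vehicles"] carries a nested
    {"driver": {"full_name": ...}} object. Adjust to the real feed.
    """
    names = []
    for vehicle in feed.get("vehicles", []):
        name = vehicle.get("driver", {}).get("full_name")
        if name:
            names.append(name)
    return names


# Hand-written sample in the assumed shape (not real feed output).
sample = {
    "vehicles": [
        {"vehicle_number": "54", "driver": {"full_name": "Kyle Benjamin"}},
        {"vehicle_number": "98", "driver": {"full_name": "Grant Enfinger"}},
    ]
}
print(driver_names(sample))
```

Against the live feed, the call would be `driver_names(fetch_feed())`, provided the assumed keys turn out to be right.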

Question 3

I should have mentioned that I'm very new at this. I didn't know I could do that, but this helps as well. Thank you!!

Question 4

@johnII, this is the perfect solution for the question. But I think it would help the OP understand a bit more if you showed how to use the JSON and print something, such as all the names. Also, remove the import json line; it is not needed for response.json() and may confuse others.

Question 5

@sbiondio, as you said, the page updates its data continuously (about every 5 seconds, to be precise) by fetching it from the link johnII has shown. You can get all the table items from this JSON. Also, response.json() is much faster than any approach that uses bs4.
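Since the feed refreshes every few seconds and the script needs to re-read it a few hundred times, a plain polling loop may be enough, with no browser involved. The following is only a sketch: `poll` is a hypothetical helper, and the fetcher is passed in as a callable so the loop can be tried without network access.

```python
import time
from typing import Any, Callable, Iterator


def poll(
    fetch: Callable[[], Any],
    interval: float,
    max_polls: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Any]:
    """Yield a fresh snapshot from fetch() up to max_polls times.

    A failed fetch is reported and skipped, so one bad request
    does not end the whole run.
    """
    for attempt in range(max_polls):
        try:
            snapshot = fetch()
        except Exception as exc:  # broad on purpose: keep polling after a failure
            print(f"poll {attempt + 1} failed: {exc}")
        else:
            yield snapshot
        if attempt < max_polls - 1:
            sleep(interval)  # wait before the next request, but not after the last


# Stand-in fetcher so the sketch runs offline.
counter = iter(range(3))
snapshots = list(poll(lambda: {"lap": next(counter)}, interval=0, max_polls=3))
print(snapshots)
```

With the real feed, the fetcher could be something like `lambda: requests.get(url, timeout=10).json()`, with `interval=30` and `max_polls=400` to match the numbers mentioned in the question.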

Question 6

@KeyurPotdar Thank you for the clarification, this helps a lot!!! I'm playing around with what this is outputting now!

Question 7

@sbiondio, have a look at this question. Maybe it'll help you understand it better. (Just remember that you don't have to use the separate json module with requests, which has its own built-in response.json() parser.)

Question 8

The data you want from that page is generated dynamically, so a plain HTTP request made with the requests library cannot retrieve it. However, you can try requests-html, a newer library from the same author, which can handle dynamically generated content. This is how you could use it:

import requests_html

URL = "https://www.nascar.com/wp-content/plugins/raw-feed/raw-feed.php"

with requests_html.HTMLSession() as session:
    r = session.get(URL)
    # render() runs the page's JavaScript in headless Chromium;
    # sleep=5 waits five seconds after the initial render
    r.html.render(sleep=5)
    for items in r.html.find('#pqrStatistic tr'):
        data = [item.text for item in items.find("th,td")]
        print(data)

Partial results:

['pos', 'car', 'driver', 'manuf', 'delta', 'laps', 'last lap', 'best time', 'best speed', 'best lap']
['1', '54', 'Kyle Benjamin(i)', '', '--', '161', '36.474', '20.198', '93.752', '8']
['2', '98', 'Grant Enfinger', '', '0.761', '161', '36.402', '20.144', '94.003', '157']
['3', '4', 'Todd Gilliland #', '', '1.407', '161', '36.359', '20.142', '94.013', '158']
['4', '8', 'John H. Nemechek(i)', '', '2.177', '161', '36.304', '20.234', '93.585', '31']
['5', '16', 'Brett Moffitt', '', '3.268', '161', '36.145', '20.359', '93.010', '8']
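The rows above are plain lists, with the header as the first row. If named fields are easier to work with, one possible way is to zip the header with each data row. This is only a sketch; `rows_to_records` is a hypothetical helper, and the sample rows are copied from the partial results above.

```python
def rows_to_records(rows):
    """Turn [header, row, row, ...] into a list of dicts keyed by header cell."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


rows = [
    ['pos', 'car', 'driver', 'manuf', 'delta', 'laps', 'last lap',
     'best time', 'best speed', 'best lap'],
    ['1', '54', 'Kyle Benjamin(i)', '', '--', '161', '36.474', '20.198', '93.752', '8'],
    ['2', '98', 'Grant Enfinger', '', '0.761', '161', '36.402', '20.144', '94.003', '157'],
]

records = rows_to_records(rows)
for record in records:
    print(record['pos'], record['driver'], record['best speed'])
```

Every value stays a string here; convert with `int()` or `float()` where needed, bearing in mind that cells such as `'--'` and `''` will not parse as numbers.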

Question 9

This may be just what I'm looking for! But when I try to run it, I get all kinds of errors. I installed requests_html, and the slew of errors started with:

Traceback (most recent call last):
  File "/Users/salbiondio4/Documents/App Creation/PythonScripts/NASCAR/livefeed.py", line 68, in <module>
    r.html.render(sleep=5)

It probably doesn't help much, but I'll do some digging.

Question 10

It requires Python 3.6.

Question 11

I thought that might be the problem, but I'm running it in PyCharm with Python 3.6.2. I tried it in the terminal with python3 and got the same errors. The start of the output looks like it's trying to download Chromium: "[W:pyppeteer.chromium_downloader] start chromium download. Download may take a few minutes. Traceback (most recent call last):"

Question 12

Yes, it downloads Chromium on the first run. However, on the second or third run (when you are experimenting for the first time), it should work. Did it fetch the data along with the errors, or have you only gotten errors so far?

Question 13

I have only gotten errors, no data. Could it be because I already have Chromium installed from my previous project? (Just trying to come up with ideas that might help.)

johnII · Accepted Answer · 2018-03-26 16:09:35Z


Why don't you go directly to the source? If you view the page source of that link, you will see it gets its data from https://www.nascar.com/live/feeds/live-feed.json. From there you can easily get the data in JSON format and parse it as you like.

import requests
import json

url = "https://www.nascar.com/live/feeds/live-feed.json"
res = requests.get(url)
# res.json() decodes the response body into Python dicts and lists
print(res.json())


sbiondio · 2018-03-26 16:50:09Z

I should have mentioned that I'm very new at this. I didn't know I could do that, but this helps as well. Thank you!!

Keyur Potdar · 2018-03-26 17:18:02Z

@johnII, this is the perfect solution for the question. But I think it would help the OP understand a bit more if you showed how to use the JSON and print something, such as all the names. Also, remove the import json line; it is not needed for response.json() and may confuse others.

Keyur Potdar · 2018-03-26 17:23:16Z

@sbiondio, as you said, the page updates its data continuously (about every 5 seconds, to be precise) by fetching it from the link johnII has shown. You can get all the table items from this JSON. Also, response.json() is much faster than any approach that uses bs4.

sbiondio · 2018-03-26 17:29:44Z

@KeyurPotdar Thank you for the clarification, this helps a lot!!! I'm playing around with what this is outputting now!

Keyur Potdar · 2018-03-26 17:33:52Z

@sbiondio, have a look at this question. Maybe it'll help you understand it better. (Just remember that you don't have to use the separate json module with requests, which has its own built-in response.json() parser.)
