3

I am very new to Beautiful Soup and I'm trying to import data from the URL below as a pandas DataFrame. However, the final result has the correct column names but no numbers in the rows. What should I be doing instead?

Here is my code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

def get_tables(html):
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find_all('table')
    return pd.read_html(str(table))[0]

url = 'https://www.cmegroup.com/trading/interest-rates/stir/eurodollar.html'
html = requests.get(url).content
get_tables(html)
asked Oct 4, 2020 at 18:38
1
  • Can you provide the output you get when you run the current code? Can you also share what your desired output should be? That will help us give you some tips. Commented Oct 4, 2020 at 19:08

2 Answers 2

3

The data you see in the table is loaded from another URL via JavaScript. You can use this example to save the data to CSV:

import json
import requests
import pandas as pd
# Fetch the JSON endpoint that the page's JavaScript calls
data = requests.get('https://www.cmegroup.com/CmeWS/mvc/Quotes/Future/1/G').json()
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
# Flatten the list of quote records into a DataFrame
df = pd.json_normalize(data['quotes'])
df.to_csv('data.csv')

This saves the quotes to data.csv, which you can open in a spreadsheet program such as LibreOffice.
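To illustrate what pd.json_normalize does with a list of records, here is a minimal sketch on a made-up payload. Only the top-level 'quotes' key comes from the answer above; the field names and values are invented for illustration and are not the real CME response.

```python
import pandas as pd

# Hypothetical payload: only the top-level 'quotes' key comes from the answer;
# the field names and values below are invented for illustration.
payload = {
    "quotes": [
        {"month": "DEC 2020", "last": "99.750", "priceChart": {"code": "GEZ0"}},
        {"month": "MAR 2021", "last": "99.800", "priceChart": {"code": "GEH1"}},
    ]
}

# Each dict becomes one row; nested dicts are flattened into dotted column names
df = pd.json_normalize(payload["quotes"])
print(df.columns.tolist())
print(len(df))
```

Nested objects end up as columns such as priceChart.code, so the resulting DataFrame is flat and can be written out with to_csv directly.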

answered Oct 4, 2020 at 19:15

2 Comments

How were you able to find cmegroup.com/CmeWS/mvc/Quotes/Future/1/G?
@Jojo I looked in the Firefox developer tools -> Network tab (Chrome has something similar). It lists all the requests the page makes. One of those requests was this JSON file.
1

The website you're trying to scrape renders the table values dynamically, and requests.get only returns the HTML the server sends before any JavaScript runs. You will have to find an alternative way of accessing the data or render the webpage's JavaScript (see this example).
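One way to check for this situation is to count the header and data cells in the raw HTML. The sketch below uses only the standard library and a made-up snippet standing in for the server response; the snippet and the names CellCounter and count_cells are illustrative assumptions, not taken from the CME page.

```python
from html.parser import HTMLParser


class CellCounter(HTMLParser):
    """Counts <th> and <td> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {"th": 0, "td": 0}

    def handle_starttag(self, tag, attrs):
        if tag in self.counts:
            self.counts[tag] += 1


def count_cells(html):
    parser = CellCounter()
    parser.feed(html)
    return parser.counts


# Hypothetical stand-in for the HTML sent before JavaScript fills the table
static_html = """
<table>
  <thead><tr><th>Month</th><th>Last</th><th>Change</th></tr></thead>
  <tbody></tbody>
</table>
"""

print(count_cells(static_html))  # {'th': 3, 'td': 0}
```

Header cells with no data cells match the symptom in the question: the column names are in the static HTML, but the rows are filled in later by JavaScript.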

A common way of doing this is to use Selenium to automate a browser, which renders the JavaScript and lets you read the resulting page source.

Here is a quick example:

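import time
from io import StringIO

import pandas as pd
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.service import Service

# Request the dynamically loaded page (Selenium 4 takes the driver path via Service)
c = Chrome(service=Service(r'/path/to/webdriver.exe'))
c.get('https://www.cmegroup.com/trading/interest-rates/stir/eurodollar.html')
# Wait for the JavaScript to render in the browser
time.sleep(5)
html_data = c.page_source
c.quit()
# Load into a pd.DataFrame (StringIO avoids passing a literal HTML string)
tables = pd.read_html(StringIO(html_data))
df = tables[0]
df.columns = df.columns.droplevel()  # Convert the MultiIndex to an Index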

Note that I didn't use BeautifulSoup; you can pass the HTML directly to pd.read_html. You'll have to do some more cleaning from there, but that's the gist.

Alternatively, you can take a peek at requests-html, a library that offers JavaScript rendering and might be able to help, or look for a way to access the data as JSON or .csv from elsewhere and use that.

answered Oct 4, 2020 at 19:25
