I have an Ubuntu server and need to fetch data from the following URL.
I will be modifying parameters in this URL to fetch data for different companies; I can manage that part myself.
However, the data is in tabular format on the web page, and I need help exporting it to an Excel file using Python on the Ubuntu server.
A few similar solutions suggest webdriver.Chrome(), which I am not sure would work on Ubuntu. One post describes a procedure for installing the Chrome driver; will that help?
https://tecadmin.net/setup-selenium-chromedriver-on-ubuntu/
Any help will be appreciated.
EDIT:
I used the following code to get the tables:
import requests
import pandas as pd

url = 'https://beta.bseindia.com/corporates/shpPublicShareholder.aspx?scripcd=500180&qtrid=99.00&QtrName=September%202018'
html = requests.get(url).content
# read_html returns one DataFrame per <table> element found on the page
df_list = pd.read_html(html)
# Keep only the last table on the page
df = df_list[-1]
print(df)
df.to_csv('my-data.csv')
However, a few companies have two tables on the page, and in that case this code copies the second table and leaves out the main one.
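One possible way around the two-table case is to stop relying on position (df_list[-1]) and instead pick the table with the most cells. This is only a sketch under the assumption that the main shareholding table is always the largest one on the page; pick_largest_table is a hypothetical helper, and the two small DataFrames are made-up stand-ins for what pd.read_html might return.

```python
import pandas as pd


def pick_largest_table(tables):
    """Return the DataFrame with the most cells (rows * columns)."""
    if not tables:
        raise ValueError("no tables found on the page")
    return max(tables, key=lambda t: t.shape[0] * t.shape[1])


# Made-up stand-ins for the list that pd.read_html(html) returns
# when a page contains a main table and a smaller extra table.
main = pd.DataFrame({"Category": ["A", "B", "C"], "Shares": [10, 20, 30]})
note = pd.DataFrame({"Note": ["footnote"]})

df = pick_largest_table([main, note])
print(df)
```

In the script above this would become df = pick_largest_table(pd.read_html(html)). If the main table contains some text that the extra table does not, the match argument of pd.read_html, which keeps only tables containing text that matches a string or regex, may be a more direct filter.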
Are you trying to use a web crawler (Selenium) to fetch the data? – Mohammad Zain Abbas, Oct 23, 2018 at 7:51
1 Answer
You can use pandas.read_html for this. According to its documentation, it will:
Read HTML tables into a list of DataFrame objects.
You can then save a DataFrame from that list to a CSV file with
data_frame_object.to_csv('<file name>.csv')
or you can save it as a pickle file with
import pickle
with open('<file name>', 'wb') as file:
    pickle.dump(data_frame_object, file)
You can learn more from this question
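Since the question asks for an Excel file specifically, here is a rough sketch using DataFrame.to_excel. Writing .xlsx needs an Excel engine such as openpyxl to be installed, so this hypothetical save_table helper falls back to CSV when pandas reports the engine as missing. The sample data and the file name are made up for illustration.

```python
import os
import tempfile

import pandas as pd


def save_table(df, stem):
    """Write df to <stem>.xlsx, or to <stem>.csv if no Excel engine is installed."""
    try:
        path = stem + ".xlsx"
        # pandas raises ImportError here if no engine (e.g. openpyxl) is available
        df.to_excel(path, index=False)
    except ImportError:
        path = stem + ".csv"
        df.to_csv(path, index=False)
    return path


# Made-up sample data standing in for a table scraped with pd.read_html.
df = pd.DataFrame({"Category": ["Public", "Promoter"], "Shares": [100, 250]})

# Write into a temporary directory so the example does not clutter the server.
out_path = save_table(df, os.path.join(tempfile.mkdtemp(), "shareholding"))
print(out_path)
```

On the server, installing openpyxl with pip should be enough for the .xlsx branch to be taken; no browser or Selenium driver is involved in this step.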