I have an Ubuntu server and need to fetch data from the following URL:

https://beta.bseindia.com/corporates/shpPublicShareholder.aspx?scripcd=500034&qtrid=99.00&QtrName=September%202018

I will be modifying the parameters in this URL to fetch data for different companies; that part I can manage.

However, the data is in a table on the web page, and I need help exporting it to an Excel file using Python on the Ubuntu server.

In a few similar solutions webdriver.Chrome() has been suggested, but I am not sure whether it would work on Ubuntu. There is one post that describes how to install the Chrome driver; will that help?

https://tecadmin.net/setup-selenium-chromedriver-on-ubuntu/

Any help will be appreciated.

EDIT:

I used the following code to get the tables:

import requests
import pandas as pd

url = 'https://beta.bseindia.com/corporates/shpPublicShareholder.aspx?scripcd=500180&qtrid=99.00&QtrName=September%202018'
# Download the page and parse every HTML table into a DataFrame
html = requests.get(url).content
df_list = pd.read_html(html)
# Take the last table found on the page
df = df_list[-1]
print(df)
df.to_csv('my-data.csv')

However, a few companies have two tables on the page, and in that case this code picks up the second table and leaves out the main one.

Mohammad Zain Abbas
asked Oct 22, 2018 at 12:23
  • Are you trying to use a web crawler (Selenium) to fetch the data? Commented Oct 23, 2018 at 7:51

1 Answer

You can use pandas.read_html for this. According to its documentation, it will:

Read HTML tables into a list of DataFrame objects.
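
Since the result is a list, you still have to decide which DataFrame is the one you want. Here is a minimal sketch of one way to do that, assuming the main table is simply the one with the most rows. The helper name pick_main_table and the min_rows threshold are made up for illustration, and the small frames below only stand in for what pd.read_html would return:

```python
import pandas as pd


def pick_main_table(tables, min_rows=2):
    """Return the DataFrame with the most rows from a list of DataFrames.

    Assumption: the "main" table on the page is the largest one.
    min_rows is a hypothetical threshold used to skip tiny layout tables.
    """
    candidates = [t for t in tables if len(t) >= min_rows]
    if not candidates:
        raise ValueError("no table with at least %d rows found" % min_rows)
    # max() keeps the first of equally long tables, so page order breaks ties
    return max(candidates, key=len)


# Stand-in frames; in practice the list would come from pd.read_html(...)
main = pd.DataFrame({"Category": ["A", "B", "C"], "Shares": [10, 20, 30]})
extra = pd.DataFrame({"Note": ["x", "y"]})
chosen = pick_main_table([main, extra])
print(chosen.shape)
```

If the table you need always comes first on the page, plain indexing such as df_list[0] may be enough. read_html also accepts a match argument that keeps only tables containing a given text, which might be more robust than going by size.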

You can then save that DataFrame object to a CSV file via

data_frame_object.to_csv('<file name>.csv')

or you can save it as a pickle file via

import pickle
# 'my-data.pkl' is only an example file name
with open('my-data.pkl', 'wb') as file:
    pickle.dump(data_frame_object, file)
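
Since the question asks for an Excel file, here is a rough sketch of a helper that writes either format depending on the file extension. The name save_table is hypothetical. DataFrame.to_excel is a real pandas method, but it relies on an Excel writer engine such as openpyxl being installed on the server:

```python
from pathlib import Path

import pandas as pd


def save_table(df, path):
    """Write df to path, choosing the format from the file extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".xlsx":
        # Needs an Excel writer engine (for example openpyxl) to be installed
        df.to_excel(path, index=False)
    elif suffix == ".csv":
        df.to_csv(path, index=False)
    else:
        raise ValueError("unsupported extension: %r" % suffix)
    return path
```

For example, save_table(df, 'my-data.xlsx') should produce a workbook with a single sheet, while save_table(df, 'my-data.csv') falls back to CSV. index=False leaves out the DataFrame's row index in both cases.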

You can learn more from this question

answered Oct 23, 2018 at 8:12

3 Comments

Thanks Mohammad. The issue with the other question you linked is that it creates multiple files even for sites which have a single table, and then we need to pick the right file out of them manually. I just edited my question with sample code that works when there is a single table, but it does not work when there are multiple tables. I only need the first table.
But I guess the website itself provides the data in such an inconsistent format that handling every scenario smoothly may not be feasible. I may have to deal with the dual-table company data manually.
I pointed you to that question since it was looking for a similar result. And yes, I guess you have to deal with the dual-table data manually.
