Python - Copy tabular data from website in Excel on Ubuntu Server

Question 1

I have an Ubuntu server and need to fetch data from the following URL:

https://beta.bseindia.com/corporates/shpPublicShareholder.aspx?scripcd=500034&qtrid=99.00&QtrName=September%202018

I will be modifying the parameters in this URL to fetch data for different companies; that part I will manage.

However, the data is in tabular format on the web page, and I need help exporting it to an Excel file using Python on the Ubuntu server.

A few similar solutions suggest webdriver.Chrome(), which I am not sure would work on Ubuntu. There is one post that describes the procedure for installing the Chrome drivers; will that help?

https://tecadmin.net/setup-selenium-chromedriver-on-ubuntu/

Any help will be appreciated.

EDIT:

I used the following code to get the tables:

import requests
import pandas as pd

url = 'https://beta.bseindia.com/corporates/shpPublicShareholder.aspx?scripcd=500180&qtrid=99.00&QtrName=September%202018'
# Download the raw HTML of the page
html = requests.get(url).content
# Parse every <table> on the page into a list of DataFrames
df_list = pd.read_html(html)
# Take the last table in the list
df = df_list[-1]
print(df)
df.to_csv('my-data.csv')

However, a few companies have two tables on the page, and in that case this code copies the second table and leaves out the main one.
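The behaviour described above follows from df_list[-1], which selects the last table that pd.read_html found. A minimal sketch of picking the first table instead, assuming the main table is always the first one on the page (this may not hold for every company). The helper name first_table and the sample frames are hypothetical stand-ins for the real output of pd.read_html.

```python
import pandas as pd


def first_table(tables):
    """Return the first DataFrame from a list such as pd.read_html() returns."""
    if not tables:
        raise ValueError("no tables found on the page")
    return tables[0]


# Hypothetical stand-ins for df_list = pd.read_html(html) on a two-table page.
main = pd.DataFrame({"Category": ["Mutual Funds", "Insurance"], "Shares": [100, 50]})
extra = pd.DataFrame({"Note": ["second table"]})
df_list = [main, extra]

df = first_table(df_list)  # df_list[-1] would have picked `extra` instead
print(df)
```

On a page with a single table, index 0 and index -1 refer to the same table, so the same call covers both cases.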

Question 2

Are you trying to use a web crawler (Selenium) to fetch the data?

Mohammad Zain Abbas · Accepted Answer · 2018-10-23 08:12:46Z


You can use pandas.read_html for this. According to its documentation, it will:

Read HTML tables into a list of DataFrame objects.

You can then save a DataFrame from that list to CSV via

data_frame_object.to_csv('<file name>.csv')

or save it as a pickle file via

import pickle
with open('<file name>', 'wb') as file:
    pickle.dump(data_frame_object, file)

You can learn more from this question
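Since the question asks for an Excel file, a DataFrame can also be written with DataFrame.to_excel. What follows is a minimal sketch, not a definitive implementation. It assumes an Excel writer engine such as openpyxl is installed alongside pandas. The helper name save_table, the sample frame, and the output file name are hypothetical. The demo at the bottom only exercises the CSV branch, so it runs without an Excel engine.

```python
import os
import tempfile

import pandas as pd


def save_table(df, path):
    """Write df to path, choosing the format from the file extension."""
    if path.lower().endswith(".xlsx"):
        # Needs an Excel engine such as openpyxl (assumed to be installed).
        df.to_excel(path, index=False)
    else:
        df.to_csv(path, index=False)


# Hypothetical stand-in for one table returned by pd.read_html(html).
df = pd.DataFrame({"Category": ["Mutual Funds", "Insurance"], "Shares": [100, 50]})

# Write to a temporary directory and read the file back to check it.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "my-data.csv")
save_table(df, csv_path)
print(pd.read_csv(csv_path))
```

Passing a path that ends in .xlsx, for example save_table(df, 'my-data.xlsx'), takes the to_excel branch instead.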


3 Comments

SSD · 2018-10-23 08:35:47Z

Thanks, Mohammad. The issue with the other linked question you posted is that it creates multiple files even for sites that have a single table, and then we need to manually pick the right file out of them. However, I just edited my question with sample code that works when there is a single table. It does not work when there are multiple tables; I only need the first table.

SSD · 2018-10-23 08:50:33Z

But I guess the website itself provides the data in such an inconsistent format that getting it to work smoothly in all scenarios may not be feasible. I may have to deal with dual-table company data manually.

Mohammad Zain Abbas · 2018-10-23 11:02:08Z

I pointed you to that question since it was looking for a similar result. And yes, I guess you will have to deal with the dual-table data manually.
