Commit c5d49d0

Merge pull request avinashkranjan#1631 from Sushilverma002/master
ISSUE [avinashkranjan#1551] WEB SCRAPING OF FLIPKART MOBILE PHONES UNDER 50K
2 parents 285cd41 + 617283b commit c5d49d0

File tree

3 files changed: +1088 -0 lines changed


Flipkart_webscraping/Scrap.py

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
import pandas as pd
import requests
from bs4 import BeautifulSoup

Product_name = []
Prices = []
Description = []
Reviews = []

# Walk the search-result pages (pages 2 to 42).
for page in range(2, 43):
    # str(page) selects the current results page.
    url = ("https://www.flipkart.com/search?q=MOBILE+PHONE+UNDER+50000"
           "&otracker=search&otracker1=search&marketplace=FLIPKART"
           "&as-show=on&as=off&page=" + str(page))

    r = requests.get(url)
    soup = BeautifulSoup(r.text, "lxml")

    # The box that holds the search results; skip the page if it is missing.
    box = soup.find("div", class_="_1YokD2 _3Mn1Gg")
    if box is None:
        continue

    # Scraping data 1. Product names
    for tag in box.find_all("div", class_="_4rR01T"):
        Product_name.append(tag.text)

    # 2. Prices
    for tag in box.find_all("div", class_="_30jeq3 _1_WHN1"):
        Prices.append(tag.text)

    # 3. Descriptions
    for tag in box.find_all("ul", class_="_1xgFaf"):
        Description.append(tag.text)

    # 4. Reviews
    for tag in box.find_all("div", class_="_3LWZlK"):
        Reviews.append(tag.text)

# Data frame (pd.DataFrame requires all four lists to have equal length,
# so a product with a missing rating would need special handling).
df = pd.DataFrame({"Product Name": Product_name, "Prices": Prices,
                   "Description": Description, "Reviews": Reviews})
# print(df)

# DF TO CSV
df.to_csv("flipkart-scraping-under-50k.csv")

Flipkart_webscraping/Steps.txt

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
WEB SCRAPING

We are scraping Flipkart with Python, which lets us pull data from a specific website and store it in many formats such as CSV, TXT, Excel, and so on.
This data can be used for various purposes, such as sentiment analysis or collecting reviews from multiple users.

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< STEPS <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
STEP 1:
We send a request to "flipkart" to scrape the data.
requests.get :- used to fetch the page and to check the status code; it requests the data from Flipkart in HTML form.
response 200 :- the page data was fetched successfully.
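A minimal sketch of that request (the search URL matches the one in Scrap.py):

    import requests
    r = requests.get("https://www.flipkart.com/search?q=MOBILE+PHONE+UNDER+50000")
    print(r.status_code)   # 200 means the HTML was fetched successfully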
STEP 2:
(i) Know how to deal with multiple pages.
(ii) Parser used - LXML: allows easy handling of XML and HTML files, and can also be used for web scraping.
(iii) Get the HTML of the page into your editor or local machine so that you can work on it.
(iv) Since there are many pages related to a SINGLE search, fetch the data from multiple pages:
    - find the anchor tag <a> in the HTML of the page;
    - not the links for pages 2, 3, ... - just the NEXT-page link;
    - take that tag's href attribute and print it;
    - the href is a link without the "https" host, so to complete it we just add (see the sketch below):
      cnp="https://www.flipkart.com"+np
23+
24+
(v)=so for web scrap we have to fetch the link of all pages its time taking process so we create a loop for this procces which fetch all link for us.
25+
now we will use for loop to fetch data
26+
for i in range(1(start),10(end))
27+
to move multiple pages we have to use in last of link + [srt(i)]
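Putting (v) together, a sketch of the paging loop (the start and end pages are placeholders):

    for i in range(1, 10):   # pages 1-9
        url = "https://www.flipkart.com/search?q=MOBILE+PHONE+UNDER+50000&page=" + str(i)
        r = requests.get(url)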
(vi) Decide which data you want to scrape, e.g.:
    - product name, price, reviews, description;
    - create a list for each individual field:
      Product_name=[]
      Prices=[]
      Description=[]
      Reviews=[]
(vii) Now write a function for each piece of information you want to fetch and store that data in the related list (a sketch follows this step). For reviews:
      revi=soup.find_all("div",class_="_3LWZlK")
      for i in revi:
          name=i.text
          Reviews.append(name)
      print(Reviews)
      Do the same for all the other lists.
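One way to wrap that pattern in a function, as (vii) suggests (the helper name scrape_field is ours, not from the repo):

    def scrape_field(soup, tag, css_class, out_list):
        # Collect the text of every matching element into the given list.
        for element in soup.find_all(tag, class_=css_class):
            out_list.append(element.text)

    scrape_field(soup, "div", "_3LWZlK", Reviews)   # likewise for names, prices, descriptions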
(viii) Remember that we scrape the data from a particular box or area of the page, so specify that area by making a variable BOX.
(ix) Now create the DataFrame with the help of pandas: pd.DataFrame({"key": value}) stores the data as key-value pairs (see the sketch below).
    Remember that we are scraping data from multiple pages, so DON'T FORGET TO RE-APPLY THE FOR LOOP AND str(i) FOR MULTIPLE PAGES.
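A minimal sketch of that step, with the same column names as Scrap.py:

    import pandas as pd
    df = pd.DataFrame({"Product Name": Product_name, "Prices": Prices,
                       "Description": Description, "Reviews": Reviews})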
(x) The last step is to convert the DataFrame into a CSV file.
STEP 3:
df.to_csv("flipkart-scraping-under-50k.csv")
