Commit c5d49d0

Merge pull request avinashkranjan#1631 from Sushilverma002/master
ISSUE [avinashkranjan#1551] WEB SCRAPING OF FLIPKART MOBILE PHONES UNDER 50K
2 parents 285cd41 + 617283b commit c5d49d0

File tree

3 files changed: +1088 -0 lines changed


Flipkart_webscraping/Scrap.py

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
import pandas as pd
import requests
from bs4 import BeautifulSoup

Product_name = []
Prices = []
Description = []
Reviews = []

# Walk the search-result pages (pages 2 to 42).
for page in range(2, 43):
    # str(page) selects the current results page.
    url = ("https://www.flipkart.com/search?q=MOBILE+PHONE+UNDER+50000"
           "&otracker=search&otracker1=search&marketplace=FLIPKART"
           "&as-show=on&as=off&page=" + str(page))

    r = requests.get(url)
    soup = BeautifulSoup(r.text, "lxml")

    # The box that holds the search results; skip the page if it is missing.
    box = soup.find("div", class_="_1YokD2 _3Mn1Gg")
    if box is None:
        continue

    # Scraping data 1. Product names
    for tag in box.find_all("div", class_="_4rR01T"):
        Product_name.append(tag.text)

    # 2. Prices
    for tag in box.find_all("div", class_="_30jeq3 _1_WHN1"):
        Prices.append(tag.text)

    # 3. Descriptions
    for tag in box.find_all("ul", class_="_1xgFaf"):
        Description.append(tag.text)

    # 4. Reviews
    for tag in box.find_all("div", class_="_3LWZlK"):
        Reviews.append(tag.text)

# Data frame (pd.DataFrame requires all four lists to have equal length,
# so a product with a missing rating would need special handling).
df = pd.DataFrame({"Product Name": Product_name, "Prices": Prices,
                   "Description": Description, "Reviews": Reviews})
# print(df)

# DF TO CSV
df.to_csv("flipkart-scraping-under-50k.csv")

Flipkart_webscraping/Steps.txt

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
WEB SCRAPING

We are scraping Flipkart with Python, which lets us pull data from a specific website and store it in many formats such as CSV, TXT, Excel, and so on.
This data can be used for various purposes, such as sentiment analysis or collecting reviews from multiple users.

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< STEPS <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
STEP 1:
We send a request to "flipkart" to scrape the data.
requests.get :- used to fetch the page and to check the status code; it requests the data from Flipkart in HTML form.
response 200 :- the page data was fetched successfully.
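A minimal sketch of that request (the search URL matches the one in Scrap.py):

    import requests
    r = requests.get("https://www.flipkart.com/search?q=MOBILE+PHONE+UNDER+50000")
    print(r.status_code)   # 200 means the HTML was fetched successfully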
STEP 2:
(i) Know how to deal with multiple pages.
(ii) Parser used - LXML: allows easy handling of XML and HTML files, and can also be used for web scraping.
(iii) Get the HTML of the page into your editor or local machine so that you can work on it.
(iv) Since there are many pages related to a SINGLE search, fetch the data from multiple pages:
    - find the anchor tag <a> in the HTML of the page;
    - not the links for pages 2, 3, ... - just the NEXT-page link;
    - take that tag's href attribute and print it;
    - the href is a link without the "https" host, so to complete it we just add (see the sketch below):
      cnp="https://www.flipkart.com"+np
23+
24+
(v)=so for web scrap we have to fetch the link of all pages its time taking process so we create a loop for this procces which fetch all link for us.
25+
now we will use for loop to fetch data
26+
for i in range(1(start),10(end))
27+
to move multiple pages we have to use in last of link + [srt(i)]
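Putting (v) together, a sketch of the paging loop (the start and end pages are placeholders):

    for i in range(1, 10):   # pages 1-9
        url = "https://www.flipkart.com/search?q=MOBILE+PHONE+UNDER+50000&page=" + str(i)
        r = requests.get(url)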
(vi) Decide which data you want to scrape, e.g.:
    - product name, price, reviews, description;
    - create a list for each individual field:
      Product_name=[]
      Prices=[]
      Description=[]
      Reviews=[]
(vii) Now write a function for each piece of information you want to fetch and store that data in the related list (a sketch follows this step). For reviews:
      revi=soup.find_all("div",class_="_3LWZlK")
      for i in revi:
          name=i.text
          Reviews.append(name)
      print(Reviews)
      Do the same for all the other lists.
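One way to wrap that pattern in a function, as (vii) suggests (the helper name scrape_field is ours, not from the repo):

    def scrape_field(soup, tag, css_class, out_list):
        # Collect the text of every matching element into the given list.
        for element in soup.find_all(tag, class_=css_class):
            out_list.append(element.text)

    scrape_field(soup, "div", "_3LWZlK", Reviews)   # likewise for names, prices, descriptions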
(viii) Remember that we scrape the data from a particular box or area of the page, so specify that area by making a variable BOX.
(ix) Now create the DataFrame with the help of pandas: pd.DataFrame({"key": value}) stores the data as key-value pairs (see the sketch below).
    Remember that we are scraping data from multiple pages, so DON'T FORGET TO RE-APPLY THE FOR LOOP AND str(i) FOR MULTIPLE PAGES.
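A minimal sketch of that step, with the same column names as Scrap.py:

    import pandas as pd
    df = pd.DataFrame({"Product Name": Product_name, "Prices": Prices,
                       "Description": Description, "Reviews": Reviews})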
(x) The last step is to convert the DataFrame into a CSV file.
STEP 3:
df.to_csv("flipkart-scraping-under-50k.csv")
