Commit f65dbab


Merge pull request avinashkranjan#1060 from Himanshi2997/him

Real Estate Property Data avinashkranjan#1003

2 parents 8302b5b + 010d37a, commit f65dbab


3 files changed, 112 additions, 0 deletions


`Real Estate Webscrapper/README.md`

Lines changed: 34 additions & 0 deletions

```markdown
## Real Estate Webscrapper
- Scrapes property listings from a real estate site and stores them in a CSV file, so the data is organised and available locally.

___

## Requirements
- Requests
- BeautifulSoup
- Pandas

---

## How to Install
> pip install requests

> pip install pandas

> pip install beautifulsoup4

---

- Run `real_estate_webscrapper.py` from the repository root to create `scraped.csv`.
- The script writes to `./Real Estate Webscrapper/scraped.csv`, a path relative to the working directory, so the file ends up in the same folder as `real_estate_webscrapper.py`. It can be opened in Microsoft Excel.

---

### Step 1
- Load https://www.magicbricks.com/ready-to-move-flats-in-new-delhi-pppfs in your code using requests.

### Step 2
- Use the browser's Inspect tool on the page to find which `div` contains the information you need.
- Use Beautiful Soup to parse that information and store it in a dictionary for each property.

### Step 3
- Use pandas to convert the list of dictionaries to a CSV file.

---

## Author
[Himanshi2997](https://github.com/Himanshi2997)

---

## Output
![output2](https://user-images.githubusercontent.com/67272318/118381259-b8b71f80-b606-11eb-983d-5d8094d05f06.PNG)
```
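Steps 2 and 3 end with one dictionary per property and a pandas conversion to CSV. A minimal sketch of that last step follows; the records are made-up values shaped like the script's dictionaries (not scraped data), and the in-memory `io.StringIO` buffer stands in for a file path so the example runs without touching disk.

```python
import io

import pandas as pd

# Hypothetical records shaped like the per-card dictionaries; values are illustrative only.
complete_dataset = [
    {"Price": "1.25Cr", "Pricepersqft": "12500", "Size": "3 BHK", "Address": "Dwarka, New Delhi"},
    {"Price": "85Lac", "Size": "2 BHK", "Address": "Rohini, New Delhi"},  # no Pricepersqft key
]

# Columns are the union of keys in first-seen order; a missing key becomes NaN.
df = pd.DataFrame(complete_dataset)

# Write to an in-memory buffer; pass a file path instead to write a real CSV file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # NaN cells are written as empty fields
csv_text = buffer.getvalue()
print(csv_text)
```

`index=False` drops pandas' row index. Calling `to_csv` without it, as the script does, adds an extra unnamed first column holding the row numbers. Fields that contain a comma, such as the addresses, are quoted automatically.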

`Real Estate Webscrapper/real_estate_webscrapper.py`

Lines changed: 75 additions & 0 deletions

```python
import requests
from bs4 import BeautifulSoup
import pandas

# Browser-like User-Agent; some sites block the default requests client string.
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:55.0) Gecko/20100101 Firefox/55.0',
}

r = requests.get("https://www.magicbricks.com/ready-to-move-flats-in-new-delhi-pppfs", headers=headers, timeout=30)
r.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
soup = BeautifulSoup(r.content, "html.parser")

complete_dataset = []

# Every listing card on the results page sits in a div with these classes.
all_containers = soup.find_all("div", {"class": "flex relative clearfix m-srp-card__container"})
for item in all_containers:
    item_data = {}

    # Price: regular cards use a div, luxury cards a span. Strip newlines, spaces and the rupee sign.
    try:
        Price = item.find("div", {"class": "m-srp-card__price"}).text.replace("\n", "").replace(" ", "").replace("₹", "")
        item_data["Price"] = Price.split()[0]
    except (AttributeError, IndexError):
        try:
            Price = item.find("span", {"class": "luxury-srp-card__price"}).text.replace("\n", "").replace(" ", "").replace("₹", "")
            item_data["Price"] = Price.split()[0]
        except (AttributeError, IndexError):
            item_data["Price"] = None

    # Price per sq ft, with the same luxury-card fallback.
    try:
        Pricepersqft = item.find("div", {"class": "m-srp-card__area"}).text.replace("₹", "")
        item_data["Pricepersqft"] = Pricepersqft.split()[0]
    except (AttributeError, IndexError):
        try:
            Pricepersqft = item.find("span", {"class": "luxury-srp-card__sqft"}).text.replace("\n", "").replace(" ", "").replace("₹", "")
            item_data["Pricepersqft"] = Pricepersqft.split()[0]
        except (AttributeError, IndexError):
            item_data["Pricepersqft"] = None

    # Size: the first five characters of the BHK span (enough for e.g. "3 BHK").
    try:
        item_data["Size"] = item.find("span", {"class": "m-srp-card__title__bhk"}).text.replace("\n", "").strip()[0:5]
    except AttributeError:
        item_data["Size"] = None

    # Address: assumes titles of the form "... for Sale in <locality>". Compare whole
    # words so place names that contain the letters "in" stay intact.
    title = item.find("span", {"class": "m-srp-card__title"})
    words = title.text.split() if title is not None else []
    lowered = [w.lower() for w in words]
    address_words = words[lowered.index("sale") + 1:] if "sale" in lowered else []
    if address_words and address_words[0].lower() == "in":
        address_words = address_words[1:]
    item_data["Address"] = " ".join(address_words)

    # Carpet area: regular card summary first, then the luxury card value.
    try:
        item_data["Carpet Area"] = item.find("div", {"class": "m-srp-card__summary__info"}).text
    except AttributeError:
        try:
            item_data["Carpet Area"] = item.find("div", {"class": "luxury-srp-card__area__value"}).text
        except AttributeError:
            item_data["Carpet Area"] = None

    complete_dataset.append(item_data)

# The output path is relative to the working directory: run this from the repository root.
df = pandas.DataFrame(complete_dataset)
df.to_csv("./Real Estate Webscrapper/scraped.csv")
```
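The script repeats the same "try the regular card's class, then the luxury card's class" lookup for several fields. A minimal sketch of collapsing that into one helper is shown below. `first_text` and its `default` parameter are hypothetical names, and the HTML snippet only reuses the class names the script targets; the live MagicBricks markup may differ or may have changed since this commit.

```python
from bs4 import BeautifulSoup


def first_text(card, candidates, default=None):
    """Return the stripped text of the first (tag, class) pair found in card, else default."""
    for tag, css_class in candidates:
        node = card.find(tag, {"class": css_class})
        if node is not None:
            return node.get_text(strip=True)
    return default


# Hypothetical card markup reusing the script's class names (a luxury-style card).
html = """
<div class="flex relative clearfix m-srp-card__container">
  <span class="luxury-srp-card__price">₹ 2.1 Cr</span>
  <div class="luxury-srp-card__area__value">1800 sqft</div>
</div>
"""
card = BeautifulSoup(html, "html.parser").find("div")  # the outer container div

item_data = {
    "Price": first_text(card, [("div", "m-srp-card__price"), ("span", "luxury-srp-card__price")]),
    "Pricepersqft": first_text(card, [("div", "m-srp-card__area"), ("span", "luxury-srp-card__sqft")]),
    "Carpet Area": first_text(card, [("div", "m-srp-card__summary__info"), ("div", "luxury-srp-card__area__value")]),
}
print(item_data)
```

Checking for `None` explicitly avoids catching unrelated exceptions, and a missing field simply yields the default. The helper returns raw text, so field-specific cleanup such as dropping the rupee sign still has to be applied afterwards.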

`Real Estate Webscrapper/requirements.txt`

Lines changed: 3 additions & 0 deletions

```text
requests==2.25.1
pandas==1.2.4
beautifulsoup4==4.9.3
```

0 commit comments
