
Commit c6c1911

Merge pull request avinashkranjan#896 from jhamadhav/link-preview
Project addition: Link Preview

2 parents 55b4307 + e529328, commit c6c1911

File tree: 4 files changed, +179 / -1 lines changed

‎.gitignore‎

Lines changed: 1 addition & 1 deletion

@@ -678,4 +678,4 @@ geoip/
 test.py
 Test/
 reddit_tokens.json
-scriptcopy.py
+scriptcopy.py

‎Link-Preview/README.md‎

Lines changed: 33 additions & 0 deletions

# Link Preview

A script that gives the user a preview of the link they enter.

- Given a link, the script provides the title, description, and link of the website that the URL points to.
- It does so by fetching the HTML for the link and analyzing the data from there.
- The data is saved in a `JSON` file named `db.json` for further reference (see the example entry below).
- Every entry has a time limit after which it is refreshed (*data expires after 7 days*).

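A minimal sketch of what a cached entry in `db.json` might look like (the field names follow the script below; the URL and values are placeholders, and `time` is the entry's expiry as an epoch timestamp in seconds):

```
{
  "https://example.com": {
    "title": "Example Domain",
    "description": "This domain is for use in illustrative examples",
    "url": "https://example.com",
    "image": "Not available",
    "time": 1700000000
  }
}
```
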
## Setup instructions

Download the required packages with the following command in your terminal (make sure you're in the project directory):

```
pip3 install -r requirements.txt
```

## Running the script

After installing all the requirements, run this command in your terminal:

```
python3 linkPreview.py
```

## Output

The script will provide you with the Title, Description, Image link, and URL.

![demo gif](https://i.imgur.com/uoIG2io.gif)

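For reference, a run might look roughly like this (a sketch only; the exact values depend on the page being previewed):

```
======================
- Link Preview -
======================

Enter URL to preview : https://example.com
Title: Example Domain
Description: This domain is for use in illustrative examples
Url: https://example.com
Image: Not available
Time: 1700000000

--END--
```
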
## Author(s)

Hi, I'm [Madhav Jha](https://github.com/jhamadhav), the author of this script 🙋‍♂️

‎Link-Preview/linkPreview.py‎

Lines changed: 143 additions & 0 deletions

import json
import os
import time

import requests
from bs4 import BeautifulSoup

# cache file lives next to this script so the path works from any working directory
DB_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "db.json")


# to scrape the page title
def getTitle(soup):
    # prefer Open Graph / Twitter metadata, then fall back to visible elements
    ogTitle = soup.find("meta", property="og:title")
    twitterTitle = soup.find("meta", attrs={"name": "twitter:title"})
    documentTitle = soup.find("title")
    h1Title = soup.find("h1")
    h2Title = soup.find("h2")
    pTitle = soup.find("p")

    res = ogTitle or twitterTitle or documentTitle or h1Title or h2Title or pTitle
    if res is not None:
        # meta tags carry their text in the "content" attribute, the others in their body
        res = res.get_text() or res.get("content", None)
    if res is None or len(res.split()) == 0:
        return "Not available"
    # truncate overly long titles to 60 characters
    if len(res) > 60:
        res = res[:60]
    return res.strip()


# to scrape the page description
def getDesc(soup):
    ogDesc = soup.find("meta", property="og:description")
    twitterDesc = soup.find("meta", attrs={"name": "twitter:description"})
    metaDesc = soup.find("meta", attrs={"name": "description"})
    pDesc = soup.find("p")

    res = ogDesc or twitterDesc or metaDesc or pDesc
    if res is not None:
        res = res.get_text() or res.get("content", None)
    if res is None or len(res.split()) == 0:
        return "Not available"
    # truncate overly long descriptions to 60 characters
    if len(res) > 60:
        res = res[:60]
    return res.strip()


# to scrape the preview image link
def getImage(soup, url):
    ogImg = soup.find("meta", property="og:image")
    twitterImg = soup.find("meta", attrs={"name": "twitter:image"})
    metaImg = soup.find("link", attrs={"rel": "img_src"})
    img = soup.find("img")

    res = ogImg or twitterImg or metaImg or img
    if res is not None:
        # meta tags use "content", <link> uses "href", <img> uses "src"
        res = res.get("content", None) or res.get("href", None) or res.get("src", None)
    if res is None or len(res.split()) == 0:
        return "Not available"

    # strip leading "." and "/" characters from relative paths
    res = res.lstrip("./")
    # resolve relative image paths against the page URL
    if not (res.startswith("http://") or res.startswith("https://")):
        res = url + "/" + res
    return res


# print the preview dictionary key by key
def printData(data):
    for key, value in data.items():
        print(f'{key.capitalize()}: {value}')


# start
print("\n======================")
print("- Link Preview -")
print("======================\n")

# get the URL from the user
url = input("Enter URL to preview : ")

# parse and normalise the URL
if url == "":
    url = 'www.girlscript.tech'
if not (url.startswith("http://") or url.startswith("https://")):
    url = "https://" + url

# create the cache file if it doesn't exist
if not os.path.exists(DB_PATH):
    with open(DB_PATH, "w") as f:
        f.write("{}")

# read the cache
db = {}
with open(DB_PATH, 'r') as file:
    data = file.read()
    if len(data) == 0:
        data = "{}"
    db = json.loads(data)

sevenDaysInSec = 7 * 24 * 60 * 60

# serve from the cache if the entry exists and has not expired yet
if url in db and db[url]["time"] > round(time.time()):
    printData(db[url])
else:
    # not cached (or expired): fetch the HTML and scrape it
    r = requests.get(url, timeout=10)
    soup = BeautifulSoup(r.text, "html.parser")

    newData = {
        "title": getTitle(soup),
        "description": getDesc(soup),
        "url": url,
        "image": getImage(soup, url),
        # expiry timestamp: seven days from now, in epoch seconds
        "time": round(time.time()) + sevenDaysInSec
    }
    printData(newData)

    # persist the new entry
    db[url] = newData
    with open(DB_PATH, 'w') as file:
        json.dump(db, file)

print("\n--END--\n")
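
For clarity, a small sketch of the cache-expiry arithmetic used above (epoch seconds; the variable names here are illustrative only):

```
import time

seven_days_in_sec = 7 * 24 * 60 * 60
expires_at = round(time.time()) + seven_days_in_sec  # stored in the entry's "time" field
still_valid = round(time.time()) < expires_at        # cached entry is reused while this is True
```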

‎Link-Preview/requirements.txt‎

Lines changed: 2 additions & 0 deletions

requests==2.25.1
beautifulsoup4==4.9.3

