Commit c643b6f

Merge pull request #354 from Yolo-cell-hash/dom-branch

DOM Extraction Script add

2 parents 3eb3ae4 + 3e67747 commit c643b6f

+45

-0

lines changed

Lines changed: 19 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,19 @@`
	`1`	`+# DOM Extraction Script`
	`2`	`+`
	`3`	`+Extract the DOM elements of a webpage efficiently.`
	`4`	`+`
	`5`	`+## Installation`
	`6`	`+`
	`7`	`+Use the package manager [pip](https://pip.pypa.io/en/stable/) to install the required libraries.`
	`8`	`+`
	`9`	+```bash
	`10`	`+pip install requests beautifulsoup4`
	`11`	`+`
	`12`	+```
	`13`	`+`
	`14`	`+## Usage`
	`15`	`+`
	`16`	+```python
	`17`	`+url = 'https://example.com'`
	`18`	+```
	`19`	`+Replace 'https://example.com' with the URL of the website you want to extract the DOM from.`
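
Review note: the README asks users to edit the `url` assignment in the source. One possible alternative, sketched here as an assumption and not part of this commit, is to read the URL from the command line and fall back to the README's placeholder. The names `parse_url` and `DEFAULT_URL` are hypothetical.

```python
import argparse

# Same placeholder URL the README uses; hypothetical constant name.
DEFAULT_URL = "https://example.com"


def parse_url(argv=None):
    """Return the URL passed as the first positional argument, or the placeholder."""
    parser = argparse.ArgumentParser(
        description="Extract the title and links of a web page."
    )
    # nargs="?" makes the argument optional so the default still applies.
    parser.add_argument("url", nargs="?", default=DEFAULT_URL, help="page to fetch")
    return parser.parse_args(argv).url
```

With something like this in place, the script could set `url = parse_url()` and the Usage section could show the URL as a command-line argument rather than a source edit.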

Lines changed: 26 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,26 @@`
	`1`	`+import requests`
	`2`	`+from bs4 import BeautifulSoup`
	`3`	`+`
	`4`	`+# Define the URL of the website you want to extract the DOM from`
	`5`	`+url = 'https://example.com'`
	`6`	`+`
	`7`	`+response = requests.get(url)`
	`8`	`+`
	`9`	`+if response.status_code == 200:`
	`10`	`+    soup = BeautifulSoup(response.text, 'html.parser')`
	`11`	`+`
	`12`	`+`
	`13`	`+    title = soup.title`
	`14`	`+    if title:`
	`15`	`+        print("Page Title:", title.text)`
	`16`	`+    else:`
	`17`	`+        print("No title tag found.")`
	`18`	`+`
	`19`	`+`
	`20`	`+    links = soup.find_all('a')`
	`21`	`+    print("Links in the page:")`
	`22`	`+    for link in links:`
	`23`	`+        print(link.get('href'))`
	`24`	`+`
	`25`	`+else:`
	`26`	`+    print("Failed to retrieve the page. Status code:", response.status_code)`
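
Review note: as committed, the script prints `None` for every `<a>` tag that has no `href`, and it prints relative links unresolved. The sketch below shows one way the extraction step could be separated from the network call so it can be checked against a fixed HTML string. It deliberately swaps BeautifulSoup for the standard library's `html.parser` so that it runs with no installs; that swap, and the names `TitleAndLinkParser` and `extract_title_and_links`, are assumptions for illustration and not the author's code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TitleAndLinkParser(HTMLParser):
    """Collect the <title> text and every <a href> value from an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = None      # stays None when no <title> tag is seen
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
            self.title = ""
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors with a missing or empty href
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title_and_links(html, base_url):
    """Return (title or None, list of absolute link URLs) for an HTML string."""
    parser = TitleAndLinkParser(base_url)
    parser.feed(html)
    parser.close()
    title = parser.title.strip() if parser.title is not None else None
    return title, parser.links
```

In the committed script, a function like this could be called with `response.text` and `url` inside the `status_code == 200` branch. The same filtering and `urljoin` step would also work on the `soup.find_all('a')` results if BeautifulSoup is kept.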
