Commit 8a9853c (2 parents: 2328db2 + 0171cee)

2 files changed (+84, -0 lines)
First file: 27 additions, 0 deletions
# Web Scraping with Beautiful Soup

This script scrapes a CodeChef problem statement webpage using the Beautiful Soup library in Python.
## Description

The Python script uses the `requests` and `BeautifulSoup` libraries to extract information from a CodeChef problem statement webpage. It demonstrates the following actions (a minimal sketch of the core fetch-and-parse pattern follows the list):
- Printing the title of the webpage.
- Finding and printing all links on the page.
- Extracting text from paragraphs.
- Extracting image URLs.
- Counting and categorizing HTML tags.
- Filtering and printing valid links.
- Saving extracted data to a text file.
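As a quick illustration, here is a minimal sketch of the fetch-and-parse pattern the script is built on (same URL and parser as the full script; the `raise_for_status()` call and the title guard are robustness additions, not part of the original):

```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.codechef.com/problems/TWORANGES?tab=statement'
response = requests.get(url)
response.raise_for_status()  # fail fast on HTTP errors

soup = BeautifulSoup(response.content, 'html.parser')
print(soup.title.text if soup.title else '(no title)')
```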
## Prerequisites

Ensure you have the following libraries installed:

- `requests`
- `beautifulsoup4`

You can install both with a single command:

```bash
pip install requests beautifulsoup4
```
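## Usage

Run the script with Python 3 (`scraper.py` is a placeholder here; substitute the script's actual filename):

```bash
python scraper.py
```

The extracted data is printed to the console and also saved to `webpage_data.txt`.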
Second file: 57 additions, 0 deletions

```python
import requests
from bs4 import BeautifulSoup
import re

url = 'https://www.codechef.com/problems/TWORANGES?tab=statement'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Print the title of the webpage
print(f"Title: {soup.title.text}\n")

# Find and print all links on the page
print("Links on the page:")
for link in soup.find_all('a'):
    print(link.get('href'))

# Extract text from paragraphs
print("\nText from paragraphs:")
for paragraph in soup.find_all('p'):
    print(paragraph.text)
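
# Aside (not in the original script): paragraph.get_text(strip=True) would trim
# leading/trailing whitespace if the printed text looks ragged.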

# Extract image URLs
print("\nImage URLs:")
for img in soup.find_all('img'):
    img_url = img.get('src')
    if img_url:
        print(img_url)
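
# Aside (not in the original script): src values are often relative;
# urllib.parse.urljoin(url, img_url) would resolve them to absolute URLs.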

# Count and categorize tags
print("\nTag counts:")
tag_counts = {}
for tag in soup.find_all():
    tag_name = tag.name
    if tag_name:
        tag_counts[tag_name] = tag_counts.get(tag_name, 0) + 1

for tag, count in tag_counts.items():
    print(f"{tag}: {count}")

# Filter and print valid links
print("\nValid links:")
for link in soup.find_all('a'):
    href = link.get('href')
    if href and re.match(r'^https?://', href):
        print(href)
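
# Note: re.match already anchors at the start of the string, so the '^' in the
# pattern is redundant (though harmless); relative hrefs are skipped by this filter.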

# Save data to a file (encoding='utf-8' avoids platform-dependent encoding
# errors when the scraped text contains non-ASCII characters)
with open('webpage_data.txt', 'w', encoding='utf-8') as file:
    file.write(f"Title: {soup.title.text}\n\n")
    file.write("Links on the page:\n")
    for link in soup.find_all('a'):
        file.write(f"{link.get('href')}\n")
    file.write("\nText from paragraphs:\n")
    for paragraph in soup.find_all('p'):
        file.write(f"{paragraph.text}\n")

print("\nData saved to 'webpage_data.txt'")
```
