Newest 'beautifulsoup' Questions

32,822 questions

1 vote

1 answer

73 views

BeautifulSoup: Unable to extract article body text and Topics text despite using correct class names

I am writing a Python script to scrape news articles from a specific website. While I am successfully able to extract the Title, Author, and Category tags (Sectors, Regions, Topics), I am unable to ...

user17356493

asked 8 hours ago

2 votes

2 answers

100 views

BeautifulSoup - Extracting content blocks after specific subheadings within a larger section, ignoring document introduction

I am scraping the Dead by Daylight Fandom wiki (specifically TOME pages, e.g., https://deadbydaylight.fandom.com/wiki/Tome_1_-_Awakening) to extract memory logs. The goal is to extract the Memory ...

zeromiedo

asked Nov 29, 2025 at 14:53

0 votes

2 answers

219 views

Beautiful Soup, children are clearly inside but can't get it

From the structure below I only want the value of the href attribute. But rec_block returns the h5 element without its children, so basically <h5 class="series">Recommendations</h5>. <...

Emby

asked Nov 25, 2025 at 18:27

-5 votes

1 answer

101 views

What is missing in selenium code to get complete html code to use with beautifulsoup?

I've recently learned how to webscrape with beautifulsoup and now I'm trying to learn a bit about selenium because I couldn't get correct info with beautifulsoup alone. I think there is javascript ...

user75667

asked Oct 5, 2025 at 8:12

-1 votes

2 answers

70 views

Getting element using re.compile with bs4?

I am trying to find a span element using Selenium and bs4 with the following code: import re from bs4 import BeautifulSoup from selenium import webdriver from selenium.webdriver.chrome.options import ...

Rapid1898

1,569

asked Sep 24, 2025 at 19:56

3 votes

1 answer

173 views

How to clean inconsistent address strings in Python?

I'm working on a web scraping project in Python to collect data from a real estate website. I'm running into an issue with the addresses, as they are not always consistent. I've already handled simple ...

Adamzam15

asked Sep 11, 2025 at 12:13

3 votes

1 answer

62 views

Beautiful Soup; splitting a paragraph only by <br> where stripped_strings is not working

I'm rather new to using Beautiful Soup and I'm having some issues splitting some html correctly by only looking at html breaks and ignoring other html elements such as changes in font color etc. The ...

James Brian

asked Aug 30, 2025 at 17:29

1 vote

1 answer

244 views

Trouble scraping dynamic lottery results table – inconsistent parsing

I’ve been trying to scrape lottery results from a website that shows draws. The data is presented in a results table, but I keep running into strange issues where sometimes the numbers are captured ...

Zuryab

asked Aug 27, 2025 at 10:50

0 votes

2 answers

64 views

Get the attribute data by another attribute beautifulsoup

I want to parse HTML like the following with Beautiful Soup: <meta property="og:image" content="https://test.com/test.jp" /> <meta property="og:description" ...

whitebear

12.6k

asked Aug 25, 2025 at 4:44
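
A minimal sketch of how one attribute might be read by matching on another, assuming the goal is to get the `content` value of the `<meta>` tag whose `property` is `og:image`. The helper name `meta_content` and the second tag's `content` value are hypothetical; the excerpt is truncated and this is not necessarily the accepted answer.

```python
from bs4 import BeautifulSoup

# The first tag is taken from the excerpt; the second tag's content
# is a placeholder because the original is cut off.
html = """
<meta property="og:image" content="https://test.com/test.jp" />
<meta property="og:description" content="placeholder description" />
"""


def meta_content(soup, prop):
    """Return the content attribute of the <meta> tag with the given
    property attribute, or None if no such tag exists."""
    tag = soup.find("meta", attrs={"property": prop})
    return tag.get("content") if tag is not None else None


soup = BeautifulSoup(html, "html.parser")
image_url = meta_content(soup, "og:image")
```

Checking for `None` before reading the attribute avoids an `AttributeError` on pages where the tag is missing.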

-2 votes

1 answer

74 views

How to use Beautiful Soup to find partial links [closed]

I have an eBay page in which I would like to formulate a list of all the item numbers on that page. I have executed and parsed the HTML content using requests and Beautiful Soup, but I can't figure ...

Travis Ward

asked Aug 5, 2025 at 13:54

0 votes

2 answers

111 views

How to use index to find position of JSON record [closed]

Is there a better way than iteration using a for loop to find the index of the record? My problem is that to use index I seem to need the index of the record I'm seeking. import json from bs4 import ...

Peter Hill

asked Aug 2, 2025 at 17:50

4 votes

2 answers

288 views

How to reliably download 1969 "Gazzetta Ufficiale" PDFs (Italian Official Gazette) with Python?

I’m trying to programmatically download the full "pubblicazione completa non certificata" PDFs of the Italian Gazzetta Ufficiale – Serie Generale for 1969 (for an academic article). The site has a ...

Mark

1,801

asked Aug 1, 2025 at 13:13

-1 votes

2 answers

67 views

Puppeteer can't access var doc in javascript

I am trying to scrape a web page using puppeteer, however, I can't access var doc with puppeteer. Although I can see it in the source page of my web browser var rows = []; var i = 1; /* while(i <= ...

M.M. CAN

asked Jul 15, 2025 at 16:17

0 votes

1 answer

188 views

How can I speed up my Selenium scraper using multiprocessing in Python? [closed]

I'm scraping a large list of URLs (1.2 million) using Selenium + BeautifulSoup with Python's multiprocessing.Pool. I want to scale it up to scrape faster, ideally without hitting system resource ...

SolidOpt

asked Jul 10, 2025 at 6:52

-3 votes

1 answer

72 views

BeautifulSoup - reading multiple pages breaks after random valid reads

I'm reading some data about books (title etc.) from a number of pages. Python 3.10.13 BeautifulSoup 4.12.3 Code: def scrapSite(URL): headers = {"User-Agent": "Mozilla/5.0 (Windows NT ...

Error Replicator

asked Jun 20, 2025 at 21:13
