112 questions
2
votes
0
answers
30
views
How can I install html5lib on a dataproc cluster
I have a Dataproc pipeline that scrapes websites and stores the data in GCP.
The task is set up something like this:
create_dataproc_cluster = DataprocCreateClusterOperator(
task_id='...
0
votes
0
answers
66
views
Error 'html5lib not found' when using pandas.read_html() function in Google Cloud Composer 2.1.8
I have code that scrapes data from a website. I used pandas.read_html() and wrote everything in Dataproc. When I run the code in Composer/Airflow, sometimes it runs successfully and sometimes ...
0
votes
1
answer
441
views
FeatureNotFound: Couldn't find a tree builder with the features you requested: html5lib
The course I am taking suggests running the following code:
(the idea is to get the 7th table from the Wikipedia page)
data = requests.get("https://en.wikipedia.org/wiki/...
1
vote
1
answer
139
views
Best Way to Debug "html5lib.html5parser.ParseError: Unexpected character after attribute value"?
I am currently working on a personal project and utilizing the chessdotcom Public API Package. I am currently able to store in a variable the PGN from the daily puzzle (Portable Game Notation) which ...
0
votes
0
answers
135
views
html5lib not found error even with the library installed
I'm having an issue with the read_html function from pandas. I'm trying to read a data table in a webpage that is built with <div> instead of <td> and <tr>. I'm trying to do it with ...
0
votes
1
answer
89
views
VS Code can't recognize html5lib (I installed it)
Visual Studio Code not reading html5lib
I am using bs4 in VS Code, along with html5lib, but VS Code is indicating that it does not exist (I installed it using the command prompt).
import requests
...
0
votes
2
answers
461
views
How to parse HTML tables using html5lib and Beautiful Soup in Jupyter?
I'm getting a ValueError when trying to parse a page with BeautifulSoup and html5lib in Jupyter:
import pandas as pd
import requests
import html5lib
url = "https://worldpopulationreview.com/...
0
votes
2
answers
48
views
How can I get data from a div class using BeautifulSoup
I am learning BS4. I parsed a div by its class, but I want to get the data inside the div's code.
[<div class="handlebarData theme_is_whitehot" data-enrollment='{"available":{"id":...
8
votes
3
answers
1k
views
AttributeError: module 'html5lib.treebuilders.etree' has no attribute 'getETreeModule'
Suggestions please, thanks :)
pip list --outdated --format=freeze
Gives the following error:
ERROR: Exception:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/pip/...
1
vote
1
answer
507
views
How to parse HTML with source mapping?
I want to use Python to parse HTML markup, and given one of the resultant DOM tree elements, get the start and end offsets of that element within the original, unmodified markup.
For example, given ...
0
votes
0
answers
182
views
<!DOCTYPE html> missing in Selenium Python page_source
I'm using Selenium for functional testing of a Django application and thought I'd try html5lib as a way of validating the HTML output. One of the validations is that the page starts with a <!...
0
votes
2
answers
274
views
xml.etree.ElementTree: How to replace like "innerHTML"?
I want to replace the <h1> tag of an HTML page.
But the content of the heading can be HTML (not just a string).
I want to insert foo <b>bold</b> bar
input:
start
<h1 class="...
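One possible approach to the "innerHTML" replacement described in this excerpt, as a minimal sketch using only the standard library. It assumes the input is well-formed XML; the helper name `set_inner_xml`, the sample document, and the `class="title"` value are hypothetical, since the original input is truncated. The fragment is parsed inside a temporary wrapper element so that mixed content (text plus tags) can be copied into the target heading.

```python
import xml.etree.ElementTree as ET


def set_inner_xml(element, fragment):
    """Replace the text and children of `element` with a markup fragment.

    Hypothetical helper: the fragment must itself be well-formed XML.
    """
    # Wrap the fragment so text mixed with tags parses as a single tree.
    wrapper = ET.fromstring(f"<wrapper>{fragment}</wrapper>")

    # Element.clear() also drops attributes and tail text, so save them first.
    saved_attrib = dict(element.attrib)
    saved_tail = element.tail
    element.clear()
    element.attrib.update(saved_attrib)
    element.tail = saved_tail

    # Copy the leading text and the child elements from the wrapper.
    element.text = wrapper.text
    element.extend(list(wrapper))


# Hypothetical sample input, not the truncated input from the question.
doc = ET.fromstring('<body>start<h1 class="title">Old</h1>end</body>')
for h1 in doc.iter("h1"):
    set_inner_xml(h1, "foo <b>bold</b> bar")

result = ET.tostring(doc, encoding="unicode")
print(result)
```

Real-world HTML is often not well-formed XML, in which case `ET.fromstring` raises a `ParseError` and a tolerant parser would be needed before applying the same idea.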
0
votes
1
answer
178
views
How to replace the innerHTML of all <h1> tags with html5lib?
How to replace the innerHTML of all <h1> tags with html5lib?
input:
foo
<h1>Moonlight</h1>
bar
Desired output:
foo
<h1>Sunshine</h1>
bar
I would like to use html5lib, since it is ...
0
votes
1
answer
313
views
"ValueError: No tables found matching regex '.+'" at random times when scraping large amounts of data
This is my first project with pandas and Selenium, so I may be making a dumb mistake. I've written this function to go through a list of NBA players and scrape their game logs into data frames. It all ...
0
votes
1
answer
64
views
Glitch in html5lib?
I'm getting this error. Is it a bug or is it a code error? What does it mean?
Traceback (most recent call last):
File "isc.py", line 8, in <module>
import requests, os, sys, bs4
...