9

Is there a way to get CSS classes from an HTML file using BeautifulSoup? Example snippet:

<style type="text/css">
 p.c3 {text-align: justify}
 p.c2 {text-align: left}
 p.c1 {text-align: center}
</style>

Perfect output would be:

cssdict = {
 'p.c3': {'text-align': 'justify'},
 'p.c2': {'text-align': 'left'},
 'p.c1': {'text-align': 'center'}
}

although something like this would do:

L = [
 ('p.c3', {'text-align': 'justify'}), 
 ('p.c2', {'text-align': 'left'}), 
 ('p.c1', {'text-align': 'center'})
]
ilke444
asked Jul 16, 2012 at 9:13
5
  • What do you expect to get? The literal text "\n\n p.c3 {text-align: justify}\n\n..."? Please be explicit! Commented Jul 16, 2012 at 9:15
  • By "Get CSS classes" Do you mean "Get a list of HTML classes that are used in selectors in the stylesheet"? i.e. the result you want is ['c3', 'c2', 'c1']? Commented Jul 16, 2012 at 9:15
  • @Martin Pieters,@Quentin -- Updated the question. Commented Jul 16, 2012 at 9:23
  • So you want rulesets, not classes? You'll need to find a CSS parser. I don't think BeautifulSoup has any features along those lines (it can get the stylesheet, but not parse it). Commented Jul 16, 2012 at 9:24
  • @Quentin -- Rulesets, yes; my question was wrongly put. Sorry for that. I am not sure if this (the comments) is the right place to ask, but is there a recommended CSS parser for doing that? Commented Jul 16, 2012 at 9:28

3 Answers 3

12

BeautifulSoup itself doesn't parse CSS style declarations at all, but you can extract such sections then parse them with a dedicated CSS parser.

Depending on your needs, there are several CSS parsers available for Python; I'd pick cssutils (requires Python 2.5 or later, including Python 3). It is the most complete in its support, and it supports inline styles too.

Other options are css-py and tinycss.

To grab and parse all such style sections (example with cssutils):

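import cssutils

sheets = []
# `tree` is the BeautifulSoup object for the parsed HTML document
for styletag in tree.findAll('style', type='text/css'):
    if not styletag.string:  # no inline CSS to parse in this tag
        continue
    # parseString parses a whole stylesheet; parseStyle only handles
    # a single declaration block such as a style="..." attribute
    sheets.append(cssutils.parseString(styletag.string))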

With cssutils you can then combine these, resolve imports, and even have it fetch external stylesheets.

answered Jul 16, 2012 at 10:16

7

A BeautifulSoup & cssutils combo will do the trick nicely:

from bs4 import BeautifulSoup as BSoup
import cssutils

selectors = {}
with open(htmlfile) as webpage:
    html = webpage.read()
    soup = BSoup(html, 'html.parser')
for styles in soup.select('style'):
    css = cssutils.parseString(styles.encode_contents())
    for rule in css:
        if rule.type == rule.STYLE_RULE:
            style = rule.selectorText
            selectors[style] = {}
            for item in rule.style:
                propertyname = item.name
                value = item.value
                selectors[style][propertyname] = value

BeautifulSoup finds every "style" tag in the HTML (head and body), .encode_contents() serializes each tag's contents to bytes that cssutils can read, and cssutils then parses the individual CSS rules all the way down to the property/value level via rule.selectorText and rule.style.

Note: The "rule.STYLE_RULE" check keeps only style rules and skips everything else. The cssutils documentation describes the other rule types you can filter on, such as media rules, comments and imports.

It'd be cleaner if you broke this down into functions, but you get the gist...

answered Aug 27, 2016 at 22:28

1 Comment

soup.select('style') only detects style declared on the page, and doesn't detect css styles declared in external CSS files
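
One way to start covering the external case raised in this comment is to collect the URLs of the linked stylesheets, which could then be downloaded and handed to the same CSS parser. The sketch below only does the collecting step; `external_stylesheet_urls` and the `base_url` parameter are hypothetical names chosen for the example, and nothing is fetched over the network.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def external_stylesheet_urls(html, base_url):
    """Return absolute URLs of <link rel="stylesheet"> tags in html."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for link in soup.find_all("link", href=True):
        rel = link.get("rel") or []
        # bs4 normally returns rel as a list; tolerate a plain string too.
        if isinstance(rel, str):
            rel = rel.split()
        if "stylesheet" in [value.lower() for value in rel]:
            # Resolve relative hrefs against the page's own URL.
            urls.append(urljoin(base_url, link["href"]))
    return urls


if __name__ == "__main__":
    page = (
        '<link rel="stylesheet" href="css/site.css">'
        '<link rel="icon" href="favicon.ico">'
    )
    print(external_stylesheet_urls(page, "https://example.com/docs/page.html"))
```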
0

The tinycss parser exists explicitly for parsing CSS in Python. BeautifulSoup only understands HTML tags, so it cannot search a stylesheet for specific CSS classes unless you fall back on regular expressions. tinycss even supports some amount of CSS3.

http://packages.python.org/tinycss/

PS: However, it only works with Python 2.6 or later.
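
For comparison, this is roughly what the regular-expression fallback looks like for stylesheets as flat as the one in the question. It is a naive sketch using only the standard library, not a CSS parser: it assumes no comments, no nested blocks such as @media, and no braces or semicolons inside values. The name `naive_css_rules` is invented for the example.

```python
import re

# A selector is any run of non-brace text followed by a {...} block.
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def naive_css_rules(css_text):
    """Map selectors to {property: value} dicts for flat, simple CSS."""
    rules = {}
    for selector, body in RULE_RE.findall(css_text):
        declarations = {}
        for declaration in body.split(";"):
            name, sep, value = declaration.partition(":")
            if sep:  # ignore empty fragments, e.g. after a trailing ';'
                declarations[name.strip()] = value.strip()
        rules[selector.strip()] = declarations
    return rules


if __name__ == "__main__":
    css = """
     p.c3 {text-align: justify}
     p.c2 {text-align: left}
     p.c1 {text-align: center}
    """
    print(naive_css_rules(css))
```

Anything beyond this shape of input is a reason to reach for a dedicated parser instead.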

answered Jul 16, 2012 at 9:43
