I'm looking for Python code that takes an HTML page and inlines the CSS from every stylesheet it links to, so that the externally referenced CSS files are no longer needed.
I need this to turn existing pages from a web site into single files that can be sent as email attachments. Thanks for any help.
- You can easily pull the CSS out, but what CSS path will you use to identify each piece of CSS? I'm not sure how easily you can do that programmatically. (hughdbrown, Nov 22, 2010 at 15:23)
- Duplicate? stackoverflow.com/questions/781382/… (khachik, Nov 22, 2010 at 15:50)
- No, not a duplicate. The aim here is to put everything into one file. (Sven Marnach, Nov 22, 2010 at 15:55)
- Note that Outlook's HTML mail renderer is Word, so anything other than an inline style="..." attribute on each tag may render very oddly. (Paulo Scardine, Nov 22, 2010 at 19:29)
3 Answers
Sven's answer helped me, but it didn't work out of the box. The following did it for me:
    import bs4  # BeautifulSoup 3 has been replaced by the bs4 package

    with open("index.html") as f:
        soup = bs4.BeautifulSoup(f.read(), "html.parser")

    # Replace each <link rel="stylesheet"> with a <style> tag holding the file's CSS.
    for link in soup.find_all("link", {"rel": "stylesheet"}):
        style = soup.new_tag("style")
        style["type"] = "text/css"
        with open(link["href"]) as css_file:  # href must be a readable local path
            style.string = css_file.read()
        link.replace_with(style)

    with open("output.html", "w") as f:
        f.write(str(soup))
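One thing to watch for: opening the href directly only works if the script runs from the directory that holds the HTML file, and it breaks on hrefs that carry a cache-busting query string. A minimal sketch of a helper that maps an href to a path next to the HTML file, using only the standard library. The name `local_css_path` is made up for this example, and it assumes hrefs are relative to the page (a root-relative href such as `/css/main.css` would need the site root instead):

```python
from pathlib import Path
from urllib.parse import urlsplit, unquote


def local_css_path(html_path, href):
    """Map a <link href> to a filesystem path relative to the HTML file.

    Hypothetical helper: assumes the stylesheet lives on disk alongside
    the page and that the href is page-relative, not root-relative.
    """
    parts = urlsplit(href)
    if parts.scheme or parts.netloc:
        # http://..., //cdn.example.com/... and similar are not local files.
        raise ValueError(f"not a local stylesheet: {href!r}")
    # Drop any ?query or #fragment and decode %20-style escapes.
    return Path(html_path).parent / unquote(parts.path)
```

In the loop above you would then open `local_css_path("index.html", link["href"])` instead of the raw href.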
You will have to code this yourself, but BeautifulSoup will get you a long way. Assuming all your files are local, you can do something like:

    from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (legacy package)

    soup = BeautifulSoup(open("index.html").read())
    stylesheets = soup.findAll("link", {"rel": "stylesheet"})
    for s in stylesheets:
        # Swap the <link> element for a <style> block containing the file's CSS.
        s.replaceWith('<style type="text/css" media="screen">' +
                      open(s["href"]).read() +
                      '</style>')
    open("output.html", "w").write(str(soup))
If the files are not local, you can use Python's urllib or urllib2 (urllib.request in Python 3) to retrieve them.
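For the non-local case, here is a rough sketch of fetching a stylesheet with Python 3's `urllib.request`. The function name `fetch_css` and the `page_url` parameter are just for illustration; it assumes you know the URL the page was served from, so relative hrefs can be resolved against it, and it falls back to UTF-8 when the server does not declare a charset:

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch_css(page_url, href, timeout=10):
    """Download the stylesheet that `href` points to and return it as text.

    Hypothetical helper: `page_url` is the address of the HTML page, used
    as the base for resolving relative hrefs.
    """
    css_url = urljoin(page_url, href)  # absolute hrefs pass through unchanged
    with urlopen(css_url, timeout=timeout) as resp:
        # Use the charset from the Content-Type header if there is one.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

The returned string can go straight into the `<style>` tag in place of the `open(...).read()` call.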
You can use pynliner. Example from their documentation:
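    from pynliner import Pynliner

    html = "html string"
    css = "css string"
    p = Pynliner()
    p.from_string(html).with_cssString(css)
    output = p.run()  # HTML with the CSS applied as inline style attributes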