Trouble splitting strings with consecutive delimiters

Peter Otten __peter__ at web.de
Tue May 1 08:55:13 EDT 2012


deuteros wrote:

> I'm using regular expressions to split a string using multiple delimiters.
> But if two or more of my delimiters occur next to each other in the
> string, it puts an empty string in the resulting list. For example:
>
>     re.split(':|;|px', "width:150px;height:50px;float:right")
>
> Results in
>
>     ['width', '150', '', 'height', '50', '', 'float', 'right']
>
> Is there any way to avoid getting '' in my list without adding px; as a
> delimiter?

That looks like a CSS style; to parse it you should use a tool that was 
built for the job. The first one I came across (because it is included in 
the linux distro I'm using and has "css" in its name, so this is not an 
endorsement) is
http://packages.python.org/cssutils/

>>> import cssutils
>>> style = cssutils.parseStyle("width:150px;height:50px;float:right")
>>> for property in style.getProperties():
...     print property.name, "-->", property.value
... 
width --> 150px
height --> 50px
float --> right

OK, so you still need to strip off the unit suffix manually:

>>> def strip_suffix(s, *suffixes):
...     for suffix in suffixes:
...         if s.endswith(suffix):
...             return s[:-len(suffix)]
...     return s
... 
>>> strip_suffix(style.float, "pt", "px")
u'right'
>>> strip_suffix(style.width, "pt", "px")
u'150'
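
If pulling in a CSS library is not an option, the original question can also be approached with the standard library alone. What follows is only a sketch under stated assumptions, not a replacement for a real CSS parser: it assumes the input is a flat list of name:value declarations separated by semicolons, and that px and pt are the only unit suffixes of interest. The helper names split_on_delimiter_runs and parse_declarations are made up for this illustration. The first helper treats any run of consecutive delimiters as a single separator, which is what removes the empty strings; the second splits declarations first and strips units afterwards, reusing the strip_suffix idea from above (with an added guard against an empty suffix).

```python
import re


def split_on_delimiter_runs(text):
    # Hypothetical helper: a run of adjacent delimiters such as "px;"
    # is matched as one separator, so no '' appears between them.
    parts = re.split(r"(?:px|[:;])+", text)
    # A delimiter at the very start or end still yields '', so filter.
    return [part for part in parts if part]


def strip_suffix(s, *suffixes):
    # Same idea as the interactive example, but skipping empty suffixes,
    # because s[:-0] would return an empty string.
    for suffix in suffixes:
        if suffix and s.endswith(suffix):
            return s[:-len(suffix)]
    return s


def parse_declarations(text, units=("px", "pt")):
    # Hypothetical helper: assumes "name:value" pairs separated by ";".
    result = {}
    for declaration in text.split(";"):
        name, sep, value = declaration.partition(":")
        if not sep:
            continue  # ignore empty or malformed pieces
        result[name.strip()] = strip_suffix(value.strip(), *units)
    return result


style = "width:150px;height:50px;float:right"
print(split_on_delimiter_runs(style))
print(parse_declarations(style))
```

Note that the regex version will also split on a "px" that happens to occur inside a name or value, which is one more reason to prefer a parser that understands declarations.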


