Word division in IE
and other notes on the `nobr` markup
and on suggesting possible "word" breaks

Internet Explorer (IE) divides strings into two lines in a problematic way. It treats any hyphen as a potential word break point, thus even breaking "-a" into "-" and "a". Moreover, it treats several special characters as allowed break points, too. It even splits expressions like "f(0)" into "f" and "(0)". Other browsers have similar behavior; Opera may even split "1/2" into "1/" and "2". You can prevent such line breaks by using the nonstandard nobr markup or by some other methods.

Consider using such methods when you have a string that should not be broken across lines and contains any of the following characters:
-()[]{}«»%°·\/!?
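Checking a string against this list is easy to automate. The following is a minimal JavaScript sketch, not a tool from this document. The names needsNobr and protect are hypothetical, and the input is assumed to be HTML-escaped text without any tags.

```javascript
// Characters listed above as potential line break points on IE.
// The backslash is escaped ("\\") inside the JavaScript string literal.
const RISKY_CHARS = "-()[]{}«»%°·\\/!?";

// True if the string contains at least one of the risky characters.
function needsNobr(text) {
  return [...text].some((ch) => RISKY_CHARS.includes(ch));
}

// Wrap the string in nobr markup only when it needs protection.
// Assumes `text` is already HTML-escaped and contains no tags.
function protect(text) {
  return needsNobr(text) ? `<nobr>${text}</nobr>` : text;
}

console.log(protect("f(0)"));  // <nobr>f(0)</nobr>
console.log(protect("Latin")); // Latin
```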

As regards the dual problem, "how can I suggest to a browser that a string can be broken", the nonstandard wbr markup may help sometimes, but it is best to avoid long strings that contain no spaces.

Content:

Summary: practical guidelines
Line breaks as a problem
The dual problem: how to suggest possible "word" breaks

Summary: practical guidelines

Any character other than letters and digits can cause problems in line breaking, either by causing a line break where it is not appropriate or by preventing a line break between words. Commas, periods, question marks, and exclamation marks are safe, though, when appearing immediately after a word and followed by a space. Otherwise, it is safest to deal with any punctuation mark or special character as follows:

Wrap a string that should not be broken inside nobr markup. Examples: F-1, f(0).
Use a no-break space between words or numbers that should be kept on the same line. Example: 5&nbsp;m (displays as 5 m).
Insert a wbr tag if a line break is acceptable after a hyphen. Examples:
<nobr>bird-</nobr><wbr>cage
<nobr>inter-</nobr><wbr>organizational
(The nobr markup is used to prevent a line break before the hyphen on IE. So it deals with a different issue, but if you add tags for line breaking control, you might just as well solve both problems.)
Insert a wbr tag when a closing quotation mark is followed by a space and an opening parenthesis or bracket. Example: He said “foo” <wbr>(or something like that).
Insert a wbr tag before a string that begins with a character other than a letter or digit. Example: the <wbr>.htaccess file.
For a long word, use the soft hyphen at a suitable word division point. Example: ethno&shy;botany (or maybe even eth&shy;no&shy;bot&shy;a&shy;ny).
For a very long string that is not a word but e.g. a URL or a filename, consider presenting it outside normal inline text context. If it needs to be in text, consider adding wbr tags at permissible break points, if you can clearly indicate the continuation of the string to the next line. Example: Go to URL “http://test.example.com/demos/basic.html” now.
To maximize browser coverage (to include virtually all versions of IE and Opera), add the following after each wbr tag: <a class=wbr></a> (an element with empty content), together with the CSS rule .wbr:after { content: "\00200B"; }, which generates a zero-width space (U+200B) at that point.
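The URL guideline and the compatibility trick can be combined in a small JavaScript sketch. The helper names are hypothetical. Treating the position after each run of slashes as a permissible break point is an assumption made for illustration; other strings may have better break points.

```javascript
// Escape the characters that are special in HTML text content.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Offer a line break after each run of slashes in a URL shown as text.
// With compat = true, the empty <a class=wbr></a> element is appended after
// every <wbr>, for use with the CSS rule .wbr:after { content: "\00200B"; }.
function breakableUrl(url, compat = false) {
  const marker = compat ? "<wbr><a class=wbr></a>" : "<wbr>";
  return escapeHtml(url).replace(/\/+/g, (slashes) => slashes + marker);
}

console.log(breakableUrl("http://test.example.com/demos/basic.html"));
// http://<wbr>test.example.com/<wbr>demos/<wbr>basic.html
```

The sketch only inserts the markup. You still need to tell the reader that the string continues on the next line.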

This document discusses the nobr markup mainly as a tool for preventing bad line breaks around hyphens and other punctuation or special characters. It can also be used to keep consecutive words on the same line, though for such purposes there is an alternative method which does not require nonstandard markup: using no-break spaces (&nbsp;) instead of normal spaces.

Line breaks as a problem

Undesired line breaks on IE

IE may break text into lines before or after a special character, even when no space intervenes. This depends on context and browser version. Such behavior has been observed in the following situations:

after a hyphen -
before a hyphen - (IE 8 and IE 9)
after a percent sign %
after a degree sign °
after a middle dot ·
before a left parenthesis (
after a right parenthesis )
before a left bracket [
after a right bracket ]
before a left brace {
after a right brace }
before a left guillemet «
after a right guillemet »
after an exclamation mark !
after a question mark ?
after a slash / (IE 6)

The list is most probably not comprehensive.

See my description of ISO Latin 1 for notes on the various uses of these and other characters in various contexts.

This means, for example, that "Latin-1" can be broken to "Latin-" at the end of one line and "1" at the beginning of the next one. It even leaves a lone "-" at the end of a line; this is very bad, not only because of the use of a hyphen as a minus sign but also because e.g. the Finnish language uses words with a leading "-" (to indicate that the first part of a word has been omitted). Even worse, "person(s)" can be broken to "person" and "(s)" and "a[0]" to "a" and "[0]".

As regards guillemets, note that in the French quotation style, « comme ici », it seems to be sufficient to use a no-break space (instead of a normal space) between the guillemet and the first or last word of the quotation, which is advisable anyway. The document HTML authoring in French contains some notes on spacing related to guillemets and other punctuation.

But in other guillemet usage styles, »wie hier«, several versions of IE seem to break after the initial guillemet under some conditions, namely when the current line has room for that lone guillemet but not for the word immediately after it. This problem seems to have been fixed in IE 6.0. But people using older versions could still see homeless guillemets unless you use nobr to prevent that. Generally, it's probably best to accept that risk, except perhaps in headings and other very essential texts, where you might consider taking some precautions.

Breaking after the percent sign % is very nasty in some situations where that character has a special meaning, as in a programming language or in URLs. In particular, expressions like %foo and %20 can be split so that % appears alone at the end of a line! Luckily IE does not seem to split strings like "%:n".

Breaking after the degree sign ° is harmful too, especially in contexts like "100 °C".

In my tests, IE has broken a line after the cent sign ¢ too. But I haven't seen any real-life situation where that could occur. (IE does not break "¢:n", for example.)

Moreover, IE treats the combination &# as breakable. This is very nasty in a document discussing character references like &#196;. It seems that this problem has been fixed in IE 5.

IE also introduces a break point between the solidus and the reverse solidus in the combination /\. This is admittedly a rare character sequence, but I noticed this problem when actually using it in a specific context where it is essential.

The description above is probably not exhaustive. And the IE behavior may depend on version number, platform, and context (specific character sequences). I haven't tested much what IE does with other than ISO Latin 1 characters, but IE has broken a line after the following "Windows characters": ellipsis, permile, bullet, em dash, en dash, right single guillemet, right single quote, right double quote, and before the following: left single guillemet, left single quote, left double quote.

Preventing the line breaks

Quite often, splitting a word containing a hyphen is acceptable, even desirable. It can make the document look better, especially if there are long words in the text, including compound words containing a hyphen. But the problem is that IE treats any hyphen as a possible line break point and breaks even a three- or two-character string.

Moreover, breaking after some special characters, though occasionally useful, is a potential source of great confusion and even ambiguity.

There are several alternative approaches to preventing undesired line breaks, and since they operate at different levels of text representation, they could even be used in combination. The approaches are summarized in the following table. In the first four approaches, the example shows how to prevent a line break between the number and the unit in "100 m". In the other approaches, the example expression is "A-1".

Level	Method	Example	Notes
HTML markup	`nobr` element	`<nobr>100 m</nobr>`	Nonstandard, but works widely.
HTML markup	`nowrap` attribute	`<td nowrap>100 m</td>`	Only applicable to table cells. Severe limitations in support.
CSS style sheet	`white-space: nowrap`	`<span style="white-space: nowrap">100 m</span>`	Usually works when set on the innermost element.
Character level	no-break space	`100&nbsp;m`	Works well to prevent inter-word breaks but not for the problems discussed here.
Character level	non-breaking hyphen	`A&#x2011;1`	Applies to preventing a break after a hyphen. Limited usefulness due to font problems.
Character level	word joiner	`A-&#x2060;1`	Wide applicability in principle, but very limited usefulness due to lack of support.
Character level	zero-width no-break space	`A-&#xFEFF;1`	Very limited usefulness due to lack of consistent support on IE.

There are many quirks and oddities in browser support for these methods. There is a page for testing the methods in some simple cases. Generally, nobr is the one that works most often.

The `nobr` markup

The safest way to prevent undesirable line breaks is the nonstandard, Netscape-invented (!) nobr markup. It has never been defined exactly. Browsers generally treat it in a command-like fashion: <nobr> is taken as "disallow line breaks from now on" and </nobr> says "line breaks allowed from now on". But it is safest to use it as text-level markup only. This should suffice, since we would normally use nobr for short pieces of text only, as in vis-a-vis or -a.

A very short quotation using guillemets could be put into a single nobr element. But generally the approach of making a quotation as a whole non-breakable is not suitable. Instead, you can put just the initial guillemet and the first word inside nobr markup, and similarly for the last word and the closing guillemet.

Moreover, the markup will prevent hyphenation too. If you now use, say, the markup <nobr>»Anführungszeichen«</nobr>, then future browsers that apply hyphenation to words will not do that for this word. For such reasons, you may wish to make the scope (content) of the nobr element minimal, even though it looks slightly odd in the markup, e.g. <nobr>»A</nobr>nführungszeiche<nobr>n«</nobr>.

The nobr markup could also be used to keep images on one "line", side by side. As an alternative to using nobr, you could put images into a table.

No, `&nbsp;` won't do

We cannot, in general, solve the problems discussed here by using no-break spaces (&nbsp;). If you insert a no-break space character, you insert a character which is like a space but may not be replaced by a line break in formatting. Normally this is not desirable, since you don't want an expression like "person(s)" rendered as "person (s)", do you?

In special cases it might appear to be acceptable and even desirable to have extra spacing horizontally, especially in situations like f (0); but it is a moot point, as explained in my notes on mathematical notations in HTML.

For table cells, `nowrap` may work

In special cases, when the data is in a table cell and it is adequate to prevent all line breaks in it, you can use the nowrap attribute for the td or th element. That attribute is "deprecated", but it is still valid in Transitional HTML 4.01.

Note, however, that browsers seem to ignore it if a fixed width is set for the cell and the content does not fit into that width without wrapping. This happens for a fixed width in pixels, em units, etc.

The CSS setting `white-space: nowrap`

Setting white-space: nowrap in CSS has the same effect on an element as wrapping it inside nobr markup. The reasons for preferring nobr are:

The nobr markup works even when CSS is disabled (see the usual CSS caveats).
Preventing line breaks is essential to the correct understanding of the content, not just a styling preference.
The CSS way is somewhat clumsier. Even when you have defined a suitable class, you need to write e.g.
<span class=nobr>A-1</span>
as opposed to the simpler
<nobr>A-1</nobr>

If the text that needs to be kept on one line has already been made an element for some other reason, then it is slightly more convenient to use CSS than the nobr markup. For example, if your page discusses programming and mentions an expression like i-- marked up as <code>i--</code>, then it is simplest to just add an attribute there: <code class=nobr>i--</code>. Of course, you would then need a rule like .nobr { white-space: nowrap; } in your CSS code.

Historically, the white-space property has been problematic. It was originally defined as relating to white space characters only, and its name suggests the same. It was also oddly limited to block-level elements. However, CSS 2.1 has cleared things up, and browser support has been good ever since IE 6.

There is a bug in some versions of IE: if you set white-space: nowrap for a table cell it may fail to work. To overcome this, put the cell contents inside an auxiliary element (usually span or div) and set white-space: nowrap on that element.

The Unicode approach: the word joiner (WJ) character

In theory, the Unicode standard says that the character to be used to prevent line breaks in general is the word joiner (WJ) character (U+2060, representable in HTML as &#x2060;):


The word joiner character is the preferred choice for an invisible character to keep other characters together that would otherwise be split across the line at a direct break.

Source: Technical report Line Breaking Properties (UAX#14).

This would mean that in order to keep e.g. the string 2003-03-24 on one line you would use 2003-&#x2060;03-&#x2060;24. Here is the rendering on your browser: 2003-⁠03-⁠24. The rendering may contain indications of unrepresentable characters, since many browsers do not recognize the character as having the special status, so they will just try to display it; and most fonts do not contain any glyph for the character.

Even when a rare font (e.g., Code2000) containing a glyph for the word joiner is used, the method fails. IE still divides the text the same way as without the word joiner, apparently because it does not know the special meaning of this character.

It's no wonder that this method does not work (yet), since the word joiner character was introduced into Unicode as late as in Version 3.2 (date: 2002-03-27). It generally takes many, many years before characters added to Unicode are widely supported in fonts.

We might also consider using the zero-width no-break space (ZWNBSP) character. Within text, it has the same meaning as the word joiner, and it has been in Unicode much longer. It is recognized by IE in the sense that the browser knows that it has no glyph, but it depends on IE version whether it prevents a line break; on IE 7, it does not. In principle, its use for the purpose of preventing line breaks has been officially discouraged, since it is also used as a byte order mark, and it was deemed appropriate to reserve it for that very purpose.

In practice, you can represent ZWNBSP as &#xFEFF; or &#65279;. Thus, in order to keep e.g. the string 2003-03-24 on one line you would use 2003-&#xFEFF;03-&#xFEFF;24. Here is the rendering on your browser: 2003-03-24.
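Both character-level variants come down to inserting a character reference after each hyphen. Here is a minimal JavaScript sketch that generates such markup. The name glueAfterHyphens is hypothetical.

```javascript
const WORD_JOINER = "&#x2060;"; // U+2060 WORD JOINER
const ZWNBSP = "&#xFEFF;";      // U+FEFF ZERO WIDTH NO-BREAK SPACE

// Insert a no-break control (as a character reference) after every hyphen,
// asking the browser not to break the line there.
function glueAfterHyphens(text, joiner = WORD_JOINER) {
  return text.replace(/-/g, "-" + joiner);
}

console.log(glueAfterHyphens("2003-03-24"));         // 2003-&#x2060;03-&#x2060;24
console.log(glueAfterHyphens("2003-03-24", ZWNBSP)); // 2003-&#xFEFF;03-&#xFEFF;24
```

This only addresses breaks after hyphens. Given the support problems described above, the output is mainly useful for testing how browsers behave.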

Validation problems with `nobr`

Since nobr is not in any HTML specifications, it causes problems in validation. You can use nobr and still validate your pages, if you use a modified Document Type Definition. For an explanation, see Creating your own DTD for HTML validation. (If you use HTML5, you just have to ignore the error messages, since HTML5-based “validation” is based on the prose of HTML5 drafts, not on DTDs, and the drafts currently forbid the nobr markup as “obsolete”.)

Basically you would use a DTD containing the following modified clause (the addition is NOBR):

<!ENTITY % phrase "EM | STRONG | DFN | CODE | NOBR |
 SAMP | KBD | VAR | CITE | ABBR | ACRONYM" >

I have made modified versions of HTML 4.01 DTDs:

http://jkorpela.fi/html/strict.dtd (for Strict HTML 4.01)
http://jkorpela.fi/html/loose.dtd (for Transitional HTML 4.01)
http://jkorpela.fi/html/frameset.dtd (for framesets)

It is preferable to copy the DTD you need, instead of referring to the addresses above. You would use a DOCTYPE declaration like the following (instead of a normal DOCTYPE which uses the PUBLIC keyword), naturally replacing the address inside quotation marks as needed:

<!DOCTYPE HTML SYSTEM "http://jkorpela.fi/html/strict.dtd">

Note that the W3C validator has an internal limitation ("the number of tokens in a group must not exceed GRPCNT (64)") which prevents you from using it in cases like this; the addition of NOBR happens to exceed that limit! So you need to use e.g. the WDG validator instead or to modify the DTD more, by removing something (like I have actually removed acronym from the Transitional DTD mentioned above) when you add nobr.

Naturally, if you use a modified DTD, you can't claim conformance to HTML 4.01 or some other specification. If your customer or boss requires such conformance, you are out of luck. You might still try to explain to him that your use of nobr is a workaround to an IE bug/deficiency.

Special issues with hyphens and relatives

Alternative characters

The common hyphen character, or the Ascii hyphen, is semantically ambiguous, as explained in the document Dashes and hyphens. Therefore it could be replaced by other, semantically more specific Unicode characters, in the hope that browsers do not regard them as line break points. This may also affect rendering.

However, many of the alternative characters are poorly supported in fonts and by browsers. Typing them can be an issue, too, but they can be presented using character references such as &#x2011; for the non-breaking hyphen character or &minus; for the minus sign.

Demo
Character	Sample
Ascii hyphen	bird-cage
Minus sign	bird−cage
Unicode hyphen	bird‐cage
Non-breaking hyphen	bird‑cage
En dash	bird–cage
Em dash	bird—cage

The demo here illustrates how your browser handles some hyphen-like characters when a line break would be needed. The second column contains a sample expression in a box with fixed width. Some of the expressions are contrived, just to create comparability; in particular, the minus sign should never appear between letters without spaces.

Minus sign

The minus sign is defined in Unicode as a separate character, and this character is preferable in texts and formulas. However, in programming languages and when discussing them, the Ascii hyphen should be used, since it is the character that programming language definitions use. Example:

The number −42 is written as -42 in JavaScript.

In addition to being semantically more appropriate and visually much more suitable, the minus sign has the property of disallowing a line break after it, when immediately followed by a non-space character. Thus, there is no line break issue with −42 (which can be written in HTML as &minus;42).

When used as a binary operator (for subtraction), the minus sign should be preceded and followed by a space, according to standards on mathematical expressions. Undesired line breaks (normally, at least before the operator) can be prevented by using no-break spaces. For example, a − b can be written as a&nbsp;&minus;&nbsp;b.
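Here is a minimal JavaScript sketch of generating such notation. The helper names formatNumber and subtraction are hypothetical, and operands are assumed to be numbers or HTML-safe strings.

```javascript
// Write a negative number with the minus sign (U+2212) instead of the
// Ascii hyphen; as described above, no line break is allowed after it.
function formatNumber(x) {
  return x < 0 ? "&minus;" + String(-x) : String(x);
}

// Binary minus with no-break spaces on both sides of the operator.
function subtraction(a, b) {
  return `${a}&nbsp;&minus;&nbsp;${b}`;
}

console.log(formatNumber(-42));     // &minus;42
console.log(subtraction("a", "b")); // a&nbsp;&minus;&nbsp;b
```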

Unicode hyphen

The Unicode hyphen has mostly the same line breaking properties as the Ascii hyphen. There is not much point in using it, because it mostly looks the same as the common hyphen, if the font contains it at all.

Non-breaking hyphen

An undesired line break e.g. in a-b can in principle be prevented by writing it as a&#x2011;b. However, although such constructs are well supported by modern browsers, the support is not yet universal, and the data gets badly distorted when a browser does not support them. The main problem is that many common fonts do not contain the non-breaking hyphen, and browsers may render e.g. a small rectangle or question mark instead. (In fact, Arial Unicode MS and Lucida Sans Unicode are the only fairly common fonts that contain it.)

Dashes

In good old typesetting, the em dash is used e.g. for parenthetic remarks—like this—whereas the en dash indicates inclusive or continuing numbers or dates like "pages 233–235", among other things. On web pages, the hyphen has often been used as a surrogate for the em dash or en dash. Spacing around the hyphen varies, and so does the use of a single hyphen vs. a double hyphen for the em dash.

These days, the en dash and the em dash can be used rather confidently. They have, however, some of the problems of hyphen characters. According to line breaking rules, and in the practices of some browsers, either of these characters allows a line break after it, even if no space intervenes. This complies with old typographic rules, but for the en dash it results in poor breaks in short expressions like 1–2. Therefore, just as for similar expressions with hyphens, there is often a reason to prevent line breaks: <nobr>1–2</nobr>.

If you use the em dash in the—mostly American—style where it appears between words without spaces, use wbr after each em dash. Otherwise many browsers will treat the em dash as gluing words together and will not consider breaking a line after it. Example: in the—<wbr>mostly American—<wbr>style where.
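Here is a minimal JavaScript sketch that adds wbr after each em dash in plain text. The name wbrAfterEmDashes is hypothetical, and the dash is written as the escape \u2014 in the code.

```javascript
const EM_DASH = "\u2014"; // U+2014 EM DASH

// Offer a line break opportunity after every em dash by adding <wbr>.
function wbrAfterEmDashes(text) {
  return text.split(EM_DASH).join(EM_DASH + "<wbr>");
}

console.log(wbrAfterEmDashes("the\u2014mostly American\u2014style"));
```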

If hyphens are used as surrogates for dashes, the browser behavior of breaking a line after a hyphen often is highly undesirable. An especially bad case is
5&nbsp;-&nbsp;6
where a hyphen surrounded by some space is used as a surrogate for a dash, and no-break spaces are used to prevent (generally, on any browser) bad line division like
5
- 6
but IE, although it usually obeys the semantics of the no-break space character, may split after the hyphen (between it and the no-break space):
5 -
 6

Note that the second line is indented, since it begins with a no-break space character! The IE behavior violates the semantics of the no-break space at least as defined in UTR #14: Line Breaking Properties, according to which the action of non-breaking characters like the no-break space is "to glue together both left and right neighbor character such that they are kept on the same line".

Such madness can be prevented by using nobr. Then you could use normal spaces instead of no-break spaces: <nobr>5 - 6</nobr>. But since you need to do something special anyway, you might just as well use the en dash: 5–6.
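Converting numeric ranges written with hyphens into en dash ranges that stay on one line can be sketched in JavaScript. The function name is hypothetical. The pattern assumes that a hyphen between two plain numbers denotes a range. It would also convert e.g. a phone number like 555-1234, so apply it only to text known to contain ranges.

```javascript
// Replace "5 - 6" or "1-2" style numeric ranges with an en dash (U+2013)
// and keep each range on one line with nobr. The lookarounds skip
// hyphen-separated digit groups such as the date 2003-03-24.
// Requires a JavaScript engine with lookbehind support (ES2018).
function numberRange(text) {
  return text.replace(
    /(?<![\d-])(\d+)\s*-\s*(\d+)(?![\d-])/g,
    "<nobr>$1&ndash;$2</nobr>"
  );
}

console.log(numberRange("pages 233 - 235")); // pages <nobr>233&ndash;235</nobr>
console.log(numberRange("2003-03-24"));      // 2003-03-24
```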

Consecutive hyphens

Consecutive hyphens appear in some special cases, such as surrogates for em dashes, between surnames in official French usage, and as operators in programming languages. Some browsers break only after the last of consecutive hyphens, but some versions of IE may introduce breaks elsewhere as well.

It is thus useful to protect consecutive hyphens against wrong breaks. For example, the French expression Martin--Durant can be marked up as <nobr>Martin--Durant</nobr>.

Breaks before hyphens

IE 8 oddly treats the hyphen as allowing a break both before and after it. No explanation for this has been given. This behavior has been retained in IE 9.

In particular, IE 8 may break a sequence of hyphens at any point. Breaking before a hyphen, e.g. breaking "foo-bar" into "foo" and "-bar", normally makes no sense, of course. There is no simple way to prevent it on IE 8. You can use various methods of preventing line breaks, as discussed in this document, but this may be awkward when strings with hyphens occur frequently. For example, any normal compound like bird-cage would need to be coded e.g. as follows (if a break after the hyphen is regarded as acceptable): <nobr>bird-</nobr><wbr>cage.
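Coding many compounds this way by hand is tedious, so a generator may help. Here is a minimal JavaScript sketch; compound is a hypothetical name. It produces <nobr>bird-</nobr><wbr>cage for bird-cage, and similarly for other hyphenated words. Wrapping the whole part before the hyphen in nobr also blocks any future automatic hyphenation of that part, as noted in connection with the scope of nobr elements.

```javascript
// Code a hyphenated compound so that IE 8 cannot break before a hyphen,
// while a break after each hyphen stays possible.
function compound(word) {
  const parts = word.split("-");
  const last = parts.pop();
  return parts.map((part) => `<nobr>${part}-</nobr><wbr>`).join("") + last;
}

console.log(compound("bird-cage")); // <nobr>bird-</nobr><wbr>cage
```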

An alternative approach is to use the trick developed by Microsoft for throwing IE 8 into “IE 7 standards mode.” One way to do this is to use the following tag (in the head part):
<meta http-equiv="X-UA-Compatible" content="IE=EmulateIE7">
This tag should be placed before any style sheet or reference to a style sheet. Unfortunately, the tag also disables all the CSS enhancements in IE 8 as compared with IE 7; for a list of them, consult Microsoft's document CSS Compatibility and Internet Explorer.

Special case: phone numbers

Telephone (and fax) numbers cause line breaking problems fairly often. This might happen in tables: if you make a telephone directory a table, putting the numbers into one column, a browser might break the content of a number cell into two lines. This is not very nice, but you can prevent it by using the nowrap attribute as explained above. Assuming that the numbers are written as suggested below, you could alternatively just use no-break spaces instead of spaces, and this is the right approach when a phone number does not appear inside a table.

Problems are often caused by the varying ways that people use to present phone numbers. Consider the following example:
+358 (0)9 451 4319
When used inside running text, this could create the problem discussed here, since IE might split it after the closing parenthesis. The appropriate solution is to use a correct notation, as standardized in the E.123 recommendation by CCITT/ITU-T and in different national standards based on it. See E-Series Recommendations Excerpts for an overview. In particular, you should use either the national use notation like
(09) 451 4319
or the international use notation
+358 9 451 4319
or, when applicable, both of them. (The parentheses should be omitted from the national use notation if the first part of the number is not an area code that can be omitted when dialing inside the area but a part of the number that must always be dialed.) Luckily it seems that when you write numbers that way, using no-break spaces instead of normal spaces, IE doesn't split them even after the closing parenthesis.
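Applying these recommendations mechanically can be sketched in JavaScript. The function name phoneMarkup is hypothetical. The sketch assumes input either in one of the E.123 notations or in the mixed form with "(0)" shown above. It drops the "(0)" and joins the number groups with no-break spaces.

```javascript
// Drop the "(0)" trunk-prefix notation, which the international notation
// above omits, and join the number groups with no-break spaces.
function phoneMarkup(number) {
  return number
    .replace(/\(0\)/g, "")
    .trim()
    .split(/\s+/)
    .join("&nbsp;");
}

console.log(phoneMarkup("+358 (0)9 451 4319")); // +358&nbsp;9&nbsp;451&nbsp;4319
console.log(phoneMarkup("(09) 451 4319"));      // (09)&nbsp;451&nbsp;4319
```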

Problems in other browsers

Opera seems to treat hyphens mostly the same way as IE. Opera may even split "-a" to "-" and "a". So it shares the same basic problem as IE, and the same workarounds are effective.

In some respects, Opera behaves even more wildly than IE. For example, it has been observed to treat a slash (solidus) as a break character in the sense that a line break is permitted after it. This is actually useful when long URLs appear in the text, and in some other cases. But Opera may even break "1/2" into "1/" and "2"! (For this particular case, you can avoid the problem by using the vulgar fraction character ½.)

Some versions of Opera may even insert a line break before a plus sign e.g. in the expressions a+b, C++ and U+1234.

Firefox may break after a hyphen but only if there are several characters on both sides of it. Firefox may also break before a backslash (\).

Chrome has oddities like breaking after a question mark (?) and before some currency symbols (¢, £ etc. but not $) and before the degree sign (°) and the plus-minus sign (±).

I have composed a simple linebreaking test file that covers the nonalphanumeric characters in ISO Latin 1 as well as some common characters outside it.

Is it really a bug?

The HTML specifications say:

Except for preformatted elements - -, each block structuring element is regarded as a paragraph by taking the data characters in its content and the content of its descendant elements, concatenating them, and splitting the result into words, separated by space, tab, or record end characters (and perhaps hyphen characters). The sequence of words is typeset as a paragraph by breaking it into lines.

Source: HTML 2.0, section Characters, Words, and Paragraphs

In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character. The soft hyphen tells the user agent where a line break can occur.

Source: HTML 4.0, section Hyphenation

The former allows, with the words "(and perhaps hyphen characters)", line breaking at any hyphen. So from this perspective, IE's behavior with hyphens is not a bug, though of poor quality. The latter, on the other hand, seems to prohibit such line breaks: only soft hyphens would be permissible line breaking positions. This is a highly debatable approach, on two accounts: the soft hyphen problem (the meaning of the character and the way browsers handle it), and the fact that this would seem to mean that the normal hyphen is regarded as nonbreakable. It's hard to tell whether the latter was actually meant, but this part of the specification seems to imply that IE's breaking behavior is to be classified as a bug.

It is probably not the intent in any of the specifications to prohibit word division when it occurs according to the rules of the natural language used and is otherwise acceptable. Since IE does not recognize the language, it should not split words, or any strings of non-whitespace characters. For special characters, the issue is rather complicated; see Unicode Technical Report #14: Line Breaking Properties (technicality alert!), which seems to allow a "direct break" between an alphanumeric character and an opening parenthesis, for example. This is hardly a good idea in general, though. It would even allow very dishono(u)rable breaks! I have written some more notes on these issues: Unicode line breaking rules: explanations and criticism.

Alan Wood has drawn my attention to situations where a very long string contains hyphens and the presentation is much better if a browser can use hyphens as potential line break points. Such strings occur in chemistry, in systematic names of chemical compounds, as demonstrated well by Alan's page Abamectin fact sheet.

Admittedly, in most cases a hyphen is a permissible line break point, if reasonable constraints are imposed so that e.g. a string like "-a" is not broken. It's hard to tell where the limits should be drawn, though. I think the absolute minimum is that there must be at least two characters on each side of the hyphen. And even when reasonable constraints are used, I think it's wrong for a browser to split after hyphens unless there is an effective and standards-conforming way to prevent such splits when they are wrong. As this document shows, that condition is currently not fulfilled. Moreover, I think there is insufficient information as to whether the rule about allowing a break after a hyphen is really a universally acceptable default or whether it should be applied only when the document language is specified in HTML markup and it is known to the browser that the rules of that language allow such breaks.
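The proposed minimum constraint can be expressed as a small JavaScript sketch. It interprets "characters" as letters or digits directly adjacent to the hyphen, which is an assumption. The name hyphenBreaks is hypothetical, and the returned offsets count code points.

```javascript
// Letters and digits in any script (Unicode property escapes).
const isAlnum = (ch) => /[\p{L}\p{N}]/u.test(ch);

// Return the offsets just after each hyphen where a line break would be
// allowed, requiring at least `min` letters or digits on both sides.
function hyphenBreaks(text, min = 2) {
  const chars = [...text];
  const breaks = [];
  chars.forEach((ch, i) => {
    if (ch !== "-") return;
    let left = 0;
    for (let j = i - 1; j >= 0 && isAlnum(chars[j]); j--) left++;
    let right = 0;
    for (let j = i + 1; j < chars.length && isAlnum(chars[j]); j++) right++;
    if (left >= min && right >= min) breaks.push(i + 1);
  });
  return breaks;
}

hyphenBreaks("bird-cage"); // returns [5]
hyphenBreaks("vis-a-vis"); // returns []
```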

The dual problem: how to suggest possible “word” breaks

Why is it a problem?

Web browsers generally don't do word division; the IE behavior discussed above is an exception and applies to some special characters only. We mostly just have to live with that, until browsers become better and start recognizing the language used in a document (hopefully from lang attributes) and applying language-specific hyphenation rules. Explicit hyphenation hints are often suggested as a potential solution, but they are clumsy. However, browsers now support the soft hyphen reasonably well, so it may be feasible to give hyphenation hints for words in special cases, e.g. for very long words.

Moreover, browsers generally don't break long strings that don't contain spaces. Sometimes we have "words" (sequences of characters with no spaces inside them) that are so long that the presentation becomes very bad if the browser can't split them. This typically happens in more or less technical contexts, like writing about long URLs. (Long URLs aren't usually a problem, because they appear only in attribute values in HTML markup, not as the textual content. But sometimes you have a real need to mention a URL e.g. as an example.)

Moreover, browsers may refuse to break even at spaces, i.e. to separate strings from each other even if there is a normal space between them. Internet Explorer 7 seems to behave that way when the latter string begins with a colon ":". Such strings are common in some special areas. For example, if you discuss CSS and mention "the pseudoelement :before", IE 7 would treat the string as indivisible, unless you include some explicit line breaking hint before the colon. This would result in poor rendering like the following:

If you discuss CSS and mention the pseudoelement :before in the text, IE 7 would treat the string "pseudoelement :before" as indivisible. That is, it would not put "pseudoelement" on one line and ":before" on the next.

IE 7 has similar problems with other characters, such as a period ".", at the start of a string.

Moreover, IE 7 does not break a line between a closing quotation mark and an opening parenthesis, even when there is a space between them. This happens both for typographically correct (for English) “curly” quotation marks and for "vertical" (ASCII) quotation marks.

This is a simple, naive but realistic “example” (intentionally with somewhat long words) showing how IE refuses to wrap between a quoted string and a parenthetic string.

Real word division vs. splitting a string

There are two different issues here:

Normal word division, in which a hyphen is introduced if the word is split into two lines. On web pages, this can be achieved using the soft hyphen.
Splitting a string into two (or more) lines at some point without introducing a hyphen, e.g. dividing a long pathname into several lines:
C:\Program Files\Netscape\Communicator\ Program\NetHelp\Netscape

Splitting a word containing a hyphen, e.g. "bird-cage", belongs technically to the latter category: no new hyphen is introduced.

If a URL, filename or some similar construct is split over several lines, one should clearly indicate what's going on, so that the user knows what the actual URL or filename is. Usually an explicit note is needed. In some cases, however, you could use the method of delimiting a URI in context with the characters < and > or with quotation marks, as suggested in Appendix C of RFC 3986.

Word division with soft hyphen

Word processors are usually capable of automatic and semi-automatic hyphenation, but web browsers aren’t. Currently the only way to make them do any word division (except breaking after a hyphen) is to use the soft hyphen character. The soft hyphen can be entered using the entity reference &shy;. Alternatively, it can be written as such, but beware that when editing an HTML document, the soft hyphen is usually invisible. On Windows, it can be entered using Alt+0173.
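As a quick sanity check, assuming only Python's standard library, one can confirm that the common HTML notations for the soft hyphen all decode to the same single character, U+00AD. This is not part of any authoring workflow; it just makes the "usually invisible" character tangible:

```python
import html
import unicodedata

# Three equivalent HTML notations for the soft hyphen.
notations = ["&shy;", "&#173;", "&#xAD;"]

for n in notations:
    ch = html.unescape(n)
    # Each notation decodes to U+00AD, which is normally not displayed
    # unless a line is actually broken at that point.
    print(n, hex(ord(ch)), unicodedata.name(ch))
```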

There are different strategies you can apply:

Add soft hyphens manually. This can be feasible if you use them just to deal with the longest words and with crucial texts, like teaser texts on a front page in a narrow column.
Use a preprocessor that hyphenates your texts and inserts soft hyphens accordingly.
Use a server-side hyphenator.
Use a client-side hyphenator. Check out the JavaScript hyphenation demo in the sample code collection.
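The preprocessor strategy can be sketched in Python. This is a minimal illustration only. The dictionary `HYPHENATIONS` and the function `add_soft_hyphens` are hypothetical names, and the hand-made word list stands in for the language-specific hyphenation patterns that a real preprocessor would use. Following the "longest words only" idea above, short words are left alone:

```python
import re

SHY = "&shy;"

# Hypothetical hand-made hyphenation dictionary: "|" marks allowed
# division points. A real tool would use hyphenation patterns instead.
HYPHENATIONS = {
    "supercalifragilisticexpialidocious":
        "su|per|cal|i|frag|il|is|tic|ex|pi|al|i|do|cious",
    "organizational": "or|gan|i|za|tion|al",
}

def add_soft_hyphens(text, min_length=12):
    """Insert &shy; into long words that appear in HYPHENATIONS."""
    def replace(match):
        word = match.group(0)
        pattern = HYPHENATIONS.get(word.lower())
        if pattern is None or len(word) < min_length:
            return word
        # Rebuild the word with its original casing, cutting it at
        # the positions marked in the dictionary entry.
        parts, pos = [], 0
        for part in pattern.split("|"):
            parts.append(word[pos:pos + len(part)])
            pos += len(part)
        return SHY.join(parts)
    return re.sub(r"[A-Za-z]+", replace, text)

print(add_soft_hyphens("inter-organizational"))
# inter-or&shy;gan&shy;i&shy;za&shy;tion&shy;al
```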

The use of soft hyphens cannot deal with complexities in word division, such as changes in the word when hyphenated (see examples of this in Use of Soft Hyphen in UAX #14). However, there is a special case that can be handled: in some languages and styles, word division after a hyphen is done so that the next line starts with a hyphen, while a hyphen also appears at the end of the first line. E.g., the Polish word czerwono-niebieska would thus be split as follows:
czerwono-
-niebieska.
According to the Unicode standard, the recommended practice in such cases is a soft hyphen followed by a non-breaking hyphen, instead of a single normal hyphen. In HTML, it is safer to apply the idea by using a soft hyphen followed by an ASCII hyphen, with HTML markup that prevents a division after the latter, e.g. czerwono&shy;<nobr>-niebieska</nobr>.
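The markup pattern for this style (a soft hyphen at the end of the first part, then an ASCII hyphen kept together with the second part by the nonstandard nobr element) can be generated by a small helper. The function name `split_hyphen_repeating` is hypothetical, and this is only a sketch for a compound with exactly two parts:

```python
import html

def split_hyphen_repeating(first, second):
    """Markup for a compound whose hyphen is repeated on the next line
    when the compound is divided.

    The soft hyphen shows up only if the line is broken there; the
    visible hyphen stays glued to the second part inside nobr.
    """
    return (html.escape(first) + "&shy;"
            + "<nobr>-" + html.escape(second) + "</nobr>")

print(split_hyphen_repeating("czerwono", "niebieska"))
# czerwono&shy;<nobr>-niebieska</nobr>
```

The Unicode-recommended alternative would put U+00AD followed by U+2011 (NON-BREAKING HYPHEN) into the text instead, but that relies on font and browser support for U+2011.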

"Break everything": the `word-wrap` property

According to Microsoft's documents, IE supports, from version 5.5 onwards, a CSS property named word-wrap, which affects situations where some content exceeds the boundaries of its container. (IE 8 introduced the synonym -ms-word-wrap for this property.) The value normal means the normal behavior (content exceeds the boundaries of its container), whereas break-word means that "Content wraps to next line, and a word-break occurs when necessary." This formulation is misleading, since what really happens is splitting a string as described above. So this has nothing to do with normal processing of words of human languages: the string just breaks, with no indication of its continuing on the next line.
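Python's standard textwrap module gives a rough analogy to the two values. It counts characters, not pixels, and has nothing to do with how browsers are implemented. With `break_long_words=False`, an overlong word is left intact and overflows, much like `word-wrap: normal`. With `break_long_words=True`, the word is simply chopped at the width limit, with no hyphen, much like `break-word`:

```python
import textwrap

word = "supercalifragilisticexpialidocious"  # 34 characters

# Analogous to word-wrap: normal: the word stays intact and overflows.
print(textwrap.wrap(word, width=10, break_long_words=False))
# ['supercalifragilisticexpialidocious']

# Analogous to word-wrap: break-word: the string is cut at the limit,
# with nothing indicating that it continues on the next line.
print(textwrap.wrap(word, width=10, break_long_words=True))
# ['supercalif', 'ragilistic', 'expialidoc', 'ious']
```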

The following example tests whether your browser supports this feature, which is not in any CSS specification (though it is present in the CSS3 Text draft). The test has two div elements with a fixed width of 150 pixels and content that most probably exceeds the width; the second element has word-wrap: break-word assigned to it.

supercalifragilisticexpialidocious

Some other browsers, such as Firefox (3.1 and newer) and Safari, have started supporting word-wrap to some extent.

This feature can be used to produce “emergency breaks” as the least of evils in some situations. Perhaps most often, it could be useful when code expressions like programming language variables, or lines in blocks of sample code, are too long to fit into the available space. The presentation might be understandable to the user, especially if accompanied with a note like “long lines in code samples may have been split for technical reasons”. Unfortunately, this does not work too well: the setting may make browsers primarily break after spaces and some special characters like “/”.


In a technical context, if a paragraph contains a long variable name like supercalifragilisticexpialidociousObfuscationEnabled and its font face, color, or other property distinguishes it from normal text, the user probably sees it as one identifier. Hyphenation would not be suitable, as the hyphen could be misinterpreted as being part of the identifier. Unfortunately, word-wrap: break-word is not effective in such cases unless the column is so narrow that the word does not fit into one line even if it is alone there. Instead, you need to scatter wbr tags around in the string.

Explicit breaks

You can of course specify explicit line breaks using br markup (or pre markup). This is often the best way to deal with the problem at hand. However, it is rather inflexible when the long strings occur within running text, where you would like the browser to do its normal formatting, so that the presentation adjusts to the available canvas width but line breaks may occur at some points inside the problematic strings.

The practical way: `wbr` ("word break")

Early history of `wbr`

According to Netscape's old HTML Tag Reference:

The WBR tag marks a place where a line break can take place. It does not necessarily always result in a line break; rather, it says that line breaks are allowed at this place. You could use it, for example, inside a NOBR tag to allow a line break. Navigator 1.1

The description is vague, but actual implementations treat wbr as allowing a "string split" rather than normal word division, i.e. no hyphen is introduced.

Example of using `wbr`

The following markup samples illustrate possible uses of wbr:

Enter the command View/<wbr>Preferences/<wbr>Display/<wbr>Common.
We use the pseudoelement <wbr>:before.
Check your <wbr>.htaccess file.
This is an "example" <wbr>(really).

Moreover, wbr could be used to suggest possible break points in URLs, using markup like <http://www.hut.fi/u/jkorpela/html/nobr.html>. Here is a more complicated example as displayed on your browser: <http://www.eduskunta.fi/triphome/bin/thw/trip?${base}=vpasia&${html}=vs5000&tunnus=HE+211%2F1992>. It's probably a good idea to use <wbr> only after characters that visually suggest that something might follow, such as a solidus, an equals sign, or an ampersand.
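A minimal sketch of that advice in Python. The function name `wbr_url` is hypothetical. It HTML-escapes the URL and inserts <wbr> after a solidus, equals sign, or ampersand. As an assumption of this sketch, no break is suggested inside runs such as "//", so "http://" stays together:

```python
import html

BREAK_AFTER = "/=&"  # characters that suggest that something follows

def wbr_url(url):
    """Return HTML for url with <wbr> after /, = and & (not inside runs)."""
    out = []
    for i, ch in enumerate(url):
        out.append(html.escape(ch))  # "&" becomes "&amp;"
        nxt = url[i + 1] if i + 1 < len(url) else ""
        if ch in BREAK_AFTER and nxt and nxt not in BREAK_AFTER:
            out.append("<wbr>")
    return "".join(out)

print(wbr_url("http://www.hut.fi/u/jkorpela/html/nobr.html"))
# http://<wbr>www.hut.fi/<wbr>u/<wbr>jkorpela/<wbr>html/<wbr>nobr.html
```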

One might consider using nobr and wbr together to divide text into chunks which may not be split across lines but so that line breaks between the chunks are permitted. But there does not seem to be any particular reason to use wbr for such purposes. You can usually just put each chunk into a nobr element of its own, with normal spaces or line breaks between them of course.
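The simpler approach, with one nobr element per chunk and normal spaces between chunks, is easy to generate. This is a sketch only. The function name `nobr_chunks` and the sample chunks are hypothetical, and nobr is still nonstandard markup:

```python
import html

def nobr_chunks(chunks):
    """Wrap each chunk in its own nobr element, joined by normal spaces,
    so that line breaks are possible only between chunks."""
    return " ".join("<nobr>" + html.escape(c) + "</nobr>" for c in chunks)

print(nobr_chunks(["f(0) = 1", "F-1"]))
# <nobr>f(0) = 1</nobr> <nobr>F-1</nobr>
```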

Properties of `wbr`

The wbr element is empty (no end tag is used) and it has no attributes. That is, it is always just the tag <wbr>.

It is comparable to <br>, except that <br> forces a line break in rendering whereas <wbr> allows a line break.

Should you wish to use it in XHTML, you would have to use the XML convention for empty elements, i.e. <wbr />.

Later developments in support for `wbr` (historical)

Other browsers, such as IE, have support for wbr, too. However, reports on browser support are often misleading. E.g., Opera was claimed to support it long before it actually did. On the other hand, in many cases where you would use wbr, this does not matter, since Opera automatically splits long strings e.g. after a "/" character.

It seems that around version 4.5 or so, Netscape itself dropped support for wbr. The support was restored in Netscape 6, and later dropped again! Anyway, this is ancient history now, since Netscape browsers have very few users, and support for wbr exists in Mozilla Firefox.

Moreover, as if this were not bad enough, it seems that on IE 5, unlike IE 4 and IE 6, wbr works only inside nobr markup, so you would have to use nobr around the string, too. But then Opera would treat the string as unbreakable, since it recognizes nobr but not wbr.

Sufficiently new versions of Opera will handle wbr in the intended way, if you use the following style sheet, which effectively tells the browser to insert U+200B, i.e. ZWSP after each wbr element:

wbr:after { content: "\00200B"; }
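The string value in such a rule is a CSS escape for U+200B: a backslash followed by hexadecimal digits, here \00200B. Writing all six digits makes the escape self-terminating, so it cannot absorb a following character that happens to be a hex digit. A small helper (hypothetical name `css_escape`) that produces such escapes:

```python
def css_escape(code_point):
    """Return a CSS escape for code_point using exactly six hex digits."""
    return "\\%06X" % code_point

ZWSP = 0x200B  # ZERO WIDTH SPACE
rule = 'wbr:after { content: "%s"; }' % css_escape(ZWSP)
print(rule)
# wbr:after { content: "\00200B"; }
```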

Newest versions of Opera seem to support wbr natively.

The IE 8 problem

For some odd reason, IE 8 stopped supporting wbr in “standards mode”, so now some additional trickery may be needed. IE 8 does not even recognize the wbr element in “standards mode,” so it does not help to assign a CSS rule for that element. Instead, we need to use extra markup that introduces an empty element, for which we can add ZWSP as generated content. (If we used ZWSP as character data in document content, we would have problems with older browsers, as described later.) For example, to allow a line break after “/” in “foo/bar”, we could write e.g.
foo/<a class="wbr"></a>bar
and use the CSS rule
.wbr:after { content: "\00200B"; }

The use of the a element instead of the span element here is just a convenience: shorter element name. As such, without any attributes except class, the a element is semantically empty.
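If your pages already contain wbr tags, the IE 8 workaround could be applied mechanically in a preprocessing step. The sketch below is hypothetical (the names `WBR_TAG` and `wbr_for_ie8` are made up for illustration). It rewrites <wbr>, <wbr/> and <wbr /> in any letter case into the empty a element, which then needs the .wbr:after rule with the ZWSP escape:

```python
import re

# Matches <wbr>, <wbr/> and <wbr /> regardless of letter case.
WBR_TAG = re.compile(r"<wbr\s*/?>", re.IGNORECASE)

IE8_WBR = '<a class="wbr"></a>'

def wbr_for_ie8(markup):
    """Replace wbr tags with an empty a element of class wbr."""
    return WBR_TAG.sub(IE8_WBR, markup)

print(wbr_for_ie8("foo/<wbr>bar"))
# foo/<a class="wbr"></a>bar
```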

Alternatively, you can use a trick for setting IE 8 to IE 7 emulation mode, as described previously. It makes wbr work, though it implies restrictions.

You might also consider ignoring this problem, if it does not result in too bad formatting in your case. This problem does not seem to exist in IE 9.

Simple line breaking, not hyphenation

Note that you should not use wbr for hyphenation hints inside words, since no hyphen is introduced if a browser breaks a line due to wbr.

Using wbr for suggested line break possibilities after hyphens, e.g. in words like bird-cage (using the markup bird-<wbr>cage), is probably safe. Since IE by default treats hyphens as allowing a line break after them, this would have no effect on IE, but it may help on other browsers.
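One way to add such hints automatically is a regular expression that applies the minimum constraint suggested at the start of this section: at least two characters on each side of the hyphen. Treating "characters" as Python word characters (letters, digits, underscore) is an assumption of this sketch, and the function name is hypothetical:

```python
import re

# A hyphen with at least two word characters on each side.
BREAKABLE_HYPHEN = re.compile(r"(?<=\w\w)-(?=\w\w)")

def wbr_after_hyphens(text):
    """Insert <wbr> after hyphens that satisfy the two-character rule."""
    return BREAKABLE_HYPHEN.sub("-<wbr>", text)

print(wbr_after_hyphens("bird-cage, F-1, -a"))
# bird-<wbr>cage, F-1, -a
```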

Validation issue

Since wbr is not in the HTML 4.01 or XHTML 1.0 specifications, it causes validation problems against their DTDs, just as nobr does. (HTML5 later added wbr as a standard element.) To create a customized DTD that allows wbr, you can add WBR to the definition of %special and define WBR analogously to BR. There's a modified "loose" DTD for this purpose at <http://jkorpela.fi/html/loosewbr.dtd>.

The Unicode way: ZWSP

Theoretically, you could suggest additional line break points in strings – not as hyphenation hints but as possibilities for breaking e.g. a URL into two lines, without introducing a hyphen – using special characters. Unicode Technical Report #14: Line Breaking Properties specifies that the zero width space character (ZWSP, U+200B, character reference in HTML: &#x200B;) does not have width and "it is used to enable additional (invisible) break opportunities wherever space cannot be used". However, support for this character has been limited in older browsers, and attempts to use it may cause confusion, since some old browsers just display a generic symbol for an undisplayable character in place of it.

Just for demonstration, here is how your browser displays a previous example modified to use ZWSP instead of wbr markup: <http://www.hut.fi/u/jkorpela/html/nobr.html>. Internet Explorer 6 recognizes the character in the sense of treating it so that a line break is allowed after it. But what it displays depends on the font in use and is typically a small box or a bullet, which is rather enigmatic. The problem is that IE does not simply suppress the character in rendering but uses the current font. The only commonly used fonts that have a glyph for ZWSP are Arial Unicode MS and Lucida Sans Unicode. In IE 7, the problem was fixed: the browser effectively treats ZWSP as an invisible control character.

Modern browsers generally handle ZWSP well, and it can be expected to work in the great majority of situations. Perhaps it might even be expected to work more often than the wbr markup.
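For illustration, the sketch below inserts U+200B after each solidus and then writes the result with explicit &#x200B; references, which keeps the invisible character visible in the HTML source. The helper name `zwsp_after` is hypothetical:

```python
import unicodedata

ZWSP = "\u200b"
print(unicodedata.name(ZWSP))  # ZERO WIDTH SPACE

def zwsp_after(text, chars="/"):
    """Insert U+200B after each occurrence of the given characters."""
    return "".join(ch + ZWSP if ch in chars else ch for ch in text)

s = zwsp_after("html/nobr.html")
print(len("html/nobr.html"), len(s))   # 14 15
print(s.replace(ZWSP, "&#x200B;"))     # html/&#x200B;nobr.html
```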

The HTML 4 specification mentions ZWSP in the section White space. It defines ZWSP as a white space character but leaves it rather open how ZWSP should or might be treated in visual presentation. Note that the XHTML specification also discusses ZWSP, in section User Agent Conformance. It seems that those specifications treat ZWSP as a character to be used in some writing systems only, e.g. to denote grammatical boundaries, rather than in the general meaning defined in the Unicode report: "This character does not have width. It is used to enable additional (invisible) break opportunities wherever SPACE cannot be used."

Tricks to create breaking opportunities

Several tricky ways to fool browsers into treating a string as breakable have been suggested, e.g. inserting a very small image or a space character in a very small font. Such ways can be regarded as simulations of zero-width spaces: characters or elements that are taken as separating words but have no width.

Some of them work some of the time, causing confusion when they don't, and if they work, they mostly work on browsers where the wbr method or the ZWSP method works, too.

Using a transparent single-pixel GIF and the markup
<img src="transp.gif" width="0" height="0" alt="">
between two characters has been observed to work on many browsers but to fail on some.

Tricks based on using a space in reduced font size are unreliable, because some modern browsers let the user specify a minimum font size, and many people do so to avoid illegibly small text.

A better alternative is to set the width of a space to zero. You would use markup like <a class=s> </a> and a CSS rule like .s { display: inline-block; width: 0; }. Old versions of IE do not recognize inline-block, but they still set the width to zero, since they apply the `width` property to inline elements, too.

How Unicode rules prevent line breaks

In addition to permitting line breaks where you wouldn't expect them, the Unicode line breaking rules prevent line breaks in intuitively odd situations. For example, if a word is followed by a string that begins with a full stop (period, "."), the rules prohibit a break between the words. And IE (at least IE 6 and IE 7) behaves that way, as we already noted.

Demo: .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo .foo

For example, a sequence of strings like ".foo" is unbreakable on IE, typically creating horizontal scrolling. Such strings are rare, but may arise in special situations.

The quick cure, if wrapping is important, is to insert the wbr tag before each "word" that begins with a period (.foo <wbr>.foo <wbr>.foo ...). The theoretically correct, standards-conforming solution would be to use zero-width spaces, &#x200B;, but in older browsers this often does not work, and when it does not, it usually fails miserably, e.g. by showing a small box or some other visible symbol.
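The quick cure can be automated with a regular expression that puts <wbr> after a space whenever the next "word" starts with a period. As an assumption of this sketch, words starting with a colon are handled as well, since the ":before" case discussed earlier behaves the same way on IE 7. The function name is hypothetical:

```python
import re

# A space followed by a "word" that begins with "." or ":".
SPACE_BEFORE_DOT = re.compile(r" (?=[.:])")

def wbr_before_dot_words(text):
    """Insert <wbr> between a space and a following '.' or ':'."""
    return SPACE_BEFORE_DOT.sub(" <wbr>", text)

print(wbr_before_dot_words(".foo .foo .foo"))
# .foo <wbr>.foo <wbr>.foo
print(wbr_before_dot_words("We use the pseudoelement :before."))
# We use the pseudoelement <wbr>:before.
```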

Getting even more theoretical, if possible: the full stop (period) character "." belongs to line breaking class IS, "infix separator (numeric)", according to Unicode rules, and this means that it prevents a line break before it (as well as after it when followed by a numeric character). The effect of a space is described so that a break is normally permitted after a space (and a space at the end of a line is then effectively treated as nonexistent). So in "foo .text" the space permits a break after it, the period prohibits a break before it, and after spending just a few hours reading Unicode Standard Annex #14 you might find out that the latter wins. I'm just kidding; it says this rather explicitly in rule LB 8: Don't break before a character in class IS even after spaces.
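The interplay of the two rules can be shown with a deliberately simplified model. This is not the UAX #14 algorithm. It only encodes "a break is allowed after spaces, except before a class IS character", and it lists only the ASCII members of class IS (comma, full stop, colon, semicolon). The function name is hypothetical:

```python
# Simplified model, NOT the full UAX #14 algorithm.
IS_CHARS = set(",.:;")  # ASCII members of line breaking class IS

def break_opportunities(text):
    """Return indices where a new line may start under this model."""
    result = []
    for i in range(1, len(text)):
        # Break after a run of spaces, unless the next char is class IS.
        if text[i - 1] == " " and text[i] != " " and text[i] not in IS_CHARS:
            result.append(i)
    return result

print(break_opportunities("foo .text bar"))  # [10]
```

Only the space before "bar" yields a break opportunity; the space before ".text" does not.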

A similar case is a closing quotation mark followed by whitespace and an opening parenthesis. IE does not break at whitespace in such cases, though this can be fixed as outlined above.

This text containing “English style” (typographer’s) quotation marks does not wrap on IE. The browser treats the closing quote and the opening parenthesis as tied together even over a space.

This text containing “English style” typographer’s quotation marks wraps properly. It does not contain parentheses, so it does not exhibit the problem discussed here.

This text containing “English style” (typographer’s) quotation marks wraps properly. There is a line break hint, <wbr>, between the closing quote and the opening parenthesis.
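The same kind of fix can be applied automatically with a regular expression that inserts <wbr> between a closing quotation mark, whitespace, and an opening parenthesis. This is a sketch with a hypothetical function name. Note that for ASCII quotes it cannot tell opening from closing marks, so an opening " followed by a space and "(" would also match:

```python
import re

# Closing quotation mark (curly or ASCII), whitespace, then "(".
QUOTE_PAREN = re.compile(r'([”"])(\s+)\(')

def wbr_between_quote_and_paren(text):
    """Insert <wbr> before "(" when it follows a quote and whitespace."""
    return QUOTE_PAREN.sub(r"\1\2<wbr>(", text)

print(wbr_between_quote_and_paren('This is an "example" (really).'))
# This is an "example" <wbr>(really).
```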

HTML specifications are mostly silent about Unicode line breaking rules. The specifications generally refer to Unicode as a basis for character definitions, so one might say that the spirit is that those rules should be obeyed. And this is what browsers have started doing, unfortunately.

Date of creation: 2000-03-14. Last update: 2013-01-04.

This page belongs to the section Web authoring and surfing of the free information site IT and communication by Jukka "Yucca" Korpela.