nobr markup
Internet Explorer (IE) divides strings across lines in a
problematic way. It treats any hyphen as a potential
word break point, thus even breaking "-a" into "-"
and "a".
Moreover, it treats several special characters as allowed break points, too.
It even splits expressions like "f(0)" into "f" and
"(0)".
Other browsers have similar behavior; Opera may even split
"1/2" into "1/" and "2".
You can prevent line breaks
by using
the nonstandard nobr markup or some other methods.
Consider using such methods when you have a string that
should not be broken across lines and contains any of the following
characters:
-()[]{}«»%°·\/!?
As regards the dual problem
"how can I suggest to a browser that
a string can be broken",
the nonstandard wbr markup may help sometimes,
but it is best to avoid long strings that contain no spaces.
Any character other than letters and digits can cause problems in line breaking, either by causing a line break where it is not appropriate or by preventing a line break between words. Commas, periods, question marks, and exclamation marks are safe, though, when appearing immediately after a word and followed by a space. Otherwise, it is safest to deal with any punctuation mark or special character as follows:
- Use nobr markup if a line break is not acceptable. Examples: <nobr>F-1</nobr>, <nobr>f(0)</nobr>.
- Use a no-break space between a number and a unit. Example: 5&nbsp;m (displays as 5 m).
- Use nobr markup and the wbr tag if a line break is acceptable after a hyphen. Examples: <nobr>bird-<wbr>cage</nobr>, <nobr>inter-</nobr><wbr>organizational. (Here nobr is used to prevent a line break before the hyphen on IE. So it deals with a different issue, but if you add tags for line breaking control, you might just as well solve both problems.)
- Use the wbr tag when a closing quotation mark is followed by a space and an opening parenthesis or bracket. Example: He said “foo”<wbr> (or something like that).
- Use the wbr tag before a string that begins with a character other than a letter or digit. Example: the <wbr>.htaccess file.
- Use soft hyphens as hyphenation hints in long words. Example: ethno&shy;botany (or maybe even eth&shy;no&shy;bot&shy;a&shy;ny).
- Use wbr tags at permissible break points, if you can clearly indicate the continuation of the string to the next line. Example: Go to URL “http://<wbr>test.example.com/<wbr>demos/<wbr>basic.html” now.
- As an alternative to the wbr tag, use <a class=wbr></a> (an element with empty content) with the CSS rule .wbr:after { content: "\00200B"; }
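The recommendations above can be combined in ordinary markup. The following fragment is only a sketch: the sample sentences are invented for illustration, and since nobr and wbr are the nonstandard markup discussed in this document, the result depends on the browser.

```html
<!-- Sketch only: the sentences are made up; nobr and wbr are the
     nonstandard markup discussed in the text. -->
<p>
  The model <nobr>F-1</nobr> and the value <nobr>f(0)</nobr> stay on one line.
  The pole is 5&nbsp;m long.
  A <nobr>bird-</nobr><wbr>cage may be broken after the hyphen only.
  Edit the <wbr>.htaccess file.
  She studies ethno&shy;botany.
</p>
```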
This document discusses the nobr markup
mainly as a tool for preventing bad line breaks around hyphens and
other punctuation or special characters.
It can also be used to keep consecutive words on the same line,
though for such purposes
there is an alternative method which does not require nonstandard markup:
using
no-break spaces (&nbsp;)
instead of normal spaces.
IE may break text into lines before or after a special character, even when no space intervenes. This depends on context and browser version. Such behavior has been observed in the following situations:
The list is most probably not comprehensive.
See my description of ISO Latin 1 for notes on the various uses of these and other characters in various contexts.
This means, for example, that "Latin-1" can be broken to "Latin-" at the end of one line and "1" at the beginning of the next one. It even leaves a lone "-" at the end of a line; this is very bad, not only because of the use of a hyphen as a minus sign but also because e.g. the Finnish language uses words with a leading "-" (to indicate that the first part of a word has been omitted). Even worse, "person(s)" can be broken to "person" and "(s)" and "a[0]" to "a" and "[0]".
As regards guillemets, note that in the French quotation style, « comme ici », it seems to be sufficient to use a no-break space (instead of a normal space) between the guillemet and the first or last word of the quotation, which is recommendable anyway. The document HTML authoring in French contains some notes on spacing related to guillemets and other punctuation.
But in other guillemet usage styles,
»wie
hier«, several versions of IE
seem to break after the initial guillemet
(under some conditions, i.e. when
the current line has room for that lone guillemet
but not for the word immediately after it).
It seems that this problem has been fixed in IE 6.0.
But people using older versions could still see
homeless guillemets unless you use
nobr to prevent that.
Generally, it's probably best to accept that risk –
except perhaps in headings and other very essential texts,
where you might consider whether you wish to take some precautions.
Breaking after the percent sign %
is very nasty in some situations where that character is used
in a special meaning in a programming language or otherwise
(e.g. URLs).
In particular, expressions like %foo and %20
can be split so that % appears alone at the end of a line!
Luckily IE does not seem to split strings like "%:n".
Breaking after the degree sign ° is harmful too, especially in contexts like "100 °C".
In my tests, IE has broken a line after the cent sign ¢ too. But I haven't seen any real-life situation where that could occur. (IE does not break "¢:n" for example.)
Moreover, IE treats the combination &# as breakable.
This is very nasty in a document discussing
character references like
&#196;. It seems that this problem
has been fixed in IE 5.
IE also introduces a break point between the solidus and the
reverse solidus in the combination /\.
This is admittedly a rare character sequence, but I noticed this problem
when actually using it in a specific context where
it is essential.
The description above is probably not exhaustive, and the IE behavior may depend on version number, platform, and context (specific character sequences). I haven't tested much what IE does with characters outside ISO Latin 1, but IE has broken a line after the following "Windows characters": ellipsis, per mille sign, bullet, em dash, en dash, right single guillemet, right single quote, right double quote, and before the following: left single guillemet, left single quote, left double quote.
Quite often, splitting a word containing a hyphen is acceptable, even desirable. It can make the document look better, especially if there are long words in the text, including compound words containing a hyphen. But the problem is that IE treats any hyphen as a possible line break point and breaks even a three- or two-character string.
Moreover, breaking after some special characters, though occasionally useful, is a potential source of great confusion and even ambiguity.
There are several alternative approaches to preventing undesired line breaks, and since they operate at different levels of text representation, they could even be used in combination. The approaches are summarized in the following table. In the first four approaches, the example shows how to prevent a line break between the number and the unit in "100 m". In the other approaches, the example expression is "A-1".
| Level | Method | Example | Notes |
|---|---|---|---|
| HTML markup | nobr element | <nobr>100 m</nobr> | Nonstandard, but works widely. |
| HTML markup | nowrap attribute | <td nowrap>100 m</td> | Only applicable to table cells. Severe limitations in support. |
| CSS style sheet | white-space: nowrap | <span style="white-space: nowrap">100 m</span> | Usually works when set on the innermost element. |
| Character level | no-break space | 100&nbsp;m | Works well to prevent inter-word breaks but not for the problems discussed here. |
| Character level | non-breaking hyphen | A&#8209;1 | Applies to preventing a break after a hyphen. Limited usefulness due to font problems. |
| Character level | word joiner | A-&#8288;1 | Wide applicability in principle, but very limited usefulness due to lack of support. |
| Character level | zero-width no-break space | A-&#65279;1 | Very limited usefulness due to lack of consistent support on IE. |
There are many quirks and oddities in browser support for these
methods. There is a
page for testing the methods
in some simple cases.
Generally, nobr is the one that works most often.
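A simple test of the methods in the table could be built along the following lines. This is a sketch, not the test page referred to above: the class name "box" and the 2.5em width are arbitrary assumptions, chosen only so that a line break is likely to be needed somewhere in "xx A-1". The width that actually triggers a break depends on the font.

```html
<!-- Sketch of a test page. The class name "box" and the width are
     arbitrary; adjust the width until a break is needed in "xx A-1". -->
<style>
  .box { width: 2.5em; border: 1px solid; margin: 0.5em 0; }
</style>
<div class="box">xx A-1</div>                <!-- unprotected -->
<div class="box">xx <nobr>A-1</nobr></div>   <!-- nobr element -->
<div class="box">xx <span style="white-space: nowrap">A-1</span></div>
<div class="box">xx A&#8209;1</div>          <!-- non-breaking hyphen -->
<div class="box">xx A-&#8288;1</div>         <!-- word joiner -->
<div class="box">xx A-&#65279;1</div>        <!-- zero-width no-break space -->
```

If a box shows "A-" at the end of one line and "1" at the start of the next, the method used in that box did not prevent the break in the browser tested.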
nobr markup
The safest way to prevent undesirable line breaks is
the nonstandard,
Netscape-invented (!) nobr markup.
It has never been defined exactly.
Browsers generally treat it in a command-like fashion:
<nobr> is taken as "disallow line breaks
from now on" and
</nobr> says "line breaks allowed from now on".
But it is safest to use it as
text-level markup
only. This should suffice, since we normally would use
nobr for short pieces of text
only, as in
<nobr>vis-a-vis</nobr>
or
<nobr>-a</nobr>.
A very short quotation using guillemets could be put
into a single nobr element. But generally the approach
of making a quotation as a whole non-breakable is not suitable.
Instead, you can put just the initial guillemet and the first word inside
nobr markup, and similarly for the last word and
the closing guillemet.
Moreover,
the markup will prevent hyphenation too.
If you now use, say, the markup
<nobr>»Anführungszeichen«</nobr>,
then future browsers that will
apply hyphenation to words
will not do that for this word.
For such reasons, you may wish to make the scope (content)
of the nobr element minimal, even though it
looks slightly odd in the markup, e.g.
<nobr>»A</nobr>nführungszeiche<nobr>n«</nobr>.
The nobr markup
could also be used to keep images on one "line",
side by side.
As an alternative to using nobr,
you could put images into a
table.
&nbsp; won't do
We cannot, in general, solve the problems discussed here by using
no-break spaces (&nbsp;).
If you insert a no-break space character, you insert a character
which is like a space but may not be replaced by a line break
in formatting.
Normally this is not desirable, since you don't want an
expression like "person(s)" rendered as "person (s)",
do you?
In special cases it might appear to be acceptable and even desirable to have extra horizontal spacing, especially in situations like f (0); but it is a moot point, as explained in my notes on mathematical notations in HTML.
nowrap may work
In special cases,
when the data is in a
table cell
and it is adequate to prevent all line breaks in it,
you can use the nowrap attribute for the td
or th element. That attribute is "deprecated", but
it is still valid.
Note, however, that browsers seem to ignore it, if a fixed width is set for the cell so that the content doesn't fit into that without wrapping. This happens for a fixed width in pixels, em units, etc.
white-space: nowrap
Setting
white-space: nowrap in CSS has
the same effect on an element as wrapping it inside
nobr markup. The reasons for preferring nobr are:
- nobr markup works even when CSS is disabled (see the usual CSS caveats).
- nobr markup is more compact: compare <span class=nobr>A-1</span> with <nobr>A-1</nobr>.
If the text that needs to be kept on one line has already
been made an element for some other reason, then it is slightly
more convenient to use CSS than the nobr markup.
For example, if your page discusses programming and mentions an
expression like i-- marked up as
<code>i--</code>,
then it is simplest to just add an
attribute there:
<code class=nobr>i--</code>. Of course, you would then need
a rule like
.nobr { white-space: nowrap }
in your CSS code.
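Put together, the markup and the style sheet rule might look like the following. This is a minimal sketch; the class name nobr follows the text above, and the surrounding sentence is invented.

```html
<!-- Sketch: an existing code element gets a class instead of extra
     nobr markup. The sentence is made up. -->
<style>
  .nobr { white-space: nowrap; }
</style>
<p>The statement <code class="nobr">i--</code> decrements the variable.</p>
```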
Historically, the white-space property
has been problematic. It was originally defined as relating to
white space characters only, and its name suggests the same.
It was also oddly limited to block level elements. However, CSS 2.1
has cleared things up, and browser support has been good ever
since IE 6.
There is a bug in some versions of IE: if you set
white-space: nowrap for a table cell
it may fail to work. To overcome this, put the cell contents inside
an auxiliary element (usually span or div) and
set
white-space: nowrap on that element.
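The workaround could be sketched as follows. The class name "nobr" and the cell content are assumptions made for the example.

```html
<!-- Sketch: white-space: nowrap is set on an auxiliary span inside
     the cell rather than on the td element itself. -->
<style>
  .nobr { white-space: nowrap; }
</style>
<table>
  <tr>
    <td><span class="nobr">Latin-1 (ISO 8859-1)</span></td>
  </tr>
</table>
```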
In theory, the Unicode standard says that
the character to be used to prevent line breaks in general is the
word joiner (WJ) character
(U+2060, representable in HTML as &#8288;).
This would mean that in order to keep e.g. the string
2003-03-24 on one line you would use
2003-&#8288;03-&#8288;24. Here is the
rendering on your browser:
2003-03-24. The rendering
may contain indications of unrepresentable characters, since
many browsers do not recognize the character as having the special status,
so they will just try to display it; and
most fonts do not contain any glyph for the character.
Even when a rare font (e.g., Code2000) containing a glyph for the word joiner is used, the method fails. IE still divides the text the same way as without the word joiner, apparently because it does not know the special meaning of this character.
It's no wonder that this method does not work (yet), since the word joiner character was introduced into Unicode as late as Version 3.2 (date: 2002-03-27). It generally takes many, many years before characters added to Unicode are widely supported in fonts.
We might also consider using the zero-width no-break space (ZWNBSP) character. Within text, it has the same meaning as the word joiner, and it has been in Unicode much longer. It is recognized by IE in the sense that the browser knows that it has no glyph, but it depends on IE version whether it prevents a line break; on IE 7, it does not. In principle, its use for the purpose of preventing line breaks has been officially discouraged, since it is also used as a byte order mark, and it was deemed appropriate to reserve it for that very purpose.
In practice, you can represent ZWNBSP as
&#xfeff; or
&#65279;.
Thus, in order to keep e.g. the string
2003-03-24 on one line you would use
2003-&#65279;03-&#65279;24. Here is the
rendering on your browser:
2003-03-24.
nobr
Since nobr is not in any HTML specifications,
it causes problems in validation.
You can use nobr and still validate your pages,
if you use a modified Document Type Definition.
For an explanation, see
Creating your own DTD for HTML validation.
(If you use HTML5, you just have to ignore the error messages,
since HTML5-based “validation” is based on the prose
of HTML5 drafts, not on DTDs, and the drafts currently forbid
the nobr markup as “obsolete”.)
Basically you would use a DTD containing the following modified clause (the addition is NOBR):
<!ENTITY % phrase "EM | STRONG | DFN | CODE | NOBR | SAMP | KBD | VAR | CITE | ABBR | ACRONYM" >
I have made modified versions of HTML 4.01 DTDs:
http://jkorpela.fi/html/strict.dtd (for Strict HTML 4.01)
http://jkorpela.fi/html/loose.dtd (for Transitional HTML 4.01)
http://jkorpela.fi/html/frameset.dtd (for framesets)
It is preferable to copy the DTD you need, instead of
referring to the addresses above.
You would use a DOCTYPE declaration like the following
(instead of a normal DOCTYPE which uses the
PUBLIC keyword), naturally replacing the
address inside quotation marks as needed:
<!DOCTYPE HTML SYSTEM "http://jkorpela.fi/html/strict.dtd">
Note that the W3C validator
has an internal limitation ("the number of tokens in a group must not exceed
GRPCNT (64)") which prevents you from using it in cases like this;
the addition of NOBR happens to exceed that limit!
So you need to use e.g. the
WDG validator instead or to modify the DTD more, by
removing something (like I have actually removed acronym
from the Transitional DTD mentioned above) when you add
nobr.
Naturally, if you use a modified DTD, you can't
claim conformance to HTML 4.01 or some other specification.
If your customer or boss requires such conformance, you are
out of luck. You might still try to explain to him that your
use of nobr is a workaround to an IE bug/deficiency.
The common hyphen character, or the Ascii hyphen, is semantically ambiguous, as explained in the document Dashes and hyphens. Therefore it could be replaced by other, semantically more specific Unicode characters, hoping that browsers do not regard them as line break points. This may also affect rendering.
However, many of the alternative characters are poorly supported
in fonts and by browsers. Writing them can be an issue, too, but
they can be presented using
character references
such as
&#8209; for the non-breaking
hyphen character or &minus; for
the minus sign.
| Character | Sample |
|---|---|
| Ascii hyphen | bird-cage |
| Minus sign | bird−cage |
| Unicode hyphen | bird‐cage |
| Non-breaking hyphen | bird‑cage |
| En dash | bird–cage |
| Em dash | bird—cage |
The demo here illustrates how your browser handles some hyphen-like characters when a line break would be needed. The second column contains a sample expression in a box with fixed width. Some of the expressions are contrived, just to create comparability; in particular, the minus sign should never appear between letters without spaces.
The minus sign is defined in Unicode as a separate character, and this character is preferable in texts and formulas. However, in programming languages and in discussing them, the Ascii hyphen should be used, as it is the character used in programming by language definitions. Example:
The number −42 is written as -42 in JavaScript.
In addition to being semantically more appropriate and visually
much more suitable, the minus sign has the property of disallowing
a line break after it, when immediately followed by a non-space
character. Thus, there is no line break issue with
−42 (which can be written in HTML as &minus;42).
When used as a binary operator (for subtraction), the minus sign
should be preceded and followed by a space, according to standards on
mathematical expressions. Undesired line breaks (normally,
at least before the operator) can be prevented by using no-break spaces. For
example, a − b can be written as
<i>a</i>&nbsp;&minus;&nbsp;<i>b</i>.
The Unicode hyphen has mostly the same line breaking properties as the Ascii hyphen. There is not much point in using it, because it mostly looks the same as the common hyphen, if the font contains it at all.
An undesired line break e.g. in a-b can in principle be prevented by writing it as a‑b. However, although such constructs are well supported by modern browsers, the support is not yet universal, and the data gets badly distorted when a browser does not support them. The main problem is that many common fonts do not contain the non-breaking hyphen, and browsers may render e.g. a small rectangle or question mark instead. (In fact, Arial Unicode MS and Lucida Sans Unicode are the only fairly common fonts that contain it.)
In good old typesetting,
the em dash is used e.g. for
parenthetic remarks—like this.
These days, the en dash and the em dash can be used rather confidently.
They have, however, some of the problems of hyphen characters. According
to line breaking rules, and in the practices of some browsers, either
of these characters allows a line break after it, even if no space
intervenes. This complies with old typographic rules, but for the en dash,
it results in poor breaks in practice in short expressions like
1–2. Therefore, just as for similar expressions with
hyphens, there is often a reason to prevent line breaks:
<nobr>1–2</nobr>.
If you use the em dash in the—mostly American—style where
it appears between words without spaces, use wbr after each
em dash. Otherwise many browsers will treat the em dash as glueing
words together and will not consider breaking a line after the em dash.
Example: in the—<wbr>mostly
American—<wbr>style where.
If
hyphens are used as surrogates for dashes,
the browser behavior of breaking a line
after a hyphen often is highly undesirable. An especially bad case is
5&nbsp;-&nbsp;6
where a hyphen surrounded by some space is used as a surrogate for
a dash, and no-break spaces are used to prevent (generally, on
any browser) bad line division like
5
- 6
but IE, although it usually obeys the semantics of the no-break space
character, may split after the hyphen (between it and the no-break
space):
5 -
&nbsp;6
Note that the second line is indented, since it begins with a no-break space character! The IE behavior violates the semantics of the no-break space at least as defined in UTR #14: Line Breaking Properties, according to which the action of non-breaking characters like the no-break space is "to glue together both left and right neighbor character such that they are kept on the same line".
Such madness can be prevented by
using nobr. Then you might just as well
use normal spaces instead of no-break spaces:
<nobr>5 - 6</nobr>.
But since you
need to do something special anyway, you might just as well
use the en dash: <nobr>5–6</nobr>.
Consecutive hyphens appear in some special cases, such as surrogates for em dashes, between surnames in official French usage, and as operators in programming languages. Some browsers break only after the last of consecutive hyphens, but some versions of IE may introduce breaks elsewhere as well.
It is thus useful to protect consecutive hyphens against wrong breaks.
For example, the French expression
Martin--Durant can be marked up as
Marti<nobr>n--</nobr>Durant.
IE 8 oddly treats the hyphen as allowing a break both before and after it. No explanation for this has been given. This behavior has been retained in IE 9.
In particular, IE 8 may break a sequence of hyphens at any point.
Breaking before a hyphen,
e.g. breaking "foo-bar" into "foo" and "-bar",
normally makes no sense, of course. There is no simple way to prevent it
on IE 8. You can use various methods of preventing line breaks,
as discussed in this document, but this may
be awkward when strings with hyphens occur frequently. For example,
any normal compound like bird-cage would need to be coded
e.g. as follows (if a break after the hyphen is regarded as
acceptable):
<nobr>bird-</nobr><wbr>cage.
An alternative approach is to use the trick developed by Microsoft
for throwing IE 8 into
“IE 7 standards mode.” One way to do this is to use the
following tag (in the head part):
<meta http-equiv="X-UA-Compatible" content="IE=EmulateIE7">
This tag should be placed before any style sheet or
reference to a style sheet. Unfortunately, the tag also disables
all the CSS enhancements in IE 8 as compared with IE 7; for a list
of them, consult Microsoft's document
CSS Compatibility and Internet Explorer.
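In context, the placement of the tag might look like the following sketch. The title text, the style sheet name style.css, and the body content are hypothetical; only the meta tag itself comes from the text above.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <!-- Placed before any style sheet or style sheet reference -->
  <meta http-equiv="X-UA-Compatible" content="IE=EmulateIE7">
  <title>Example page</title>
  <!-- style.css is a hypothetical file name -->
  <link rel="stylesheet" type="text/css" href="style.css">
</head>
<body>
  <p>A bird-cage and a foo-bar.</p>
</body>
</html>
```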
Telephone (and fax) numbers cause line breaking problems fairly
often. This might happen in tables: if you make a telephone catalogue a table, putting
the numbers into one column, a browser might break the content of
a number cell into two lines. This is not very nice, but you
can prevent it by using the
nowrap attribute
as explained above. Assuming that the numbers are written
as suggested below, you could alternatively just use
no-break spaces instead of spaces, and this is the right
approach when a phone number does not appear inside a table.
Problems are often caused by the varying ways that people
use to present phone numbers. Consider the following example:
+358 (0)9 451 4319
When used inside running text, this could create the problem
discussed here, since IE might split it after the closing parenthesis.
The appropriate solution is to use a correct notation, as
standardized in the
E.123 recommendation by
CCITT/ITU-T
and in different national standards based on it.
See
E-Series Recommendations Excerpts for an overview.
In particular, you should use either the national use notation
like
(09) 451 4319
or the international use notation
+358 9 451 4319
or, when applicable, both of them.
(The parentheses should be omitted from the national use notation
if the first part of the number is not an area code
that can be omitted when dialing inside the area but a part of the
number that must always be dialed.)
Luckily it seems that when you write numbers that way, using
no-break spaces instead of normal spaces, IE doesn't split them
even after the closing parenthesis.
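In markup, the two approaches could look like this. The fragment is a sketch: the sentence and the table row label are invented, and the number is the example number used above.

```html
<!-- In running text: no-break spaces inside the number -->
<p>Call (09)&nbsp;451&nbsp;4319 or, from abroad,
+358&nbsp;9&nbsp;451&nbsp;4319.</p>

<!-- In a table: the nowrap attribute on the number cell -->
<table>
  <tr><th>Name</th><th>Phone</th></tr>
  <tr><td>Example office</td><td nowrap>+358 9 451 4319</td></tr>
</table>
```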
Opera seems to treat hyphens mostly the same way as IE. Opera may even split "-a" to "-" and "a". So it shares the same basic problem as IE, and the same workarounds are effective.
In part, Opera behaves even more wildly than IE. For example, it has been observed to treat a slash (solidus) as a break character in the sense that a line break is permitted after it. This is actually useful when long URLs appear in the text, and in some other cases. But Opera may even break "1/2" into "1/" and "2"! (For this particular case, you can avoid the problem using the vulgar fraction character ½.)
Some versions of Opera may even insert a line break before a plus sign e.g. in the expressions a+b, C++ and U+1234.
Firefox may break after a hyphen but only if there are several characters on both sides of it. Firefox may also break before a backslash (\).
Chrome has oddities like breaking after a question mark (?) and before some currency symbols (¢, £ etc. but not $) and before the degree sign (°) and plus-minus sign (±).
I have composed a simple linebreaking test file that covers the nonalphanumeric characters in ISO Latin 1 as well as some common characters outside it.
The HTML specifications say:
Except for preformatted elements - -, each block structuring element is regarded as a paragraph by taking the data characters in its content and the content of its descendant elements, concatenating them, and splitting the result into words, separated by space, tab, or record end characters (and perhaps hyphen characters). The sequence of words is typeset as a paragraph by breaking it into lines.
In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character. The soft hyphen tells the user agent where a line break can occur.
The former allows, with the words "(and perhaps hyphen characters)", line breaking at any hyphen. So from this perspective, IE's behavior with hyphens is not a bug, though of poor quality. The latter on the other hand seems to prohibit such line breaks. Only soft hyphens would be permissible line breaking positions. This is a highly debatable approach, on two accounts: the soft hyphen problem (the meaning of the character and the way browsers handle it), and the fact that this would seem to mean that the normal hyphen is regarded as nonbreakable. It's hard to tell whether the latter was actually meant, but this part of the specification seems to imply that IE's breaking behavior is to be classified as a bug.
It is probably not the intent in any of the specifications to prohibit word division when it occurs according to the rules of the natural language used and is otherwise acceptable. Since IE does not recognize the language, it should not split words, or any strings of non-whitespace characters. For special characters, the issue is rather complicated; see Unicode Technical Report #14: Line Breaking Properties (technicality alert!), which seems to allow a "direct break" between an alphanumeric character and an opening parenthesis, for example. This is hardly a good idea in general, though. It would even allow very dishono(u)rable breaks! I have written some more notes on these issues: Unicode line breaking rules: explanations and criticism.
Alan Wood has drawn my attention to situations where a very long string contains hyphens and the presentation is much better if a browser can use hyphens as potential line break points. Such strings occur in chemistry, in systematic names of chemical compounds, as demonstrated well by Alan's page Abamectin fact sheet.
Admittedly, in most cases a hyphen is a permissible line break point, if reasonable constraints are imposed so that e.g. a string like "-a" is not broken. It's hard to tell where the limits should be drawn, though. I think the absolute minimum is that there must be at least two characters on each side of the hyphen. And even when reasonable constraints are used, I think it's wrong for a browser to split after hyphens unless there is an effective and standards-conforming way to prevent such splits when they are wrong. As this document shows, the condition is currently not fulfilled. Moreover, I think there is insufficient information as regards whether the rule about allowing a break after a hyphen is really a universally acceptable default or whether it should be applied only when the document language is specified in HTML markup and it is known to the browser that the rules of that language allow such breaks.
Web browsers generally don't do word division; the IE behavior
discussed above is an exception and applies to some special characters only.
We mostly just have to live with that, until browsers become better and
start recognizing the language used in a document
(hopefully from lang attributes)
and applying language-specific hyphenation rules.
Explicit hyphenation hints are often suggested as a potential
solution, but they are clumsy.
However, browsers now support the
soft hyphen
reasonably well, so that it may be feasible
to give hyphenation hints for words in special cases, e.g.
for very long words.
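As an illustration, hyphenation hints for a single long word might be given as follows. This is a sketch: the sentence and the 6em box width are assumptions, chosen only to make a division likely, and the division points are those used for the same word earlier in this document.

```html
<!-- Sketch: soft hyphens (&shy;) mark permissible division points.
     The narrow box only serves to make a division likely. -->
<div style="width: 6em; border: 1px solid">
  A short course on eth&shy;no&shy;bot&shy;a&shy;ny.
</div>
```

A browser that supports the soft hyphen shows a hyphen only at the point where it actually divides the word.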
Moreover, browsers generally don't break long strings that don't contain spaces. Sometimes we have "words" (sequences of characters with no spaces inside them) that are so long that the presentation becomes very bad if the browser can't split them. This typically happens in more or less technical contexts, like writing about long URLs. (Long URLs aren't usually a problem, because they appear only in attribute values in HTML markup, not as the textual content. But sometimes you have a real need to mention a URL e.g. as an example.)
Moreover, browsers may refuse to break even at spaces,
i.e. to separate strings from each other even
if there is a normal space between them.
Internet Explorer 7 seems to behave that way when the latter string
begins with a colon ":". Such strings are common in some special areas.
For example, if you discuss CSS and mention
"the pseudoelement :before",
IE 7 would treat the string as indivisible,
unless you include some explicit line breaking hint before the colon. This would result in poor rendering like the following:
If you mention the pseudoelement :before in the text, IE 7 would treat the
string "pseudoelement :before" as indivisible. That is, it
would not put "pseudoelement" on one line and ":before"
on the next.
IE 7 has similar problems with other characters, such as a period ".", at the start of a string.
Moreover, IE 7 does not divide between a closing quotation mark and an opening parenthesis. This happens both for typographically correct (for English) “curly” quotation marks and for "vertical" (ASCII) quotation marks.
There are two different issues here:
C:\Program Files\Netscape\Communicator\
Program\NetHelp\Netscape
Splitting a word containing a hyphen, e.g. "bird-cage", belongs technically to the latter category: no new hyphen is introduced.
If a URL, filename or some similar construct is split over several
lines, one should clearly indicate what's going on
so that the user knows what the actual URL or filename is.
Usually an explicit note is needed. In some cases, however, you could use the
method of delimiting a URI in context
with the characters < and >
or with quotation marks,
as suggested in appendix C of
RFC 3986.
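A sketch of the delimiting approach, combined with wbr tags at permissible break points, might look like this. The wording of the note is an assumption; the URL is the example address used earlier in this document.

```html
<!-- Sketch: angle brackets (written as &lt; and &gt;) delimit the URL,
     and wbr suggests break points after slashes. -->
<p>Go to &lt;http://<wbr>test.example.com/<wbr>demos/<wbr>basic.html&gt; now.
(The address between the angle brackets may have been split across lines.)</p>
```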
Word processors are usually capable of automatic and semi-automatic hyphenation, but web browsers aren’t. Currently the only way to make them do any word division (except breaking after a hyphen) is to use the soft hyphen character.
The soft hyphen can be entered by using the entity reference &shy;. Alternatively,
it can be written as such, but beware that when editing
an HTML document, the soft hyphen is usually invisible.
On Windows, it can be entered
using Alt+0173.
There are different strategies you can apply:
The use of soft hyphens cannot deal with complexities in word
division, such as changes in the word when hyphenated
(see examples of this in
Use
of Soft Hyphen in UAX #14).
However, there is a special case that can be handled: in some languages
and styles, word division after a hyphen is done so that the next line
starts with a hyphen, while a hyphen also appears at the end of the first line.
E.g., the Polish word czerwono-niebieska would thus be
split as follows:
czerwono-
-niebieska.
According to the Unicode standard, the recommended practice
in such cases is
a soft hyphen followed by a non-breaking hyphen, instead of a single
normal hyphen. In HTML, it is safer to apply the idea by using a soft hyphen
followed by an ASCII hyphen with HTML markup that prevents a division
after the latter, e.g.
czerwono&shy;<nobr>-nie</nobr>bieska.
word-wrap property
According to Microsoft's documents,
IE supports, from version 5.5 onwards, a CSS property named
word-wrap, to affect situations where
some content exceeds the boundaries of its container.
(IE 8 introduced the synonym
-ms-word-wrap for this property.)
The value
normal is the normal behavior
(content exceeds the boundaries of its container) whereas
break-word means that
"Content wraps to next line, and a word-break occurs when necessary."
This formulation is misleading, since what really happens is
splitting a string as described above. So this has
nothing to do with normal processing of words of human
languages, and the string just breaks, with no indication of
its continuing on the next line.
The following example tests
whether your browser supports this feature, which is not in any
CSS specifications (though it is present in
the CSS3 Text
draft). The test has two div elements
with a fixed width
of 150 pixels and content that most probably exceeds the width,
and the second element has
word-wrap: break-word assigned to
it.
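A test of the kind described could be sketched as follows. This is not the original test markup: the class names and the sample string are assumptions chosen for illustration, and the border is added only to make any overflow easy to see.

```html
<style>
  /* Both boxes are 150 pixels wide; the border shows where the box ends. */
  .box { width: 150px; border: 1px solid; margin-bottom: 1em; }
  /* Only the second box asks the browser for emergency breaks. */
  .wrap { word-wrap: break-word; }
</style>

<div class="box">
  Averyveryverylongstringwithoutanyspacesorhyphensinit
</div>
<div class="box wrap">
  Averyveryverylongstringwithoutanyspacesorhyphensinit
</div>
```

If the browser supports the property, the string in the second box should be split to fit inside the border, while the string in the first box runs past it.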
Some other browsers, such as Firefox (3.1 and newer) and Safari,
have started supporting word-wrap to some extent.
This feature can be used to produce “emergency breaks” as the least of evils in some situations. Perhaps most often, it could be useful when code expressions like programming language variables or lines in block samples of code are too long to fit into the available space. The presentation might be understandable to the user, especially if accompanied by a note like “long lines in code samples may have been split for technical reasons”. Unfortunately, this does not work too well: the setting may make browsers primarily break after spaces and some special characters like “/”.
In a technical context, if a paragraph contains a long
variable name like
super
and its font face or color or other property distinguishes it from
normal text, the user probably sees it as one identifier.
Hyphenation would not be suitable, as the hyphen could be misinterpreted
as being part of the identifier. Unfortunately,
white-space: pre-wrap
is not effective in such cases, unless the column is so narrow that a word
does not fit into one line even if it is alone there.
Instead, you need to scatter wbr tags around in the string.
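As a rough sketch of this, the markup below allows a long identifier to be split without introducing a hyphen. The identifier is hypothetical, and placing the break points at the boundaries between its component words is just one possible choice.

```html
<!-- Each <wbr> marks a point where the browser may split the string.
     No hyphen is added at a break, so the identifier still reads as
     written. -->
<p>The limit is stored in
<code>maximum<wbr>Allowed<wbr>Connection<wbr>Timeout<wbr>Milliseconds</code>
before the request is sent.</p>
```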
You can of course specify explicit line breaks using
br markup
(or pre markup).
This is often the best way to deal with the problem at hand.
However, it is rather inflexible when the long strings occur within
running text - where you would like to have the browser do its
normal formatting so that the presentation adjusts to the available
canvas width but so that line breaks may occur at some points inside
the problematic strings.
wbr ("word break")
wbr
According to Netscape's old HTML Tag Reference,
The WBR tag marks a place where a line break can take place. It does not necessarily always result in a line break; rather, it says that line breaks are allowed at this place. You could use it, for example, inside a NOBR tag to allow a line break. Navigator 1.1
The description is vague, but actual implementations treat wbr
as allowing a "string split" rather than
normal word division, i.e. no hyphen is introduced.
wbr
The following markup samples illustrate possible uses
of wbr:
Enter the command View/<wbr>Preferences/<wbr>Display/<wbr>Common.
We use the pseudoelement <wbr>:before.
Check your <wbr>.htaccess file.
This is an "example" <wbr>(really).
Moreover,
wbr could be used to suggest possible break points
using markup like
<http://www.hut.fi/<wbr>u/<wbr>jkorpela/<wbr>html/<wbr>nobr.html>.
Here is a more complicated example as displayed on your browser:
<http://. It's probably a good idea to use <wbr>
only after such characters that visually suggest that there might be something
that follows, such as a solidus, an equals sign, or an ampersand.
One might consider using nobr and wbr
together to divide text into chunks which may not be split across
lines but so that line breaks between the chunks are permitted.
But there does not seem to be any particular reason to use wbr
for such purposes.
You can usually just put each chunk into a nobr
element of its own, with normal spaces or line breaks between them of
course.
wbr
The wbr element is empty (no end tag is used)
and it has no attributes. That is, it is always just the tag
<wbr>.
It is comparable to
<br> except that
<br> forces a line break in rendering whereas
<wbr> allows a line break.
Should you wish to use it in XHTML, you would have to use the
XML convention for empty elements, i.e.
<wbr />.
wbr (historical)
Other browsers such as IE
support wbr, too.
However, reports on browser support are often misleading.
E.g., Opera was claimed to support it long before it actually did.
On the other hand,
in many cases where you would use wbr, this does not
matter, since
Opera automatically splits long strings e.g.
after a "/" character.
It seems that around version 4.5 or so,
Netscape itself
dropped support for wbr.
The support
was restored in Netscape 6 and later dropped again!
Anyway,
this is ancient history now, since Netscape browsers
have very few users, and support for wbr
exists in Mozilla Firefox.
Moreover, as if this were not mad enough, it seems that on IE 5,
deviating from IE 4 and IE 6,
wbr works only inside nobr markup, so
you would have to use it too, around the string. But then Opera
would treat the string as unbreakable, since it recognizes
nobr but not wbr.
Sufficiently new versions of Opera will
handle wbr in the intended way, if you use the following
style sheet, which effectively tells the browser to insert U+200B,
i.e. ZWSP after each
wbr element:
wbr:after { content: "\00200B"; }
Newest versions of Opera seem to support wbr
natively.
For some odd reason, IE 8 stopped supporting
wbr in “standards
mode”, so now some additional trickery may be needed.
IE 8 does not even recognize the wbr element in
“standards
mode,” so it does not help to assign a CSS rule for that
element. Instead, we need to use extra markup that introduces
an empty element, for which we can add ZWSP as generated content.
(If we used ZWSP as character data in document content, we would
have problems with older browsers, as described later.)
For example, to allow a line break after
“/” in
“foo/bar”,
we could write e.g.
foo/<wbr><a class="wbr"></a>bar
and use the CSS rule
.wbr:after { content: "\00200B"; }
The use of the a element instead of the
span element here is just a convenience:
shorter element name. As such,
without any attributes except class,
the a element
is semantically empty.
Alternatively, you can use a
trick for setting IE 8 to IE 7 emulation mode
as described previously.
It makes the wbr work, though it implies restrictions.
You might also consider ignoring this problem, if it does not result in too bad formatting in your case. This problem does not seem to exist in IE 9.
Note that you should not
use wbr for hyphenation hints inside words,
since no hyphen is introduced if a browser breaks a line
due to wbr.
Using wbr for suggested line break
possibilities after hyphens, e.g. in words like
bird-cage (bird-<wbr>cage),
is probably safe. Since IE by default treats hyphens as allowing
a line break after them, this would have no effect on IE,
but it may help on other browsers.
Since wbr is not in any HTML specifications,
it causes validation problems just as nobr
does. To create a customized DTD that allows wbr, you can
add WBR to the definition of %special
and define WBR analogously with BR.
There's a modified "loose" DTD for the purpose at
<http:/
Theoretically, you could suggest additional line break points
in strings – not as hyphenation hints but as possibilities
for breaking e.g. a URL into two lines,
without introducing a hyphen – using special
characters.
Unicode Technical Report #14:
Line Breaking Properties specifies that the
zero width space character (ZWSP,
U+200B, character reference in HTML:
&#8203;)
does not have width and
"it is used to enable additional (invisible) break opportunities
wherever space cannot be used".
However, support for this character in browsers is still
limited, and attempts to use it may cause confusion,
since some old browsers just display
a generic symbol for an undisplayable character
in place of it.
Just for demonstration, here is how your browser displays a previous example
modified to
use ZWSP instead of wbr markup:
<http://www.hut.fi/u/jkorpela/html/nobr.html>.
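In source form, the ZWSP variant of that example could be written roughly as follows. This is a sketch that places the character reference at the same points where the earlier example had wbr tags.

```html
<!-- &#8203; is U+200B ZERO WIDTH SPACE, written as a character
     reference so that it stays visible in the source. -->
<p>&lt;http://www.hut.fi/&#8203;u/&#8203;jkorpela/&#8203;html/&#8203;nobr.html&gt;</p>
```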
Internet Explorer 6 recognizes the character in the
sense of treating it so that a line break is allowed after it.
But what it displays depends on the font in use and is typically
a small box or a bullet, which is rather enigmatic.
The problem is that IE does not simply suppress the character
in rendering but uses the current font. The only commonly used fonts
that have a glyph for ZWSP are
Arial Unicode MS and Lucida Sans Unicode.
In IE 7, the problem was fixed:
the browser effectively treats ZWSP as an invisible control character.
Modern browsers generally handle ZWSP well, and it can be expected
to work in the great majority of situations. Perhaps it might even be
expected to work more often than the wbr
markup.
The HTML 4 specification mentions ZWSP in the section White space. It defines ZWSP as a white space character but leaves it rather open how ZWSP should or might be treated in visual presentation. Note that the XHTML specification also discusses ZWSP, in section User Agent Conformance. It seems that those specifications treat ZWSP as a character to be used in some writing systems only, e.g. to denote grammatical boundaries, rather than in the general meaning defined in the Unicode report: "This character does not have width. It is used to enable additional (invisible) break opportunities wherever SPACE cannot be used."
Several tricky ways to fool browsers into treating a string as breakable have been suggested, e.g. inserting a very small image or a space character in a very small font. Such ways can be regarded as simulations of zero-width spaces: characters or elements that are taken as separating words but have no width.
Some of them work
some of the time, causing confusion when they don't, and
if they work, they mostly work on browsers where the
wbr method
or the ZWSP method works, too.
Using
a transparent single-pixel GIF and the markup
<img src="transp.gif" width="0" height="0" alt="">
between two characters has been observed to work on many
browsers but to fail on some.
Tricks based on using a space in reduced font size are unreliable, because some modern browsers let the user specify a minimum font size and many people do so, to avoid illegibly small text.
A better alternative is to set the width of a space to zero.
You would use markup like
<a class=s> </a> and
a CSS rule like
.s { display: inline-block; width: 0; }.
Old versions of IE do not recognize
inline-block, but they still set the width to zero,
since they apply the width property to inline elements, too.
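Put together, the technique might look like the sketch below. The example.com address is a placeholder, and the break points after each solidus are an assumption made for illustration.

```html
<style>
  /* A box of zero width around a space: the intent is that a line
     break may occur at the box while no visible gap is added. */
  .s { display: inline-block; width: 0; }
</style>

<p>See http://www.example.com/<a class=s> </a>some/<a class=s> </a>long/<a class=s> </a>path.html
for details.</p>
```

As with the other tricks, this is worth testing on the browsers that matter to you before relying on it.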
In addition to permitting line breaks where you wouldn't expect them, the Unicode line breaking rules prevent line breaks in intuitively odd situations. For example, if a word is followed by a string that begins with a full stop (period, "."), the rules prohibit a break between the words. And IE (at least versions IE 6 and IE 7) behaves that way, as we already noted.
For example, a sequence of strings like ".foo" is unbreakable on IE, typically creating horizontal scrolling. Such strings are rare, but may arise in special situations.
The quick cure, if wrapping is important, is to insert the wbr tag
before each "word" that begins with a period
(.foo <wbr>.foo <wbr>.foo ...).
The theoretically correct, standards-conforming solution would be to
use zero-width spaces, &#8203;, but it both fails in most browsing
situations and usually fails miserably when it does not work.
Getting even more theoretical if possible, the full stop (period) character "." belongs to line breaking class IS, "infix separator (numeric)" according to Unicode rules, and this means that it prevents a line break before it (as well as after it when followed by a numeric character). The effect of a space is described so that a break is normally permitted after a space (and a space at the end of a line is then effectively treated as nonexistent). So in "foo .text" the space permits a break after it, the period prohibits a break before it, and after spending just a few hours reading Unicode Standard Annex #14 you might find out that the latter wins. I'm just kidding, it says this rather explicitly in rule LB 8: Don't break before a character in class IS even after spaces.
A similar case is a closing quotation mark followed by whitespace and an opening parenthesis. IE does not break at whitespace in such cases, though this can be fixed as outlined above.
This text containing “English style” (typographer’s) quotation marks does not wrap on IE. The browser treats the closing quote and the opening parenthesis as tied together even over a space.
This text containing “English style” typographer’s quotation marks wraps properly. It does not contain parentheses, so it does not exhibit the problem discussed here.
This text containing
“English style” (typographer’s) quotation marks wraps properly, because it has the tag <wbr>
between the closing quote and the opening parenthesis.
HTML specifications are silent about Unicode line breaking rules, for the most part. The specifications generally refer to Unicode as a basis for character definitions, so one might say that the spirit is that those rules should be obeyed. And this is what browsers have started doing, unfortunately.