HTML 5

A vocabulary and associated APIs for HTML and XHTML


8.2 Parsing HTML documents

This section only applies to user agents, data mining tools, and conformance checkers.

The rules for parsing XML documents (and thus XHTML documents) into DOM trees are covered by the XML and Namespaces in XML specifications, and are out of scope of this specification. [XML] [XMLNS]

For HTML documents, user agents must use the parsing rules described in this section to generate the DOM trees. Together, these rules define what is referred to as the HTML parser.

While the HTML form of HTML5 bears a close resemblance to SGML and XML, it is a separate language with its own parsing rules.

Some earlier versions of HTML (in particular from HTML2 to HTML4) were based on SGML and used SGML parsing rules. However, few (if any) web browsers ever implemented true SGML parsing for HTML documents; the only user agents to strictly handle HTML as an SGML application have historically been validators. The resulting confusion — with validators claiming documents to have one representation while widely deployed Web browsers interoperably implemented a different representation — has wasted decades of productivity. This version of HTML thus returns to a non-SGML basis.

Authors interested in using SGML tools in their authoring pipeline are encouraged to use XML tools and the XML serialization of HTML5.

This specification defines the parsing rules for HTML documents, whether they are syntactically correct or not. Certain points in the parsing algorithm are said to be parse errors. The error handling for parse errors is well-defined: user agents must either act as described below when encountering such problems, or must abort processing at the first error that they encounter for which they do not wish to apply the rules described below.

Conformance checkers must report at least one parse error condition to the user if one or more parse error conditions exist in the document and must not report parse error conditions if none exist in the document. Conformance checkers may report more than one parse error condition if more than one parse error condition exists in the document. Conformance checkers are not required to recover from parse errors.

Parse errors are only errors with the syntax of HTML. In addition to checking for parse errors, conformance checkers will also verify that the document obeys all the other conformance requirements described in this specification.

8.2.1 Overview of the parsing model

The input to the HTML parsing process consists of a stream of Unicode characters, which is passed through a tokenisation stage (lexical analysis) followed by a tree construction stage (semantic analysis). The output is a Document object.

Implementations that do not support scripting do not have to actually create a DOM Document object, but the DOM tree in such cases is still used as the model for the rest of the specification.

In the common case, the data handled by the tokenisation stage comes from the network, but it can also come from script, e.g. using the document.write() API.

There is only one set of state for the tokeniser stage and the tree construction stage, but the tree construction stage is reentrant, meaning that while the tree construction stage is handling one token, the tokeniser might be resumed, causing further tokens to be emitted and processed before the first token's processing is complete.

In the following example, the tree construction stage will be called upon to handle a "p" start tag token while handling the "script" start tag token:

...
<script>
 document.write('<p>');
</script>
...

8.2.2 The input stream

The stream of Unicode characters that constitutes the input to the tokenisation stage will be initially seen by the user agent as a stream of bytes (typically coming over the network or from the local file system). The bytes encode the actual characters according to a particular character encoding, which the user agent must use to decode the bytes into characters.

For XML documents, the algorithm user agents must use to determine the character encoding is given by the XML specification. This section does not apply to XML documents. [XML]

8.2.2.1. Determining the character encoding

In some cases, it might be impractical to unambiguously determine the encoding before parsing the document. Because of this, this specification provides for a two-pass mechanism with an optional pre-scan. Implementations are allowed, as described below, to apply a simplified parsing algorithm to whatever bytes they have available before beginning to parse the document. Then, the real parser is started, using a tentative encoding derived from this pre-parse and other out-of-band metadata. If, while the document is being loaded, the user agent discovers an encoding declaration that conflicts with this information, then the parser can get reinvoked to perform a parse of the document with the real encoding.

User agents must use the following algorithm (the encoding sniffing algorithm) to determine the character encoding to use when decoding a document in the first pass. This algorithm takes as input any out-of-band metadata available to the user agent (e.g. the Content-Type metadata of the document) and all the bytes available so far, and returns an encoding and a confidence. The confidence is either tentative or certain. The encoding used, and whether the confidence in that encoding is tentative or confident, are used during the parsing to determine whether to change the encoding.

If the transport layer specifies an encoding, return that encoding with the confidence certain, and abort these steps.
The user agent may wait for more bytes of the resource to be available, either in this step or at any later step in this algorithm. For instance, a user agent might wait 500ms or 512 bytes, whichever came first. In general, preparsing the source to find the encoding improves performance, as it reduces the need to throw away the data structures used when parsing upon finding the encoding information. However, if the user agent delays too long to obtain data to determine the encoding, then the cost of the delay could outweigh any performance improvements from the preparse.

For each of the rows in the following table, starting with the first one and going down, if there are as many or more bytes available than the number of bytes in the first column, and the first bytes of the file match the bytes given in the first column, then return the encoding given in the cell in the second column of that row, with the confidence certain, and abort these steps:

Bytes in Hexadecimal	Encoding
FE FF	UTF-16BE
FF FE	UTF-16LE
EF BB BF	UTF-8

This step looks for Unicode Byte Order Marks (BOMs).

Otherwise, the user agent will have to search for explicit character encoding information in the file itself. This should proceed as follows:

Let position be a pointer to a byte in the input stream, initially pointing at the first byte. If at any point during these substeps the user agent either runs out of bytes or decides that scanning further bytes would not be efficient, then skip to the next step of the overall character encoding detection algorithm. User agents may decide that scanning any bytes is not efficient, in which case these substeps are entirely skipped.

Now, repeat the following "two" steps until the algorithm aborts (either because the user agent aborts, as described above, or because a character encoding is found):
1. If position points to:
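  A sequence of bytes starting with: 0x3C 0x21 0x2D 0x2D (ASCII '<!--')
  
  Advance the position pointer so that it points at the first 0x3E byte which is preceded by two 0x2D bytes (i.e. at the end of an ASCII '-->' sequence) and comes after the 0x3C byte that was found. (The two 0x2D bytes can be the same as those in the '<!--' sequence.)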
  
  A sequence of bytes starting with: 0x3C, 0x4D or 0x6D, 0x45 or 0x65, 0x54 or 0x74, 0x41 or 0x61, and finally one of 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0x2F (case-insensitive ASCII '<meta' followed by a space or slash)
  1. Advance the position pointer so that it points at the next 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, or 0x2F byte (the one in the sequence of characters matched above).
  2. Get an attribute and its value. If no attribute was sniffed, then skip this inner set of steps, and jump to the second step in the overall "two step" algorithm.
  3. If the attribute's name is neither "charset" nor "content", then return to step 2 in these inner steps.
  4. If the attribute's name is "charset", let charset be the attribute's value, interpreted as a character encoding.
  5. Otherwise, the attribute's name is "content": apply the algorithm for extracting an encoding from a Content-Type, giving the attribute's value as the string to parse. If an encoding is returned, let charset be that encoding. Otherwise, return to step 2 in these inner steps.
  6. If charset is a UTF-16 encoding, change it to UTF-8.
  7. If charset is a supported character encoding, then return the given encoding, with confidence tentative, and abort all these steps.
  8. Otherwise, return to step 2 in these inner steps.
  A sequence of bytes starting with a 0x3C byte (ASCII '<'), optionally a 0x2F byte (ASCII '/'), and finally a byte in the range 0x41-0x5A or 0x61-0x7A (an ASCII letter)
  1. Advance the position pointer so that it points at the next 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0B (ASCII VT), 0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E (ASCII '>') byte.
  2. Repeatedly get an attribute until no further attributes can be found, then jump to the second step in the overall "two step" algorithm.
  A sequence of bytes starting with: 0x3C 0x21 (ASCII '<!')
  
  A sequence of bytes starting with: 0x3C 0x2F (ASCII '</')
  
  A sequence of bytes starting with: 0x3C 0x3F (ASCII '<?')
  
  Advance the position pointer so that it points at the first 0x3E byte (ASCII '>') that comes after the 0x3C byte that was found.
  
  Any other byte
  
  Do nothing with that byte.
2. Move position so it points at the next byte in the input stream, and return to the first step of this "two step" algorithm.
When the above "two step" algorithm says to get an attribute, it means doing this:
1. If the byte at position is one of 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0B (ASCII VT), 0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x2F (ASCII '/') then advance position to the next byte and redo this substep.
2. If the byte at position is 0x3E (ASCII '>'), then abort the "get an attribute" algorithm. There isn't one.
3. Otherwise, the byte at position is the start of the attribute name. Let attribute name and attribute value be the empty string.
4. Attribute name: Process the byte at position as follows:
  
  If it is 0x3D (ASCII '='), and the attribute name is longer than the empty string
  
  Advance position to the next byte and jump to the step below labeled value.
  
  If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0B (ASCII VT), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space)
  
  Jump to the step below labeled spaces.
  
  If it is 0x2F (ASCII '/') or 0x3E (ASCII '>')
  
  Abort the "get an attribute" algorithm. The attribute's name is the value of attribute name, its value is the empty string.
  
  If it is in the range 0x41 (ASCII 'A') to 0x5A (ASCII 'Z')
  
  Append the Unicode character with codepoint b+0x20 to attribute name (where b is the value of the byte at position).
  
  Anything else
  
  Append the Unicode character with the same codepoint as the value of the byte at position to attribute name. (It doesn't actually matter how bytes outside the ASCII range are handled here, since only ASCII characters can contribute to the detection of a character encoding.)
5. Advance position to the next byte and return to the previous step.
6. Spaces. If the byte at position is one of 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0B (ASCII VT), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then advance position to the next byte, then, repeat this step.
7. If the byte at position is not 0x3D (ASCII '='), abort the "get an attribute" algorithm. The attribute's name is the value of attribute name, its value is the empty string.
8. Advance position past the 0x3D (ASCII '=') byte.
9. Value. If the byte at position is one of 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0B (ASCII VT), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then advance position to the next byte, then, repeat this step.
10. Process the byte at position as follows:
  If it is 0x22 (ASCII '"') or 0x27 (ASCII "'")
  1. Let b be the value of the byte at position.
  2. Advance position to the next byte.
  3. If the value of the byte at position is the value of b, then advance position to the next byte and abort the "get an attribute" algorithm. The attribute's name is the value of attribute name, and its value is the value of attribute value.
  4. Otherwise, if the value of the byte at position is in the range 0x41 (ASCII 'A') to 0x5A (ASCII 'Z'), then append a Unicode character to attribute value whose codepoint is 0x20 more than the value of the byte at position.
  5. Otherwise, append a Unicode character to attribute value whose codepoint is the same as the value of the byte at position.
  6. Return to the second step in these substeps.
  If it is 0x3E (ASCII '>')
  
  Abort the "get an attribute" algorithm. The attribute's name is the value of attribute name, its value is the empty string.
  
  If it is in the range 0x41 (ASCII 'A') to 0x5A (ASCII 'Z')
  
  Append the Unicode character with codepoint b+0x20 to attribute value (where b is the value of the byte at position). Advance position to the next byte.
  
  Anything else
  
  Append the Unicode character with the same codepoint as the value of the byte at position to attribute value. Advance position to the next byte.
11. Process the byte at position as follows:
  
  If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0B (ASCII VT), 0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E (ASCII '>')
  
  Abort the "get an attribute" algorithm. The attribute's name is the value of attribute name and its value is the value of attribute value.
  
  If it is in the range 0x41 (ASCII 'A') to 0x5A (ASCII 'Z')
  
  Append the Unicode character with codepoint b+0x20 to attribute value (where b is the value of the byte at position).
  
  Anything else
  
  Append the Unicode character with the same codepoint as the value of the byte at position to attribute value.
12. Advance position to the next byte and return to the previous step.
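The "get an attribute" steps above can be sketched in Python as a single function over a byte buffer. This is an illustrative reading of the steps, not a reference implementation: the name `get_attribute`, its `(attribute, new_position)` return shape, and the choice to report "ran out of bytes" as "no attribute" are all assumptions made for the example.

```python
WHITESPACE = b"\t\n\x0b\x0c\r "  # 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20


def _lower(byte: int) -> str:
    # Bytes 0x41-0x5A (ASCII 'A'-'Z') are lowercased by adding 0x20.
    return chr(byte + 0x20) if 0x41 <= byte <= 0x5A else chr(byte)


def get_attribute(data: bytes, pos: int):
    """Return ((name, value), new_pos), or (None, new_pos) if no attribute."""
    try:
        # Step 1: skip whitespace and '/' bytes.
        while data[pos] in WHITESPACE or data[pos] == 0x2F:
            pos += 1
        # Step 2: '>' means there is no attribute.
        if data[pos] == 0x3E:
            return None, pos
        # Step 3: start collecting the name.
        name, value = "", ""
        # Steps 4-5: attribute name.
        while True:
            b = data[pos]
            if b == 0x3D and name:
                pos += 1
                break  # jump to the "value" step
            if b in WHITESPACE:
                # Steps 6-8: skip spaces, then require '='.
                while data[pos] in WHITESPACE:
                    pos += 1
                if data[pos] != 0x3D:
                    return (name, ""), pos
                pos += 1
                break
            if b in (0x2F, 0x3E):
                return (name, ""), pos
            name += _lower(b)
            pos += 1
        # Step 9: skip whitespace before the value.
        while data[pos] in WHITESPACE:
            pos += 1
        # Step 10: quoted value, empty value, or first unquoted byte.
        b = data[pos]
        if b in (0x22, 0x27):
            quote = b
            while True:
                pos += 1
                if data[pos] == quote:
                    return (name, value), pos + 1
                value += _lower(data[pos])
        if b == 0x3E:
            return (name, ""), pos
        value += _lower(b)
        pos += 1
        # Steps 11-12: rest of an unquoted value.
        while data[pos] not in WHITESPACE and data[pos] != 0x3E:
            value += _lower(data[pos])
            pos += 1
        return (name, value), pos
    except IndexError:
        # Assumption: running out of bytes is treated as "no attribute".
        return None, len(data)
```

Calling `get_attribute(b' CHARSET="UTF-8" >', 0)` yields the lowercased pair `("charset", "utf-8")` and leaves the position just past the closing quote, so a second call from there reaches the `>` and reports that no further attribute exists.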
For the sake of interoperability, user agents should not use a pre-scan algorithm that returns different results than the one described above. (But, if you do, please at least let us know, so that we can improve this algorithm and benefit everyone...)
If the user agent has information on the likely encoding for this page, e.g. based on the encoding of the page when it was last visited, then return that encoding, with the confidence tentative, and abort these steps.
The user agent may attempt to autodetect the character encoding from applying frequency analysis or other algorithms to the data stream. If autodetection succeeds in determining a character encoding, then return that encoding, with the confidence tentative, and abort these steps. [UNIVCHARDET]
Otherwise, return an implementation-defined or user-specified default character encoding, with the confidence tentative. In non-legacy environments, the more comprehensive UTF-8 encoding is recommended. Due to its use in legacy content, windows-1252 is recommended as a default in predominantly Western demographics instead. Since these encodings can in many cases be distinguished by inspection, a user agent may heuristically decide which to use as a default.

The document's character encoding must immediately be set to the value returned from this algorithm, at the same time as the user agent uses the returned value to select the decoder to use for the input stream.
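As a rough illustration of how the encoding sniffing algorithm returns an encoding together with a confidence, the following Python sketch covers only the transport-layer step, the byte order mark table, and the final default. The function name `sniff_encoding`, its parameters, and the choice of windows-1252 as the default are assumptions for the example; the `<meta>` pre-scan, remembered encodings, and autodetection are deliberately left out.

```python
# Byte order marks from the table in the algorithm, in table order.
BOM_TABLE = [
    (b"\xFE\xFF", "UTF-16BE"),
    (b"\xFF\xFE", "UTF-16LE"),
    (b"\xEF\xBB\xBF", "UTF-8"),
]


def sniff_encoding(data: bytes, transport_encoding=None, default="windows-1252"):
    """Return (encoding, confidence) for the bytes available so far."""
    # An encoding given by the transport layer wins, with confidence "certain".
    if transport_encoding is not None:
        return transport_encoding, "certain"
    # A leading byte order mark also gives confidence "certain".
    for bom, encoding in BOM_TABLE:
        if data.startswith(bom):
            return encoding, "certain"
    # Pre-scan, remembered encoding and autodetection are omitted in this
    # sketch; fall through to the default with confidence "tentative".
    return default, "tentative"
```

A caller would pass the result to whatever selects the decoder, and keep the confidence around so that a later encoding declaration can still trigger a change while it is tentative.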

8.2.2.2. Character encoding requirements

User agents must at a minimum support the UTF-8 and Windows-1252 encodings, but may support more.

It is not unusual for Web browsers to support dozens if not upwards of a hundred distinct character encodings.

User agents must support the preferred MIME name of every character encoding they support that has a preferred MIME name, and should support all the IANA-registered aliases. [IANACHARSET]

When comparing a string specifying a character encoding with the name or alias of a character encoding to determine if they are equal, user agents must ignore all characters in the ranges U+0009 to U+000D, U+0020 to U+002F, U+003A to U+0040, U+005B to U+0060, and U+007B to U+007E (all whitespace and punctuation characters in ASCII) in both names, and then perform the comparison case-insensitively.

For instance, "GB_2312-80" and "g.b.2312(80)" are considered equivalent names.
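The comparison rule can be illustrated with a short Python sketch. The helper names `normalize_label` and `same_encoding_name` are hypothetical, and lowercasing with `str.lower()` is an assumption about how the case-insensitive comparison is carried out.

```python
# Code point ranges to ignore: ASCII whitespace and punctuation.
IGNORED_RANGES = [
    (0x0009, 0x000D),
    (0x0020, 0x002F),
    (0x003A, 0x0040),
    (0x005B, 0x0060),
    (0x007B, 0x007E),
]


def normalize_label(label: str) -> str:
    """Drop ignored characters, then lowercase what remains."""
    kept = [
        ch for ch in label
        if not any(lo <= ord(ch) <= hi for lo, hi in IGNORED_RANGES)
    ]
    return "".join(kept).lower()


def same_encoding_name(a: str, b: str) -> bool:
    return normalize_label(a) == normalize_label(b)
```

With these helpers, `same_encoding_name("GB_2312-80", "g.b.2312(80)")` is true because both labels reduce to `gb231280`.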

When a user agent would otherwise use an encoding given in the first column of the following table, it must instead use the encoding given in the cell in the second column of the same row. Any bytes that are treated differently due to this encoding aliasing must be considered parse errors.

Character encoding overrides
Input encoding	Replacement encoding	References
EUC-KR	Windows-949	[EUCKR] [WIN949]
GB2312	GBK	[GB2312] [GBK]
GB_2312-80	GBK	[RFC1345] [GBK]
ISO-8859-1	Windows-1252	[RFC1345] [WIN1252]
ISO-8859-9	Windows-1254	[RFC1345] [WIN1254]
ISO-8859-11	Windows-874	[ISO885911] [WIN874]
KS_C_5601-1987	Windows-949	[RFC1345] [WIN949]
TIS-620	Windows-874	[TIS620] [WIN874]
x-x-big5	Big5	[BIG5]

The requirement to treat certain encodings as other encodings according to the table above is a willful violation of the W3C Character Model specification. [CHARMOD]
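One way to picture the override table is as a simple lookup applied after an encoding has been chosen. The sketch below is illustrative only: `ENCODING_OVERRIDES` and `effective_encoding` are hypothetical names, and the lookup matches the labels exactly as written in the table rather than applying any name-equivalence rule.

```python
# Input encoding -> replacement encoding, copied from the table above.
ENCODING_OVERRIDES = {
    "EUC-KR": "Windows-949",
    "GB2312": "GBK",
    "GB_2312-80": "GBK",
    "ISO-8859-1": "Windows-1252",
    "ISO-8859-9": "Windows-1254",
    "ISO-8859-11": "Windows-874",
    "KS_C_5601-1987": "Windows-949",
    "TIS-620": "Windows-874",
    "x-x-big5": "Big5",
}


def effective_encoding(label: str) -> str:
    """Return the encoding to actually use for a chosen label."""
    # Labels not listed in the table are used unchanged.
    return ENCODING_OVERRIDES.get(label, label)
```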

User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU encodings. [CESU8] [UTF7] [BOCU1] [SCSU]

Support for UTF-32 is not recommended. This encoding is rarely used, and frequently misimplemented.

This specification does not make any attempt to support UTF-32 in its algorithms; support and use of UTF-32 can thus lead to unexpected behavior in implementations of this specification.

8.2.2.3. Preprocessing the input stream

Given an encoding, the bytes in the input stream must be converted to Unicode characters for the tokeniser, as described by the rules for that encoding, except that the leading U+FEFF BYTE ORDER MARK character, if any, must not be stripped by the encoding layer (it is stripped by the rule below).

Bytes or sequences of bytes in the original byte stream that could not be converted to Unicode characters must be converted to U+FFFD REPLACEMENT CHARACTER code points.

Bytes or sequences of bytes in the original byte stream that did not conform to the encoding specification (e.g. invalid UTF-8 byte sequences in a UTF-8 input stream) are errors that conformance checkers are expected to report.

One leading U+FEFF BYTE ORDER MARK character must be ignored if any are present.

All U+0000 NULL characters in the input must be replaced by U+FFFD REPLACEMENT CHARACTERs. Any occurrence of such characters is a parse error.

Any occurrences of any characters in the ranges U+0001 to U+0008, U+000E to U+001F, U+007F to U+009F, U+D800 to U+DFFF, U+FDD0 to U+FDDF, and characters U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are parse errors. (These are all control characters or permanently undefined Unicode characters.)

U+000D CARRIAGE RETURN (CR) characters, and U+000A LINE FEED (LF) characters, are treated specially. Any CR characters that are followed by LF characters must be removed, and any CR characters not followed by LF characters must be converted to LF characters. Thus, newlines in HTML DOMs are represented by LF characters, and there are never any CR characters in the input to the tokenisation stage.
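The character-level preprocessing described in this section can be sketched on an already-decoded string. The function name `preprocess` and its `(text, parse_errors)` return value are assumptions for illustration; the sketch counts only U+0000 characters as parse errors and does not check the other control-character and noncharacter ranges.

```python
def preprocess(text: str):
    """Apply BOM, NULL and newline handling to decoded input."""
    parse_errors = 0
    # Ignore one leading U+FEFF BYTE ORDER MARK, if present.
    if text.startswith("\ufeff"):
        text = text[1:]
    # Each U+0000 is a parse error and becomes U+FFFD.
    parse_errors += text.count("\x00")
    text = text.replace("\x00", "\ufffd")
    # CR followed by LF is removed; any remaining CR becomes LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text, parse_errors
```

After this step the string contains no CR characters, which is the property the tokenisation stage relies on.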

The next input character is the first character in the input stream that has not yet been consumed. Initially, the next input character is the first character in the input.

The insertion point is the position (just before a character or just before the end of the input stream) where content inserted using document.write() is actually inserted. The insertion point is relative to the position of the character immediately after it; it is not an absolute offset into the input stream. Initially, the insertion point is uninitialized.

The "EOF" character in the tables below is a conceptual character representing the end of the input stream. If the parser is a script-created parser, then the end of the input stream is reached when an explicit "EOF" character (inserted by the document.close() method) is consumed. Otherwise, the "EOF" character is not a real character in the stream, but rather the lack of any further characters.

8.2.2.4. Changing the encoding while parsing

When the parser requires the user agent to change the encoding, it must run the following steps. This might happen if the encoding sniffing algorithm described above failed to find an encoding, or if it found an encoding that was not the actual encoding of the file.

If the new encoding is a UTF-16 encoding, change it to UTF-8.
If the new encoding is identical or equivalent to the encoding that is already being used to interpret the input stream, then set the confidence to confident and abort these steps. This happens when the encoding information found in the file matches what the encoding sniffing algorithm determined to be the encoding, and in the second pass through the parser if the first pass found that the encoding sniffing algorithm described in the earlier section failed to find the right encoding.
If all the bytes up to the last byte converted by the current decoder have the same Unicode interpretations in both the current encoding and the new encoding, and if the user agent supports changing the converter on the fly, then the user agent may change to the new converter for the encoding on the fly. Set the document's character encoding and the encoding used to convert the input stream to the new encoding, set the confidence to confident, and abort these steps.
Otherwise, navigate to the document again, with replacement enabled, and using the same source browsing context, but this time skip the encoding sniffing algorithm and instead just set the encoding to the new encoding and the confidence to confident. Whenever possible, this should be done without actually contacting the network layer (the bytes should be re-parsed from memory), even if, e.g., the document is marked as not being cacheable.
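The decision made by these steps can be sketched as a small Python function that reports which branch applies. Everything named here is hypothetical: `plan_encoding_change`, the `"keep"`, `"switch"` and `"reparse"` labels, and the `supports_switch` flag. The sketch also assumes that Python's codec registry is an acceptable stand-in for "identical or equivalent" encodings, and that strict decoding is an acceptable test for "same Unicode interpretations"; a real user agent would use its own decoders.

```python
import codecs


def plan_encoding_change(current: str, new: str, consumed: bytes,
                         supports_switch: bool = True):
    """Return (action, encoding) for a requested encoding change."""
    # A UTF-16 encoding named inside the document is changed to UTF-8.
    if codecs.lookup(new).name.startswith("utf-16"):
        new = "utf-8"
    # Same encoding (by Python's canonical codec name): nothing to change.
    if codecs.lookup(new).name == codecs.lookup(current).name:
        return "keep", current
    # Compare how the bytes consumed so far decode under both encodings.
    try:
        same = consumed.decode(current) == consumed.decode(new)
    except UnicodeDecodeError:
        # Assumption: bytes that fail strict decoding count as "different".
        same = False
    if same and supports_switch:
        return "switch", new
    # Otherwise the document has to be parsed again with the new encoding.
    return "reparse", new
```

In each of the three outcomes the confidence would then be set to confident, as the steps require; the sketch leaves that bookkeeping to the caller.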


8.2.3 Parse state

8.2.3.1. The insertion mode

Initially the insertion mode is "initial". It can change to "before html", "before head", "in head", "in head noscript", "after head", "in body", "in table", "in caption", "in column group", "in table body", "in row", "in cell", "in select", "in select in table", "in foreign content", "after body", "in frameset", "after frameset", "after after body", and "after after frameset" during the course of the parsing, as described in the tree construction stage. The insertion mode affects how tokens are processed and whether CDATA blocks are supported.

Seven of these modes, namely "in head", "in body", "in table", "in table body", "in row", "in cell", and "in select", are special, in that the other modes defer to them at various times. When the algorithm below says that the user agent is to do something "using the rules for the m insertion mode", where m is one of these modes, the user agent must use the rules described under that insertion mode's section, but must leave the insertion mode unchanged (unless the rules in that section themselves switch the insertion mode).

When the insertion mode is switched to "in foreign content", the secondary insertion mode is also set. This secondary mode is used within the rules for the "in foreign content" mode to handle HTML (i.e. not foreign) content.

When the steps below require the UA to reset the insertion mode appropriately, it means the UA must follow these steps:

Let last be false.
Let node be the last node in the stack of open elements.
If node is the first node in the stack of open elements, then set last to true and set node to the context element. (fragment case)
If node is a select element, then switch the insertion mode to "in select" and abort these steps. (fragment case)
If node is a td or th element and last is false, then switch the insertion mode to "in cell" and abort these steps.
If node is a tr element, then switch the insertion mode to "in row" and abort these steps.
If node is a tbody, thead, or tfoot element, then switch the insertion mode to "in table body" and abort these steps.
If node is a caption element, then switch the insertion mode to "in caption" and abort these steps.
If node is a colgroup element, then switch the insertion mode to "in column group" and abort these steps. (fragment case)
If node is a table element, then switch the insertion mode to "in table" and abort these steps.
If node is an element from the MathML namespace, then switch the insertion mode to "in foreign content", let the secondary insertion mode be "in body", and abort these steps.
If node is a head element, then switch the insertion mode to "in body" ("in body"! not "in head"!) and abort these steps. (fragment case)
If node is a body element, then switch the insertion mode to "in body" and abort these steps.
If node is a frameset element, then switch the insertion mode to "in frameset" and abort these steps. (fragment case)
If node is an html element, then: if the head element pointer is null, switch the insertion mode to "before head", otherwise, switch the insertion mode to "after head". In either case, abort these steps. (fragment case)
If last is true, then switch the insertion mode to "in body" and abort these steps. (fragment case)
(追記) Let (追記ここまで) (追記) node (追記ここまで) (追記) now be the node before (追記ここまで) (追記) node (追記ここまで) (追記) in the (追記ここまで) (追記) stack of open elements (追記ここまで) .
(追記) Return to step 3. (追記ここまで)
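
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not a normative implementation: the `Node` class, the `reset_insertion_mode` function, and the returned tuple are names invented for the example; the stack is modelled as a non-empty list whose first item is the topmost node; and when no context element is supplied, node is left unchanged in step 3. Step numbers in the comments count the steps above in order.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


@dataclass
class Node:
    """Hypothetical stand-in for an element; not a real DOM interface."""
    name: str
    namespace: str = HTML_NS


# Steps 4, 6-10 and 12-14: element names that map straight to a mode.
# Note that head maps to "in body", not "in head" (step 12).
SIMPLE_MODES = {
    "select": "in select",
    "tr": "in row",
    "tbody": "in table body",
    "thead": "in table body",
    "tfoot": "in table body",
    "caption": "in caption",
    "colgroup": "in column group",
    "table": "in table",
    "head": "in body",
    "body": "in body",
    "frameset": "in frameset",
}


def reset_insertion_mode(
    stack: List[Node],
    context: Optional[Node] = None,
    head_pointer: Optional[Node] = None,
) -> Tuple[str, Optional[str]]:
    """Return (insertion mode, secondary insertion mode or None)."""
    last = False                                   # step 1
    index = len(stack) - 1                         # step 2
    while True:
        node = stack[index]
        if index == 0:                             # step 3
            last = True
            if context is not None:                # assumption: only the fragment case has a context
                node = context
        is_html = node.namespace == HTML_NS
        if is_html and node.name in ("td", "th") and not last:   # step 5
            return ("in cell", None)
        if is_html and node.name in SIMPLE_MODES:  # steps 4, 6-10, 12-14
            return (SIMPLE_MODES[node.name], None)
        if node.namespace == MATHML_NS:            # step 11
            return ("in foreign content", "in body")
        if is_html and node.name == "html":        # step 15
            mode = "before head" if head_pointer is None else "after head"
            return (mode, None)
        if last:                                   # step 16
            return ("in body", None)
        index -= 1                                 # steps 17 and 18
```

Because the name tests are mutually exclusive, the sketch groups them in a lookup table rather than repeating one `if` per step; the only order-sensitive part is that the `last` check in step 16 runs after every element test.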

8.2.3.2. The stack of open elements

Initially the stack of open elements is empty. The stack grows downwards; the topmost node on the stack is the first one added to the stack, and the bottommost node of the stack is the most recently added node in the stack (notwithstanding when the stack is manipulated in a random access fashion as part of the handling for misnested tags).

The "before html" insertion mode creates the html root element node, which is then added to the stack.

In the fragment case, the stack of open elements is initialized to contain an html element that is created as part of that algorithm. (The fragment case skips the "before html" insertion mode.)

The html node, however it is created, is the topmost node of the stack. It never gets popped off the stack.

The current node is the bottommost node in this stack.

The current table is the last table element in the stack of open elements, if there is one. If there is no table element in the stack of open elements (fragment case), then the current table is the first element in the stack of open elements (the html element).
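
As a rough illustration of the terminology above, the stack can be modelled as a list in which index 0 is the topmost node and the end of the list is the bottommost. The `Element` and `OpenElementStack` classes below are hypothetical names for this sketch, and raising an error on an attempt to pop the html root is just one way to express "never gets popped"; the specification does not prescribe it.

```python
from typing import List, Optional


class Element:
    """Hypothetical minimal element: just a tag name."""

    def __init__(self, name: str) -> None:
        self.name = name


class OpenElementStack:
    """Index 0 is the topmost node; the end of the list is the bottommost."""

    def __init__(self) -> None:
        self.nodes: List[Element] = []          # initially empty

    def push(self, element: Element) -> None:
        self.nodes.append(element)              # the stack grows downwards

    def pop(self) -> Element:
        if len(self.nodes) <= 1:
            # The html node at the top is never popped off the stack.
            raise IndexError("cannot pop the html root element")
        return self.nodes.pop()

    @property
    def current_node(self) -> Optional[Element]:
        # The bottommost node, or None while the stack is still empty.
        return self.nodes[-1] if self.nodes else None

    @property
    def current_table(self) -> Optional[Element]:
        # The last table element on the stack, if there is one.
        for element in reversed(self.nodes):
            if element.name == "table":
                return element
        # No table on the stack (fragment case): the first element (html).
        return self.nodes[0] if self.nodes else None
```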

Elements in the stack fall into the following categories:

Special: The following HTML elements have varying levels of special parsing rules: address, area, base, basefont, bgsound, blockquote, body, br, center, col, colgroup, dd, dir, div, dl, dt, embed, fieldset, form, frame, frameset, h1, h2, h3, h4, h5, h6, head, hr, iframe, img, input, isindex, li, link, listing, menu, meta, noembed, noframes, noscript, ol, optgroup, option, p, param, plaintext, pre, script, select, spacer, style, tbody, textarea, tfoot, thead, title, tr, ul, and wbr.
Scoping: The following HTML elements introduce new scopes for various parts of the parsing: applet, button, caption, html, marquee, object, table, td and th.
Formatting: The following HTML elements are those that end up in the list of active formatting elements: a, b, big, em, font, i, nobr, s, small, strike, strong, tt, and u.
Phrasing: All other elements found while parsing an HTML document.
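
One way to picture these categories is as a lookup over three fixed sets, with phrasing as the fallback. The sketch below is illustrative only: the constant names and the `category` function are assumptions, and the sets simply transcribe the lists as written above.

```python
SPECIAL = frozenset(
    "address area base basefont bgsound blockquote body br center col "
    "colgroup dd dir div dl dt embed fieldset form frame frameset h1 h2 h3 "
    "h4 h5 h6 head hr iframe img input isindex li link listing menu meta "
    "noembed noframes noscript ol optgroup option p param plaintext pre "
    "script select spacer style tbody textarea tfoot thead title tr ul "
    "wbr".split()
)
SCOPING = frozenset(
    "applet button caption html marquee object table td th".split()
)
FORMATTING = frozenset(
    "a b big em font i nobr s small strike strong tt u".split()
)


def category(name: str) -> str:
    """Return the category of an HTML element name (hypothetical helper)."""
    if name in SPECIAL:
        return "special"
    if name in SCOPING:
        return "scoping"
    if name in FORMATTING:
        return "formatting"
    return "phrasing"   # all other elements
```

An element that has not yet been assigned to one of the lists comes out as "phrasing" here, which reflects only the lists as they currently stand.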

Still need to add these new elements to the lists: event-source, section, nav, article, aside, header, footer, datagrid, command.

The stack of open elements is said to have an element in scope or have an element in table scope when the following algorithm terminates in a match state:

1. Initialize node to be the current node (the bottommost node of the stack).
2. If node is the target node, terminate in a match state.
3. Otherwise, if node is a table element, terminate in a failure state.
4. Otherwise, if the algorithm is the "has an element in scope" variant (rather than the "has an element in table scope" variant), and node is one of the following, terminate in a failure state:
   - applet
   - caption
   - td
   - th
   - button
   - marquee
   - object
5. Otherwise, if node is an html element, terminate in a failure state. (This can only happen if the node is the topmost node of the stack of open elements, and prevents the next step from being invoked if there are no more elements in the stack.)
6. Otherwise, set node to the previous entry in the stack of open elements and return to step 2. (This will never fail, since the loop will always terminate in the previous step if the top of the stack is reached.)
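
The scope check above can be sketched as a single loop over the stack. This is a simplified illustration, not the normative text: elements are represented by their tag names, the target is matched by name, and the function name, the `table_scope` flag, and the precondition that the stack starts with html are all assumptions of the example.

```python
from typing import List

# Elements that end the search in the "has an element in scope" variant only.
SCOPE_BOUNDARIES = frozenset(
    {"applet", "caption", "td", "th", "button", "marquee", "object"}
)


def has_element_in_scope(
    stack: List[str], target: str, table_scope: bool = False
) -> bool:
    """Walk from the bottommost node upwards looking for target."""
    if not stack or stack[0] != "html":
        raise ValueError("sketch assumes html is the topmost node")
    index = len(stack) - 1                      # start at the current node
    while True:
        node = stack[index]
        if node == target:
            return True                         # match state
        if node == "table":
            return False                        # failure state
        if not table_scope and node in SCOPE_BOUNDARIES:
            return False                        # failure state (scope variant only)
        if node == "html":
            return False                        # reached the top of the stack
        index -= 1                              # previous entry in the stack
```

For example, with the stack html, body, p, button, b, a p element is not in scope because the button stops the search, but it is in table scope because only a table element stops that variant.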

Nothing happens if at any time any of the elements in the stack of open elements are moved to a new location in, or removed from, the Document tree. In particular, the stack is not changed in this situation. This can cause, amongst other strange effects, content to be appended to nodes that are no longer in the DOM.

In some cases (namely, when closing misnested formatting elements), the stack is manipulated in a random-access fashion.

8.2.3.3. The list of active formatting elements

Initially the list of active formatting elements is empty. It is used to handle mis-nested formatting element tags.

The list contains elements in the formatting category, and scope markers. The scope markers are inserted when entering applet elements, buttons, object elements, marquees, table cells, and table captions, and are used to prevent formatting from "leaking" into applet elements, buttons, object elements, marquees, and tables.

When the steps below require the UA to reconstruct the active formatting elements, the UA must perform the following steps:

1. If there are no entries in the list of active formatting elements, then there is nothing to reconstruct; stop this algorithm.
2. If the last (most recently added) entry in the list of active formatting elements is a marker, or if it is an element that is in the stack of open elements, then there is nothing to reconstruct; stop this algorithm.
3. Let entry be the last (most recently added) element in the list of active formatting elements.
4. If there are no entries before entry in the list of active formatting elements, then jump to step 8.
5. Let entry be the entry one earlier than entry in the list of active formatting elements.
6. If entry is neither a marker nor an element that is also in the stack of open elements, go to step 4.
7. Let entry be the element one later than entry in the list of active formatting elements.
8. Perform a shallow clone of the element entry to obtain clone. [DOM3CORE]
9. Append clone to the current node and push it onto the stack of open elements so that it is the new current node.
10. Replace the entry for entry in the list with an entry for clone.
11. If the entry for clone in the list of active formatting elements is not the last entry in the list, return to step 7.

This has the effect of reopening all the formatting elements that were opened in the current body, cell, or caption (whichever is youngest) that haven't been explicitly closed.

The way this specification is written, the list of active formatting elements always consists of elements in chronological order with the least recently added element first and the most recently added element last (except for while steps 8 to 11 of the above algorithm are being executed, of course).
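
The reconstruction algorithm can be sketched as a rewind phase followed by a reopen phase. The code below is a minimal illustration under assumptions: `Element`, `MARKER`, and the function name are hypothetical, elements carry only a tag name and a child list, a shallow clone copies the name alone, and the stack is assumed to be non-empty. Step numbers in the comments count the eleven steps of the algorithm in order.

```python
from typing import List

MARKER = object()  # hypothetical sentinel standing in for a scope marker


class Element:
    """Hypothetical minimal element with a tag name and child list."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List["Element"] = []

    def shallow_clone(self) -> "Element":
        return Element(self.name)           # copies the element, not its children


def reconstruct_active_formatting_elements(
    active: list, stack: List[Element]
) -> None:
    if not active:                                      # step 1
        return
    if active[-1] is MARKER or active[-1] in stack:     # step 2
        return
    i = len(active) - 1                                 # step 3
    # Rewind to the entry just after the last marker or still-open element.
    while i > 0:                                        # step 4
        i -= 1                                          # step 5
        if active[i] is MARKER or active[i] in stack:   # step 6
            i += 1                                      # step 7
            break
    # Reopen every entry from that point to the end of the list.
    while True:
        clone = active[i].shallow_clone()               # step 8
        stack[-1].children.append(clone)                # step 9
        stack.append(clone)
        active[i] = clone                               # step 10
        if i == len(active) - 1:                        # step 11
            return
        i += 1                                          # step 7 again
```

Membership in the stack is tested by identity here, since `Element` does not define equality; two distinct b elements are therefore never confused with one another.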

When the steps below require the UA to clear the list of active formatting elements up to the last marker, the UA must perform the following steps:

1. Let entry be the last (most recently added) entry in the list of active formatting elements.
2. Remove entry from the list of active formatting elements.
3. If entry was a marker, then stop the algorithm at this point. The list has been cleared up to the last marker.
4. Go to step 1.
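
These four steps amount to popping entries until a marker has been removed. A minimal sketch follows, with `MARKER` and the function name as hypothetical names; stopping quietly on an empty list is an assumption of the example, since the steps above presuppose that a marker is present.

```python
MARKER = object()  # hypothetical sentinel standing in for a scope marker


def clear_to_last_marker(active: list) -> None:
    # Assumption: an empty list simply ends the loop.
    while active:
        entry = active.pop()        # steps 1 and 2: take and remove the last entry
        if entry is MARKER:         # step 3: the marker itself is removed too
            return
        # step 4: loop back to step 1
```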

8.2.3.4. The element pointers

Initially the head element pointer and the form element pointer are both null.

Once a head element has been parsed (whether implicitly or explicitly) the head element pointer gets set to point to this node.

The form element pointer points to the last form element that was opened and whose end tag has not yet been seen. It is used to make form controls associate with forms in the face of dramatically bad markup, for historical reasons.

8.2.3.5. The scripting state

The scripting flag is set to "enabled" if the Document with which the parser is associated was with script when the parser was created, and "disabled" otherwise.