HTML: Charset and Encoding

By Xah Lee.

What is HTML Charset

An HTML charset declaration specifies the set of allowed characters and the character encoding of the file.

In HTML, you can declare the charset for the file, inside the head tag, like this:

<head>
<meta charset="utf-8" />
</head>

Declare Charset in HTML 4

For HTML 4, use this:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
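Both declaration forms can be read back by a program. Here is a minimal sketch in Python, using the standard library's `html.parser`. The names `MetaCharsetFinder` and `find_meta_charset` are made up for illustration, and this is not the algorithm browsers use. It only shows that the two forms carry the same information.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Remember the first charset declared by a <meta> tag, if any."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # Only the first declaration counts in this sketch.
        if tag != "meta" or self.charset is not None:
            return
        a = {name: (value or "") for name, value in attrs}
        if "charset" in a:
            # HTML5 form: <meta charset="utf-8" />
            self.charset = a["charset"].strip().lower()
        elif a.get("http-equiv", "").lower() == "content-type":
            # HTML 4 form: content="text/html;charset=utf-8"
            for part in a.get("content", "").split(";"):
                name, _, value = part.partition("=")
                if name.strip().lower() == "charset":
                    self.charset = value.strip().strip("\"'").lower()


def find_meta_charset(html_text):
    """Return the declared charset in lowercase, or None if there is none."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset


print(find_meta_charset('<head><meta charset="utf-8" /></head>'))
print(find_meta_charset(
    '<meta http-equiv="Content-Type" content="text/html;charset=utf-8">'))
```

Both calls report the same charset name.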

HTML/HTTP Charset is About Encoding, Not Character Set

HTTP's definition of charset (and the charset meta tag in HTML) is actually about character encoding.

Here is an excerpt:

[Image: rfc 2616 charset 2022-10-23 VdJDs]
RFC 2616 @ http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.4
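To see why charset is really about encoding, here is a small Python illustration. The character is the same in both cases. Only the bytes differ, and reading the bytes with the wrong encoding name gives garbage. The sample character is picked just for the demo.

```python
text = "\u00e9"  # the single character é

# Same character, different bytes under different encodings.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")
print(utf8_bytes)    # b'\xc3\xa9'
print(latin1_bytes)  # b'\xe9'

# Bytes written as UTF-8 but read as ISO-8859-1 turn into two wrong characters.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)       # Ã©

# Read with the matching encoding, the original character comes back.
print(utf8_bytes.decode("utf-8") == text)  # True
```

This is what happens when the declared charset does not match how the file was actually saved.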

What is HTML4 or HTML5's Default Encoding?

By spec, there is no default encoding.

An encoding specification must come from one of:

  • The Content-Type header line in the HTTP response.
  • The meta tag in the HTML file.

If neither is found, the browser makes a guess.

[Image: html5 whatwg char encoding 2019-06-07 p38s7]
html5 whatwg char encoding 2019-06-07 [source https://html.spec.whatwg.org/multipage/parsing.html#determining-the-character-encoding]
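The order described above (HTTP header, then meta tag, then a guess) can be sketched in Python. This is a simplified illustration, not the WHATWG algorithm: it ignores the byte order mark, scans only the first 1024 bytes with a plain regex, and the guess step (try UTF-8, else windows-1252) is an assumption made for this example. The function names are made up.

```python
import re

# Rough pattern for charset=... inside a <meta> tag (either form).
_META_RE = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    for part in content_type.split(";")[1:]:
        name, _, value = part.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip("\"'").lower() or None
    return None


def choose_encoding(content_type, body):
    """Return (encoding, source) for the raw bytes of an HTML page."""
    # 1. The HTTP header, when it names a charset.
    enc = charset_from_header(content_type)
    if enc:
        return enc, "http-header"
    # 2. A meta tag near the start of the file.
    m = _META_RE.search(body[:1024])
    if m:
        return m.group(1).decode("ascii").lower(), "meta-tag"
    # 3. Nothing declared: guess (assumed heuristic for this sketch).
    try:
        body.decode("utf-8")
        return "utf-8", "guess"
    except UnicodeDecodeError:
        return "windows-1252", "guess"


page = b'<head><meta charset="utf-8" /></head>'
print(choose_encoding("text/html; charset=ISO-8859-1", page))
print(choose_encoding("text/html", page))
print(choose_encoding(None, b"<p>caf\xe9</p>"))
```

The first call takes the header's value even though the meta tag disagrees. The second falls back to the meta tag. The third has no declaration at all, so the result is only a guess.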

Reference

HTML 4 Default Charset, Encoding, and Declaration

[Image: W3C charset 2024-09-04 YQZwj]
W3C: Internationalization: Document Character Set @ http://www.w3.org/International/questions/qa-doc-charset

How a User Agent Should Determine the Character Encoding

[Image: W3C HTML 4.01 charset 2024-09-04 4Xp8T]
HTML 4.01 Specification: 5 HTML Document Representation @ http://www.w3.org/TR/html401/charset.html#idx-character_encoding-6
