If you’re not familiar with the principles of character encoding, read the prerequisite Dive Into HTML 5 section on the subject.
Character encoding problems traditionally show up as text on your page that looks like this: in Firefox or in IE.
Usually, those characters mean that the character encoding used on the page is either ambiguous (not specified) or incorrect. We can use Firefox to determine the character encoding of a web page (right-click and choose View Page Info, or use the "Character Encoding" entry in the View menu). Make sure the encoding Firefox reports matches the encoding your IDE uses to save the file. For example, Eclipse 3.5 has a "Set Encoding" option in the Edit menu.
Most English letters and digits display correctly regardless of character encoding because the lowest characters are consistent across encodings: the ASCII character set (0-127) is encoded with the same single bytes in ISO-8859-1, UTF-8, and many others.
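To see that overlap concretely, here is a small sketch in JavaScript (runnable in Node or a modern browser). TextEncoder is a standard API that always produces UTF-8; the encodeLatin1, decodeLatin1 and toHex helpers are hypothetical, written only for illustration, and rely on ISO-8859-1 mapping code points 0-255 straight to single bytes. The last line reproduces the kind of garbled text described above by reading UTF-8 bytes as if they were ISO-8859-1.

```javascript
// Hypothetical helper: encode a string as ISO-8859-1 (Latin-1).
// Each code point from 0 to 255 becomes exactly one byte of the same value.
function encodeLatin1(str) {
  const bytes = [];
  for (const ch of str) {
    const code = ch.codePointAt(0);
    if (code > 0xff) {
      throw new RangeError(ch + " has no ISO-8859-1 representation");
    }
    bytes.push(code);
  }
  return bytes;
}

// Hypothetical helper: decode bytes as ISO-8859-1 (each byte is one code point).
function decodeLatin1(bytes) {
  return String.fromCharCode(...bytes);
}

// TextEncoder is a standard API that always encodes to UTF-8.
function encodeUtf8(str) {
  return Array.from(new TextEncoder().encode(str));
}

// Format a byte array as space-separated hex for easy comparison.
function toHex(bytes) {
  return bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");
}

// ASCII text produces identical bytes in both encodings.
console.log(toHex(encodeLatin1("Hello 123"))); // 48 65 6c 6c 6f 20 31 32 33
console.log(toHex(encodeUtf8("Hello 123")));   // 48 65 6c 6c 6f 20 31 32 33

// Characters above 127 are encoded differently.
console.log(toHex(encodeLatin1("ñó"))); // f1 f3
console.log(toHex(encodeUtf8("ñó")));   // c3 b1 c3 b3

// Reading UTF-8 bytes as ISO-8859-1 yields garbled text.
console.log(decodeLatin1(encodeUtf8("ñó"))); // Ã±Ã³
```

Note that encodeLatin1("Œ") throws, because Œ (code point 338) has no single-byte form in ISO-8859-1.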
Managing your character encodings gets trickier as you add more architectural layers to your application. For example, character encodings may differ in your database, the properties files used to configure your application (java.util.Properties uses ISO-8859-1 by default), or maybe the XML or JSON file you’re loading from an external API.
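When data crosses one of these layers, it helps to decode it explicitly rather than trust a default. The sketch below assumes the external API reports its encoding in the Content-Type header; the charsetFromContentType and decodeBody helpers are hypothetical, and falling back to UTF-8 when no charset is given is an assumption, not a rule. TextDecoder is a standard API, but which encodings it accepts beyond UTF-8 depends on the runtime.

```javascript
// Hypothetical helper: pull the charset parameter out of a Content-Type value,
// e.g. "application/json; charset=ISO-8859-1" -> "iso-8859-1".
function charsetFromContentType(contentType, fallback = "utf-8") {
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || "");
  return match ? match[1].toLowerCase() : fallback;
}

// Hypothetical helper: decode raw response bytes with the declared charset.
// TextDecoder throws a RangeError if the runtime does not know the label.
function decodeBody(bytes, contentType) {
  const charset = charsetFromContentType(contentType);
  return new TextDecoder(charset).decode(bytes);
}

// Bytes as they might arrive from an external JSON API, encoded as UTF-8.
const body = new TextEncoder().encode('{"name":"Muñoz"}');
const text = decodeBody(body, "application/json; charset=utf-8");
console.log(JSON.parse(text).name); // Muñoz
```

The same idea applies to any layer: find out what encoding the bytes were written in, and decode with exactly that.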
Ever heard of HTML character entities? That's the primary reason they exist: as a sort of encoding-independent reference to a particular character. For example, the Œ character does not exist in the ISO-8859-1 character set. To display this character in a document with ISO-8859-1 encoding, use the equivalent HTML character entity: &OElig; (or the numeric reference &#338;). For an easier reference, check out this full table of HTML character entities. If your HTML document uses ISO-8859-1, any character above Unicode code point 255 will need to be written as an entity. If you're using UTF-8 encoding, HTML character entities shouldn't be required (apart from markup characters such as < and &).
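The escaping rule above can be automated. A minimal sketch, assuming the output is HTML text destined for an ISO-8859-1 page: the hypothetical escapeForLatin1 helper replaces every character above code point 255 with a numeric character reference, which avoids needing a table of named entities. It deliberately does not escape markup characters such as < and &; a real template would handle those separately.

```javascript
// Hypothetical helper: make text safe for an ISO-8859-1 HTML page by replacing
// every character above code point 255 with a numeric character reference.
// Markup characters such as < and & are NOT handled here.
function escapeForLatin1(text) {
  let out = "";
  // for...of walks code points, so characters outside the BMP stay whole.
  for (const ch of text) {
    const code = ch.codePointAt(0);
    out += code > 0xff ? "&#" + code + ";" : ch;
  }
  return out;
}

console.log(escapeForLatin1("Œuvre")); // &#338;uvre
console.log(escapeForLatin1("café"));  // café (é is 233, inside ISO-8859-1)
```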
Setting the Character Encoding
To specify the character encoding for any file, you can set a Content-Type header by configuring your web server or application. Apache lets you easily set different default character encodings for each individual file extension (.js, for example). Using the Content-Type header is the most foolproof and efficient¹ way to declare the encoding of the content you serve.
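Apache does this with its own configuration directives (AddCharset, for example), which aren't reproduced here. For an application you control, the same idea is to state the charset on every text response. A minimal Node.js sketch, assuming a hypothetical extension-to-type table (CONTENT_TYPES) and a hypothetical getBody callback that supplies the response body; it builds the server but does not start it.

```javascript
const http = require("http");
const path = require("path");

// Hypothetical per-extension table: every text type names its charset explicitly.
const CONTENT_TYPES = {
  ".html": "text/html; charset=utf-8",
  ".css": "text/css; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".json": "application/json; charset=utf-8",
};

// Look up the Content-Type for a request path, ignoring any query string.
function contentTypeFor(urlPath) {
  try {
    const pathname = new URL(urlPath, "http://localhost").pathname;
    const ext = path.extname(pathname).toLowerCase();
    return CONTENT_TYPES[ext] || "application/octet-stream";
  } catch {
    // Unparseable request path: fall back to a generic binary type.
    return "application/octet-stream";
  }
}

// Build (but do not start) a server that labels every response.
// getBody is a hypothetical callback returning the body for a path.
function createServer(getBody) {
  return http.createServer((req, res) => {
    res.setHeader("Content-Type", contentTypeFor(req.url));
    res.end(getBody(req.url));
  });
}
```

To try it, call createServer(...).listen(8080) (the port is arbitrary) and inspect the response headers. Node's res.end encodes string bodies as UTF-8 by default, so the declared charset matches the bytes actually sent.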
But without access to the Apache configuration, how do we specify the character encoding?
In the HTML File
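Just add the charset attribute to the script tag that loads the file. If it is not specified, the HTML document's character encoding is used by default (as specified in the Content-Type header or the appropriate meta tag, for example:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>).
CSS files can declare their own encoding (using the @charset at-rule), and HTML files can do it (using the meta tag above), but JavaScript files have no equivalent declaration, which is why the attribute on the script tag matters.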
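Once a page has several script tags, it can help to check which ones declare a charset and which inherit the page's encoding. A simplified sketch, assuming the fallback described above (the charset attribute, else the document's encoding); browsers also honor a byte order mark and a charset in the script's own Content-Type header before either, which this helper cannot see. The listScriptCharsets name is hypothetical; document.characterSet, querySelectorAll and getAttribute are standard DOM APIs.

```javascript
// Hypothetical audit helper: report the charset each external script would
// fall back to. Simplified: ignores BOMs and the script's HTTP Content-Type.
function listScriptCharsets(doc) {
  const scripts = Array.from(doc.querySelectorAll("script[src]"));
  return scripts.map((script) => {
    const declared = script.getAttribute("charset");
    return {
      src: script.getAttribute("src"),
      charset: declared || doc.characterSet,
      inherited: !declared,
    };
  });
}

// In a browser console:
// console.table(listScriptCharsets(document));
```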
For Dynamically Created Script Tags
A script tag created from JavaScript accepts the same charset attribute. When you load a file through jQuery's Ajax function with the script dataType, jQuery even provides a scriptCharset option that wraps this method of setting the charset on a dynamic script tag. Be warned: as of version 1.4.2, the jQuery Ajax function uses two different methods to load external script files. For a same-domain request, it uses an XMLHttpRequest; for a cross-domain request, it uses a dynamic script tag. So the scriptCharset option only applies to cross-domain requests, and we'll need some other method to mitigate our character encoding issues (or just use dynamic script tags).
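Without jQuery, a dynamic script tag with an explicit charset takes only a few lines of DOM code. A minimal sketch; the loadScript name is hypothetical, and the document is passed in as a parameter only so the function can be exercised outside a browser. With jQuery, the cross-domain equivalent would pass dataType: "script" and scriptCharset to $.ajax.

```javascript
// Hypothetical helper: load an external script with an explicit charset.
// Attributes are set before the element is inserted, since insertion is
// what starts the fetch for a dynamically created script.
function loadScript(doc, src, charset, onLoad) {
  const script = doc.createElement("script");
  script.setAttribute("charset", charset);
  script.setAttribute("src", src);
  if (onLoad) {
    script.onload = onLoad;
  }
  doc.head.appendChild(script);
  return script;
}

// Browser usage (the URL is a placeholder):
// loadScript(document, "https://example.com/widget.js", "utf-8", () => console.log("loaded"));
```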
For XMLHttpRequest Objects
Our saving grace would be the overrideMimeType method, if it weren't poetically unavailable in Internet Explorer. Using this method, we can override the MIME type and character encoding of the response.
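// Raw characters (correct only if the file is decoded with the right encoding)
var string = "ñó";
// HTML character entities (decoded only when inserted as HTML, e.g. via innerHTML)
var string = "&ntilde;&oacute;";
// Escaped to Latin (hex escapes; the source stays pure ASCII)
var string = "\xf1\xf3";
// Escaped to Unicode (the source stays pure ASCII)
var string = "\u00f1\u00f3";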
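The escaped forms keep a script file pure ASCII, so it reads the same whatever encoding it is served with. Writing them by hand is tedious; here is a minimal sketch of a hypothetical toUnicodeEscapes helper that rewrites every non-ASCII UTF-16 code unit as a \uXXXX escape. Characters outside the Basic Multilingual Plane become a surrogate pair of escapes, which JavaScript string literals accept. Quotes and backslashes are left alone, so this converts string contents, not whole literals.

```javascript
// Hypothetical helper: rewrite non-ASCII characters as \uXXXX escapes so the
// result can be pasted inside a JavaScript string literal in an ASCII-only file.
function toUnicodeEscapes(str) {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i); // one UTF-16 code unit
    out += unit > 0x7f
      ? "\\u" + unit.toString(16).padStart(4, "0")
      : str[i];
  }
  return out;
}

console.log(toUnicodeEscapes("ñó")); // \u00f1\u00f3

// Both escaped spellings produce the same string as the raw characters.
console.log("\xf1\xf3" === "ñó" && "\u00f1\u00f3" === "ñó"); // true
```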
The easiest way to preemptively solve a lot of character encoding issues is to use UTF-8 for everything, and configure your web server/application to serve the UTF-8