Monday, May 20, 2024
Asked Tuesday, April 1, 2014

I have a page with UTF-8 header:



<meta charset="utf-8" />


In the page I use the Umbraco dictionary to fetch content in various languages.
When I print an item in German on the page, it appears fine:



<h1>@library.GetDictionaryItem("A")</h1>


resolves to:



<h1>Ä</h1> in German



However, if I output it via a script:



<script type="text/javascript" charset="utf-8">
var a = '@library.GetDictionaryItem("A")';
alert(a);
</script>


The alert prints:



&#228;


If I do



<script type="text/javascript" charset="utf-8">
var a = 'Ä';
alert(a);
</script>


The alert prints:



Ä


So what could explain this behaviour, and how can I fix the alert?
As far as I can see, everything is UTF-8, and the dictionary and the page encoding are fine. The problem happens within JavaScript.



From what I can see in the table here, JavaScript resolves the character into its numeric value. I tried escape, encodeURI, decodeURI, etc. with no luck.



chr  HexCode  Numeric  HTML entity  escape(chr)  encodeURI(chr)
ä    xE4      &#228;   &auml;       %E4          %C3%A4
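
For reference, the columns of that table can be reproduced with standard built-ins. This is only a minimal sketch (the variable names are illustrative, and `escape` is a deprecated legacy function). Note that every call goes from the character to an encoded form, and that a URL decoder leaves the text `&#228;` untouched because it contains no percent-escapes.

```javascript
var chr = "\u00E4";                                   // ä
var code = chr.charCodeAt(0);                         // 228
var hexCode = "x" + code.toString(16).toUpperCase();  // "xE4"
var numericEntity = "&#" + code + ";";                // "&#228;"
var escaped = escape(chr);                            // "%E4" (legacy escaping)
var uriEncoded = encodeURI(chr);                      // "%C3%A4" (UTF-8 bytes)

// Decoding a percent-escape works, but an HTML entity is not a percent-escape:
var roundTrip = decodeURI(uriEncoded);                // "ä"
var stillEntity = decodeURI(numericEntity);           // "&#228;" (unchanged)

console.log(hexCode, numericEntity, escaped, uriEncoded, roundTrip, stillEntity);
```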


 Answers

(FWIW: Character entity &#228; is ä, not Ä.)



This has nothing to do with character encoding. You're outputting an HTML entity to a JavaScript string, and then asking the browser to display that JavaScript string without doing anything to interpret HTML (via alert). It's exactly as though you actually typed:



<h1>&#228;</h1>


...(which will show ä on the page), and



<script>
var a = "&#228;";
alert(a);
</script>


...which won't. The HTML entity isn't being used anywhere that understands HTML entities. alert doesn't interpret HTML.
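
To make that concrete, here is a small sketch comparing the two strings from the question (the variable names are illustrative). As far as JavaScript is concerned, the entity is just six ordinary characters:

```javascript
var fromDictionary = "&#228;"; // what the server rendered into the script
var typedDirectly = "\u00E4";  // the actual character ä

console.log(fromDictionary.length);            // 6
console.log(typedDirectly.length);             // 1
console.log(fromDictionary === typedDirectly); // false
console.log(fromDictionary.split(""));         // [ '&', '#', '2', '2', '8', ';' ]
```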



But if you did this:



<script>
var a = "&#228;";
var div = document.createElement('div');
div.innerHTML = a;
document.body.appendChild(div);
</script>


...you'd see the character on the page, because we're giving the entity to something (innerHTML) that will interpret HTML. And so if you make that first line:



var a = '@library.GetDictionaryItem("A")';


...and then use a in an HTML context (as above), you'll get the ä in the document.



If you always get a decimal numeric character entity (like &#228;) from Umbraco, since those define Unicode code points and JavaScript (mostly) uses Unicode code points in its strings*, you can parse the entity easily enough:



function characterFromDecimalNumericEntity(str) {
    // Match exactly one decimal numeric entity, e.g. "&#228;"
    var decNumEntRex = /^&#(\d+);$/;
    var match = decNumEntRex.exec(str);
    var codepoint = match ? parseInt(match[1], 10) : null;
    // String.fromCharCode only handles values up to 0xFFFF (see the note below)
    var character = codepoint ? String.fromCharCode(codepoint) : null;
    return character;
}
alert(characterFromDecimalNumericEntity("&#228;")); // ä





* Why mostly: JavaScript strings are made up of 16-bit characters that correspond to UTF-16 code units, not Unicode code points (you can't store a Unicode code point in 16 bits, you need 21). All characters from the Basic Multilingual Plane fit within one UTF-16 code unit, but characters from the Supplementary Multilingual Plane, Supplementary Ideographic Plane, and so on require two UTF-16 code units for a character. One of those characters will occupy two characters in a JavaScript string. The function above would fail for them. More in the JavaScript spec and the Unicode FAQ.
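
The limitation described in this footnote can be seen directly in a console. The following is a minimal sketch, assuming an ES2015+ environment where `String.fromCodePoint` is available. `decodeNumericEntities` is a hypothetical helper (not part of Umbraco or any library) that decodes decimal and hexadecimal numeric entities anywhere in a string and leaves named entities such as `&amp;` alone.

```javascript
// U+1F600 lies outside the Basic Multilingual Plane.
var viaCharCode = String.fromCharCode(0x1F600);   // truncated to 16 bits: "\uF600"
var viaCodePoint = String.fromCodePoint(0x1F600); // surrogate pair "\uD83D\uDE00"

console.log(viaCharCode.length);  // 1 (wrong character)
console.log(viaCodePoint.length); // 2 (one character, two UTF-16 code units)

// Hypothetical helper: decode every numeric character reference in a string.
function decodeNumericEntities(str) {
    return str.replace(/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/g, function (match, hex, dec) {
        var codepoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
        // Leave values beyond the Unicode range untouched instead of throwing.
        if (codepoint > 0x10FFFF) {
            return match;
        }
        return String.fromCodePoint(codepoint);
    });
}

var decoded = decodeNumericEntities("Gr&#252;&#223;e, &#xC4;rger &amp; mehr &#128512;");
console.log(decoded);
```

Named entities are deliberately out of scope here; decoding those reliably needs either the browser's HTML parser (as with innerHTML above) or a full entity table.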


Answered Monday, March 31, 2014
andreguym
