Asked Friday, June 13, 2014

I'm writing some JavaScript that processes website content. My efforts are being thwarted by the SharePoint text editor's tendency to put the zero-width space character in the text when the user presses backspace.
The character's Unicode value is 8203, or B200 in hexadecimal. I've tried to use the default replace function to get rid of it. I've tried many variants, but none of them worked:



var a = "o​m"; // the invisible character is between o and m

var b = a.replace(/\u8203/g, '');
b = a.replace(/\uB200/g, '');
b = a.replace("\\uB200", '');


And so on and so forth. I've tried quite a few variations on this theme. None of these expressions work (tested in Chrome and Firefox). The only thing that works is typing the actual character in the expression:



var b = a.replace("​", ''); // it's there, believe me


This poses potential problems. The character is invisible, so that line in itself doesn't make sense. I can get around that with comments. But if the code is ever reused and the file is saved using a non-Unicode encoding (or when it's deployed to SharePoint, where there's no guarantee the encoding won't get messed up), it will stop working. Is there a way to write this using the Unicode notation instead of the character itself?



[My ramblings about the character]



In case you haven't met this character, (and you probably haven't, seeing as it's invisible to the naked eye, unless it broke your code and you discovered it while trying to locate the bug) it's a real a-hole that will cause certain types of pattern matching to malfunction. I've caged the beast for you:



[​] <- careful, don't let it escape.



If you want to see it, copy those brackets into a text editor and then iterate your cursor through them. You'll notice you'll need three steps to pass what seems like 2 characters, and your cursor will skip a step in the middle.



 Answers
57

The number in a Unicode escape must be written in hex, and the hex for 8203 is 200B, not B200 (U+200B is indeed the Unicode zero-width space), so:



var b = a.replace(/\u200B/g, '');
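
If you're unsure of the hex form, the conversion can be checked in the console. This is only a minimal sketch; the variable names (`hex`, `fromDecimal`, `sample`, and so on) are illustrative and not part of the question's code:

```javascript
// 8203 is the decimal code point; toString(16) gives the hex digits for the \u escape.
var hex = (8203).toString(16).toUpperCase();   // "200B"

// Build the character from its decimal value and compare it with the escape.
var fromDecimal = String.fromCharCode(8203);
var sameChar = fromDecimal === '\u200B';       // true

// Inspect the middle character of a sample string written with the escape.
var sample = 'o\u200Bm';
var middleCode = sample.charCodeAt(1);         // 8203
```

The same `charCodeAt` check can be run on real page content to confirm which invisible character is actually present before writing the pattern.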


Live Example:



var a = "o​m"; // the invisible character is between o and m
var b = a.replace(/\u200B/g, '');
console.log("a.length = " + a.length);      // 3
console.log("a === 'om'? " + (a === 'om')); // false
console.log("b.length = " + b.length);      // 2
console.log("b === 'om'? " + (b === 'om')); // true
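
For reuse, the replacement could be wrapped in a small function. The sketch below is an assumption about how one might package it, not something from the question; `stripZeroWidthSpaces` is a hypothetical name:

```javascript
// Hypothetical helper (not from the original post).
// Removes every U+200B (zero-width space) from a string.
function stripZeroWidthSpaces(text) {
  return text.replace(/\u200B/g, '');
}

// The test string uses the escape too, so this file contains only ASCII.
var dirty = 'o\u200Bm';
var clean = stripZeroWidthSpaces(dirty);

console.log(dirty.length);   // 3
console.log(clean.length);   // 2
console.log(clean === 'om'); // true
```

Because both the pattern and the test string use the `\u200B` escape, the source stays plain ASCII, so saving the file in a non-Unicode encoding should not affect it. If other invisible characters turn up (U+FEFF, for example), the pattern could be widened to a character class such as `/[\u200B\uFEFF]/g`; whether SharePoint inserts those is not something the question establishes.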

[#70586] Answered Wednesday, June 11, 2014