I've been experimenting with various bits of Java code trying to come up with something that will encode a string containing quotes, spaces and exotic Unicode characters and produce output that's identical to JavaScript's encodeURIComponent function.
My torture test string is: "A" B ± "
If I enter the following JavaScript statement in Firebug:
encodeURIComponent('"A" B ± "');
Then I get:
%22A%22%20B%20%C2%B1%20%22
Here's my little test Java program:
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodingTest
{
    public static void main(String[] args) throws UnsupportedEncodingException
    {
        String s = "\"A\" B ± \"";
        System.out.println("URLEncoder.encode returns "
            + URLEncoder.encode(s, "UTF-8"));
        System.out.println("getBytes returns "
            + new String(s.getBytes("UTF-8"), "ISO-8859-1"));
    }
}
This program outputs:
URLEncoder.encode returns %22A%22+B+%C2%B1+%22
getBytes returns "A" B Â± "
Close, but no cigar! What is the best way of encoding a UTF-8 string using Java so that it produces the same output as JavaScript's encodeURIComponent?
EDIT: I'm using Java 1.4, moving to Java 5 shortly.
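
For illustration, here is a minimal sketch of one possible direction, not a definitive answer. It assumes the only differences between the two encoders are the ones visible above plus the documented character sets: URLEncoder writes a space as + and escapes ! ' ( ) ~, while encodeURIComponent writes a space as %20 and leaves those five characters alone. The class name EncodeURIComponentSketch and the helper method are hypothetical, and the code sticks to calls that exist in Java 1.4.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodeURIComponentSketch
{
    // Hypothetical helper (not part of the JDK): post-processes the
    // output of URLEncoder so it lines up with encodeURIComponent.
    public static String encodeURIComponent(String s)
    {
        try
        {
            return URLEncoder.encode(s, "UTF-8")
                // URLEncoder uses form encoding, where a space becomes "+".
                // A literal "+" in the input is already "%2B" at this point.
                .replaceAll("\\+", "%20")
                // Undo the escapes that encodeURIComponent would not apply.
                .replaceAll("%21", "!")
                .replaceAll("%27", "'")
                .replaceAll("%28", "(")
                .replaceAll("%29", ")")
                .replaceAll("%7E", "~");
        }
        catch (UnsupportedEncodingException e)
        {
            // Every JVM is required to support UTF-8, so this is unexpected.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args)
    {
        // \u00B1 is the plus-minus sign, written as an escape so the
        // result does not depend on the source file's encoding.
        System.out.println(encodeURIComponent("\"A\" B \u00B1 \""));
    }
}
```

With the torture test string this should print %22A%22%20B%20%C2%B1%20%22, matching the Firebug result. The replacements cannot collide with percent signs from the input, because URLEncoder has already turned every literal % into %25.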