Question

Javascript unescape() vs. Python urllib.unquote()

Asked Fri, April 18, 2014

From reading various posts, it seems that JavaScript's unescape() is equivalent to Python's urllib.unquote(). However, when I test both, I get different results:

In browser console:

unescape('%u003c%u0062%u0072%u003e');

output: <br>

In Python interpreter:

import urllib

urllib.unquote('%u003c%u0062%u0072%u003e')

output: %u003c%u0062%u0072%u003e

I would expect Python to also return <br>. Any ideas as to what I'm missing here?

Thanks!

Answers

answered 10 Years ago cherish · Accepted Answer

%uxxxx is a non-standard URL encoding scheme that is not supported by urllib.parse.unquote() (Python 3) or urllib.unquote() (Python 2).

It was only ever part of ECMAScript ECMA-262 3rd edition; the format was rejected by the W3C and was never a part of an RFC.

You could use a regular expression to convert such codepoints:

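import re

try:
    unichr  # only defined in Python 2
except NameError:
    unichr = chr  # Python 3: chr() covers all code points

quoted = '%u003c%u0062%u0072%u003e'  # example input

# Replace each %uxxxx (or %uxx) sequence with the character it encodes
re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: unichr(int(m.group(1), 16)), quoted)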

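This decodes the four-digit %uxxxx form and, leniently, a two-digit %uxx form. Note that unescape() itself, as specified in ECMA-262, only decodes %uxxxx and plain %xx, so the two-digit %uxx branch accepts more than the browser does.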

Demo:

>>> import re

>>> quoted = '%u003c%u0062%u0072%u003e'

>>> re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: chr(int(m.group(1), 16)), quoted)

'<br>'

>>> altquoted = '%u3c%u0062%u0072%u3e'

>>> re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: chr(int(m.group(1), 16)), altquoted)

'<br>'

However, you should avoid using this encoding altogether if possible.
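If you do have to accept such input, here is a minimal sketch of a helper that tries to follow unescape() more closely. It handles %uxxxx and plain %xx in a single pass, so a decoded '%' is never decoded a second time. The name js_unescape is made up for this example and is not part of any library, and the code assumes Python 3.

```python
import re

# Matches either %uXXXX (four hex digits) or %XX (two hex digits).
_ESCAPE_RE = re.compile(r'%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})')


def js_unescape(text):
    """Hypothetical helper approximating JavaScript's unescape() (Python 3)."""
    def replace(match):
        # Exactly one of the two groups is set for any match.
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    # Anything that does not match (e.g. '%u3c' or a lone '%') is left as-is.
    return _ESCAPE_RE.sub(replace, text)


print(js_unescape('%u003c%u0062%u0072%u003e'))  # <br>
```

One difference to be aware of: JavaScript strings are UTF-16, so a surrogate pair written as two %uxxxx escapes becomes one character there, while this sketch leaves two separate surrogate code points in the Python string.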