Parsing unescaped URLs in Django

Follow me for more content or contact for work opportunities:
Twitter / LinkedIn

Modern browsers escape URLs automatically before sending them to the server, but what happens if your application serves HTTP requests to clients that don't escape URLs?

The answer is that you can get unexpected results if your server runs Django (and probably any other Python framework or application). That's because Python's BaseHTTPServer.BaseHTTPRequestHandler parses the request line strictly according to the HTTP standard, which does not allow literal spaces in a URL, rather than from a human point of view.

Let's look at an example. Consider the following request:

http://vaig.be/identify_myself?name=Marc Garcia&country=Catalonia

If you request it with a browser, the browser escapes the space in the URL, so the server receives:

http://vaig.be/identify_myself?name=Marc%20Garcia&country=Catalonia

But what if the client uses, for example, Python's urllib2.urlopen without first escaping the URL with urllib.quote? Of course that is a client-side mistake, but as a server-side developer you can't control your clients.
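For reference, here is a minimal sketch of the escaping such a client should have done, written for Python 3, where urllib.quote became urllib.parse.quote. The safe=':/?=&' argument is chosen for this particular URL only (it keeps the URL's structural characters unescaped) and is not a general rule; building the query string with urlencode is the more robust option.

```python
from urllib.parse import quote, urlencode

raw_url = 'http://vaig.be/identify_myself?name=Marc Garcia&country=Catalonia'

# Escape the whole URL but leave its structural characters alone;
# the space becomes %20, just like in the browser example.
escaped_url = quote(raw_url, safe=':/?=&')
print(escaped_url)
# http://vaig.be/identify_myself?name=Marc%20Garcia&country=Catalonia

# More robust: build the query string from its parts. urlencode() escapes
# each key and value separately and encodes spaces as '+'.
query = urlencode({'name': 'Marc Garcia', 'country': 'Catalonia'})
print('http://vaig.be/identify_myself?' + query)
# http://vaig.be/identify_myself?name=Marc+Garcia&country=Catalonia
```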

In that case, the request line the server receives is:

GET http://vaig.be/identify_myself?name=Marc Garcia&country=Catalonia HTTP/1.1

After this line is split on its spaces by Python's BaseHTTPServer.BaseHTTPRequestHandler, what we get in Django is:

request.method == 'GET'
request.META['QUERY_STRING'] == 'name=Marc'
request.META['SERVER_PROTOCOL'] == 'Garcia&country=Catalonia HTTP/1.1'
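To make the splitting concrete, here is a toy reproduction. It assumes the request line is split on its first two spaces only, simply because that rule reproduces the values listed above; it is an illustration, not the actual BaseHTTPServer source, and the variable names are made up.

```python
from urllib.parse import urlsplit

request_line = ('GET http://vaig.be/identify_myself?name=Marc Garcia'
                '&country=Catalonia HTTP/1.1')

# Assumed splitting rule: method, target and protocol are separated by the
# first two spaces, so everything after the second space becomes the protocol.
method, target, protocol = request_line.split(' ', 2)
query_string = urlsplit(target).query

print(method)        # GET
print(query_string)  # name=Marc
print(protocol)      # Garcia&country=Catalonia HTTP/1.1
```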

So our request.GET dictionary looks like this:

request.GET == {'name': 'Marc'}

That is not what a human would expect: "Garcia" and the whole country parameter are lost.
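Django's QueryDict uses its own parser, but the standard library's urllib.parse.parse_qs behaves similarly for this input and is used here as a stand-in to show the loss side by side: the truncated query string against the one the client meant to send. Like QueryDict, parse_qs maps each key to a list of values.

```python
from urllib.parse import parse_qs

# The query string the server actually ended up with
truncated = parse_qs('name=Marc')
# The query string the client meant to send
intended = parse_qs('name=Marc Garcia&country=Catalonia')

print(truncated)  # {'name': ['Marc']}
print(intended)   # {'name': ['Marc Garcia'], 'country': ['Catalonia']}
```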

The workaround is quite easy (if admittedly hacky): instead of reading the GET values from Django's request.GET dictionary, read them from the dictionary returned by this function:

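```python
def _manual_GET(request):
    """Rebuild the GET parameters when an unescaped space split the request line.

    The part of the query string after the first space ends up in
    SERVER_PROTOCOL, e.g. 'Garcia&country=Catalonia HTTP/1.1'.
    """
    if ' ' in request.META['SERVER_PROTOCOL']:
        # Glue the query string back together: the stray pieces sit in
        # SERVER_PROTOCOL, and its last word is the real protocol ('HTTP/1.1').
        query_string = ' '.join(
            [request.META['QUERY_STRING']] +
            request.META['SERVER_PROTOCOL'].split(' ')[:-1]
        )
        result = {}
        for arg in query_string.split('&'):
            if not arg:
                continue  # skip empty pieces, e.g. from a trailing '&'
            # partition() avoids a ValueError on arguments without '='
            key, _, value = arg.partition('=')
            result[key] = value
        return result
    else:
        return request.GET
```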
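If you prefer not to parse the pairs by hand, one possible variant (a sketch, not part of the approach above) rebuilds the query string the same way and hands it to the standard library's urllib.parse.parse_qsl, which also decodes %XX escapes and '+' signs that a partially escaping client might send. The function name manual_get_parsed is hypothetical, and the SimpleNamespace object only stands in for a Django request so the snippet runs without Django.

```python
from types import SimpleNamespace
from urllib.parse import parse_qsl


def manual_get_parsed(request):
    """Hypothetical variant of _manual_GET that delegates parsing to parse_qsl."""
    protocol = request.META['SERVER_PROTOCOL']
    if ' ' not in protocol:
        return request.GET  # normal, well-formed request
    # Rejoin the query string with the words that leaked into SERVER_PROTOCOL;
    # the last word is the real protocol, e.g. 'HTTP/1.1'.
    pieces = [request.META['QUERY_STRING']] + protocol.split(' ')[:-1]
    query_string = ' '.join(pieces)
    # parse_qsl decodes %XX escapes and '+'; keep_blank_values=True keeps
    # 'key=' and a bare 'key' as empty strings instead of dropping them.
    return dict(parse_qsl(query_string, keep_blank_values=True))


# Stand-in for the broken request described above
fake_request = SimpleNamespace(
    META={'QUERY_STRING': 'name=Marc',
          'SERVER_PROTOCOL': 'Garcia&country=Catalonia HTTP/1.1'},
    GET={'name': 'Marc'},
)
print(manual_get_parsed(fake_request))
# {'name': 'Marc Garcia', 'country': 'Catalonia'}
```

As with _manual_GET, the repaired branch returns a plain dict while the normal branch returns Django's QueryDict, so callers should stick to operations both support, such as indexing and get().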
