Unicode|Common problem and resolution
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 10: ordinal not in range(128)
Jeff Epler jepler at unpythonic.net
Fri Oct 8 01:54:13 CEST 2004
If you compare a unicode string to a byte string, and the byte-string
has byte values >127, you will get an error like this:
>>> u'a' == '\xc0'
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc0 in position 0: ordinal not in range(128)
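The message names the 'ascii' codec because Python 2 tried to convert the byte string to unicode using the default encoding, ASCII, which only covers byte values 0 to 127. The sketch below uses Python 3, not the Python 2 of the post. It performs that same decode explicitly. It also shows that Python 3 no longer decodes implicitly, so comparing a str with bytes simply returns False.

```python
# Python 3 sketch: reproduce the implicit ASCII decode that Python 2
# attempted when comparing u'a' with the byte string '\xc0'.
raw = b'\xc0'

message = None
try:
    raw.decode('ascii')  # ASCII only defines byte values 0..127
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)

# Python 3 does not decode implicitly: str and bytes just compare unequal.
print('a' == raw)
```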
There is no sensible way for Python to perform this comparison, because
the byte string '\xc0' could be in any encoding. If the encoding of the
byte string is latin-1, it's LATIN CAPITAL LETTER A WITH GRAVE. If it's
koi8-r encoded, it's CYRILLIC SMALL LETTER YU. Python refuses to guess
in this case.
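To see the ambiguity concretely, the following Python 3 sketch decodes the single byte 0xc0 under both encodings mentioned above. It uses the standard `unicodedata` module to print each resulting character's Unicode name.

```python
import unicodedata

raw = b'\xc0'

# The same byte maps to different characters depending on the encoding.
as_latin1 = raw.decode('latin-1')
as_koi8r = raw.decode('koi8-r')

print(unicodedata.name(as_latin1))  # LATIN CAPITAL LETTER A WITH GRAVE
print(unicodedata.name(as_koi8r))   # CYRILLIC SMALL LETTER YU
```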
The error occurs whether or not the unicode string itself contains any
non-ASCII characters.
To correct your function, you'll have to know what encoding the byte
string is in, convert it to unicode using its decode() method, and
then compare the result to the unicode string.
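The fix described above can be sketched in Python 3 as follows. In the Python 2 of the post, the same idea reads `u'\xc0' == '\xc0'.decode('latin-1')`. The helper name `matches` is hypothetical. The caller must supply the encoding, because Python cannot infer it from the bytes.

```python
def matches(text: str, raw: bytes, encoding: str) -> bool:
    """Compare a text string with a byte string whose encoding is known.

    `matches` is a hypothetical helper; `encoding` must come from the
    caller, since the bytes alone do not say how to interpret them.
    """
    # Decode the bytes to text first, then compare text with text.
    return text == raw.decode(encoding)


print(matches('\u00c0', b'\xc0', 'latin-1'))  # True: 0xc0 is A-grave in latin-1
print(matches('\u00c0', b'\xc0', 'koi8-r'))   # False: 0xc0 is Cyrillic yu in koi8-r
```

Decoding explicitly keeps the encoding decision visible at the call site rather than relying on a default.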
Jeff