Cyrillic text extraction in Python/Django
I'm using urllib2 to open a Russian website and extract text from it. However, instead of coming out as "Беллона", it's coming out as "Áåëëîíà". What's the easiest way around this?
Figure out which encoding the webpage uses (probably UTF-8, ISO 8859-5 or Windows-1251), then convert the text to Unicode like this:
ustring = unicode(read_string, encoding=...)
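Here is a short Python 3 sketch of the same idea. In Python 3, `bytes.decode()` does the job of Python 2's `unicode(read_string, encoding=...)`. It assumes, for illustration only, that the page is Windows-1251. Encoding "Беллона" that way and then misreading the bytes as Latin-1 reproduces exactly the "Áåëëîíà" from the question. Decoding with the right codec gives the Cyrillic text back. The page bytes are simulated, so no network access is needed.

```python
# Simulated page bytes; assumption: the site serves Windows-1251.
raw = "Беллона".encode("windows-1251")

# Misreading the bytes as Latin-1 reproduces the garbling from the question.
wrong = raw.decode("latin-1")
print(wrong)  # Áåëëîíà

# Decoding with the page's real encoding recovers the Cyrillic text.
right = raw.decode("windows-1251")
print(right)  # Беллона
```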
If you need to determine the encoding of the webpage dynamically, see this answer.
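A common way to detect the encoding at runtime is to read the `charset` parameter of the HTTP `Content-Type` header. This may not be the approach the linked answer uses. The Python 3 sketch below assumes the server sends that header. `charset_from_content_type` and `fetch_text` are hypothetical helper names, and the UTF-8 fallback is also an assumption.

```python
from email.message import Message
from urllib.request import urlopen


def charset_from_content_type(content_type, default="utf-8"):
    """Return the lowercased charset from a Content-Type value, or `default`."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def fetch_text(url, default="utf-8"):
    """Hypothetical helper: download `url` and decode it using the header charset."""
    with urlopen(url) as response:
        raw = response.read()
        # response.headers is an email.message.Message subclass.
        charset = response.headers.get_content_charset(default)
    # errors="replace" keeps going if the declared charset is slightly wrong.
    return raw.decode(charset, errors="replace")
```

Some pages declare their encoding only in an HTML `<meta charset=...>` tag, and this sketch does not check for one. In that case, you would need to parse the tag or guess the encoding from the bytes.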