Cyrillic text extraction in Python/Django


I'm using urllib2 to open a Russian website and extract text from it. However, instead of coming out as "Беллона", it comes out as "Áåëëîíà". What's the easiest way around this?

Figure out which encoding the webpage uses (probably UTF-8 or ISO 8859-5), then convert the text to Unicode like this:

ustring = unicode(read_string, encoding=...) 
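The garbled string in the question matches what you get when Windows-1251 (cp1251) bytes are decoded as Latin-1: "Б" is byte 0xC1 in cp1251, and 0xC1 is "Á" in Latin-1. So this particular site may be using Windows-1251 rather than either of the encodings guessed above. The sketch below is written for Python 3 (where urllib2 became urllib.request and unicode(s, enc) is spelled s.decode(enc)); the hard-coded bytes stand in for what read() returns and are an assumption for illustration.

```python
# Stand-in for the raw bytes returned by response.read() (assumed cp1251 here).
raw = "Беллона".encode("cp1251")

# Decoding with the correct codec recovers the Cyrillic text.
text = raw.decode("cp1251")
print(text)

# Decoding the same bytes as Latin-1 reproduces the garbled output from the question.
garbled = raw.decode("latin-1")
print(garbled)

# Text that was already mis-decoded as Latin-1 can be repaired by reversing that step.
repaired = garbled.encode("latin-1").decode("cp1251")
print(repaired)
```

In Python 2 the equivalent of the first decode is unicode(raw, encoding='cp1251') or raw.decode('cp1251').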

If you need to determine the encoding of the webpage dynamically, see this answer.
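One common approach, not necessarily the one in the linked answer, is to read the charset from the HTTP Content-Type header and fall back to a <meta charset> declaration in the HTML. This is a minimal Python 3 sketch under that assumption; the helper names (charset_from_content_type, decode_page, fetch_text) are hypothetical, and the UTF-8 default is a guess you may want to change.

```python
import re
import urllib.request
from email.message import Message

# Matches <meta charset="..."> as well as
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(rb"""<meta[^>]+charset=["']?([A-Za-z0-9_-]+)""", re.IGNORECASE)


def charset_from_content_type(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_page(raw, content_type=None, default="utf-8"):
    """Decode page bytes using the HTTP header, then a <meta> tag, then a default."""
    charset = charset_from_content_type(content_type) if content_type else None
    if charset is None:
        # The meta declaration is expected near the top of the document.
        match = META_CHARSET.search(raw[:4096])
        if match:
            charset = match.group(1).decode("ascii").lower()
    try:
        return raw.decode(charset or default, errors="replace")
    except LookupError:
        # The declared charset is not a codec Python knows; use the default.
        return raw.decode(default, errors="replace")


def fetch_text(url):
    """Download a page and return its text as str (requires network access)."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        return decode_page(raw, response.headers.get("Content-Type"))
```

Checking the header first follows the usual precedence, since a server-declared charset normally overrides one declared inside the page.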

