python - The most effective way to assign a unique integer id to a string?
The program I am writing processes a large number of objects, each with its own unique id. That id is a long string with a complicated structure (a dozen unique fields of the object joined by a separator).
I have to process a lot of these objects quickly and need to refer to them by id while processing, but I have no power to change the id format (I retrieve the objects externally, over the network). So I want to map each complicated string id to my own internal integer id and use that for comparison, for transferring the objects to other processes, etc.
What I'm going to use is a simple dict whose keys are the string ids of the objects and whose integer values are the corresponding internal integer ids.
My question is: is there a better way to do this in Python? Maybe there is a way to calculate some hash manually, or something similar? Maybe a dict is not the best solution?
As for numbers: there are about 100k such unique objects in the system at a time, so the integer capacity is more than enough.
For comparison purposes, you can intern the strings and then compare them with is instead of ==. That is a simple pointer comparison and should be as fast as (or faster than) comparing two integers:
    >>> 'foo' * 100 is 'foo' * 100
    False
    >>> intern('foo' * 100) is intern('foo' * 100)
    True
intern guarantees that id(intern(a)) == id(intern(b)) iff a == b. Be sure to intern each string as soon as it is input. Note that intern is called sys.intern in Python 3.x.
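A minimal Python 3 sketch of the interning approach described above. The variable names are illustrative only, and the strings are built at run time with join because the compiler may share identical literals, which would hide the effect:

```python
import sys

# Build two equal strings at run time so that they start out as
# separate objects rather than one shared compile-time constant.
parts = ["foo"] * 100
a = "".join(parts)
b = "".join(parts)

# Value comparison has to look at the characters.
equal_by_value = (a == b)

# Interning returns one canonical object per distinct string value,
# so equal strings end up as the very same object.
ia = sys.intern(a)
ib = sys.intern(b)

# Identity comparison is a pointer check and is only safe here
# because both operands went through sys.intern.
same_object = ia is ib
```

Comparing with is is only reliable when every string involved has been interned; a string that skipped interning can be equal by value and still be a different object.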
But when you have to pass these strings to other processes, your dict solution seems best. What I usually do in such situations is:

    str_to_id = {}
    for s in strings:
        str_to_id.setdefault(s, len(str_to_id))
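The same idea as a self-contained sketch. The function name build_id_map, the sample ids, and the reverse lookup list id_to_str are assumptions added for illustration; only the setdefault pattern comes from the answer:

```python
def build_id_map(strings):
    """Map each distinct string to the next free integer id (0, 1, 2, ...)."""
    str_to_id = {}
    for s in strings:
        # setdefault stores the value only if s is not yet a key,
        # so a repeated string keeps the id it received first.
        str_to_id.setdefault(s, len(str_to_id))
    return str_to_id


# Hypothetical external ids: fields joined by a separator.
sample_ids = ["a|b|c", "x|y|z", "a|b|c", "m|n|o"]
str_to_id = build_id_map(sample_ids)

# Optional reverse table: position i holds the string whose id is i.
id_to_str = [None] * len(str_to_id)
for s, i in str_to_id.items():
    id_to_str[i] = s
```

Because ids are handed out as 0, 1, 2, ... in order of first appearance, the reverse table can be a plain list, which is handy when another process sends back only the integer.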
"so the integer capacity is more than enough"
Python integers are bigints (arbitrary precision), so this should never be a problem.