I have a set of unique strings, and I want to create a unique integer identifier for each string.
Usage: I want a function that moves back and forth; if I give it an integer it returns the corresponding string, and vice versa.
Here is how I am doing it:
def str_to_int(S):
    integers = list(range(len(S)))
    my_dict = dict(zip(S, integers))
    rev_dict = dict(zip(integers, S))
    return my_dict, rev_dict
If I need to get the integer identifier of an item of S, I need to call the function and then look the item up in the appropriate returned dictionary.
I want something simpler: given an integer or a string, it somehow knows automatically whether it's an int or a str and returns the other identifier (i.e. if I give an int it returns the str identifier, and vice versa). Is it possible to do this in a single function (if possible, without having to recreate the dictionaries on each call)?
Edit: I thought of writing two functions, str_to_int(S: set, string: str) -> int and int_to_str(S: set, integer: int) -> str, but the problems are 1) that's two functions, and 2) two dictionaries are created on each call.
3 Answers
Since what you want is something a little more complicated than what a normal dictionary can do, I think you want to encapsulate all of this in a class that behaves the way you want the dict to behave. You can make it "look like" a dict by implementing __getitem__, something like:
from typing import Dict, List, Set, Union, overload


class StringTable:
    """Associate strings with unique integer IDs."""

    def __init__(self, strings: Set[str]):
        """Initialize the string table with the given set of strings."""
        self._keys: List[str] = []
        self._ids: Dict[str, int] = {}
        for key in strings:
            self._ids[key] = len(self._keys)
            self._keys.append(key)

    @overload
    def __getitem__(self, o: int) -> str: ...

    @overload
    def __getitem__(self, o: str) -> int: ...

    def __getitem__(self, o: Union[int, str]) -> Union[str, int]:
        """Accepts either a string or int and returns its counterpart."""
        if isinstance(o, int):
            return self._keys[o]
        elif isinstance(o, str):
            return self._ids[o]
        else:
            raise TypeError("Bad argument!")

    def __len__(self) -> int:
        return len(self._keys)
Now you can use it like:
strings = {"foo", "bar", "baz"}
bijection = StringTable(strings)
for s in strings:
    print(s, bijection[s])
    assert bijection[bijection[s]] == s
etc.
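One caveat when the table is built from a set: the iteration order of a set of strings can differ from one run of the interpreter to the next, so the IDs assigned above are only stable within a single run. A minimal sketch, assuming IDs need to be reproducible and that sorting the strings is acceptable (the helper name build_stable_ids is hypothetical, not part of the answer above):

```python
from typing import Dict, Iterable, List, Tuple


def build_stable_ids(strings: Iterable[str]) -> Tuple[List[str], Dict[str, int]]:
    """Assign IDs in sorted order so they do not depend on set iteration order."""
    keys = sorted(set(strings))  # deduplicate, then fix the order
    ids = {key: i for i, key in enumerate(keys)}
    return keys, ids


keys, ids = build_stable_ids({"foo", "bar", "baz"})
print(keys)        # ['bar', 'baz', 'foo']
print(ids["foo"])  # 2
```

The same two containers could be stored in StringTable.__init__ in place of the loop over the raw set.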
- Didn't know about overload and that usage of Ellipsis, nice! (Graipher, Feb 11, 2020 at 8:45)
I'm not sure why you insist on doing this only with a single function, but we surely can. Also, just pass in the prebuilt dictionaries so that you don't need to build them on every call.
def build_mapping(S):
    integers = list(range(len(S)))
    return dict(zip(S, integers)), dict(zip(integers, S))

def get_value(key, conv, rev_conv):
    return conv[key] if isinstance(key, str) else rev_conv[key]

S = ['foo', 'bar', 'baz', 'hello', 'world']
conv, rev_conv = build_mapping(S)
key = 'hello'
key2 = 3
# prints "3 hello"
print(get_value(key, conv, rev_conv), get_value(key2, conv, rev_conv))
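If passing both dictionaries on every call feels noisy, one possible variation is to capture them in a closure, so the dictionaries are built once and the caller ends up with a single one-argument function. This is only a sketch; make_converter and convert are hypothetical names, not part of the answer above:

```python
from typing import Callable, Iterable, Union

Key = Union[int, str]


def make_converter(strings: Iterable[str]) -> Callable[[Key], Key]:
    """Build both dictionaries once and return a single lookup function."""
    items = list(strings)
    to_int = {s: i for i, s in enumerate(items)}
    to_str = dict(enumerate(items))

    def convert(key: Key) -> Key:
        # Strings map to their integer ID; anything else is treated as an ID.
        return to_int[key] if isinstance(key, str) else to_str[key]

    return convert


convert = make_converter(['foo', 'bar', 'baz', 'hello', 'world'])
print(convert('hello'), convert(3))  # 3 hello
```

The dictionaries live only inside the closure, so nothing is rebuilt between calls.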
- I thought it's more readable using a single function and it's fewer lines of code. But, I guess, you don't recommend it. Am I right? (user218022, Feb 10, 2020 at 19:57)
If the keys are strings and ints, they can't collide, so they can go in the same dict.
strings = ['one', 'alpha', 'blue']
# mapping from strings to ints and ints to strings
two_way_dict = {}
for i, s in enumerate(strings):
    two_way_dict.update([(s, i), (i, s)])

bijection = two_way_dict.get

# example
# bijection('alpha') -> 1
# bijection(1) -> 'alpha'
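Note that dict.get returns None for a key that is not in the dictionary instead of raising. If an unknown string or ID should be an error, one option is to bind the dictionary's __getitem__ instead. A small sketch of the difference (the names lenient and strict_bijection are hypothetical):

```python
strings = ['one', 'alpha', 'blue']

# Same two-way dictionary as above: strings -> ints and ints -> strings.
two_way_dict = {}
for i, s in enumerate(strings):
    two_way_dict[s] = i
    two_way_dict[i] = s

lenient = two_way_dict.get                    # returns None for unknown keys
strict_bijection = two_way_dict.__getitem__   # raises KeyError for unknown keys

print(strict_bijection('alpha'))  # 1
print(lenient('missing'))         # None
try:
    strict_bijection('missing')
except KeyError as exc:
    print('unknown key:', exc)    # unknown key: 'missing'
```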