Dictionary of (named) tuples in Python and speed/RAM performance

Question 1

I'm creating a dictionary d of one million of items which are tuples, and ideally I'd like to access them with:

d[1634].id # or d[1634]['id']
d[1634].name # or d[1634]['name']
d[1634].isvalid # or d[1634]['isvalid']

rather than d[1634][0], d[1634][1], d[1634][2] which is less explicit.

According to my test:

import os, psutil, time, collections, typing
Tri = collections.namedtuple('Tri', 'id,name,isvalid')
Tri2 = typing.NamedTuple("Tri2", [('id', int), ('name', str), ('isvalid', bool)])
t0 = time.time()
# uncomment only one of these 4 next lines:
d = {i: (i+1, 'hello', True) for i in range(1000000)} # tuple
# d = {i: {'id': i+1, 'name': 'hello', 'isvalid': True} for i in range(1000000)} # dict
# d = {i: Tri(id=i+1, name='hello', isvalid=True) for i in range(1000000)} # namedtuple
# d = {i: Tri2(id=i+1, name='hello', isvalid=True) for i in range(1000000)} # NamedTuple
print('%.3f s %.1f MB' % (time.time()-t0, psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2))
"""
tuple: 0.257 s 193.3 MB
dict: 0.329 s 363.6 MB
namedtuple: 1.253 s 193.3 MB (collections)
NamedTuple: 1.250 s 193.5 MB (typing)
"""

using a dict doubles the RAM usage, compared to a tuple
using a namedtuple or NamedTuple multiplies by 5 the time spent, compared to a tuple!

Question: is there a tuple-like data structure in Python 3 which allows to access the data with x.id, x.name, etc. and also is RAM and CPU efficient?

Notes:

in my real use case, the tuple is something like a C-struct of type (uint64, uint64, bool).
I've also tried with:
- slots (to avoid the interal object's __dict__, see Usage of __slots__?)
- dataclass:
```
@dataclasses.dataclass
class Tri3:
 id: int
 ...
```
- ctypes.Structure:
```
class Tri7(ctypes.Structure):
 _fields_ = [("id", ctypes.c_int), ...]
```
but it was not better (all of them ~ 1.2 sec.), nothing close to a genuine tuple in terms of performance
Here are other options: C-like structures in Python

Question 2

Unless you are frequently rebuilding the entire dictionary, I wouldn't worry too much about how long it takes to do so. In practice, you are probably going to be loading data from a file, and the IO is going to take longer than instantiating the objects.

Question 3

what performance do you care about? Creation or lookup?

Question 4

The psutil method you use gives you just a snapshot of the current memory usage. It is notoriously inaccurate in the face of garbage-collection etc. In fact, multiple runs will show you how it fluctuates.

Question 5

@anon01 both are important

Question 6

two more methods to consider: a @dataclass (slower than your namedtuple), and making a larger container at once such as a pandas.DataFrame (also slower, but quite compact). The latter also wouldn't give you directly a dot accessor; instead you'd have to do: df.iloc[1634].name.

Question 7

Cython's cdef-classes might be what you want: They use less memory than the pure Python classes, even if it comes at costs of more overhead when accessing members (because fields are stored as C-values and not Python-objects).

For example:

%%cython
cdef class CTuple:
 cdef public unsigned long long int id
 cdef public str name
 cdef public bint isvalid
 
 def __init__(self, id, name, isvalid):
 self.id = id
 self.name = name
 self.isvalid = isvalid

which can be used as wished:

ob=CTuple(1,"mmm",3)
ob.id, ob.name, ob.isvalid # prints (2, "mmm", 3)

Timings/memory consumption:

First, the baseline on my machine:

0.258 s 252.4 MB # tuples
0.343 s 417.5 MB # dict
1.181 s 264.0 MB # namedtuple collections

with CTuple we get:

0.306 s 191.0 MB

which is almost as fast and needs considerable less memory.

If the C-type of members isn't clear at compile time, one could use simple python-objects:

%%cython
cdef class PTuple:
 cdef public object id
 cdef public object name
 cdef public object isvalid
 
 def __init__(self, id, name, isvalid):
 self.id = id
 self.name = name
 self.isvalid = isvalid

The timings are a little bit surprising:

0.648 s 249.8 MB

I didn't expect it to be so much slower than the CTuple-version, but at least it is twice as fast as named tuples.

One disadvantage of this approach is that it needs compilation. Cython however offers cython.inline which can be used to compile Cython-code created on-the-fly.

I've released cynamedtuple which can be installed via pip install cynamedtuple, and is based on the prototype bellow:

import cython
# for generation of cython code:
tab = " "
def create_members_definition(name_to_ctype):
 members = []
 for my_name, my_ctype in name_to_ctype.items():
 members.append(tab+"cdef public "+my_ctype+" "+my_name)
 return members
def create_signature(names):
 return tab + "def __init__(self,"+", ".join(names)+"):"
def create_initialization(names):
 inits = [tab+tab+"self."+x+" = "+x for x in names]
 return inits
def create_cdef_class_code(classname, names):
 code_lines = ["cdef class " + classname + ":"]
 code_lines.extend(create_members_definition(names))
 code_lines.append(create_signature(names.keys()))
 code_lines.extend(create_initialization(names.keys()))
 return "\n".join(code_lines)+"\n"
# utilize cython.inline to generate and load pyx-module:
def create_cnamedtuple_class(classname, names):
 code = create_cdef_class_code(classname, names)
 code = code + "GenericClass = " + classname +"\n"
 ret = cython.inline(code)
 return ret["GenericClass"]

Which can be used as follows, to dynamically define CTuple from above:

CTuple = create_cnamedtuple_class("CTuple", 
 {"id":"unsigned long long int", 
 "name":"str",
 "isvalid":"bint"})
ob = CTuple(1,"mmm",3)
...

Another alternative could be to use jit-compilation and Numba's jitted-classes which offer this possibility. They however seem to be much slower:

from numba import jitclass, types
spec = [
 ('id', types.uint64), 
 ('name', types.string),
 ('isvalid', types.uint8),
]
@jitclass(spec)
class NBTuple(object):
 def __init__(self, id, name, isvalid):
 self.id = id
 self.name = name
 self.isvalid = isvalid

and the results are:

20.622 s 394.0 MB

so numba jitted classes are not (yet?) a good choice.

Question 8

Wonderful solution @ead! Is it possible to use a once-for-all Cython-compiled version? Then I could compile this module only once, once for all, and then just do MyCTuple = GenericCTuple(types=[ctypes.unsignedlonglongint, ctypes.str, ctypes.bool], names=['id', 'name', 'isvalid']), and then MyCTuple(1, "hello", True) in the future. If this is possible, I think CTuple would be cool to have on PyPi!

Question 9

After testing this @ead, I just tried with ctypes.Structure: Tri7(ctypes.Structure): _fields_ = [("id", ctypes.c_int), ("name", ctypes.c_int), ("isvalid", ctypes.c_bool)] (here I took the second param as int to avoid the char pointers complexity just for the test), but strangely it's not very good as well: 1.3 sec in my same test vs 0.3s for tuples.

Question 10

@Basj that is not that surprising: while tuple and also CTuple are pure C-code at work. ctypes.Structure is mostly pure Python, so it will be somewhat slower (that is the reason for NameTuples being slower as well)

Question 11

Thanks again for your great answer, and for the update, that confirms that recompilation is necessary for each new type of CTuple (I had a hope it would be possible with only one compilation once, and then only combinations of already compiled elements, but I think this is not possible ;)).

Question 12

@Basj There is now pip install cynamedtuple available.

Question 13

You can try to reverse it (store as struct of arrays) and access the values as x['id'][1634]. In other words, x is a dictionary with three keys and value for each key is a list. It will be space efficient.

Or you can use pandas dataframes. Dataframes are stored in a matrix form where the rows have numeric IDs and columns have labels (strings like 'name' etc.). For a dataframe df, df.iloc[i] points to the $i^th$ row and you can access the name in that row by df.iloc[i].name or df.iloc[i]['name']

Question 14

x['id'][K] as proposed assumes K is an int from 0 to MAX. There was not such limitation In the original form d[K].id

Question 15

There is one another fast and compact way with the help of recordclass library:

pip3 install recordclass
import recordclass
TriDO = recordclass.make_dataclass("TriDO", 
 [('id', int), ('name', str), ('isvalid', bool)],
 fast_new=True)

Here are the values performance counters (linux, 64-bit, python3.9, recordclass >= 0.15):

tuple:

t0 = time.time()
d = {i: (i+1, 'hello', True) for i in range(1000000)}
print('%.3f s %.1f MB' % (time.time()-t0,
 psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2))
0.235 s 259.1 MB

dict:

t0 = time.time()
d = {i: {'id':i+1, 'name':'hello', 'isvalid':True} for i in range(1000000)}
print('%.3f s %.1f MB' % (time.time()-t0,
 psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2))
0.332 s 428.6 MB

namedtuple:

t0 = time.time()
d = {i: Tri(i+1, 'hello', True) for i in range(1000000)}
print('%.3f s %.1f MB' % (time.time()-t0,
 psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2))
1.195 s 275.6 MB

'NamedTuple:'

t0 = time.time()
d = {i: Tri2(i+1, 'hello', True) for i in range(1000000)}
print('%.3f s %.1f MB' % (time.time()-t0,
 psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2))
1.059 s 275.2 MB

dataobject:

t0 = time.time()
d = {i: TriDO(i+1, 'hello', True) for i in range(1000000)}
print('%.3f s %.1f MB' % (time.time()-t0,
 psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2))
0.256 s 244.2 MB

Here more accurate timings:

%timeit d = {i:(i+1, 'hello', True) for i in range(1000000)} # tuple
162 ms ± 756 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit d = {i:{'id':i+1, 'name':'hello', 'isvalid':True} for i in range(1000000)} # dict
250 ms ± 2.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit d = {i:Tri(i+1,'hello',True) for i in range(1000000)} # namedtuple
318 ms ± 422 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit d = {i:Tri2(i+1,'hello',True) for i in range(1000000)} # NamedTuple
330 ms ± 5.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit d = {i:TriDO(i+1,'hello',True) for i in range(1000000)} # dataobject
188 ms ± 823 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Question 16

You could keep tuples and name your indices (2nd option below)

from enum import IntEnum
# int variable
ID, NAME, IS_VALID = 0, 1, 2
# IntEnum
class Index(IntEnum):
 ID = 0
 NAME = 1
 IS_VALID = 2
 
# Create tuples
d = {i: (i+1, 'hello', True) for i in range(int(1e6))} 
t0 = time.time()
# check data access performance
# uncomment only one of these 3 next lines:
# for i in range(len(d)): _ = d[i][0], d[i][1], d[i][2]
# for i in range(len(d)): _ = d[i][ID], d[i][NAME], d[i][IS_VALID]
for i in range(len(d)): _ = d[i][Index.ID], d[i][Index.NAME], d[i][Index.IS_VALID]
print('%.3f s' % (time.time()-t0))
"""
int 0.307 s
int variable 0.312 s
IntEnum 0.749 s
"""

ead 34.8k8 gold badges101 silver badges171 bronze badges · Accepted Answer · 2020-12-09 09:47:58Z

Cython's cdef-classes might be what you want: They use less memory than the pure Python classes, even if it comes at costs of more overhead when accessing members (because fields are stored as C-values and not Python-objects).

For example:

%%cython
cdef class CTuple:
 cdef public unsigned long long int id
 cdef public str name
 cdef public bint isvalid
 
 def __init__(self, id, name, isvalid):
 self.id = id
 self.name = name
 self.isvalid = isvalid

which can be used as wished:

ob=CTuple(1,"mmm",3)
ob.id, ob.name, ob.isvalid # prints (2, "mmm", 3)

Timings/memory consumption:

First, the baseline on my machine:

0.258 s 252.4 MB # tuples
0.343 s 417.5 MB # dict
1.181 s 264.0 MB # namedtuple collections

with CTuple we get:

0.306 s 191.0 MB

which is almost as fast and needs considerable less memory.

If the C-type of members isn't clear at compile time, one could use simple python-objects:

%%cython
cdef class PTuple:
 cdef public object id
 cdef public object name
 cdef public object isvalid
 
 def __init__(self, id, name, isvalid):
 self.id = id
 self.name = name
 self.isvalid = isvalid

The timings are a little bit surprising:

0.648 s 249.8 MB

I didn't expect it to be so much slower than the CTuple-version, but at least it is twice as fast as named tuples.

One disadvantage of this approach is that it needs compilation. Cython however offers cython.inline which can be used to compile Cython-code created on-the-fly.

I've released cynamedtuple which can be installed via pip install cynamedtuple, and is based on the prototype bellow:

import cython
# for generation of cython code:
tab = " "
def create_members_definition(name_to_ctype):
 members = []
 for my_name, my_ctype in name_to_ctype.items():
 members.append(tab+"cdef public "+my_ctype+" "+my_name)
 return members
def create_signature(names):
 return tab + "def __init__(self,"+", ".join(names)+"):"
def create_initialization(names):
 inits = [tab+tab+"self."+x+" = "+x for x in names]
 return inits
def create_cdef_class_code(classname, names):
 code_lines = ["cdef class " + classname + ":"]
 code_lines.extend(create_members_definition(names))
 code_lines.append(create_signature(names.keys()))
 code_lines.extend(create_initialization(names.keys()))
 return "\n".join(code_lines)+"\n"
# utilize cython.inline to generate and load pyx-module:
def create_cnamedtuple_class(classname, names):
 code = create_cdef_class_code(classname, names)
 code = code + "GenericClass = " + classname +"\n"
 ret = cython.inline(code)
 return ret["GenericClass"]

Which can be used as follows, to dynamically define CTuple from above:

CTuple = create_cnamedtuple_class("CTuple", 
 {"id":"unsigned long long int", 
 "name":"str",
 "isvalid":"bint"})
ob = CTuple(1,"mmm",3)
...

Another alternative could be to use jit-compilation and Numba's jitted-classes which offer this possibility. They however seem to be much slower:

from numba import jitclass, types
spec = [
 ('id', types.uint64), 
 ('name', types.string),
 ('isvalid', types.uint8),
]
@jitclass(spec)
class NBTuple(object):
 def __init__(self, id, name, isvalid):
 self.id = id
 self.name = name
 self.isvalid = isvalid

and the results are:

20.622 s 394.0 MB

so numba jitted classes are not (yet?) a good choice.

Wonderful solution @ead! Is it possible to use a once-for-all Cython-compiled version? Then I could compile this module only once, once for all, and then just do MyCTuple = GenericCTuple(types=[ctypes.unsignedlonglongint, ctypes.str, ctypes.bool], names=['id', 'name', 'isvalid']), and then MyCTuple(1, "hello", True) in the future. If this is possible, I think CTuple would be cool to have on PyPi!
After testing this @ead, I just tried with ctypes.Structure: Tri7(ctypes.Structure): _fields_ = [("id", ctypes.c_int), ("name", ctypes.c_int), ("isvalid", ctypes.c_bool)] (here I took the second param as int to avoid the char pointers complexity just for the test), but strangely it's not very good as well: 1.3 sec in my same test vs 0.3s for tuples.
@Basj that is not that surprising: while tuple and also CTuple are pure C-code at work. ctypes.Structure is mostly pure Python, so it will be somewhat slower (that is the reason for NameTuples being slower as well)
Thanks again for your great answer, and for the update, that confirms that recompilation is necessary for each new type of CTuple (I had a hope it would be possible with only one compilation once, and then only combinations of already compiled elements, but I think this is not possible ;)).

CollectivesTM on Stack Overflow

Dictionary of (named) tuples in Python and speed/RAM performance

4 Answers 4

6 Comments

1 Comment

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

CollectivesTM on Stack Overflow

4 Answers 4

6 Comments

1 Comment

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related