Description
Current Behavior
Using a simple test program to add 100,000 nodes to a graph in a fresh ArangoDB database is very slow (about 3 minutes) and consumes far too much RAM (about 350 MB) compared with plain NetworkX, which takes 0.46 seconds (0.38 s of user CPU time) and about 60 MB while keeping all nodes in memory rather than in an on-disk database. Also, the more nodes I add, the more memory it uses, which suggests a memory leak.
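The claim that memory grows with the number of nodes can be checked more directly by sampling memory after each batch of insertions rather than only at the end. A minimal sketch using the standard-library tracemalloc module follows. memory_after_batches and add_batch are hypothetical names introduced here: add_batch stands in for the insertion code under test (for example a loop of G.add_node calls), and the dict in the demo is only a placeholder workload, not nx_arangodb. tracemalloc only sees allocations made through Python's allocator, so it can undercount native memory, and steady growth on its own could also come from an intentional cache rather than a leak.

```python
import tracemalloc
from typing import Callable, List


def memory_after_batches(
    add_batch: Callable[[int, int], None], total: int, batch_size: int
) -> List[int]:
    """Return traced Python memory (bytes) after each batch of insertions.

    add_batch(start, stop) is a hypothetical callable that inserts the
    items with indices in range(start, stop).
    """
    samples: List[int] = []
    tracemalloc.start()
    try:
        for start in range(0, total, batch_size):
            add_batch(start, min(start + batch_size, total))
            # get_traced_memory() returns (current, peak) in bytes.
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
    finally:
        tracemalloc.stop()
    return samples


if __name__ == "__main__":
    # Placeholder workload: an in-memory dict, which is EXPECTED to grow.
    store = {}

    def add_to_dict(start: int, stop: int) -> None:
        for i in range(start, stop):
            store[f"node_{i}"] = {}

    for n, current in enumerate(memory_after_batches(add_to_dict, 100_000, 20_000), 1):
        print(f"after batch {n}: {current / 1024:.0f} KiB traced")
```

If the samples keep climbing roughly in step with the node count even though the data is supposed to live in the database, the client is holding on to per-node state somewhere.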
Expected Behavior
According to https://arangodb.com/introducing-the-arangodb-networkx-persistence-layer/, it should:
Handle Big Graphs Without Breaking a Sweat
Store massive graphs that would otherwise overwhelm memory in NetworkX, thanks to ArangoDBʼs ability to scale up.
Such a layer should definitely never use more memory than NetworkX. It could be somewhat slower because of the database overhead, but not by such a huge factor (for instance, SQLite3 can easily insert 100,000 rows in under a second).
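For reference, the SQLite comparison can be reproduced with the standard-library sqlite3 module. This is a sketch, not the benchmark behind the figure above: the function name time_sqlite_inserts, the table name nodes and its single TEXT column are assumptions made for illustration. The important detail is that all rows go into one transaction with executemany; committing each row separately is typically far slower on disk. Timings depend on hardware, so no particular number is claimed here.

```python
import os
import sqlite3
import tempfile
import time


def time_sqlite_inserts(n: int = 100_000) -> tuple[float, int]:
    """Insert n rows into a fresh on-disk SQLite file; return (seconds, row count)."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "bench.db"))
        try:
            conn.execute("CREATE TABLE nodes (key TEXT PRIMARY KEY)")
            start = time.perf_counter()
            with conn:  # single transaction, committed once on exit
                conn.executemany(
                    "INSERT INTO nodes (key) VALUES (?)",
                    ((f"node_{i}",) for i in range(n)),
                )
            elapsed = time.perf_counter() - start
            (count,) = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()
        finally:
            conn.close()  # close before the temporary directory is removed
    return elapsed, count


if __name__ == "__main__":
    seconds, rows = time_sqlite_inserts()
    print(f"{rows} rows in {seconds:.3f} s")
```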
Steps to Reproduce
Save the attached scripts in a directory (say ~/t) and create two new databases in arangosh:
db._createDatabase('testdb1')
db._createDatabase('testdb2')
Then run:
(v) user@box:~/t$ export DATABASE_HOST=http://127.0.0.1:8529
(v) user@box:~/t$ export DATABASE_PASSWORD=root
(v) user@box:~/t$ export DATABASE_USERNAME=root
(v) user@box:~/t$ export DATABASE_NAME=testdb1
(v) user@box:~/t$ /usr/bin/time -v ./test_nx_arango.py
[18:25:57 +0100] [INFO]: NetworkX-cuGraph is unavailable: No module named 'cupy'.
[18:25:57 +0100] [INFO]: Graph 'MyGraph1' created.
100000
	Command being timed: "./test_nx_arango.py"
	User time (seconds): 143.44
	System time (seconds): 7.35
	Percent of CPU this job got: 84%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 2:58.27
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 348476
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 230
	Minor (reclaiming a frame) page faults: 83816
	Voluntary context switches: 201036
	Involuntary context switches: 2283
	Swaps: 0
	File system inputs: 66888
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
(v) user@box:~/t$ export DATABASE_NAME=testdb2
(v) user@box:~/t$ /usr/bin/time -v ./test_nx_arango_batch.py
[18:29:50 +0100] [INFO]: NetworkX-cuGraph is unavailable: No module named 'cupy'.
[18:29:50 +0100] [INFO]: Graph 'MyGraph1' created.
100000
	Command being timed: "./test_nx_arango_batch.py"
	User time (seconds): 144.68
	System time (seconds): 6.37
	Percent of CPU this job got: 85%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 2:57.05
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 355252
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 85774
	Voluntary context switches: 199885
	Involuntary context switches: 2611
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
(v) user@box:~/t$ /usr/bin/time -v ./test_nx.py
100000
	Command being timed: "./test_nx.py"
	User time (seconds): 0.38
	System time (seconds): 0.07
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.46
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 59880
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 14032
	Voluntary context switches: 1
	Involuntary context switches: 11
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
Observe the much larger maximum resident set size and far longer run time compared with test_nx.py, which uses pure NetworkX.
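The measurements can be condensed into ratios. The sketch below converts GNU time's elapsed field to seconds and compares each run against test_nx.py; the figures are copied from the /usr/bin/time -v output, and gnu_time_elapsed_seconds is a helper written only for this illustration. Worked through, the two ArangoDB runs are roughly 385 to 390 times slower than pure NetworkX (about 1.8 ms per node), reach about 5.8 to 5.9 times its peak RSS, and incur about two voluntary context switches per node in both variants. That last figure is consistent with, though it does not prove, the client blocking on roughly one network round trip per node even when add_nodes_from is used.

```python
def gnu_time_elapsed_seconds(value: str) -> float:
    """Convert GNU time's 'h:mm:ss' or 'm:ss' elapsed field to seconds."""
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# (elapsed, max RSS in KiB, voluntary context switches) per script, from the runs.
runs = {
    "test_nx_arango.py": ("2:58.27", 348476, 201036),
    "test_nx_arango_batch.py": ("2:57.05", 355252, 199885),
    "test_nx.py": ("0:00.46", 59880, 1),
}
N = 100_000  # nodes added by every script
base_elapsed = gnu_time_elapsed_seconds(runs["test_nx.py"][0])
base_rss = runs["test_nx.py"][1]

for name, (elapsed, max_rss_kib, voluntary_cs) in runs.items():
    secs = gnu_time_elapsed_seconds(elapsed)
    print(
        f"{name}: {secs:.2f} s, {secs / N * 1e3:.3f} ms/node, "
        f"{secs / base_elapsed:.0f}x time, {max_rss_kib / base_rss:.1f}x RSS, "
        f"{voluntary_cs / N:.2f} voluntary switches/node"
    )
```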
Environment
OS: Debian 12.9
Python version: 3.11.2
NetworkX version: 3.4
NetworkX-ArangoDB version: 1.3.0
NetworkX-cuGraph version (if applicable): N/A
ArangoDB version: 3.12.4-1
I used GNU time 1.9-0.2 (from the Debian package time) rather than the bash builtin, to show memory usage.
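Where GNU time is not available, a rough in-process counterpart of its "Maximum resident set size" line is resource.getrusage(resource.RUSAGE_SELF).ru_maxrss from the standard library. A sketch under these assumptions: the resource module is Unix-only, ru_maxrss is reported in kilobytes on Linux but in bytes on macOS, and the value covers the whole process since startup (interpreter and imports included), just as GNU time's figure does. peak_rss_kib and measure are hypothetical helper names, and the dict comprehension is a placeholder workload.

```python
import resource
import sys
import time


def peak_rss_kib() -> float:
    """Peak resident set size of this process so far, in KiB."""
    maxrss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports KiB; macOS reports bytes.
    return maxrss / 1024 if sys.platform == "darwin" else float(maxrss)


def measure(workload):
    """Run workload() and return (result, wall seconds, peak RSS in KiB)."""
    start = time.perf_counter()
    result = workload()
    elapsed = time.perf_counter() - start
    return result, elapsed, peak_rss_kib()


if __name__ == "__main__":
    # Placeholder workload; swap in the graph-building code being investigated.
    nodes, secs, rss = measure(lambda: {f"node_{i}": {} for i in range(100_000)})
    print(f"{len(nodes)} nodes, {secs:.2f} s, peak RSS {rss:.0f} KiB")
```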
Additional context
See the attached files, also reproduced here:
test_nx_arango.py:
#! /usr/bin/env python3
import nx_arangodb as nxa

G = nxa.Graph(name='MyGraph1')
for i in range(100_000):
    G.add_node(f'node_{i}')
print(G.number_of_nodes())
test_nx_arango_batch.py:
#! /usr/bin/env python3
import nx_arangodb as nxa

G = nxa.Graph(name='MyGraph1')
G.add_nodes_from([f'node_{i}' for i in range(100_000)])
print(G.number_of_nodes())
test_nx.py:
#! /usr/bin/env python3
import networkx as nx

G = nx.Graph(name='MyGraph1')
for i in range(100_000):
    G.add_node(f'node_{i}')
print(G.number_of_nodes())