I'm trying to optimize a function in a process which is going to be run millions of times. The function partitions the vertices of a graph, and the partition is then used to build a quotient graph. Theoretically, the process is:

- Input a graph, G
- For each vertex v in G, choose a random edge incident to v (the same edge can be chosen for two different vertices)
- Form the subgraph S on the vertices of G, keeping only the edges chosen in the previous step
- Partition the vertices by which connected component of S they lie in

Initially, I simply created S in NetworkX, but found that the process of creating a proper graph object was unnecessarily slow. Next, I tried to recreate the bare bones of the process: I built an adjacency dictionary locally and ran the connected-components routine from the NetworkX documentation on it. Finally, I found it fastest to simply keep a dictionary recording which connected component each vertex is in, yet that still seems a bit redundant, as for each edge chosen I need to update the dictionary for the entire connected component.
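For context, the partition returned by the functions below plugs straight into something like NetworkX's `nx.quotient_graph`; this is just a sketch of the surrounding use, not the hot loop itself:

```python
import networkx as nx

# Sketch: build a quotient graph from one random partition.
G = nx.grid_2d_graph(5, 5)
P = sp3(G)                   # list of frozensets of vertices
Q = nx.quotient_graph(G, P)  # one node per block of the partition
```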
My fastest version:
```python
def sp3(G):
    nodes = list(G.nodes)
    P = []
    dic = {}
    # Start with every vertex in its own singleton component.
    for v in nodes:
        dic[v] = frozenset([v])
    for v in nodes:
        # Choose a random incident edge and merge the components of
        # its endpoints; every member of the merged component has to
        # be repointed at the new frozenset.
        nb = random.choice(list(G.neighbors(v)))
        U = frozenset.union(dic[v], dic[nb])
        for a in U:
            dic[a] = U
    # Collect the components; each appears once per member vertex,
    # so deduplicate through a set.
    for v in nodes:
        P.append(dic[v])
    P = list(set(P))
    return P
```
Slower version:
```python
def sp1(G):
    nodes = list(G.nodes)
    # Adjacency dict of the subgraph S: only the randomly chosen edges.
    dic = {v: set() for v in nodes}
    for v in nodes:
        nb = random.choice(list(G.neighbors(v)))
        dic[v].add(nb)
        dic[nb].add(v)
    # The partition is the set of connected components of S.
    P = list(CCbfs(dic))
    return P

def CCbfs(dic):
    # Yield each connected component exactly once, as a frozenset.
    seen = set()
    for v in dic.keys():
        if v not in seen:
            c = frozenset(bfs(dic, v))
            yield c
            seen.update(c)

def bfs(dic, n):
    # Plain breadth-first search over an adjacency dict, yielding
    # every vertex reachable from n.
    d = dic
    seen = set()
    nextlevel = {n}
    while nextlevel:
        thislevel = nextlevel
        nextlevel = set()
        for v in thislevel:
            if v not in seen:
                yield v
                seen.add(v)
                nextlevel.update(d[v])
```
I've been using cProfile and got the following on a 2690-vertex, nearly maximal planar graph (in reality I'm working with ~600-vertex subgraphs of this graph, but for convenience I just ran it on the whole thing):
```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   99.996   99.996 {built-in method builtins.exec}
        1    0.000    0.000   99.996   99.996 <string>:1(<module>)
        1    1.083    1.083   99.996   99.996 Imp2.py:1201(SPit)
     2500   12.531    0.005   52.974    0.021 Imp2.py:1108(sp1)
     2500   22.591    0.009   45.939    0.018 Imp2.py:1085(sp3)
 13450000    9.168    0.000   29.084    0.000 random.py:256(choice)
 13450000   12.218    0.000   18.138    0.000 random.py:224(_randbelow)
   768750    2.602    0.000   13.118    0.000 Imp2.py:149(CCbfs)
  7491250    7.274    0.000    9.910    0.000 Imp2.py:156(bfs)
 13450000    6.418    0.000    9.225    0.000 graph.py:1209(neighbors)
     2500    6.728    0.003    6.728    0.003 Imp2.py:1110(<dictcomp>)
 20988370    4.325    0.000    4.325    0.000 {method 'getrandbits' of '_random.Random' objects}
  6725000    3.312    0.000    3.312    0.000 {method 'union' of 'frozenset' objects}
 13455000    2.810    0.000    2.810    0.000 {built-in method builtins.iter}
 20175000    2.541    0.000    2.541    0.000 {method 'add' of 'set' objects}
  7491250    2.329    0.000    2.329    0.000 {method 'update' of 'set' objects}
 13455000    1.780    0.000    1.780    0.000 {built-in method builtins.len}
 13450000    1.595    0.000    1.595    0.000 {method 'bit_length' of 'int' objects}
  6725000    0.640    0.000    0.640    0.000 {method 'append' of 'list' objects}
     5000    0.029    0.000    0.041    0.000 graph.py:646(nodes)
     5000    0.012    0.000    0.012    0.000 reportviews.py:167(__init__)
     5000    0.006    0.000    0.009    0.000 reportviews.py:174(__iter__)
     5000    0.003    0.000    0.006    0.000 reportviews.py:171(__len__)
     2500    0.001    0.000    0.001    0.000 {method 'keys' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```
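For reference, a run like the one above could be produced along these lines, using the `SPit` harness from the edit below (the exact invocation here is my assumption, chosen to match the 2500 `ncalls` for `sp1` and `sp3`):

```python
import cProfile

# Hypothetical driver: profile 2500 rounds of sp1 and sp3.
cProfile.run('SPit(2500)', sort='cumulative')
```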
It seems like I spend so much time just going through the for loops in `sp3`... is there some way to fix this? (Also, I've been told to check if there's a faster implementation than `random.choice` when I'm using really small lists, usually of size 6, give or take, so any insight on that is welcome.)
Edit: here's something to test the code. In reality I'm using an adjacency matrix created from some text files, but it's really messy, and this should mostly capture the relevant properties:
```python
import networkx as nx
import random

def SPit(n, m=25):  # the original `def SPit(m=25,n)` is a syntax error:
                    # a non-default argument cannot follow a default one
    G = nx.grid_2d_graph(m, m)
    i = 0
    while i < n:
        x1 = sp1(G)
        x3 = sp3(G)
        i += 1

def sp3(G):
    nodes = list(G.nodes)
    P = []
    dic = {}
    for v in nodes:
        dic[v] = frozenset([v])
    for v in nodes:
        nb = random.choice(list(G.neighbors(v)))
        U = frozenset.union(dic[v], dic[nb])
        for a in U:
            dic[a] = U
    for v in nodes:
        P.append(dic[v])
    P = list(set(P))
    return P

def sp1(G):
    nodes = list(G.nodes)
    dic = {v: set() for v in nodes}
    for v in nodes:
        nb = random.choice(list(G.neighbors(v)))
        dic[v].add(nb)
        dic[nb].add(v)
    P = list(CCbfs(dic))
    return P

def CCbfs(dic):
    seen = set()
    for v in dic.keys():
        if v not in seen:
            c = frozenset(bfs(dic, v))
            yield c
            seen.update(c)

def bfs(dic, n):
    d = dic
    seen = set()
    nextlevel = {n}
    while nextlevel:
        thislevel = nextlevel
        nextlevel = set()
        for v in thislevel:
            if v not in seen:
                yield v
                seen.add(v)
                nextlevel.update(d[v])
```
1 Answer
I'm just going to review `sp3`.
```
P = []
<snip 8 lines>
for v in nodes:
    P.append(dic[v])
```
It's generally preferable to declare your variables as near as possible to the first use. That way someone reading the code has to keep less information in their memory.
`P` is not an informative name. I would usually guess that it's short for `Point`, but that doesn't seem to fit.
```python
for v in nodes:
    P.append(dic[v])
P = list(set(P))
return P
```
This could be shortened to

```python
return list(set(dic.values()))
```

But it's not obvious why it's important to return a `list` rather than a `set`.
```python
dic = {}
for v in nodes:
    dic[v] = frozenset([v])
for v in nodes:
    nb = random.choice(list(G.neighbors(v)))
    U = frozenset.union(dic[v],dic[nb])
    for a in U:
        dic[a] = U
```
Again, `dic` is not a very informative name. I can see it's a dictionary from the declaration, but what does it mean? Perhaps `components` would be a better name?
Again, I don't understand the point of the conversion to `list`. `random.choice` is documented as accepting any sequence. If you've profiled and discovered that it's faster when passed a list, that's the kind of valuable information which should be captured in a comment.
This algorithm is not very efficient. Wikipedia's article Disjoint-set data structure contains pseudocode for a more efficient approach: basically, you replace the mapping from vertex to set with a mapping from vertex to another vertex, building a tree. There are various optimisations which are simple to implement and harder to analyse, but which give an essentially linear-time algorithm.
NB if you can arrange for the vertices to be numbers from 0 to n-1, then you can replace the `dict` with a list, which should be slightly faster.
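For concreteness, here is a minimal sketch of that disjoint-set approach (union by size plus path compression), assuming a NetworkX-style graph with `G.nodes` and `G.neighbors`; the names `sp_uf`, `parent`, `size` and `find` are illustrative, not from the question:

```python
import random

def sp_uf(G):
    # Each vertex starts as the root of its own one-element tree.
    parent = {v: v for v in G.nodes}
    size = {v: 1 for v in G.nodes}

    def find(v):
        root = v
        while parent[root] != root:
            root = parent[root]
        # Path compression: repoint everything on the path at the root.
        while parent[v] != root:
            parent[v], v = root, parent[v]
        return root

    for v in G.nodes:
        nb = random.choice(list(G.neighbors(v)))
        rv, rn = find(v), find(nb)
        if rv != rn:
            # Union by size: hang the smaller tree under the larger.
            if size[rv] < size[rn]:
                rv, rn = rn, rv
            parent[rn] = rv
            size[rv] += size[rn]

    # Group vertices by root to recover the partition.
    components = {}
    for v in G.nodes:
        components.setdefault(find(v), []).append(v)
    return [frozenset(c) for c in components.values()]
```

With integer vertex labels 0 to n-1, `parent` and `size` could be plain lists instead of dicts, as noted above.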
> Also, I've been told to check if there's a faster implementation than `random.choice` when I'm using really small lists, usually of size 6, give or take, so any insight on that is welcome.
I was expecting the answer to be no, but looking at the implementation of `random.choice` (and, in particular, `_randbelow_with_getrandbits`) I think that you probably can do better. Certainly it's worth profiling against an implementation rather more like Java's `nextInt(int)`.
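As an illustration of that profiling (my sketch, not from the original answer), a micro-benchmark for the size-6 case might compare `random.choice` against an inlined rejection loop:

```python
import random
import timeit

seq = list(range(6))

def via_choice():
    return random.choice(seq)

def via_getrandbits():
    # The same rejection scheme random.choice uses internally for
    # n = 6 (draw 3 bits, retry on 6 or 7), without the method-call
    # and bit_length overhead.
    while True:
        r = random.getrandbits(3)
        if r < 6:
            return seq[r]

for f in (via_choice, via_getrandbits):
    print(f.__name__, timeit.timeit(f, number=10**6))
```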
Basically, if you want a random number from `0` inclusive to `n` exclusive, Python finds the smallest power of two which is at least `n`, takes that many random bits, and tries again if the result is `n` or greater. When `n` is one more than a power of two it takes two tries on average. Java, on the other hand, always takes 31 random bits and calculates \2ドル^{31} \bmod n\$ to figure out the threshold at which to discard. So for `n=5` it does one expensive operation (`%`) but has an extremely low probability of needing to try again.

Which approach is fastest probably depends on your CPU. Given that you know the statistics of your list sizes, you could also consider taking the Java approach but pre-calculating a lookup table to do the `%` operations just once at start-up.
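A sketch of that lookup-table idea (the class name, `BITS` constant, and `max_size` parameter are all illustrative assumptions, not code from this answer):

```python
import random

class SmallChoice:
    # Java-style rejection sampling, with the per-size thresholds
    # precomputed once at start-up.
    BITS = 31

    def __init__(self, max_size=16):
        # threshold[n] is the largest multiple of n that fits in
        # 2**BITS; draws at or above it are rejected, keeping the
        # result unbiased. n = 1 always accepts.
        self.threshold = [0, 1 << self.BITS] + [
            (1 << self.BITS) - ((1 << self.BITS) % n)
            for n in range(2, max_size + 1)
        ]

    def choice(self, seq):
        n = len(seq)
        while True:
            r = random.getrandbits(self.BITS)
            if r < self.threshold[n]:
                return seq[r % n]
```

In the hot loop this would be used as `pick = SmallChoice().choice` and then `nb = pick(list(G.neighbors(v)))`.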
Comments:

- `type(G.neighbors)` is `dict_keyiterator`, which is why I make it a list. But otherwise your smaller comments definitely helped. Currently I'm struggling to implement union-find faster; I think it's because I need to keep track of the names of my nodes and I'm handling that wrong. Once I've finally mustered up my best efforts on doing UF and the randomness, if I'm still having trouble, is it recommended I simply make an edit or a new post? – Zach Hunter, Apr 24, 2019
- Don't edit the question to update the code. Making a new post is one of your options. – Peter Taylor, Apr 24, 2019
- The reason Python does random choice that way is that, unlike Java's implementation, it is unbiased. For small numbers it doesn't matter much, but if for example you wanted to choose a random number between 1 and 1.5*2^30, Java's approach would do a very bad job of evenly distributing the results. – Oscar Smith, Apr 25, 2019
- @OscarSmith, no. Java's approach is also unbiased; it uses a different technique to Python to achieve that, which is arguably better for weak PRNGs because it uses all of the bits output by the PRNG. At a guess, the reason for Python's approach is that it doesn't require special-casing for big integers. – Peter Taylor, Apr 25, 2019
- I guess I don't understand what you mean by "calculates \2ドル^{31} \bmod n\$ to figure out the threshold at which to discard". Can you clarify what that means? – Oscar Smith, Apr 25, 2019