I am creating what strives to be a truly random number generator. As of right now, the implementation uses a mix of mainly nonsense system data and some pseudo-random numbers to generate numbers that are different and use the last half of the number (which, statistically, I believe is sufficiently random). The advice I would like:
Obviously, a general "review", as always.
How can I optimize this code to be faster? Currently, generating ten 1000 digit numbers takes two seconds, ten 10000 digit numbers takes 35-37 seconds, and ten 100000 digit numbers is exponentially slower.
How practical is this? Currently the generator uses some methods which may be considered "hacky"; ergo, using the last half of a number, or including pseudo-random numbers in the algorithm. Is this OK? I believe it should be, as long as the numbers generated are indeed random, but I would like other's opinions on this.
Another thing to note is that when testing some sample batches with this online tester, I got p-values around 0.24, which I believe is the optimal range. If you answer, don't feel pressured to answer all three questions.
import os, re, time, random
from functools import reduce
def rand_num(digits=3):
nums = []
strs = []
rand = []
add = True
all_info = re.sub('(.*?: *)','',os.popen("cat /proc/meminfo").read()).replace('kB','').splitlines()+list(filter(None,re.findall(r'(\w*)',os.popen('ps').read())))
all_info = list(x.strip() for x in all_info)
nums.append(random.sample(range(digits*70), digits*35))
nums = nums[0]
for x in all_info:
if x.isdigit():
nums.append(int(x))
else:
strs.append(x)
jugglenum = 0
for x in nums:
if add == True:
jugglenum += x
add = False
else:
jugglenum -= x
add = True
jugglenum = abs(jugglenum)
for i in range(digits):
rand.append(int(time.time()*(os.getpid()%jugglenum)))
for i, x in enumerate(rand):
rand[i] = ''.join(random.sample(str(x), len(str(x))))
rand = [int(x)//10 for x in rand]
largenum = str(reduce(lambda x, y: x*y, rand))[:digits]
return ''.join(random.sample(random.sample(largenum,len(largenum)),len(largenum)))
An executable program on repl.it.
Yes, I know, this is similar to random.SystemRandom
or os.urandom
. This program I am developing now is really my first look into this concept and I don't plan for this to be my final program by any means. This is just something that I thought I would play around with for the time being.
So before you go and tell me that this is pointless and could have no application, I know. This is just my own personal experiment so I can get better at this sort of program.
-
\$\begingroup\$ Welcome to Code Review! Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers. \$\endgroup\$Sᴀᴍ Onᴇᴌᴀ– Sᴀᴍ Onᴇᴌᴀ ♦2018年11月20日 00:56:31 +00:00Commented Nov 20, 2018 at 0:56
3 Answers 3
Couple of minor simplifications:
nums =[]
...
nums.append(random.sample(...))
nums = nums[0]
could be replaced with:
nums = random.sample(...)
The add
/jugglenum
loop calculation could be done with:
jugglenum = abs(sum(nums[::2]) - sum(nums[1::2]))
This code:
for i, x in enumerate(rand):
rand[i] = ''.join(random.sample(str(x), len(str(x))))
could be replaced with:
rand = [ ''.join(random.sample(s, len(s)) for s in (str(x) for x in rand)]
which eliminates the enumeration and avoids turning x
into a str(x)
twice.
I don’t see strs
being used anywhere, so the code that calculates it can be removed.
-
\$\begingroup\$ Interestingly enough, your changes acctually made the program significantly slower (which was much unexpected to me considering how messy my code is). \$\endgroup\$moltarze– moltarze2018年11月19日 11:49:18 +00:00Commented Nov 19, 2018 at 11:49
-
1\$\begingroup\$ Oops. Did you profile it to see which of the 3 changes made it slower? My money is on the
iter
+zip
+sum
change, which is good because I think I over complicated that cleanup.jugglenum = abs(sum(nums[::2])-sum(nums[1::2]))
is 3 lines shorter, clearer, and probably faster. \$\endgroup\$AJNeufeld– AJNeufeld2018年11月19日 15:36:58 +00:00Commented Nov 19, 2018 at 15:36 -
1\$\begingroup\$ OK, that definitely fixed the speed. Interestingly enough, however, your list comp is actually significantly slower than the enumeration method. I'm really scratching my head with this one - shouldn't list comps be much faster than enumerates? \$\endgroup\$moltarze– moltarze2018年11月19日 22:05:06 +00:00Commented Nov 19, 2018 at 22:05
-
1\$\begingroup\$ Actually: disregard that. I was having problems with other parts of the code. You can actually run it if you want, it's much faster now. \$\endgroup\$moltarze– moltarze2018年11月19日 22:17:24 +00:00Commented Nov 19, 2018 at 22:17
Some comments:
all_info = re.sub('(.*?: *)','',os.popen("cat /proc/meminfo").read()).replace('kB','').splitlines()+list(filter(None,re.findall(r'(\w*)',os.popen('ps').read())))
"can" is not "should". You "should" not do this. This should be probably upwards of four lines. Also, popen
is deprecated in favour of subprocess
. Even so, don't use cat
. Simply open the file.
More generally, you should do some reading on entropy.
So, after some great suggestions from other users here (thanks a lot everyone!) I was able to improve the program and reduce its size by about 10%. Here is the updated program:
import os, re, time, random, subprocess as sp
from functools import reduce
def rand_num(digits=3):
rand=[];strs=[]
all_info = re.sub('(.*?: *)','',open('/proc/meminfo').read()).replace('kB','').splitlines()+list(filter(None,re.findall(r'(\w*)',sp.Popen('ps', stdout=sp.PIPE, shell=True).stdout.read().decode())))
all_info = list(x.strip() for x in all_info)
nums = random.sample(range(digits*70), digits*35)
nums.extend([int(x) for x in all_info if x.isdigit()])
strs.extend([x for x in all_info if not x.isdigit()])
jugglenum = abs(sum(nums[::2])-sum(nums[1::2]))
for i in range(digits):
rand.append(int(time.time()*(os.getpid()%jugglenum)))
rand = [''.join(random.sample(str(x), len(str(x)))) for x in rand[:]]
rand = [int(x)//10 for x in rand]
largenum = str(reduce(lambda x, y: x*y, rand))[:digits]
return ''.join(random.sample(random.sample(largenum,len(largenum)),len(largenum)))
I managed to get most of the large functions down to one-liners that run much faster and produce impressive results. I also managed to reduce the size of the function by about 10%, from over 1000 bytes to about 900 bytes.
I ran both scripts with timeit
, testing how long it took to do ten runs producing numbers of a set length. Here's the results to produce ten 1, 10, 100, 1000, and 10000 digit numbers with both scripts:
Old script:
Ones: 0.4990974200045457
Tens: 0.6853595590000623
Hundreds: 1.0144964690043707
Thousands: 5.584901581998565
Ten-thousands: 40.79681804100255
----
New script:
Ones: 0.11410925599921029
Tens: 0.17456803799723275
Hundreds: 0.3204780189989833
Thousands: 2.4103602649993263
Ten-thousands: 30.67583657600335
-
\$\begingroup\$ Still: don't do ten things on one line. You're shooting yourself and every developer that touches the code after you in both feet. \$\endgroup\$Reinderien– Reinderien2018年11月20日 01:11:33 +00:00Commented Nov 20, 2018 at 1:11
-
\$\begingroup\$ @Reinderien Which line are you talking about? I find it useful to do that both for speed, and because it makes sense to execute an operation that is all mentally grouped together just in one go. \$\endgroup\$moltarze– moltarze2018年11月20日 01:14:19 +00:00Commented Nov 20, 2018 at 1:14
-
\$\begingroup\$ I'm talking about this line:
all_info = re.sub('(.*?: *)','',open('/proc/meminfo').read()).replace('kB','').splitlines()+list(filter(None,re.findall(r'(\w*)',sp.Popen('ps', stdout=sp.PIPE, shell=True).stdout.read().decode())))
- what you're doing is called premature optimization. Grouping things onto one line might strip off a few nanoseconds from parse time, but since your script is only parsed once, doing this has absolutely no effect on execution time. \$\endgroup\$Reinderien– Reinderien2018年11月20日 01:15:48 +00:00Commented Nov 20, 2018 at 1:15 -
\$\begingroup\$ But don't take my word for it - measure both and post the results :) \$\endgroup\$Reinderien– Reinderien2018年11月20日 01:17:09 +00:00Commented Nov 20, 2018 at 1:17
-
\$\begingroup\$ @Reinderien Speaking of which, one of the times just finished: 1072.1309 seconds for generating ten hundred-thousand digit numbers with the old script. I'll see if it finishes for the new script soon enough; I might have to stop it for the night. \$\endgroup\$moltarze– moltarze2018年11月20日 01:18:43 +00:00Commented Nov 20, 2018 at 1:18
Explore related questions
See similar questions with these tags.