Pseudo-truly random number generator

Question 1

I am creating what strives to be a truly random number generator. As of right now, the implementation uses a mix of mainly nonsense system data and some pseudo-random numbers to generate numbers that are different and use the last half of the number (which, statistically, I believe is sufficiently random). The advice I would like:

Obviously, a general "review", as always.
How can I optimize this code to be faster? Currently, generating ten 1000 digit numbers takes two seconds, ten 10000 digit numbers takes 35-37 seconds, and ten 100000 digit numbers is exponentially slower.
How practical is this? Currently the generator uses some methods which may be considered "hacky"; ergo, using the last half of a number, or including pseudo-random numbers in the algorithm. Is this OK? I believe it should be, as long as the numbers generated are indeed random, but I would like other's opinions on this.

Another thing to note is that when testing some sample batches with this online tester, I got p-values around 0.24, which I believe is the optimal range. If you answer, don't feel pressured to answer all three questions.

import os, re, time, random
from functools import reduce
def rand_num(digits=3):
 nums = []
 strs = []
 rand = []
 add = True
 all_info = re.sub('(.*?: *)','',os.popen("cat /proc/meminfo").read()).replace('kB','').splitlines()+list(filter(None,re.findall(r'(\w*)',os.popen('ps').read())))
 all_info = list(x.strip() for x in all_info)
 nums.append(random.sample(range(digits*70), digits*35))
 nums = nums[0]
 for x in all_info:
 if x.isdigit():
 nums.append(int(x))
 else:
 strs.append(x)
 jugglenum = 0
 for x in nums:
 if add == True:
 jugglenum += x
 add = False
 else:
 jugglenum -= x
 add = True
 jugglenum = abs(jugglenum)
 for i in range(digits):
 rand.append(int(time.time()*(os.getpid()%jugglenum)))
 for i, x in enumerate(rand):
 rand[i] = ''.join(random.sample(str(x), len(str(x))))
 rand = [int(x)//10 for x in rand]
 largenum = str(reduce(lambda x, y: x*y, rand))[:digits]
 return ''.join(random.sample(random.sample(largenum,len(largenum)),len(largenum)))

An executable program on repl.it.

Yes, I know, this is similar to random.SystemRandom or os.urandom. This program I am developing now is really my first look into this concept and I don't plan for this to be my final program by any means. This is just something that I thought I would play around with for the time being.

So before you go and tell me that this is pointless and could have no application, I know. This is just my own personal experiment so I can get better at this sort of program.

Question 2

Welcome to Code Review! Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.

Question 3

Couple of minor simplifications:

nums =[]
...
nums.append(random.sample(...))
nums = nums[0]

could be replaced with:

nums = random.sample(...)

The add/jugglenum loop calculation could be done with:

jugglenum = abs(sum(nums[::2]) - sum(nums[1::2]))

This code:

for i, x in enumerate(rand):
 rand[i] = ''.join(random.sample(str(x), len(str(x))))

could be replaced with:

rand = [ ''.join(random.sample(s, len(s)) for s in (str(x) for x in rand)]

which eliminates the enumeration and avoids turning x into a str(x) twice.

I don’t see strs being used anywhere, so the code that calculates it can be removed.

Question 4

Interestingly enough, your changes acctually made the program significantly slower (which was much unexpected to me considering how messy my code is).

Question 5

Oops. Did you profile it to see which of the 3 changes made it slower? My money is on the iter+zip+sum change, which is good because I think I over complicated that cleanup. jugglenum = abs(sum(nums[::2])-sum(nums[1::2])) is 3 lines shorter, clearer, and probably faster.

Question 6

OK, that definitely fixed the speed. Interestingly enough, however, your list comp is actually significantly slower than the enumeration method. I'm really scratching my head with this one - shouldn't list comps be much faster than enumerates?

Question 7

Actually: disregard that. I was having problems with other parts of the code. You can actually run it if you want, it's much faster now.

Question 8

Some comments:

all_info = re.sub('(.*?: *)','',os.popen("cat /proc/meminfo").read()).replace('kB','').splitlines()+list(filter(None,re.findall(r'(\w*)',os.popen('ps').read())))

"can" is not "should". You "should" not do this. This should be probably upwards of four lines. Also, popen is deprecated in favour of subprocess. Even so, don't use cat. Simply open the file.

More generally, you should do some reading on entropy.

Question 9

So, after some great suggestions from other users here (thanks a lot everyone!) I was able to improve the program and reduce its size by about 10%. Here is the updated program:

import os, re, time, random, subprocess as sp
from functools import reduce
def rand_num(digits=3):
 rand=[];strs=[]
 all_info = re.sub('(.*?: *)','',open('/proc/meminfo').read()).replace('kB','').splitlines()+list(filter(None,re.findall(r'(\w*)',sp.Popen('ps', stdout=sp.PIPE, shell=True).stdout.read().decode())))
 all_info = list(x.strip() for x in all_info)
 nums = random.sample(range(digits*70), digits*35)
 nums.extend([int(x) for x in all_info if x.isdigit()])
 strs.extend([x for x in all_info if not x.isdigit()])
 jugglenum = abs(sum(nums[::2])-sum(nums[1::2]))
 for i in range(digits):
 rand.append(int(time.time()*(os.getpid()%jugglenum)))
 rand = [''.join(random.sample(str(x), len(str(x)))) for x in rand[:]]
 rand = [int(x)//10 for x in rand]
 largenum = str(reduce(lambda x, y: x*y, rand))[:digits]
 return ''.join(random.sample(random.sample(largenum,len(largenum)),len(largenum)))

I managed to get most of the large functions down to one-liners that run much faster and produce impressive results. I also managed to reduce the size of the function by about 10%, from over 1000 bytes to about 900 bytes.

I ran both scripts with timeit, testing how long it took to do ten runs producing numbers of a set length. Here's the results to produce ten 1, 10, 100, 1000, and 10000 digit numbers with both scripts:

Old script:
Ones: 0.4990974200045457
Tens: 0.6853595590000623
Hundreds: 1.0144964690043707
Thousands: 5.584901581998565
Ten-thousands: 40.79681804100255
----
New script:
Ones: 0.11410925599921029
Tens: 0.17456803799723275
Hundreds: 0.3204780189989833
Thousands: 2.4103602649993263
Ten-thousands: 30.67583657600335

Question 10

Still: don't do ten things on one line. You're shooting yourself and every developer that touches the code after you in both feet.

Question 11

@Reinderien Which line are you talking about? I find it useful to do that both for speed, and because it makes sense to execute an operation that is all mentally grouped together just in one go.

Question 12

I'm talking about this line:

all_info = re.sub('(.*?: *)','',open('/proc/meminfo').read()).replace('kB','').splitlines()+list(filter(None,re.findall(r'(\w*)',sp.Popen('ps', stdout=sp.PIPE, shell=True).stdout.read().decode())))

- what you're doing is called premature optimization. Grouping things onto one line might strip off a few nanoseconds from parse time, but since your script is only parsed once, doing this has absolutely no effect on execution time.

Question 13

But don't take my word for it - measure both and post the results :)

Question 14

@Reinderien Speaking of which, one of the times just finished: 1072.1309 seconds for generating ten hundred-thousand digit numbers with the old script. I'll see if it finishes for the new script soon enough; I might have to stop it for the night.

AJNeufeld AJNeufeld 35.3k5 gold badges41 silver badges103 bronze badges · Answer 1 · 2018-11-19 01:32:07Z

3

\$\begingroup\$

Couple of minor simplifications:

nums =[]
...
nums.append(random.sample(...))
nums = nums[0]

could be replaced with:

nums = random.sample(...)

The add/jugglenum loop calculation could be done with:

jugglenum = abs(sum(nums[::2]) - sum(nums[1::2]))

This code:

for i, x in enumerate(rand):
 rand[i] = ''.join(random.sample(str(x), len(str(x))))

could be replaced with:

rand = [ ''.join(random.sample(s, len(s)) for s in (str(x) for x in rand)]

which eliminates the enumeration and avoids turning x into a str(x) twice.

I don’t see strs being used anywhere, so the code that calculates it can be removed.

Share

edited Nov 19, 2018 at 23:42

answered Nov 19, 2018 at 1:32

AJNeufeld's user avatar

AJNeufeld AJNeufeld

35.3k5 gold badges41 silver badges103 bronze badges

\$\endgroup\$

4

\$\begingroup\$ Interestingly enough, your changes acctually made the program significantly slower (which was much unexpected to me considering how messy my code is). \$\endgroup\$

moltarze
– moltarze

2018年11月19日 11:49:18 +00:00
Commented Nov 19, 2018 at 11:49
1

\$\begingroup\$ Oops. Did you profile it to see which of the 3 changes made it slower? My money is on the iter+zip+sum change, which is good because I think I over complicated that cleanup. jugglenum = abs(sum(nums[::2])-sum(nums[1::2])) is 3 lines shorter, clearer, and probably faster. \$\endgroup\$

AJNeufeld
– AJNeufeld

2018年11月19日 15:36:58 +00:00
Commented Nov 19, 2018 at 15:36
1

\$\begingroup\$ OK, that definitely fixed the speed. Interestingly enough, however, your list comp is actually significantly slower than the enumeration method. I'm really scratching my head with this one - shouldn't list comps be much faster than enumerates? \$\endgroup\$

moltarze
– moltarze

2018年11月19日 22:05:06 +00:00
Commented Nov 19, 2018 at 22:05
1

\$\begingroup\$ Actually: disregard that. I was having problems with other parts of the code. You can actually run it if you want, it's much faster now. \$\endgroup\$

moltarze
– moltarze

2018年11月19日 22:17:24 +00:00
Commented Nov 19, 2018 at 22:17

Add a comment |

Reinderien Reinderien 71k5 gold badges76 silver badges256 bronze badges · Answer 2 · 2018-11-19 02:02:05Z

Some comments:

all_info = re.sub('(.*?: *)','',os.popen("cat /proc/meminfo").read()).replace('kB','').splitlines()+list(filter(None,re.findall(r'(\w*)',os.popen('ps').read())))

"can" is not "should". You "should" not do this. This should be probably upwards of four lines. Also, popen is deprecated in favour of subprocess. Even so, don't use cat. Simply open the file.

More generally, you should do some reading on entropy.

Reinderien – Reinderien · Answer 3 · 2018-11-20 01:02:10Z

So, after some great suggestions from other users here (thanks a lot everyone!) I was able to improve the program and reduce its size by about 10%. Here is the updated program:

import os, re, time, random, subprocess as sp
from functools import reduce
def rand_num(digits=3):
 rand=[];strs=[]
 all_info = re.sub('(.*?: *)','',open('/proc/meminfo').read()).replace('kB','').splitlines()+list(filter(None,re.findall(r'(\w*)',sp.Popen('ps', stdout=sp.PIPE, shell=True).stdout.read().decode())))
 all_info = list(x.strip() for x in all_info)
 nums = random.sample(range(digits*70), digits*35)
 nums.extend([int(x) for x in all_info if x.isdigit()])
 strs.extend([x for x in all_info if not x.isdigit()])
 jugglenum = abs(sum(nums[::2])-sum(nums[1::2]))
 for i in range(digits):
 rand.append(int(time.time()*(os.getpid()%jugglenum)))
 rand = [''.join(random.sample(str(x), len(str(x)))) for x in rand[:]]
 rand = [int(x)//10 for x in rand]
 largenum = str(reduce(lambda x, y: x*y, rand))[:digits]
 return ''.join(random.sample(random.sample(largenum,len(largenum)),len(largenum)))

I managed to get most of the large functions down to one-liners that run much faster and produce impressive results. I also managed to reduce the size of the function by about 10%, from over 1000 bytes to about 900 bytes.

I ran both scripts with timeit, testing how long it took to do ten runs producing numbers of a set length. Here's the results to produce ten 1, 10, 100, 1000, and 10000 digit numbers with both scripts:

Old script:
Ones: 0.4990974200045457
Tens: 0.6853595590000623
Hundreds: 1.0144964690043707
Thousands: 5.584901581998565
Ten-thousands: 40.79681804100255
----
New script:
Ones: 0.11410925599921029
Tens: 0.17456803799723275
Hundreds: 0.3204780189989833
Thousands: 2.4103602649993263
Ten-thousands: 30.67583657600335

Still: don't do ten things on one line. You're shooting yourself and every developer that touches the code after you in both feet.
@Reinderien Which line are you talking about? I find it useful to do that both for speed, and because it makes sense to execute an operation that is all mentally grouped together just in one go.
I'm talking about this line: all_info = re.sub('(.*?: *)','',open('/proc/meminfo').read()).replace('kB','').splitlines()+list(filter(None,re.findall(r'(\w*)',sp.Popen('ps', stdout=sp.PIPE, shell=True).stdout.read().decode()))) - what you're doing is called premature optimization. Grouping things onto one line might strip off a few nanoseconds from parse time, but since your script is only parsed once, doing this has absolutely no effect on execution time.
But don't take my word for it - measure both and post the results :)
@Reinderien Speaking of which, one of the times just finished: 1072.1309 seconds for generating ten hundred-thousand digit numbers with the old script. I'll see if it finishes for the new script soon enough; I might have to stop it for the night.

Stack Exchange Network

Pseudo-truly random number generator

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Pseudo-truly random number generator

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions