To reap the benefits of `numpy`, you need to use `numpy` arrays and functions throughout. The simplest fix is using `numpy.diff`:
```python
import numpy as np

for _ in range(10000):
    input = 100 * np.random.random(size=190)
    # size = len(input) - 1
    # differences = [(input[i + 1] - input[i]) for i in np.arange(0, size)]
    differences = np.diff(input)
    good_sigma = np.std(differences[20:60])
    upperbound = 3 * good_sigma
    lowerbound = -3 * good_sigma
    discard = np.where((lowerbound < differences) & (differences < upperbound))
    average_discard = np.average(input[np.min(discard):])
```
Swapping the two difference calls results in 1.8 s for your list version vs. 0.6 s using `numpy.diff` (on my machine). Note that this includes the generation of the ten thousand random input vectors.
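If you want to reproduce such a comparison yourself, here is a minimal `timeit` sketch (the function names and repeat counts are just illustrative choices, not from the original code):

```python
import timeit

import numpy as np


def diff_list(values):
    # Pure-Python pairwise differences, as in the original list version.
    return [values[i + 1] - values[i] for i in range(len(values) - 1)]


def diff_numpy(values):
    # Vectorized pairwise differences.
    return np.diff(values)


values = 100 * np.random.random(size=190)

print("list:  ", timeit.timeit(lambda: diff_list(values), number=10000))
print("numpy: ", timeit.timeit(lambda: diff_numpy(values), number=10000))
```

Both functions compute the same result; only the implementation differs.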
Apart from that, here are a few more tips:
- Avoid shadowing built-in functions/variables. `input` is already an existing function that takes user input.
- While the `0 + 3 * sigma` helps you realize that the mean is at zero, a simple comment should be enough for that and saves a few unneeded cycles.
- In general, comments should explain *why* you are doing something the way you are doing it, instead of *what* you are doing. The latter should be clear from the code itself (which is not really the case with your code). This is especially important, for example, for the `np.std` call. While you explain that using the standard deviation is a good way to measure spread (duh), you don't explain why you take it only for `differences[20:60]`. If it is not possible to understand what the code does from looking at it, encapsulate the hard-to-understand code in a properly named function with a descriptive docstring.
- Try to avoid creating copies of arrays by slicing.
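As one possible way to apply the encapsulation tip (the function name, parameter names, and default slice bounds here are illustrative assumptions, not part of your original code):

```python
import numpy as np


def noise_sigma(differences, start=20, stop=60):
    """Estimate the noise level from a window of pairwise differences.

    The slice bounds are assumed to cover a region of the signal known
    to contain only noise; adjust them for your data and document why
    that window was chosen.
    """
    return np.std(differences[start:stop])
```

Now the call site reads `good_sigma = noise_sigma(differences)`, and the docstring is the natural place to explain the magic numbers.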
Some small improvements can be made by finding the right index at which to start discarding (this assumes that later you also discard all values after the first discarded value). For this I used the slightly faster `numpy.argmax`, as described [here][1]. It could be even faster if your array of floats was sorted. It uses `argmax` because `True == 1`, and it returns the first maximum that appears.

The algorithm as written seems to start discarding whenever the difference is *smaller* than three sigma, whereas normally you would probably want to start discarding as soon as the difference is *larger* than three sigma. For this you would have to invert your logical operators (not done in code below).
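A small sketch of that `argmax` trick (the sample values and threshold here are made up for illustration):

```python
import numpy as np

differences = np.array([0.1, -0.2, 0.3, 5.0, 0.2, 6.0])
threshold = 3.0

# Boolean mask: True where a difference exceeds the threshold.
out_of_bounds = np.abs(differences) > threshold

# argmax on a boolean array returns the index of the first True,
# because True == 1 and argmax returns the first occurrence of the
# maximum. Caveat: if no element is True, argmax returns 0, so check
# out_of_bounds.any() first if that case can occur in your data.
first_discard = np.argmax(out_of_bounds)
print(first_discard)  # -> 3
```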