8
\$\begingroup\$

This takes an array of numbers, generates every possible combination of size 4, and puts the leftover numbers for each combination into a second array. I want the difference between the average of each combination in the first column and the average of its leftovers in the second.

import itertools
#defines the array of numbers and the two columns
number = [53, 64, 68, 71, 77, 82, 85]
col_one = []
col_two = []
#creates an array that holds the first four
results = itertools.combinations(number,4)
for x in results:
    col_one.append(list(x))
#attempts to go through and remove those numbers in the first array
#and then add that array to col_two
for i in range(len(col_one)):
    holder = list(number)
    for j in range(4):
        holder.remove(col_one[i][j])
    col_two.append(holder)
col_one_average = []
col_two_average = []
for k in col_one:
    col_one_average.append(sum(k)/len(k))
for l in col_two:
    col_two_average.append(sum(l)/len(l))
dif = []
for i in range(len(col_one_average)):
    dif.append(col_one_average[i] - col_two_average[i])
print(dif)

So for example, if I have

a = [1,2,3]

and I want to split it into arrays of size 2 and 1, I get

col_one[0] = [1,2]

and

col_two[0] = [3]

then

col_one[1] = [1,3]

and

col_two[1] = [2]

After I get all of those, I compute the average of col_one[0] minus the average of col_two[0], and likewise for every other pair.
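The small example can be spelled out as a short sketch. The names `a`, `size`, `pairs`, and `diffs` are only illustrative, and `float()` is used so the averages are not truncated under Python 2:

```python
import itertools

a = [1, 2, 3]
size = 2  # size of each combination in the first column

pairs = []
for combo in itertools.combinations(a, size):
    rest = list(a)
    for value in combo:
        rest.remove(value)  # drop one occurrence of each chosen value
    pairs.append((list(combo), rest))

# Average of each combination minus the average of its leftovers
diffs = [float(sum(c)) / len(c) - float(sum(r)) / len(r) for c, r in pairs]

print(pairs)
print(diffs)
```

For `a = [1, 2, 3]` the pairs are `([1, 2], [3])`, `([1, 3], [2])`, and `([2, 3], [1])`, giving differences of -1.5, 0.0, and 1.5.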

I hope that makes sense. I'm trying to do this for a statistics class, so if there is a 'numpy-y' solution, I'd love to hear it.

Jamal
asked Mar 3, 2011 at 16:05
\$\endgroup\$
0

2 Answers

9
\$\begingroup\$
import itertools
import numpy
number = [53, 64, 68, 71, 77, 82, 85]
results = itertools.combinations(number, 4)
# convert the combination iterator into a numpy array
col_one = numpy.array(list(results))
# calculate average of col_one (truncated to integers)
col_one_average = numpy.mean(col_one, axis=1).astype(int)
# I don't actually create col_two, as I never figured out a good way to do it
# But since I only need the sum, I figure that out by subtraction
# (// keeps this an integer division on Python 3 as well)
col_two_average = (numpy.sum(number) - numpy.sum(col_one, axis=1)) // 3
dif = col_one_average - col_two_average
print(dif)
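Note that `.astype(int)` truncates the averages. If exact floating-point averages are wanted instead, a variant along these lines might work. It is a sketch under that assumption rather than a drop-in replacement; the names `k`, `col_one_mean`, and `col_two_mean` are introduced here for illustration, and the leftover count is derived from `len(number) - k` instead of the hard-coded 3.

```python
import itertools
import numpy as np

number = [53, 64, 68, 71, 77, 82, 85]
k = 4  # size of each combination, as in the question

col_one = np.array(list(itertools.combinations(number, k)))

# Mean of each combination, kept as floats (no truncation)
col_one_mean = col_one.mean(axis=1)

# The leftovers' sum is the total minus the combination's sum,
# so their mean can be computed without building a second array
col_two_mean = (np.sum(number) - col_one.sum(axis=1)) / float(len(number) - k)

dif = col_one_mean - col_two_mean
print(dif)
```

The subtraction trick relies only on the fact that each combination and its leftovers together make up the whole list.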
answered Mar 3, 2011 at 21:28
\$\endgroup\$
1
  • 2
    \$\begingroup\$ Using np.fromiter(combinations( is far faster than np.array(list(combinations( (0.1 seconds vs. 2 seconds, for instance), but it's also more complicated: numpy-discussion.10968.n7.nabble.com/… \$\endgroup\$ Commented Apr 14, 2013 at 17:05
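One way the `np.fromiter` idea might look is sketched below. The link in the comment is truncated, so this is not necessarily the approach discussed there; it simply flattens the combinations into a stream of scalars and reshapes the result. The variable names `k` and `flat` are assumptions for illustration, and any speed difference would need to be measured on your own data.

```python
import itertools
import numpy as np

number = [53, 64, 68, 71, 77, 82, 85]
k = 4  # size of each combination

# np.fromiter builds a 1-D array from an iterable of scalars, so the
# tuples are flattened first and the array is reshaped into rows of k
flat = itertools.chain.from_iterable(itertools.combinations(number, k))
col_one = np.fromiter(flat, dtype=int).reshape(-1, k)

print(col_one.shape)
```

The resulting array has one row per combination, the same layout as `numpy.array(list(results))` in the answer.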
5
\$\begingroup\$

This doesn't use numpy or scipy, but there are several things that can be improved about your code:

  • This is minor, but in your comments you call your lists arrays; in Python they're called lists.
  • Variable names like col_one and col_two aren't very meaningful. Maybe you should call them combinations and rests or something like that.
  • You should definitely refactor your code into functions.
  • You often use index-based loops where they aren't necessary. Where possible, you should iterate by element, not by index.
  • You're also often setting lists to the empty list and then appending to them in a loop. It is generally more Pythonic and often faster to use list comprehensions for this.

If I were to write the code, I'd write something like this:

import itertools
def average(lst):
    """Returns the average of a list or other iterable"""
    return sum(lst)/len(lst)
def list_difference(lst1, lst2):
    """Returns the difference between two iterables, i.e. a list containing all
    elements of lst1 that are not in lst2"""
    result = list(lst1)
    for x in lst2:
        result.remove(x)
    return result
def differences(numbers, n):
    """Returns a list containing the difference between a combination and the remaining
    elements of the list for all combinations of size n of the given list"""
    # Build lists containing the combinations of size n and the rests
    combinations = list(itertools.combinations(numbers, n))
    rests = [list_difference(numbers, row) for row in combinations]
    # Create lists of averages of the combinations and the rests
    combination_averages = [average(k) for k in combinations]
    rest_averages = [average(k) for k in rests]
    # Create a list containing the differences between the averages
    # using zip to iterate both lists in parallel
    diffs = [avg1 - avg2 for avg1, avg2 in zip(combination_averages, rest_averages)]
    return diffs
print(differences([53, 64, 68, 71, 77, 82, 85], 4))
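As a possible variation on `list_difference`, the split can also be made over positions instead of values, which avoids the repeated `remove` calls. This is only a sketch: `split_by_index` is a hypothetical helper name, not part of the code above, and `average` is redefined here with `float()` so that it does not truncate under Python 2.

```python
import itertools

def split_by_index(numbers, n):
    """Yield (combination, rest) pairs, choosing positions rather than values."""
    for chosen in itertools.combinations(range(len(numbers)), n):
        chosen_set = set(chosen)
        combo = [numbers[i] for i in chosen]
        rest = [x for i, x in enumerate(numbers) if i not in chosen_set]
        yield combo, rest

def average(lst):
    """Return the mean of a non-empty list as a float."""
    return float(sum(lst)) / len(lst)

diffs = [average(combo) - average(rest)
         for combo, rest in split_by_index([53, 64, 68, 71, 77, 82, 85], 4)]
print(len(diffs))
```

Because positions are chosen in increasing order, the combinations come out in the same order as `itertools.combinations(numbers, n)`.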
answered Mar 3, 2011 at 17:05
\$\endgroup\$
0
