The situation is this: I have a bunch of servers which are synced so that they all have the same data. Unfortunately, a disaster happened and the servers are all out of sync. My task is to re-sync all of the servers so that they all have the same sets of data. That is, I need to ensure that each server has a copy of every data set.
The first line of input denotes the number of servers. Following that will be one line of input for each server with a space-separated list of data set ID's. i.e, x y z. Data set id's are positive integers.
I must output a list of instructions to most optimally re-sync the servers in the format of: < data set ID>< FROM>< TO>
#input = {
#1:[1, 3, 4],
#2:[1, 2, 3],
#3:[1, 3],
#4:[1, 4, 2]}
numCenters = int(raw_input('Number of data centers: '))
input = {}
print("Input data set information as: x y z")
# Grab dataset ID information from stdin
for x in range(1, numCenters+1):
dataSet = raw_input("Data set %s: " % (x))
input[x] = sorted(map(int, dataSet.split()))
#Map datasets (the numbers / dataset ID) to data centers (the individual lists) that they belong to
#New dictionary for the map
alldatasets = {}
for k,v in input.iteritems():
for dataset in v: #
if dataset not in alldatasets:
alldatasets[dataset] = [] # Make a dictionary with the key as the dataset ID,
alldatasets[dataset].append(k) # and the value as a list of which datacenters have that value.
allsets = list(alldatasets.keys())
print("One Possible Correct Output:\n")
for Id,datacenter in input.iteritems():
for sets in allsets: #go through every datacenter, and compare the datasets it has to a list of all datasets.
if sets not in datacenter:
print("%s %s %s" % (sets, alldatasets[sets][0], Id))
print("done")
Input:
4 1 3 4 1 2 3 1 3 1 4 2
Output:
One possible correct output:
2 2 1 4 1 2 2 2 3 4 1 3 3 1 4 done
I am looking to improve the code that I have written. Either optimizing its run time, making it look better, or any other comments are welcome. I am moreso looking for optimization feedback, though.
2 Answers 2
Your code is full of variables named somethingSet — but they aren't actually sets! Why not?
This solution, which takes advantage of Python's built-in set
operations, is shorter. Just being able to write have_nots = all_centers - haves
is worth it.
from collections import defaultdict
num_centers = int(raw_input('Number of data centers: '))
print("Input data set information as: x y z")
all_centers = set(xrange(1, num_centers + 1))
centers_with_data = defaultdict(set)
# Grab dataset ID information from stdin
for center in range(1, num_centers + 1):
for data_set in map(int, raw_input("Data set %s: " % (center)).split()):
centers_with_data[data_set].add(center)
print "One possible solution:\n"
for data_set, haves in centers_with_data.iteritems():
have_nots = all_centers - haves
donor = next(iter(haves)) # Pick an arbitrary source
for acceptor in have_nots:
print "%d %d %d" % (data_set, donor, acceptor)
-
\$\begingroup\$ I noticed that the output for input (1,3,5,4,7),(1,3),(2) for 3 servers is different than the output in my original program. Why? There are the same amount of steps so why is it different? \$\endgroup\$Ozera– Ozera2014年12月13日 04:54:41 +00:00Commented Dec 13, 2014 at 4:54
-
\$\begingroup\$ As you say, the solution is not unique. Python doesn't guarantee any order when iterating through hashes and sets. \$\endgroup\$200_success– 200_success2014年12月13日 06:46:50 +00:00Commented Dec 13, 2014 at 6:46
I'm not a Python expert but I have similar issues in other technologies. You can try to improve your iterations with some high order functions like map/filter/reduce etc.
For example, here:
for Id,datacenter in input.iteritems():
for sets in allsets: #go through every datacenter, and compare the datasets it has to a list of all datasets.
if sets not in datacenter:
print("%s %s %s" % (sets, alldatasets[sets][0], Id))
you can replace the for
with if
with a filter.