\$\begingroup\$

In this code I calculate several statistical functions of values from different files and save the results to CSV files (one file for each statistical function).

import os

import pandas as pd

# yield_data is defined elsewhere and is not shown in the post.

def mean(data):
    pass

def standard_deviation(data):
    pass

def afd(data):
    pass

def norma_afd(data):
    pass

def asd(data):
    pass

def norma_asd(data):
    pass

def calculation(file):
    """
    Get the DataFrame and calculate the different statistical measures for every column
    :param file: DataFrame 56x7681
    :return: 6 different 56x1 lists for (mean, std, afd, norm_afd, asd, norm_asd)
    """
    m, sd, afd_l, nafd, asd_l, nasd = ([] for _ in range(6))
    for column in file:
        data = file[column].to_numpy()
        m.append(mean(data))
        sd.append(standard_deviation(data))
        afd_l.append(afd(data))
        nafd.append(norma_afd(data))
        asd_l.append(asd(data))
        nasd.append(norma_asd(data))
    return m, sd, afd_l, nafd, asd_l, nasd

def run(load_path, save_path):
    """
    Get (yield) every DataFrame from a folder, calculate the statistical
    measures for each file, and save them to CSV files
    :param load_path: the folder path to load all the different files
    :param save_path: the folder save path
    :return: none
    """
    m, sd, afd_l, nafd, asd_l, nasd = ([] for _ in range(6))
    for current_path, file in yield_data(load_path, data_type="data"):
        a, b, c, d, e, f = calculation(file)
        m.append(a)
        sd.append(b)
        afd_l.append(c)
        nafd.append(d)
        asd_l.append(e)
        nasd.append(f)
    if not os.path.exists(save_path):
        os.makedirs(save_path)
    pd.DataFrame(m).to_csv(save_path + os.path.sep + "mean.csv", index=False, header=False)
    pd.DataFrame(sd).to_csv(save_path + os.path.sep + "std.csv", index=False, header=False)
    pd.DataFrame(afd_l).to_csv(save_path + os.path.sep + "afd.csv", index=False, header=False)
    pd.DataFrame(nafd).to_csv(save_path + os.path.sep + "norm_afd.csv", index=False, header=False)
    pd.DataFrame(asd_l).to_csv(save_path + os.path.sep + "asd.csv", index=False, header=False)
    pd.DataFrame(nasd).to_csv(save_path + os.path.sep + "norm_asd.csv", index=False, header=False)

Is there a better and more efficient way to write this code? This may be a stupid question, but I would be really interested to know if there is a better way.

Toby Speight
asked Feb 11, 2020 at 13:47
\$\endgroup\$
  • \$\begingroup\$ Are you missing some import statements? Please edit to complete the program. \$\endgroup\$ Commented Feb 11, 2020 at 15:46

1 Answer

\$\begingroup\$

As the code cannot be run, here is a wild guess on a way to restructure it:

import os

import pandas as pd

# yield_data is defined elsewhere and is not shown in the post.

def mean(fileResults, data):
    result = 0
    # do computation
    fileResults["mean"].append(result)

def standard_deviation(fileResults, data):
    pass

def afd(fileResults, data):
    pass

def norma_afd(fileResults, data):
    pass

def asd(fileResults, data):
    pass

def norma_asd(fileResults, data):
    pass

def calculation(file):
    """
    Get the DataFrame and calculate the different statistical measures for every column
    :param file: DataFrame 56x7681
    :return: dict mapping each statistic name
             (mean, std, afd, norm_afd, asd, norm_asd) to a list with one value per column
    """
    fileResults = {
        "mean": [],
        "std": [],
        "afd": [],
        "norm_afd": [],
        "asd": [],
        "norm_asd": [],
    }
    functionCallList = [
        mean,
        standard_deviation,
        afd,
        norma_afd,
        asd,
        norma_asd,
    ]
    for column in file:
        data = file[column].to_numpy()
        # Each callback appends its result to the matching list in fileResults
        for functionCall in functionCallList:
            functionCall(fileResults, data)
    return fileResults

def run(load_path, save_path):
    """
    Get (yield) every DataFrame from a folder, calculate the statistical
    measures for each file, and save them to CSV files
    :param load_path: the folder path to load all the different files
    :param save_path: the folder save path
    :return: none
    """
    results = {}
    for current_path, file in yield_data(load_path, data_type="data"):
        fileResults = calculation(file)
        for key, value in fileResults.items():
            if key not in results:
                results[key] = []
            results[key].append(value)
    if not os.path.exists(save_path):
        os.makedirs(save_path)
    # One CSV file per statistic, named after its dictionary key
    for key, value in results.items():
        pd.DataFrame(value).to_csv(save_path + os.path.sep + key + ".csv", index=False, header=False)

So, instead of repeating the function calls, you can put the functions in a list of callbacks and simply iterate over it. This works best when all the functions share the same signature.
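
As a minimal, self-contained sketch of the callback-list idea (not the code above): the statistics here are stand-ins built on Python's `statistics` module, `value_range` and `apply_all` are hypothetical names, and a plain list replaces the NumPy array so the snippet runs without third-party packages.

```python
import statistics


def mean(data):
    return statistics.fmean(data)


def standard_deviation(data):
    # Population standard deviation; the original post does not say which variant it uses.
    return statistics.pstdev(data)


def value_range(data):
    # Hypothetical extra statistic, only here to show that adding one is a one-line change below.
    return max(data) - min(data)


# Every function takes the same single argument, so they can share one loop.
STAT_FUNCTIONS = [mean, standard_deviation, value_range]


def apply_all(data):
    """Call every registered statistic on the same data, in list order."""
    return [func(data) for func in STAT_FUNCTIONS]


results = apply_all([1.0, 2.0, 3.0, 4.0])
print(results)
```

Adding a new statistic then means writing the function and appending it to `STAT_FUNCTIONS`; the loop itself does not change.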

You can also store your data in a dictionary instead of n separate lists. It scales a little better and is clearer than returning a 6-tuple. It also avoids a lot of copy-paste when you save the CSV files.
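
One possible variation on the dictionary idea, sketched under assumptions: here the dictionary maps each output name to a function that returns its value, rather than having each function append to a shared dictionary as in the code above. `STATS`, `save_results`, and the sample `files` data are hypothetical, and the standard `csv` module stands in for pandas so the snippet runs on its own.

```python
import csv
import os
import statistics
import tempfile

# Output name -> function computing that statistic for one column.
STATS = {
    "mean": statistics.fmean,
    "std": statistics.pstdev,
}


def calculation(columns):
    """Return {statistic name: [one value per column]} for one file."""
    return {name: [func(col) for col in columns] for name, func in STATS.items()}


def save_results(results, save_path):
    """Write one CSV per statistic, with one row per input file."""
    os.makedirs(save_path, exist_ok=True)
    for name, rows in results.items():
        with open(os.path.join(save_path, name + ".csv"), "w", newline="") as handle:
            csv.writer(handle).writerows(rows)


# Two made-up "files", each holding two columns of samples.
files = [
    [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]],
    [[0.0, 10.0], [5.0, 5.0]],
]

results = {name: [] for name in STATS}
for columns in files:
    for name, values in calculation(columns).items():
        results[name].append(values)

output_dir = tempfile.mkdtemp()
save_results(results, output_dir)
```

Because the dictionary keys double as file names, supporting another statistic only requires one new entry in `STATS`.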

answered Feb 11, 2020 at 17:16
\$\endgroup\$
  • \$\begingroup\$ Agreed, a list of functions is the way to go. \$\endgroup\$ Commented Feb 11, 2020 at 18:01
  • \$\begingroup\$ @VincentRG To make it work correctly for my purpose, I initialize the dictionary fileResults in calculation with {"mean": [], "std": [], "afd": [], "norm_afd": [], "asd": [], "norm_asd": []} and the calculation methods append to the lists. But many thanks for your solution. \$\endgroup\$ Commented Feb 12, 2020 at 9:06
  • \$\begingroup\$ Yes, you're right: several columns are parsed per file, and all computations are done for each column. Hence the need to append to each list for each column. I had missed that. \$\endgroup\$ Commented Feb 12, 2020 at 9:26
