3
\$\begingroup\$

I have a dataframe with measurement data from several runs under the same conditions. Each row contains both the constant conditions under which the experiment was conducted and the results of all the runs.

Since I am not able to provide a real dataset, the code snippet provided below will generate some dummy data.

I was able to achieve the desired output, but my function transform_columns() seems to be unnecessarily complicated:

import pandas as pd
import numpy as np
np.random.seed(seed=1234)
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 6)), columns=['constant 1', 'constant 2', 1, 2, 3, 4])
def transform_columns(data):
    factor_columns = []
    response_columns = []
    for col in data:
        if isinstance(col, int):
            response_columns.append(col)
        else:
            factor_columns.append(col)
    collected = []
    for index, row in data.iterrows():
        conditions = row.loc[factor_columns]
        data_values = row.loc[response_columns].dropna()
        for val in data_values:
            out = conditions.copy()
            out['value'] = val
            collected.append(out)
    df = pd.DataFrame(collected).reset_index(drop=True)
    return df
print(transform_columns(df))

Is there any Pythonic or Pandas way to do this nicely?

asked Jan 9, 2020 at 21:42
\$\endgroup\$
1
  • \$\begingroup\$ It looks like the docs discourage the use of np.random.seed(); do you know how to change it? Also, you mention "the desired output". Can you explain what that is? It's much better for everyone than having to reverse-engineer your code. \$\endgroup\$ Commented Jan 9, 2020 at 23:44
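
On the seeding point raised in the comment: NumPy's documentation recommends the Generator API obtained from np.random.default_rng over the legacy global seed. The following is a minimal sketch of the same dummy data built that way. It draws from a different random stream, so the values will not match those produced with np.random.seed(1234).

```python
import numpy as np
import pandas as pd

# Local, seeded generator instead of the legacy global random state.
rng = np.random.default_rng(1234)

# Same shape and column labels as the dummy data in the question;
# the upper bound of rng.integers is exclusive by default.
df = pd.DataFrame(
    rng.integers(0, 100, size=(100, 6)),
    columns=['constant 1', 'constant 2', 1, 2, 3, 4],
)
print(df.head())
```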

2 Answers

2
\$\begingroup\$

It is probably easier to work with the underlying NumPy array directly than through pandas. Ensure that all factor columns come before all data columns; then this code will work:

import pandas as pd
import numpy as np
np.random.seed(seed=1234)
n_rows = 100
n_cols = 6
n_factor_cols = 2
n_data_cols = n_cols - n_factor_cols
arr = np.random.randint(0, 100, size=(n_rows, n_cols))
factor_cols = arr[:,:n_factor_cols]
data_cols = [arr[:,i][:,np.newaxis] for i in range(n_factor_cols, n_cols)]
stacks = [np.hstack((factor_cols, data_col)) for data_col in data_cols]
output = np.concatenate(stacks)

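The code above groups the output by data column (all values of the first run, then all values of the second, and so on), so it assumes that row order is not important. If the output must stay grouped by original row, as in the question's output, interleave the stacks instead of using np.concatenate: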

output = np.empty((n_rows * n_data_cols, n_factor_cols + 1),
                  dtype=arr.dtype)
for i, stack in enumerate(stacks):
    output[i::n_data_cols] = stack

This is the best I can do, but I wouldn't be surprised if someone comes along and rewrites it as a Numpy one-liner. :)
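
One possible shape for such a shorter version, offered as a sketch rather than a definitive rewrite: repeat each factor row once per data column with np.repeat and flatten the data columns row by row. It reuses the variable names from the code above and should give the same row order as the interleaved variant.

```python
import numpy as np

np.random.seed(seed=1234)
n_rows, n_cols, n_factor_cols = 100, 6, 2
n_data_cols = n_cols - n_factor_cols
arr = np.random.randint(0, 100, size=(n_rows, n_cols))

# Repeat every factor row once per data column:
# shape (n_rows * n_data_cols, n_factor_cols).
factors = np.repeat(arr[:, :n_factor_cols], n_data_cols, axis=0)

# Flatten the data columns row by row into a single column.
values = arr[:, n_factor_cols:].reshape(-1, 1)

output = np.hstack((factors, values))
print(output[:5])
```

Because reshape reads the array in row-major order, the values of one original row stay next to each other, so no interleaving loop is needed.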

answered Jan 10, 2020 at 11:45
\$\endgroup\$
2
\$\begingroup\$

The pandas library has rich functionality that lets you build complex pipelines as chains of method calls.
In your case, the whole transformation can be done with the following single pipeline:

import pandas as pd
import numpy as np
np.random.seed(seed=1234)
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 6)),
                  columns=['constant 1', 'constant 2', 1, 2, 3, 4])
def transform_columns(df):
    return df.set_index(df.filter(regex=r'\D').columns.tolist()) \
             .stack().reset_index(name='value') \
             .drop(columns='level_2', axis=1)
print(transform_columns(df))

Details:

  • df.filter(regex=r'\D').columns.tolist()
    df.filter returns the subset of columns whose names match the regex pattern r'\D', i.e. names that contain at least one non-digit character; .columns.tolist() turns them into a list of column names

  • df.set_index(...) - sets the dataframe index (row labels) to the columns named in the previous step, leaving only the run columns as regular columns

  • .stack() - moves the remaining column labels into a new innermost index level, producing a Series with a multi-level index

  • .reset_index(name='value')
    pandas.Series.reset_index converts the index levels back into regular columns; name='value' sets the name of the column that holds the stacked values

  • .drop(columns='level_2', axis=1) - drops the auxiliary column level_2, which holds the former run labels (1 to 4); columns= already targets the column axis, so axis=1 is redundant here

You can run each step separately to see what the intermediate series/dataframe looks like and how it is transformed.
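
The steps listed above can be inspected one at a time. The sketch below does so on a tiny hypothetical frame (two condition columns and two runs, with values made up for illustration) so that each intermediate object is small enough to read.

```python
import pandas as pd

# Tiny hypothetical frame: two condition columns, two runs.
df = pd.DataFrame(
    [[10, 20, 1, 2],
     [30, 40, 3, 4]],
    columns=['constant 1', 'constant 2', 1, 2],
)

# Step 1: names containing a non-digit character are the condition columns.
factor_columns = df.filter(regex=r'\D').columns.tolist()

# Step 2: move the conditions into the index; only the run columns remain.
indexed = df.set_index(factor_columns)

# Step 3: move the run labels into an unnamed innermost index level (a Series).
stacked = indexed.stack()

# Step 4: turn the index levels back into columns;
# the unnamed level shows up as 'level_2'.
flat = stacked.reset_index(name='value')

# Step 5: drop the run label, keeping the conditions and the value.
result = flat.drop(columns='level_2')

for step in (indexed, stacked, flat, result):
    print(step, end='\n\n')
```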


Sample output:

     constant 1  constant 2  value
0            47          83     38
1            47          83     53
2            47          83     76
3            47          83     24
4            15          49     23
..          ...         ...    ...
395          16          16     80
396          16          92     46
397          16          92     77
398          16          92     68
399          16          92     83

[400 rows x 3 columns]
answered Jan 10, 2020 at 20:42
\$\endgroup\$
