\$\begingroup\$

I have the following data:

ticker  company  sector   industryGroup  industry  subindustry  currency  1999-07-31  1999-10-31  2000-01-31  ...
CompA   Health   Health   Health         Health    Health       USD       12.3        23.33       33.1        ...
CompB   Machine  Machine  Machine        Machine   Machine      USD       32.1        34.44       23.1        ...
CompC   Machine  Machine  Machine        Machine   Machine      USD       32.1        34.44       23.1        ...
CompD   Machine  Machine  Machine        Machine   Machine      USD       32.1        34.44       23.1        ...

The above is just a sample of the data, which is stored in an Excel file. The prices run until 02-01-2024, and there are more company rows: 541 companies, to be exact. I wrote code that takes the column headers starting from the first date and puts them in a dataframe column titled Date. Next, I took each company's row of prices and put it in a column named after its ticker symbol. The output should look like this:

Date        CompA  CompB  CompC  CompD
1999-07-31  12.3   32.1   32.1   32.1
1999-10-31  23.33  34.44  34.44  34.44
2000-01-31  33.1   23.1   23.1   23.1

This is my code:

import pandas as pd
bvmf = pd.read_excel('Market.xlsx')
df = pd.DataFrame()
df['Date'] = bvmf.iloc[0:1, 10:].columns
for i in range(len(bvmf)):
    ticker = bvmf['ticker'][i]
    df[ticker] = bvmf.iloc[i:i+1, 10:].values[0]

There are a lot of values. Is there a better way to implement this code?

BCdotWEB
asked May 30, 2024 at 14:13
\$\endgroup\$

1 Answer

\$\begingroup\$
  1. As a rule of thumb, avoid looping in Pandas:

    for i in range(len(bvmf)):
    

    Note that "transferring columns into rows" is called transposing. Use DataFrame.transpose or its alias DataFrame.T to swap the rows and columns without looping.

  2. Manually selecting the date columns is error-prone:

    .iloc[0:1, 10:]
    

    Case in point, your original data apparently required 10:, but the sample you posted here actually requires 7:. Use DataFrame.filter to select the date columns instead of manually indexing them.
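To illustrate the second point, here is a minimal sketch on a made-up two-row frame (the values mirror the sample in the question, but the frame itself and the names `sample`, `by_name` and `by_position` are assumptions). The regex selects the date columns by label, however many metadata columns precede them, whereas the positional slice is only right while that count stays the same.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet: 3 metadata columns, then date columns
sample = pd.DataFrame({
    'ticker': ['CompA', 'CompB'],
    'sector': ['Health', 'Machine'],
    'currency': ['USD', 'USD'],
    '1999-07-31': [12.3, 32.1],
    '1999-10-31': [23.33, 34.44],
})

# Label-based selection: keeps only columns whose name contains YYYY-MM-DD
by_name = sample.filter(regex=r'\d{4}-\d{2}-\d{2}')

# Positional selection: correct only while exactly 3 columns precede the dates
by_position = sample.iloc[:, 3:]

print(list(by_name.columns))
```

Adding or removing a metadata column would silently shift what `iloc` returns, while the `filter` call would keep selecting the same columns.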


So instead of manually crafting a separate df, just manipulate the original bvmf dataframe and transpose it:

bvmf = (bvmf.set_index('ticker')                 # index will become column header after transpose
            .filter(regex=r'\d{4}-\d{2}-\d{2}')  # select only YYYY-MM-DD columns
            .T                                   # transpose
            .rename_axis('date')                 # name the new index
)

Output (index named "date" and header named "ticker"):

ticker      CompA  CompB  CompC  CompD
date
1999-07-31  12.30  32.10  32.10  32.10
1999-10-31  23.33  34.44  34.44  34.44
2000-01-31  33.10  23.10  23.10  23.10
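As a sanity check, the following sketch rebuilds the four sample rows inline (an assumption made to keep it self-contained, instead of reading Market.xlsx) and compares the transposed result against the loop from the question, with the slice adjusted to 7: for the sample's column count. The names `looped`, `transposed` and `same` are illustrative only.

```python
import pandas as pd

# Sample rows from the question, built inline instead of read from Market.xlsx
machine = ['Machine'] * 5 + ['USD', 32.1, 34.44, 23.1]
bvmf = pd.DataFrame(
    [
        ['CompA'] + ['Health'] * 5 + ['USD', 12.3, 23.33, 33.1],
        ['CompB'] + machine,
        ['CompC'] + machine,
        ['CompD'] + machine,
    ],
    columns=['ticker', 'company', 'sector', 'industryGroup', 'industry',
             'subindustry', 'currency', '1999-07-31', '1999-10-31', '2000-01-31'],
)

# Loop-based version from the question (7: because the sample has 7 metadata columns)
looped = pd.DataFrame()
looped['Date'] = bvmf.columns[7:]
for i in range(len(bvmf)):
    looped[bvmf['ticker'][i]] = bvmf.iloc[i, 7:].values

# Transposed version: no loop and no hard-coded column position
transposed = (bvmf.set_index('ticker')
                  .filter(regex=r'\d{4}-\d{2}-\d{2}')
                  .T
                  .rename_axis('date'))

# Compare the numeric contents of both results
same = (looped.set_index('Date').astype(float).to_numpy().tolist()
        == transposed.to_numpy().tolist())
print(same)
```

Both versions hold the same values; the transposed one additionally keeps the dates as the index rather than as an ordinary column.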

Optionally:
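One possible optional step, assuming the date labels are strings in YYYY-MM-DD form, is to convert the index with `pd.to_datetime` so that date-based selection works. This is a minimal sketch on a hypothetical frame named `prices`, not a required part of the solution:

```python
import pandas as pd

# Hypothetical transposed result: string date labels as the index
prices = pd.DataFrame(
    {'CompA': [12.3, 23.33, 33.1], 'CompB': [32.1, 34.44, 23.1]},
    index=pd.Index(['1999-07-31', '1999-10-31', '2000-01-31'], name='date'),
)

# Parse the labels into a DatetimeIndex (the index name is preserved)
prices.index = pd.to_datetime(prices.index, format='%Y-%m-%d')

# Partial-string indexing now works, e.g. all rows from 1999
rows_1999 = prices.loc['1999']
print(rows_1999)
```

If the Excel headers are already read in as datetimes, this conversion may be unnecessary.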

answered May 30, 2024 at 14:46
\$\endgroup\$
