I have the following data:
ticker  company  sector   industryGroup  industry  subindustry  currency  1999-07-31  1999-10-31  2000-01-31  ...
CompA   Health   Health   Health         Health    Health       USD       12.3        23.33       33.1        ...
CompB   Machine  Machine  Machine        Machine   Machine      USD       32.1        34.44       23.1        ...
CompC   Machine  Machine  Machine        Machine   Machine      USD       32.1        34.44       23.1        ...
CompD   Machine  Machine  Machine        Machine   Machine      USD       32.1        34.44       23.1        ...
The above is just a sample of the data, which is in an Excel file. The prices run through 02-01-2024, and there are more company rows (541 companies, to be exact). I wrote code that takes the column headers starting from the first date and puts them in a dataframe column titled Date. Next, I took each company's prices and put them in a column named after its ticker symbol. The output should look like this:
Date        CompA  CompB  CompC  CompD
1999-07-31  12.3   32.1   32.1   32.1
1999-10-31  23.33  34.44  34.44  34.44
2000-01-31  33.1   23.1   23.1   23.1
This is my code:
import pandas as pd
bvmf = pd.read_excel('Market.xlsx')
df = pd.DataFrame()
df['Date'] = bvmf.iloc[0:1, 10:].columns
for i in range(len(bvmf)):
    ticker = bvmf['ticker'][i]
    df[ticker] = bvmf.iloc[i:i+1, 10:].values[0]
There are a lot of values. Is there a better way to implement this code?
1 Answer
As a rule of thumb, avoid looping in Pandas:
for i in range(len(bvmf)):
Note that "transferring columns into rows" is called transposing. Use DataFrame.transpose or its alias DataFrame.T to swap the rows and columns without looping.
Manually selecting the date columns is error-prone:
.iloc[0:1, 10:]
Case in point, your original data apparently required "10:", but the sample you posted here actually requires "7:". Use DataFrame.filter to select the date columns instead of manually indexing them.
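To illustrate why label-based selection is sturdier, here is a minimal sketch. The small frame is a hypothetical stand-in for the Excel sheet, built from two of the sample rows in the question, and it assumes the date headers are read in as YYYY-MM-DD strings.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet (two sample rows from the question).
bvmf = pd.DataFrame({
    'ticker': ['CompA', 'CompB'],
    'company': ['Health', 'Machine'],
    'sector': ['Health', 'Machine'],
    'industryGroup': ['Health', 'Machine'],
    'industry': ['Health', 'Machine'],
    'subindustry': ['Health', 'Machine'],
    'currency': ['USD', 'USD'],
    '1999-07-31': [12.3, 32.1],
    '1999-10-31': [23.33, 34.44],
    '2000-01-31': [33.1, 23.1],
})

# Label-based: keep only the columns whose name looks like YYYY-MM-DD.
dates = bvmf.filter(regex=r'\d{4}-\d{2}-\d{2}')
print(list(dates.columns))

# Position-based: depends on how many metadata columns precede the dates.
by_position = bvmf.iloc[:, 7:]   # happens to match for this sample
too_far = bvmf.iloc[:, 10:]      # silently returns zero columns here
print(by_position.shape, too_far.shape)
```

With filter, adding or removing a metadata column does not change which columns are selected, whereas the iloc offset would have to be updated by hand.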
So instead of manually crafting a separate df, just manipulate the original bvmf dataframe and transpose it:
bvmf = (bvmf.set_index('ticker')                 # index will become column header after transpose
            .filter(regex=r'\d{4}-\d{2}-\d{2}')  # select only YYYY-MM-DD columns
            .T                                   # transpose
            .rename_axis('date')                 # name the new index
)
Output (index named "date" and header named "ticker"):
ticker      CompA  CompB  CompC  CompD
date
1999-07-31  12.30  32.10  32.10  32.10
1999-10-31  23.33  34.44  34.44  34.44
2000-01-31  33.10  23.10  23.10  23.10
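As a sanity check, the following sketch compares a loop in the spirit of the question's code with the transpose chain on a tiny, hypothetical frame (only two tickers, two dates, and one metadata column are assumed). Both should produce the same table once the dates are moved back into a regular column.

```python
import pandas as pd

# Tiny hypothetical frame: one metadata column plus two date columns.
bvmf = pd.DataFrame({
    'ticker': ['CompA', 'CompB'],
    'currency': ['USD', 'USD'],
    '1999-07-31': [12.3, 32.1],
    '1999-10-31': [23.33, 34.44],
})

# Loop version: one new column per company row.
prices = bvmf.filter(regex=r'\d{4}-\d{2}-\d{2}')
looped = pd.DataFrame({'date': prices.columns})
for i in range(len(bvmf)):
    looped[bvmf['ticker'][i]] = prices.iloc[i].values

# Transpose version, with reset_index to turn the dates into a column.
transposed = (bvmf.set_index('ticker')
                  .filter(regex=r'\d{4}-\d{2}-\d{2}')
                  .T
                  .rename_axis('date')
                  .reset_index())

print(looped)
print(transposed)
```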
Optionally:
- Use DataFrame.reset_index if you want the dates as a regular column like your original code/output.
- Use pd.to_datetime to convert your dates into real datetime objects.
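The two optional steps could look roughly like this. The frame named wide is a hypothetical stand-in for the transposed result shown above, and it assumes the dates are still plain strings at this point.

```python
import pandas as pd

# Hypothetical stand-in for the transposed result (string dates in the index).
wide = pd.DataFrame(
    {'CompA': [12.3, 23.33, 33.1], 'CompB': [32.1, 34.44, 23.1]},
    index=pd.Index(['1999-07-31', '1999-10-31', '2000-01-31'], name='date'),
)
wide.columns.name = 'ticker'

# Convert the string labels into real datetimes (yields a DatetimeIndex).
wide.index = pd.to_datetime(wide.index)

# Move the dates out of the index into a regular 'date' column.
flat = wide.reset_index()
print(flat.dtypes)
```

Keeping the dates as a DatetimeIndex (that is, skipping reset_index) is often handy for time-series work such as date slicing or resampling, so whether to flatten depends on what comes next.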