I have written Python code which uses multiple if
conditions and a for
loop. The main objective of the code is to produce a traffic light system based on certain conditions.
Red = -1
Yellow = 0
Green = 1
It takes 4 months (m0
, m1
, m2
, m3
) and dataframe
as an input, runs the condition through each row, and returns -1, 0 or 1.
The code compares Month 1 with Month 0, Month 2 with Month 1 and Month 3 with Month 2.
For input:
if month+1 < Month for any value, then red else green.
For example, if the revenue of July 2020 is less than June 2020 then the input is red, otherwise green. Based on three comparisons the outcome is calculated. The outcome could be either 1-, 0, or 1.
The code I have written works well but is not optimised in any way. Is there a better way to do this?
This would be an O(n) operation but, at least there should be a way to write it concisely in Python. Or if code can be improved operationally as well.
def getTrafficLightData(df, dimension, m1, m2, m3, m4):
'''
Inputs -
Dataframe
dimension = on which we want to calculate traffic light system
m1, m2, m3, m4 - Could be any for months but we have taken consecutive months for Traffic Light System.
Example Call - getTrafficLightData(report6_TLS_data, "Revenue_","2020-6","2020-7","2020-8","2020-9")
'''
TFS_df = pd.DataFrame(columns=[dimension + "_TLS"])
if dimension == "Overstrike_":
suffix = "%"
for i in range(len(df)):
if (
(
df[dimension + m1 + suffix].iloc[i]
> df[dimension + m2 + suffix].iloc[i]
)
and (
df[dimension + m2 + suffix].iloc[i]
> df[dimension + m3 + suffix].iloc[i]
)
and (
df[dimension + m3 + suffix].iloc[i]
> df[dimension + m4 + suffix].iloc[i]
)
):
TFS_df.loc[i] = [1]
elif (
(
df[dimension + m1 + suffix].iloc[i]
< df[dimension + m2 + suffix].iloc[i]
)
and (
df[dimension + m2 + suffix].iloc[i]
< df[dimension + m3 + suffix].iloc[i]
)
and (
df[dimension + m3 + suffix].iloc[i]
< df[dimension + m4 + suffix].iloc[i]
)
):
TFS_df.loc[i] = [-1]
elif (
(
df[dimension + m1 + suffix].iloc[i]
< df[dimension + m2 + suffix].iloc[i]
)
and (
df[dimension + m2 + suffix].iloc[i]
> df[dimension + m3 + suffix].iloc[i]
)
and (
df[dimension + m3 + suffix].iloc[i]
< df[dimension + m4 + suffix].iloc[i]
)
):
TFS_df.loc[i] = [-1]
elif (
(
df[dimension + m2 + suffix].iloc[i]
> df[dimension + m2 + suffix].iloc[i]
)
and (
df[dimension + m2 + suffix].iloc[i]
< df[dimension + m3 + suffix].iloc[i]
)
and (
df[dimension + m3 + suffix].iloc[i]
< df[dimension + m4 + suffix].iloc[i]
)
):
TFS_df.loc[i] = [-1]
elif (
(
df[dimension + m1 + suffix].iloc[i]
> df[dimension + m2 + suffix].iloc[i]
)
and (
df[dimension + m2 + suffix].iloc[i]
> df[dimension + m3 + suffix].iloc[i]
)
and (
df[dimension + m3 + suffix].iloc[i]
< df[dimension + m4 + suffix].iloc[i]
)
):
TFS_df.loc[i] = [0]
elif (
(
df[dimension + m1 + suffix].iloc[i]
> df[dimension + m2 + suffix].iloc[i]
)
and (
df[dimension + m2 + suffix].iloc[i]
< df[dimension + m3 + suffix].iloc[i]
)
and (
df[dimension + m3 + suffix].iloc[i]
> df[dimension + m4 + suffix].iloc[i]
)
):
TFS_df.loc[i] = [1]
elif (
(
df[dimension + m1 + suffix].iloc[i]
< df[dimension + m2 + suffix].iloc[i]
)
and (
df[dimension + m2 + suffix].iloc[i]
> df[dimension + m3 + suffix].iloc[i]
)
and (
df[dimension + m3 + suffix].iloc[i]
> df[dimension + m4 + suffix].iloc[i]
)
):
TFS_df.loc[i] = [1]
elif (
(
df[dimension + m1 + suffix].iloc[i]
< df[dimension + m2 + suffix].iloc[i]
)
and ( #
df[dimension + m2 + suffix].iloc[i]
< df[dimension + m3 + suffix].iloc[i]
)
and ( #
df[dimension + m3 + suffix].iloc[i]
> df[dimension + m4 + suffix].iloc[i]
)
):
TFS_df.loc[i] = [0]
else:
TFS_df.loc[i] = [0]
return TFS_df
else:
if dimension == "Margin_":
suffix = "%"
else:
suffix = ""
for i in range(len(df)):
if (
(
df[dimension + m1 + suffix].iloc[i]
> df[dimension + m2 + suffix].iloc[i]
)
and (
df[dimension + m2 + suffix].iloc[i]
> df[dimension + m3 + suffix].iloc[i]
)
and (
df[dimension + m3 + suffix].iloc[i]
> df[dimension + m4 + suffix].iloc[i]
)
):
TFS_df.loc[i] = [-1]
elif (
(
df[dimension + m1 + suffix].iloc[i]
< df[dimension + m2 + suffix].iloc[i]
)
and (
df[dimension + m2 + suffix].iloc[i]
< df[dimension + m3 + suffix].iloc[i]
)
and (
df[dimension + m3 + suffix].iloc[i]
< df[dimension + m4 + suffix].iloc[i]
)
):
TFS_df.loc[i] = [1]
elif (
(
df[dimension + m1 + suffix].iloc[i]
< df[dimension + m2 + suffix].iloc[i]
)
and (
df[dimension + m2 + suffix].iloc[i]
> df[dimension + m3 + suffix].iloc[i]
)
and (
df[dimension + m3 + suffix].iloc[i]
< df[dimension + m4 + suffix].iloc[i]
)
):
TFS_df.loc[i] = [1]
elif (
(
df[dimension + m1 + suffix].iloc[i]
> df[dimension + m2 + suffix].iloc[i]
)
and (
df[dimension + m2 + suffix].iloc[i]
< df[dimension + m3 + suffix].iloc[i]
)
and (
df[dimension + m3 + suffix].iloc[i]
< df[dimension + m4 + suffix].iloc[i]
)
):
TFS_df.loc[i] = [1]
elif (
(
df[dimension + m1 + suffix].iloc[i]
> df[dimension + m2 + suffix].iloc[i]
)
and (
df[dimension + m2 + suffix].iloc[i]
> df[dimension + m3 + suffix].iloc[i]
)
and (
df[dimension + m3 + suffix].iloc[i]
< df[dimension + m4 + suffix].iloc[i]
)
):
TFS_df.loc[i] = [0]
elif (
(
df[dimension + m1 + suffix].iloc[i]
> df[dimension + m2 + suffix].iloc[i]
)
and (
df[dimension + m2 + suffix].iloc[i]
< df[dimension + m3 + suffix].iloc[i]
)
and (
df[dimension + m3 + suffix].iloc[i]
> df[dimension + m4 + suffix].iloc[i]
)
):
TFS_df.loc[i] = [-1]
elif (
(
df[dimension + m1 + suffix].iloc[i]
< df[dimension + m2 + suffix].iloc[i]
)
and (
df[dimension + m2 + suffix].iloc[i]
> df[dimension + m3 + suffix].iloc[i]
)
and (
df[dimension + m3 + suffix].iloc[i]
> df[dimension + m4 + suffix].iloc[i]
)
):
TFS_df.loc[i] = [-1]
elif (
(
df[dimension + m1 + suffix].iloc[i]
< df[dimension + m2 + suffix].iloc[i]
)
and (
df[dimension + m2 + suffix].iloc[i]
< df[dimension + m3 + suffix].iloc[i]
)
and (
df[dimension + m3 + suffix].iloc[i]
> df[dimension + m4 + suffix].iloc[i]
)
):
TFS_df.loc[i] = [0]
else:
TFS_df.loc[i] = [0]
return TFS_df
The function is called in below way -
report6_TLS_data['Revenue_TLS']=getTrafficLightData(report6_TLS_data, "Revenue_","2020-6","2020-7","2020-8","2020-9")
report6_TLS_data["Margin_TLS"]=getTrafficLightData(report6_TLS_data, "Margin_","2020-6","2020-7","2020-8","2020-9")
report6_TLS_data["Overstrike_TLS"]=getTrafficLightData(report6_TLS_data, "Overstrike_","2020-6","2020-7","2020-8","2020-9")
Any pointers would be helpful.
The input data is of form -
ym PART NUMBER BranchCode Revenue_2019-1 Revenue_2019-10 Revenue_2019-11 Revenue_2019-12 Revenue_2019-2 Revenue_2019-3 Revenue_2019-4 Revenue_2019-5 Revenue_2019-6 Revenue_2019-7 Revenue_2019-8 Revenue_2019-9 Revenue_2020-1 Revenue_2020-2 Revenue_2020-3 Revenue_2020-4 Revenue_2020-5 Revenue_2020-6 Revenue_2020-7 Revenue_2020-8 Revenue_2020-9 Margin_2019-1 Margin_2019-10 Margin_2019-11 Margin_2019-12 Margin_2019-2 Margin_2019-3 Margin_2019-4 Margin_2019-5 Margin_2019-6 Margin_2019-7 Margin_2019-8 Margin_2019-9 Margin_2020-1 Margin_2020-2 Margin_2020-3 Margin_2020-4 Margin_2020-5 Margin_2020-6 Margin_2020-7 Margin_2020-8 Margin_2020-9 Overstrike_2019-1 Overstrike_2019-10 Overstrike_2019-11 Overstrike_2019-12 Overstrike_2019-2 Overstrike_2019-3 Overstrike_2019-4 Overstrike_2019-5 Overstrike_2019-6 Overstrike_2019-7 Overstrike_2019-8 Overstrike_2019-9 Overstrike_2020-1 Overstrike_2020-2 Overstrike_2020-3 Overstrike_2020-4 Overstrike_2020-5 Overstrike_2020-6 Overstrike_2020-7 Overstrike_2020-8 Overstrike_2020-9 Transactions_2019-1 Transactions_2019-10 Transactions_2019-11 Transactions_2019-12 Transactions_2019-2 Transactions_2019-3 Transactions_2019-4 Transactions_2019-5 Transactions_2019-6 Transactions_2019-7 Transactions_2019-8 Transactions_2019-9 Transactions_2020-1 Transactions_2020-2 Transactions_2020-3 Transactions_2020-4 Transactions_2020-5 Transactions_2020-6 Transactions_2020-7 Transactions_2020-8 Transactions_2020-9 Margin_2019-1% Margin_2019-10% Margin_2019-11% Margin_2019-12% Margin_2019-2% Margin_2019-3% Margin_2019-4% Margin_2019-5% Margin_2019-6% Margin_2019-7% Margin_2019-8% Margin_2019-9% Margin_2020-1% Margin_2020-2% Margin_2020-3% Margin_2020-4% Margin_2020-5% Margin_2020-6% Margin_2020-7% Margin_2020-8% Margin_2020-9% Overstrike_2019-1% Overstrike_2019-10% Overstrike_2019-11% Overstrike_2019-12% Overstrike_2019-2% Overstrike_2019-3% Overstrike_2019-4% Overstrike_2019-5% Overstrike_2019-6% Overstrike_2019-7% Overstrike_2019-8% Overstrike_2019-9% Overstrike_2020-1% Overstrike_2020-2% Overstrike_2020-3% Overstrike_2020-4% Overstrike_2020-5% Overstrike_2020-6% Overstrike_2020-7% Overstrike_2020-8% Overstrike_2020-9%
0 BAGG001 BC 71.75 90.00 20.25 43.50 42.50 30.00 70.00 44.25 45.00 46.75 129.50 58.00 81.00 36.00 33.25 0.75 15.00 24.75 0.00 0.00 2.50 32.97 39.15 8.95 14.31 18.95 7.86 30.68 19.27 19.74 18.12 59.38 22.30 34.95 17.59 14.10 0.32 6.35 5.30 0.00 0.00 1.06 0.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 3.00 3.00 1.00 1.00 2.00 0.00 0.00 0.00 0.00 2.00 0.00 0.00 0.00 8 16 5 9 5 6 12 7 10 7 13 10 13 5 11 1 2 4 0 0 1 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00
1 BAGG001 PK 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 25.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 9.90 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
1 Answer 1
It's important to understand that Pandas is a wrapper around Numpy, and both are geared toward vectorization. Avoid loops and repetition as much as possible, and this method can really be boiled down. It's worth saying that
if month+1 < Month for any value, then red else green.
is absolutely not what's going on. For now I've made a proposed solution that deals with a fixed number of months using a lookup table. You can adapt it as necessary. It's based on the table that you showed, interpreting comparisons as bits in a field.
Otherwise: don't bake an underscore into your dimension
string; don't produce a double-underscore in the output dataframe; add type hints; and accept the months as a variadic argument or tuple.
import numpy as np
import pandas as pd
lookup = np.array((-1, 0, -1, 1, -1, 1, 0, 1))
to_binary = np.array((4, 2, 1)).T
def get_traffic_light_data_new(
df: pd.DataFrame,
dimension: str,
*months: str,
) -> np.array:
assert len(months) == 4
by_month = df[[f'{dimension}_{month}' for month in months]].to_numpy()
signs = by_month[:, 1:] - by_month[:, :-1] > 0
index = np.matmul(signs, to_binary)
return lookup[index]
def test():
scenarios = pd.DataFrame(
[
[3, 2, 1, 0],
[1, 2, 3, 4],
[1, 2, 1.5, 1.7],
[2, 1, 3, 4],
[3, 2, 1, 4],
[3, 2, 2.5, 2.4],
[5, 6, 1, 0],
[3, 4, 5, 4],
],
columns=(
'Revenue_2020-6',
'Revenue_2020-7',
'Revenue_2020-8',
'Revenue_2020-9',
),
)
args = (
scenarios,
'Revenue',
'2020-6',
'2020-7',
'2020-8',
'2020-9',
)
new_result = get_traffic_light_data_new(*args)
print(new_result)
if __name__ == '__main__':
test()
-
\$\begingroup\$ the condition I have stated is for input. what i mean here is the colurs on the inputs are based on above side. for example, if there are 4 months ( June july, Aug and September), and if you see first row on the input side, it means july revenue was less than june, august revenue was less than july and sep revenue was less than august, so all three are red. If all three are red, output should be red. and similarly for other conditions. The input data contains values for revenue for 4 months, then based on which condition it satisfies on the input side, output should be accordingly. \$\endgroup\$Vivekanand Joshi– Vivekanand Joshi2020年10月29日 08:14:57 +00:00Commented Oct 29, 2020 at 8:14
-
\$\begingroup\$ Your description violates scenario 3. There is a decrease, yet the output is green. Both your old implementation and my proposed implementation match the coloured table and not the description. \$\endgroup\$Reinderien– Reinderien2020年10月29日 13:37:40 +00:00Commented Oct 29, 2020 at 13:37
-
\$\begingroup\$ I guess I have not been clear in my comment. Apologies. What I am saying is if my data satisfies input condition for all 3 columns of the left side, my code should give me right hand (output) color. Thank you. \$\endgroup\$Vivekanand Joshi– Vivekanand Joshi2020年10月29日 20:24:29 +00:00Commented Oct 29, 2020 at 20:24
df[dimension + m2 + suffix].iloc[i] > df[dimension + m2 + suffix].iloc[i]
\$\endgroup\$