2
\$\begingroup\$

I have written Python code which uses multiple if conditions and a for loop. The main objective of the code is to produce a traffic light system based on certain conditions.

Red = -1
Yellow = 0
Green = 1

It takes 4 months (m0, m1, m2, m3) and dataframe as an input, runs the condition through each row, and returns -1, 0 or 1.

scenarios

The code compares Month 1 with Month 0, Month 2 with Month 1 and Month 3 with Month 2.

For input:

if month+1 < Month for any value, then red else green.

For example, if the revenue of July 2020 is less than June 2020 then the input is red, otherwise green. Based on three comparisons the outcome is calculated. The outcome could be either 1-, 0, or 1.

The code I have written works well but is not optimised in any way. Is there a better way to do this?

This would be an O(n) operation but, at least there should be a way to write it concisely in Python. Or if code can be improved operationally as well.

def getTrafficLightData(df, dimension, m1, m2, m3, m4):
'''
Inputs - 
 Dataframe
 dimension = on which we want to calculate traffic light system
 m1, m2, m3, m4 - Could be any for months but we have taken consecutive months for Traffic Light System.
 Example Call - getTrafficLightData(report6_TLS_data, "Revenue_","2020-6","2020-7","2020-8","2020-9")
'''
TFS_df = pd.DataFrame(columns=[dimension + "_TLS"])
if dimension == "Overstrike_":
 suffix = "%"
 for i in range(len(df)):
 if (
 (
 df[dimension + m1 + suffix].iloc[i]
 > df[dimension + m2 + suffix].iloc[i]
 )
 and (
 df[dimension + m2 + suffix].iloc[i]
 > df[dimension + m3 + suffix].iloc[i]
 )
 and (
 df[dimension + m3 + suffix].iloc[i]
 > df[dimension + m4 + suffix].iloc[i]
 )
 ):
 TFS_df.loc[i] = [1]
 elif (
 (
 df[dimension + m1 + suffix].iloc[i]
 < df[dimension + m2 + suffix].iloc[i]
 )
 and (
 df[dimension + m2 + suffix].iloc[i]
 < df[dimension + m3 + suffix].iloc[i]
 )
 and (
 df[dimension + m3 + suffix].iloc[i]
 < df[dimension + m4 + suffix].iloc[i]
 )
 ):
 TFS_df.loc[i] = [-1]
 elif (
 (
 df[dimension + m1 + suffix].iloc[i]
 < df[dimension + m2 + suffix].iloc[i]
 )
 and (
 df[dimension + m2 + suffix].iloc[i]
 > df[dimension + m3 + suffix].iloc[i]
 )
 and (
 df[dimension + m3 + suffix].iloc[i]
 < df[dimension + m4 + suffix].iloc[i]
 )
 ):
 TFS_df.loc[i] = [-1]
 elif (
 (
 df[dimension + m2 + suffix].iloc[i]
 > df[dimension + m2 + suffix].iloc[i]
 )
 and (
 df[dimension + m2 + suffix].iloc[i]
 < df[dimension + m3 + suffix].iloc[i]
 )
 and (
 df[dimension + m3 + suffix].iloc[i]
 < df[dimension + m4 + suffix].iloc[i]
 )
 ):
 TFS_df.loc[i] = [-1]
 elif (
 (
 df[dimension + m1 + suffix].iloc[i]
 > df[dimension + m2 + suffix].iloc[i]
 )
 and (
 df[dimension + m2 + suffix].iloc[i]
 > df[dimension + m3 + suffix].iloc[i]
 )
 and (
 df[dimension + m3 + suffix].iloc[i]
 < df[dimension + m4 + suffix].iloc[i]
 )
 ):
 TFS_df.loc[i] = [0]
 elif (
 (
 df[dimension + m1 + suffix].iloc[i]
 > df[dimension + m2 + suffix].iloc[i]
 )
 and (
 df[dimension + m2 + suffix].iloc[i]
 < df[dimension + m3 + suffix].iloc[i]
 )
 and (
 df[dimension + m3 + suffix].iloc[i]
 > df[dimension + m4 + suffix].iloc[i]
 )
 ):
 TFS_df.loc[i] = [1]
 elif (
 (
 df[dimension + m1 + suffix].iloc[i]
 < df[dimension + m2 + suffix].iloc[i]
 )
 and (
 df[dimension + m2 + suffix].iloc[i]
 > df[dimension + m3 + suffix].iloc[i]
 )
 and (
 df[dimension + m3 + suffix].iloc[i]
 > df[dimension + m4 + suffix].iloc[i]
 )
 ):
 TFS_df.loc[i] = [1]
 elif (
 (
 df[dimension + m1 + suffix].iloc[i]
 < df[dimension + m2 + suffix].iloc[i]
 )
 and ( #
 df[dimension + m2 + suffix].iloc[i]
 < df[dimension + m3 + suffix].iloc[i]
 )
 and ( #
 df[dimension + m3 + suffix].iloc[i]
 > df[dimension + m4 + suffix].iloc[i]
 )
 ):
 TFS_df.loc[i] = [0]
 else:
 TFS_df.loc[i] = [0]
 return TFS_df
else:
 if dimension == "Margin_":
 suffix = "%"
 else:
 suffix = ""
 for i in range(len(df)):
 if (
 (
 df[dimension + m1 + suffix].iloc[i]
 > df[dimension + m2 + suffix].iloc[i]
 )
 and (
 df[dimension + m2 + suffix].iloc[i]
 > df[dimension + m3 + suffix].iloc[i]
 )
 and (
 df[dimension + m3 + suffix].iloc[i]
 > df[dimension + m4 + suffix].iloc[i]
 )
 ):
 TFS_df.loc[i] = [-1]
 elif (
 (
 df[dimension + m1 + suffix].iloc[i]
 < df[dimension + m2 + suffix].iloc[i]
 )
 and (
 df[dimension + m2 + suffix].iloc[i]
 < df[dimension + m3 + suffix].iloc[i]
 )
 and (
 df[dimension + m3 + suffix].iloc[i]
 < df[dimension + m4 + suffix].iloc[i]
 )
 ):
 TFS_df.loc[i] = [1]
 elif (
 (
 df[dimension + m1 + suffix].iloc[i]
 < df[dimension + m2 + suffix].iloc[i]
 )
 and (
 df[dimension + m2 + suffix].iloc[i]
 > df[dimension + m3 + suffix].iloc[i]
 )
 and (
 df[dimension + m3 + suffix].iloc[i]
 < df[dimension + m4 + suffix].iloc[i]
 )
 ):
 TFS_df.loc[i] = [1]
 elif (
 (
 df[dimension + m1 + suffix].iloc[i]
 > df[dimension + m2 + suffix].iloc[i]
 )
 and (
 df[dimension + m2 + suffix].iloc[i]
 < df[dimension + m3 + suffix].iloc[i]
 )
 and (
 df[dimension + m3 + suffix].iloc[i]
 < df[dimension + m4 + suffix].iloc[i]
 )
 ):
 TFS_df.loc[i] = [1]
 elif (
 (
 df[dimension + m1 + suffix].iloc[i]
 > df[dimension + m2 + suffix].iloc[i]
 )
 and (
 df[dimension + m2 + suffix].iloc[i]
 > df[dimension + m3 + suffix].iloc[i]
 )
 and (
 df[dimension + m3 + suffix].iloc[i]
 < df[dimension + m4 + suffix].iloc[i]
 )
 ):
 TFS_df.loc[i] = [0]
 elif (
 (
 df[dimension + m1 + suffix].iloc[i]
 > df[dimension + m2 + suffix].iloc[i]
 )
 and (
 df[dimension + m2 + suffix].iloc[i]
 < df[dimension + m3 + suffix].iloc[i]
 )
 and (
 df[dimension + m3 + suffix].iloc[i]
 > df[dimension + m4 + suffix].iloc[i]
 )
 ):
 TFS_df.loc[i] = [-1]
 elif (
 (
 df[dimension + m1 + suffix].iloc[i]
 < df[dimension + m2 + suffix].iloc[i]
 )
 and (
 df[dimension + m2 + suffix].iloc[i]
 > df[dimension + m3 + suffix].iloc[i]
 )
 and (
 df[dimension + m3 + suffix].iloc[i]
 > df[dimension + m4 + suffix].iloc[i]
 )
 ):
 TFS_df.loc[i] = [-1]
 elif (
 (
 df[dimension + m1 + suffix].iloc[i]
 < df[dimension + m2 + suffix].iloc[i]
 )
 and (
 df[dimension + m2 + suffix].iloc[i]
 < df[dimension + m3 + suffix].iloc[i]
 )
 and (
 df[dimension + m3 + suffix].iloc[i]
 > df[dimension + m4 + suffix].iloc[i]
 )
 ):
 TFS_df.loc[i] = [0]
 else:
 TFS_df.loc[i] = [0]
 return TFS_df

The function is called in below way -

report6_TLS_data['Revenue_TLS']=getTrafficLightData(report6_TLS_data, "Revenue_","2020-6","2020-7","2020-8","2020-9")
report6_TLS_data["Margin_TLS"]=getTrafficLightData(report6_TLS_data, "Margin_","2020-6","2020-7","2020-8","2020-9")
report6_TLS_data["Overstrike_TLS"]=getTrafficLightData(report6_TLS_data, "Overstrike_","2020-6","2020-7","2020-8","2020-9")

Any pointers would be helpful.

The input data is of form -

ym PART NUMBER BranchCode Revenue_2019-1 Revenue_2019-10 Revenue_2019-11 Revenue_2019-12 Revenue_2019-2 Revenue_2019-3 Revenue_2019-4 Revenue_2019-5 Revenue_2019-6 Revenue_2019-7 Revenue_2019-8 Revenue_2019-9 Revenue_2020-1 Revenue_2020-2 Revenue_2020-3 Revenue_2020-4 Revenue_2020-5 Revenue_2020-6 Revenue_2020-7 Revenue_2020-8 Revenue_2020-9 Margin_2019-1 Margin_2019-10 Margin_2019-11 Margin_2019-12 Margin_2019-2 Margin_2019-3 Margin_2019-4 Margin_2019-5 Margin_2019-6 Margin_2019-7 Margin_2019-8 Margin_2019-9 Margin_2020-1 Margin_2020-2 Margin_2020-3 Margin_2020-4 Margin_2020-5 Margin_2020-6 Margin_2020-7 Margin_2020-8 Margin_2020-9 Overstrike_2019-1 Overstrike_2019-10 Overstrike_2019-11 Overstrike_2019-12 Overstrike_2019-2 Overstrike_2019-3 Overstrike_2019-4 Overstrike_2019-5 Overstrike_2019-6 Overstrike_2019-7 Overstrike_2019-8 Overstrike_2019-9 Overstrike_2020-1 Overstrike_2020-2 Overstrike_2020-3 Overstrike_2020-4 Overstrike_2020-5 Overstrike_2020-6 Overstrike_2020-7 Overstrike_2020-8 Overstrike_2020-9 Transactions_2019-1 Transactions_2019-10 Transactions_2019-11 Transactions_2019-12 Transactions_2019-2 Transactions_2019-3 Transactions_2019-4 Transactions_2019-5 Transactions_2019-6 Transactions_2019-7 Transactions_2019-8 Transactions_2019-9 Transactions_2020-1 Transactions_2020-2 Transactions_2020-3 Transactions_2020-4 Transactions_2020-5 Transactions_2020-6 Transactions_2020-7 Transactions_2020-8 Transactions_2020-9 Margin_2019-1% Margin_2019-10% Margin_2019-11% Margin_2019-12% Margin_2019-2% Margin_2019-3% Margin_2019-4% Margin_2019-5% Margin_2019-6% Margin_2019-7% Margin_2019-8% Margin_2019-9% Margin_2020-1% Margin_2020-2% Margin_2020-3% Margin_2020-4% Margin_2020-5% Margin_2020-6% Margin_2020-7% Margin_2020-8% Margin_2020-9% Overstrike_2019-1% Overstrike_2019-10% Overstrike_2019-11% Overstrike_2019-12% Overstrike_2019-2% Overstrike_2019-3% Overstrike_2019-4% Overstrike_2019-5% Overstrike_2019-6% Overstrike_2019-7% Overstrike_2019-8% Overstrike_2019-9% Overstrike_2020-1% Overstrike_2020-2% Overstrike_2020-3% Overstrike_2020-4% Overstrike_2020-5% Overstrike_2020-6% Overstrike_2020-7% Overstrike_2020-8% Overstrike_2020-9%
0 BAGG001 BC 71.75 90.00 20.25 43.50 42.50 30.00 70.00 44.25 45.00 46.75 129.50 58.00 81.00 36.00 33.25 0.75 15.00 24.75 0.00 0.00 2.50 32.97 39.15 8.95 14.31 18.95 7.86 30.68 19.27 19.74 18.12 59.38 22.30 34.95 17.59 14.10 0.32 6.35 5.30 0.00 0.00 1.06 0.00 0.00 0.00 1.00 0.00 1.00 0.00 0.00 3.00 3.00 1.00 1.00 2.00 0.00 0.00 0.00 0.00 2.00 0.00 0.00 0.00 8 16 5 9 5 6 12 7 10 7 13 10 13 5 11 1 2 4 0 0 1 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 1.00
1 BAGG001 PK 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 25.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 9.90 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Reinderien
71k5 gold badges76 silver badges256 bronze badges
asked Oct 28, 2020 at 19:07
\$\endgroup\$
2
  • 2
    \$\begingroup\$ Your syntax is incorrect. Please fix your indentation. \$\endgroup\$ Commented Oct 28, 2020 at 21:01
  • 1
    \$\begingroup\$ Looks like a copy paste error...df[dimension + m2 + suffix].iloc[i] > df[dimension + m2 + suffix].iloc[i] \$\endgroup\$ Commented Oct 28, 2020 at 22:34

1 Answer 1

1
\$\begingroup\$

It's important to understand that Pandas is a wrapper around Numpy, and both are geared toward vectorization. Avoid loops and repetition as much as possible, and this method can really be boiled down. It's worth saying that

if month+1 < Month for any value, then red else green.

is absolutely not what's going on. For now I've made a proposed solution that deals with a fixed number of months using a lookup table. You can adapt it as necessary. It's based on the table that you showed, interpreting comparisons as bits in a field.

Otherwise: don't bake an underscore into your dimension string; don't produce a double-underscore in the output dataframe; add type hints; and accept the months as a variadic argument or tuple.

import numpy as np
import pandas as pd
lookup = np.array((-1, 0, -1, 1, -1, 1, 0, 1))
to_binary = np.array((4, 2, 1)).T
def get_traffic_light_data_new(
 df: pd.DataFrame,
 dimension: str,
 *months: str,
) -> np.array:
 assert len(months) == 4
 by_month = df[[f'{dimension}_{month}' for month in months]].to_numpy()
 signs = by_month[:, 1:] - by_month[:, :-1] > 0
 index = np.matmul(signs, to_binary)
 return lookup[index]
def test():
 scenarios = pd.DataFrame(
 [
 [3, 2, 1, 0],
 [1, 2, 3, 4],
 [1, 2, 1.5, 1.7],
 [2, 1, 3, 4],
 [3, 2, 1, 4],
 [3, 2, 2.5, 2.4],
 [5, 6, 1, 0],
 [3, 4, 5, 4],
 ],
 columns=(
 'Revenue_2020-6',
 'Revenue_2020-7',
 'Revenue_2020-8',
 'Revenue_2020-9',
 ),
 )
 args = (
 scenarios,
 'Revenue',
 '2020-6',
 '2020-7',
 '2020-8',
 '2020-9',
 )
 new_result = get_traffic_light_data_new(*args)
 print(new_result)
if __name__ == '__main__':
 test()
answered Oct 28, 2020 at 23:10
\$\endgroup\$
3
  • \$\begingroup\$ the condition I have stated is for input. what i mean here is the colurs on the inputs are based on above side. for example, if there are 4 months ( June july, Aug and September), and if you see first row on the input side, it means july revenue was less than june, august revenue was less than july and sep revenue was less than august, so all three are red. If all three are red, output should be red. and similarly for other conditions. The input data contains values for revenue for 4 months, then based on which condition it satisfies on the input side, output should be accordingly. \$\endgroup\$ Commented Oct 29, 2020 at 8:14
  • \$\begingroup\$ Your description violates scenario 3. There is a decrease, yet the output is green. Both your old implementation and my proposed implementation match the coloured table and not the description. \$\endgroup\$ Commented Oct 29, 2020 at 13:37
  • \$\begingroup\$ I guess I have not been clear in my comment. Apologies. What I am saying is if my data satisfies input condition for all 3 columns of the left side, my code should give me right hand (output) color. Thank you. \$\endgroup\$ Commented Oct 29, 2020 at 20:24

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.