My question was rejected last time, so I am trying a better approach to getting a solution:
df.head:
predicted_u4 u_2_5_weight predicted_o2.5_n predicted_score_difference dnb_weight total_score o_1_5_weight predicted_total_score away_score predicted_bttsu2.5_n home_score btts_u_2_5_weight result_match selection_n o_2_5_weight btts_o_2_5_weight predicted_bttso2.5_n win_weight predicted_result predicted_btts_n selection_match_n u_4_5_weight btts_weight predicted_u2.5_n result
0 0.530389 0.4 0.697917 0.881006 0.7 4 3.2 3.540952 3 0.08308 1 0.4 no match O 2.5 (untested) 0.40 0.40 0.536766 1.1 home 0.618518 match 0.4 0.40 0.291228 away
1 0.530389 0.4 0.697917 0.881006 0.7 4 3.2 3.540952 3 0.08308 1 0.4 no match O 2.5 (untested) 0.40 0.40 0.536766 1.1 home 0.618518 match 0.4 0.40 0.291228 away
2 0.743486 0.4 0.477249 0.229046 0.7 2 3.2 2.458867 0 0.13194 2 0.4 match U 2.5 (untested) 0.48 0.40 0.397920 1.1 home 0.531042 match 0.4 0.54 0.529926 home
3 0.743486 0.4 0.477249 0.229046 0.7 2 3.2 2.458867 0 0.13194 2 0.4 match U 2.5 (untested) 0.48 0.40 0.397920 1.1 home 0.531042 match 0.4 0.54 0.529926 home
4 0.752334 0.4 0.532446 0.357271 0.7 1 3.2 2.599825 0 0.06794 1 0.4 match U 2.5 (untested) 0.54 0.44 0.435302 1.1 home 0.516939 match 0.4 0.52 0.480485 home
df.shape[0]:
2437086
I am using the following functions to update the rows:
def selection_n(row):
    if (row["win_weight"] == 1.1 or row["btts_o_2_5_weight"] == 0.4) and row["predicted_score_difference"] > row["win_weight"] and row["predicted_bttso2.5_n"] > row["btts_o_2_5_weight"]:
        return "W & BTTS O 2.5 (untested)"
    elif row["predicted_score_difference"] > row["win_weight"] and row["predicted_bttso2.5_n"] > row["btts_o_2_5_weight"]:
        return "W & BTTS O 2.5"
    if (row["win_weight"] == 1.1 or row["btts_weight"] == 0.4) and row["predicted_score_difference"] > row["win_weight"] and row["predicted_btts_n"] > row["btts_weight"]:
        return "W & BTTS (untested)"
    elif row["predicted_score_difference"] > row["win_weight"] and row["predicted_btts_n"] > row["btts_weight"]:
        return "W & BTTS"
    if (row["win_weight"] == 1.1 or row["o_2_5_weight"] == 0.4) and row["predicted_score_difference"] > row["win_weight"] and row["predicted_o2.5_n"] > row["o_2_5_weight"]:
        return "W & O 2.5 (untested)"
    elif row["predicted_score_difference"] > row["win_weight"] and row["predicted_o2.5_n"] > row["o_2_5_weight"]:
        return "W & O 2.5"
    if (row["win_weight"] == 1.1 or row["o_1_5_weight"] == 3.2) and row["predicted_score_difference"] > row["win_weight"] and row["predicted_total_score"] > row["o_1_5_weight"]:
        return "W & O 1.5 (untested)"
    elif row["predicted_score_difference"] > row["win_weight"] and row["predicted_total_score"] > row["o_1_5_weight"]:
        return "W & O 1.5"
    if (row["win_weight"] == 1.1 or row["u_2_5_weight"] == 0.4) and row["predicted_score_difference"] > row["win_weight"] and row["predicted_u2.5_n"] > row["u_2_5_weight"]:
        return "W & U 2.5 (untested)"
    elif row["predicted_score_difference"] > row["win_weight"] and row["predicted_u2.5_n"] > row["u_2_5_weight"]:
        return "W & U 2.5"
    if (row["win_weight"] == 1.1 or row["u_4_5_weight"] == 0.4) and row["predicted_score_difference"] > row["win_weight"] and row["predicted_u4"] > row["u_4_5_weight"]:
        return "W & U 4.5 (untested)"
    elif row["predicted_score_difference"] > row["win_weight"] and row["predicted_u4"] > row["u_4_5_weight"]:
        return "W & U 4.5"
    if row["win_weight"] == 1.1 and row["predicted_score_difference"] > row["win_weight"]:
        return "W (untested)"
    elif row["predicted_score_difference"] > row["win_weight"]:
        return "W"
    if row["o_2_5_weight"] == 0.4 and row["predicted_o2.5_n"] > row["o_2_5_weight"]:
        return "O 2.5 (untested)"
    elif row["predicted_o2.5_n"] > row["o_2_5_weight"]:
        return "O 2.5"
    if row["btts_o_2_5_weight"] == 0.4 and row["predicted_bttso2.5_n"] > row["btts_o_2_5_weight"]:
        return "BTTS O 2.5 (untested)"
    elif row["predicted_bttso2.5_n"] > row["btts_o_2_5_weight"]:
        return "BTTS O 2.5"
    if row["btts_weight"] == 0.4 and row["predicted_btts_n"] > row["btts_weight"]:
        return "BTTS (untested)"
    elif row["predicted_btts_n"] > row["btts_weight"]:
        return "BTTS"
    if row["u_2_5_weight"] == 0.4 and row["predicted_u2.5_n"] > row["u_2_5_weight"]:
        return "U 2.5 (untested)"
    elif row["predicted_u2.5_n"] > row["u_2_5_weight"]:
        return "U 2.5"
    if row["dnb_weight"] == 0.7 and row["dnb_weight"] < row["predicted_score_difference"] < row["win_weight"]:
        return "DNB (untested)"
    elif row["dnb_weight"] < row["predicted_score_difference"] < row["win_weight"]:
        return "DNB"
    if row["u_4_5_weight"] == 0.4 and row["predicted_u4"] > row["u_4_5_weight"]:
        return "U 4.5 (untested)"
    elif row["predicted_u4"] > row["u_4_5_weight"]:
        return "U 4.5"
    if (row["o_1_5_weight"] == 0.4 or row["u_4_5_weight"] == 0.4) and row["predicted_total_score"] > row["o_1_5_weight"] and row["predicted_u4"] > row["u_4_5_weight"]:
        return "O 1.5 and U 4.5 (untested)"
    elif row["predicted_total_score"] > row["o_1_5_weight"] and row["predicted_u4"] > row["u_4_5_weight"]:
        return "O 1.5 and U 4.5"
    if row["btts_u_2_5_weight"] == 0.4 and row["predicted_bttsu2.5_n"] > row["btts_u_2_5_weight"]:
        return "U 2.5 & BTTS (untested)"
    elif row["btts_u_2_5_weight"] == 0.4 and row["predicted_bttsu2.5_n"] > row["btts_u_2_5_weight"]:
        return "U 2.5 & BTTS"
def selection_match_n(row):
    if pd.isna(row["home_score"]) or pd.isna(row["away_score"]):
        return "no_result"
    if pd.isnull(row["selection_n"]):
        return "no sel."
    if row["result_match"] == 'match' and row["predicted_result"] != 'draw' and row["home_score"] > 0 and row["away_score"] > 0 and row["total_score"] > 2 and (row["selection_n"] == 'W & BTTS O 2.5' or row["selection_n"] == 'W & BTTS O 2.5 (untested)'):
        return "match"
    if row["result_match"] == 'match' and row["predicted_result"] != 'draw' and row["home_score"] > 0 and row["away_score"] > 0 and (row["selection_n"] == 'W & BTTS' or row["selection_n"] == 'W & BTTS (untested)'):
        return "match"
    if row["result_match"] == 'match' and row["predicted_result"] != 'draw' and row["total_score"] > 2 and (row["selection_n"] == 'W & O 2.5' or row["selection_n"] == 'W & O 2.5 (untested)'):
        return "match"
    if row["result_match"] == 'match' and row["predicted_result"] != 'draw' and row["total_score"] > 1 and (row["selection_n"] == 'W & O 1.5' or row["selection_n"] == 'W & O 1.5 (untested)'):
        return "match"
    if row["result_match"] == 'match' and row["predicted_result"] != 'draw' and row["total_score"] < 3 and (row["selection_n"] == 'W & U 2.5' or row["selection_n"] == 'W & U 2.5 (untested)'):
        return "match"
    if row["result_match"] == 'match' and row["predicted_result"] != 'draw' and row["total_score"] < 5 and (row["selection_n"] == "W & U 4.5" or row["selection_n"] == "W & U 4.5 (untested)"):
        return "match"
    if row["result_match"] == 'match' and row["predicted_result"] != 'draw' and (row["selection_n"] == "W" or row["selection_n"] == "W (untested)"):
        return "match"
    if row["total_score"] > 2 and (row["selection_n"] == 'O 2.5' or row["selection_n"] == 'O 2.5 (untested)'):
        return "match"
    if row["home_score"] > 0 and row["away_score"] > 0 and row["total_score"] > 2 and (row["selection_n"] == 'BTTS O 2.5' or row["selection_n"] == 'BTTS O 2.5 (untested)'):
        return "match"
    if row["home_score"] > 0 and row["away_score"] > 0 and (row["selection_n"] == 'BTTS' or row["selection_n"] == 'BTTS (untested)'):
        return "match"
    if row["total_score"] < 3 and (row["selection_n"] == 'U 2.5' or row["selection_n"] == 'U 2.5 (untested)'):
        return "match"
    if (row["result_match"] == 'match' or row["result"] == 'draw' or row["predicted_result"] == 'draw') and (row["selection_n"] == "DNB" or row["selection_n"] == "DNB (untested)"):
        return "match"
    if row["total_score"] < 5 and (row["selection_n"] == 'U 4.5' or row["selection_n"] == 'U 4.5 (untested)'):
        return "match"
    if 1 < row["total_score"] < 5 and (row["selection_n"] == 'O 1.5 and U 4.5' or row["selection_n"] == 'O 1.5 and U 4.5 (untested)'):
        return "match"
    if row["home_score"] > 0 and row["away_score"] > 0 and row["total_score"] < 3 and (row["selection_n"] == 'U 2.5 & BTTS' or row["selection_n"] == 'U 2.5 & BTTS (untested)'):
        return "match"
    else:
        return "no match"
def selection_update_n(row):
    if row["selection_match_n"] == 'no match' and (row["selection_n"] == 'W & BTTS O 2.5' or row["selection_n"] == 'W & BTTS O 2.5 (untested)'):
        if row["result_match"] == 'no match' and row["predicted_result"] != 'draw':
            row["win_weight"] += 0.02
        elif (row["home_score"] == 0 or row["away_score"] == 0) and row["total_score"] < 3:
            row["btts_o_2_5_weight"] += 0.02
        elif row["home_score"] == 0 or row["away_score"] == 0:
            row["btts_o_2_5_weight"] += 0.02
    if row["selection_match_n"] == 'no match' and (row["selection_n"] == 'W & BTTS' or row["selection_n"] == 'W & BTTS (untested)') and row["result_match"] == 'no match' and row["predicted_result"] != 'draw':
        if row["home_score"] > 0 and row["away_score"] > 0:
            row["win_weight"] += 0.02
        elif (row["home_score"] == 0 or row["away_score"] == 0):
            row["win_weight"] += 0.02
            row["btts_weight"] += 0.02
    elif row["selection_match_n"] == 'no match' and (row["selection_n"] == 'W & BTTS' or row["selection_n"] == 'W & BTTS (untested)') and row["result_match"] == 'match' and row["predicted_result"] != 'draw' and (row["home_score"] == 0 or row["away_score"] == 0):
        row["btts_weight"] += 0.02
    if row["selection_match_n"] == 'no match' and (row["selection_n"] == 'W & O 2.5' or row["selection_n"] == 'W & O 2.5 (untested)') and row["result_match"] == 'no match' and row["predicted_result"] != 'draw':
        if row["total_score"] > 2:
            row["win_weight"] += 0.02
        elif row["total_score"] < 3:
            row["win_weight"] += 0.02
            row["o_2_5_weight"] += 0.02
    elif row["selection_match_n"] == 'no match' and (row["selection_n"] == 'W & O 2.5' or row["selection_n"] == 'W & O 2.5 (untested)') and row["result_match"] == 'match' and row["predicted_result"] != 'draw' and row["total_score"] < 3:
        row["o_2_5_weight"] += 0.02
    if row["selection_match_n"] == 'no match' and (row["selection_n"] == 'W & O 1.5' or row["selection_n"] == 'W & O 1.5 (untested)') and row["result_match"] == 'no match' and row["predicted_result"] != 'draw':
        if row["total_score"] > 1:
            row["win_weight"] += 0.02
        else:
            row["win_weight"] += 0.02
            row["o_1_5_weight"] += 0.02
    elif row["selection_match_n"] == 'no match' and (row["selection_n"] == 'W & O 1.5' or row["selection_n"] == 'W & O 1.5 (untested)') and row["result_match"] == 'match' and row["predicted_result"] != 'draw' and row["total_score"] < 2:
        row["o_1_5_weight"] += 0.02
    if row["selection_match_n"] == 'no match' and (row["selection_n"] == 'W & U 2.5' or row["selection_n"] == 'W & U 2.5 (untested)') and row["result_match"] == 'no match' and row["predicted_result"] != 'draw':
        if row["total_score"] < 3:
            row["win_weight"] += 0.02
        else:
            row["win_weight"] += 0.02
            row["u_2_5_weight"] += 0.02
    elif row["selection_match_n"] == 'no match' and (row["selection_n"] == 'W & U 2.5' or row["selection_n"] == 'W & U 2.5 (untested)') and row["result_match"] == 'match' and row["predicted_result"] != 'draw' and row["total_score"] > 2:
        row["u_2_5_weight"] += 0.02
    if row["selection_match_n"] == 'no match' and (row["selection_n"] == "W & U 4.5" or row["selection_n"] == "W & U 4.5 (untested)") and row["result_match"] == 'no match' and row["predicted_result"] != 'draw':
        if row["total_score"] < 5:
            row["win_weight"] += 0.02
        else:
            row["win_weight"] += 0.02
            row["u_4_5_weight"] += 0.02
    elif row["selection_match_n"] == 'no match' and (row["selection_n"] == "W & U 4.5" or row["selection_n"] == "W & U 4.5 (untested)") and row["result_match"] == 'match' and row["predicted_result"] != 'draw' and row["total_score"] > 4:
        row["u_4_5_weight"] += 0.02
    if row["selection_match_n"] == 'no match' and (row["selection_n"] == "W" or row["selection_n"] == "W (untested)") and row["result_match"] == 'no match' and row["predicted_result"] != 'draw':
        row["win_weight"] += 0.02
    if row["selection_match_n"] == 'no match' and (row["selection_n"] == "W" or row["selection_n"] == "W (untested)") and row["result_match"] == 'no match' and row["predicted_result"] != 'draw':
        row["win_weight"] += 0.02
    if row["selection_match_n"] == 'no match' and (row["selection_n"] == 'O 2.5' or row["selection_n"] == 'O 2.5 (untested)') and row["total_score"] < 3:
        row["o_2_5_weight"] += 0.02
    if row["selection_match_n"] == 'no match' and (row["selection_n"] == 'BTTS O 2.5' or row["selection_n"] == 'BTTS O 2.5 (untested)') and (row["home_score"] == 0 or row["away_score"] == 0 or row["total_score"] < 3):
        row["btts_o_2_5_weight"] += 0.02
    if row["selection_match_n"] == 'no match' and (row["selection_n"] == 'BTTS' or row["selection_n"] == 'BTTS (untested)') and (row["home_score"] == 0 or row["away_score"] == 0):
        row["btts_weight"] += 0.02
    if row["selection_match_n"] == 'no match' and (row["selection_n"] == 'U 2.5' or row["selection_n"] == 'U 2.5 (untested)') and row["total_score"] > 2:
        row["u_2_5_weight"] += 0.02
    if row["selection_match_n"] == 'no match' and (row["selection_n"] == "DNB" or row["selection_n"] == "DNB (untested)") and row["predicted_result"] != 'draw' and (row["result_match"] != 'no match' or row["result"] != 'draw'):
        row["dnb_weight"] += 0.02
    if row["selection_match_n"] == 'no match' and (row["selection_n"] == 'U 4.5' or row["selection_n"] == 'U 4.5 (untested)') and row["total_score"] > 4:
        row["u_4_5_weight"] += 0.02
    if row["selection_match_n"] == 'no match' and (row["selection_n"] == 'O 1.5 and U 4.5' or row["selection_n"] == 'O 1.5 and U 4.5 (untested)') and (row["total_score"] < 2 or row["total_score"] > 4):
        row["o_1_5_weight"] += 0.02
        row["u_4_5_weight"] += 0.02
    if row["selection_match_n"] == 'no match' and (row["selection_n"] == 'U 2.5 & BTTS' or row["selection_n"] == 'U 2.5 & BTTS (untested)') and (row["home_score"] == 0 or row["away_score"] == 0 or row["total_score"] > 2):
        row["btts_u_2_5_weight"] += 0.0
    return row
I am trying various approaches to improve performance. Note that I am unable to use Modin: I work in PyCharm and get initialisation errors with both Ray and Dask, so I cannot exploit multicore processing.
timeit.timeit(lambda: df.apply(selection_n, axis=1), number=10)
# -> 173.67167650000192 (seconds, total for 10 runs)
timeit.timeit(lambda: df.apply(selection_match_n, axis=1), number=10)
# -> 112.6237928000046
timeit.timeit(lambda: df.apply(selection_update_n, axis=1), number=10)
# -> 160.64576310000848
If you want to know what I am trying to accomplish here: the selections in the selection_n column are derived from the _weight columns. Each selection is then checked for a match, and for every "no match" row the weights are updated and the row is checked again. This loop continues until all the "no match" entries have been confirmed as "match".
Since this DataFrame is a big one, the loop is very time-consuming (last time, it took 4 days to complete).
Based on the "law of increasing returns", I have tried the following approach, which tends to work better: the number of no-match rows shrinks every iteration, so .apply has fewer rows to process:
import time

import pandas as pd
import swifter  # importing swifter registers the .swifter accessor on DataFrames

loop_counter = 0
while (df["selection_match_n"] == "no match").any():
    start_time = time.time()
    loop_counter += 1
    print(f"Iteration: {loop_counter}")
    df['selection_n'] = df.swifter.apply(selection_n, axis=1)
    # Splitting the DataFrame (.copy() avoids SettingWithCopyWarning on the writes below)
    no_match_rows = df[df['selection_match_n'] == 'no match'].copy()
    other_rows = df[df['selection_match_n'] != 'no match']
    # Process the no_match_rows DataFrame
    no_match_rows['selection_n'] = no_match_rows.swifter.apply(selection_n, axis=1)
    no_match_rows = no_match_rows.swifter.apply(selection_update_n, axis=1)
    no_match_rows['selection_match_n'] = no_match_rows.swifter.apply(selection_match_n, axis=1)
    print('Count of Selection: no_match rows:', (no_match_rows["selection_match_n"] == "no match").sum())
    # Concatenate the modified no_match_rows back with other_rows
    df = pd.concat([other_rows, no_match_rows])
I have tried swifter, which does not help much.
I am wondering what the best way is to improve the performance of these functions. I think vectorisation may be the best option, possibly coupled with Cython (if that's possible).
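One way to avoid the split-and-concat step in the loop above is to update only the no-match rows in place through a boolean mask. This is a minimal sketch, assuming a hypothetical `compute_selection` callable (not from the question) that takes a subset of rows and returns a Series on the same index; the toy rule below stands in for the real selection logic and is for illustration only.

```python
import pandas as pd


def refresh_no_match_rows(df: pd.DataFrame, compute_selection) -> pd.DataFrame:
    """Recompute 'selection_n' only for rows still marked 'no match'.

    `compute_selection` is a hypothetical callable: it receives the subset of
    rows and returns a Series of selections on the same index.
    """
    mask = df["selection_match_n"] == "no match"
    # Write back through .loc: no split, no concat, original row order kept.
    df.loc[mask, "selection_n"] = compute_selection(df.loc[mask])
    return df


def toy_selection(part: pd.DataFrame) -> pd.Series:
    # Toy rule for illustration only, not the question's selection logic.
    return part["total_score"].gt(2).map({True: "O 2.5", False: "U 2.5"})


df = pd.DataFrame({
    "selection_match_n": ["match", "no match", "no match"],
    "selection_n": ["W", "O 2.5", "U 2.5"],
    "total_score": [3, 1, 4],
})
refresh_no_match_rows(df, toy_selection)
print(df["selection_n"].tolist())  # ['W', 'U 2.5', 'O 2.5']
```

Because the mask is recomputed on each call, the amount of work still shrinks as rows move to "match", which is the effect the split-and-concat version is after.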
Welcome to Code Review! The current question title, which states your concerns about the code, is too general to be useful here. Please edit to the site standard, which is for the title to simply state the task accomplished by the code. Please see How to get the best value out of Code Review: Asking Questions for guidance on writing good question titles. (Comment by Toby Speight, Aug 14, 2023 at 7:03)
2 Answers
I find the code hard to read.
This may start with my having no clue what it is supposed to achieve when I start reading:
document your code. In the code.
I don't like

if ‹condition›:
    ...
    return ‹whatever›
else:
    ‹first statement when not ‹condition››
    ...

Just drop the else.
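For instance, with two hypothetical functions (not from the question), the version without `else` behaves identically, because the `return` already ends the first branch:

```python
def label_with_else(total_score: int) -> str:
    if total_score > 2:
        return "O 2.5"
    else:
        return "U 2.5"


def label_without_else(total_score: int) -> str:
    if total_score > 2:
        return "O 2.5"
    # Only reached when total_score <= 2, so the 'else' adds nothing.
    return "U 2.5"


# Both versions agree on every input.
assert all(label_with_else(s) == label_without_else(s) for s in range(10))
```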
I see a lot of repeated accesses to row, improving neither readability nor speed.
Try introducing variables.
The "untested" return branches seem to be controlled by one more condition than the paired return without " (untested)":
check the common condition first (and only once).
def selection_n(row):
    win_weight = row["win_weight"]
    win_weight_1_1 = win_weight == 1.1
    btts_o_2_5_weight = row["btts_o_2_5_weight"]
    btts_o_2_5_weight_0_4 = btts_o_2_5_weight == 0.4
    predicted_score_difference = row["predicted_score_difference"]
    predicted_score_difference_greater = predicted_score_difference > win_weight
    predicted_bttso2_5_n_greater = row["predicted_bttso2.5_n"] > btts_o_2_5_weight
    o_2_5_weight = row["o_2_5_weight"]
    predicted_o2_5_n_greater = row["predicted_o2.5_n"] > o_2_5_weight
    o_2_5_weight_0_4 = o_2_5_weight == 0.4
    o_1_5_weight = row["o_1_5_weight"]
    predicted_total_score_greater = row["predicted_total_score"] > o_1_5_weight
    btts_weight = row["btts_weight"]
    btts_weight_0_4 = btts_weight == 0.4
    u_4_5_weight_0_4 = row["u_4_5_weight"] == 0.4
    predicted_btts_n_greater = row["predicted_btts_n"] > btts_weight
    predicted_u2_5_n_greater = row["predicted_u2.5_n"] > row["u_2_5_weight"]
    predicted_u4_greater = row["predicted_u4"] > row["u_4_5_weight"]
    if predicted_score_difference_greater:
        if predicted_bttso2_5_n_greater:
            if win_weight_1_1 or btts_o_2_5_weight_0_4:
                return "W & BTTS O 2.5 (untested)"
            return "W & BTTS O 2.5"
        if predicted_btts_n_greater:
            if win_weight_1_1 or btts_weight_0_4:
                return "W & BTTS (untested)"
            return "W & BTTS"
        if predicted_o2_5_n_greater:
            if win_weight_1_1 or o_2_5_weight_0_4:
                return "W & O 2.5 (untested)"
            return "W & O 2.5"
        if predicted_total_score_greater:
            if win_weight_1_1 or o_1_5_weight == 3.2:
                return "W & O 1.5 (untested)"
            return "W & O 1.5"
        if predicted_u2_5_n_greater:
            if win_weight_1_1 or row["u_2_5_weight"] == 0.4:
                return "W & U 2.5 (untested)"
            return "W & U 2.5"
        if predicted_u4_greater:
            if win_weight_1_1 or u_4_5_weight_0_4:
                return "W & U 4.5 (untested)"
            return "W & U 4.5"
        if win_weight_1_1:
            return "W (untested)"
        return "W"
    # predicted_score_difference <= win_weight
    if predicted_o2_5_n_greater:
        if o_2_5_weight_0_4:
            return "O 2.5 (untested)"
        return "O 2.5"
    if predicted_bttso2_5_n_greater:
        if btts_o_2_5_weight_0_4:
            return "BTTS O 2.5 (untested)"
        return "BTTS O 2.5"
    if predicted_btts_n_greater:
        if btts_weight_0_4:
            return "BTTS (untested)"
        return "BTTS"
    if predicted_u2_5_n_greater:
        if row["u_2_5_weight"] == 0.4:
            return "U 2.5 (untested)"
        return "U 2.5"
    if row["dnb_weight"] < predicted_score_difference < win_weight:
        if row["dnb_weight"] == 0.7:
            return "DNB (untested)"
        return "DNB"
    if predicted_u4_greater:
        if u_4_5_weight_0_4:
            return "U 4.5 (untested)"
        return "U 4.5"
    if predicted_total_score_greater and predicted_u4_greater:
        if o_1_5_weight == 0.4 or u_4_5_weight_0_4:
            return "O 1.5 and U 4.5 (untested)"
        return "O 1.5 and U 4.5"
    if row["predicted_bttsu2.5_n"] > row["btts_u_2_5_weight"]:
        if row["btts_u_2_5_weight"] == 0.4:
            return "U 2.5 & BTTS (untested)"
        return "U 2.5 & BTTS"
    return None  # no selection applies
Potential bugs in selection_n():
- in the question, the last two conditions are exactly the same; I took the liberty to guess the intention
- if none of the conditions match, it returns None
A helper function reduces repetition and bulk:
def or_untested(selection, literal):
    """Return whether selection equals literal or literal + " (untested)"."""
    return selection == literal or selection == literal + " (untested)"
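Here is the same helper applied to single values, next to a hypothetical column-wise counterpart built on pandas `Series.isin`, which could serve as a building block once the logic is vectorised. `or_untested_series` is an illustrative name, not part of either post.

```python
import pandas as pd


def or_untested(selection, literal):
    """Return whether selection equals literal or literal + " (untested)"."""
    return selection == literal or selection == literal + " (untested)"


def or_untested_series(selections: pd.Series, literal: str) -> pd.Series:
    """Hypothetical column-wise counterpart: one boolean per row."""
    return selections.isin([literal, literal + " (untested)"])


selections = pd.Series(["W", "W (untested)", "W & BTTS", None])
print([or_untested(s, "W") for s in selections])      # [True, True, False, False]
print(or_untested_series(selections, "W").tolist())  # [True, True, False, False]
```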
selection_match_n() seems to contain a lot of checks subsumed by later, less narrow constraints:
def selection_match_n(row):
    if pd.isna(row["home_score"]) or pd.isna(row["away_score"]):
        return "no_result"
    selection = row["selection_n"]
    if pd.isnull(selection):
        return "no sel."
    match = row["result_match"] == 'match'
    predicted_draw = row["predicted_result"] == 'draw'
    total_score = row["total_score"]
    if (match and not predicted_draw
            and (selection == 'W' or selection.startswith("W "))):
        # if row["home_score"] > 0
        #         and row["away_score"] > 0
        #         and or_untested(selection, 'W & BTTS'):
        ##     if total_score > 2
        ##             and or_untested(selection, 'W & BTTS O 2.5'):
        ##         return "match"
        #     return "match"
        # if total_score > 1
        #     if (total_score > 2
        #             and or_untested(selection, 'W & O 2.5')):
        #         return "match"
        #     if or_untested(selection, 'W & O 1.5'):
        #         return "match"
        # if total_score < 5
        #     if (total_score < 3
        #             and or_untested(selection, 'W & U 2.5')):
        #         return "match"
        #     if or_untested(selection, "W & U 4.5"):
        #         return "match"
        return "match"
    if total_score > 2 and or_untested(selection, 'O 2.5'):
        return "match"
    both_scored = row["home_score"] > 0 and row["away_score"] > 0
    if both_scored:
        # if total_score > 2 and or_untested(selection, 'BTTS O 2.5'):
        #     return "match"
        if or_untested(selection, 'BTTS'):
            return "match"
    if total_score < 3 and or_untested(selection, 'U 2.5'):
        return "match"
    if ((match or row["result"] == 'draw' or predicted_draw)
            and or_untested(selection, "DNB")):
        return "match"
    if total_score < 5:
        if or_untested(selection, 'U 4.5'):
            return "match"
        if 1 < total_score and or_untested(selection, 'O 1.5 and U 4.5'):
            return "match"
    if both_scored and total_score < 3 and or_untested(selection, 'U 2.5 & BTTS'):
        return "match"
    return "no match"
selection_update_n():
- row is not modified unless row["selection_match_n"] == 'no match': return upfront if it is !=.
- if row["total_score"] > 2: ... elif row["total_score"] < 3: is weird: v <= 2 implies v < 3, so the elif condition is always true when reached.
- if row["selection_match_n"] == 'no match' and (row["selection_n"] == "W" or row["selection_n"] == "W (untested)") and row["result_match"] == 'no match' and row["predicted_result"] != 'draw': row["win_weight"] += 0.02 seems to be duplicated: delete one, and change to += 0.04 if appropriate.
def selection_update_n(row):
    if row["selection_match_n"] != 'no match':
        return row
    selection = row["selection_n"]
    match = row["result_match"] == 'match'
    no_match = row["result_match"] == 'no match'
    ne_draw = row["predicted_result"] != 'draw'
    no_match_ne_draw = no_match and ne_draw
    one_score_zero = row["home_score"] == 0 or row["away_score"] == 0
    total_score = row["total_score"]
    if selection[0] == 'W':
        if or_untested(selection, 'W & BTTS O 2.5'):
            if no_match and ne_draw:
                row["win_weight"] += 0.02
            elif one_score_zero:
                # and total_score < 3:
                row["btts_o_2_5_weight"] += 0.02
            # elif one_score_zero:
            #     row["btts_o_2_5_weight"] += 0.02
        if or_untested(selection, 'W & BTTS') and ne_draw:
            if no_match and row["home_score"] >= 0 and row["away_score"] >= 0:
                row["win_weight"] += 0.02
            if one_score_zero:
                row["btts_weight"] += 0.02
        if or_untested(selection, 'W & O 2.5') and no_match_ne_draw:
            if total_score > 2:
                row["win_weight"] += 0.02
            elif total_score < 3:  #?!
                row["win_weight"] += 0.02
                row["o_2_5_weight"] += 0.02
        elif (or_untested(selection, 'W & O 2.5')
                and match and ne_draw and total_score < 3):
            row["o_2_5_weight"] += 0.02
        if (or_untested(selection, 'W & O 1.5') and no_match_ne_draw):
            row["win_weight"] += 0.02
            if total_score <= 1:
                row["o_1_5_weight"] += 0.02
        elif (or_untested(selection, 'W & O 1.5') and match
                and ne_draw and total_score < 2):
            row["o_1_5_weight"] += 0.02
        if (or_untested(selection, 'W & U 2.5') and no_match_ne_draw):
            row["win_weight"] += 0.02
            if total_score >= 3:
                row["u_2_5_weight"] += 0.02
        elif (or_untested(selection, 'W & U 2.5') and match
                and ne_draw and total_score > 2):
            row["u_2_5_weight"] += 0.02
        if or_untested(selection, "W & U 4.5"):
            if no_match_ne_draw:
                row["win_weight"] += 0.02
                if total_score >= 5:
                    row["u_4_5_weight"] += 0.02
            elif match and ne_draw and total_score > 4:
                row["u_4_5_weight"] += 0.02
        if or_untested(selection, "W") and no_match_ne_draw:
            row["win_weight"] += 0.02
    elif selection.startswith('BTTS'):
        if or_untested(selection, 'BTTS') and one_score_zero:
            row["btts_weight"] += 0.02
        elif (or_untested(selection, 'BTTS O 2.5')
                and (one_score_zero or total_score < 3)):
            row["btts_o_2_5_weight"] += 0.02
    elif selection.startswith('U '):
        if selection.startswith('U 2.5'):
            if (or_untested(selection, 'U 2.5 & BTTS')
                    and (one_score_zero or total_score > 2)):
                row["btts_u_2_5_weight"] += 0.0
            if or_untested(selection, 'U 2.5') and total_score > 2:
                row["u_2_5_weight"] += 0.02
        if or_untested(selection, 'U 4.5') and total_score > 4:
            row["u_4_5_weight"] += 0.02
    elif selection.startswith('O '):
        if or_untested(selection, 'O 2.5') and total_score < 3:
            row["o_2_5_weight"] += 0.02
        elif (or_untested(selection, 'O 1.5 and U 4.5')
                and (total_score < 2 or 4 < total_score)):
            row["o_1_5_weight"] += 0.02
            row["u_4_5_weight"] += 0.02
    elif (or_untested(selection, "DNB") and ne_draw
            and (row["result_match"] != 'no match' or row["result"] != 'draw')):
        row["dnb_weight"] += 0.02
    return row
Above, I went to some length in reducing bulk to improve readability.
I still don't see rhythm or rhyme.
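The row-wise increments in selection_update_n() also map onto column-wise masked updates. Below is a minimal sketch covering only the two simplest rules (a missed O 2.5 bumps o_2_5_weight, a missed U 2.5 bumps u_2_5_weight), assuming the column names from the question; `update_simple_weights` and `STEP` are hypothetical names, and the remaining rules would follow the same pattern.

```python
import pandas as pd

STEP = 0.02  # the weight increment used throughout the question's code


def update_simple_weights(df: pd.DataFrame) -> pd.DataFrame:
    """Vectorised sketch of two rules from selection_update_n():
    a missed O 2.5 bumps o_2_5_weight, a missed U 2.5 bumps u_2_5_weight."""
    no_match = df["selection_match_n"] == "no match"
    sel = df["selection_n"]
    o25_miss = no_match & sel.isin(["O 2.5", "O 2.5 (untested)"]) & (df["total_score"] < 3)
    u25_miss = no_match & sel.isin(["U 2.5", "U 2.5 (untested)"]) & (df["total_score"] > 2)
    # One masked write per rule instead of one Python call per row.
    df.loc[o25_miss, "o_2_5_weight"] += STEP
    df.loc[u25_miss, "u_2_5_weight"] += STEP
    return df


df = pd.DataFrame({
    "selection_match_n": ["no match", "no match", "match"],
    "selection_n": ["O 2.5", "U 2.5 (untested)", "O 2.5"],
    "total_score": [1, 4, 1],
    "o_2_5_weight": [0.4, 0.4, 0.4],
    "u_2_5_weight": [0.4, 0.4, 0.4],
})
update_simple_weights(df)
print(df[["o_2_5_weight", "u_2_5_weight"]].round(2).values.tolist())
# [[0.42, 0.4], [0.4, 0.42], [0.4, 0.4]]
```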
(Such refactoring should be done in the presence of a test scaffold and using an environment supporting refactorings such as extract variable.) (Comment by greybeard, Aug 14, 2023 at 8:32)
Do everything that @greybeard says, and then vectorise. You have a selection_n function that is the subject of an apply; the apply needs to go away, and the logic in selection_n needs to be pulled out one dimension. I'm certainly not going to demonstrate how to do this in full because your logic is quite long, but as one example let's look at df.apply(selection_n, axis=1) with the very first condition:
if (row["win_weight"] == 1.1 or row["btts_o_2_5_weight"] == 0.4) and row["predicted_score_difference"] > row["win_weight"] and row["predicted_bttso2.5_n"] > row["btts_o_2_5_weight"]:
    return "W & BTTS O 2.5 (untested)"
That becomes:
df.loc[
    (
        (df['win_weight'] == 1.1) | (df['btts_o_2_5_weight'] == 0.4)
    )
    & (
        df['predicted_score_difference'] > df['win_weight']
    )
    & (
        df['predicted_bttso2.5_n'] > df['btts_o_2_5_weight']
    ),
    'selection_n',
] = 'W & BTTS O 2.5 (untested)'
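Rewritten like this, your code should run considerably faster, because each comparison is evaluated once over whole columns instead of once per row in Python.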
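Assigning each condition with df.loc as above needs care with ordering: selection_n() returns at the first condition that holds, whereas successive df.loc assignments let the last matching one win. One way to keep first-match-wins behaviour is numpy.select, which takes, per row, the choice of the first true condition. A partial sketch covering only the first four branches of selection_n(); the function name `selection_n_vectorised` is hypothetical.

```python
import numpy as np
import pandas as pd


def selection_n_vectorised(df: pd.DataFrame) -> pd.Series:
    """Partial sketch: only the first four branches of selection_n()."""
    w = df["predicted_score_difference"] > df["win_weight"]
    win_untested = df["win_weight"] == 1.1
    bttso = df["predicted_bttso2.5_n"] > df["btts_o_2_5_weight"]
    btts = df["predicted_btts_n"] > df["btts_weight"]
    conditions = [
        (win_untested | (df["btts_o_2_5_weight"] == 0.4)) & w & bttso,
        w & bttso,
        (win_untested | (df["btts_weight"] == 0.4)) & w & btts,
        w & btts,
    ]
    choices = [
        "W & BTTS O 2.5 (untested)",
        "W & BTTS O 2.5",
        "W & BTTS (untested)",
        "W & BTTS",
    ]
    # np.select takes, per row, the choice of the FIRST true condition, which
    # mirrors the if/elif order; rows with no true condition get `default`
    # (None here, like the row-wise function falling off its end).
    return pd.Series(np.select(conditions, choices, default=None), index=df.index)


df = pd.DataFrame({
    "win_weight": [1.1, 1.2, 1.1],
    "predicted_score_difference": [2.0, 2.0, 0.5],
    "btts_o_2_5_weight": [0.4, 0.5, 0.4],
    "predicted_bttso2.5_n": [0.6, 0.1, 0.9],
    "btts_weight": [0.4, 0.5, 0.4],
    "predicted_btts_n": [0.1, 0.9, 0.9],
})
print(selection_n_vectorised(df).tolist())
# ['W & BTTS O 2.5 (untested)', 'W & BTTS', None]
```

The remaining branches would slot into the two lists in the same order as the if statements in selection_n().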
Can you please provide an example of an if-else combination? Since the function is an if-else chain, recreating the rest from the example will be relatively simple. (Comment by PyNoob, Aug 21, 2023 at 3:09)
@PyNoob Since an if-else implies two different conditions, there would be two different assignments. The first would evaluate condition a and the second would evaluate the inverse, condition ~a. (Comment by Reinderien, Aug 23, 2023 at 18:30)
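Applied to a simple hypothetical if/else (label "O 2.5" when total_score > 2, otherwise "U 2.5"), the two assignments with a and ~a look like this; numpy.where expresses the same two-way choice in one call:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"total_score": [1, 3, 2, 5]})

# Row-wise:  if total_score > 2: "O 2.5"  else: "U 2.5"
a = df["total_score"] > 2           # condition a, one boolean per row
df.loc[a, "label"] = "O 2.5"        # the 'if' branch
df.loc[~a, "label"] = "U 2.5"       # the 'else' branch: inverse condition ~a

# Equivalent single call for a two-way choice.
df["label_where"] = np.where(a, "O 2.5", "U 2.5")

print(df["label"].tolist())  # ['U 2.5', 'O 2.5', 'U 2.5', 'O 2.5']
```

For an if/elif/else chain, the elif branch becomes ~a & b, the else branch ~a & ~b, and so on.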