I am working on a basic, personal ML project for predicting weather. I first finished the work in a Jupyter Notebook, and now I am turning it into a Flask app.
I have just completed the code for my Flask app. It works fine on localhost, but I am not sure whether I have done everything the right way. Would you please review my code on GitHub to see if I am doing everything right?
https://github.com/SteveAustin583/weather-prediction-flask
Here is the code in my app.py file:
import os
from flask import Flask, request, jsonify, render_template
import joblib
import pandas as pd
import numpy as np

# Initialize Flask app
app = Flask(__name__)

# --- Load Model Artifacts ---
# Define paths to the model artifacts
# Ensure these paths are correct relative to app.py
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
MODEL_ARTIFACTS_DIR = os.path.join(BASE_DIR, 'model_artifacts')
MODEL_PATH = os.path.join(MODEL_ARTIFACTS_DIR, 'weather_prediction_model.joblib')
ENCODER_PATH = os.path.join(MODEL_ARTIFACTS_DIR, 'weather_label_encoder.joblib')
FEATURES_PATH = os.path.join(MODEL_ARTIFACTS_DIR, 'classifier_feature_names.joblib')

# Load the artifacts
try:
    model = joblib.load(MODEL_PATH)
    label_encoder = joblib.load(ENCODER_PATH)
    # These are the feature names the model was trained on (e.g., ['temp_min', 'temp_max', 'precipitation', 'wind'])
    expected_features = joblib.load(FEATURES_PATH)
    print(f"Model, Label Encoder, and Feature List loaded successfully from {MODEL_ARTIFACTS_DIR}")
    print(f"Expected features for prediction: {expected_features}")
except FileNotFoundError as e:
    print(f"Error loading model artifacts: {e}")
    print("Please ensure 'weather_prediction_model.joblib', 'weather_label_encoder.joblib', and 'classifier_feature_names.joblib' are in the 'model_artifacts' directory.")
    model = None
    label_encoder = None
    expected_features = None
except Exception as e:
    print(f"An unexpected error occurred during artifact loading: {e}")
    model = None
    label_encoder = None
    expected_features = None

# --- Flask Routes ---
@app.route('/', methods=['GET', 'POST'])
def index():
    prediction_text = None
    error_text = None
    input_data_for_template = {feature: "" for feature in expected_features} if expected_features else {}

    if request.method == 'POST':
        if not model or not label_encoder or not expected_features:
            error_text = "Model artifacts not loaded. Cannot make a prediction."
            return render_template('index.html', prediction_text=prediction_text, error_text=error_text, input_data=input_data_for_template)

        try:
            # Get data from form
            form_data = request.form.to_dict()
            input_data_for_template = form_data.copy()  # Store for re-populating form

            # Validate and prepare features
            feature_values = []
            missing_features = []
            type_errors = []

            for feature_name in expected_features:
                if feature_name not in form_data or form_data[feature_name] == '':
                    missing_features.append(feature_name)
                    continue
                try:
                    feature_values.append(float(form_data[feature_name]))
                except ValueError:
                    type_errors.append(f"Feature '{feature_name}' must be a number.")

            if missing_features:
                error_text = f"Missing input for: {', '.join(missing_features)}."
            elif type_errors:
                error_text = " ".join(type_errors)
            else:
                # Create DataFrame for prediction (model expects 2D array)
                input_df = pd.DataFrame([feature_values], columns=expected_features)
                # Make prediction
                encoded_prediction = model.predict(input_df)
                predicted_weather_category = label_encoder.inverse_transform(encoded_prediction)
                prediction_text = f"Predicted Weather: {predicted_weather_category[0]}"
        except Exception as e:
            error_text = f"Error during prediction: {str(e)}"
            print(f"Prediction error: {e}")

    return render_template('index.html', prediction_text=prediction_text, error_text=error_text, input_data=input_data_for_template, expected_features=expected_features)

@app.route('/api/predict', methods=['POST'])
def api_predict():
    if not model or not label_encoder or not expected_features:
        return jsonify({'error': 'Model artifacts not loaded. Cannot make a prediction.'}), 500
    try:
        data = request.get_json(force=True)  # Get data posted as JSON

        # Validate and prepare features
        feature_values = []
        missing_features = []
        type_errors = []

        for feature_name in expected_features:
            if feature_name not in data:
                missing_features.append(feature_name)
                continue
            try:
                feature_values.append(float(data[feature_name]))
            except (TypeError, ValueError):  # Handles if value is not a number or None
                type_errors.append(f"Feature '{feature_name}' must be a valid number.")

        if missing_features:
            return jsonify({'error': f"Missing features in JSON payload: {', '.join(missing_features)}"}), 400
        if type_errors:
            return jsonify({'error': " ".join(type_errors)}), 400

        # Create DataFrame for prediction
        input_df = pd.DataFrame([feature_values], columns=expected_features)
        # Make prediction
        encoded_prediction = model.predict(input_df)
        predicted_weather_category = label_encoder.inverse_transform(encoded_prediction)
        return jsonify({'predicted_weather': predicted_weather_category[0]})
    except Exception as e:
        print(f"API Prediction error: {e}")
        return jsonify({'error': f'Error processing request: {str(e)}'}), 500

# --- Run the App ---
if __name__ == '__main__':
    # Check if artifacts loaded correctly before trying to run
    if model and label_encoder and expected_features:
        app.run(debug=True, port=5000)  # Runs on http://127.0.0.1:5000/
    else:
        print("Application cannot start due to missing model artifacts. Please check the 'model_artifacts' directory and error messages.")
You can find my notebook right here:
1 Answer
High-level advice: helper functions are your friends; use lots of little ones.
defer execution
This runs at import time:

model = joblib.load(MODEL_PATH)

It's not the end of the world, but it would be preferable to wait until run time. That is, protect it within the __name__ == '__main__' guard.

Usually we want it to be "easy" for a unit test module to import the target code, such as helper functions, and we want that import to complete "quickly". So actions that could fail or take a while, like network connections, should usually happen within the __main__ guard rather than running at import time.

Unpickling a complex data file can fail in more than one way, and it may run arbitrary Python code. And when onboarding a new engineer to the project, you want to make it easy for them to see pytest succeed, without worrying about creating large data artifacts.
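A minimal sketch of that deferral, keeping your existing module-level globals so the routes can still see them (the main name is my own choice, not anything from the repo):

# Module level: declare the names, but do not load anything yet.
model = label_encoder = expected_features = None

def main():
    global model, label_encoder, expected_features
    model = joblib.load(MODEL_PATH)
    label_encoder = joblib.load(ENCODER_PATH)
    expected_features = joblib.load(FEATURES_PATH)
    app.run(debug=True, port=5000)

if __name__ == '__main__':
    main()

With that in place, a test module can import app.py for its helpers without paying the unpickling cost, and without the .joblib files even existing.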
comment suggests helper name
This is a nice enough comment:

# Load the artifacts

But it suggests that def _load_artifacts(): would be appropriate here.

Assigning to module-level model, features, & encoder globals works well enough, but it might be convenient to make them attributes of some @dataclass.

Looking at the slightly long list of filenames that appears in that nice diagnostic message, perhaps we'd like them in a single list that can be used both during loading and during error reporting.
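A sketch of how those ideas might fit together; the Artifacts and _load_artifacts names are illustrative, not anything that exists in the repo:

from dataclasses import dataclass
from typing import Any

ARTIFACT_FILES = [
    'weather_prediction_model.joblib',
    'weather_label_encoder.joblib',
    'classifier_feature_names.joblib',
]

@dataclass
class Artifacts:
    model: Any
    label_encoder: Any
    expected_features: list

def _load_artifacts() -> Artifacts:
    # The same filename list drives both loading and any "missing file" message.
    loaded = [joblib.load(os.path.join(MODEL_ARTIFACTS_DIR, name)) for name in ARTIFACT_FILES]
    return Artifacts(*loaded)

The routes would then read attributes of a single artifacts object instead of three separate globals.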
The "unexpected error" message is not very diagnostic, and seems to add little value. Consider removing it entirely, and letting the default behavior deal with it. Then you'll get a helpful stack trace with line numbers, instead of the current one-line report.
pathlib
Using the ancient os.path API works well enough, but prefer the modern pathlib.Path API, as it is more concise.
MODEL_ARTIFACTS_DIR = os.path.join(BASE_DIR, 'model_artifacts')
becomes
MODEL_ARTIFACTS_DIR = BASE_DIR / 'model_artifacts'
It also allows for improved type hints.
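For example, the path setup at the top of app.py could look like this (a sketch; joblib.load accepts path-like objects, so the rest of the code shouldn't need to change):

from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent
MODEL_ARTIFACTS_DIR = BASE_DIR / 'model_artifacts'
MODEL_PATH = MODEL_ARTIFACTS_DIR / 'weather_prediction_model.joblib'
ENCODER_PATH = MODEL_ARTIFACTS_DIR / 'weather_label_encoder.joblib'
FEATURES_PATH = MODEL_ARTIFACTS_DIR / 'classifier_feature_names.joblib'

A helper can then advertise what it expects, e.g. def _load_artifacts(artifacts_dir: Path) -> Artifacts:.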
helpers
Your def index(): is a little on the long side, partly because the POST behavior isn't broken out as a helper, and partly due to nesting try / except within try / except.

Consider using an approach like pydantic for data validation (see the sketch below).

Consider simplifying the error checking and just letting the client see "500 server error" for a malformed request. If a float(None) conversion is attempted, maybe the client gets what it deserves. As long as you can read good logs on the server of what the client sent, it shouldn't be too hard to figure out the details and repair the buggy client code.
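A sketch of what pydantic validation could look like here, using the example feature names from the code comment (temp_min, temp_max, precipitation, wind); the real field names should come from whatever the model was actually trained on:

from pydantic import BaseModel, ValidationError

class WeatherFeatures(BaseModel):
    temp_min: float
    temp_max: float
    precipitation: float
    wind: float

def parse_features(payload: dict) -> WeatherFeatures:
    # Raises ValidationError listing every missing or non-numeric field at once.
    return WeatherFeatures(**payload)

A single except ValidationError clause then replaces the hand-rolled missing_features and type_errors bookkeeping, and its message can be returned to the client with a 400.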
A similar tradeoff is seen here; it happens that I really like the OP code as written.
def api_predict():
    if not model or not label_encoder or not expected_features:
        return jsonify({'error': 'Model artifacts not loaded. Cannot make a prediction.'}), 500
It's a sure thing that the model.predict() call or the label_encoder.inverse_transform() call is going to blow up if we're doing None.predict(), so we will get the 500 one way or another. You seem to have a requirement here that the client shall see a helpful diagnostic error, rather than an obscure

AttributeError: 'NoneType' object has no attribute 'predict'

I think that's a good thing; keep the code.

An even better approach would be to ensure that server startup fails if the model didn't load. Then Flask isn't even listening on a TCP port, and the client sees "connection refused" instead of a 500. The server operator (or an automated network health monitor) would be expected to notice the startup trouble and dispatch someone to diagnose and repair it.
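One way to fail fast, staying close to your current __main__ block (a sketch; raising SystemExit is just one way to produce a non-zero exit):

if __name__ == '__main__':
    if not (model and label_encoder and expected_features):
        raise SystemExit("Model artifacts failed to load; refusing to start the server.")
    app.run(debug=True, port=5000)

Under a production server such as gunicorn, simply letting the loading exception propagate at startup has a similar effect: the worker never comes up.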
f-string already calls str()
There's no difference in how these two formatted string expressions render e, because an f-string already calls str() on the interpolated value:

error_text = f"Error during prediction: {str(e)}"
print(f"Prediction error: {e}")

Simplify the first one, eliminating the redundant str() call.
Comment from Steve Austin (Jun 15, 2025): I have implemented your feedback and have created a new question for reviewing the implementation. Would you please take a look at it during your free time? codereview.stackexchange.com/questions/297385/…