From my previous post, I tried to take into account all of the nice answers made by Kate, Chris, J_H, Booboo and mudskipper
Changes made
In short, I attempted to apply all of the following:
- API key handling
- Better URL construction
- Input cleaning improvements
- Category detection rewrite
- More functions
- Article formatting cleanup
- Added error handling
- Replaced hard-coded strings with constants
- PEP8 & Style corrections
- Added a
__main__entry point - Simplified logic flow
API key changes
I noticed not long ago that NewsAPI prefers to keep all API keys for private use only. You may get yours here. I am in no way promoting them.
Code
Here is the updated Python code:
"""
This script fetches the most recent news
headline from the NewsAPI service based
on a natural-language query given by the
user (e.g., "latest technology news",
"what’s happening in sports today", etc.).
"""
import os
import re
import requests
import json5
# Constants / configuration
API_URL = "https://newsapi.org/v2/top-headlines"
CATEGORIES = ("business", "entertainment", "general", "health", "science", "sports", "technology")
STOP_WORDS = [
"news", "new", "for", "concerning", "can", "you", "give", "me",
"the", "most", "recent", "in", "category", "currently", "current",
"now", "what", "whats", "is", "about", "today"
]
def clean_query(text: str) -> str:
"""
Clean the input text by removing stop words, then detect if
any category is mentioned. Return query parameter as string.
"""
text = text.lower()
# remove stop words with word boundaries
pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in STOP_WORDS) + r")\b", flags=re.IGNORECASE)
text = pattern.sub("", text).strip()
# detect categories
for cat in CATEGORIES:
if re.search(rf"\b{re.escape(cat)}\b", text, flags=re.IGNORECASE):
return {"category": cat}
# if nothing remains, default to general
if not text:
return {"category": "general"}
# else treat as search query
return {"q": text}
def fetch_top_headline(params: dict, api_key: str) -> dict:
"""
Fetches the top headline from NewsAPI with the given params.
Raises on HTTP error.
"""
params = params.copy()
params.update({
"sortBy": "publishedAt",
"pageSize": 1,
})
# Prefer putting API key in header for security
headers = {"X-Api-Key": api_key}
response = requests.get(API_URL, params=params, headers=headers)
response.raise_for_status()
return response.json()
def format_article(article: dict) -> str:
"""
Given a single article dict, extract its fields and return a nice string.
"""
source_name = article.get("source", {}).get("name", "")
author = article.get("author")
title = article.get("title", "")
desc = article.get("description") or ""
# clean up title
title = title.replace(source_name, "").replace("\n", " ")
title = title.replace(" - ", " ").replace("LISTEN | ", "")
# clean up description (if HTML-like)
if "<p>" in desc:
desc = desc.split("<p>", 1)[1]
# here you might want to use BeautifulSoup instead of manual replace
desc = desc.replace(""", '"')
# author formatting
if author and author.lower() != source_name.lower():
# clean commas
cleaned = author.replace(",", " and ")
author_part = f" by {cleaned}"
else:
author_part = ""
# source formatting
source_part = f" on {source_name}" if source_name else ""
return f"Here is the recently published news I found{source_part}{author_part}:\n" \
f"The title is {title}...\n{desc}"
def news_concerning(text: str) -> str:
"""
Main function that takes user text, queries NewsAPI, and returns a formatted result.
"""
api_key = os.getenv("NEWS_API_KEY")
if not api_key:
raise RuntimeError("API key not found. Please set NEWS_API_KEY in your environment.")
params = clean_query(text)
data = fetch_top_headline(params, api_key)
articles = data.get("articles", [])
if not articles:
# Nothing found
query_term = params.get("q") or params.get("category")
return f"I couldn’t find anything about {query_term}. Could you try a different query?"
first = articles[0]
try:
return format_article(first)
except Exception as e:
# Fallback error message
return "Sorry, I couldn’t format the article properly."
if __name__ == "__main__":
user_input = input("What news would you like to hear about? ")
print(news_concerning(user_input))
To run the script, enter the following:
NEWS_API_KEY="your_api_key_here" python3 NewsAPI.py
Critique request
Please, tell me anything that comes to mind.
3 Answers 3
Reinventing the wheel
The following loop simply reinvents next.
for cat in CATEGORIES: if re.search(rf"\b{re.escape(cat)}\b", text, flags=re.IGNORECASE): return {"category": cat}
Might be written:
cat = next(
(cat
for cat in CATEGORIES
if re.search(rf"\b{re.escape(cat)}\b", text, flags=re.I)),
None
)
if cat:
return {"category": cat}
Or:
categories_in_text = (
cat
for cat in CATEGORIES
if re.search(rf"\b{re.escape(cat)}\b", text, flags=re.I)
)
if cat := next(categories_in_text, None):
return {"category": cat}
Inconsistent chaining and DRY
In the following it's odd that you've chosen to chain the calls together the way you have.
# clean up title title = title.replace(source_name, "").replace("\n", " ") title = title.replace(" - ", " ").replace("LISTEN | ", "")
Consider whether you really want to define a dictionary with strings to replace and what to replace them with. This will let you extend this logic with additional replacements with very little additional effort.
E.g.
replacements = {
source_name: "",
"\n": "",
" - ": " ",
"LISTEN | ": ""
}
for orig, repl in replacements.items():
title = title.replace(orig, repl)
Exception handling
try: return format_article(first) except Exception as e: # Fallback error message return "Sorry, I couldn’t format the article properly."
This is very generic.
What exceptions can format_article actually raise?
Well, author could be None.
author = article.get("author")
But you've checked for that before doing anything with it that might raise an exception.
if author and author.lower() != source_name.lower():
Honestly, I can't figure out anything format_article itself can be doing to raise an exception.
As such, I think it would be more appropriate to simply let this unforeseen exception that's out of your control propagate out to the caller. Maybe in your calling code that's main guarded you choose to do something about this or maybe you just let the program exit with the exception, but generic Exception handlers should always raise an eyebrow.
-
3\$\begingroup\$ Agreed about the exception. You should only catch exceptions if you actually can do something about fixing the problem, and only those specific exceptions you can fix. If you can't do anything about the problem, then don't catch the exception; instead, let it bubble up the stack – you never know if there's someone else who knows how to fix it! (Even if it is just the OS cleaning up the memory and resources.) \$\endgroup\$Jörg W Mittag– Jörg W Mittag2025年11月26日 19:23:31 +00:00Commented Nov 26 at 19:23
-
\$\begingroup\$ Your idea of using
next()is really clean. Does it search for a specific category in order of the first found or in order of the ones listed? In other words, if I enter the query « sports business », which would be selected? \$\endgroup\$Chip01– Chip012025年11月26日 19:46:39 +00:00Commented Nov 26 at 19:46 -
\$\begingroup\$ @JörgWMittag By the way, I actually do not remember why the exception is there, and can assure you that it was not because an error occurred without it. \$\endgroup\$Chip01– Chip012025年11月26日 19:49:02 +00:00Commented Nov 26 at 19:49
-
\$\begingroup\$ @Chip01 because "business" occurs first in
CATEGORIESit'd be the one returned. A slightly more sophisticated search might find how many times each keyword occurs in the text and return the one that occurs most often. Which one occurs first would be a decent tie-breaker. \$\endgroup\$Chris– Chris2025年11月26日 19:57:30 +00:00Commented Nov 26 at 19:57 -
1\$\begingroup\$ To see the order, simply convert that generator expression to a list instead. \$\endgroup\$Chris– Chris2025年11月26日 19:58:26 +00:00Commented Nov 26 at 19:58
Type hints
I like the usage of type hints for your functions. However, the return hint looks wrong for:
def clean_query(text: str) -> str:
The function always seems to return a dict. For example:
return {"q": text}
Simpler
Since you only use the variable first in the news_concerning function once,
you could eliminate it:
return format_article(articles[0])
Comments
This comment can be deleted since it looks like a note you made to yourself during code development:
# here you might want to use BeautifulSoup instead of manual replace
Comments are intended to explain code behavior.
Unused import
According to
ruff
, the following import is unused:
import json5
Session
This may not a big deal since you are using requests at only one place, but using a session has some benefits. You can set all your headers at session level once for all, and then there is no need to repeat them when doing requests.get or post etc.
So this code:
headers = {"X-Api-Key": api_key}
response = requests.get(API_URL, params=params, headers=headers)
can be:
session = requests.session()
session.headers = headers
response = session.get(API_URL, params=params)
And you can still add additional parameters on a case by case basis if necessary, on top of what session will provide automatically for you.
This is useful when working with APIs for example, and you have plenty of functions connecting to different API endpoints.
Separation of concerns
The function news_concerning does at least three completely different things: check an environment variable, and then call a function, and then extract some data.
To begin with, you could move this code to a dedicated function, that checks environment variables on startup:
api_key = os.getenv("NEWS_API_KEY")
if not api_key:
raise RuntimeError("API key not found. Please set NEWS_API_KEY in your environment.")
For the moment, that's the only check you are doing but there may be more in the future. It is fine to have very small functions, each focused on a specific purpose.
However, it's only 3 lines currently. So it would be okay to have this inside the __main__ guard instead. This check should be performed as early as possible. If you refactor your code later (and I suspect you will), that check may not take place when it should, because the code is executed in a certain order and follows a certain flow.
If we look at your current code, the check is actually done almost immediately, because you call news_concerning very early:
if __name__ == "__main__":
user_input = input("What news would you like to hear about? ")
print(news_concerning(user_input))
But you should request user input only after making sure that the environment variable is set, so that the user does not enter data for nothing, since the application will stop with an error right away.
Future improvements
I would be tempted to use a config file for the stopwords. For example in YAML format. So that a user can maintain the list without touching the code, at the risk of breaking things.
Ditto for the categories.
I would also sort the keywords using Python, to have a better overview and reduce the chance of adding duplicates when editing that list. In fact, you can use the set function to process a list and remove duplicates and get a clean list.
-
\$\begingroup\$ I like the idea of moving stopwords/categories into a config file. Do you think YAML is the best choice here, or would JSON/TOML be more practical for portability? \$\endgroup\$Chip01– Chip012025年11月26日 19:51:53 +00:00Commented Nov 26 at 19:51
-
1\$\begingroup\$ @Chip01 I believe YAML with pyyaml is just fine for the config file. Also, you can structure your data hierarchically eg. by theme to make it even better organized. YAML would be more convenient than JSON for editing I think, but beware indentation. \$\endgroup\$Kate– Kate2025年11月26日 19:59:56 +00:00Commented Nov 26 at 19:59
-
1\$\begingroup\$ YAML is a bad format: you'd want to use a library like StrictYAML to work around that. TOML is a simpler format with support in the Python standard library, and this use-case doesn't motivate the complexity of YAML. Portability isn't much of a concern considering how many third-party libraries are already being brought in, but for ultra-portability you'd want JSON: the main reason to avoid it is that JSON is hard to edit manually: it's easy to get those braces unbalanced by mistake. (cc @Chip01) \$\endgroup\$wizzwizz4– wizzwizz42025年11月27日 10:13:13 +00:00Commented Nov 27 at 10:13