I'm used to running pre-commit autoupdate regularly to update the versions of the linters/formatters that I use, especially when there's some error.
For example, a couple of months ago, there was a problem with ansible-lint. There are separate ansible-lint, ansible and ansible-core packages, and one of them needed an upgrade. I'd get an error like this:
ModuleNotFoundError: No module named 'ansible.parsing.yaml.constructor'
The solution: pre-commit autoupdate, which grabbed a new ansible-lint version that solved the problem. Upgrading is good.
But... a little over a month ago, ansible-lint pinned python to 3.13 in its pre-commit hook. So when you update, you suddenly need to have 3.13 on your machine. I have that locally, but on the often-used "ubuntu latest" (24.04) github action runner, only 3.12 is installed by default. Then you'd get this:
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/astral-sh/ruff-pre-commit.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/ansible-community/ansible-lint.git.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
An unexpected error has occurred: CalledProcessError: command:
('/opt/hostedtoolcache/Python/3.12.12/x64/bin/python', '-mvirtualenv',
'/home/runner/.cache/pre-commit/repomm4m0yuo/py_env-python3.13', '-p', 'python3.13')
return code: 1
stdout:
RuntimeError: failed to find interpreter for Builtin discover of python_spec='python3.13'
stderr: (none)
Check the log at /home/runner/.cache/pre-commit/pre-commit.log
Error: Process completed with exit code 3.
Ansible-lint's pre-commit hook needs 3.10+ or so, but won't accept anything except 3.13. Here's the change: https://github.com/ansible/ansible-lint/pull/4796 (including some comments that it is not ideal, including the github action problem).
The change apparently gives a good error message to people running too-old python versions, but it punishes those that do regular updates (and have perfectly fine non-3.13 python versions). A similar pin was done in "black" and later reverted (see the comments on this issue) as it caused too many problems.
Note: this comment gives some of the reasons for hardcoding 3.13. Pre-commit itself doesn't have a way to specify a minimum Python version. Apparently old Python versions can lead to weird install errors, though I haven't found a good ticket about that in the issue tracker. The number of issues in the tracker is impressively high, so I can imagine such a hardcoded version helping a bit.
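For comparison: a tool can also check a minimum version itself at startup and print a clear message, without forcing one exact interpreter. A minimal sketch, assuming a hypothetical minimum of 3.10 (the function name and the minimum are my own illustration, not ansible-lint's code):

```python
import sys

MINIMUM = (3, 10)  # hypothetical minimum, for illustration only


def check_python(current=None, minimum=MINIMUM):
    """Exit with a readable message when the running python is too old."""
    current = tuple(current or sys.version_info[:2])
    if current < minimum:
        wanted = ".".join(map(str, minimum))
        found = ".".join(map(str, current))
        raise SystemExit(f"This tool needs python {wanted}+, you have {found}.")
```

Any python from the minimum upwards passes, so 3.12 on a github runner would be fine.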
Now on to the "fix". Override the language_version like this:
- repo: https://github.com/ansible-community/ansible-lint.git
  hooks:
    - id: ansible-lint
      language_version: python3  # or python3.12 or so
If you use ansible-lint a lot (like I do), you'll have to add that line to all your (django) project repositories when you update your pre-commit config...
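With many repositories, the edit can be scripted. A minimal sketch, assuming you load .pre-commit-config.yaml yourself with something like PyYAML's yaml.safe_load() and write it back afterwards (the function name is hypothetical; note that a plain load/dump round trip loses YAML comments):

```python
def pin_language_version(config, hook_id="ansible-lint", version="python3"):
    """Set language_version on every hook with the given id.

    `config` is the parsed pre-commit config (a dict with a "repos" list).
    Returns the number of hooks that were changed.
    """
    changed = 0
    for repo in config.get("repos", []):
        for hook in repo.get("hooks", []):
            if hook.get("id") == hook_id and hook.get("language_version") != version:
                hook["language_version"] = version
                changed += 1
    return changed
```

Running it twice is harmless: the second run changes nothing.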
I personally think this pinning is a bad idea. After some discussion in issue 4821 I created a sub-optimal proposal to at least set the default to 3.12, but that issue was closed and locked because I apparently "didn't search the issue tracker".
Anyway, this blog post hopefully helps people adjust their many pre-commit configs.
My summaries from the sixth Python meetup in Leiden (NL).
His first experience with Mongodb was when he had to build a patient data warehouse based on literature. He started with postgres, but the fixed table structure was very limiting. Mongodb was much more flexible.
Postgres is a relational database, Mongodb is a document database. Relational: tables, clearly defined relationships and a pre-defined structure. Document/nosql: documents, flexible relationships and a flexible structure.
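To make the difference concrete, a small stdlib-only sketch: sqlite3 stands in for postgres/mysql and plain dicts stand in for MongoDB documents (PyMongo hands documents to you as python dicts). The table and field names are made up.

```python
import json
import sqlite3

# Relational: the structure is defined up front, every row has the same columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, birth_year INTEGER)")
conn.execute("INSERT INTO patient (name, birth_year) VALUES (?, ?)", ("Alice", 1970))
# Storing a list of diagnoses per patient would need an extra table plus a join.

# Document: each record is a nested structure and records may differ in shape.
patients = [
    {"name": "Alice", "birth_year": 1970},
    {"name": "Bob", "birth_year": 1982, "diagnoses": [{"code": "X1", "year": 2015}]},
]
as_json = json.dumps(patients)  # documents map directly onto JSON-like data
```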
Nosql/document databases can scale horizontally: multiple connected servers. Relational databases have different scaling mechanisms.
Why is mongo such a nice combination with python?
He showed example python code, comparing a mysql example with a Mongodb version. The Mongodb version did indeed look simpler.
The advantage of Mongodb (the freedom) also is its drawback: you need to do your own validation and your own housekeeping, otherwise your data slowly becomes unusable.
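A minimal sketch of such do-it-yourself validation before inserting a document; the schema is hypothetical. (MongoDB can also enforce a $jsonSchema validator on a collection, which moves part of this into the database.)

```python
REQUIRED = {"name": str, "birth_year": int}  # hypothetical schema


def validate(doc):
    """Return a list of problems; an empty list means the document is OK."""
    problems = []
    for field, expected in REQUIRED.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems
```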
Mathijs is now only using Mongodb, mostly because of the speed of development he enjoys with it.
He showed a couple of videos of drummers, some with and some without "blast beats". In metal (if I understood correctly) it means a lot of bass drum and, essentially, also a "machine gun" on the snare drum. He likes this kind of music a lot, so he wanted to analyze it programmatically.
He used the demucs library for his blast beat counter project. Demucs separates different instruments out of a piece of music.
With fourier transforms, he could analyse the frequencies. Individual drum sounds (snare drum hit, bass drum hit) were analysed this way.
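The idea of a fourier transform in a few lines: turn samples into frequency components and pick the strongest one. A naive, pure-python DFT sketch for illustration only (in practice you'd use something like numpy.fft.rfft; this is not the talk's code):

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) with the largest magnitude in a naive DFT."""
    n = len(samples)
    best_k, best_magnitude = 0, -1.0
    for k in range(1, n // 2):  # skip k=0, the constant (DC) component
        component = sum(
            s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples)
        )
        if abs(component) > best_magnitude:
            best_k, best_magnitude = k, abs(component)
    # Bin k corresponds to k * sample_rate / n Hz.
    return best_k * sample_rate / n
```

A 50 Hz sine sampled at 1000 Hz for 200 samples comes out as 50.0.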
With the analysed frequency bits, they could recognise them in a piece of music and count occurrences and pick out the blast beats. He had some nice visualisations, too.
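Counting occurrences could then come down to counting peaks in an energy envelope, for instance the loudness of the snare drum's frequency band over time. A hypothetical sketch (function name, threshold and gap are my assumptions, not the speaker's code):

```python
def count_hits(envelope, threshold, min_gap):
    """Return the sample indices where `envelope` rises above `threshold`.

    Crossings closer than `min_gap` samples to the previous hit are ignored,
    so one drum hit with a ragged envelope isn't counted twice.
    """
    hits = []
    above = False
    for i, value in enumerate(envelope):
        if value >= threshold:
            if not above and (not hits or i - hits[-1] >= min_gap):
                hits.append(i)
            above = True
        else:
            above = False
    return hits
```

Hits per second (the number of hits divided by the duration) would then give a rough blast beat indicator.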
He was asked to analyze "Never Gonna Give You Up" by Rick Astley :-) Downloading it from youtube, separating out the drums, analysing it, visualising it: it worked! Nice: a live demo. (Of course there were no blast beats in the song.)
Live demo time again! He built a quick jekyll site (static site generator) and he's got a small hetzner server. Just a bit of apache config and he's got an empty directory that's being hosted on a domain name. He quickly did this by hand.
Next he added his simple code to a git repo and uploaded it to github.
A nice trick for Github actions: self-hosted runners. They're easy to install, just follow the instructions on Github.
The runner can then run what's in your github action, like "generate files with jekyll and store them in the right local folder on the server".
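The deploy step itself can be tiny. A sketch of the "store them in the right local folder" part, assuming jekyll's default _site output directory and a hypothetical web root; the workflow would run jekyll build first and then something like this on the self-hosted runner:

```python
import shutil
from pathlib import Path


def deploy(build_dir, web_root):
    """Copy a freshly built static site into the web server's directory."""
    build = Path(build_dir)
    if not (build / "index.html").exists():
        raise FileNotFoundError(f"{build} doesn't look like a built site")
    # dirs_exist_ok (python 3.8+) lets us copy over an existing web root.
    shutil.copytree(build, web_root, dirs_exist_ok=True)
```

Note that this overwrites files but doesn't delete pages you removed from the site.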
The runner runs on your server, running your code: a much nicer solution than giving your ssh key to Github and having it log into your server. You also can use it on some local computer without an external address: the runner will poll Github instead of it being Github that sends you messages.
The auto-deploy worked. And while he was busy with his demo, two PRs with changes to the static website had already been created by other participants. He merged them and the site was indeed updated right away.
(One of my summaries of the PyUtrecht meetup in Utrecht, NL).
Note: Victorien is currently the number one person maintaining Pydantic. Pydantic is basically "dataclasses with validation".
There was a show of hands: about 70% uses type hints. Type hints have been around since python 3.5. There have been improvements over the years, like str|None instead of Union[str, None] in 3.10, for instance.
Something I didn't know: you can always introspect type hints when running your python code: typing.get_type_hints(my_func).
Getting typing-related changes into Python takes a lot of work. You need to implement the changes in CPython. You have to update the spec. And you have to get it supported by the major type checkers. That's really different from typescript, where typing was built in from the start.
Something that will help typing in the future is 3.15's lazy imports (lazy from xxx import yyy).
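Until lazy imports land, a common workaround for imports that are only needed for type hints is typing.TYPE_CHECKING: type checkers see the import, but it is skipped at runtime. A small sketch (the function is made up), using string annotations so the name doesn't have to exist at runtime:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers such as mypy; skipped when the code runs.
    from decimal import Decimal


def total(prices: "list[Decimal]") -> "Decimal":
    """Add up prices; works at runtime without importing Decimal here."""
    return sum(prices)
```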
There's an upcoming PEP 764, "inline typed dictionaries":
def get_movie() -> {"name": str, "year": int}:
# At least something like this ^^^, I can't type that quickly :-)
...
He has some suggestions for a new syntax, using something like <{ .... }>, but getting a syntax change into Python takes a lot of talking and a really solid proposal.
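For comparison: today you'd write a separate TypedDict class. As far as I know, PEP 764 itself spells the inline version as TypedDict[{"name": str, "year": int}]. A sketch of the current, class-based way (the movie is just an example):

```python
from typing import TypedDict


class Movie(TypedDict):
    name: str
    year: int


def get_movie() -> Movie:
    # At runtime this is a plain dict; type checkers verify the keys and types.
    return {"name": "Blade Runner", "year": 1982}
```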
(One of my summaries of the PyUtrecht meetup in Utrecht, NL).
"From SNMP to gRPC". Maurice is working on network automation. (The link goes to his github account, the presentation's demo code is there).
SNMP, the Simple Network Management Protocol, has been the standard for network monitoring since the late 1980s. But its age is showing. It is polling-based, which is wasteful: the mechanism continually polls the endpoints. It is like checking for new messages on your phone every minute instead of relying on push messaging.
The better way is streaming telemetry, the push model. He uses gRPC, "A high performance, open source universal RPC framework" and gNMI, "gRPC Network Management Interface".
gNMI has four operations. With capabilities you ask what the device supports: used in the discovery phase. Get is a simple one-time request for a specific value. With set you can do a bit of configuring. The magic is in subscribe: it creates a persistent connection, allowing the device to continuously stream data back to the client (according to the settings done with "set").
(For the demo, he used pygnmi, a handy python library for gNMI.)
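To show the difference between asking and being told, a toy in-memory model of those four operations. The class, method names and the path are hypothetical; this is not pygnmi's API or the gNMI wire protocol:

```python
class ToyDevice:
    """In-memory stand-in for a network device (not the gNMI protocol itself)."""

    def __init__(self):
        self._state = {"interfaces/eth0/oper-status": "up"}
        self._subscribers = []

    def capabilities(self):
        """Discovery: which paths can be asked for?"""
        return sorted(self._state)

    def get(self, path):
        """One-time request for a single value."""
        return self._state[path]

    def set(self, path, value):
        """Configuration change; subscribers hear about it too."""
        self.simulate_change(path, value)

    def subscribe(self, prefix, callback):
        """Persistent subscription: the device pushes matching changes."""
        self._subscribers.append((prefix, callback))

    def simulate_change(self, path, value):
        """Something changes on the device, e.g. a link going down."""
        self._state[path] = value
        for prefix, callback in self._subscribers:
            if path.startswith(prefix):
                callback(path, value)
```

With polling, you'd call get() every interval whether or not anything changed; with subscribe(), one status change results in exactly one message.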
When to use streaming?
SNMP is still fine when you have a small setup and high frequency isn't really needed.
(One of my summaries of the Pycon NL one-day conference in Utrecht, NL).
(Sofie helps maintain FastAPI, Typer and spaCy; this talk is all about AI).
Sofie started with an example of a chatbot getting confused about the actual winner of an F1 race after the original winner was disqualified. So you need to have a domain expert on board who can double-check the data and the results.
Let's say you want your chatbot output to link to Wikipedia for important terms. That's actually a hard task, as it has to do normalization of terms, differentiating between Hamilton-the-driver, Hamilton-the-town, Hamilton-the-founding-father and more.
There's a measure for quality of output that's called an "F-score". She used some AI model to find the correct page and got a 79.2% F-score. How good or bad is it?
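For reference: the balanced F-score, also called F1, is the harmonic mean of precision and recall. A small sketch, assuming the talk used the usual F1:

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall (the balanced F-score)."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

With perfect precision but 60% recall, for instance, F1 is 0.75: the harmonic mean punishes the weaker of the two.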
For this, you can try to determine a reasonable baseline. "Guessing already means 50%" is what you might think. No: there are 7 million Wikipedia pages, so random guessing gives a 0% F-score. Let's pick all the pages which actually mention the word "Hamilton". If we then look at more words like "Alexander Hamilton" or "Lewis Hamilton", we can reason that a basic non-AI approach should get at least 78%, so the AI model's 79.2% isn't impressive.
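One classic way to get such a non-AI baseline is "most frequent link": for each mention text, always predict the page it was most often linked to in the annotated data. A sketch with made-up annotations; I don't know whether this is exactly the baseline from the talk:

```python
from collections import Counter, defaultdict


def train_baseline(examples):
    """examples: (mention_text, linked_page) pairs from annotated data."""
    counts = defaultdict(Counter)
    for mention, page in examples:
        counts[mention.lower()][page] += 1
    # Keep only the most frequently linked page per mention.
    return {mention: c.most_common(1)[0][0] for mention, c in counts.items()}


def predict(baseline, mention):
    """Return the predicted page, or None for mentions never seen before."""
    return baseline.get(mention.lower())
```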
The highest reachable quality depends on the data itself and what people expect. "Hamilton won at Spa", do you expect Spa to point at the town or at the circuit? The room voted 60/40, so even the best answer itself can't be 100% correct :-)
A tip: if you get a bad result, investigate the training data to see if you can spot some structural problem (which you can then fix). Especially if you have your own annotated data. In her example, some of the annotators annotated circuit names including the "GP" or "grand prix" name ("Monaco GP") and others just the town name ("Spa").
Some more tips:
Unrelated photo from our 2025 holiday in Austria: just over the border in Germany, we stayed two days in Passau. View from the 'Oberhaus' castle on three rivers combining, with visibly different colors. From the left, the small, dark 'Ilz'. The big, drab-colored one in the middle is the 'Donau' (so 'schöne blaue Donau' should be taken with a grain of salt). From the right, also big, the much lighter 'Inn' (lots of granite sediment from the Alps, here).
(One of my summaries of the Pycon NL one-day conference in Utrecht, NL).
Daniele works as director of engineering at Canonical (the company behind Ubuntu). What he wants to talk about today is how to define, measure and elevate engineering quality at scale. That's his job. He needs to influence/change that in an organization with a thousand technical people in dozens of teams with 100+ projects. They ideally must converge on the standards of quality he has defined and there's only one of him. Engineering people are opinionated people :-)
Your personal charm and charisma wear thin after a while: there needs to be a different way. So: how can you get 1000+ people to do what you want, the way you want, ideally somewhat willingly? You cannot make people do it. You'll have to be really enthusiastic about it.
He suggests three things:
It being a workshop, we worked through a few examples. Someone mentioned "improved test coverage in our software".
Why does this work with human beings?
Humans are funny creatures. As soon as they believe in something, it will carry them over many bumps in the road.
People love to see their work recognized. So if you maintain a spreadsheet with all the projects' results and progress, you won't have to ask them for an update: they will bug you if the spreadsheet hasn't been updated in a while. They really want to see the work they've put in!
You can get a positive feedback loop. If the work you need to do is clear, if the value is clear and if there is recognition, you'll want to do it almost automatically. And if you do it, you mention it in presentations and discussions with others. Then the others are automatically more motivated to work on it, too.
Giving kids a sticker when they do something successfully really helps. It also works for hard-core programmers and team managers!
Unrelated photo from our 2025 holiday in Austria: just over the border in Germany, Passau has a nice cathedral.
(One of my summaries of the Pycon NL one-day conference in Utrecht, NL).
Full title: leading kedro: lessons from maintaining an open source python framework.
Merel is the tech lead of the python open source framework kedro.
What is open source? Ok, the source code is publicly available for anyone to use, modify and share. But it is also a concept of sharing. Developing together. "Peer production". It also means sharing of technical information and documentation. In the 1990s the actual term "open source" was coined. Also, an important milestone: Github was launched in 2008, greatly easing open source development.
Kedro is a python toolbox that applies software engineering principles to data science code, making it easier to go from prototype to production. Started in 2017, it was open sourced in 2019. (Note: Kedro has now been donated to the Linux foundation). This made it much easier to collaborate with others outside the original company (Quantumblack).
Open source also means maintenance challenges. It is not just code. Code is the simple part. How to attract contributors? How to get good quality contributions? What to accept/reject? How to balance quick wins with the long term vision of the project? How to make contributors come back?
What lessons did they learn?
Unrelated photo from our 2025 holiday in Austria: Neufelden has a dam+reservoir, the water travels downstream by underground pipe to the hydropower plant. At this point the pipe comes to the surface and crosses the river on a concrete construction. Nearby, the highest road bridge in this region also crosses.
(One of my summaries of the Pycon NL one-day conference in Utrecht, NL).
He showed a drawing of Cornelis "wooden leg" Jol, a pirate from the 17th century from Sebastiaan's hometown. Why is he a pirate? He dresses like one, has a wooden leg, murders people like a pirate and even has a parrot, so he's probably a pirate. For python programmers used to duck typing, this is familiar.
In the 17th century, the Netherlands was economically wealthy and had a big sea-faring empire. But they wanted a way to expand their might without paying for it. So... privatization to the rescue. You give pirates a vrijbrief, a government letter saying they've got some kind of "permission" from the Dutch government to rob and pillage and kill everybody as long as they aren't Dutch people and ships. A privateer. So it looks like a pirate and behaves like a pirate, but it isn't technically a real pirate.
Now on to today. There are a lot of cyber threats. Often state-sponsored. You might have a false sense of security in working for a relatively small company instead of for a juicy government target. But... privateers are back! Lots of hacking companies have cover from governments, as long as they hack other countries. And hacking small companies can also be profitable.
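The pirate/privateer story maps nicely onto python. A playful sketch (all names made up) using typing.Protocol: a Privateer isn't a Pirate subclass, but it passes the "behaves like a pirate" check:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class PirateLike(Protocol):
    """Anything that can pillage counts, whatever its class."""

    def pillage(self) -> str: ...


class Pirate:
    def pillage(self) -> str:
        return "arr"


class Privateer:
    """Not a Pirate subclass, but it behaves exactly like one."""

    def pillage(self) -> str:
        return "arr (with a vrijbrief)"


def raid(ship: PirateLike) -> str:
    # Duck typing: only the presence of .pillage() matters.
    return ship.pillage()
```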
"I care about security". Do you really? What do real security people think? They think developers don't really pay much attention to it. Eye-roll at best, disinterest at worst. Basically, "it is somebody else's problem".
What you need is a security culture. A buy-in at every level. You can draw an analogy with safety culture at physically dangerous companies like petrochemical ones. So you, as a developer, should argue for security with your boss. You are a developer, so you have a duty to speak up. Just like a generic employee at a chemical plant has the duty to speak up when seeing something risky.
You don't have to become a security expert (on top of everything else), but you do have to pay attention. Here are some pointers:
Unrelated photo from our 2025 holiday in Austria: center of Neufelden, nicely restored and beautifully painted.
(One of my summaries of the Pycon NL one-day conference in Utrecht, NL).
Full title: tooling with purpose: making smart choices as you build.
Aris uses python and data to answer research questions about everything under the ground (as a geophysicist).
As a programmer you have to make lots of choices. Python environment, core project tooling, project-specific tooling, etc.
First: python environment management: pyenv/venv/pip, poetry, uv. And conda/pixi for the scientific python world. A show of hands showed uv to be really popular.
Now core project tooling. Which project structure? Do you use a template/cookiecutter for it? Subdirectories? A testing framework? Pytest is the default, start with that. (He mentioned "doctests" becoming very popular: that surprised me, as they were popular before 2010 and started to be considered old and deprecated after 2010. I'll need to investigate a bit more).
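For those who haven't seen them: a doctest is an example in a docstring that python's built-in doctest module runs and checks. A minimal sketch (the function is made up); python -m doctest yourfile.py -v runs them:

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    >>> mean([1, 2, 3, 4])
    2.5
    >>> mean([])
    Traceback (most recent call last):
        ...
    ValueError: mean() needs at least one value
    """
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)
```

The examples double as documentation, which is probably part of their renewed appeal.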
Linting and type checking? Start with ruff for formatting/checking. Mypy is the standard type checker, but pyright/vscode and pyre are options. And the new ty is alpha, but looks promising.
Also, part of the core tooling: do you document your code? At least a README.
For domain specific tooling there are so many choices. It is easy to get lost. What to use for data storage? Web/API? Visualization tools. Scientific libraries.
Choose wisely! With great power comes great responsibility, but with great power also comes the burden of decision-making. Try to standardize. Enforce policies. Try to keep it simple.
Beware of over-engineering. Over-engineering often comes with good intentions. And... sometimes complexity is the right path. As an example, look at database choices. You might hesitate between an SQL and a no-sql database and wonder whether you need to shard your database. But often a simple sqlite database file is fast enough!
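How little a sqlite start needs: the sqlite3 module is in the standard library and the database is one file. A sketch with a hypothetical table, in the geophysics spirit of the talk:

```python
import sqlite3


def open_db(path="measurements.db"):
    """Open (or create) a single-file sqlite database; no server needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurement "
        "(id INTEGER PRIMARY KEY, depth_m REAL NOT NULL, value REAL NOT NULL)"
    )
    return conn


def average_below(conn, depth_m):
    """Average measured value below a given depth, or None if there is none."""
    row = conn.execute(
        "SELECT AVG(value) FROM measurement WHERE depth_m > ?", (depth_m,)
    ).fetchone()
    return row[0]
```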
Configuration management: start with a simple os.getenv() and grab settings from environment variables. Only start using .toml files when that no longer fits your use case.
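A minimal sketch of that os.getenv() approach; the variable names and defaults are made up:

```python
import os


def load_settings():
    """Read settings from environment variables, with defaults for local use."""
    return {
        "database_url": os.getenv("DATABASE_URL", "sqlite:///local.db"),
        # Environment variables are always strings, so convert explicitly.
        "debug": os.getenv("DEBUG", "false").lower() in {"1", "true", "yes"},
        "workers": int(os.getenv("WORKERS", "2")),
    }
```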
Web/api: start simple. You probably don't need authentication from the start if it is just a quick prototype. Get something useful working, first. Once it works, you can start working on deployment or a nicer frontend.
Async code is often said to be faster. But debugging is time-consuming and hard. Error handling is different. It only really pays off when you have many, many concurrent operations. Profile your code before you start switching to async. It won't speed up CPU-bound code.
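Where async does pay off: many concurrent waits. A sketch where asyncio.sleep() stands in for I/O such as a database or HTTP call (the names are made up):

```python
import asyncio
import time


async def fetch(i):
    """Stand-in for an I/O-bound call, e.g. a database or HTTP request."""
    await asyncio.sleep(0.05)
    return i * 2


async def fetch_all(n):
    # gather() runs the coroutines concurrently, so their waits overlap.
    return await asyncio.gather(*(fetch(i) for i in range(n)))


def timed(n=20):
    start = time.perf_counter()
    results = asyncio.run(fetch_all(n))
    return results, time.perf_counter() - start
```

Twenty 50 ms waits finish in roughly 50 ms instead of a second. Replace the sleep with a CPU-heavy loop and the gain disappears, as the loop blocks the event loop: that's the CPU-bound caveat.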
Logging: just start with the built-in logging module. Basic logging is better than no logging. Don't start the Perfect Fancy Logging Setup until you have the basics running.
Testing is good and recommended, but don't go overboard. Don't "mock" everything to get 100% coverage. Those kinds of tests break often. And often the tests test the mock instead of your actual code. Aim for roughly as much test code as actual code.
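Basic logging with the standard library is only a few lines: a module-level logger, and one basicConfig() call at your program's entry point. A sketch (the parsing function is made up):

```python
import logging

logger = logging.getLogger(__name__)


def parse_line(line):
    """Split a comma-separated line; log instead of print."""
    logger.debug("Parsing %r", line)
    if not line.strip():
        logger.warning("Skipping empty line")
        return None
    return line.strip().split(",")


if __name__ == "__main__":
    # Configure output once, at the entry point of your program.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    parse_line("")
```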
Some closing comments:
Unrelated photo from our 2025 holiday in Austria: Neufelden station. I remembered the valley as being beautiful from a 1991 train trip. As we now do our family holidays by train, I knew where to go as soon as Austria was chosen as destination.
(One of my summaries of the Pycon NL one-day conference in Utrecht, NL).
Full title: from flask to fastapi: why and how we made the switch.
He works at "polarsteps", a travel app. In particular, a travel app that is often used in areas with really bad internet connectivity. So performance is top of mind.
They used flask for a long time. Flask 2 added async, but it was still WSGI-bound. They really needed the async scaling possibility for their 4 million monthly users. Type hinting was also a big wish item for improved reliability.
They switched to fastapi:
This meant they gave up some things that Flask provided:
They did a gradual migration. So they needed to build a custom fastapi middleware that could support both worlds. And some api versioning to keep the two code bases apart. It took a lot of time to port everything over.
The middleware was key. Completely async in fastapi. Every request came through here. If needed, a request would be routed to Flask via wsgi, if possible it would go to the new fastapi part of the code.
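The routing idea can be sketched as a tiny ASGI app in front of both code bases. This is a hypothetical sketch, not Polarsteps' actual middleware: the /v2/ prefix is made up, and the legacy app is assumed to be the Flask app wrapped in a WSGI-to-ASGI adapter (for instance from the a2wsgi package):

```python
def make_dispatcher(new_app, legacy_app, migrated_prefixes):
    """Return an ASGI app sending migrated paths to new_app, the rest to legacy_app."""
    prefixes = tuple(migrated_prefixes)

    async def dispatcher(scope, receive, send):
        if scope["type"] == "http" and not scope["path"].startswith(prefixes):
            # Not migrated yet: let the old (WSGI-wrapped) app handle it.
            await legacy_app(scope, receive, send)
        else:
            # Migrated http routes plus lifespan/websocket events go to the new app.
            await new_app(scope, receive, send)

    return dispatcher
```

Migrating an endpoint then means adding its prefix to the list, which fits the gradual, high-traffic-first approach.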
For the migration, they made a dashboard of all the endpoints and the traffic volume. They migrated high-traffic APIs first: early infra validation. They paid attention to improvements by checking whether the queries got faster. Lots of monitoring of both performance and errors.
Some lessons learned:
Unrelated photo from our 2025 holiday in Austria: the beautiful 'große Mühl' river valley.