Feature Request: Duckdb extensions as pip extras #39
-
Why do you want this feature?
Problem:
In some scenarios, it is not allowed to download binaries at run time in production.
Proposed solution:
In Python environments, I think it would make sense to offer the extensions as extras that you can install with, for example, python -m pip install duckdb[httpfs].
Was this solution considered and discarded? If so, can you explain why?
If there are no big constraints, we would be happy to help with the implementation of this feature.
Note: This suggestion only applies to Python, but I guess similar approaches could be implemented for other languages.
-
Yes, this feature would be very useful.
Currently I am facing the following error:
IOException: IO Error: Extension "C:\Users\002K75744\.duckdb\extensions\v0.8.1\windows_amd64\httpfs.duckdb_extension" not found.
Extension "httpfs" is an existing extension.
Install it first using "INSTALL httpfs".
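For context, the path in this error suggests how installed extensions are laid out on disk: an extensions directory, then the DuckDB version, then the platform, then the extension file. A minimal sketch that rebuilds such a path, for example for a pre-flight check before connecting. The function name expected_extension_path is hypothetical and not part of DuckDB's API, the layout is inferred from the error message rather than from the DuckDB source, and the base directory is a placeholder.

```python
from pathlib import PurePath, PureWindowsPath


def expected_extension_path(base_dir: PurePath, version: str, platform: str, name: str) -> PurePath:
    """Rebuild an extension file path using the layout seen in the error message.

    Hypothetical helper: the layout
    <base>/extensions/<version>/<platform>/<name>.duckdb_extension
    is inferred from the error above, not taken from DuckDB's API.
    """
    return base_dir / "extensions" / version / platform / f"{name}.duckdb_extension"


# Version, platform and extension name come from the error message;
# the base directory is a placeholder for the user's .duckdb folder.
path = expected_extension_path(
    PureWindowsPath(r"C:\Users\me\.duckdb"), "v0.8.1", "windows_amd64", "httpfs"
)
```

Checking whether that file exists before starting DuckDB turns the IOException into an explicit deployment error.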
-
I would also love to see this feature. I work in an environment that is walled off from the internet. Only recently were we allowed to install anything from PyPI through a proxy (before that, there was an internal mirror repo that was maintained for a shorter list of packages). Anything outside of that is completely unreachable.
To say the least, it's very unexpected for a Python package installed from PyPI to also require automatic installs from external repositories. It feels strange, from a security perspective, to normalize the automatic downloading and installing of packages at runtime.
-
+1
After trying to get approval to download from extensions.duckdb.org, trying to build from source for some platforms (e.g. manylinux), and looking for prebuilt release artifacts on GitHub and S3, I was lucky enough to find build artifacts for some Linux platforms (e.g. linux_amd64_gcc4) for v1.0.0 (1f98600) from the InvokeCI action: https://github.com/duckdb/duckdb/actions/runs/9286093161. The package for your platform is at the bottom of the action run page; you need to log in to GitHub first.
However, the extensions for some platforms, such as Windows and macOS, are not exposed there. Can they be exposed as well? Can all the artifacts be made available on the DuckDB release page? I understand they would take up a lot of space, possibly more than the releases already use.
-
I would also need this to enable bundling extensions with QStudio: https://www.timestored.com/qstudio/
-
I have just created a repo that downloads the extensions from the DuckDB repo and uploads them to PyPI. I have also added a utility function to load the extension in DuckDB.
pip install duckdb-extensions duckdb-extension-spatial
and then in Python:
from duckdb_extensions import extension_importer
extension_importer.import_extension("spatial")
I really hope we won't have to maintain this for long and that publishing to PyPI becomes a core feature.
-
I had the same struggle with the FW. This is really helpful! Thank you! It would be nice if DuckDB Labs made this part of the core Python SDK.
-
@santosh-d3vpl3x thanks a lot for this, is it possible to support community extensions with this?
-
I've made a similar version, except that it installs the extensions into the same layout that DuckDB expects. This allows DuckDB to load the extensions directly from a virtualenv instead of first having to install them into (and pollute) ~/.duckdb, keeping the entire process relocatable and self-contained within a single virtualenv.
So you can do something like:
pip install duckdb==1.3.2 duckdb-ext[aws,delta,httpfs]
And it installs them into e.g. .../site-packages/duckdb_ext/extensions
Then you can either set it up manually:
import duckdb
import duckdb_ext

with duckdb.connect() as con:
    con.execute("SET extension_directory = ?;", (duckdb_ext.get_extension_dir(),))
    # Ideally also:
    con.execute("SET autoinstall_known_extensions = false;")
or use a helper function (that does the same thing):
import duckdb
import duckdb_ext

with duckdb_ext.init(duckdb.connect()) as con:
    ...
Then you can either rely on extension autoloading or explicitly load extensions.
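The explicit-loading route could look roughly like the sketch below. load_extensions is a hypothetical helper, not part of duckdb or duckdb_ext; it only assumes a connection object with an execute method and that the extension directory has already been set as described above. LOAD is DuckDB's SQL statement for loading an installed extension. A small recording stand-in replaces the real connection so the sketch runs without DuckDB installed.

```python
import re

# Conservative pattern for names such as "httpfs" or "aws" (an assumption of this sketch).
_EXTENSION_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def load_extensions(con, names):
    """Run an explicit LOAD for each extension on an already configured connection.

    Hypothetical helper: `con` only needs an execute(sql) method,
    like a DuckDB connection.
    """
    for name in names:
        if not _EXTENSION_NAME.fullmatch(name):
            raise ValueError(f"unexpected extension name: {name!r}")
        con.execute(f"LOAD {name};")
    return con


class RecordingConnection:
    """Stand-in connection that records the SQL it receives instead of running it."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)
        return self


recorder = load_extensions(RecordingConnection(), ["httpfs", "aws"])
```

With a real connection, a missing extension file would surface here at startup instead of in the middle of a query.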
-
For the record, here is what I did as a workaround (I would love to see this feature, too).
Create a file duckdb_install.py:
import duckdb
con = duckdb.connect()
con.execute("SET extension_directory = './duckdb/extension/directory'")
con.install_extension("avro")
and a Dockerfile:
FROM python:3.11
ARG PIP_EXTRA_INDEX_URL
# Set the working directory in the container
WORKDIR /code
# Copy the requirements file into the container
COPY requirements.txt .
# Install the dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the application code into the container
COPY . .
RUN python duckdb_install.py
# Command to run
CMD ["python", "lakehousewriter.py"]
In other words, I use DuckDB itself to install the extensions into a directory under the current working directory at image build time.
In the main program I obviously also set the extension_directory as the first step.
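A minimal sketch of that first step in the main program, assuming the same relative directory as in duckdb_install.py and the /code working directory from the Dockerfile. The helper name extension_setup_statements is hypothetical; extension_directory and autoinstall_known_extensions are DuckDB settings, and disabling the latter is an optional addition that is not part of the workaround above.

```python
from pathlib import PurePosixPath

# Same relative location that duckdb_install.py uses.
EXTENSION_SUBDIR = PurePosixPath("duckdb/extension/directory")


def extension_setup_statements(workdir: str) -> list[str]:
    """Return the SQL to run first on every connection in the main program.

    Hypothetical helper; anchoring the path to `workdir` avoids depending
    on whatever the current working directory happens to be.
    """
    ext_dir = str(PurePosixPath(workdir) / EXTENSION_SUBDIR).replace("'", "''")
    return [
        f"SET extension_directory = '{ext_dir}'",
        # Optional: stop DuckDB from automatically installing known extensions on first use.
        "SET autoinstall_known_extensions = false",
    ]


statements = extension_setup_statements("/code")
```

Each statement can then be passed to con.execute before any query that needs an extension.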
-
Some more use cases: duckdb/duckdb#19127
-
For visibility, I'm including this here and closing duckdb/duckdb#19127.
Scenario 1: DuckDB is used as part of a serverless function and it uses extensions. So for every invocation of the serverless function a 20 MB extension binary is downloaded, which does not make sense.
Scenario 2: All serverless functions/VMs/containers are inside a virtual private network without internet access. So extensions cannot be installed on demand.
Scenario 3: The CI/CD pipeline tests everything, yet the extensions might get updated when DuckDB is started, potentially breaking the code. We have CI/CD pipelines, test automation, etc. because we want to control what is being used.
DuckDB already provides all the features needed to achieve the above. In the build process I need to start DuckDB, set the extension target directory, let DuckDB choose the matching version and hardware variant, and then include those files in the deployable.
The easiest way of doing that would be to add the extension to the Python requirements.txt file. DuckDB checks the extension directory first and downloads the extension only if it is not found locally.
Another option is a static build of DuckDB, but that would be even more work.
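The "include those files in the deployable" step of the build process described above could be sketched as follows. bundle_extensions is a hypothetical helper, and the version and platform directory names in the demo are placeholders standing in for whatever DuckDB wrote into the extension target directory at build time.

```python
import shutil
import tempfile
from pathlib import Path


def bundle_extensions(extension_dir: Path, deployable_dir: Path) -> list[Path]:
    """Copy every *.duckdb_extension file into the deployable, keeping the relative layout."""
    copied = []
    for src in sorted(extension_dir.rglob("*.duckdb_extension")):
        dest = deployable_dir / src.relative_to(extension_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied


# Demo with a placeholder file standing in for an extension installed at build time.
with tempfile.TemporaryDirectory() as tmp:
    build_dir = Path(tmp) / "build"
    deployable = Path(tmp) / "deployable"
    target = build_dir / "v1.0.0" / "linux_amd64" / "avro.duckdb_extension"
    target.parent.mkdir(parents=True)
    target.write_bytes(b"placeholder")
    copied = bundle_extensions(build_dir, deployable)
    relative = [p.relative_to(deployable).as_posix() for p in copied]
```

Keeping the relative layout means the deployed program can point extension_directory at the copied directory without any further changes.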
-
Is it already on the roadmap?
https://duckdb.org/roadmap -> I do not see it here?
I would love to have this feature in 1.5.0 (2026-02-16).
https://duckdb.org/release_calendar
-
@santosh-d3vpl3x's https://github.com/santosh-d3vpl3x/duckdb_extensions is awesome!
It sounds like there are a number of issues that people want solved. I'm not completely sure I have a great overview of them yet. This is what I can gather:
- Installing extensions when you don't have internet access.[1] This is for people who can access pypi.org (or some mirror) and cannot access duckdb.org.
- Not needing to download extensions during the execution of your script.[2] You'd pip install duckdb[ext1, ext2] instead of doing a runtime / startup install with INSTALL ext1 etc. The benefit here is that it is easier to bundle the application, and you don't have to download extensions as part of script startup / execution.
- CI/CD reproducibility.[3] I'm not completely sure about this one, more info would be very welcome.
Does this cover it?
This is not on our roadmap as of yet and is not a feature that will be shipped in 1.5.0. Taking on the burden of more distribution channels of N extensions (publishing them on PyPI) is also not something we'd take on lightly.
Footnotes
[1] Comments by @JorgeGarciaIrazabal, @bodacious-bill, @yinzhs and @wernerdaehn
[2] Comments by @wernerdaehn, @santosh-d3vpl3x and @jtanx
[3] Scenario 3 in @wernerdaehn's use cases comment
-
I would say yes, you covered it.
The process should also be self-explanatory: the ability to add extensions to the requirements.txt.
I mean, it is a requirement, an external library, needed for the code to work.
Once it is there, some will install all requirements in the runtime environment, others would download all requirements into a target directory and then bundle code and libraries.
From a technical point of view it means
- Extensions can be downloaded via the requirements.txt
- DuckDB looks for extensions in the same locations as Python does.
Note that requirements.txt supports pip install but also downloading from GitHub repos directly!
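One way to picture the second point is a lookup over Python's import path. This is only a sketch of the idea, not something DuckDB does today: find_extension_dir is a hypothetical function, and the <package>/extensions layout mirrors the site-packages/duckdb_ext/extensions location mentioned earlier in this thread.

```python
import sys
import tempfile
from pathlib import Path


def find_extension_dir(package: str, search_paths=None):
    """Return the first <entry>/<package>/extensions directory on the import path, or None.

    Hypothetical lookup: `package` is whatever distribution ships the extension binaries.
    """
    for entry in (sys.path if search_paths is None else search_paths):
        candidate = Path(entry) / package / "extensions"
        if candidate.is_dir():
            return candidate
    return None


# Demo with a temporary directory standing in for site-packages.
with tempfile.TemporaryDirectory() as site_packages:
    (Path(site_packages) / "duckdb_ext" / "extensions").mkdir(parents=True)
    found = find_extension_dir("duckdb_ext", [site_packages])
    found_name = found.relative_to(site_packages).as_posix()
    missing = find_extension_dir("not_installed", [site_packages])
```

The returned directory could then be handed to SET extension_directory, so that a pip install into any environment or target directory is enough to make the extensions discoverable.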
-
Sounds spot on.
I have been following this issue for the reasons listed.
My take on CI/CD reproducibility: I would generally like a CI test job that ran last week to produce the same result today. This is currently not guaranteed, as most users just install the extension with a statement like "install aws;".
The AWS extension might be updated in the meantime and could break a feature.
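Until extensions can be pinned like other dependencies, one way to at least detect such a silent update in CI is to record a checksum of every extension file and compare it on each run. A minimal sketch using only the standard library; the function names are hypothetical and the demo file merely stands in for a real extension binary.

```python
import hashlib
import tempfile
from pathlib import Path


def extension_manifest(extension_dir: Path) -> dict[str, str]:
    """Map each extension file (relative path) to the SHA-256 of its contents."""
    return {
        path.relative_to(extension_dir).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(extension_dir.rglob("*.duckdb_extension"))
    }


def changed_extensions(locked: dict[str, str], current: dict[str, str]) -> list[str]:
    """List files that were added, removed, or modified relative to the locked manifest."""
    names = locked.keys() | current.keys()
    return sorted(name for name in names if locked.get(name) != current.get(name))


# Demo: lock a placeholder file, then simulate an upstream update of the same extension.
with tempfile.TemporaryDirectory() as tmp:
    ext = Path(tmp) / "aws.duckdb_extension"
    ext.write_bytes(b"build from last week")
    locked = extension_manifest(Path(tmp))
    ext.write_bytes(b"build from today")
    drift = changed_extensions(locked, extension_manifest(Path(tmp)))
```

A CI job could commit the locked manifest next to the code and fail whenever the list of changed files is not empty.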
I think most people have the following problem:
"Not needing to download extensions during the execution of your script"
Solving this will solve most of the problems.