Question: Bazel downloader should only be reading packages, versions, and hashes from pip.parse.requirements_lock, correct? #2564

dougthor42 asked this question in Q&A

@dougthor42 dougthor42

Jan 14, 2025

· 4 comments · 1 reply

Answered by aignas Return to top

dougthor42
Jan 14, 2025
Collaborator

We have a requirements lock file requirements.txt. It contains args for --index-url and --extra-index-url like so:

# This file was autogenerated by uv ...
# uv pip compile ...
--index-url https://pypi.org/simple
--extra-index-url=https://oauth2accesstoken@baz.com/simple
nurpc==1.0.0 \
 --hash=sha256:...

And our MODULE.bazel has:

pip.parse(
 experimental_index_url = "https://pypi.org/simple",
 experimental_index_url_overrides = {"package": "https//baz.com/simple", ...}, # note no "oauth2accesstoken"
 requirements_lock = "requirements.txt",
 ...

At this point Bazel fails for 401 unauthorized (which is why I'm down this rabbit hole in the first place... you can ignore this):

$ bazel clean --expunge_async; bazel run //:gazelle_python_manifest.update
...
===== stdout start ===== 
Looking in indexes: https://pypi.org/simple, https://****@baz.com/simple 
===== stdout end ===== 
===== stderr start ===== 
WARNING: 401 Error, Credentials not correct for https://baz.com/simple/nurpc/ 
ERROR: Could not find a version that satisfies the requirement nurpc==1.0.0 (from versions: none) 
ERROR: No matching distribution found for nurpc==1.0.0
...

But if I change requirements.txt:

---extra-index-url https://oauth2accesstoken@baz.com/simple
+--extra-index-url https://baz.com/simple

I get a different Bazel error. Whaaaaaat?:

$ bazel clean --expunge_async; bazel run //:gazelle_python_manifest.update
...
===== stdout start ===== 
Looking in indexes: https://pypi.org/simple, https://baz.com/simple 
User for baz.com: 
===== stdout end ===== 
===== stderr start ===== 
ERROR: Exception: 
Traceback (most recent call last):
...

Thus, it seems like Bazel may be reading --extra-index-url from the requirements.txt file. Is this intended? If so, why?

Given that the different error is asking for username and password, I'm inclined to think it's related to the credential helper. Our credential helper essentially just calls echo "{headers: {Bearer $(gcloud auth application-default print-access-token)} }", so that isn't reading requirements.txt.

Answered by aignas

Jun 2, 2025

I am +1 for the proposal and I think I had something similar in mind when it would come to actually start stabilizing the API.

The only issue or thing to have in mind is to ensure that the index is the same across all of the platforms and fail with an error. We are parsing the requirements files in parse_requirements function and incidentally we are using the same file to pull stuff from the PyPI, so that place can handle everything.

View full answer

Replies: 4 comments 1 reply

aignas
Jan 14, 2025
Maintainer

The index interaction happens in 2 places:

The bazel downloader in the pip.parse extension evaluation is using it to query the index and download the packages. The first error is most likely because the value we are using is from the pip.parse attribute - should we use the one from the file?
If you are building from sdist, in the whl_library. It seems that this is the second error that you are getting.

EDIT: If you are content with the answer feel free to convert the issue to a discussion (the best place for questions in general), otherwise we can groom this into a feature/bug ticket. :)

0 replies

dougthor42
Jan 15, 2025
Collaborator Author

The first error is most likely because the value we are using is from the pip.parse attribute - should we use the one from the file?

Hard to say. My initial reaction was to say "no, we should use the value from pip.parse.experimental_*" but I don't have any strong reasoning for that.

If you are building from sdist, in the whl_library. It seems that this is the second error that you are getting.

The nurpc thing being downloaded from the private index is already a wheel, so it shouldn't be building from sdist.

Image

convert the issue to a discussion

Ah, thanks! I wasn't aware we used Discussions (or maybe I forgot...). Done!

0 replies

aignas
May 31, 2025
Maintainer

Is this still a problem?

0 replies

dougthor42
May 31, 2025
Collaborator Author

There's a TL;DR proposal summary at the end.

Well it doesn't impact us but that's because I've been operating in the "just make sure that the index url in requirements lock file matches that defined in experimental_index_url" mode, but I still don't fully understand where the canonical source is.

I think the ideal situation is that there is only a single place where index URLs are set. It seems like pip.parse is logical place for this, especially with the experimental Bazel downloader stuff. Any time rules_python downloads things from an index, including building a wheel from sdist in whl_library, we should use the bazel downloader (and thus gets to use the credential helper).

This brings up an issue though - some workflows use requirements.in which contains --(extra-)index-url args and generate the lock file with compile_pip_requirements. This happens before pip.parse sets index URLs and thus we have to use the index urls set in requirements.in.

Maybe we make the requirements lock file the source of index URLs for pip.parse instead of setting them manually with experimental_(extra_)index_url(s)? So an API like:

pip.parse(
 experimental_bazel_downloader = True,
 experimental_index_url_overrides = {"package": "https//baz.com/simple", ...},
 requirements_lock = "requirements.txt",
 ...

If requirements.txt has:

--index_url https://foo/simple
--extra-index-url https://bar/simple
--extra-index-url https://foobar/simple
package==1.2.3 \
 --hash=sha256:abcd1234

The experimental_bazel_downloader = True API would be functionally equivalent to:

pip.parse(
 experimental_index_url = "https://foo/simple",
 experimental_extra_index_urls = ["https://bar/simple", "https://foobar/simple"]
 experimental_index_url_overrides = {"package": "https//baz.com/simple", ...}, # note no "oauth2accesstoken"
 requirements_lock = "requirements.txt",
 ...

The net result of such a change is that the requirements.in file is now the canonical source for index URLs no matter how the lock file is made:

flowchart TD
 A[req.in] --> B[pip compile<br>uv pip compile]
 A --> C[compile_pip_requirements]
 B --> D[req.lock]
 C --> D
 D --> |effectively sets| E[pip.parse.experimental_index_url<br>pip.parse.experimental_extra_index_urls]

Proposal summary:

Remove experimental_index_url, experimental_extra_index_urls
Add boolean experimental_bazel_downloader (or similar wording)
Have pip.parse pull index URLs from the requirements lock file. If not present, default to public PyPI.

I think keeping experimental_index_url_overrides is prudent.

Alternative:

Keep experimental_index_url, experimental_extra_index_urls but support special options like DEFER or similar wording that causes pip.parse to pull index URLs from the requirements lock file.

1 reply

@aignas

aignas Jun 2, 2025
Maintainer

I am +1 for the proposal and I think I had something similar in mind when it would come to actually start stabilizing the API.

Answer selected by dougthor42

Uh oh!

Question: Bazel downloader should only be reading packages, versions, and hashes from pip.parse.requirements_lock, correct? #2564

Uh oh!

dougthor42 Jan 14, 2025 Collaborator

Replies: 4 comments · 1 reply

Uh oh!

Uh oh!

aignas Jan 14, 2025 Maintainer

Uh oh!

dougthor42 Jan 15, 2025 Collaborator Author

Uh oh!

aignas May 31, 2025 Maintainer

Uh oh!

Uh oh!

dougthor42 May 31, 2025 Collaborator Author

Proposal summary:

Alternative:

Uh oh!

aignas Jun 2, 2025 Maintainer

dougthor42
Jan 14, 2025
Collaborator

Replies: 4 comments 1 reply

aignas
Jan 14, 2025
Maintainer

dougthor42
Jan 15, 2025
Collaborator Author

aignas
May 31, 2025
Maintainer

dougthor42
May 31, 2025
Collaborator Author

aignas Jun 2, 2025
Maintainer