feat(venv): support data, include, and scripts schemes #3726
rickeylev merged 44 commits into bazel-contrib:main from …
Conversation
ref https://packaging.python.org/en/latest/specifications/binary-distribution-format/#installing-a-wheel-distribution-1-0-py32-none-any-whl for generic data scheme: these are usually installed to the venv root, but that seems ill-advised. So we install into a data sub-directory of the venv.
Code Review
This pull request enhances wheel support by correctly handling and symlinking data/, bin/, and include/ directories within virtual environments. It updates the venv construction logic to recognize these directories, adds a new DATA symlink kind, and includes these paths in the wheel's data targets. Review feedback highlights a bug in path prefix matching for top-level directories, suggests removing internal 'TO CHECK' comments, and recommends refactoring how data targets are added to avoid redundancy and runfile bloat in non-venv environments.
Be sure you check the whl_library extraction Starlark code, because it is handling some of the mapping already.
…into whl.with.data
Uses os.name to correctly assert the Scripts/ and Include/ directory paths when creating the virtual environment on Windows.
Also add debug info to venvs_site_packages_libs_test to help diagnose Windows CI failures.
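The per-platform assertion described above can be sketched roughly like this; `expected_venv_paths` is a hypothetical helper for illustration, not the test's actual code:

```python
import os

def expected_venv_paths(venv_root):
    """Return the expected scripts/headers directories for the current platform.

    Hypothetical helper: on Windows (os.name == "nt") venvs use Scripts/ and
    Include/, while POSIX venvs use bin/ and include/.
    """
    if os.name == "nt":
        return {
            "scripts": os.path.join(venv_root, "Scripts"),
            "include": os.path.join(venv_root, "Include"),
        }
    return {
        "scripts": os.path.join(venv_root, "bin"),
        "include": os.path.join(venv_root, "include"),
    }
```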
…addition
These changes seemed to cause many regressions in WORKSPACE mode in CI.
Also revert redundant bazel_skylib additions to example WORKSPACE files.
The original target was intended to be internal-only, so it has been renamed and all references updated. No alias was added as per instructions.
The `whl_from_dir_repo` repository rule previously relied on the Unix `zip` utility to create wheels. Because this command isn't natively available on Windows, any tests that depended on repositories generated by this rule had to be explicitly skipped on Windows hosts.

To fix this and expand our test coverage, this adds a native Windows fallback. When running on Windows, the rule now invokes a helper PowerShell script that uses .NET compression APIs to create the archive. The script ensures the resulting wheel remains uncompressed and uses zeroed-out timestamps to match the deterministic behavior of the original `zip -0X` command.

With this constraint removed, the Unix-only compatibility flags (`SUPPORTS_BZLMOD_UNIXY`) have been dropped, enabling several namespace package and wheel-related integration tests to finally run on Windows.
…into whl.with.data
@gemini-code-assist review
Code Review
This pull request enhances the handling of wheel data, scripts, and headers within virtual environments by introducing a DATA symlink kind and updating venv directory mappings. The changes ensure that files from a wheel's .data directory are correctly placed in the venv root, bin, and include directories, while preventing direct linking of shared directories to avoid conflicts. Feedback suggests expanding the list of restricted directories to include site-packages, Lib, and Scripts to prevent accidental overwrites. Additionally, it is recommended to include data files unconditionally in whl_library_targets to ensure consistent behavior across different execution modes.
Alrighty, I think this is ready. Some more work to make the things in bin actually runnable is needed, but this at least gets files into the right directory.
aignas
left a comment
LGTM with minor questions.
    out_venv_path = venv_path[len("data/"):]
    kind = VenvSymlinkKind.DATA
    prefix = ""
elif venv_path.startswith("include/"):
What about Windows wheels? Do we not need to check for capitalized includes?
No, because wheel extraction always uses Unix names. Platform-specific renaming happens at the py_binary level.
Admittedly, I'm not entirely happy with this block of logic because it assumes how whl_library extracted the contents. Just another good reason we need to create a py_extracted_wheel_library rule.
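The block of logic under discussion can be sketched in plain Python. `VenvSymlinkKind.DATA` appears in the diff above; the `BIN` and `INCLUDE` names and the class itself are assumptions for the sketch:

```python
class VenvSymlinkKind:
    DATA = "data"        # from the diff above
    BIN = "bin"          # assumed name
    INCLUDE = "include"  # assumed name

def classify(venv_path):
    """Map a POSIX path from a wheel's .data dir to (symlink kind, subpath).

    The trailing "/" in each prefix matters: matching "data" alone would
    also catch an unrelated sibling such as "database/".
    """
    for prefix, kind in (
        ("data/", VenvSymlinkKind.DATA),
        ("scripts/", VenvSymlinkKind.BIN),
        ("include/", VenvSymlinkKind.INCLUDE),
    ):
        if venv_path.startswith(prefix):
            return kind, venv_path[len(prefix):]
    # Anything else (e.g. site-packages content) is not a .data scheme path.
    return None, venv_path
```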
     name = "verify_files_test",
     srcs = ["verify_files_test.py"],
-    target_compatible_with = SUPPORTS_BZLMOD_UNIXY,
+    target_compatible_with = SUPPORTS_BZLMOD,
nit: would it be possible to set this target_compatible_with at the somepkg_with_build_files level? Or are we running them on non-bzlmod as well?
Can be done in a separate PR.
To set TCW on somepkg_with_build_files, it would need to be set by whl_library.
A recent PR added some of the whl_from_dir stuff to workspace. The trick was to put it in internal_dev_setup.bzl -- whl_library relies on some earlier rules_python repo generation.
Currently, the files from the data, headers, and scripts portions of a
wheel don't end up in the proper sub-directories of the venv. This
means the full files of a distribution aren't available at the typical
location in the venv, making it harder to integrate with standard
tools.
To fix, map those directories to paths in the venv and give them
the same treatment as site-packages.
The spec says certain first-level directories of the .data directory
map to specific scheme paths, which whl_library already handles.
Here's a listing of the
wheel data directory -> install scheme path key -> whl_library directory
relationships:
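The listing itself didn't survive here. Based on the wheel spec and the sysconfig references linked in this description, the relationship can be sketched as follows; the scheme keys follow the sysconfig docs, while the whl_library directory names are assumptions, not taken from the PR diff:

```python
# wheel .data subdirectory -> (sysconfig install scheme key,
#                              assumed whl_library extraction directory)
DATA_DIR_TO_SCHEME = {
    "purelib": ("purelib", "site-packages"),
    "platlib": ("platlib", "site-packages"),
    "scripts": ("scripts", "bin"),
    "headers": ("include", "include"),
    "data": ("data", "data"),
}
```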
Relevant reading:
https://packaging.python.org/en/latest/specifications/binary-distribution-format
https://docs.python.org/3/library/sysconfig.html#posix-prefix
https://docs.python.org/3/library/sysconfig.html#nt
The whl_library rule uses POSIX names for extracting. When
materialized into a binary's venv, platform-specific names are used:
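As an illustration of that renaming (the Windows names follow CPython's venv layout; the rule's exact table may differ):

```python
# Extraction-time POSIX directory name -> name used when the venv is
# materialized on Windows. Windows venvs capitalize Scripts/ and Include/.
_WINDOWS_NAMES = {
    "bin": "Scripts",
    "include": "Include",
}

def venv_dir_name(posix_name, is_windows):
    """Return the platform-specific directory name for a POSIX scheme dir."""
    if is_windows:
        return _WINDOWS_NAMES.get(posix_name, posix_name)
    return posix_name
```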
Along the way ...
* … always included as part of depending on the library. They would be
  included in venv_site_packages=yes mode, so this better aligns the
  behavior of the two modes.
* Renamed is_venv_site_packages to _is_venv_site_packages_yes to better
  represent its purpose and visibility.
* … using it for testing various special cases.