Ray Build Pitfalls: Compiling an Old Version on an Old System

Background

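Ray's official documentation includes both an installation guide and a build guide, and it offers a range of prebuilt options: stable releases, nightly builds, and even builds of any individual commit on the master branch. Across the matrix of platforms, Python versions, and CPU architectures, these prebuilt artifacts cover most needs.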

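When you have to maintain HotFix versions in production, however, building from source becomes an essential skill. In enterprise environments in particular, you may need to modify and compile a fairly old release such as Ray 2.40.0 on an old system such as CentOS 8. The official build guide looks straightforward, but when both the version and the OS are old, you tend to hit pitfalls the docs never mention. Some are version-compatibility issues, others come from quirks of the system environment, and ready-made solutions are hard to find online.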

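While building Ray 2.40.0 on macOS and CentOS 8, I tracked down and fixed these problems one by one. This post records the key pitfalls and their fixes, in the hope that it helps anyone who needs a similar build and runs into the same problems.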

Building Ray on a MacBook

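My development machine is a MacBook with an M4 Pro. Since local development is convenient, I tried building Ray on the MacBook first, to surface the problems an old-version build is likely to hit.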

Environment Setup

Simply follow Ray's official build guide.

  1. Clone the Ray repository
    git clone git@github.com:ray-project/ray.git
    cd ray
  2. Install the Bazel build environment. The commands below use bazelisk to install Bazel v6.5.0 into ~/bin/. You must add ~/bin/ to PATH yourself before the bazel command becomes available.
    brew update
    brew install wget

    ci/env/install-bazel.sh
    echo 'export PATH="$PATH:$HOME/bin"' >> ~/.bashrc
    exec bash
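  3. Download and install Node.js from the official website, then make sure the npm and node commands work. In my tests a newer version did not break the build; if you run into problems building the dashboard later, fall back to the Node version the official docs specify.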
  4. Download and install Anaconda from the official website.
  5. Create a Python environment with Anaconda. To avoid potential dependency mismatches, use the same Python version as the target environment.
    conda create -n ray-compile python=3.11.9
    conda activate ray-compile
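Before the first build it can save time to confirm the toolchain from the steps above is actually visible. Below is a minimal preflight sketch; the tool list (git, bazel, node, npm, conda) and the helper names find_missing_tools and python_matches are assumptions for illustration, not part of Ray's tooling. It resolves each tool with shutil.which, which, like the subprocess calls setup.py makes, does not expand a literal "~" inside PATH.

```python
import shutil
import sys
from typing import Callable, Iterable, List, Optional, Tuple

# Assumed tool list for the steps above; adjust to your own setup.
REQUIRED_TOOLS = ("git", "bazel", "node", "npm", "conda")


def find_missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be resolved on PATH.

    shutil.which, like subprocess, does not expand a literal "~" inside PATH.
    """
    return [tool for tool in tools if which(tool) is None]


def python_matches(expected: str, actual: Tuple[int, ...] = tuple(sys.version_info[:3])) -> bool:
    """Compare the interpreter version with e.g. "3.11.9" (or a prefix such as "3.11")."""
    wanted = tuple(int(part) for part in expected.split("."))
    return tuple(actual[: len(wanted)]) == wanted


if __name__ == "__main__":
    print("missing tools:", find_missing_tools() or "none")
    print("matches target Python 3.11.9:", python_matches("3.11.9"))
```

It only checks that the tools resolve on PATH; it does not verify versions such as Bazel 6.5.0.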

Building 2.52.1

For reproducibility, I built the latest official release at the time, 2.52.1, rather than the latest commit on master.

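Note: If you have only changed Python files, you can follow the official docs and simply replace the Python files in the pip-installed package; the C++ build below is unnecessary.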

  1. Check out version 2.52.1
    git checkout ray-2.52.1
  2. Build the dashboard (about 3 minutes)
    cd python/ray/dashboard/client
    npm ci
    npm run build
  3. Build Ray
    cd -
    cd python/
    pip install -r requirements.txt
    pip install -e . --verbose
  4. The build succeeds with output like the following:
    > python git:(ray-2.52.1) pip install -e . --verbose
    Using pip 25.3 from /opt/anaconda3/envs/ray-compile/lib/python3.11/site-packages/pip (python 3.11)
    Obtaining file:///Users/xytan/Desktop/study/ray/python
    Running command installing build dependencies
    Using pip 25.3 from /opt/anaconda3/envs/ray-compile/lib/python3.11/site-packages/pip (python 3.11)
    Collecting setuptools>=40.8.0
    Obtaining dependency information for setuptools>=40.8.0 from https://files.pythonhosted.org/packages/a3/dc/17031897dae0efacfea57dfd3a82fdd2a2aeb58e0ff71b77b87e44edc772/setuptools-80.9.0-py3-none-any.whl.metadata
    Using cached setuptools-80.9.0-py3-none-any.whl.metadata (6.6 kB)
    Using cached setuptools-80.9.0-py3-none-any.whl (1.2 MB)
    Installing collected packages: setuptools
    Successfully installed setuptools-80.9.0
    Installing build dependencies ... done
    Running command Checking if build backend supports build_editable
    Checking if build backend supports build_editable ... done
    Running command Getting requirements to build editable
    Getting requirements to build editable ... done
    Running command installing backend dependencies
    Using pip 25.3 from /opt/anaconda3/envs/ray-compile/lib/python3.11/site-packages/pip (python 3.11)
    ...
    ...
    ...
    Building editable for ray (pyproject.toml) ... done
    Created wheel for ray: filename=ray-2.52.1-0.editable-cp311-cp311-macosx_11_0_arm64.whl size=7592 sha256=95a5cacd0ec290dbca09851988ac9bb0de54c9ddedbee169e7fa8a84428b5e21
    Stored in directory: /private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-ephem-wheel-cache-6wzkmjte/wheels/3b/4a/f0/6edffb2ad8c786ba8990ff9495668d930965bc91921b146ea6
    Successfully built ray
    Installing collected packages: ray
    changing mode of /opt/anaconda3/envs/ray-compile/bin/ray to 755
    changing mode of /opt/anaconda3/envs/ray-compile/bin/serve to 755
    changing mode of /opt/anaconda3/envs/ray-compile/bin/tune to 755
    Successfully installed ray-2.52.1

Building 2.40.0

With 2.52.1 built successfully, the next step was to build 2.40.0.

I first ran the following commands, expecting the build to go through:

git checkout ray-2.40.0
pip install -r requirements.txt
pip install -e . --verbose

Instead, the build failed with: No module named pip

...
...
...
running build_py
running build_ext
/opt/anaconda3/envs/ray-compile/bin/python3.11: No module named pip
Traceback (most recent call last):
File "/opt/anaconda3/envs/ray-compile/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
main()
File "/opt/anaconda3/envs/ray-compile/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 373, in main
json_out["return_val"] = hook(**hook_input["kwargs"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/ray-compile/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 303, in build_editable
return hook(wheel_directory, config_settings, metadata_directory)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-dszuoqoi/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 468, in build_editable
return self._build_with_temp_dir(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-dszuoqoi/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 404, in _build_with_temp_dir
self.run_setup()
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-dszuoqoi/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 512, in run_setup
super().run_setup(setup_script=setup_script)
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-dszuoqoi/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 317, in run_setup
exec(code, locals())
File "<string>", line 784, in <module>
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-dszuoqoi/overlay/lib/python3.11/site-packages/setuptools/__init__.py", line 115, in setup
return distutils.core.setup(**attrs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-dszuoqoi/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 186, in setup
return run_commands(dist)
^^^^^^^^^^^^^^^^^^
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-dszuoqoi/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 202, in run_commands
dist.run_commands()
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-dszuoqoi/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 1002, in run_commands
self.run_command(cmd)
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-dszuoqoi/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 1102, in run_command
super().run_command(command)
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-dszuoqoi/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
cmd_obj.run()
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-dszuoqoi/overlay/lib/python3.11/site-packages/setuptools/command/editable_wheel.py", line 139, in run
self._create_wheel_file(bdist_wheel)
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-dszuoqoi/overlay/lib/python3.11/site-packages/setuptools/command/editable_wheel.py", line 349, in _create_wheel_file
files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-dszuoqoi/overlay/lib/python3.11/site-packages/setuptools/command/editable_wheel.py", line 272, in _run_build_commands
self._run_build_subcommands()
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-dszuoqoi/overlay/lib/python3.11/site-packages/setuptools/command/editable_wheel.py", line 299, in _run_build_subcommands
self.run_command(name)
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-dszuoqoi/overlay/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 357, in run_command
self.distribution.run_command(command)
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-dszuoqoi/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 1102, in run_command
super().run_command(command)
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-dszuoqoi/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
cmd_obj.run()
File "<string>", line 772, in run
File "<string>", line 674, in pip_run
File "<string>", line 542, in build
File "/opt/anaconda3/envs/ray-compile/lib/python3.11/subprocess.py", line 413, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/anaconda3/envs/ray-compile/bin/python3.11', '-m', 'pip', 'install', '-q', '--target=/Users/xytan/Desktop/study/ray/python/ray/thirdparty_files', 'psutil', 'setproctitle==1.2.2', 'colorama']' returned non-zero exit status 1.
An error occurred when building editable wheel for ray.
See debugging tips in: https://setuptools.pypa.io/en/latest/userguide/development_mode.html#debugging-tips
error: subprocess-exited-with-error

× Building editable for ray (pyproject.toml) did not run successfully.
exit code: 1
╰─> No available output.

note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /opt/anaconda3/envs/ray-compile/bin/python3.11 /opt/anaconda3/envs/ray-compile/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_editable /var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/tmp2ifs64vi
cwd: /Users/xytan/Desktop/study/ray/python
Building editable for ray (pyproject.toml) ... error
ERROR: Failed building editable for ray
Failed to build ray
error: failed-wheel-build-for-install

× Failed to build installable wheels for some pyproject.toml based projects
╰─> ray

After hitting this error, I checked the official build guide for 2.40.0 and confirmed that I had followed its steps exactly.

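The active conda environment should have been able to find both python3 and pip, yet pip install -e . failed. Reading the relevant code shows that Ray installs these pip packages in a subprocess: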

# Note: We are passing in sys.executable so that we use the same
# version of Python to build packages inside the build.sh script. Note
# that certain flags will not be passed along such as --user or sudo.
# TODO(rkn): Fix this.
if not os.getenv("SKIP_THIRDPARTY_INSTALL"):
    pip_packages = ["psutil", "setproctitle==1.2.2", "colorama"]
    subprocess.check_call(
        [
            sys.executable,
            "-m",
            "pip",
            "install",
            "-q",
            "--target=" + os.path.join(ROOT_DIR, THIRDPARTY_SUBDIR),
        ]
        + pip_packages,
        env=dict(os.environ, CC="gcc"),
    )

    # runtime env agent dependenceis
    runtime_env_agent_pip_packages = ["aiohttp"]
    subprocess.check_call(
        [
            sys.executable,
            "-m",
            "pip",
            "install",
            "-q",
            "--target=" + os.path.join(ROOT_DIR, RUNTIME_ENV_AGENT_THIRDPARTY_SUBDIR),
        ]
        + runtime_env_agent_pip_packages
    )

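Running /opt/anaconda3/envs/ray-compile/bin/python3.11 -m pip install -q --target=/Users/xytan/Desktop/study/ray/python/ray/thirdparty_files psutil setproctitle==1.2.2 colorama directly from the command line succeeded. So the problem lies in the subprocess's execution environment: the child process probably does not see the full conda environment. With that context I asked several LLMs (ChatGPT, DeepSeek, Qwen). Their suggestions, including editing ~/.bazelrc and adding python and pip to PATH in /etc/profile, did not fix the problem.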
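The traceback paths (pip-build-env-.../overlay) indicate that the build backend runs inside pip's isolated build environment, which is consistent with the child seeing a different environment than the shell. A minimal diagnostic sketch, assuming you temporarily call it from setup.py just before the failing check_call (or run it standalone in a normal shell for comparison); child_can_import and child_sys_path are hypothetical helpers that launch sys.executable the same way Ray's setup.py does and report what that child can see.

```python
import json
import os
import subprocess
import sys
from typing import List, Mapping, Optional


def child_can_import(module: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Launch sys.executable in a child process (as Ray's setup.py does) and try the import."""
    result = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        env=dict(env) if env is not None else None,
        capture_output=True,
    )
    return result.returncode == 0


def child_sys_path(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the sys.path that a child interpreter sees under the given environment."""
    out = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=dict(env) if env is not None else None,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(out.stdout)


if __name__ == "__main__":
    print("pip importable in child:", child_can_import("pip", os.environ))
    for entry in child_sys_path(os.environ):
        print("  ", entry)
```

Comparing the sys.path printed inside the build with the one printed from a normal shell should show whether the conda site-packages directory, where pip lives, is visible to the child.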

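Wary of Ray's build complexity and worried that white-box analysis could take an unpredictable amount of time, I turned to Ray's issue tracker for clues. I did find an issue reporting the same error, but it was opened in early 2024 and is still open nearly two years later, and the maintainers' replies offer no direct fix. It looks like a tricky environment problem.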

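Frankly, this is a bit absurd: following the official docs, you cannot build Ray from source, which is a fairly serious problem. I don't know how the Ray team built 2.40.0 at the time; perhaps they built in a sandbox that already contained every dependency and so never noticed.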

With no official fix and white-box analysis of unpredictable cost, is there an efficient black-box way to locate the problem?

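Then it struck me: 2.52.1 builds and 2.40.0 does not. Although nearly five thousand commits separate the two, I could use git bisect to binary-search for the first commit that builds. Because a non-building version reproduces the error within ten seconds of running pip install -e . --verbose, this should in theory take at most 13 rounds and under 3 minutes.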
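The "at most 13 rounds" figure is the usual binary-search bound: isolating one commit among N candidates takes at most ceil(log2 N) checks, and log2(5000) ≈ 12.3 rounds up to 13. A quick sketch of the arithmetic, assuming every check costs the roughly 10 seconds mentioned above (the helper names are illustrative):

```python
import math


def max_bisect_steps(num_commits: int) -> int:
    """Worst-case number of checks needed to isolate one commit among num_commits."""
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    return math.ceil(math.log2(num_commits))


def worst_case_seconds(num_commits: int, seconds_per_check: float) -> float:
    """Upper bound on wall-clock time if every check costs the same."""
    return max_bisect_steps(num_commits) * seconds_per_check


print(max_bisect_steps(5000))        # 13
print(worst_case_seconds(5000, 10))  # 130, i.e. a little over 2 minutes
```

The same formula applies later in this post, where each check needs a multi-minute compile; only seconds_per_check changes.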

Following this idea, I first ran git merge-base ray-2.52.1 ray-2.40.0 to get the common ancestor of the two tags: 02ac0cdc7adf5e611134840c73fa47dd7866140d

Testing confirmed that ray-2.52.1 builds and the common ancestor does not, so the preconditions for bisection hold.

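Start bisecting with git bisect start ray-2.52.1 02ac0cdc7adf5e611134840c73fa47dd7866140d. Note that bisect assumes by default that the newer revision is bad and the older one is good, and it looks for the first commit that introduced the bad state. Our case is the opposite: the newer version builds and the older one does not, so every good/bad verdict has to be inverted (a commit that builds is marked bad).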
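git bisect can also spare you the mental inversion: git bisect start --term-old=broken --term-new=fixed ray-2.52.1 02ac0cdc7adf5e611134840c73fa47dd7866140d lets you mark commits with git bisect broken and git bisect fixed instead. To make the inverted logic concrete, here is a minimal Python model of the search on a linear history. It is a simplification (git bisects a commit graph), first_fixed_commit is a hypothetical name, and the builds predicate stands in for running pip install -e python --verbose.

```python
from typing import Callable, List, Sequence, Tuple


def first_fixed_commit(
    commits: Sequence[str], builds: Callable[[str], bool]
) -> Tuple[str, List[str]]:
    """Binary-search a linear history (oldest first) for the first commit that builds.

    Assumes commits[0] is known not to build, commits[-1] is known to build, and
    the outcome flips exactly once in between. Returns that commit plus the list
    of commits that were actually tested.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] broken, commits[hi] fixed
    tested: List[str] = []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested.append(commits[mid])
        if builds(commits[mid]):
            hi = mid  # builds: "git bisect bad" by default, "fixed" with custom terms
        else:
            lo = mid  # fails: "git bisect good" by default, "broken" with custom terms
    return commits[hi], tested


history = [f"c{i}" for i in range(5000)]
fixed, tried = first_fixed_commit(history, lambda c: int(c[1:]) >= 3210)
print(fixed, len(tried))  # c3210, after at most 13 builds
```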

Below is the full bisection session. It located the first commit that builds in under 5 minutes in total:

# Only about 11 rounds are needed to pinpoint it; bisection is remarkably efficient
git bisect start ray-2.52.1 02ac0cdc7adf5e611134840c73fa47dd7866140d
# Bisecting: 2469 revisions left to test after this (roughly 11 steps)
# [07f509670a9857d3507fcc9defdc5487d8083758] [data] Refactor interface for actor_pool_map_operator (#53752)

# git bisect has to run from the repository root, so go up one level and adjust the path in the pip install command accordingly
cd ..
pip install -e python --verbose

git bisect bad
pip install -e python --verbose

git bisect bad
pip install -e python --verbose

git bisect good
pip install -e python --verbose

git bisect bad
pip install -e python --verbose

git bisect good
pip install -e python --verbose

git bisect bad
pip install -e python --verbose

git bisect good
pip install -e python --verbose

git bisect bad
pip install -e python --verbose

git bisect bad
pip install -e python --verbose

git bisect good
pip install -e python --verbose

git bisect bad
pip install -e python --verbose

git bisect bad

The result:

d2004b6353e131bb67e1bc7f771a09780ee32d2a is the first bad commit
commit d2004b6353e131bb67e1bc7f771a09780ee32d2a
Author: Philipp Moritz <pcmoritz@gmail.com>
Date: Thu Feb 13 00:08:30 2025 -0800

[Core] Initial port of Ray to Python 3.13 (#47984)

<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?

This is the first step towards
https://github.com/ray-project/ray/issues/47933

It is not very tested at the moment (on Python 3.13), but it compiles
locally (with `pip install -e . --verbose`) and can execute a simple
workload like

>>> import ray
>>> ray.init()
2024-10-10 16:03:31,857 INFO worker.py:1799 -- Started a local Ray instance.
RayContext(dashboard_url='', python_version='3.13.0', ray_version='3.0.0.dev0', ray_commit='{{RAY_COMMIT_SHA}}')
>>> @ray.remote
... def f():
... return 42
...
>>> ray.get(f.remote())
42
>>>

(and similar for actors).

The main thing that needed to change to make Ray work on Python 3.13 was
to upgrade Cython to 3.0.11 which seems to be the first version of
Cython to support Python 3.13. Unfortunately it has a compiler bug
https://github.com/cython/cython/pull/3235 (the fix is not released yet)
that I had to work around.

I also had to work around https://github.com/cython/cython/issues/5750
by changing some typing from `float` to `int | float`.

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
- [ ] Unit tests
- [ ] Release tests
- [ ] This PR is not tested :(

---------

Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: pcmoritz <pcmoritz@anyscale.com>
Co-authored-by: srinathk10 <68668616+srinathk10@users.noreply.github.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>

bazel/ray_deps_setup.bzl | 4 +-
python/ray/_raylet.pxd | 3 +-
python/ray/_raylet.pyx | 20 ++++++----
python/ray/includes/gcs_client.pxi | 28 +++++++-------
python/ray/includes/global_state_accessor.pxi | 8 ++--
python/ray/includes/object_ref.pxi | 2 +-
python/ray/includes/unique_ids.pxd | 53 +++++++--------------------
python/ray/includes/unique_ids.pxi | 10 ++---
python/setup.py | 4 +-
9 files changed, 55 insertions(+), 77 deletions(-)

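Looking at the commit, it is the change in which Anyscale's CTO added Python 3.13 support to Ray. Unfortunately the PR description does not mention fixing any build problem, so the fix was most likely incidental.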

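So the only way forward was to dig into the code. Fortunately the PR touches only 9 files (55 insertions, 77 deletions), which makes it fairly easy to see why this commit lets the build succeed.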

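It turned out the commit changes only two lines in setup.py, and the key one adds pip to setup_requires. This lines up closely with the earlier error. The traceback paths (pip-build-env-.../overlay) show that the build runs inside pip's isolated build environment, and the child process started via sys.executable evidently cannot see the pip installed in the conda environment. setup_requires is how setuptools declares build dependencies, which pip installs into that isolated environment before building; with pip on the list, the child process can import it.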

@@ -807,7 +807,7 @@ setuptools.setup(
# The BinaryDistribution argument triggers build_ext.
distclass=BinaryDistribution,
install_requires=setup_spec.install_requires,
- setup_requires=["cython >= 0.29.32", "wheel"],
+ setup_requires=["cython >= 3.0.12", "pip", "wheel"],
extras_require=setup_spec.extras,
entry_points={

Having found the cause, I immediately switched to ray-2.40.0 and added pip to setup_requires in setup.py:

diff --git a/python/setup.py b/python/setup.py
index 16017fa544..28ffef2503 100644
--- a/python/setup.py
+++ b/python/setup.py
@@ -807,7 +807,7 @@ setuptools.setup(
# The BinaryDistribution argument triggers build_ext.
distclass=BinaryDistribution,
install_requires=setup_spec.install_requires,
- setup_requires=["cython >= 0.29.32", "wheel"],
+ setup_requires=["cython >= 0.29.32", "pip", "wheel"],
extras_require=setup_spec.extras,
entry_points={
"console_scripts": [
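Because this one-line patch has to be reapplied on every HotFix branch, a small static check can confirm it is in place before starting a long build. A minimal sketch, assuming setup_requires is a literal list as in the diff above; setup_requires_of and declares_pip are hypothetical helpers, and the source is parsed with ast so setup.py is never executed.

```python
import ast
import re
from typing import List, Optional

_NAME = re.compile(r"\s*([A-Za-z0-9._-]+)")  # leading project name of a requirement string


def setup_requires_of(source: str) -> Optional[List[str]]:
    """Return the literal setup_requires list from the first call that passes one, else None."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                if kw.arg == "setup_requires" and isinstance(kw.value, (ast.List, ast.Tuple)):
                    return [ast.literal_eval(elt) for elt in kw.value.elts]
    return None


def declares_pip(source: str) -> bool:
    """True if setup_requires names the pip distribution (version specifiers ignored)."""
    for req in setup_requires_of(source) or []:
        match = _NAME.match(req)
        if match and match.group(1).lower() == "pip":
            return True
    return False


patched = 'import setuptools\nsetuptools.setup(setup_requires=["cython >= 0.29.32", "pip", "wheel"])\n'
print(declares_pip(patched))  # True
```

On a real checkout you would pass in the contents of python/setup.py instead of the inline string.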

After running pip install -e python --verbose, the earlier error was gone, so the change worked. Unfortunately, a new error appeared:

[2,597 / 5,261] Executing genrule @com_github_antirez_redis//:bin; 3s local ... (12 actions, 10 running)
ERROR: /private/var/tmp/_bazel_xytan/13505f911ec68d8fcfe382f9a26054b3/external/zlib/BUILD.bazel:1:11: Compiling zutil.c failed: (Exit 1): cc_wrapper.sh failed: error executing command (from target @zlib//:zlib)
(cd /private/var/tmp/_bazel_xytan/13505f911ec68d8fcfe382f9a26054b3/sandbox/darwin-sandbox/12269/execroot/com_github_ray_project_ray && \
exec env - \
PATH=/Users/xytan/Library/Caches/bazelisk/downloads/bazelbuild/bazel-6.5.0-darwin-arm64/bin:/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/overlay/bin:/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/normal/bin:/Users/xytan/.local/bin:/opt/anaconda3/envs/ray-compile/bin:/opt/anaconda3/condabin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/opt/pmk/env/global/bin:/Applications/iTerm.app/Contents/Resources/utilities:/Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/Home/bin:/usr/local/maven/bin:/Users/xytan/bin:/Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/Home/bin:/usr/local/maven/bin:/Users/xytan/bin \
PWD=/proc/self/cwd \
external/local_config_cc/cc_wrapper.sh -U_FORTIFY_SOURCE -fstack-protector -Wall -Wthread-safety -Wself-assign -Wunused-but-set-parameter -Wno-free-nonheap-object -fcolor-diagnostics -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections -MD -MF bazel-out/darwin_arm64-opt/bin/external/zlib/_objs/zlib/zutil.pic.d '-frandom-seed=bazel-out/darwin_arm64-opt/bin/external/zlib/_objs/zlib/zutil.pic.o' -fPIC '-DBAZEL_CURRENT_REPOSITORY="zlib"' -iquote external/zlib -iquote bazel-out/darwin_arm64-opt/bin/external/zlib -isystem external/zlib -isystem bazel-out/darwin_arm64-opt/bin/external/zlib -fPIC -Werror -w '-Wno-error=implicit-function-declaration' -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/zlib/zutil.c -o bazel-out/darwin_arm64-opt/bin/external/zlib/_objs/zlib/zutil.pic.o)
# Configuration: 5f13e584be259b429338435560124496342d10ebccdd9918322724af70f69ddb
# Execution platform: @local_config_platform//:host

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
In file included from external/zlib/zutil.c:10:
In file included from external/zlib/gzguts.h:21:
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdio.h:61:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/_stdio.h:318:7: error: expected identifier or '('
318 | FILE *fdopen(int, const char *) __DARWIN_ALIAS_STARTING(__MAC_10_6, __IPHONE_2_0, __DARWIN_ALIAS(fdopen));
| ^
external/zlib/zutil.h:147:33: note: expanded from macro 'fdopen'
147 | # define fdopen(fd,mode) NULL /* No fdopen() */
| ^
/Library/Developer/CommandLineTools/usr/lib/clang/17/include/__stddef_null.h:26:16: note: expanded from macro 'NULL'
26 | #define NULL ((void*)0)
| ^
In file included from external/zlib/zutil.c:10:
In file included from external/zlib/gzguts.h:21:
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdio.h:61:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/_stdio.h:318:7: error: expected ')'
external/zlib/zutil.h:147:33: note: expanded from macro 'fdopen'
147 | # define fdopen(fd,mode) NULL /* No fdopen() */
| ^
/Library/Developer/CommandLineTools/usr/lib/clang/17/include/__stddef_null.h:26:16: note: expanded from macro 'NULL'
26 | #define NULL ((void*)0)
| ^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/_stdio.h:318:7: note: to match this '('
external/zlib/zutil.h:147:33: note: expanded from macro 'fdopen'
147 | # define fdopen(fd,mode) NULL /* No fdopen() */
| ^
/Library/Developer/CommandLineTools/usr/lib/clang/17/include/__stddef_null.h:26:15: note: expanded from macro 'NULL'
26 | #define NULL ((void*)0)
| ^
In file included from external/zlib/zutil.c:10:
In file included from external/zlib/gzguts.h:21:
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdio.h:61:
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/_stdio.h:318:7: error: expected ')'
318 | FILE *fdopen(int, const char *) __DARWIN_ALIAS_STARTING(__MAC_10_6, __IPHONE_2_0, __DARWIN_ALIAS(fdopen));
| ^
external/zlib/zutil.h:147:33: note: expanded from macro 'fdopen'
147 | # define fdopen(fd,mode) NULL /* No fdopen() */
| ^
/Library/Developer/CommandLineTools/usr/lib/clang/17/include/__stddef_null.h:26:22: note: expanded from macro 'NULL'
26 | #define NULL ((void*)0)
| ^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/_stdio.h:318:7: note: to match this '('
external/zlib/zutil.h:147:33: note: expanded from macro 'fdopen'
147 | # define fdopen(fd,mode) NULL /* No fdopen() */
| ^
/Library/Developer/CommandLineTools/usr/lib/clang/17/include/__stddef_null.h:26:14: note: expanded from macro 'NULL'
26 | #define NULL ((void*)0)
| ^
3 errors generated.
INFO: Elapsed time: 6.329s, Critical Path: 4.04s
INFO: 512 processes: 360 internal, 152 darwin-sandbox.
FAILED: Build did NOT complete successfully
Traceback (most recent call last):
File "/opt/anaconda3/envs/ray-compile/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
main()
File "/opt/anaconda3/envs/ray-compile/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 373, in main
json_out["return_val"] = hook(**hook_input["kwargs"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/ray-compile/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 303, in build_editable
return hook(wheel_directory, config_settings, metadata_directory)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 468, in build_editable
return self._build_with_temp_dir(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 404, in _build_with_temp_dir
self.run_setup()
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 512, in run_setup
super().run_setup(setup_script=setup_script)
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 317, in run_setup
exec(code, locals())
File "<string>", line 784, in <module>
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/overlay/lib/python3.11/site-packages/setuptools/__init__.py", line 115, in setup
return distutils.core.setup(**attrs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 186, in setup
return run_commands(dist)
^^^^^^^^^^^^^^^^^^
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 202, in run_commands
dist.run_commands()
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 1002, in run_commands
self.run_command(cmd)
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 1102, in run_command
super().run_command(command)
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
cmd_obj.run()
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/overlay/lib/python3.11/site-packages/setuptools/command/editable_wheel.py", line 139, in run
self._create_wheel_file(bdist_wheel)
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/overlay/lib/python3.11/site-packages/setuptools/command/editable_wheel.py", line 349, in _create_wheel_file
files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/overlay/lib/python3.11/site-packages/setuptools/command/editable_wheel.py", line 272, in _run_build_commands
self._run_build_subcommands()
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/overlay/lib/python3.11/site-packages/setuptools/command/editable_wheel.py", line 299, in _run_build_subcommands
self.run_command(name)
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/overlay/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 357, in run_command
self.distribution.run_command(command)
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 1102, in run_command
super().run_command(command)
File "/private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-build-env-v2znngb3/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
cmd_obj.run()
File "<string>", line 772, in run
File "<string>", line 674, in pip_run
File "<string>", line 617, in build
File "<string>", line 397, in bazel_invoke
File "/opt/anaconda3/envs/ray-compile/lib/python3.11/subprocess.py", line 413, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['bazel', 'build', '--verbose_failures', '--', '//:ray_pkg', '//cpp:ray_cpp_pkg']' returned non-zero exit status 1.
An error occurred when building editable wheel for ray.
See debugging tips in: https://setuptools.pypa.io/en/latest/userguide/development_mode.html#debugging-tips
error: subprocess-exited-with-error

× Building editable for ray (pyproject.toml) did not run successfully.
exit code: 1
╰─> No available output.

note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /opt/anaconda3/envs/ray-compile/bin/python3.11 /opt/anaconda3/envs/ray-compile/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_editable /var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/tmpq2o3yp9s
cwd: /Users/xytan/Desktop/study/ray/python
Building editable for ray (pyproject.toml) ... error
ERROR: Failed building editable for ray
Failed to build ray
error: failed-wheel-build-for-install

× Failed to build installable wheels for some pyproject.toml based projects
╰─> ray

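A search turned up an issue in the Bazel repository with the same error. The cause is that newer macOS SDKs are incompatible with zlib 1.3, which Bazel depends on; the fix is to upgrade to zlib 1.3.1.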
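The compiler notes in the log already name both macros involved: zutil.h line 147 defines fdopen(fd,mode) as NULL, and clang's __stddef_null.h defines NULL as ((void*)0). By the time gzguts.h pulls in stdio.h, the fdopen macro is active, so the SDK's fdopen prototype is rewritten into something that is no longer a declaration, which is what the three clang errors complain about. The toy expander below reproduces that rewrite on the _stdio.h line quoted in the log; it is a regex-based illustration, not a real C preprocessor, and expand is a hypothetical name.

```python
import re

# Toy model, not a real C preprocessor: it handles only the two macros quoted in
# the compiler notes, and only calls whose arguments contain no parentheses.
FDOPEN_CALL = re.compile(r"\bfdopen\s*\(([^()]*),([^()]*)\)")  # zutil.h: fdopen(fd,mode)
NULL_TOKEN = re.compile(r"\bNULL\b")                            # __stddef_null.h: NULL


def expand(line: str) -> str:
    """Apply '#define fdopen(fd,mode) NULL', then rescan and apply '#define NULL ((void*)0)'."""
    line = FDOPEN_CALL.sub("NULL", line)
    return NULL_TOKEN.sub("((void*)0)", line)


decl = (
    "FILE *fdopen(int, const char *) "
    "__DARWIN_ALIAS_STARTING(__MAC_10_6, __IPHONE_2_0, __DARWIN_ALIAS(fdopen));"
)
print(expand(decl))
# FILE *((void*)0) __DARWIN_ALIAS_STARTING(__MAC_10_6, __IPHONE_2_0, __DARWIN_ALIAS(fdopen));
```

The trailing __DARWIN_ALIAS(fdopen) is left alone because a function-like macro only expands when its name is followed by an opening parenthesis.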

So I added the following configuration to the WORKSPACE file as the issue describes, but the error persisted:

zlib_version = "1.3.1"

zlib_sha256 = "9a93b2b7dfdac77ceba5a558a580e74667dd6fede4585b91eefb60f03b72df23"

http_archive(
name = "zlib",
build_file = "@com_google_protobuf//:third_party/zlib.BUILD",
sha256 = zlib_sha256,
strip_prefix = "zlib-%s" % zlib_version,
urls = ["https://github.com/madler/zlib/releases/download/v{v}/zlib-{v}.tar.gz".format(v = zlib_version)],
)

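White-box analysis gave me no leads for now, so I fell back on black-box bisection again. Since reproducing this error takes several minutes of compilation, each round is slower, but the approach is still feasible.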

After 11 rounds of bisection I found the commit that makes the build pass. As its PR title suggests, the commit was made specifically to fix the macOS build:

65ae6076f25325528dabf1432d1ff1bedb1c70b3 is the first bad commit
commit 65ae6076f25325528dabf1432d1ff1bedb1c70b3
Author: Dhyey Shah <dhyey2019@gmail.com>
Date: Mon Apr 7 12:41:34 2025 -0400

[core] Patch zlib and clang 17 compliant for mac update (#52020)

Signed-off-by: dayshah <dhyey2019@gmail.com>

.bazelrc | 2 +-
bazel/ray.bzl | 4 ++++
bazel/ray_deps_setup.bzl | 2 ++
src/ray/core_worker/core_worker.h | 9 +++++++--
thirdparty/patches/grpc-zlib-fdopen.patch | 13 +++++++++++++
thirdparty/patches/prometheus-zlib-fdopen.patch | 11 +++++++++++
thirdparty/patches/zlib-fdopen.patch | 19 +++++++++++++++++++
7 files changed, 57 insertions(+), 3 deletions(-)
create mode 100644 thirdparty/patches/grpc-zlib-fdopen.patch
create mode 100644 thirdparty/patches/prometheus-zlib-fdopen.patch
create mode 100644 thirdparty/patches/zlib-fdopen.patch

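Back on ray-2.40.0, I ran git cherry-pick 65ae6076f25325528dabf1432d1ff1bedb1c70b3 to bring that commit over (a few small conflicts need resolving; my own release/2.40.0 branch can serve as a reference), then added pip to setup_requires in setup.py as before. With that, ray-2.40.0 builds successfully on a recent macOS: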

ray git:(6e726cac4f) ✗ pip install -e python --verbose
Using pip 25.3 from /opt/anaconda3/envs/ray-compile-1/lib/python3.11/site-packages/pip (python 3.11)
Obtaining file:///Users/xytan/Desktop/study/ray/python
Running command installing build dependencies
Using pip 25.3 from /opt/anaconda3/envs/ray-compile-1/lib/python3.11/site-packages/pip (python 3.11)
Collecting setuptools>=40.8.0
Obtaining dependency information for setuptools>=40.8.0 from https://files.pythonhosted.org/packages/a3/dc/17031897dae0efacfea57dfd3a82fdd2a2aeb58e0ff71b77b87e44edc772/setuptools-80.9.0-py3-none-any.whl.metadata
Using cached setuptools-80.9.0-py3-none-any.whl.metadata (6.6 kB)
Using cached setuptools-80.9.0-py3-none-any.whl (1.2 MB)
Installing collected packages: setuptools
Successfully installed setuptools-80.9.0
Installing build dependencies ... done
Running command Checking if build backend supports build_editable
Checking if build backend supports build_editable ... done
Running command Getting requirements to build editable
Getting requirements to build editable ... done
Running command installing backend dependencies
Using pip 25.3 from /opt/anaconda3/envs/ray-compile-1/lib/python3.11/site-packages/pip (python 3.11)
Collecting pip
Obtaining dependency information for pip from https://files.pythonhosted.org/packages/44/3c/d717024885424591d5376220b5e836c2d5293ce2011523c9de23ff7bf068/pip-25.3-py3-none-any.whl.metadata
Using cached pip-25.3-py3-none-any.whl.metadata (4.7 kB)
Collecting wheel
Obtaining dependency information for wheel from https://files.pythonhosted.org/packages/0b/2c/87f3254fd8ffd29e4c02732eee68a83a1d3c346ae39bc6822dcbcb697f2b/wheel-0.45.1-py3-none-any.whl.metadata
Using cached wheel-0.45.1-py3-none-any.whl.metadata (2.3 kB)
Collecting cython>=0.29.32
Obtaining dependency information for cython>=0.29.32 from https://files.pythonhosted.org/packages/e0/ba/d785f60564a43bddbb7316134252a55d67ff6f164f0be90c4bf31482da82/cython-3.2.2-cp311-cp311-macosx_11_0_arm64.whl.metadata
Using cached cython-3.2.2-cp311-cp311-macosx_11_0_arm64.whl.metadata (5.0 kB)
Using cached pip-25.3-py3-none-any.whl (1.8 MB)
Using cached wheel-0.45.1-py3-none-any.whl (72 kB)
Using cached cython-3.2.2-cp311-cp311-macosx_11_0_arm64.whl (3.0 MB)
...
...
...
Building editable for ray (pyproject.toml) ... done
Created wheel for ray: filename=ray-2.40.0-0.editable-cp311-cp311-macosx_11_0_arm64.whl size=7304 sha256=5b09461aeadadc13af4d10af9d5c78e4a55a52718113de72f0b02bbeb485c5c3
Stored in directory: /private/var/folders/xx/j9ztcfr55_d_3p24y1v_4mzw0000gn/T/pip-ephem-wheel-cache-njomx5v1/wheels/3b/4a/f0/6edffb2ad8c786ba8990ff9495668d930965bc91921b146ea6
Successfully built ray
Installing collected packages: ray
Attempting uninstall: ray
Found existing installation: ray 2.52.1
Uninstalling ray-2.52.1:
Removing file or directory /opt/anaconda3/envs/ray-compile-1/bin/ray
Removing file or directory /opt/anaconda3/envs/ray-compile-1/bin/serve
Removing file or directory /opt/anaconda3/envs/ray-compile-1/bin/tune
Removing file or directory /opt/anaconda3/envs/ray-compile-1/lib/python3.11/site-packages/__editable__.ray-2.52.1.pth
Removing file or directory /opt/anaconda3/envs/ray-compile-1/lib/python3.11/site-packages/__editable___ray_2_52_1_finder.py
Removing file or directory /opt/anaconda3/envs/ray-compile-1/lib/python3.11/site-packages/__pycache__/__editable___ray_2_52_1_finder.cpython-311.pyc
Removing file or directory /opt/anaconda3/envs/ray-compile-1/lib/python3.11/site-packages/ray-2.52.1.dist-info/
Successfully uninstalled ray-2.52.1
changing mode of /opt/anaconda3/envs/ray-compile-1/bin/ray to 755
changing mode of /opt/anaconda3/envs/ray-compile-1/bin/rllib to 755
changing mode of /opt/anaconda3/envs/ray-compile-1/bin/serve to 755
changing mode of /opt/anaconda3/envs/ray-compile-1/bin/tune to 755
Successfully installed ray-2.40.0

Summary

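While compiling ray-2.52.1 and ray-2.40.0 on macOS, I ran into two tough problems. The first was pip not being found, for which neither the official issues and PRs nor online resources offered a solution. The second was a zlib version incompatibility; I found a candidate fix in an issue, but it did not work when I tried it.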

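With white-box analysis getting nowhere, I turned to git bisect to locate the fixes as a black box. Thanks to its O(log n) cost versus O(n) for a linear search, it efficiently found, among nearly five thousand commits, the two key commits that make ray-2.40.0 compile.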
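To put numbers on the O(log n) versus O(n) comparison: bisecting over n candidate commits takes roughly the smallest k with 2^k >= n test builds. The illustrative helper below (a hypothetical name, not a git command) computes that bound. For the roughly five thousand commits mentioned above it gives 13 builds, compared with up to 5000 for a linear scan, which matters a lot when each build takes several minutes.

```shell
#!/usr/bin/env bash
# Rough bound on bisection rounds: smallest k such that 2^k >= n,
# where n is the number of candidate commits. Illustrative helper only.
bisect_steps() {
  local n=$1 steps=0 span=1
  while [ "$span" -lt "$n" ]; do
    span=$((span * 2))
    steps=$((steps + 1))
  done
  echo "$steps"
}

echo "linear scan: up to 5000 builds"
echo "bisection:   about $(bisect_steps 5000) builds"
```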

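As a result of this investigation, I pushed the two fix commits, applied on top of release/2.40.0, to GitHub. I also posted these findings in an issue that had been open for almost two years, which led to it being resolved and closed. I hope this helps anyone who runs into the same pitfalls later.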

Compiling Ray on CentOS 8

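After finishing the exploration on the MacBook, I moved on to compiling Ray on CentOS 8. Unlike the code-level problems encountered earlier, the challenges here were mostly about environment configuration.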

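Since CentOS 8 reached end of life at the end of 2021, mainstream cloud providers no longer offer it among their official images; the oldest available option is CentOS Stream 9.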

The same goes for Ubuntu: the oldest available version is Ubuntu 22.04, so older images such as Ubuntu 16 cannot be obtained directly.

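Ray can be compiled on these newer images as well, but because their glibc and other core libraries are newer, the resulting binaries often fail to run on older systems.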

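Therefore, when building a HotFix for an older operating system, the recommended approach is to rent a newer machine with the same CPU architecture from a cloud provider, then use Docker to pull the official older CentOS or Ubuntu image and build inside it. This keeps the build environment consistent with production.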

Based on this analysis, I rented an x86 CentOS Stream 9 machine on Google Cloud for the subsequent builds.
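The glibc concern above can be checked before shipping a build: every shared object records the GLIBC_x.y symbol versions it needs, and objdump -T lists them. The sketch below compares the newest required version with the target system's glibc (for example 2.28, the version RHEL/CentOS 8 ships). The helper names are hypothetical, it assumes GNU sort -V and binutils' objdump are available, and the Ray extension path in the usage comment is only an example.

```shell
#!/usr/bin/env bash
# Sketch: does a binary built here need a newer glibc than the target has?
# Assumes GNU coreutils (sort -V) and binutils (objdump).

# Succeeds if version $1 <= version $2 (e.g. 2.17 <= 2.28).
version_le() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Prints the highest GLIBC_x.y symbol version referenced by shared object $1.
max_glibc_required() {
  objdump -T "$1" | grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -Vu | tail -n1
}

# Hypothetical usage against a target running glibc 2.28:
#   required=$(max_glibc_required python/ray/_raylet.so)
#   version_le "$required" 2.28 && echo "ok for the target" || echo "needs glibc $required"
```

Note that version_le compares component by component, so 2.9 correctly sorts before 2.28.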

Environment Setup

  1. Request a machine from a cloud provider as described above and log in via SSH
  2. Install Docker
    sudo yum install docker
  3. Pull the CentOS image of the target version and enter the container
    docker run -it centos:8.1.1911 /bin/bash
  4. Since the official CentOS 8 repositories are no longer in service, point yum at a working mirror (vault.centos.org) inside the container

    sed -i 's|mirrorlist=|#mirrorlist=|g' /etc/yum.repos.d/CentOS-*.repo
    sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-*.repo

    yum clean all
    yum makecache
  5. Install Node.js

    curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
    exec bash
    nvm install 14
    nvm use 14
  6. Clone the Ray repository

    yum install -y git
    git clone https://github.com/onesizefitsquorum/ray.git
    cd ray
  7. Install the C++ build toolchain and Bazel

    yum groupinstall 'Development Tools'
    yum install psmisc
    ci/env/install-bazel.sh
    echo 'export PATH="$PATH:~/bin"' >> ~/.bashrc
    exec bash
  8. Install Anaconda

    yum install wget
    wget https://repo.anaconda.com/archive/Anaconda3-2025.06-1-Linux-x86_64.sh
    sh Anaconda3-2025.06-1-Linux-x86_64.sh
    exec bash
  9. Create and activate the Python environment

    conda create -n ray-compile python=3.11.9
    conda activate ray-compile

Compiling the HotFix Branch

  1. Switch to the maintained HotFix branch
    git checkout release/2.40.0
  2. Build the Dashboard
    cd python/ray/dashboard/client
    npm ci
    npm run build
  3. Build Ray
    1
    2
    3
    4
    cd -
    cd python/
    pip install -r requirements.txt
    pip install -e . --verbose
  4. The build failed, so the cause needed further investigation.

Across several attempts, the errors fell into three main categories:

  1. Conflict between glibc security checks and an external dependency (Fortify out-of-bounds)

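    • This problem arises because the way Ray's external dependency (@upb) performs memory operations conflicts with the strict _FORTIFY_SOURCE security checks of the system glibc.
    • Error log excerpt (key information):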
      error: '__builtin_memcpy(...)' forming offset [9, 16] is out of the bounds [0, 8] of object 'value' ... [-Werror=array-bounds]
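    • Root cause and reason for the change: when upb copies memory, it triggers an array-bounds warning under glibc's _FORTIFY_SOURCE mechanism, and Bazel's -Werror compile option promotes that warning to an error. Because upb's build rules (particularly for Bazel host tools) force -D_FORTIFY_SOURCE=1 together with -Werror, overriding the flags we pass in externally, BAZEL_ARGS must be set to override these options:
      • --host_copt=-U_FORTIFY_SOURCE --copt=-U_FORTIFY_SOURCE: undefine the _FORTIFY_SOURCE macro to bypass the strict security checks.
      • --host_copt=-Wno-error --copt=-Wno-error: stop promoting warnings to errors so that upb compiles.
  2. Build failure caused by the Bazel config file forcing -Werror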

    • Even though -Wno-error is passed through BAZEL_ARGS, a higher-precedence rule in Ray's Bazel config file (.bazelrc) still forces -Werror, turning C++ warnings such as implicit-fallthrough into errors.

    • Error log excerpt (key information):

      src/ray/common/id.cc: In function 'uint64_t ray::MurmurHash64A(const void*, int, unsigned int)':
      src/ray/common/id.cc:106:7: error: this statement may fall through [-Werror=implicit-fallthrough=]
      ...
      cc1plus: all warnings being treated as errors
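    • Root cause and reason for the change: Ray's C++ code uses switch-statement fall-through in functions such as MurmurHash64A, which triggers GCC's -Wimplicit-fallthrough warning. The .bazelrc at the root of the Ray source tree contains a high-precedence rule, build:linux --per_file_copt="-\\.(asm|S)$@-Werror", which promotes every warning to an error for all non-assembly source files, and command-line flags cannot override it. That line (the one that forces -Werror) must therefore be commented out in .bazelrc by hand so that these warnings are tolerated and the core code compiles.
  3. C++ ambiguity error caused by an incompatible GCC version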

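    • This problem appeared when building the Ray 2.40.0 source. Ray's C++ code base is written against a newer C++ standard (such as C++17), and the system's default GCC (likely 8.x or earlier) has shortcomings in handling some features of that standard.
    • Error log excerpt (key information):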
      error: ambiguous overload for 'operator<<' (operand types are 'std::ostringstream' {aka 'std::__cxx11::basic_ostringstream<char>'} and 'std::nullptr_t')
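    • Root cause and reason for the change: this is the classic nullptr_t ambiguity. Ray's logging macro (RayLog) tries to write nullptr (of type std::nullptr_t) to a std::ostringstream. The standard library of older GCC (such as GCC 8.x) has no dedicated operator<< overload for std::nullptr_t, so the compiler cannot decide whether to treat the value as bool or as const void* and reports an ambiguity. Upgrading GCC to 11.2.1 fixes this, because its standard library has more complete C++17 support and removes this conversion ambiguity.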
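Because each failed attempt costs a full build, it helps to triage a saved log quickly. A minimal sketch, assuming the build output was captured with something like pip install -e . --verbose 2>&1 | tee build.log: the hypothetical classify_build_error helper below matches the log excerpts quoted above and prints the corresponding fix. Real logs may phrase things differently, so the patterns are only a starting point.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper for the three failure classes described above.
# The fixed-string patterns come from the quoted log excerpts.
classify_build_error() {
  local log=$1
  if grep -qF 'Werror=array-bounds' "$log"; then
    echo "fortify: set BAZEL_ARGS with -U_FORTIFY_SOURCE and -Wno-error"
  elif grep -qF 'Werror=implicit-fallthrough' "$log"; then
    echo "werror: comment out the -Werror per_file_copt lines in .bazelrc"
  elif grep -qF "ambiguous overload for 'operator<<'" "$log"; then
    echo "gcc: switch to gcc-toolset-11 (GCC 11.2.1)"
  else
    echo "unknown: inspect the first 'error:' line in the log"
  fi
}

# Usage: classify_build_error build.log
```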

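By searching GitHub and the Ray Q&A community, and through many rounds of questions to Gemini and ChatGPT, I resolved these issues one by one after more than ten failed attempts and a lot of tinkering.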

There are three fixes:

  1. Upgrade GCC to 11.2.1: the official build documentation recommends clang12 as the compiler on Ubuntu but gives no recommended GCC version. In practice, GCC 11.2.1 compiles the code successfully.
    yum install gcc-toolset-11
    scl enable gcc-toolset-11 bash
    # Note: scl starts a new shell, so re-activate the conda environment
    conda activate ray-compile
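  2. Set the BAZEL_ARGS environment variable
    • -U_FORTIFY_SOURCE: disables Fortify checks, fixing the upb/memcpy out-of-bounds issue.
    • -Wno-error: stops warnings from being promoted to errors, so external dependencies do not fail on strict warnings.
    • --host_copt / --host_cxxopt: ensure these exemptions also apply to the Bazel build tools (i.e., the host platform), while --copt / --cxxopt cover the target code.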
      export BAZEL_ARGS="--host_copt=-U_FORTIFY_SOURCE --copt=-U_FORTIFY_SOURCE --host_copt=-Wno-error --copt=-Wno-error --host_cxxopt=-Wno-error --cxxopt=-Wno-error"
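  3. Modify the compile options in Ray's .bazelrc: even with BAZEL_ARGS set, the rule build:linux --per_file_copt="-\\.(asm|S)$@-Werror" in the .bazelrc of the Ray source tree takes very high precedence and forces warnings such as implicit-fallthrough to become errors, so it has to be commented out by hand.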
    diff --git a/.bazelrc b/.bazelrc
    index 3c84ce36a7..84a5b3fa7a 100644
    --- a/.bazelrc
    +++ b/.bazelrc
    @@ -43,10 +43,10 @@ build:windows --enable_runfiles
    # for compiling assembly files is fixed on Windows:
    # https://github.com/bazelbuild/bazel/issues/8924
    # Warnings should be errors
    -build:linux --per_file_copt="-\\.(asm|S)$@-Werror"
    -build:macos --per_file_copt="-\\.(asm|S)$@-Werror"
    -build:clang-cl --per_file_copt="-\\.(asm|S)$@-Werror"
    -build:msvc-cl --per_file_copt="-\\.(asm|S)$@-WX"
    +# build:linux --per_file_copt="-\\.(asm|S)$@-Werror"
    +# build:macos --per_file_copt="-\\.(asm|S)$@-Werror"
    +# build:clang-cl --per_file_copt="-\\.(asm|S)$@-Werror"
    +# build:msvc-cl --per_file_copt="-\\.(asm|S)$@-WX"
    # Ignore warnings for protobuf generated files and external projects.
    build --per_file_copt="\\.pb\\.cc$@-w"
    build:linux --per_file_copt="-\\.(asm|S)$,external/.*@-w,-Wno-error=implicit-function-declaration,-Wno-error=unused-function"
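The three fixes can also be scripted so they are applied the same way in every fresh container. The sketch below assumes it is run from the root of the Ray checkout after scl enable gcc-toolset-11 bash. The function names are hypothetical, and the sed expression is just one way to produce the .bazelrc change shown in the diff: it comments out the per_file_copt lines ending in -Werror or -WX and leaves the other lines alone. It writes to a temporary file instead of using sed -i, so it behaves the same with GNU and BSD sed.

```shell
#!/usr/bin/env bash
# Sketch: apply the three CentOS 8 fixes before running pip install.
# Function names are hypothetical; run from the root of the Ray checkout.

# Fix 1: require GCC >= 11 (11.2.1 is the version verified in this post).
# Pass a version string to check it, or omit it to query the active gcc.
gcc_major_ok() {
  local version=${1:-$(gcc -dumpfullversion 2>/dev/null || gcc -dumpversion)}
  [ "${version%%.*}" -ge 11 ]
}

# Fix 2: the same BAZEL_ARGS as above, grouped by purpose:
# undefine _FORTIFY_SOURCE, then keep warnings as warnings (C/C++ and C++-only).
bazel_flags=(
  --host_copt=-U_FORTIFY_SOURCE --copt=-U_FORTIFY_SOURCE
  --host_copt=-Wno-error --copt=-Wno-error
  --host_cxxopt=-Wno-error --cxxopt=-Wno-error
)
export BAZEL_ARGS="${bazel_flags[*]}"

# Fix 3: comment out the per_file_copt rules that force -Werror (or -WX).
disable_werror() {
  local rc=${1:-.bazelrc}
  sed -E 's/^(build:[a-z-]+ --per_file_copt=".*@-(Werror|WX)")$/# \1/' "$rc" > "$rc.tmp" &&
    mv "$rc.tmp" "$rc"
}

# Typical order (left commented out so the sketch has no side effects):
#   gcc_major_ok || { echo "run: scl enable gcc-toolset-11 bash" >&2; exit 1; }
#   disable_werror .bazelrc
#   cd python && pip install -e . --verbose
```

Already-commented lines start with "#" and no longer match, so running disable_werror twice is harmless.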

With the changes above in place, pip install -e . --verbose completes the build successfully.

...
...
...
Options like `package-data`, `include/exclude-package-data` or
`packages.find.exclude/include` may have no effect.

adding '__editable___ray_2_40_0_finder.py'
adding '__editable__.ray-2.40.0.pth'
creating '/tmp/pip-ephem-wheel-cache-rq0i6oso/wheels/3b/a3/3e/5871189f4113432e73b7e4659ab9a4d2edef3998a6dcfea06f/tmp5xknhp72/.tmp-_9ctuqe5/ray-2.40.0-0.editable-cp311-cp311-linux_x86_64.whl' and adding '/tmp/tmpr9lwou7pray-2.40.0-0.editable-cp311-cp311-linux_x86_64.whl' to it
adding 'ray-2.40.0.dist-info/METADATA'
adding 'ray-2.40.0.dist-info/WHEEL'
adding 'ray-2.40.0.dist-info/entry_points.txt'
adding 'ray-2.40.0.dist-info/top_level.txt'
adding 'ray-2.40.0.dist-info/RECORD'
/tmp/pip-build-env-mlv1uqa4/overlay/lib/python3.11/site-packages/setuptools/command/editable_wheel.py:351: InformationOnly: Editable installation.
!!

********************************************************************************
Please be careful with folders in your working directory with the same
name as your package as they may take precedence during imports.
********************************************************************************

!!
with strategy, WheelFile(wheel_path, "w") as wheel_obj:
Building editable for ray (pyproject.toml) ... done
Created wheel for ray: filename=ray-2.40.0-0.editable-cp311-cp311-linux_x86_64.whl size=7272 sha256=18b317c847a6088a316df5f5c98bda8e245fb62cd7acb720a374447e4b94646c
Stored in directory: /tmp/pip-ephem-wheel-cache-rq0i6oso/wheels/3b/a3/3e/5871189f4113432e73b7e4659ab9a4d2edef3998a6dcfea06f
Successfully built ray
Installing collected packages: ray
changing mode of /root/anaconda3/envs/ray-compile/bin/ray to 755
changing mode of /root/anaconda3/envs/ray-compile/bin/rllib to 755
changing mode of /root/anaconda3/envs/ray-compile/bin/serve to 755
changing mode of /root/anaconda3/envs/ray-compile/bin/tune to 755
Successfully installed ray-2.40.0

As a next step, you can also run pip wheel . --verbose to package the build as a wheel for installation in other environments.
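Before copying such a wheel to another machine, it is worth checking that its tags match the target. A small sketch, assuming the wheel was built with pip wheel . --no-deps -w dist/ (both are standard pip wheel options): wheel file names follow the {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl convention, so the tags can be read from the last three dash-separated fields. The helper name wheel_tags is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the python/abi/platform tags encoded in a wheel file name,
# per the convention {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl.
wheel_tags() {
  local base=${1##*/}   # drop any directory prefix
  base=${base%.whl}     # drop the extension
  local -a parts
  IFS=- read -r -a parts <<< "$base"
  local n=${#parts[@]}
  # The tags are always the last three fields, with or without a build tag.
  echo "python=${parts[n-3]} abi=${parts[n-2]} platform=${parts[n-1]}"
}

# Usage: wheel_tags dist/ray-2.40.0-*.whl
```

For example, a cp311 tag means the target needs Python 3.11, matching the conda environment created earlier. A plain linux_x86_64 platform tag, unlike a manylinux tag, makes no promise about the glibc version, which is one more reason to build inside the CentOS 8 container.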

Summary

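As on macOS, building on CentOS 8 involved plenty of pitfalls. Besides using Docker to keep the build environment consistent with production, three compile problems had to be solved: the nullptr_t ambiguity caused by an outdated GCC, the conflict between glibc's Fortify security checks and an external dependency, and the build failure caused by .bazelrc forcing -Werror. Upgrading GCC to 11.2.1, setting the BAZEL_ARGS environment variable, and commenting out the -Werror configuration in .bazelrc finally got the build through.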

Conclusion

This post recorded the full process of compiling Ray 2.40.0 on macOS and CentOS 8, which involved solving five key problems:

macOS build issues:

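  1. Missing pip module: Ray versions before 2.43 fail with No module named pip during compilation; add pip to setup_requires in setup.py. See the commit for details.
  2. zlib compatibility: Ray versions before 2.45 fail to compile on newer macOS because of a zlib version incompatibility; cherry-pick commit 65ae6076f25325528dabf1432d1ff1bedb1c70b3 (#52020) to fix it.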

CentOS 8 build issues:

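  1. GCC too old: CentOS 8's default GCC 8.x hits a nullptr_t ambiguity when handling C++17 features; upgrade to GCC 11.2.1.
  2. glibc Fortify conflict: memory operations in the external dependency upb conflict with glibc's _FORTIFY_SOURCE security checks; disable the relevant checks via BAZEL_ARGS.
  3. -Werror forced on: the -Werror configuration in .bazelrc promotes warnings to errors; comment out the relevant lines by hand. See the commit for details.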

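These fixes have been merged into the release/2.40.0 branch I maintain, and I also posted the solution in the community issue, which has since been resolved. I hope this saves others some detours.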

That concludes this post. Thank you for reading!


Ray Build Pitfalls: Compiling an Old Version on an Old System
https://tanxinyu.work/ray-compile/
Author
谭新宇
Published
December 6, 2025
License