Bazel Causing Server to Become Unresponsive #23931

Closed
Boring545 opened this issue Oct 10, 2024 · 6 comments
Labels
awaiting-user-response (Awaiting a response from the author), more data needed, team-Local-Exec (Issues and PRs for the Execution (Local) team), type: bug, untriaged

Comments

@Boring545

Description of the bug:

During a Bazel build, my server became completely unresponsive after the build had been running for some time. I tried to limit resource usage with the following options:

--jobs=8 --local_cpu_resources=HOST_CPUS*.5 --local_ram_resources=HOST_RAM*.5

However, this had no effect: the server still froze. My server has 127 CPU cores, and during the build the status line shows "127 actions, 127 running." Strangely, it still shows 127 actions running even with the options above. How can I properly limit Bazel's resource usage to keep the server from crashing? I can't provide more details on system resource usage because the server freezes completely during the build.
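For reference, here is a sketch of the same limits passed directly on the command line, assuming Bazel 6.x flag names; `//...` is a placeholder target pattern. Quoting the `HOST_CPUS*.5` and `HOST_RAM*.5` expressions keeps the shell from treating `*` as a glob. Note that `--local_cpu_resources` and `--local_ram_resources` only set the budget that Bazel's local scheduler draws from, using each action's *estimated* resource needs. They do not cap the memory that compiler or linker processes actually consume, so `--jobs` is the more direct lever when real usage far exceeds Bazel's estimates. If the status line still shows 127 running actions, `--announce_rc` can confirm whether these flags actually reached the build.

```shell
# Sketch (Bazel 6.x): at most 8 concurrent actions, and a scheduler budget of
# half the host's CPUs and RAM. //... is a placeholder target pattern.
# The quotes prevent the shell from glob-expanding the '*' characters.
bazel build \
  --jobs=8 \
  '--local_cpu_resources=HOST_CPUS*.5' \
  '--local_ram_resources=HOST_RAM*.5' \
  //...
```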

Which category does this issue belong to?

No response

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

No response

Which operating system are you running Bazel on?

openEuler for riscv64

What is the output of bazel info release?

release 6.5.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

No response

Have you found anything relevant by searching the web?

#11868

Any other information, logs, or outputs that you want to share?

No response

@satyanandak added the team-OSS label (Issues for the Bazel OSS team: installation, release process, Bazel packaging, website) on Oct 10, 2024
@Boring545
Author

Boring545 commented Oct 11, 2024

I found the cause of the server crash: Bazel was consuming all of the system's memory without limit, which eventually exhausted resources and crashed the server. How can I fix this? I set --local_ram_resources=HOST_RAM*.5, but it didn't help.
Should I use --host_jvm_args=-Xmx64g instead (the system has 121 GB of memory)?
Please help me.
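`--host_jvm_args` is a startup option, so it goes before the command name (or on a `startup` line in a .bazelrc), and changing it makes Bazel restart its server. Below is a sketch reusing the 64g value from the comment above; `//...` is a placeholder. Keep in mind that `-Xmx` bounds only the Bazel server's Java heap. Compilers and other action processes run outside that JVM, so this flag alone may not prevent memory exhaustion caused by many parallel compile actions.

```shell
# --host_jvm_args is a startup option: it must appear before 'build'.
# -Xmx64g limits only the Bazel server's JVM heap, not the compiler
# processes that build actions spawn.
bazel --host_jvm_args=-Xmx64g build //...

# Equivalent persistent form, as a line in a .bazelrc file:
#   startup --host_jvm_args=-Xmx64g
```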

@meisterT meisterT added team-Local-Exec Issues and PRs for the Execution (Local) team and removed team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website labels Oct 12, 2024
@meisterT
Member

Can you share a JSON trace profile? Also, can you use --announce_rc to see whether there are any other flags that could be relevant?
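A minimal sketch of collecting both pieces of information requested here, assuming Bazel 6.x: `--profile` writes a JSON trace profile to the given path, `--announce_rc` prints every rc option that applies, and `bazel analyze-profile` summarizes the profile afterwards. The file can also be opened in chrome://tracing or Perfetto. The path `/tmp/bazel_profile.json` is an arbitrary example, and `//...` is a placeholder target pattern.

```shell
# Record a JSON trace profile and print all rc options that apply.
# /tmp/bazel_profile.json is an arbitrary example path.
bazel build --announce_rc --profile=/tmp/bazel_profile.json //...

# Print a summary of the recorded profile.
bazel analyze-profile /tmp/bazel_profile.json
```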

@meisterT
Member

Are you using the embedded JDK or do you specify one yourself?

@Boring545
Author

I'm sorry for the late reply. I didn't specify any JDK for Bazel. One relevant detail is that my server has a RISC-V CPU, and my colleague built Bazel 6.5.0 for that architecture. The issue may stem from our Bazel being an unofficial build. Later, I limited the build to --jobs=4, which kept memory usage within the machine's available RAM.

@Boring545
Author

After adding the --announce_rc flag, bazel build produces the following output:

[zjq@openeuler-riscv-4-4 proxy]$ make build
Starting local Bazel server and connecting to it...
export PATH=/usr/lib/llvm-10/bin:/home/zjq/riscv_istio_test/go_golang/go/bin:/home/zjq/riscv_istio_test/go_golang/golang/bin:/home/zjq/.cargo/bin:/home/zjq/.wasmtime/bin:/home/zjq/local/bazel/bin:/home/zjq/.local/bin:/home/zjq/bin:/home/zjq/.cabal/bin:/home/zjq/build_factory/GHC/cabal/cabal2/cabal/_build/bin:/usr/local/bin:/usr/bin CC=gcc CXX=g++ && \
bazel  build  --announce_rc --fission=no --local_cpu_resources=50 --local_ram_resources=32768 --jobs=4   //...
INFO: Reading 'startup' options from /home/zjq/build_factory/proxy/proxy/envoy.bazelrc: --host_jvm_args=-Xmx3g
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=149
INFO: Reading rc options for 'build' from /home/zjq/build_factory/proxy/proxy/envoy.bazelrc:
  Inherited 'common' options: --experimental_allow_tags_propagation
INFO: Reading rc options for 'build' from /home/zjq/build_factory/proxy/proxy/envoy.bazelrc:
  'build' options: --color=yes --workspace_status_command=bash bazel/get_workspace_status --incompatible_strict_action_env --java_runtime_version=remotejdk_11 --tool_java_runtime_version=remotejdk_11 --platform_mappings=bazel/platform_mappings --copt=-DABSL_MIN_LOG_LEVEL=4 --define envoy_mobile_listener=enabled --experimental_repository_downloader_retries=2 --action_env=CC --host_action_env=CC --action_env=CXX --host_action_env=CXX --action_env=LLVM_CONFIG --host_action_env=LLVM_CONFIG --action_env=PATH --host_action_env=PATH --action_env=BAZEL_VOLATILE_DIRTY --host_action_env=BAZEL_VOLATILE_DIRTY --action_env=BAZEL_FAKE_SCM_REVISION --host_action_env=BAZEL_FAKE_SCM_REVISION --enable_platform_specific_config --test_summary=terse --incompatible_config_setting_private_default_visibility --incompatible_enforce_config_setting_visibility --define absl=1 --@com_googlesource_googleurl//build_config:system_icu=0 --test_env=HEAPCHECK=normal --test_env=PPROF_PATH
INFO: Reading rc options for 'build' from /home/zjq/build_factory/proxy/proxy/.bazelrc:
  'build' options: --workspace_status_command=bazel/bazel_get_workspace_status --define path_normalization_by_default=true --define tcmalloc=gperftools --define wasm=v8 --copt -DNULL_PLUGIN --cxxopt -Wformat --cxxopt -Wformat-security --host_linkopt=-pthread --action_env=CXXFLAGS=-Wno-unused-variable
INFO: Found applicable config definition build:linux in file /home/zjq/build_factory/proxy/proxy/envoy.bazelrc: --copt=-fPIC --copt=-Wno-deprecated-declarations --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --conlyopt=-fexceptions --fission=dbg,opt --features=per_object_debug_info --action_env=BAZEL_LINKLIBS=-l%:libstdc++.a --action_env=BAZEL_LINKOPTS=-lm --per_file_copt=external/com_github_datadog_dd_opentracing_cpp/.*.cpp@-Wno-type-limits

@meisterT
Member

Can you share a JSON trace profile?

@meisterT added the awaiting-user-response label (Awaiting a response from the author) on Nov 19, 2024
@oquenchil closed this as not planned (won't fix, can't repro, duplicate, stale) on Nov 26, 2024