Commit 6d06ac2

joyeecheung authored and MylesBorins committed
src: add --heapsnapshot-near-heap-limit option
This patch adds a --heapsnapshot-near-heap-limit CLI option that takes heap snapshots when the V8 heap is approaching the heap size limit. It will try to write the snapshots to disk before the program crashes due to OOM.

PR-URL: #33010
Refs: #27552
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Gireesh Punathil <[email protected]>
1 parent 6544cfb commit 6d06ac2

20 files changed, +496 -21 lines changed

doc/api/cli.md

+47
@@ -337,6 +337,52 @@ reference. Code may break under this flag.
 `--require` runs prior to freezing intrinsics in order to allow polyfills to
 be added.
 
+### `--heapsnapshot-near-heap-limit=max_count`
+<!-- YAML
+added: REPLACEME
+-->
+
+> Stability: 1 - Experimental
+
+Writes a V8 heap snapshot to disk when the V8 heap usage is approaching the
+heap limit. `max_count` should be a non-negative integer (in which case
+Node.js will write no more than `max_count` snapshots to disk).
+
+When generating snapshots, garbage collection may be triggered and bring
+the heap usage down, therefore multiple snapshots may be written to disk
+before the Node.js instance finally runs out of memory. These heap snapshots
+can be compared to determine what objects are being allocated during the
+time consecutive snapshots are taken. It's not guaranteed that Node.js will
+write exactly `max_count` snapshots to disk, but it will try
+its best to generate at least one and up to `max_count` snapshots before the
+Node.js instance runs out of memory when `max_count` is greater than `0`.
+
+Generating V8 snapshots takes time and memory (both memory managed by the
+V8 heap and native memory outside the V8 heap). The bigger the heap is,
+the more resources it needs. Node.js will adjust the V8 heap to accommodate
+the additional V8 heap memory overhead, and try its best to avoid using up
+all the memory available to the process. When the process uses
+more memory than the system deems appropriate, the process may be terminated
+abruptly by the system, depending on the system configuration.
+
+```console
+$ node --max-old-space-size=100 --heapsnapshot-near-heap-limit=3 index.js
+Wrote snapshot to Heap.20200430.100036.49580.0.001.heapsnapshot
+Wrote snapshot to Heap.20200430.100037.49580.0.002.heapsnapshot
+Wrote snapshot to Heap.20200430.100038.49580.0.003.heapsnapshot
+
+<--- Last few GCs --->
+
+[49580:0x110000000] 4826 ms: Mark-sweep 130.6 (147.8) -> 130.5 (147.8) MB, 27.4 / 0.0 ms (average mu = 0.126, current mu = 0.034) allocation failure scavenge might not succeed
+[49580:0x110000000] 4845 ms: Mark-sweep 130.6 (147.8) -> 130.6 (147.8) MB, 18.8 / 0.0 ms (average mu = 0.088, current mu = 0.031) allocation failure scavenge might not succeed
+
+
+<--- JS stacktrace --->
+
+FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
+....
+```
+
 ### `--heapsnapshot-signal=signal`
 <!-- YAML
 added: v12.0.0
@@ -1285,6 +1331,7 @@ Node.js options that are allowed are:
 * `--force-context-aware`
 * `--force-fips`
 * `--frozen-intrinsics`
+* `--heapsnapshot-near-heap-limit`
 * `--heapsnapshot-signal`
 * `--http-parser`
 * `--icu-data-dir`

doc/node.1

+4
@@ -185,6 +185,10 @@ Same requirements as
 .It Fl -frozen-intrinsics
 Enable experimental frozen intrinsics support.
 .
+.It Fl -heapsnapshot-near-heap-limit Ns = Ns Ar max_count
+Generate heap snapshot when the V8 heap usage is approaching the heap limit.
+No more than the specified number of snapshots will be generated.
+.
 .It Fl -heapsnapshot-signal Ns = Ns Ar signal
 Generate heap snapshot on specified signal.
 .

src/debug_utils.h

+1
@@ -41,6 +41,7 @@ void FWrite(FILE* file, const std::string& str);
 // from a provider type to a debug category.
 #define DEBUG_CATEGORY_NAMES(V) \
   NODE_ASYNC_PROVIDER_TYPES(V) \
+  V(DIAGNOSTICS) \
   V(HUGEPAGES) \
   V(INSPECTOR_SERVER) \
   V(INSPECTOR_PROFILER) \

src/env-inl.h

+16
@@ -704,6 +704,22 @@ inline const std::string& Environment::exec_path() const {
   return exec_path_;
 }
 
+inline std::string Environment::GetCwd() {
+  char cwd[PATH_MAX_BYTES];
+  size_t size = PATH_MAX_BYTES;
+  const int err = uv_cwd(cwd, &size);
+
+  if (err == 0) {
+    CHECK_GT(size, 0);
+    return cwd;
+  }
+
+  // This can fail if the cwd is deleted. In that case, fall back to
+  // exec_path.
+  const std::string& exec_path = exec_path_;
+  return exec_path.substr(0, exec_path.find_last_of(kPathSeparator));
+}
+
 #if HAVE_INSPECTOR
 inline void Environment::set_coverage_directory(const char* dir) {
   coverage_directory_ = std::string(dir);
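
The fallback in `Environment::GetCwd()` above can be illustrated in isolation. The following is a minimal sketch, not the Node.js implementation: `ResolveOutputDir` is a hypothetical helper, and the `cwd_ok` flag stands in for the result of `uv_cwd()`, which the sketch does not call.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Node.js): picks the directory used for
// diagnostic output. `cwd_ok` stands in for `uv_cwd()` returning 0.
std::string ResolveOutputDir(bool cwd_ok,
                             const std::string& cwd,
                             const std::string& exec_path,
                             char separator) {
  if (cwd_ok) {
    assert(!cwd.empty());  // Mirrors CHECK_GT(size, 0) in the patch.
    return cwd;
  }
  // The cwd lookup failed (for example, the directory was deleted), so fall
  // back to the directory part of the executable path. If no separator is
  // present, find_last_of() returns npos and the whole string is returned.
  return exec_path.substr(0, exec_path.find_last_of(separator));
}
```

Under these assumptions, a failed lookup with an executable path of `/usr/local/bin/node` resolves to `/usr/local/bin`.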

src/env.cc

+146
@@ -3,6 +3,7 @@
 #include "async_wrap.h"
 #include "base_object-inl.h"
 #include "debug_utils-inl.h"
+#include "diagnosticfilename-inl.h"
 #include "memory_tracker-inl.h"
 #include "node_buffer.h"
 #include "node_context_data.h"
@@ -22,6 +23,7 @@
 #include <algorithm>
 #include <atomic>
 #include <cstdio>
+#include <limits>
 #include <memory>
 
 namespace node {
@@ -465,6 +467,11 @@ Environment::~Environment() {
   // FreeEnvironment() should have set this.
   CHECK(is_stopping());
 
+  if (options_->heap_snapshot_near_heap_limit > heap_limit_snapshot_taken_) {
+    isolate_->RemoveNearHeapLimitCallback(Environment::NearHeapLimitCallback,
+                                          0);
+  }
+
   isolate()->GetHeapProfiler()->RemoveBuildEmbedderGraphCallback(
       BuildEmbedderGraph, this);
 
@@ -1097,6 +1104,25 @@ void Environment::VerifyNoStrongBaseObjects() {
   });
 }
 
+uint64_t GuessMemoryAvailableToTheProcess() {
+  uint64_t free_in_system = uv_get_free_memory();
+  size_t allowed = uv_get_constrained_memory();
+  if (allowed == 0) {
+    return free_in_system;
+  }
+  size_t rss;
+  int err = uv_resident_set_memory(&rss);
+  if (err) {
+    return free_in_system;
+  }
+  if (allowed < rss) {
+    // Something is probably wrong. Fallback to the free memory.
+    return free_in_system;
+  }
+  // There may still be room for swap, but we will just leave it here.
+  return allowed - rss;
+}
+
 void Environment::BuildEmbedderGraph(Isolate* isolate,
                                      EmbedderGraph* graph,
                                      void* data) {
@@ -1109,6 +1135,126 @@ void Environment::BuildEmbedderGraph(Isolate* isolate,
   });
 }
 
+size_t Environment::NearHeapLimitCallback(void* data,
+                                          size_t current_heap_limit,
+                                          size_t initial_heap_limit) {
+  Environment* env = static_cast<Environment*>(data);
+
+  Debug(env,
+        DebugCategory::DIAGNOSTICS,
+        "Invoked NearHeapLimitCallback, processing=%d, "
+        "current_limit=%" PRIu64 ", "
+        "initial_limit=%" PRIu64 "\n",
+        env->is_processing_heap_limit_callback_,
+        static_cast<uint64_t>(current_heap_limit),
+        static_cast<uint64_t>(initial_heap_limit));
+
+  size_t max_young_gen_size = env->isolate_data()->max_young_gen_size;
+  size_t young_gen_size = 0;
+  size_t old_gen_size = 0;
+
+  v8::HeapSpaceStatistics stats;
+  size_t num_heap_spaces = env->isolate()->NumberOfHeapSpaces();
+  for (size_t i = 0; i < num_heap_spaces; ++i) {
+    env->isolate()->GetHeapSpaceStatistics(&stats, i);
+    if (strcmp(stats.space_name(), "new_space") == 0 ||
+        strcmp(stats.space_name(), "new_large_object_space") == 0) {
+      young_gen_size += stats.space_used_size();
+    } else {
+      old_gen_size += stats.space_used_size();
+    }
+  }
+
+  Debug(env,
+        DebugCategory::DIAGNOSTICS,
+        "max_young_gen_size=%" PRIu64 ", "
+        "young_gen_size=%" PRIu64 ", "
+        "old_gen_size=%" PRIu64 ", "
+        "total_size=%" PRIu64 "\n",
+        static_cast<uint64_t>(max_young_gen_size),
+        static_cast<uint64_t>(young_gen_size),
+        static_cast<uint64_t>(old_gen_size),
+        static_cast<uint64_t>(young_gen_size + old_gen_size));
+
+  uint64_t available = GuessMemoryAvailableToTheProcess();
+  // TODO(joyeecheung): get a better estimate about the native memory
+  // usage into the overhead, e.g. based on the count of objects.
+  uint64_t estimated_overhead = max_young_gen_size;
+  Debug(env,
+        DebugCategory::DIAGNOSTICS,
+        "Estimated available memory=%" PRIu64 ", "
+        "estimated overhead=%" PRIu64 "\n",
+        static_cast<uint64_t>(available),
+        static_cast<uint64_t>(estimated_overhead));
+
+  // This might be hit when the snapshot is being taken in another
+  // NearHeapLimitCallback invocation.
+  // When taking the snapshot, objects in the young generation may be
+  // promoted to the old generation, result in increased heap usage,
+  // but it should be no more than the young generation size.
+  // Ideally, this should be as small as possible - the heap limit
+  // can only be restored when the heap usage falls down below the
+  // new limit, so in a heap with unbounded growth the isolate
+  // may eventually crash with this new limit - effectively raising
+  // the heap limit to the new one.
+  if (env->is_processing_heap_limit_callback_) {
+    size_t new_limit = initial_heap_limit + max_young_gen_size;
+    Debug(env,
+          DebugCategory::DIAGNOSTICS,
+          "Not generating snapshots in nested callback. "
+          "new_limit=%" PRIu64 "\n",
+          static_cast<uint64_t>(new_limit));
+    return new_limit;
+  }
+
+  // Estimate whether the snapshot is going to use up all the memory
+  // available to the process. If so, just give up to prevent the system
+  // from killing the process for a system OOM.
+  if (estimated_overhead > available) {
+    Debug(env,
+          DebugCategory::DIAGNOSTICS,
+          "Not generating snapshots because it's too risky.\n");
+    env->isolate()->RemoveNearHeapLimitCallback(NearHeapLimitCallback,
+                                                initial_heap_limit);
+    return current_heap_limit;
+  }
+
+  // Take the snapshot synchronously.
+  env->is_processing_heap_limit_callback_ = true;
+
+  std::string dir = env->options()->diagnostic_dir;
+  if (dir.empty()) {
+    dir = env->GetCwd();
+  }
+  DiagnosticFilename name(env, "Heap", "heapsnapshot");
+  std::string filename = dir + kPathSeparator + (*name);
+
+  Debug(env, DebugCategory::DIAGNOSTICS, "Start generating %s...\n", *name);
+
+  // Remove the callback first in case it's triggered when generating
+  // the snapshot.
+  env->isolate()->RemoveNearHeapLimitCallback(NearHeapLimitCallback,
+                                              initial_heap_limit);
+
+  heap::WriteSnapshot(env->isolate(), filename.c_str());
+  env->heap_limit_snapshot_taken_ += 1;
+
+  // Don't take more snapshots than the number specified by
+  // --heapsnapshot-near-heap-limit.
+  if (env->heap_limit_snapshot_taken_ <
+      env->options_->heap_snapshot_near_heap_limit) {
+    env->isolate()->AddNearHeapLimitCallback(NearHeapLimitCallback, env);
+  }
+
+  FPrintF(stderr, "Wrote snapshot to %s\n", filename.c_str());
+  // Tell V8 to reset the heap limit once the heap usage falls down to
+  // 95% of the initial limit.
+  env->isolate()->AutomaticallyRestoreInitialHeapLimit(0.95);
+
+  env->is_processing_heap_limit_callback_ = false;
+  return initial_heap_limit;
+}
+
 inline size_t Environment::SelfSize() const {
   size_t size = sizeof(*this);
   // Remove non pointer fields that will be tracked in MemoryInfo()
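
The branching in `GuessMemoryAvailableToTheProcess()` and `NearHeapLimitCallback()` above may be easier to follow as pure functions. This is a rough sketch under stated assumptions, not the code in the patch: `GuessAvailable`, `Decide`, `Action`, and `Decision` are hypothetical names, and the values that the real code reads from libuv and V8 are passed in as plain parameters.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical restatement of the memory estimate. `allowed == 0` means no
// memory constraint is known; `rss_known` stands in for
// uv_resident_set_memory() succeeding.
uint64_t GuessAvailable(uint64_t free_in_system,
                        uint64_t allowed,
                        bool rss_known,
                        uint64_t rss) {
  if (allowed == 0 || !rss_known || allowed < rss) {
    return free_in_system;  // Fall back to the free memory in the system.
  }
  assert(allowed >= rss);  // The subtraction below cannot underflow.
  return allowed - rss;
}

// Hypothetical labels for the three outcomes of the callback.
enum class Action { kRaiseLimitOnly, kGiveUp, kTakeSnapshot };

struct Decision {
  Action action;
  uint64_t returned_limit;  // The heap limit handed back to V8.
};

Decision Decide(bool already_processing,
                uint64_t current_limit,
                uint64_t initial_limit,
                uint64_t max_young_gen_size,
                uint64_t available) {
  if (already_processing) {
    // Nested invocation while a snapshot is being written: only leave room
    // for objects promoted out of the young generation.
    return {Action::kRaiseLimitOnly, initial_limit + max_young_gen_size};
  }
  // The patch uses the maximum young generation size as the overhead estimate.
  const uint64_t estimated_overhead = max_young_gen_size;
  if (estimated_overhead > available) {
    return {Action::kGiveUp, current_limit};  // Too risky; skip the snapshot.
  }
  return {Action::kTakeSnapshot, initial_limit};
}
```

Separating the decision from the side effects (removing the callback, writing the file) is a choice made for this illustration only; the patch performs both in a single function.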

src/env.h

+10
@@ -537,6 +537,7 @@ class IsolateData : public MemoryRetainer {
 #undef VP
   inline v8::Local<v8::String> async_wrap_provider(int index) const;
 
+  size_t max_young_gen_size = 1;
   std::unordered_map<const char*, v8::Eternal<v8::String>> static_str_map;
 
   inline v8::Isolate* isolate() const;
@@ -857,6 +858,9 @@ class Environment : public MemoryRetainer {
   void VerifyNoStrongBaseObjects();
   // Should be called before InitializeInspector()
   void InitializeDiagnostics();
+
+  std::string GetCwd();
+
 #if HAVE_INSPECTOR
   // If the environment is created for a worker, pass parent_handle and
   // the ownership if transferred into the Environment.
@@ -1220,6 +1224,9 @@ class Environment : public MemoryRetainer {
   inline void RemoveCleanupHook(CleanupCallback cb, void* arg);
   void RunCleanup();
 
+  static size_t NearHeapLimitCallback(void* data,
+                                      size_t current_heap_limit,
+                                      size_t initial_heap_limit);
   static void BuildEmbedderGraph(v8::Isolate* isolate,
                                  v8::EmbedderGraph* graph,
                                  void* data);
@@ -1340,6 +1347,9 @@ class Environment : public MemoryRetainer {
   std::vector<std::string> argv_;
   std::string exec_path_;
 
+  bool is_processing_heap_limit_callback_ = false;
+  int64_t heap_limit_snapshot_taken_ = 0;
+
   uint32_t module_id_counter_ = 0;
   uint32_t script_id_counter_ = 0;
   uint32_t function_id_counter_ = 0;

src/heap_utils.cc

+3-3
@@ -313,7 +313,9 @@ inline void TakeSnapshot(Isolate* isolate, v8::OutputStream* out) {
   snapshot->Serialize(out, HeapSnapshot::kJSON);
 }
 
-inline bool WriteSnapshot(Isolate* isolate, const char* filename) {
+} // namespace
+
+bool WriteSnapshot(Isolate* isolate, const char* filename) {
   FILE* fp = fopen(filename, "w");
   if (fp == nullptr)
     return false;
@@ -323,8 +325,6 @@ inline bool WriteSnapshot(Isolate* isolate, const char* filename) {
   return true;
 }
 
-} // namespace
-
 void DeleteHeapSnapshot(const HeapSnapshot* snapshot) {
   const_cast<HeapSnapshot*>(snapshot)->Delete();
 }

src/inspector_profiler.cc

+2-18
@@ -418,22 +418,6 @@ static void EndStartedProfilers(Environment* env) {
   }
 }
 
-std::string GetCwd(Environment* env) {
-  char cwd[PATH_MAX_BYTES];
-  size_t size = PATH_MAX_BYTES;
-  const int err = uv_cwd(cwd, &size);
-
-  if (err == 0) {
-    CHECK_GT(size, 0);
-    return cwd;
-  }
-
-  // This can fail if the cwd is deleted. In that case, fall back to
-  // exec_path.
-  const std::string& exec_path = env->exec_path();
-  return exec_path.substr(0, exec_path.find_last_of(kPathSeparator));
-}
-
 void StartProfilers(Environment* env) {
   AtExit(env, [](void* env) {
     EndStartedProfilers(static_cast<Environment*>(env));
@@ -451,7 +435,7 @@ void StartProfilers(Environment* env) {
   if (env->options()->cpu_prof) {
     const std::string& dir = env->options()->cpu_prof_dir;
     env->set_cpu_prof_interval(env->options()->cpu_prof_interval);
-    env->set_cpu_prof_dir(dir.empty() ? GetCwd(env) : dir);
+    env->set_cpu_prof_dir(dir.empty() ? env->GetCwd() : dir);
     if (env->options()->cpu_prof_name.empty()) {
       DiagnosticFilename filename(env, "CPU", "cpuprofile");
       env->set_cpu_prof_name(*filename);
@@ -466,7 +450,7 @@ void StartProfilers(Environment* env) {
   if (env->options()->heap_prof) {
     const std::string& dir = env->options()->heap_prof_dir;
     env->set_heap_prof_interval(env->options()->heap_prof_interval);
-    env->set_heap_prof_dir(dir.empty() ? GetCwd(env) : dir);
+    env->set_heap_prof_dir(dir.empty() ? env->GetCwd() : dir);
     if (env->options()->heap_prof_name.empty()) {
       DiagnosticFilename filename(env, "Heap", "heapprofile");
       env->set_heap_prof_name(*filename);

src/node.cc

+4
@@ -267,6 +267,10 @@ static void AtomicsWaitCallback(Isolate::AtomicsWaitEvent event,
 void Environment::InitializeDiagnostics() {
   isolate_->GetHeapProfiler()->AddBuildEmbedderGraphCallback(
       Environment::BuildEmbedderGraph, this);
+  if (options_->heap_snapshot_near_heap_limit > 0) {
+    isolate_->AddNearHeapLimitCallback(Environment::NearHeapLimitCallback,
+                                       this);
+  }
   if (options_->trace_uncaught)
     isolate_->SetCaptureStackTraceForUncaughtExceptions(true);
   if (options_->trace_atomics_wait) {
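
Taken together, the registration in `InitializeDiagnostics()`, the re-registration inside the callback, and the check in `~Environment()` keep the number of snapshots at or below `max_count`. The sketch below models only that bookkeeping; `SnapshotBudget` and its members are hypothetical names, and no V8 or Node.js API is involved.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the callback bookkeeping in the patch.
struct SnapshotBudget {
  int64_t max_count;  // Value passed to --heapsnapshot-near-heap-limit.
  int64_t taken = 0;  // Plays the role of heap_limit_snapshot_taken_.
  bool callback_registered = false;

  explicit SnapshotBudget(int64_t max) : max_count(max) {
    assert(max >= 0);  // The option is documented as non-negative.
    // InitializeDiagnostics() registers the callback only when max > 0.
    callback_registered = max_count > 0;
  }

  // Simulates the heap nearing its limit. Returns true if a snapshot would
  // be written on this occasion.
  bool OnNearHeapLimit() {
    if (!callback_registered) return false;
    callback_registered = false;  // Removed before the snapshot is written.
    ++taken;
    if (taken < max_count) {
      callback_registered = true;  // Re-added while budget remains.
    }
    return true;
  }

  // Mirrors the destructor condition for removing the callback.
  bool NeedsRemovalOnTeardown() const { return max_count > taken; }
};
```

With a budget of 3, the model writes a snapshot on the first three near-limit events and ignores later ones, which matches the three `Wrote snapshot to ...` lines in the documented example.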
