Observing Host OS panic/crash due to use-after-free error related to CPU OS memory when gdr_unmap is not called before gdr_close
#314
Impacted platform
All server-side products; first observed on a Grace-Hopper system
Impacted gdrcopy versions
2.0 and later
Impacted gdrcopy configs
Both persistent and non-persistent mode
Scenarios
If an application opens a connection to the driver (gdr_open), allocates GPU memory via CUDA, pins and maps that memory to the CPU (gdr_pin_buffer, gdr_map) for read/write operations, and then closes the connection (gdr_close) without explicitly unmapping the GPU memory, a use-after-free (UAF) condition on OS memory results. This can cause functional issues in unrelated areas, or even a kernel panic or crash.
Known Mitigations
Applications should explicitly call gdr_unmap before gdr_close.
Fixed gdrcopy version
2.4.4
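For versions before the fix, the known mitigation can be sketched roughly as below. This is a minimal illustration, not code from gdrcopy itself: with_gdr_mapping is a hypothetical helper name, HAVE_GDRAPI is a hypothetical build macro, and the stand-in functions in the #else branch are placeholders that only mirror the gdrapi.h signatures (as I understand them) so the sketch compiles without the library or a GPU. The device address is assumed to come from a CUDA allocation and to be GPU-page aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef HAVE_GDRAPI              /* hypothetical macro: build against real gdrcopy */
#include <gdrapi.h>
#else
/* Stand-ins (NOT the real library): they mimic the gdrapi.h calls used
 * below and record the order in which they are invoked. */
typedef struct gdr { int unused; } *gdr_t;
typedef struct gdr_mh_s { unsigned long h; } gdr_mh_t;
static struct gdr fake_dev;
static char fake_buf[64];
static char call_log[16];
static int call_n;
static void note(char c) { assert(call_n < 15); call_log[call_n++] = c; }
static gdr_t gdr_open(void) { note('o'); return &fake_dev; }
static int gdr_close(gdr_t g) { (void)g; note('c'); return 0; }
static int gdr_pin_buffer(gdr_t g, unsigned long addr, size_t size,
                          uint64_t p2p_token, uint32_t va_space,
                          gdr_mh_t *handle)
{
    (void)g; (void)size; (void)p2p_token; (void)va_space;
    handle->h = addr; note('p'); return 0;
}
static int gdr_unpin_buffer(gdr_t g, gdr_mh_t mh)
{ (void)g; (void)mh; note('n'); return 0; }
static int gdr_map(gdr_t g, gdr_mh_t mh, void **va, size_t size)
{ (void)g; (void)mh; (void)size; *va = fake_buf; note('m'); return 0; }
static int gdr_unmap(gdr_t g, gdr_mh_t mh, void *va, size_t size)
{ (void)g; (void)mh; (void)va; (void)size; note('u'); return 0; }
#endif

/* Pin and map a GPU buffer, then tear down in the safe order:
 * gdr_unmap -> gdr_unpin_buffer -> gdr_close.
 * d_ptr is assumed to be a device address from a CUDA allocation.
 * Returns 0 on success, or the first non-zero error code seen. */
static int with_gdr_mapping(unsigned long d_ptr, size_t size)
{
    gdr_t g = gdr_open();
    if (!g)
        return -1;

    gdr_mh_t mh;
    void *va = NULL;
    int rc = gdr_pin_buffer(g, d_ptr, size, 0, 0, &mh);
    if (rc == 0) {
        rc = gdr_map(g, mh, &va, size);
        if (rc == 0) {
            /* ... CPU reads/writes through va would go here ... */

            /* The mitigation: unmap explicitly BEFORE closing the handle. */
            int urc = gdr_unmap(g, mh, va, size);
            if (urc != 0)
                rc = urc;
        }
        int prc = gdr_unpin_buffer(g, mh);
        if (prc != 0 && rc == 0)
            rc = prc;
    }

    int crc = gdr_close(g);
    return rc != 0 ? rc : crc;
}
```

Applications that keep the mapping alive across several functions would need the same ordering on every exit path, including error paths, so funnelling teardown through one cleanup routine is probably the simplest way to guarantee that gdr_close is never reached with a live mapping.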