Commit 6486678

Michal Hocko authored and committed
Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"
This reverts f9054c7 ("mm, mempool: only set __GFP_NOMEMALLOC if there are free elements").

There has been a report about the OOM killer being invoked when swapping out to a dm-crypt device. The primary reason seems to be that the swapout IO managed to deplete the memory reserves completely. Ondrej was able to bisect the issue and pointed to f9054c7 ("mm, mempool: only set __GFP_NOMEMALLOC if there are free elements").

The reason is that the swapout path is not throttled properly, because the md-raid layer needs to allocate from the generic_make_request path, which means it allocates from the PF_MEMALLOC context. The dm layer uses mempool_alloc in order to guarantee forward progress, which used to inhibit access to memory reserves when using the page allocator. This changed with f9054c7 ("mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"), which dropped the __GFP_NOMEMALLOC protection when the memory pool is depleted. If we are running out of memory and the only way to free memory is to perform swapout, we just keep consuming memory reserves rather than throttling the mempool allocations and allowing the pending IO to complete, up to the moment when memory is depleted completely and there is no way forward but to invoke the OOM killer. This is less than optimal.

The original intention of f9054c7 was to help with OOM situations where the OOM victim depends on a mempool allocation to make forward progress. David has mentioned the following backtrace:

  schedule
  schedule_timeout
  io_schedule_timeout
  mempool_alloc
  __split_and_process_bio
  dm_request
  generic_make_request
  submit_bio
  mpage_readpages
  ext4_readpages
  __do_page_cache_readahead
  ra_submit
  filemap_fault
  handle_mm_fault
  __do_page_fault
  do_page_fault
  page_fault

We do not know more about why the mempool is depleted without being replenished in time, though. In any case, the dm layer shouldn't depend on any allocations outside of the dedicated pools, so forward progress should be guaranteed. If this is not the case, then dm should be fixed rather than papering over the problem and postponing it by accessing more memory reserves.

mempools are a mechanism to maintain dedicated memory reserves to guarantee forward progress. Allowing them unbounded access to the page allocator memory reserves goes against the whole purpose of this mechanism.

Bisected by Ondrej Kozina.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Reported-by: Ondrej Kozina <[email protected]>
Reviewed-by: Johannes Weiner <[email protected]>
Acked-by: NeilBrown <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Mikulas Patocka <[email protected]>
Cc: Ondrej Kozina <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
1 parent c12a1dd commit 6486678

File tree

1 file changed: +4 -16 lines


mm/mempool.c (+4 -16)

@@ -306,36 +306,25 @@ EXPORT_SYMBOL(mempool_resize);
  * returns NULL. Note that due to preallocation, this function
  * *never* fails when called from process contexts. (it might
  * fail if called from an IRQ context.)
- * Note: neither __GFP_NOMEMALLOC nor __GFP_ZERO are supported.
+ * Note: using __GFP_ZERO is not supported.
  */
-void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
+void * mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 {
 	void *element;
 	unsigned long flags;
 	wait_queue_t wait;
 	gfp_t gfp_temp;
 
-	/* If oom killed, memory reserves are essential to prevent livelock */
-	VM_WARN_ON_ONCE(gfp_mask & __GFP_NOMEMALLOC);
-	/* No element size to zero on allocation */
 	VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);
-
 	might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
 
+	gfp_mask |= __GFP_NOMEMALLOC;	/* don't allocate emergency reserves */
 	gfp_mask |= __GFP_NORETRY;	/* don't loop in __alloc_pages */
 	gfp_mask |= __GFP_NOWARN;	/* failures are OK */
 
 	gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
 
 repeat_alloc:
-	if (likely(pool->curr_nr)) {
-		/*
-		 * Don't allocate from emergency reserves if there are
-		 * elements available. This check is racy, but it will
-		 * be rechecked each loop.
-		 */
-		gfp_temp |= __GFP_NOMEMALLOC;
-	}
 
 	element = pool->alloc(gfp_temp, pool->pool_data);
 	if (likely(element != NULL))
@@ -359,12 +348,11 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 	 * We use gfp mask w/o direct reclaim or IO for the first round. If
 	 * alloc failed with that and @pool was empty, retry immediately.
 	 */
-	if ((gfp_temp & ~__GFP_NOMEMALLOC) != gfp_mask) {
+	if (gfp_temp != gfp_mask) {
 		spin_unlock_irqrestore(&pool->lock, flags);
 		gfp_temp = gfp_mask;
 		goto repeat_alloc;
 	}
-	gfp_temp = gfp_mask;
 
 	/* We must not sleep if !__GFP_DIRECT_RECLAIM */
 	if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
