Update ARC memory limits to account for SLUB internal fragmentation #660
Conversation
This patch seems to have had the side effect of increasing the ARC hit rate on my desktop. Memory pressure had led to the ARC being purged quite frequently, which reduced its effective hit rate to about 95%. With this patch, the effective hit rate is 99%. I calculated this as hits / (hits + misses), using the values from /proc/spl/kstat/zfs/arcstats.
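A minimal sketch of that calculation, assuming the usual kstat layout of /proc/spl/kstat/zfs/arcstats (two header lines followed by "name type value" rows):

```python
#!/usr/bin/env python3
# Sketch: compute the effective ARC hit rate from arcstats.
# Assumes two kstat header lines followed by "name type value" rows.

def read_arcstats(path="/proc/spl/kstat/zfs/arcstats"):
    stats = {}
    with open(path) as f:
        for line in f.readlines()[2:]:          # skip the kstat header lines
            fields = line.split()
            if len(fields) == 3 and fields[2].isdigit():
                stats[fields[0]] = int(fields[2])
    return stats

stats = read_arcstats()
hits, misses = stats["hits"], stats["misses"]
print("effective ARC hit rate: {:.2%}".format(hits / (hits + misses)))
```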
Interesting, and downright counter-intuitive: decreasing the cache size increases the odds of a cache hit, at least for desktop workloads. I think this might be reasonable for desktops but probably not OK for large memory servers. I think we'd have better luck addressing these issues in the short term by decreasing SPL_KMEM_CACHE_OBJ_PER_SLAB and SPL_KMEM_CACHE_OBJ_PER_SLAB_MIN. By decreasing these values we put fewer objects on a slab by default, which increases the odds of being able to free them despite fragmentation. I picked these values originally as a best guess long before I had a working ZPL, so there's a good chance they are not optimal. It would be interesting to decrease them by half and see the effect. Additionally, making the changes described here would also be a good short-term improvement. The long-term solution, of course, is to move these buffers off the slab and into the Linux page cache. That, however, will need to wait for 0.7.0 at the earliest since it's a significant change.
The commit message for 23bdb07 suggests that no accounting is done for internal fragmentation. If I recall correctly, the SLUB allocator's internal fragmentation characteristics match those of the HOARD allocator, for which a paper was published: https://parasol.tamu.edu/~rwerger//Courses/689/spring2002/day-3-ParMemAlloc/papers/berger00hoard.pdf. The theoretical limit on fragmentation for HOARD is 50%, which I believe is also true for SLUB. With the current limit of all but 4GB on a 16GB system, the maximum ARC size with worst-case fragmentation would be 24GB, with the unweighted average being 18GB. Both figures exceed what the system can actually store, and this will only get worse on larger memory systems.

My home server has 16GB of RAM with a 6-disk raidz2 vdev for its pool. This patch appears to have addressed instability that I observed when doing simultaneous rsyncs of the mirrors for Gentoo, FreeBSD, OpenBSD, NetBSD and Dragonfly BSD with a few virtual machines running. Without it, and with the patch from issue #618, the system would crash within 12 hours.

My desktop has 8GB of RAM. The effect was not nearly as pronounced there as it was on my server. I believe that is because the probability of internal fragmentation pushing the ARC past a sane limit was remarkably low. Despite that, there was still significant memory pressure, which I believe caused excessive reclaims and hurt the ARC hit rate.

Making the zfs_arc_max default 1/2 of system memory might be safe, although 1/3 of system memory seems like a better figure. With that said, I am fairly certain that the current all-but-4GB default on large memory systems causes issues.
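To make that arithmetic concrete, here is a small sketch of the worst-case numbers for a 16GB system under the "all but 4GB" default, taking the 50% fragmentation bound as an assumption:

```python
# Sketch of the worst-case arithmetic above for a 16 GiB system.
GIB = 1 << 30

installed = 16 * GIB
arc_limit = installed - 4 * GIB          # "all but 4GB" default -> 12 GiB

# At the 50% internal-fragmentation bound, half of every slab is wasted,
# so the memory actually consumed can be double the ARC's accounted size.
worst_case = arc_limit * 2               # 24 GiB
average = (arc_limit + worst_case) / 2   # 18 GiB, unweighted best/worst average

print(worst_case / GIB, average / GIB)   # both exceed the 16 GiB installed
```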
It's true that the ARC doesn't account for internal fragmentation in either the spl or slub allocators. It's also true that we do overcommit memory on the system. However, most of that memory should be easily reclaimable by the shrinker callback, since we only allow 1/4 of the ARC to be dirty. That said, I'm not one to argue with empirical evidence that it helps, so I'm certainly willing to consider pulling in this change once I understand why it helps. When you mention instability on these systems, what exactly happens? Does the system panic, go non-responsive, deadlock in some way? Do these systems happen to have an L2ARC device? We recently identified a deadlock in that code which is made worse by the VM patch, although it's been a long-standing but rare issue. It's possible this is in fact what you're hitting, but it's hard to say without a stack from the system. See commit behlendorf/zfs@85ab09b.
My systems lack L2ARC devices. I described what happened on my server in the following comment:
This patch closes issue #642.
I'm looking at pulling this patch into the master tree and wondering if you've done any testing with it set to 1/2 of memory instead of 1/3.
My desktop has 8GB of RAM and I had no stability issues with it before applying this patch. However, it has a small SSD, so I am not able to run the kinds of rsyncs on it that I ran on my server. For what it is worth, I now think 1/2 would be safe because allocations seem to be biased toward powers of 2. I will not have time to test on my server until next week. I have asked @tstudios to test zfs_arc_max set to 1/2 of his RAM in issue #642. If he has time to volunteer, we should be able to get feedback before I can look into this myself.
I have changed this patch to set arc_c_max to 1/2 instead of 1/3.
23bdb07 updated the ARC memory limits to be 1/2 of memory or all but 4GB. Unfortunately, these values assume zero internal fragmentation in the SLUB allocator, when in reality, the internal fragmentation could be as high as 50%, effectively doubling memory usage. This poses clear safety issues, because it permits the size of ARC to exceed system memory. This patch changes this so that the default value of arc_c_max is always 1/2 of system memory. This effectively limits the ARC to the memory that the system has physically installed.
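A hedged illustration (not the kernel code itself) of the two defaults being compared, assuming "1/2 of memory or all but 4GB" meant the larger of the two, as the 16GB example above implies:

```python
# Illustration only: old vs. new default arc_c_max, assuming the old
# "1/2 of memory or all but 4GB" default meant the larger of the two.
GIB = 1 << 30

def old_default(installed):
    return max(installed // 2, installed - 4 * GIB)

def new_default(installed):
    # Always 1/2 of installed memory, so even 50% internal fragmentation
    # cannot push actual usage past physical memory.
    return installed // 2

for gb in (8, 16, 64):
    mem = gb * GIB
    print(gb, old_default(mem) // GIB, new_default(mem) // GIB)
# e.g. on a 64 GiB system: old default 60 GiB, new default 32 GiB
```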
Add missing parentheses around the btop and ptob macros to ensure operation ordering is preserved after expansion.
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#660