Memory size, overcommit, limit

From HCL
Revision as of 13:29, 9 September 2013 by Davepc


Paging and the OOM-Killer

Due to the nature of the experiments our group runs, we often induce heavy paging and complete exhaustion of available memory on certain nodes. Linux has a pair of strategies to deal with heavy memory use. The first is overcommitting: a process is allowed to allocate memory or fork even when there is no more memory available. You can see some interesting numbers here: [1]. The assumption is that processes may not use all the memory they allocate, and that failing at allocation time is worse than failing later, when the memory is actually needed. More processes can therefore be supported by letting them allocate memory, provided they do not all use it. The second part of the strategy is the Out-of-Memory killer (OOM killer). When memory has been exhausted and a process tries to use some 'overcommitted' part of memory, the OOM killer is invoked. Its job is to rank all processes in terms of their memory use, priority, privilege and some other parameters, and then select a process to kill based on the ranks.

The argument for using overcommit plus the OOM killer is that, rather than failing to allocate memory for some random unlucky process, which would probably terminate as a result, the kernel can instead allow that process to continue executing and then make a somewhat informed decision about which process to kill. Unfortunately, the behaviour of the OOM killer sometimes causes problems that grind the machine to a complete halt, particularly when it decides to kill system processes. There is a good discussion of the OOM killer here: [2]
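The ranking described above can be inspected on a running node, because the kernel exposes each process's current badness score in /proc/<pid>/oom_score. The following is only a minimal sketch: the helper name top_oom_candidates is hypothetical and not part of the original page, and the scores change continuously, so the output is a snapshot rather than a prediction.

```shell
#!/bin/sh
# top_oom_candidates [N]
# Print "score pid name" for the N processes with the highest oom_score
# (default 5). A higher score means the process is more likely to be killed.
top_oom_candidates() {
    for d in /proc/[0-9]*; do
        [ -r "$d/oom_score" ] || continue
        # A process may exit while we are reading it, so skip on any error.
        score=$(cat "$d/oom_score" 2>/dev/null) || continue
        name=$(cat "$d/comm" 2>/dev/null) || continue
        echo "$score ${d#/proc/} $name"
    done | sort -rn | head -n "${1:-5}"
}

top_oom_candidates 5
```

Running this before and during a memory-heavy experiment gives a rough idea of which processes would be chosen first.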

For this reason, overcommit has been disabled on the HCL cluster (strict accounting, mode 2):

$ cat /proc/sys/vm/overcommit_memory
2
$ cat /proc/sys/vm/overcommit_ratio
100
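With overcommit_memory set to 2, the kernel refuses allocations once the total committed address space would exceed CommitLimit, which is roughly swap plus overcommit_ratio percent of physical RAM. The sketch below recomputes that value from /proc/meminfo so it can be compared with the kernel's own figure. The helper name commit_limit_kb is hypothetical, and the formula ignores huge pages, so treat the result as an estimate.

```shell
#!/bin/sh
# commit_limit_kb MEMTOTAL_KB SWAPTOTAL_KB RATIO
# Approximate CommitLimit for overcommit_memory=2:
#   swap + RAM * ratio / 100   (huge pages are not accounted for)
commit_limit_kb() {
    echo $(( $2 + $1 * $3 / 100 ))
}

# Only read /proc when it is available (i.e. on Linux).
if [ -r /proc/meminfo ] && [ -r /proc/sys/vm/overcommit_ratio ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    swap_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
    ratio=$(cat /proc/sys/vm/overcommit_ratio)
    echo "estimated CommitLimit: $(commit_limit_kb "$mem_kb" "$swap_kb" "$ratio") kB"
    # The kernel's own accounting, for comparison.
    grep -E '^(CommitLimit|Committed_AS):' /proc/meminfo
fi
```

With a ratio of 100, as on the cluster, the limit is simply RAM plus swap.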

To restore the default overcommit settings, run as root:

# echo 0 > /proc/sys/vm/overcommit_memory
# echo 50 > /proc/sys/vm/overcommit_ratio
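Values written to /proc/sys this way take effect immediately but are lost on reboot. The sketch below reports the current mode in readable form without needing root; the helper overcommit_mode_name and its short labels are illustrative choices, not kernel output. The commented sysctl lines are the usual equivalents of the echo commands.

```shell
#!/bin/sh
# overcommit_mode_name MODE
# Map a vm.overcommit_memory value to a short label:
#   0 = heuristic overcommit, 1 = always overcommit, 2 = never overcommit
overcommit_mode_name() {
    case "$1" in
        0) echo "heuristic" ;;
        1) echo "always" ;;
        2) echo "never" ;;
        *) echo "unknown" ;;
    esac
}

# Report the current setting (read-only, no root required).
if [ -r /proc/sys/vm/overcommit_memory ]; then
    mode=$(cat /proc/sys/vm/overcommit_memory)
    echo "vm.overcommit_memory=$mode ($(overcommit_mode_name "$mode"))"
fi

# To change the values at runtime (as root), equivalent to the echo commands:
#   sysctl -w vm.overcommit_memory=0
#   sysctl -w vm.overcommit_ratio=50
```

On most distributions, the same two settings can be placed in /etc/sysctl.conf to apply them at boot.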

Manually Limit the Memory on the OS level

As root, edit /etc/default/grub and add the mem= kernel parameter, for example to limit the system to 128 MB:

GRUB_CMDLINE_LINUX_DEFAULT="quiet mem=128M"

Then, still as root, regenerate the GRUB configuration and reboot:

update-grub
reboot
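After the reboot, it is worth checking that the limit was actually applied. A minimal sketch, assuming a hypothetical helper mem_param that extracts the mem= value from a kernel command line string:

```shell
#!/bin/sh
# mem_param CMDLINE
# Print the value of the mem= kernel parameter, or nothing if it is absent.
mem_param() {
    for word in $1; do
        case "$word" in
            mem=*) echo "${word#mem=}" ;;
        esac
    done
}

# Compare the boot parameter with what the kernel reports as usable memory.
if [ -r /proc/cmdline ] && [ -r /proc/meminfo ]; then
    limit=$(mem_param "$(cat /proc/cmdline)")
    echo "mem= parameter: ${limit:-not set}"
    grep '^MemTotal:' /proc/meminfo
fi
```

MemTotal is typically a little below the configured limit, since the kernel reserves some memory for itself.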