cgroup OOM killer loop causes system to lockup (possible fix included)

Cal Leeming [Simplicity Media Ltd]

After submitting this problem to the kernel mailing list, I found a strange
(and even more curious) problem.

I attempted to run both kernels on two separate installs of Debian (both
6.0) that differed slightly in packages. One install did not have the
OOM problem, yet the other one did, and they both ran the same kernel,
the same LXC container etc..

I have a feeling this may be a Debian bug..? Further evidence to back up
these claims:

Working server:
root@courtney.internal [/mnt/encstore/lxc/github01.rootfs] > for x in
`find /proc -iname 'oom_adj' | xargs grep "\-17" | awk -F '/' '{print
$3}'` ; do ps -p $x u --no-headers ; done
grep: /proc/3090/task/3090/oom_adj: No such file or directory
grep: /proc/3090/oom_adj: No such file or directory
root 804 0.0 0.0 16992 1112 ? S udevd --daemon
root 804 0.0 0.0 16992 1112 ? S udevd --daemon
root 25536 0.0 0.0 49164 1100 ? Ss Apr05 0:00 /usr/sbin/sshd
root 25536 0.0 0.0 49164 1100 ? Ss Apr05 0:00 /usr/sbin/sshd
root 32173 0.0 0.0 16988 960 ? S udevd --daemon
root 32173 0.0 0.0 16988 960 ? S udevd --daemon
root 32174 0.0 0.0 16988 916 ? S udevd --daemon
root 32174 0.0 0.0 16988 916 ? S udevd --daemon

Broken server:
root@vicky [/var] > for x in `find /proc -iname 'oom_adj' | xargs
grep "\-17" | awk -F '/' '{print $3}'` ; do ps -p $x u --no-headers ; done
grep: /proc/4597/task/4597/oom_adj: No such file or directory
grep: /proc/4597/oom_adj: No such file or directory
root 889 0.0 0.0 16788 860 ? S udevd --daemon
root 889 0.0 0.0 16788 860 ? S udevd --daemon
root 1104 0.0 0.0 17228 1200 ? S udevd --daemon
root 1104 0.0 0.0 17228 1200 ? S udevd --daemon
root 1141 0.0 0.0 17120 1128 ? S udevd --daemon
root 1141 0.0 0.0 17120 1128 ? S udevd --daemon
root 2745 0.0 0.0 49168 1168 ? Ss 09:10 0:00 /usr/sbin/sshd
root 2745 0.0 0.0 49168 1168 ? Ss 09:10 0:00 /usr/sbin/sshd
root 2750 0.0 0.0 76672 3360 ? Ss 09:11 0:00 sshd: foxx [priv]
root 2750 0.0 0.0 76672 3360 ? Ss 09:11 0:00 sshd: foxx [priv]
foxx 2754 0.0 0.0 76672 1636 ? S 09:11 0:00 sshd: foxx@pts/0
foxx 2754 0.0 0.0 76672 1636 ? S 09:11 0:00 sshd: foxx@pts/0
foxx 2755 0.0 0.0 123984 1984 pts/0 Ss 09:11 0:00 -bash
foxx 2755 0.0 0.0 123984 1984 pts/0 Ss 09:11 0:00 -bash
root 2760 0.0 0.0 127176 1244 pts/0 S 09:11 0:00 sudo su
root 2760 0.0 0.0 127176 1244 pts/0 S 09:11 0:00 sudo su
root 2761 0.0 0.0 147820 1328 pts/0 S 09:11 0:00 su
root 2761 0.0 0.0 147820 1328 pts/0 S 09:11 0:00 su
root 2762 0.0 0.0 124084 2152 pts/0 S 09:11 0:00 bash
root 2762 0.0 0.0 124084 2152 pts/0 S 09:11 0:00 bash

Notice how the sshd user instances on the broken server are set to -17,
yet the sshd user instances on the working server are not :S

Any input from anyone on the below would be VERY much appreciated.

Cal

-------- Original Message --------
Subject: Re: Fwd: cgroup OOM killer loop causes system to lockup
(possible fix included)
Date: Sun, 29 May 2011 23:24:07 +0100
From: Cal Leeming [Simplicity Media Ltd]
Organization: Simplicity Media Ltd
To: linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org

Some further logs:
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] redis-server
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]
[] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] redis-server
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]
[] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] redis-server
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]
[] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] redis-server
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]
[] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] redis-server
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]
[] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] redis-server
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]
[] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] redis-server
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]
[] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] redis-server
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]
[] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] redis-server
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]
[] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] redis-server
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]
[] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] redis-server
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]
[] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] redis-server
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]
[] ? oom_kill_process+0x82/0x283

On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
> First of all, my apologies if I have submitted this problem to the
> wrong place, spent 20 minutes trying to figure out where it needs to
> be sent, and was still none the wiser.
>
> The problem is related to applying memory limitations within a cgroup.
> If the OOM killer kicks in, it gets stuck in a loop where it tries to
> kill a process which has an oom_adj of -17. This causes an infinite
> loop, which in turn locks up the system.
>
> May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of
> memory: kill process 6016 (java) score 0 or a child
>
>
> root@vicky [/home/foxx]> uname -a
> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64
> GNU/Linux
> (this happens on both the grsec patched and non patched 2.6.32.41 kernel)
>
> When this is encountered, the memory usage across the whole server is
> still within limits (not even hitting swap).
>
> The memory configuration for the cgroup/lxc is:
> lxc.cgroup.memory.limit_in_bytes = 3000M
> lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
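
(Aside, for anyone who wants to reproduce the loop without LXC: a plain
memory cgroup with a small limit, plus a hog the OOM killer is not allowed
to kill, should behave the same way. A rough sketch, run as root, assuming
the memory cgroup is mounted at /cgroup; be warned, this can wedge the box
just like the LXC case:)

mkdir /cgroup/oomtest
echo 64M > /cgroup/oomtest/memory.limit_in_bytes
echo $$ > /cgroup/oomtest/tasks            # move this shell (and its children) into the group
echo -17 > /proc/$$/oom_adj                # children inherit the -17, so the hog is "unkillable"
perl -e '$x = "A" x (256 * 1024 * 1024)'   # allocate well past the limit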
>
> Now, what is even more strange, is that when running under the
> 2.6.32.28 kernel (both patched and unpatched), this problem doesn't
> happen. However, there is a slight difference between the two kernels.
> The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj,
> whereas the 2.6.32.41 gives a default of -17. I suspect this is the
> root cause of why it's showing in the later kernel but not the earlier one.
>
> To test this theory, I started up the lxc on both servers, and then
> ran a one liner which showed me all the processes with an oom_adj of -17:
>
> (the below is the older/working kernel)
> root@courtney.internal [/mnt/encstore/lxc]> uname -a
> Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 GMT
> 2011 x86_64 GNU/Linux
> root@courtney.internal [/mnt/encstore/lxc]> for x in `find /proc
> -iname 'oom_adj' | xargs grep "\-17" | awk -F '/' '{print $3}'` ; do
> ps -p $x --no-headers ; done
> grep: /proc/1411/task/1411/oom_adj: No such file or directory
> grep: /proc/1411/oom_adj: No such file or directory
> 804 ? 00:00:00 udevd
> 804 ? 00:00:00 udevd
> 25536 ? 00:00:00 sshd
> 25536 ? 00:00:00 sshd
> 31861 ? 00:00:00 sshd
> 31861 ? 00:00:00 sshd
> 32173 ? 00:00:00 udevd
> 32173 ? 00:00:00 udevd
> 32174 ? 00:00:00 udevd
> 32174 ? 00:00:00 udevd
>
> (the below is the newer/broken kernel)
> root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]> uname -a
> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64
> GNU/Linux
> root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]> for x in
> `find /proc -iname 'oom_adj' | xargs grep "\-17" | awk -F '/' '{print
> $3}'` ; do ps -p $x --no-headers ; done
> grep: /proc/3118/task/3118/oom_adj: No such file or directory
> grep: /proc/3118/oom_adj: No such file or directory
> 895 ? 00:00:00 udevd
> 895 ? 00:00:00 udevd
> 1091 ? 00:00:00 udevd
> 1091 ? 00:00:00 udevd
> 1092 ? 00:00:00 udevd
> 1092 ? 00:00:00 udevd
> 2596 ? 00:00:00 sshd
> 2596 ? 00:00:00 sshd
> 2608 ? 00:00:00 sshd
> 2608 ? 00:00:00 sshd
> 2613 ? 00:00:00 sshd
> 2613 ? 00:00:00 sshd
> 2614 pts/0 00:00:00 bash
> 2614 pts/0 00:00:00 bash
> 2620 pts/0 00:00:00 sudo
> 2620 pts/0 00:00:00 sudo
> 2621 pts/0 00:00:00 su
> 2621 pts/0 00:00:00 su
> 2622 pts/0 00:00:00 bash
> 2622 pts/0 00:00:00 bash
> 2685 ? 00:00:00 lxc-start
> 2685 ? 00:00:00 lxc-start
> 2699 ? 00:00:00 init
> 2699 ? 00:00:00 init
> 2939 ? 00:00:00 rc
> 2939 ? 00:00:00 rc
> 2942 ? 00:00:00 startpar
> 2942 ? 00:00:00 startpar
> 2964 ? 00:00:00 rsyslogd
> 2964 ? 00:00:00 rsyslogd
> 2964 ? 00:00:00 rsyslogd
> 2964 ? 00:00:00 rsyslogd
> 2980 ? 00:00:00 startpar
> 2980 ? 00:00:00 startpar
> 2981 ? 00:00:00 ctlscript.sh
> 2981 ? 00:00:00 ctlscript.sh
> 3016 ? 00:00:00 cron
> 3016 ? 00:00:00 cron
> 3025 ? 00:00:00 mysqld_safe
> 3025 ? 00:00:00 mysqld_safe
> 3032 ? 00:00:00 sshd
> 3032 ? 00:00:00 sshd
> 3097 ? 00:00:00 mysqld.bin
> 3097 ? 00:00:00 mysqld.bin
> 3097 ? 00:00:00 mysqld.bin
> 3097 ? 00:00:00 mysqld.bin
> 3097 ? 00:00:00 mysqld.bin
> 3097 ? 00:00:00 mysqld.bin
> 3097 ? 00:00:00 mysqld.bin
> 3097 ? 00:00:00 mysqld.bin
> 3097 ? 00:00:00 mysqld.bin
> 3097 ? 00:00:00 mysqld.bin
> 3113 ? 00:00:00 ctl.sh
> 3113 ? 00:00:00 ctl.sh
> 3115 ? 00:00:00 sleep
> 3115 ? 00:00:00 sleep
> 3116 ? 00:00:00 .memcached.bin
> 3116 ? 00:00:00 .memcached.bin
>
>
> As you can see, it is clear that the newer kernel is setting -17 by
> default, which in turn is causing the OOM killer loop.
>
> So I began to try and find what may have caused this problem by
> comparing the two sources...
>
> I checked the code for all references to 'oom_adj' and 'oom_adjust' in
> both code sets, but found no obvious differences:
> grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e
> oom_adj
>
> Then I checked for references to "-17" in all .c and .h files, and
> found a couple of matches, but only one obvious one:
> grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
> ./include/linux/oom.h:#define OOM_DISABLE (-17)
>
> But again, a search for OOM_DISABLE came up with nothing obvious...
>
> In a last ditch attempt, I did a search for all references to 'oom'
> (case-insensitive) in both code bases, then compared the two:
> root@annabelle [~/lol/linux-2.6.32.28]> grep -i -R "oom" . | sort -n
> > /tmp/annabelle.oom_adj
> root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]> grep -i -R
> "oom" . | sort -n> /tmp/vicky.oom_adj
>
> and this brought back (yet again) nothing obvious..
>
>
> root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]> md5sum
> ./include/linux/oom.h
> 2a32622f6cd38299fc2801d10a9a3ea8 ./include/linux/oom.h
>
> root@annabelle [~/lol/linux-2.6.32.28]> md5sum ./include/linux/oom.h
> 2a32622f6cd38299fc2801d10a9a3ea8 ./include/linux/oom.h
>
> root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]> md5sum
> ./mm/oom_kill.c
> 1ef2c2bec19868d13ec66ec22033f10a ./mm/oom_kill.c
>
> root@annabelle [~/lol/linux-2.6.32.28]> md5sum ./mm/oom_kill.c
> 1ef2c2bec19868d13ec66ec22033f10a ./mm/oom_kill.c
>
>
>
> Could anyone please shed some light as to why the default oom_adj is
> set to -17 now (and where it is actually set)? From what I can tell,
> the fix for this issue will either be:
>
> 1. Allow OOM killer to override the decision of ignoring oom_adj ==
> -17 if an unrecoverable loop is encountered.
> 2. Change the default back to 0.
>
> Again, my apologies if this bug report is slightly unorthodox, or
> doesn't follow usual procedure etc. I can assure you I have tried my
> absolute best to give all the necessary information though.
>
> Cal
>

cgroup OOM killer loop causes system to lockup (possible fix included)

Cal Leeming [Simplicity Media Ltd]

Another quick update on this...

Something really really really strange has happened this time.. I just
did a completely fresh re-install (exactly like last time) and the
problem went away.

So, I'm going to build up the lxc / kernel support step by step (and
check for the oom_adj problem after each change) and will report back
once I've found the step it kicks in at.

Hopefully, my hours of frustration will prevent someone else from going
through the same thing :/

Cal

On 29/05/2011 23:41, Cal Leeming [Simplicity Media Ltd] wrote:
> [...]

cgroup OOM killer loop causes system to lockup (possible fix included)

Cal Leeming [Simplicity Media Ltd]

Please note: the way I am detecting whether the oom_adj bug is present is
by checking if the sshd user processes have an oom_adj of -17. If they do,
the bug is present.
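
For reference, a quick way to run that check is a one-liner along these
lines (my own shorthand, not the exact command used elsewhere in this
thread):

for pid in $(pgrep -x sshd); do
    echo "$pid $(ps -o user= -p $pid) $(cat /proc/$pid/oom_adj)"
done

If any of the per-user sshd entries show -17 (rather than just the
listener), the bug is present.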

ATTEMPT 1:

apt-get install hashalot cryptsetup make g++ cpp automake ncurses-dev
bison flex curl build-essential kernel-package locales locales-all
- oom problem not found

(pre-built from previous install)
dpkg -i
linux-headers-2.6.32.41-grsec_2.6.32.41-grsec-10.00.Custom_amd64.deb
linux-image-2.6.32.41-grsec_2.6.32.41-grsec-10.00.Custom_amd64.deb
- oom problem not found

(booted into new kernel)
shutdown -r now
- oom problem DETECTED!!!

(booted into old kernel)
dpkg --purge linux-image-2.6.32.41-grsec linux-headers-2.6.32.41-grsec
shutdown -r now
- oom problem DETECTED!!! Hmmmm....

apt-get purge hashalot cryptsetup make g++ cpp automake ncurses-dev
bison flex curl build-essential kernel-package locales locales-all
shutdown -r now
- oom problem STILL DETECTED!!! WTF!!!!!!!!!!!!!!!

So, it would appear that either the kernel install or the "apt-get"
caused irreversible changes on the system...

REINSTALL ATTEMPT 2:
(this time, I'm going to use tripwire to detect filesystem changes
between each reboot)

Installed openssh-server and configured tripwire.
Made a test modification to /usr/sbin and ran tripwire, to ensure it's
logging correctly.
- oom problem not found

rebooted
- oom problem DETECTED!!

Purged openssh-server
- oom problem not found

Reinstalled openssh-server
- oom problem not found

rebooted
- oom problem DETECTED

So, we've now pinpointed it to the openssh package... This makes sense,
because the oom_adj is inherited from whatever process a new process is
forked from.. In this case, the LXCs are started up from an ssh session,
which is why they inherit the ssh -17 oom_adj.
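
(The inheritance itself is easy to demonstrate from a root shell; a tiny
sketch, values purely illustrative:)

cat /proc/$$/oom_adj              # whatever this shell inherited
echo -17 > /proc/$$/oom_adj       # pretend we are an sshd-like parent (needs root)
bash -c 'cat /proc/$$/oom_adj'    # the child prints -17: fork/exec copies the value
echo 0 > /proc/$$/oom_adj         # reset to the usual default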

Here is what tripwire is reporting has changed:

Modified:
"/usr/sbin"
"/usr/sbin/sshd"
"/usr/lib"
"/usr/lib/openssh"
"/usr/lib/openssh/sftp-server"
"/usr/lib/sftp-server"
"/var/run/acpid.pid"
"/var/run/acpid.socket"
"/var/run/atd.pid"
"/var/run/crond.pid"
"/var/run/crond.reboot"
"/var/run/exim4/exim.pid"
"/var/run/mdadm/monitor.pid"
"/var/run/portmap.pid"
"/var/run/portmap_mapping"
"/var/run/rpc.statd.pid"
"/var/run/rsyslogd.pid"
"/var/run/sm-notify.pid"
"/var/log/dmesg.0"
"/etc"
"/etc/adjtime"
"/etc/default"
"/etc/default/ssh"
"/etc/lvm/cache"
"/etc/lvm/cache/.cache"
"/etc/mtab"
"/etc/network/if-up.d"
"/etc/network/if-up.d/openssh-server"
"/etc/network/run"
"/etc/network/run/ifstate"
"/etc/pam.d"
"/etc/pam.d/sshd"
"/etc/passwd-"
"/etc/shadow-"
"/etc/ssh"
"/etc/ssh/ssh_host_dsa_key"
"/etc/ssh/ssh_host_dsa_key.pub"
"/etc/ssh/ssh_host_rsa_key"
"/etc/ssh/ssh_host_rsa_key.pub"
"/etc/ssh/sshd_config"
"/etc/tripwire"
"/etc/ufw"
"/etc/ufw/applications.d"
"/etc/ufw/applications.d/openssh-server"
"/etc/init.d"
"/etc/init.d/.depend.boot"
"/etc/init.d/.depend.start"
"/etc/init.d/.depend.stop"
"/etc/init.d/ssh"
"/etc/rc2.d"
"/etc/rc2.d/S18ssh"
"/etc/rc3.d"
"/etc/rc3.d/S18ssh"
"/etc/rc4.d"
"/etc/rc4.d/S18ssh"
"/etc/rc5.d"
"/etc/rc5.d/S18ssh"
"/etc/passwd"
"/etc/shadow"
"/root"
"/root/.nano_history"

Added:
"/var/log/dmesg.1.gz"
"/var/log/dmesg.2.gz"
"/root/.bash_history"

Removed:
"/etc/nologin"

I thought maybe the reason this is happening is because ssh (when being
started from rc.d) is inheriting the -17 from init.. but it appears init
doesn't have -17:

root@vicky:~# cat /proc/1/oom_adj
0

Then I thought maybe it's because the ssh versions are different, but they
are not:

root@vicky:/home/foxx# dpkg -l | grep openssh-server
ii openssh-server 1:5.5p1-6
secure shell (SSH) server, for secure access from remote machines

root@courtney.internal [/home/foxx] > dpkg -l | grep openssh-server
ii openssh-server 1:5.5p1-6
secure shell (SSH) server, for secure access from remote machines

foxx@vicky:~$ md5sum /usr/sbin/sshd
f8c11462e8f2a7bf80e212e06041492b /usr/sbin/sshd

root@courtney.internal [/home/foxx] > md5sum /usr/sbin/sshd
f8c11462e8f2a7bf80e212e06041492b /usr/sbin/sshd

Then I made sure that the sshd_configs matched, and that I was using the
same login process for both (shared key only)... they both matched, yet
the problem still happens..

Then I thought maybe it's inheriting from the sshd server process, but
turns out that isn't it either:

(broken server)
root 1583 0.0 0.0 49168 1140 ? Ss 12:42 0:00
/usr/sbin/sshd
root@vicky:~# cat /proc/1583/oom_adj
-17

(working server)
root 2105 0.0 0.0 49184 1152 ? Ss 00:47 0:00
/usr/sbin/sshd
root@courtney.internal [/home/foxx] > cat /proc/2105/oom_adj
-17

So, I looked through the process tree to see where it was inheriting the
value from, and at what stage..

(working server)
root@courtney.internal [/home/foxx] > ps faux | grep sshd
(-17) root 2105 0.0 0.0 49184 1152 ? Ss 00:47 0:00
/usr/sbin/sshd
(0) root 4735 0.0 0.0 76668 3356 ? Ss 12:47 0:00
\_ sshd: foxx [priv]
foxx 4740 0.0 0.0 76668 1644 ? S 12:47
0:00 \_ sshd: foxx@pts/0
root 4757 0.0 0.0 112344 876 pts/0 S+ 12:48
0:00 \_ grep sshd

(broken server)
foxx@vicky:~$ ps faux | grep sshd
(-17) root 1583 0.0 0.0 49168 1140 ? Ss 12:42 0:00
/usr/sbin/sshd
(-17) root 1616 0.0 0.0 70488 3376 ? Ss 12:43 0:00
\_ sshd: root@pts/0
(-17) root 1685 0.2 0.0 70488 3292 ? Ss 12:50 0:00
\_ sshd: foxx [priv]
foxx 1688 0.0 0.0 70488 1576 ? S 12:50
0:00 \_ sshd: foxx@pts/1
foxx 1715 0.0 0.0 7544 840 pts/1 S+ 12:50
0:00 \_ grep sshd

As you can see, the line where it says "sshd: foxx [priv]" is causing the
-17 oom_adj. According to the documentation, this appears to be where the
privilege separation takes place.
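
(To see exactly where in the chain the value changes, something like this
can walk up the parent PIDs from the current login shell and print each
ancestor's oom_adj; just a sketch, and it assumes the comm field in
/proc/PID/stat contains no spaces:)

pid=$$
while [ "$pid" -ge 1 ]; do
    comm=$(awk '{print $2}' /proc/$pid/stat)   # process name, e.g. (sshd)
    ppid=$(awk '{print $4}' /proc/$pid/stat)   # parent PID
    echo "$pid $comm oom_adj=$(cat /proc/$pid/oom_adj)"
    pid=$ppid
done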

So, now I started to check the ssh packages themselves, and make sure the
repos are exactly the same on both servers. At this point, I realised that
the working server is slightly out of date on the following packages:

root@courtney.internal [/home/foxx] > md5sum /etc/apt/sources.list
00bcf3cf28e2994f9b512f0a8ffb0765 /etc/apt/sources.list

root@vicky:/etc# md5sum /etc/apt/sources.list
00bcf3cf28e2994f9b512f0a8ffb0765 /etc/apt/sources.list

root@courtney.internal [/home/foxx] > apt-get upgrade
The following packages will be upgraded:
bind9-host dnsutils exim4 exim4-base exim4-config exim4-daemon-light
host isc-dhcp-client isc-dhcp-common libbind9-60 libdns69 libisc62
libisccc60
libisccfg62 liblwres60 linux-base linux-image-2.6.32-5-amd64
linux-libc-dev sd-agent

The one that springs immediately to my attention is linux-base.

(working server)
root@courtney.internal [/home/foxx] > dpkg -l | grep linux-base
ii linux-base 2.6.32-31
Linux image base package

(broken server)
root@vicky:/etc# dpkg -l | grep linux-base
ii linux-base 2.6.32-34squeeze1
Linux image base package

Sooooo, I bite the bullet, and perform an upgrade of linux-base on the
working server...
root@courtney.internal [/home/foxx] > apt-get install linux-base
Setting up linux-base (2.6.32-34squeeze1) ...

I then re-run the dry-run upgrade command to make sure it has been upgraded:
bind9-host dnsutils exim4 exim4-base exim4-config exim4-daemon-light
host isc-dhcp-client isc-dhcp-common libbind9-60 libdns69 libisc62
libisccc60
libisccfg62 liblwres60 linux-image-2.6.32-5-amd64 linux-libc-dev sd-agent
(as you can see, it has disappeared from the list)

I then reboot the server.. and wait for the longest 3 minutes of my life..
But guess what... it didn't break :/ So, I bite another bullet, and upgrade
the remaining packages on the server:
bind9-host dnsutils exim4 exim4-base exim4-config exim4-daemon-light
host isc-dhcp-client isc-dhcp-common libbind9-60 libdns69 libisc62
libisccc60
libisccfg62 liblwres60 linux-image-2.6.32-5-amd64 linux-libc-dev sd-agent

I then make sure both servers are running the exact same stock kernel
from Debian (as the working server was using an old kernel).

root@vicky:/etc# dpkg -l | grep linux | grep image
ii linux-base 2.6.32-34squeeze1
Linux image base package
ii linux-image-2.6-amd64 2.6.32+29
Linux 2.6 for 64-bit PCs (meta-package)
ii linux-image-2.6.32-5-amd64 2.6.32-34squeeze1
Linux 2.6.32 for 64-bit PCs

root@courtney.internal [/home/foxx] > dpkg -l | grep linux | grep image
ii linux-base 2.6.32-34squeeze1
Linux image base package
ii linux-image-2.6-amd64 2.6.32+29
Linux 2.6 for 64-bit PCs (meta-package)
ii linux-image-2.6.32-5-amd64 2.6.32-34squeeze1
Linux 2.6.32 for 64-bit PCs

root@vicky:/etc# uname -a
Linux vicky 2.6.32-5-amd64 #1 SMP Wed May 18 23:13:22 UTC 2011 x86_64
GNU/Linux

root@courtney.internal [/home/foxx] > uname -a
Linux courtney.internal 2.6.32-5-amd64 #1 SMP Wed May 18 23:13:22 UTC
2011 x86_64 GNU/Linux

After another long 3-minute wait, I test for the oom bug... and guess
what.. it's STILL NOT DOING IT!!! :(

So now I check for differences in the /etc/pam.d... I notice pam_cap.so
is missing in common-auth on the broken server:

root@courtney.internal [/etc/pam.d] > dpkg -l | grep cap
ii libcap2 1:2.19-3
support for getting/setting POSIX.1e capabilities
ii libcap2-bin 1:2.19-3
basic utility programs for using capabilities

Broken server (debconf "PAM profiles to enable" dialog):
    [*] Unix authentication

Working server:
    [*] Unix authentication
    [*] Inheritable Capabilities Management

So, I install 'libcap2-bin' on the broken server, reboot.. and still no
god damn luck. At this point /etc/pam.d on both servers match (the md5sums
match up on all files).. So now, I decide to check all files relating to
openssh-server.. again, everything matches up fine.

Then I start to get really pissed off, and check the md5sum for all
files in /etc:

(left is working server, right is broken)
root@courtney.internal [/etc/pam.d] > diff /tmp/etcmd5-courtney
/tmp/etcmd5-vicky -y --suppress-common-lines
a81fbd39142e18d5ed1fb8a7b3ecce71 /etc/adjtime |
fa9192c6cdaab85ec952576ab3139fd1 /etc/adjtime
7fcee51274f69cdf5d4c8b7be799637b /etc/apt/trustdb.gpg |
1319acca28ae6475a915ca0684d0cd62 /etc/apt/trustdb.gpg
a3710991fcce0b1574586450c81095e1 /etc/apt/trusted.gpg |
d802712c9255f13bea3bea87b83180b1 /etc/apt/trusted.gpg
366d165a9f5de024d3a21b9dc51de057 /etc/bash.bashrc |
5b3c3bc73d236e4e1b6f9b6c1ed5964e /etc/bash.bashrc
109789e71a8cf8e21302bf44b5b716f7 /etc/blkid.tab |
aa0de4c3c85ae212f6c59b6b89b21b9a /etc/blkid.tab
2de357d9da09d738c179e8d269696b9c /etc/blkid.tab.old |
aa0de4c3c85ae212f6c59b6b89b21b9a /etc/blkid.tab.old
22133b5bd4023d48b50863d2b1c7b25e /etc/console-setup/cached.k |
bdd92d16a8172f9e7ea3f866b59b4fc4 /etc/console-setup/cached.k
b88b0f0a4d3b4beec0cc4b09b6c2aaea /etc/cron.daily/ntp a96425623ae854ba979f7fd8b901bd21 /etc/lvm/cache/.cache
f51730e7056489397b8b2a9a9b99662c /etc/mailname |
1cc22bdbd9f0dd61c3dbdf484f5e1954 /etc/mailname
2b0e1a3f52757145d65c353ca49d7756 /etc/mdadm/mdadm.conf |
75ddc14817703ef468326041acd8bfb1 /etc/mdadm/mdadm.conf
2ed9e1f60da7f3f87faca9879d0a6531 /etc/mtab |
d6eb5404b14026e3cb6377fdbfa36a07 /etc/mtab
0a38058aafd42b7b4105505379194e1b /etc/nanorc |
fc57b93c907fefbccf12317d40b4a204 /etc/nanorc
d8e3272886cfc3ca3e103840f17c28b3 /etc/network/interfaces |
11aed0887d5bd14a426e2a4f7c7d4f4a /etc/network/interfaces
0925a154d88cf0bc2d0f588ab7c670c9 /etc/network/run/ifstate |
40a731c20283789c9916b1978b8b56b8 /etc/network/run/ifstate
1e47bfedf84da1cdfaa45ee016fb3609 /etc/networks |
5293c479ab40a68b4fa7a6eeff755256 /etc/networks
3e250ecaf470e1d3a2b68edd5de46bfd /etc/ntp.conf d87271b624ab8e93c2e51cd59bade631 /etc/tripwire/site.key
> 8f6ebb12f511b46fbf0203d978c3bf01 /etc/tripwire/tw.cfg
> 1821c7a0d207a168f1d7c766f238e816 /etc/tripwire/twcfg.txt
> 717b4afa4f0f8614f3947441a3ddf422 /etc/tripwire/tw.pol
> 92c9b38e95c90eebf1d746633a81909c /etc/tripwire/tw.pol.bak
> d08d31fa833b50d2fb9a37f97b07cbd0 /etc/tripwire/twpol.txt
> fdbfa3e0879f0d959bbdfd5601ef4d4f /etc/tripwire/vicky-local.k
aeb6fe5dcfc873b0632ba92345ed16e2 /etc/udev/rules.d/70-persis |
0fdf03b558e118edcf8ce29abaf296f1 /etc/udev/rules.d/70-persis
24cc33b9f96e3189b7e34cf5484cb99f /etc/udev/rules.d/70-persis |
4a49e7ddeacbb3ded8bb3968f219362c /etc/udev/rules.d/70-persis
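
(For reference, the two lists fed into that diff were presumably generated
with something along these lines; this is my guess at the invocation, it is
not shown in the thread:)

find /etc -type f | sort | xargs md5sum > /tmp/etcmd5-$(hostname -s)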

I patched up the /etc/init.d/.depend.* files so the 'ssh' entries are
matching.. still. no. luck

It's now 24 hours of trying to fix this issue, and I'm getting extremely
pissed off :@

If anyone could offer up some advice, it would be VERY much appreciated.

On 30/05/2011 01:24, Cal Leeming [Simplicity Media Ltd] wrote:
> [...]

cgroup OOM killer loop causes system to lockup (possible fix included)

Ben Hutchings

On Mon, 2011-05-30 at 21:03 +0100, Cal Leeming [Simplicity Media Ltd]
wrote:
> More strangeness..
>
> If I keep the kernel module loaded, but disable the entry
> in /etc/network/interfaces for eth0, the oom_adj problem disappears.
> But then ofc, I'm left with no network interface. I then tried
> binding /etc/ssh/sshd_config to only listen on 127.0.0.1.. effectively
> bypassing the eth0 interface, whilst still allowing it to be loaded.
> But the problem still happens.
[...]

My guess is that sshd tries to protect itself against the OOM-killer so
that you can still log in to a system that has gone OOM. If there is no
network available, it doesn't do this because you cannot log in remotely
anyway.

The bug seems to be that sshd does not reset the OOM adjustment before
running the login shell (or other program). Therefore, please report a
bug against openssh-server.
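
(On a box where sshd behaves as intended, the difference is easy to see; a
quick check, assuming pgrep is available and the oldest sshd is the
listener:)

cat /proc/$(pgrep -o -x sshd)/oom_adj   # master listener: -17, deliberately protected
cat /proc/$$/oom_adj                    # your login shell: should be back to 0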

Ben.

cgroup OOM killer loop causes system to lockup (possible fix included)

Cal Leeming [Simplicity Media Ltd]

Thanks for the response. I have sent this across to the guys at
openssh-server.

Although, I did check the openssh source code myself, and from what I
could tell, everything was being done correctly.

I have a feeling there's going to be a lot of 'buck passing' on this one :(

Cal

On 30/05/2011 21:25, Ben Hutchings wrote:
> On Mon, 2011-05-30 at 21:03 +0100, Cal Leeming [Simplicity Media Ltd]
> wrote:
>> More strangeness..
>>
>> If I keep the kernel module loaded, but disable the entry
>> in /etc/network/interfaces for eth0, the oom_adj problem disappears.
>> But then ofc, I'm left with no network interface. I then tried
>> binding /etc/ssh/sshd_config to only listen on 127.0.0.1.. effectively
>> bypassing the eth0 interface, whilst still allowing it to be loaded.
>> But the problem still happens.
> [...]
>
> My guess is that sshd tries to protect itself against the OOM-killer so
> that you can still log in to a system that has gone OOM. If there is no
> network available, it doesn't do this because you cannot log in remotely
> anyway.
>
> The bug seems to be that sshd does not reset the OOM adjustment before
> running the login shell (or other program). Therefore, please report a
> bug against openssh-server.
>
> Ben.
>

cgroup OOM killer loop causes system to lockup (possible fix included)

Cal Leeming [Simplicity Media Ltd]

FYI everyone, I found a bug within openssh-server which caused this problem.

I've patched it and submitted the fix to the openssh list.

You can find details of this by googling for:
"port-linux.c bug with oom_adjust_restore() - causes real bad oom_adj -
which can cause DoS conditions"

It's extremely strange.. :S
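
The essence of it, going by Ben's diagnosis above, is that the adjustment
saved at startup has to be written back in the session child before the
user's shell is exec'd. Sketched in shell purely as illustration; the real
code is C, in openssh's openbsd-compat/port-linux.c, and this is not the
actual patch:

saved=$(cat /proc/self/oom_adj)      # remember the value sshd inherited at startup
echo -17 > /proc/self/oom_adj        # the listener protects itself from the OOM killer
# ... accept a connection, fork the per-session child ...
echo "$saved" > /proc/self/oom_adj   # child restores the saved value BEFORE exec'ing the shell
exec "$SHELL" -l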

Cal

On 30/05/2011 21:31, Cal Leeming [Simplicity Media Ltd] wrote:
> Thanks for the response. I have sent this across to the guys at
> openssh-server.
>
> Although, I did check the openssh source code myself, and from what I
> could tell, everything was being done correctly.
>
> I have a feeling there's going to be a lot of 'buck passing' on this one :(
>
> Cal
>
> On 30/05/2011 21:25, Ben Hutchings wrote:
>> [...]
>