Wednesday, July 20, 2016

Where did my memory go on Linux (no cache/slab/shm/ipcs)




This is a headless server with 8GB RAM (kernel 3.12). Even after only a few days of uptime it runs low on memory; in fact, it OOMed a few days ago. Something is eating memory, but I don't know what.



See the output below. In short:




  • 64bit system & OS

  • not a hypervisor nor a virtual machine


  • low free mem

  • swap in use

  • low cache

  • low buffer

  • inactive+active == 1GB ???

  • low ipcs

  • low shm

  • low slab

  • ~500MB tmpfs usage

  • in fact total RSS of all processes is 262MB


  • and HWM of all processes is less than 600MB

  • i lost more than 6GB somewhere...?




[root@localhost ~]# cat /proc/meminfo
MemTotal: 8186440 kB
MemFree: 251188 kB
Buffers: 144 kB
Cached: 853548 kB
SwapCached: 9988 kB
Active: 480036 kB
Inactive: 529456 kB
Active(anon): 256196 kB
Inactive(anon): 333072 kB
Active(file): 223840 kB
Inactive(file): 196384 kB
Unevictable: 13656 kB
Mlocked: 0 kB
SwapTotal: 4194300 kB
SwapFree: 4092540 kB
Dirty: 356 kB
Writeback: 0 kB
AnonPages: 161576 kB
Mapped: 50116 kB
Shmem: 419812 kB
Slab: 72680 kB
SReclaimable: 50648 kB
SUnreclaim: 22032 kB
KernelStack: 1824 kB
PageTables: 10260 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 8287520 kB
Committed_AS: 1883404 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 91804 kB
VmallocChunk: 34359637332 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 83180 kB
DirectMap2M: 8296448 kB

[root@localhost ~]# ipcs -m


------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x01123bac 0 root 600 1000 8

[root@localhost ~]# df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 4.0G 393M 3.6G 10% /run

[root@localhost ~]# for i in /proc/*/status ; do grep VmRSS $i; done | awk '{ s = s + $2 } END { print s / 1024 }'
262.375

[root@localhost ~]# for i in /proc/*/status ; do grep VmHWM $i; done | awk '{ s = s + $2 } END { print s / 1024 }'
526.77
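For what it's worth, the per-process totals above can be gathered in a single awk pass instead of one grep per process. A minimal sketch, assuming a POSIX shell and awk; `sum_status_field` is a made-up helper name, not a standard tool. Keep in mind that summing VmRSS counts pages shared between processes once per process, so the total tends to overstate real usage rather than understate it.

```shell
# Sum one "Vm..." field (reported in kB) across the given status files
# and print the total in MB. The helper name is hypothetical.
sum_status_field() {
    field=$1
    shift
    awk -v f="$field:" '$1 == f { s += $2 } END { printf "%.1f\n", s / 1024 }' "$@"
}

# Possible usage (kernel threads simply have no VmRSS line and are skipped):
#   sum_status_field VmRSS /proc/[0-9]*/status 2>/dev/null
#   sum_status_field VmHWM /proc/[0-9]*/status 2>/dev/null
```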



Edit: I've set vm.overcommit_memory=2 (overcommit disabled) just in case. The output below is from after a reboot 2 days ago.





[root@localhost linux]# cat /proc/sys/vm/overcommit_memory
2
[root@localhost linux]# df -h | grep tmpfs
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 4.0G 0 4.0G 0% /dev/shm
tmpfs 4.0G 532K 4.0G 1% /run
tmpfs 4.0G 0 4.0G 0% /sys/fs/cgroup
tmpfs 4.0G 0 4.0G 0% /tmp
tmpfs 4.0G 532K 4.0G 1% /var/spool/postfix/run/saslauthd
[root@localhost linux]# for i in /proc/*/status ; do grep VmRSS $i; done | awk '{ s = s + $2 } END { print s / 1024 }'
434.188
[root@localhost linux]# for i in /proc/*/status ; do grep VmHWM $i; done | awk '{ s = s + $2 } END { print s / 1024 }'
545.551
[root@localhost linux]# cat /proc/meminfo
MemTotal: 8186440 kB
MemFree: 146576 kB
Buffers: 1728 kB
Cached: 5212588 kB
SwapCached: 0 kB
Active: 2560112 kB
Inactive: 2874464 kB
Active(anon): 94464 kB
Inactive(anon): 136528 kB
Active(file): 2465648 kB
Inactive(file): 2737936 kB
Unevictable: 9772 kB
Mlocked: 0 kB
SwapTotal: 4194300 kB
SwapFree: 4194300 kB
Dirty: 1436 kB
Writeback: 0 kB
AnonPages: 230032 kB
Mapped: 50540 kB
Shmem: 960 kB
Slab: 316804 kB
SReclaimable: 291712 kB
SUnreclaim: 25092 kB
KernelStack: 1880 kB
PageTables: 11184 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 8287520 kB
Committed_AS: 1160812 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 91676 kB
VmallocChunk: 34359582672 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 91372 kB
DirectMap2M: 8288256 kB


so, i'm using 8GB:





  • 5GB is cached

  • 0.5MB tmpfs

  • 450MB RSS

  • ~1GB slab+pages+whatever (in meminfo)



I'm still short about 1.5GB. Is this a kernel leak, or what is going on here?
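One way to make this bookkeeping less hand-wavy is to subtract the categories that meminfo does account for from MemTotal. A rough sketch; the helper name `meminfo_unaccounted` and the choice of fields are my own assumptions, and the categories overlap slightly (Cached includes Shmem/tmpfs, and SwapCached pages may also show up in AnonPages), so treat the result as an estimate rather than an exact figure.

```shell
# Read /proc/meminfo-style text on stdin and print the number of kB that
# none of the usual counters explain. Helper name and field list are
# assumptions made for this sketch.
meminfo_unaccounted() {
    awk '
        { sub(/:$/, "", $1); v[$1] = $2 }
        END {
            accounted  = v["MemFree"] + v["Buffers"] + v["Cached"] + v["SwapCached"]
            accounted += v["AnonPages"] + v["Slab"] + v["KernelStack"] + v["PageTables"]
            print v["MemTotal"] - accounted
        }'
}

# Possible usage:
#   meminfo_unaccounted < /proc/meminfo
```

Fed the second meminfo snapshot above, this prints 2265648 (roughly 2.2GB); the first snapshot gives 6825232 (roughly 6.5GB), which matches the "more than 6GB" that went missing before the reboot.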



Edit2: I have the same issue on another Atom board.




I also checked whether kmemleak reported anything, but it found nothing. I'm out of ideas.



Edit3: Updating to kernel 3.17.2 seems to have resolved this issue, but I still don't know how to trace this kind of memory leak.


Answer



LKML thinks it might have been https://lkml.org/lkml/2014/10/15/447 , but that patch wasn't in 3.17.2, and the THP allocations don't point that way.



However, /proc/kpageflags might show which part of the kernel allocated which pages, so that might help. tools/vm/page-types.c in the kernel sources might hold some information on the structure of the binary kpageflags output.
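To give an idea of what decoding kpageflags could look like: the file holds one 64-bit flag word per physical page frame, and the bit numbers are listed in Documentation/vm/pagemap.txt in the kernel sources. The sketch below assumes bits 5 (LRU), 7 (SLAB), 10 (BUDDY) and 20 (NOPAGE) as documented there, an od that supports `-tx8`, and a kernel built with CONFIG_PROC_PAGE_MONITOR. The helper name `kpageflags_summary` is hypothetical. Pages that land in the "other" bucket are neither free, slab, nor on an LRU list, so that bucket is probably where memory allocated directly by kernel code would show up.

```shell
# Classify raw kpageflags data (one native-endian u64 per page frame) read
# from stdin. Each page is counted in the first matching bucket only.
kpageflags_summary() {
    od -An -v -tx8 | tr -s ' ' '\n' | {
        total=0 lru=0 slab=0 buddy=0 nopage=0 other=0
        while read -r hex; do
            [ -n "$hex" ] || continue          # skip empty lines left by tr
            flags=$((0x$hex))
            total=$((total + 1))
            if   [ $(( (flags >> 20) & 1 )) -eq 1 ]; then nopage=$((nopage + 1))
            elif [ $(( (flags >> 10) & 1 )) -eq 1 ]; then buddy=$((buddy + 1))
            elif [ $(( (flags >> 7) & 1 )) -eq 1 ]; then slab=$((slab + 1))
            elif [ $(( (flags >> 5) & 1 )) -eq 1 ]; then lru=$((lru + 1))
            else other=$((other + 1))
            fi
        done
        echo "total=$total lru=$lru slab=$slab buddy=$buddy nopage=$nopage other=$other"
    }
}

# Possible usage (as root); the counts are pages, so multiply by the page size:
#   kpageflags_summary < /proc/kpageflags
```

A shell loop over a couple of million pages is slow. The page-types tool built from tools/vm/page-types.c reads the same file and should be the better choice for real use.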

