Problem


IQ is sometimes aborted silently on Linux, without writing anything to its logs.
In this case there is no stack trace and no error message.

Cause


The OOM Killer (Out-Of-Memory Killer) is a Linux kernel mechanism that kills a process when no free memory is available on the system.
If the IQ process is killed by the OOM Killer, IQ terminates without any error message or stack trace.

The “Out of memory” message can be seen in the system message log.

[message log]

Nov 12 12:48:00 BDDBBETLMV01 kernel: iqsrv16 invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Nov 12 12:48:00 BDDBBETLMV01 kernel: iqsrv16 cpuset=/ mems_allowed=0
Nov 12 12:48:00 BDDBBETLMV01 kernel: Pid: 29293, comm: iqsrv16 Not tainted 2.6.32-279.el6.x86_64 #1
Nov 12 12:48:00 BDDBBETLMV01 kernel: Call Trace:
Nov 12 12:48:00 BDDBBETLMV01 kernel: [<ffffffff810c4971>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Nov 12 12:48:00 BDDBBETLMV01 kernel: [<ffffffff811170e0>] ? dump_header+0x90/0x1b0
Nov 12 12:48:00 BDDBBETLMV01 kernel: [<ffffffff812146fc>] ? security_real_capable_noaudit+0x3c/0x70
Nov 12 12:48:00 BDDBBETLMV01 kernel: [<ffffffff81117562>] ? oom_kill_process+0x82/0x2a0
Nov 12 12:48:00 BDDBBETLMV01 kernel: [<ffffffff8111745e>] ? select_bad_process+0x9e/0x120
Nov 12 12:48:00 BDDBBETLMV01 kernel: [<ffffffff811179a0>] ? out_of_memory+0x220/0x3c0


Nov 12 12:48:00 BDDBBETLMV01 kernel: Out of memory: Kill process 29081 (iqsrv16) score 950 or sacrifice child
Nov 12 12:48:00 BDDBBETLMV01 kernel: Killed process 29081, UID 501, (iqsrv16) total-vm:24941028kB, anon-rss:3586296kB, file-rss:1332kB
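
To check quickly whether an IQ server was a victim, the message log can be scanned for the kernel's "Killed process" lines. The following is a minimal sketch, not an official tool: the function name find_oom_kills is made up for this post, and the regular expression is written only against the line format shown in the excerpt above, so other kernel versions may need an adjusted pattern.

```python
import re

# Pattern for the kernel's "Killed process" line, based on the excerpt above.
# Assumption: other kernel versions may word this line differently.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+),? .*?\((?P<comm>[^)]+)\)"
    r".*?total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def find_oom_kills(lines, process_name=None):
    """Return one dict per OOM kill found in an iterable of log lines.

    If process_name is given (for example "iqsrv16"), only kills of
    processes with exactly that command name are returned.
    """
    kills = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match is None:
            continue
        if process_name is not None and match.group("comm") != process_name:
            continue
        kills.append({
            "pid": int(match.group("pid")),
            "comm": match.group("comm"),
            "total_vm_kb": int(match.group("total_vm_kb")),
            "anon_rss_kb": int(match.group("anon_rss_kb")),
            "line": line.rstrip("\n"),
        })
    return kills
```

On a system that writes kernel messages to /var/log/messages (an assumption; the path depends on the distribution and syslog configuration, and reading it usually needs root), the function could be called as find_oom_kills(open("/var/log/messages", errors="replace"), "iqsrv16").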

Resolution


This is not an IQ issue; it is a resource issue.
There are a couple of ways to address it.

1) Increase the physical memory, or decrease the values of the -iqmc, -iqtc, -ch and -cl server startup options

2) Decrease the number of concurrent jobs
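
As a rough sanity check for option 1, the configured cache sizes can be compared with the physical memory reported by Linux. The sketch below is only an illustration under stated assumptions: -iqmc and -iqtc are taken to be given in MB, the helper names are made up for this post, and the 0.7 limit is an arbitrary example value, not an SAP recommendation. IQ also uses memory outside these two caches, so passing this check does not guarantee the server is safe from the OOM Killer.

```python
def mem_total_kb(meminfo_text):
    """Return the MemTotal value (in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like: "MemTotal:       16331480 kB"
            return int(line.split()[1])
    raise ValueError("MemTotal not found in meminfo text")


def caches_fit(total_kb, iqmc_mb, iqtc_mb, max_fraction=0.7):
    """Check whether main + temp cache stay within a share of physical memory.

    max_fraction is a hypothetical threshold chosen for illustration only.
    """
    cache_kb = (iqmc_mb + iqtc_mb) * 1024
    return cache_kb <= total_kb * max_fraction
```

For example, with the text of /proc/meminfo read into a string text, caches_fit(mem_total_kb(text), 4096, 4096) tells whether a 4 GB main cache plus a 4 GB temp cache stay under the chosen share of physical memory.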


HTH


Regards

Gi-Sung Jang
