I'm having a problem with a stalling Linux system, and I have found that sysstat/sar reports huge peaks in disk I/O utilization, average service time, and average wait time at the time of the system stall.
How can I determine which process is causing these peaks the next time it happens?
Is it possible to do this with sar (i.e., can I find this information in the already recorded sar files)?
Output of "sar -d"; the system stall happened around 12:58-13:01.
12:40:01       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
12:40:01    dev8-0     11.57      0.11    710.08     61.36      0.01      0.97      0.37      0.43
12:45:01    dev8-0     13.36      0.00    972.93     72.82      0.01      1.00      0.32      0.43
12:50:01    dev8-0     13.55      0.03    616.56     45.49      0.01      0.70      0.35      0.47
12:55:01    dev8-0     13.99      0.08    917.00     65.55      0.01      0.86      0.37      0.52
13:01:02    dev8-0      6.28      0.00    400.53     63.81      0.89    141.87    141.12     88.59
13:05:01    dev8-0     22.75      0.03    932.13     40.97      0.01      0.65      0.27      0.62
13:10:01    dev8-0     13.11      0.00    634.55     48.42      0.01      0.71      0.38      0.50
This is a follow-up question to a thread I started yesterday: Sudden peaks in load and disk block wait. I hope it's OK that I created a new question on the matter, since I have not been able to resolve the problem yet.
Answer
If you are lucky enough to catch the next peak utilization period, you can study per-process I/O stats interactively, using iotop.
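If nobody is at the console when the stall happens, one possible alternative is to log per-process I/O in the background and inspect the log afterwards. The following is only a rough Python sketch of that idea, not part of iotop or sysstat. It assumes a kernel that exposes /proc/<pid>/io and /proc/<pid>/comm (task I/O accounting enabled) and that it runs as root so that other users' processes are readable. The function names parse_proc_io, snapshot, top_io and monitor are made up for this illustration.

```python
import os
import time


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value)
    return stats


def snapshot(proc_root="/proc"):
    """Return {pid: (name, read_bytes, write_bytes)} for every readable process."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "io")) as f:
                io = parse_proc_io(f.read())
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            # The process exited, or we lack permission to read it.
            continue
        result[int(entry)] = (name, io.get("read_bytes", 0), io.get("write_bytes", 0))
    return result


def top_io(before, after, limit=5):
    """Rank processes by bytes read plus written between two snapshots."""
    deltas = []
    for pid, (name, rd, wr) in after.items():
        old = before.get(pid)
        old_rd, old_wr = (old[1], old[2]) if old else (0, 0)
        total = (rd - old_rd) + (wr - old_wr)
        if total > 0:
            deltas.append((total, pid, name))
    deltas.sort(reverse=True)
    return deltas[:limit]


def monitor(interval=5, rounds=12):
    """Print the busiest processes once per interval (hypothetical helper)."""
    before = snapshot()
    for _ in range(rounds):
        time.sleep(interval)
        after = snapshot()
        stamp = time.strftime("%H:%M:%S")
        for total, pid, name in top_io(before, after):
            print(f"{stamp} pid={pid} {name} bytes={total}")
        before = after
```

Calling monitor() from a script started under nohup or from cron, with its output redirected to a file, would leave a timestamped record that can be lined up with the sar intervals. The counters cover only I/O that the kernel attributes to a process, so writeback done by kernel threads may not show up under the process that originally dirtied the pages.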