High I/O as possible source of Internal Server Error
We have Redmine running through Apache's mod_fcgid. I was seeing ruby processes using up to 100% of CPU, followed by errors in /var/log/apache2/error.log like "mod_fcgid: read data timeout in 40 seconds". Those traced back in /var/log/redmine/default/production.log to "ActionView::TemplateError (exit) on line #4 of app/views/issues/_action_menu.html.erb", which is a harmless-looking template line, "watcher_tag(@issue, User.current)". Moreover, the Internal Server Error didn't occur every time.
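For reference, the errors were easy to pull out of the two logs with grep; something along these lines (paths as above):
{syntaxhighlighter class="brush:bash;gutter:false"}
# Pull the timeout errors out of the Apache log
$ grep "read data timeout" /var/log/apache2/error.log
# ...and the matching template error plus some context out of the Redmine log
$ grep -B 2 -A 8 "ActionView::TemplateError" /var/log/redmine/default/production.log
{/syntaxhighlighter}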
That led me to look further to see what else was at play...
High I/O as the source of Redmine's problems?
{syntaxhighlighter class="brush:bash;gutter:false"}
$ top
%Cpu(s): 0.0 us, 0.2 sy, 0.0 ni, 0.0 id, 99.8 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 3094744 total, 2776204 used, 318540 free, 302932 buffers
KiB Swap: 0 total, 0 used, 0 free, 894488 cached
{/syntaxhighlighter}
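That 99.8 in the wa column means the CPU is spending essentially all of its time waiting on I/O. If it's installed, iotop is a quick way to see which processes are actually generating that I/O; a hedged sketch:
{syntaxhighlighter class="brush:bash;gutter:false"}
# -o shows only processes/threads currently doing I/O (iotop may need installing first)
$ sudo iotop -o
{/syntaxhighlighter}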
{syntaxhighlighter class="brush:bash;gutter:false"}
$ ps -eo state,pid,cmd | grep "^D"
D 1168 [flush-253:2]
D 1177 [kjournald]
D 8185 aptitude
D 27089 /usr/bin/updatedb.mlocate
{/syntaxhighlighter}
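Those "D" entries are processes in uninterruptible sleep, almost always blocked on I/O. Adding the wchan column to the same ps invocation shows roughly which kernel function each one is stuck in; a variant along those lines:
{syntaxhighlighter class="brush:bash;gutter:false"}
# wchan:30 prints (up to 30 chars of) the kernel function each process is sleeping in
$ ps -eo state,pid,wchan:30,cmd | grep "^D"
{/syntaxhighlighter}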
{syntaxhighlighter class="brush:bash;gutter:false"}
$ sudo cat /proc/1168/io
rchar: 0
wchar: 0
syscr: 0
syscw: 0
read_bytes: 188305408
write_bytes: 1246294016
cancelled_write_bytes: 0
{/syntaxhighlighter}
{syntaxhighlighter class="brush:bash;gutter:false"}
$ sudo cat /proc/1177/io
rchar: 0
wchar: 0
syscr: 0
syscw: 0
read_bytes: 15683584
write_bytes: 140864806912
cancelled_write_bytes: 0
{/syntaxhighlighter}
{syntaxhighlighter class="brush:bash;gutter:false"}
$ sudo cat /proc/8185/io
rchar: 153394471
wchar: 24021194
syscr: 7325
syscw: 3278
read_bytes: 27171840
write_bytes: 73240576
cancelled_write_bytes: 327680
{/syntaxhighlighter}
{syntaxhighlighter class="brush:bash;gutter:false"}
$ sudo cat /proc/27089/io
rchar: 761211
wchar: 741376
syscr: 127
syscw: 181
read_bytes: 6540288
write_bytes: 741376
cancelled_write_bytes: 0
{/syntaxhighlighter}
Commands above via http://bencane.com/2012/08/06/troubleshooting-high-io-wait-in-linux/
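The read_bytes/write_bytes counters in /proc/&lt;pid&gt;/io are cumulative, so a single snapshot doesn't say who is writing right now. Sampling twice and diffing gives a rough per-process write rate; a quick sketch (PIDs hard-coded from the listing above):
{syntaxhighlighter class="brush:bash;gutter:false"}
# Rough write rate per suspect PID: diff write_bytes over a 10-second window
# (sudo needed to read /proc/<pid>/io for other users' processes)
$ for pid in 1168 1177 8185 27089; do
>   w1=$(sudo awk '/^write_bytes/ {print $2}' /proc/$pid/io)
>   sleep 10
>   w2=$(sudo awk '/^write_bytes/ {print $2}' /proc/$pid/io)
>   echo "$pid: $(( (w2 - w1) / 10 )) bytes/s written"
> done
{/syntaxhighlighter}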
{syntaxhighlighter class="brush:bash;gutter:false"}
$ vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 7 0 292668 311260 921952 0 0 20 83 1 0 20 7 71 2
0 6 0 292660 311260 921956 0 0 0 1582 76 145 0 0 0 99
0 6 0 292536 311260 921956 0 0 0 234 71 137 0 0 0 100
0 6 0 293032 311260 921952 0 0 0 298 84 181 1 0 0 99
0 6 0 292692 311260 921956 0 0 0 66 70 138 0 0 0 100
0 4 0 293196 311260 921952 0 0 0 976 84 145 0 0 0 99
0 4 0 292676 311260 921956 0 0 0 210 95 191 0 0 0 100
0 4 0 292676 311260 921956 0 0 0 68 86 169 0 0 0 100
1 5 0 277872 311260 921956 0 0 0 902 145 223 7 1 7 86
0 7 0 261620 311260 921976 0 0 0 88 176 242 11 2 0 87
0 5 0 261912 311264 921976 0 0 2 317 474 852 11 5 0 83
0 7 0 265896 311268 921972 0 0 0 83 471 577 25 6 0 69
0 4 0 275716 311272 921964 0 0 0 587 174 322 5 1 0 94
0 5 0 275460 311280 921964 0 0 0 294 99 194 0 0 0 99
0 6 0 275460 311280 921964 0 0 0 73 84 162 0 0 0 100
0 6 0 276352 311284 921956 0 0 0 677 105 219 0 0 0 100
0 6 0 275740 311284 921964 0 0 0 74 98 181 1 0 0 99
{/syntaxhighlighter}
Command via http://www.chileoffshore.com/en/interesting-articles/126-linux-wait-io-problem
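The b column (processes blocked waiting for I/O) and the near-100% wa confirm the box is I/O-bound. To pin down which block device is saturated, iostat from the sysstat package is handy; something like:
{syntaxhighlighter class="brush:bash;gutter:false"}
# Extended per-device stats every 2 seconds; watch %util and await
$ sudo apt-get install sysstat   # if not already present
$ iostat -xd 2
{/syntaxhighlighter}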
The D-state processes would not stop; they stayed put for hours.
{syntaxhighlighter class="brush:bash;gutter:false"}
$ ps -eo state,pid,cmd | grep "^D"
D 1168 [flush-253:2]
D 1177 [kjournald]
D 27089 /usr/bin/updatedb.mlocate
{/syntaxhighlighter}
You cannot kill -9 "uninterruptible sleep" processes.
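The closest you can get is asking the kernel where they are stuck. /proc/&lt;pid&gt;/stack (present when the kernel is built with stack-trace support) and /proc/&lt;pid&gt;/wchan can at least show the code path; for example, for the flush thread above:
{syntaxhighlighter class="brush:bash;gutter:false"}
# Kernel stack of the stuck flush thread (needs root; the file may be absent on some kernels)
$ sudo cat /proc/1168/stack
# Single symbol the process is waiting in (no trailing newline, hence the echo)
$ sudo cat /proc/1168/wchan; echo
{/syntaxhighlighter}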
Shut down the whole server?
{syntaxhighlighter class="brush:bash;gutter:false"}
sudo shutdown -r now
{/syntaxhighlighter}
{syntaxhighlighter class="brush:bash;gutter:false"}
$ ps -eo state,pid,cmd | grep "^D"
D 1168 [flush-253:2]
D 1177 [kjournald]
D 17487 shutdown -r 0 w
{/syntaxhighlighter}
An hour or so later, the reboot finally went through, and ps -eo state,pid,cmd | grep "^D" came up clean.
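Since /usr/bin/updatedb.mlocate was one of the stuck processes, it may also be worth checking when the nightly locate-database rebuild runs and whether it collides with other heavy jobs. On Debian/Ubuntu it is normally driven by a cron.daily script (path assumed; it may differ on other distributions):
{syntaxhighlighter class="brush:bash;gutter:false"}
# Assumption: Debian-style cron layout; adjust if your distro differs
$ ls /etc/cron.daily/ | grep -i locate
$ cat /etc/cron.daily/mlocate
{/syntaxhighlighter}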