How can I debug this problem?
(I've got full tcpdump captures.)
I have a TCP server to which many clients establish persistent connections. Normally all these clients behave, and I never hit Linux's default per-process limit of 1024 open files (and therefore connections).
Yesterday someone (or something) started misbehaving, leaving a lot of open connections and forcing me to restart the server. You can see the behavior in the following munin netstat graph:
Every time the connection count reaches 1000, I restart the server. Only after the fourth restart did the misbehavior stop, as mysteriously as it started and without any apparent reason. Something similar happened a week ago.
All the bad connections come from the same (sub)network, so I can isolate them, but some valid connections come from that network too (so I can't simply deny connections from the whole network).
So far I've used tcpdump, Ethereal, and ngrep, but I haven't found a way to single out connections that are established but never transfer any data.
- How should I examine the tcpdump (pcap) captures to isolate the misbehaving connections and study them?
- What would you suggest to stop this from happening again?
Thanks!
Answer
In Wireshark, go to Statistics -> Conversations -> TCP. Try eyeballing the list to see if anything looks odd, e.g. a host with an abnormally large number of connections, very few bytes transferred, or a low transfer rate. If you really need to, you can copy the data into a spreadsheet. (You can do something similar on the server side using netstat; on Linux, for example,
netstat -nt | sort -n -t . -k5,5 -k6,6 -k7,7 -k8,8
lists current connections sorted by client IP address.)
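If you'd rather not wrestle with sort's field syntax, the same per-client aggregation can be sketched in a few lines of Python. This is a minimal sketch, not part of the original answer: it assumes the standard `netstat -nt` column layout (protocol, queues, local address, foreign address, state), and the sample data below is made up for illustration.

```python
from collections import Counter

# Hypothetical sample of `netstat -nt` output lines for illustration.
sample = """\
tcp        0      0 192.0.2.10:8080       198.51.100.7:50312     ESTABLISHED
tcp        0      0 192.0.2.10:8080       198.51.100.7:50313     ESTABLISHED
tcp        0      0 192.0.2.10:8080       203.0.113.44:40010     ESTABLISHED
tcp        0      0 192.0.2.10:8080       198.51.100.7:50314     ESTABLISHED
"""

def connections_per_client(netstat_output):
    """Count ESTABLISHED connections per remote IP, most connections first."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expect: proto, recv-q, send-q, local addr, foreign addr, state
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "ESTABLISHED":
            ip = fields[4].rsplit(":", 1)[0]  # strip the port from the foreign address
            counts[ip] += 1
    return counts.most_common()

print(connections_per_client(sample))
# [('198.51.100.7', 3), ('203.0.113.44', 1)]
```

A client sitting at the top of this list with far more connections than its peers is a good candidate for closer inspection in the capture.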
If the problem is limited to one or two clients, you can look at their traffic to try to narrow the problem down further.
(And if you really are using Ethereal, you should upgrade to Wireshark immediately. Disclosure: I'm the lead developer.)