Saturday, February 25, 2017

linux - Unresponsive nginx while doing “nginx reload”



While reloading nginx, I started getting "possible SYN flooding on port 443" errors in the messages log, and it seems like nginx becomes completely unresponsive at that time (for quite a while), because zabbix reports "nginx is down" with a ping of 0s. RPS at that time is about 1800.



But the server stays responsive on other, non-web ports (SSH, etc.).



Where should I look, and which configs (sysctl, nginx) should I share, to find the root cause of this?
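
A quick way to see whether the listen queue is actually overflowing around the reload would be something like this (assuming standard net-tools/iproute2, and /var/log/messages as the log mentioned above):

$ grep -i 'SYN flooding' /var/log/messages | tail -5     # the kernel messages themselves
$ netstat -s | grep -iE 'listen|SYNs'                    # listen-queue overflows and dropped SYNs
$ ss -lnt '( sport = :443 )'                             # Recv-Q = current accept queue, Send-Q = configured backlog

If the "listen queue of a socket overflowed" counter grows only around reloads, the accept queue on the listening socket, rather than raw SYN volume, is the likely bottleneck.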



Thanks in advance.




UPD:



Some additional info:



$ netstat -tpn |awk '/nginx/{print $6,$7}' |sort |uniq -c
3266 ESTABLISHED 31253/nginx
3289 ESTABLISHED 31254/nginx
3265 ESTABLISHED 31255/nginx
3186 ESTABLISHED 31256/nginx



nginx.conf sample:



worker_processes  4;
timer_resolution 100ms;
worker_priority -15;
worker_rlimit_nofile 200000;

events {
    worker_connections 65536;
    multi_accept on;
    use epoll;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    keepalive_requests 100;
    keepalive_timeout 65;
}
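
The sample does not show the listen directives; unless a backlog= parameter is set there, nginx defaults to a backlog of 511 on Linux, regardless of net.core.somaxconn. The values actually in effect can be read off the live sockets, for example:

$ ss -ltn '( sport = :80 or sport = :443 )'
# for LISTEN sockets, Send-Q is the configured backlog and
# Recv-Q is the number of connections waiting to be accepted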


custom sysctl.conf:



net.ipv4.ip_local_port_range=1024 65535

net.ipv4.conf.all.accept_redirects=0
net.ipv4.conf.all.secure_redirects=0
net.ipv4.conf.all.send_redirects=0
net.core.netdev_max_backlog=10000
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_max_syn_backlog=20480
net.ipv4.tcp_synack_retries=2
net.ipv4.tcp_syn_retries=2
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216

net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.netfilter.nf_conntrack_max=1048576
net.ipv4.tcp_congestion_control=htcp
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_max_tw_buckets=1400000
net.core.somaxconn=250000

net.ipv4.tcp_keepalive_time=900
net.ipv4.tcp_keepalive_intvl=15
net.ipv4.tcp_keepalive_probes=5
net.ipv4.tcp_fin_timeout=10
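
These values only take effect once loaded; a minimal sketch of re-applying them and spot-checking the SYN-related ones (assuming they live in /etc/sysctl.conf):

$ sysctl -p /etc/sysctl.conf
$ sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog net.ipv4.tcp_syncookies

Note that with tcp_syncookies=0 the kernel cannot fall back to SYN cookies once the SYN backlog fills, so new SYNs are simply dropped, which matches the "possible SYN flooding" messages.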


UPD2



Under normal load of about 1800 RPS, when I set the nginx backlog to 10000 on ports 80 and 443 and then reloaded nginx, it started using more RAM (3.8 GB of my 4 GB instance were used, and some workers were killed by the OOM-killer), and with worker_priority at -15 the load was over 6 (while my instance has only 4 cores). The instance was quite laggy, so I set worker_priority to -5 and the backlog to 1000 for every port. It now uses less memory and peak load was 3.8, but nginx still becomes unresponsive for a minute or two after a reload. So the problem still persists.
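
Since the OOM kills coincide with the reload, it may be worth watching per-worker memory while the old workers drain; a rough sketch using standard procps tools (RSS is in KiB):

$ watch -n 5 'ps -o pid,rss,vsz,args -C nginx --sort=-rss'
# during a reload the old workers ("worker process is shutting down") and the
# freshly started ones run side by side, so peak memory usage roughly doubles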




Some netstat details:



netstat -tpn |awk '/:80/||/:443/{print $6}' |sort |uniq -c
6 CLOSE_WAIT
14 CLOSING
17192 ESTABLISHED
350 FIN_WAIT1
1040 FIN_WAIT2
216 LAST_ACK
338 SYN_RECV
52541 TIME_WAIT

Answer



If you have:



  keepalive_timeout  65;


I can imagine that it can take a while for connections to be terminated and workers restarted. Without looking at the code, I am not quite sure whether nginx waits for them to expire once it gets a reload.




You could try lowering the value and see if it helps.
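
One way to test that would be to time how long the old workers stay in their shutting-down state after a reload; if they exit well before 65 seconds while nginx is still unresponsive, keepalive_timeout is probably not the cause. A rough sketch (the process title below is what recent nginx versions use for draining workers):

nginx -t && nginx -s reload

# poll the number of draining workers every 5 seconds until none remain
while pgrep -f 'nginx: worker process is shutting down' > /dev/null; do
    echo "$(date +%T)  old workers: $(pgrep -fc 'nginx: worker process is shutting down')"
    sleep 5
done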

