Thursday, September 5, 2019

nginx - Recurrent 502, 504 errors despite low cpu and memory usage

I'm struggling with this problem since quite a while. I have t2.site instance on AWS hosting a very low traffic personal website built with wordpress, nothing extraordinary. I first used Apache2, and after a few months running perfectly I suddenly started to got some 502, 504 errors. At that time I decided to move to Nginx/php5-fpm, things went well no more 502, 504 errors for a while. And since yesterday it just all happened again, I noticed that if I restart php5-fpm or mysql the site is accessible again, but only for 5-10 minutes before giving a 502 or 504 errors again.



I have followed multiple threads with similar problems nothing really worked, so I surrender and ask for help here. I changed multiple time the nginx and php5-fpm configuration without any durable change. I can see in the htop that there is a lot of mysql process and start suspecting that something (wordpress or ?) is creating a bunch of unnecessary connections to mysql but I don't know how to investigate further.



Here first my htop when things go wrong:



enter image description here




My nginx config :



server {
listen 80;
root /var/www/html/mysite;
index index.php index.html index.htm;
server_name my-site.com;

location / {

try_files $uri $uri/ /index.php?q=$uri&$args;
}

location ~ \.php$ {
try_files $uri =404;
#fastcgi_pass unix:/var/run/php5-fpm.sock;
fastcgi_read_timeout 150;
fastcgi_pass 127.0.0.1:9000;
fastcgi_buffer_size 16k;
fastcgi_buffers 4 16k;

fastcgi_index index.php;
include fastcgi_params;
}

location = /favicon.ico {
log_not_found off;
access_log off;
}

location = /robots.txt {

allow all;
log_not_found off;
access_log off;
}

location ~ /\. {
deny all;
}

location ~* /(?:uploads|files)/.*\.php$ {

deny all;
}
}


My /etc/php5/fpm/pool.d/www.conf file for php5-fpm configuration



; Start a new pool named 'www'.
; the variable $pool can we used in any directive and will be replaced by the
; pool name ('www' here)

[www]

; Per pool prefix
; It only applies on the following directives:
; - 'slowlog'
; - 'listen' (unixsocket)
; - 'chroot'
; - 'chdir'
; - 'php_values'
; - 'php_admin_values'

; When not set, the global prefix (or /usr) applies instead.
; Note: This directive can also be relative to the global prefix.
; Default Value: none
;prefix = /path/to/pools/$pool

; Unix user/group of processes
; Note: The user is mandatory. If the group is not set, the default user's group
; will be used.
user = www-data
group = www-data


; The address on which to accept FastCGI requests.
; Valid syntaxes are:
; 'ip.add.re.ss:port' - to listen on a TCP socket to a specific address on
; a specific port;
; 'port' - to listen on a TCP socket to all addresses on a
; specific port;
; '/path/to/unix/socket' - to listen on a unix socket.
; Note: This value is mandatory.
listen = 127.0.0.1:9000


; Set listen(2) backlog.
; Default Value: 65535 (-1 on FreeBSD and OpenBSD)
;listen.backlog = 65535

; Set permissions for unix socket, if one is used. In Linux, read/write
; permissions must be set in order to allow connections from a web server. Many
; BSD-derived systems allow connections regardless of permissions.
; Default Values: user and group are set as the running user
; mode is set to 0660

listen.owner = www-data
listen.group = www-data
;listen.mode = 0660

; List of ipv4 addresses of FastCGI clients which are allowed to connect.
; Equivalent to the FCGI_WEB_SERVER_ADDRS environment variable in the original
; PHP FCGI (5.2.2+). Makes sense only with a tcp listening socket. Each address
; must be separated by a comma. If this value is left blank, connections will be
; accepted from any ip address.
; Default Value: any

;listen.allowed_clients = 127.0.0.1

; Specify the nice(2) priority to apply to the pool processes (only if set)
; The value can vary from -19 (highest priority) to 20 (lower priority)
; Note: - It will only work if the FPM master process is launched as root
; - The pool processes will inherit the master process priority
; unless it specified otherwise
; Default Value: no set
; priority = -19


; Choose how the process manager will control the number of child processes.
; Possible Values:
; static - a fixed number (pm.max_children) of child processes;
; dynamic - the number of child processes are set dynamically based on the
; following directives. With this process management, there will be
; always at least 1 children.
; pm.max_children - the maximum number of children that can
; be alive at the same time.
; pm.start_servers - the number of children created on startup.
; pm.min_spare_servers - the minimum number of children in 'idle'

; state (waiting to process). If the number
; of 'idle' processes is less than this
; number then some children will be created.
; pm.max_spare_servers - the maximum number of children in 'idle'
; state (waiting to process). If the number
; of 'idle' processes is greater than this
; number then some children will be killed.
; ondemand - no children are created at startup. Children will be forked when
; new requests will connect. The following parameter are used:
; pm.max_children - the maximum number of children that

; can be alive at the same time.
; pm.process_idle_timeout - The number of seconds after which
; an idle process will be killed.
; Note: This value is mandatory.
pm = dynamic

; The number of child processes to be created when pm is set to 'static' and the
; maximum number of child processes when pm is set to 'dynamic' or 'ondemand'.
; This value sets the limit on the number of simultaneous requests that will be
; served. Equivalent to the ApacheMaxClients directive with mpm_prefork.

; Equivalent to the PHP_FCGI_CHILDREN environment variable in the original PHP
; CGI. The below defaults are based on a server without much resources. Don't
; forget to tweak pm.* to fit your needs.
; Note: Used when pm is set to 'static', 'dynamic' or 'ondemand'
; Note: This value is mandatory.
pm.max_children = 5

; The number of child processes created on startup.
; Note: Used only when pm is set to 'dynamic'
; Default Value: min_spare_servers + (max_spare_servers - min_spare_servers) / 2

pm.start_servers = 2

; The desired minimum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
pm.min_spare_servers = 1

; The desired maximum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'

pm.max_spare_servers = 3

; The number of seconds after which an idle process will be killed.
; Note: Used only when pm is set to 'ondemand'
; Default Value: 10s
;pm.process_idle_timeout = 10s;

; The number of requests each child process should execute before respawning.
; This can be useful to work around memory leaks in 3rd party libraries. For
; endless request processing specify '0'. Equivalent to PHP_FCGI_MAX_REQUESTS.

; Default Value: 0
pm.max_requests = 500

; The URI to view the FPM status page. If this value is not set, no URI will be
; recognized as a status page. It shows the following informations:
; pool - the name of the pool;
; process manager - static, dynamic or ondemand;
; start time - the date and time FPM has started;
; start since - number of seconds since FPM has started;
; accepted conn - the number of request accepted by the pool;

; listen queue - the number of request in the queue of pending
; connections (see backlog in listen(2));
; max listen queue - the maximum number of requests in the queue
; of pending connections since FPM has started;
; listen queue len - the size of the socket queue of pending connections;
; idle processes - the number of idle processes;
; active processes - the number of active processes;
; total processes - the number of idle + active processes;
; max active processes - the maximum number of active processes since FPM
; has started;

; max children reached - number of times, the process limit has been reached,
; when pm tries to start more children (works only for
; pm 'dynamic' and 'ondemand');
; Value are updated in real time.
; Example output:
; pool: www
; process manager: static
; start time: 01/Jul/2011:17:53:49 +0200
; start since: 62636
; accepted conn: 190460

; listen queue: 0
; max listen queue: 1
; listen queue len: 42
; idle processes: 4
; active processes: 11
; total processes: 15
; max active processes: 12
; max children reached: 0
;
; By default the status page output is formatted as text/plain. Passing either

; 'html', 'xml' or 'json' in the query string will return the corresponding
; output syntax. Example:
; http://www.foo.bar/status
; http://www.foo.bar/status?json
; http://www.foo.bar/status?html
; http://www.foo.bar/status?xml
;
; By default the status page only outputs short status. Passing 'full' in the
; query string will also return status for each pool process.
; Example:

; http://www.foo.bar/status?full
; http://www.foo.bar/status?json&full
; http://www.foo.bar/status?html&full
; http://www.foo.bar/status?xml&full
; The Full status returns for each process:
; pid - the PID of the process;
; state - the state of the process (Idle, Running, ...);
; start time - the date and time the process has started;
; start since - the number of seconds since the process has started;
; requests - the number of requests the process has served;

; request duration - the duration in µs of the requests;
; request method - the request method (GET, POST, ...);
; request URI - the request URI with the query string;
; content length - the content length of the request (only with POST);
; user - the user (PHP_AUTH_USER) (or '-' if not set);
; script - the main script called (or '-' if not set);
; last request cpu - the %cpu the last request consumed
; it's always 0 if the process is not in Idle state
; because CPU calculation is done when the request
; processing has terminated;

; last request memory - the max amount of memory the last request consumed
; it's always 0 if the process is not in Idle state
; because memory calculation is done when the request
; processing has terminated;
; If the process is in Idle state, then informations are related to the
; last request the process has served. Otherwise informations are related to
; the current request being served.
; Example output:
; ************************
; pid: 31330

; state: Running
; start time: 01/Jul/2011:17:53:49 +0200
; start since: 63087
; requests: 12808
; request duration: 1250261
; request method: GET
; request URI: /test_mem.php?N=10000
; content length: 0
; user: -
; script: /home/fat/web/docs/php/test_mem.php

; last request cpu: 0.00
; last request memory: 0
;
; Note: There is a real-time FPM status monitoring sample web page available
; It's available in: ${prefix}/share/fpm/status.html
;
; Note: The value must start with a leading slash (/). The value can be
; anything, but it may not be a good idea to use the .php extension or it
; may conflict with a real PHP file.
; Default Value: not set

;pm.status_path = /status

; The ping URI to call the monitoring page of FPM. If this value is not set, no
; URI will be recognized as a ping page. This could be used to test from outside
; that FPM is alive and responding, or to
; - create a graph of FPM availability (rrd or such);
; - remove a server from a group if it is not responding (load balancing);
; - trigger alerts for the operating team (24/7).
; Note: The value must start with a leading slash (/). The value can be
; anything, but it may not be a good idea to use the .php extension or it

; may conflict with a real PHP file.
; Default Value: not set
;ping.path = /ping

; This directive may be used to customize the response of a ping request. The
; response is formatted as text/plain with a 200 response code.
; Default Value: pong
;ping.response = pong

; The access log file

; Default: not set
;access.log = log/$pool.access.log

; The access log format.
; The following syntax is allowed
; %%: the '%' character
; %C: %CPU used by the request
; it can accept the following format:
; - %{user}C for user CPU only
; - %{system}C for system CPU only

; - %{total}C for user + system CPU (default)
; %d: time taken to serve the request
; it can accept the following format:
; - %{seconds}d (default)
; - %{miliseconds}d
; - %{mili}d
; - %{microseconds}d
; - %{micro}d
; %e: an environment variable (same as $_ENV or $_SERVER)
; it must be associated with embraces to specify the name of the env

; variable. Some exemples:
; - server specifics like: %{REQUEST_METHOD}e or %{SERVER_PROTOCOL}e
; - HTTP headers like: %{HTTP_HOST}e or %{HTTP_USER_AGENT}e
; %f: script filename
; %l: content-length of the request (for POST request only)
; %m: request method
; %M: peak of memory allocated by PHP
; it can accept the following format:
; - %{bytes}M (default)
; - %{kilobytes}M

; - %{kilo}M
; - %{megabytes}M
; - %{mega}M
; %n: pool name
; %o: output header
; it must be associated with embraces to specify the name of the header:
; - %{Content-Type}o
; - %{X-Powered-By}o
; - %{Transfert-Encoding}o
; - ....

; %p: PID of the child that serviced the request
; %P: PID of the parent of the child that serviced the request
; %q: the query string
; %Q: the '?' character if query string exists
; %r: the request URI (without the query string, see %q and %Q)
; %R: remote IP address
; %s: status (response code)
; %t: server time the request was received
; it can accept a strftime(3) format:
; %d/%b/%Y:%H:%M:%S %z (default)

; %T: time the log has been written (the request has finished)
; it can accept a strftime(3) format:
; %d/%b/%Y:%H:%M:%S %z (default)
; %u: remote user
;
; Default: "%R - %u %t \"%m %r\" %s"
;access.format = "%R - %u %t \"%m %r%Q%q\" %s %f %{mili}d %{kilo}M %C%%"

; The log file for slow requests
; Default Value: not set

; Note: slowlog is mandatory if request_slowlog_timeout is set
;slowlog = /var/log/php5-fpm.log

; The timeout for serving a single request after which a PHP backtrace will be
; dumped to the 'slowlog' file. A value of '0s' means 'off'.
; Available units: s(econds)(default), m(inutes), h(ours), or d(ays)
; Default Value: 0
;request_slowlog_timeout = 10s

; The timeout for serving a single request after which the worker process will

; be killed. This option should be used when the 'max_execution_time' ini option
; does not stop script execution for some reason. A value of '0' means 'off'.
; Available units: s(econds)(default), m(inutes), h(ours), or d(ays)
; Default Value: 0
request_terminate_timeout = 30s

; Set open file descriptor rlimit.
; Default Value: system defined value
;rlimit_files = 1024
; Set max core size rlimit.

; Possible Values: 'unlimited' or an integer greater or equal to 0
; Default Value: system defined value
;rlimit_core = 0

; Chroot to this directory at the start. This value must be defined as an
; absolute path. When this value is not set, chroot is not used.
; Note: you can prefix with '$prefix' to chroot to the pool prefix or one
; of its subdirectories. If the pool prefix is not set, the global prefix
; will be used instead.
; Note: chrooting is a great security feature and should be used whenever

; possible. However, all PHP paths will be relative to the chroot
; (error_log, sessions.save_path, ...).
; Default Value: not set
;chroot =

; Chdir to this directory at the start.
; Note: relative path can be used.
; Default Value: current directory or / when chroot
chdir = /


; Redirect worker stdout and stderr into main error log. If not set, stdout and
; stderr will be redirected to /dev/null according to FastCGI specs.
; Note: on highloaded environement, this can cause some delay in the page
; process time (several ms).
; Default Value: no
;catch_workers_output = yes

; Limits the extensions of the main script FPM will allow to parse. This can
; prevent configuration mistakes on the web server side. You should only limit
; FPM to .php extensions to prevent malicious users to use other extensions to

; exectute php code.
; Note: set an empty value to allow all extensions.
; Default Value: .php
;security.limit_extensions = .php .php3 .php4 .php5

; Pass environment variables like LD_LIBRARY_PATH. All $VARIABLEs are taken from
; the current environment.
; Default Value: clean env
;env[HOSTNAME] = $HOSTNAME
;env[PATH] = /usr/local/bin:/usr/bin:/bin

;env[TMP] = /tmp
;env[TMPDIR] = /tmp
;env[TEMP] = /tmp

; Additional php.ini defines, specific to this pool of workers. These settings
; overwrite the values previously defined in the php.ini. The directives are the
; same as the PHP SAPI:
; php_value/php_flag - you can set classic ini defines which can
; be overwritten from PHP call 'ini_set'.
; php_admin_value/php_admin_flag - these directives won't be overwritten by

; PHP call 'ini_set'
; For php_*flag, valid values are on, off, 1, 0, true, false, yes or no.
; Pass environment variables like LD_LIBRARY_PATH. All $VARIABLEs are taken from
; the current environment.
; Default Value: clean env
;env[HOSTNAME] = $HOSTNAME
;env[PATH] = /usr/local/bin:/usr/bin:/bin
;env[TMP] = /tmp
;env[TMPDIR] = /tmp
;env[TEMP] = /tmp


; Additional php.ini defines, specific to this pool of workers. These settings
; overwrite the values previously defined in the php.ini. The directives are the
; same as the PHP SAPI:
; php_value/php_flag - you can set classic ini defines which can
; be overwritten from PHP call 'ini_set'.
; php_admin_value/php_admin_flag - these directives won't be overwritten by
; PHP call 'ini_set'
; For php_*flag, valid values are on, off, 1, 0, true, false, yes or no.


; Defining 'extension' will load the corresponding shared extension from
; extension_dir. Defining 'disable_functions' or 'disable_classes' will not
; overwrite previously defined php.ini values, but will append the new value
; instead.

; Note: path INI options can be relative and will be expanded with the prefix
; (pool, global or /usr)

; Default Value: nothing is defined by default except the values in php.ini and
; specified at startup with the -d argument

;php_admin_value[sendmail_path] = /usr/sbin/sendmail -t -i -f www@my.domain.com
;php_flag[display_errors] = off
;php_admin_value[error_log] = /var/log/fpm-php.www.log
;php_admin_flag[log_errors] = on
;php_admin_value[memory_limit] = 32M


My /var/log/nginx/error.log



2016/09/15 16:48:11 [error] 29608#0: *62681 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 180.76.15.19, server: my-site.com, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.my-site.com"

2016/09/15 16:49:29 [error] 29608#0: *62801 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 180.76.15.141, server: my-site.com, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.my-site.com"
2016/09/15 22:46:55 [error] 29607#0: *84028 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 180.76.15.139, server: my-site.com, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.my-site.com"
2016/09/15 22:47:23 [error] 29607#0: *84244 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 180.76.15.145, server: my-site.com, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.my-site.com"
2016/09/15 23:26:12 [error] 29607#0: *90756 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 66.249.71.17, server: my-site.com, request: "GET /category/distribution/ HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "my-site.com"
2016/09/15 23:49:28 [error] 29608#0: *94579 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 180.76.15.33, server: my-site.com, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.my-site.com"
2016/09/15 23:50:50 [error] 29608#0: *94786 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 180.76.15.163, server: my-site.com, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.my-site.com"


EDIT :
After checking php5-fpm log I found this :




My /var/log/upstart/php5-fpm.log



[16-Sep-2016 00:05:02] NOTICE: fpm is running, pid 32006
[16-Sep-2016 00:05:02] NOTICE: ready to handle connections
[16-Sep-2016 00:05:02] NOTICE: systemd monitor interval set to 10000ms
[16-Sep-2016 00:05:07] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[16-Sep-2016 01:03:39] NOTICE: Terminating ...
[16-Sep-2016 01:03:39] NOTICE: exiting, bye-bye!
[16-Sep-2016 01:03:39] NOTICE: fpm is running, pid 32699

[16-Sep-2016 01:03:39] NOTICE: ready to handle connections
[16-Sep-2016 01:03:39] NOTICE: systemd monitor interval set to 10000ms
[16-Sep-2016 01:03:43] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it


So I updated the php5-fpm config for the following parameters :
Currently I change the php5-fpm configuration, especially increasing the the pm.max_children value :



pm = dynamic
pm.max_children = 50

pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 35


The site is running better since a few moments, but I doubt this is a long term solution to the problem.



EDIT 2 :



After a long investigation of mysql.log (all queries), and of the access I found that I was under a brute force attack.

You can find information about this attack here: https://www.digitalocean.com/community/tutorials/how-to-protect-wordpress-from-xml-rpc-attacks-on-ubuntu-14-04



The access log was showing repeatedly this :



191.96.249.75 - - [16/Sep/2016:05:28:52 +0000] "POST /xmlrpc.php HTTP/1.0" 499 0 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
191.96.249.75 - - [16/Sep/2016:05:28:54 +0000] "POST /xmlrpc.php HTTP/1.0" 499 0 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
191.96.249.75 - - [16/Sep/2016:05:28:57 +0000] "POST /xmlrpc.php HTTP/1.0" 499 0 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
191.96.249.75 - - [16/Sep/2016:05:28:58 +0000] "POST /xmlrpc.php HTTP/1.0" 499 0 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
191.96.249.75 - - [16/Sep/2016:05:29:01 +0000] "POST /xmlrpc.php HTTP/1.0" 499 0 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"

No comments:

Post a Comment

linux - How to SSH to ec2 instance in VPC private subnet via NAT server

I have created a VPC in aws with a public subnet and a private subnet. The private subnet does not have direct access to external network. S...