Over the past couple of days my website's performance has been very sluggish, with queries taking a long time to execute. CPU usage hit around 100% four times this week. Here is the output of top at one such time
top - 00:08:03 up 3 days, 21:47, 2 users, load average: 6.06, 1.95, 0.84
Tasks: 92 total, 2 running, 90 sleeping, 0 stopped, 0 zombie
%Cpu(s): 86.1 us, 12.9 sy, 0.0 ni, 0.0 id, 0.0 wa, 1.0 hi, 0.0 si, 0.0 st
KiB Mem: 1017948 total, 773520 used, 244428 free, 107200 buffers
KiB Swap: 0 total, 0 used, 0 free. 257228 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28433 www-data 20 0 854660 69288 5608 S 98.7 6.8 0:47.36 apache2
28469 www-data 20 0 529692 7692 3012 S 0.7 0.8 0:00.13 apache2
28514 root 20 0 24820 1488 1064 R 0.7 0.1 0:00.08 top
25 root 20 0 0 0 0 S 0.3 0.0 1:00.70 kworker/0:1
28518 postgres 20 0 370016 6984 4276 S 0.3 0.7 0:00.01 postgres
1 root 20 0 33384 1288 0 S 0.0 0.1 0:11.70 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.13 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:09.40 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
7 root 20 0 0 0 0 S 0.0 0.0 0:45.06 rcu_sched
8 root 20 0 0 0 0 R 0.0 0.0 1:54.47 rcuos/0
9 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
10 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcuob/0
11 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
Apache seems to be taking up a lot of CPU, but I have no idea why. It was working perfectly until a couple of days ago. I've optimized Apache by removing unused modules and tuned it to keep only a small number of spare children running, but that doesn't seem to have made a difference. I've also installed mod-evasive and mod-qos to protect against DDoS attacks. Here is my Apache configuration
Timeout 30
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5
StartServers 1
MinSpareServers 1
MaxSpareServers 3
MaxClients 10
MaxRequestsPerChild 3000
StartServers 1
MinSpareThreads 5
MaxSpareThreads 15
ThreadLimit 25
ThreadsPerChild 5
MaxClients 25
MaxRequestsPerChild 200
StartServers 1
MinSpareThreads 5
MaxSpareThreads 15
ThreadLimit 25
ThreadsPerChild 5
MaxClients 25
MaxRequestsPerChild 200
MS_METHODS POST,PUT,OPTIONS,CONNECT
MS_WhiteList /etc/spamhaus.wl
MS_CacheSize 256
Here is my VirtualHost configuration
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example.com [nocase]
RewriteRule ^(.*) http://www.example.com$1 [last,redirect=301]
ServerName example.com
ServerAlias www.example.com
ServerAdmin admin@example.com
WSGIDaemonProcess example python-path=/home/abc/example:/home/abc/example/env/lib/python2.7/site-packages
WSGIProcessGroup example
WSGIApplicationGroup %{GLOBAL}
WSGIScriptAlias / /home/abc/example/wsgi.py
DocumentRoot /home/abc/example
Require all granted
Alias /static/ /home/abc/example/static/
Order deny,allow
Allow from all
Alias /media/ /home/abc/example/media/
Order deny,allow
Allow from all
ErrorLog ${APACHE_LOG_DIR}/error.log
CustomLog ${APACHE_LOG_DIR}/access.log combined
Here is my .htaccess file
Header set Cache-Control "max-age=31536000, public"
Header set Cache-Control "max-age=604800, public"
AddType application/javascript js
AddType application/vnd.ms-fontobject eot
AddType application/x-font-ttf ttf ttc
AddType font/opentype otf
AddType application/x-font-woff woff
AddType image/svg+xml svg svgz
AddEncoding gzip svgz
AddOutputFilterByType DEFLATE text/html text/plain text/css application/json
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE text/xml application/xml text/x-component
AddOutputFilterByType DEFLATE application/xhtml+xml application/rss+xml application/atom+xml
AddOutputFilterByType DEFLATE image/x-icon image/svg+xml application/vnd.ms-fontobject application/x-font-ttf font/opentype
I'm using memcached to cache most of the queries. Web pages with a few basic queries are faster (though still not as fast as before), while pages with complex queries take a long time. The server response time for such pages has increased from 0.2 seconds to 4 seconds (measured using Google PageSpeed Insights).
I'm using a PostgreSQL 9.3 database. The following is my postgresql.conf, tuned using PgTune.
default_statistics_target = 50
maintenance_work_mem = 60MB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 704MB
work_mem = 6MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 240MB
max_connections = 80
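As a sanity check on these values, here is a rough worst-case memory budget. It is only a sketch: the function name and the sorts_per_connection parameter are made up for illustration, and it assumes each connection runs at most one work_mem-sized sort or hash at a time (a single complex query can use several). It also ignores per-backend overhead and the memory Apache itself needs.

```python
# Back-of-the-envelope memory budget for the postgresql.conf values above.
# All sizes are in MB. Assumption: each connection runs at most
# `sorts_per_connection` work_mem-sized operations at once.

def worst_case_postgres_mb(shared_buffers, work_mem, max_connections,
                           maintenance_work_mem, sorts_per_connection=1):
    """Return shared memory plus the per-connection worst case, in MB."""
    per_connection = work_mem * sorts_per_connection
    return shared_buffers + maintenance_work_mem + per_connection * max_connections


# "KiB Mem ... total" from the top output, converted to MB.
total_ram_mb = 1017948 / 1024.0

budget_mb = worst_case_postgres_mb(
    shared_buffers=240, work_mem=6, max_connections=80,
    maintenance_work_mem=60)

print("worst case: {} MB of {:.0f} MB RAM ({:.0%})".format(
    budget_mb, total_ram_mb, budget_mb / total_ram_mb))
```

Under that assumption the worst case comes to 780 MB on a machine with roughly 994 MB of RAM and no swap, so a burst of connections running complex queries could leave very little for Apache and the OS page cache.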
Here is the graph of CPU, Disk and Bandwidth usage over the past month
Although the bandwidth graph shows an increase over the past week or so, the actual traffic hasn't increased. I have been getting an average of 1500 visitors per day for the past 15-20 days. The increased bandwidth usage could be due to an increase in bot activity.
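One way to test the bot theory would be to total requests and bytes per user agent from the access log. The sketch below assumes the log uses Apache's standard "combined" format, as the CustomLog directive in the VirtualHost suggests. The summarize helper and the regular expression are illustrative and are not part of any Apache or Django tooling. Lines that don't match (for example, requests containing escaped quotes) are simply skipped.

```python
import re
from collections import Counter

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Return (hits, traffic): request count and bytes sent per user agent."""
    hits = Counter()
    traffic = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not in combined format; ignore it
        agent = match.group("agent")
        size = match.group("bytes")
        hits[agent] += 1
        # Apache logs "-" when no response body was sent.
        traffic[agent] += 0 if size == "-" else int(size)
    return hits, traffic


# Usage sketch (the path is whatever ${APACHE_LOG_DIR}/access.log expands to):
#
#     with open(path_to_access_log) as fh:
#         hits, traffic = summarize(fh)
#     for agent, count in hits.most_common(10):
#         print(count, traffic[agent], agent)
```

If a handful of crawler user agents account for most of the bytes, that would support the bot explanation. If not, the extra bandwidth is coming from somewhere else.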
My website is a Django application hosted on a droplet with 1 GB RAM, a 30 GB SSD and Ubuntu 14.04 x64. I have tried everything I could think of and cannot for the life of me figure out what's wrong here. I am not very good at handling servers, and the only thing I can think of now is to switch from Apache to nginx and from PostgreSQL to MySQL. Any suggestions that can help me figure out how to fix this would be greatly appreciated.