Sunday, January 31, 2016

linux - MySQL server sporadically runs rampage

I'm faced with a baffling problem where mysqld sporadically gets in an awkward state of very high CPU usage (several hundred threads, all fighting for CPU time) and quick memory accumulation (to not say leaking). Once this starts happening the server gets very slow at fulfilling queries, often slow enough to trigger timeouts in applications waiting for it.



Eventually it will have accumulated enough memory to start triggering the Linux kernel OOM-killer, at which point the entire system is already practically unusably slow from constant swapping that a hard reset is needed.
Setting tighter ulimit values for the mysql user for memory allocation has fixed that specific symptom, but the underlying problem still remains.



Once the MySQL server is in this state it takes several minutes after the shutdown command to actually stop, which makes restarting it nerve-racking too. Although I still prefer to try and shut it down cleanly to avoid breaking replication.



The slow query log did not prove helpful as once the server is in this state every query gets logged as a slow query, making it difficult to pick out any actual offenders.



Upgrading the mysql server to version 5.6.43-1debian8 has reduced the problem from occurring every few days to every couple of weeks.




I have not yet been able to acquire a SHOW FULL PROCESS LIST; during an ongoing situation.



The machine this is running on is a 12 core / 24 thread with 64 GiB of memory (with most of that available to MySQL) and 32 GiB of HDD swap.



Here's my entire my.cnf after suggestions from here (It's uncertain if these changes have fixed the problem already, I'll be able to tell with time).



#
# The MySQL database server configuration file.
#

# You can copy this to one of:
# - "/etc/mysql/my.cnf" to set global options,
# - "~/.my.cnf" to set user-specific options.
#
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html


# This will be passed to all mysql clients
# It has been reported that passwords should be enclosed with ticks/quotes
# escpecially if they contain "#" chars...
# Remember to edit /etc/mysql/debian.cnf when changing the socket location.
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock
# default-character-set=latin1


# Here is entries for some specific programs
# The following values assume you have at least 32M ram

# This was formally known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe]
log_error = /var/log/mysql/mysql_error.log
socket = /var/run/mysqld/mysqld.sock
nice = 0

[mysqld]

#
# * Basic Settings
#
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
#basedir = /usr/local
#ledir = /usr/local/bin
#datadir = /var/lib/mysql

#datadir = /ssd/mysql
datadir = /mnt/vg0_lv0/mysql/
tmpdir = /tmp
#language = /usr/local/share/mysql/english
#open-files-limit = 32000
sql_mode = NO_ENGINE_SUBSTITUTION

character-set-server=latin1
collation-server=latin1_swedish_ci


net_read_timeout = 120
skip-external-locking
skip-name-resolve
#
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
bind-address = ::
#
# * Fine Tuning
#

# mysqld might use { key_buffer + (read_buffer_size + read_rnd_buffer_size + sort_buffer_size + join_buffer_size) * max_threads } bytes of memory
#max_threads = 2048 # probably bound to max_connections
key_buffer = 1536M # global limit
key_buffer_size = 1536M # not sure which spelling is correct, use both

# as said above, these limits are per connection and NOT global
read_buffer_size = 16M
read_rnd_buffer_size = 16M
sort_buffer_size = 16M
myisam_sort_buffer_size = 16M

join_buffer_size = 20M

max_allowed_packet = 128M
preload_buffer_size = 384K
tmp_table_size = 192M
max_heap_table_size = 192M
thread_stack = 256K
thread_cache_size = 24

# This replaces the startup script and checks MyISAM tables if needed

# the first time they are touched
myisam-recover = BACKUP
max_connections = 500
table_open_cache = 3200
table_open_cache_instances = 2 # approx. half of cores used routinely
#table_definition_cache # automatically set to min(400+(table_open_cache/2),2000)
#thread_concurrency = 48
#
# * Query Cache Configuration
#

# Defaults: 1M, 16M
#query_cache_limit = 12M
#query_cache_size = 2048M
#query_cache_type = 1
#query_cache_min_res_unit = 32K
# Query Cache disabled due to lock contention
query_cache_size = 0
query_cache_type = 0

# query_prealloc_size = 64M


wait_timeout = 3600
interactive_timeout = 28800

#
# * Logging and Replication
#
# Both location gets rotated by the cronjob.
# Be aware that this log type is a performance killer.
# As of 5.1 you can enable the log at runtime!

#general_log_file = /var/log/mysql/mysql.log
#general_log = 1
log_error = /var/log/mysql/mysql_error.log
#
# Error logging goes to syslog due to /etc/mysql/conf.d/mysqld_safe_syslog.cnf.
#
# Here you can see queries with especially long duration
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time = 10

#log-queries-not-using-indexes = 1 # logs all queries that filter without an index, not just bad JOINs (spammy)
#log_slow_queries = /var/log/mysql/mysql-slow.log # old, invalid
#long_query_time = 2
#log-queries-not-using-indexes = 1
#
# The following can be used as easy to replay backup logs or for replication.
# note: if you are setting up a replication slave, see README.Debian about
# other settings you may need to change.
server-id = 3


log_bin = /mnt/vg0_lv0/mysql_binlog/mysql-bin.log
expire_logs_days = 31
max_binlog_size = 1073741824

#log_bin = /var/log/mysql/mysql-bin.log
#expire_logs_days = 31
#max_binlog_size = 8192M
#binlog_do_db = include_database_name
#binlog_ignore_db = include_database_name
#slave-net-timeout = 30

#master-retry-count = 99999
#slave-skip-errors=1062,1053
#slave-skip-errors=1062,1053,1050,1146,1051,1396,1060,1054

#
# * InnoDB
#
# InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/.
# Read the manual for more InnoDB related options. There are many!
innodb_buffer_pool_size = 6144M

innodb_buffer_pool_instances = 6 # only >1 if each instance gets at least 1GiB, i_buffer_pool_size will be divided by this
innodb_log_file_size = 768M
innodb_file_per_table = on
innodb_ft_cache_size = 32M
innodb_ft_total_cache_size = 640M
innodb_additional_mem_pool = 32M
innodb_thread_concurrency = 0 # 0=autodetect
innodb_read_io_threads = 64 # 64=max
innodb_write_io_threads = 64
innodb_log_buffer_size = 8M


# * Security Features
#
# Read the manual, too, if you want chroot!
# chroot = /var/lib/mysql/
#
# For generating SSL certificates I recommend the OpenSSL GUI "tinyca".
#
# ssl-ca=/etc/mysql/cacert.pem
# ssl-cert=/etc/mysql/server-cert.pem

# ssl-key=/etc/mysql/server-key.pem



[mysqldump]
quick
quote-names
max_allowed_packet = 32M

[mysql]

#no-auto-rehash # faster start of mysql but no tab completition

[isamchk]
key_buffer = 16M

#
# * IMPORTANT: Additional settings that can override those from this file!
# The files must end with '.cnf', otherwise they'll be ignored.
#
!includedir /etc/mysql/conf.d/



A mysqltuner report after suggestions from here (comments added where snipped):



 >>  MySQLTuner 1.7.13 - Major Hayden 
>> Bug reports, feature requests, and downloads at http://mysqltuner.com/
>> Run with '--help' for additional options and output filtering

[--] Skipped version check for MySQLTuner script
[OK] Logged in using credentials passed on the command line

[OK] Currently running supported MySQL version 5.6.44-log
[OK] Operating on 64-bit architecture

-------- Log file Recommendations ------------------------------------------------------------------
[--] Log file: /var/log/mysql/mysql_error.log(2M)
[OK] Log file /var/log/mysql/mysql_error.log exists
[OK] Log file /var/log/mysql/mysql_error.log is readable.
[OK] Log file /var/log/mysql/mysql_error.log is not empty
[OK] Log file /var/log/mysql/mysql_error.log is smaller than 32 Mb
[!!] /var/log/mysql/mysql_error.log contains 2285 warning(s).

- Almost all of these are replication warnings about RAND()
[!!] /var/log/mysql/mysql_error.log contains 27 error(s).
- Almost all of these are "Logging to '/var/log/mysql/mysql_error.log'." and similar.
[--] 18 start(s) detected in /var/log/mysql/mysql_error.log
[--] 1) 2019-05-09 20:27:44 34546 [Note] /usr/sbin/mysqld: ready for connections.
[--] 2) 2019-05-09 20:23:44 8281 [Note] /usr/sbin/mysqld: ready for connections.
[--] 3) 2019-05-09 20:22:38 39057 [Note] /usr/sbin/mysqld: ready for connections.
[--] 4) 2019-05-09 20:01:56 43432 [Note] /usr/sbin/mysqld: ready for connections.
[--] 5) 2019-05-09 19:55:54 22899 [Note] /usr/sbin/mysqld: ready for connections.
[--] 6) 2019-05-09 19:48:51 33825 [Note] /usr/sbin/mysqld: ready for connections.

[--] 7) 2019-05-09 19:26:38 20046 [Note] /usr/sbin/mysqld: ready for connections.
[--] 8) 2019-05-01 14:28:48 1915 [Note] /usr/sbin/mysqld: ready for connections.
[--] 9) 2019-04-15 19:50:40 9309 [Note] /usr/sbin/mysqld: ready for connections.
[--] 10) 2019-04-15 19:40:28 43854 [Note] /usr/sbin/mysqld: ready for connections.
[--] 16 shutdown(s) detected in /var/log/mysql/mysql_error.log
[--] 1) 2019-05-09 20:27:20 8281 [Note] /usr/sbin/mysqld: Shutdown complete
[--] 2) 2019-05-09 20:23:20 39057 [Note] /usr/sbin/mysqld: Shutdown complete
[--] 3) 2019-05-09 20:22:12 43432 [Note] /usr/sbin/mysqld: Shutdown complete
[--] 4) 2019-05-09 20:01:32 22899 [Note] /usr/sbin/mysqld: Shutdown complete
[--] 5) 2019-05-09 19:55:29 33825 [Note] /usr/sbin/mysqld: Shutdown complete

[--] 6) 2019-05-09 19:48:21 20046 [Note] /usr/sbin/mysqld: Shutdown complete
[--] 7) 2019-05-09 19:26:06 1915 [Note] /usr/sbin/mysqld: Shutdown complete
[--] 8) 2019-05-01 14:28:01 9309 [Note] /usr/sbin/mysqld: Shutdown complete
[--] 9) 2019-04-15 19:50:16 43854 [Note] /usr/sbin/mysqld: Shutdown complete
[--] 10) 2019-04-15 19:40:02 7435 [Note] /usr/sbin/mysqld: Shutdown complete

-------- Storage Engine Statistics -----------------------------------------------------------------
[--] Status: +ARCHIVE +BLACKHOLE +CSV -FEDERATED +InnoDB +MEMORY +MRG_MYISAM +MyISAM +PERFORMANCE_SCHEMA
[--] Data in InnoDB tables: 25.4G (Tables: 9193)
[--] Data in MyISAM tables: 6.1G (Tables: 4007)

[OK] Total fragmented tables: 0

-------- Analysis Performance Metrics --------------------------------------------------------------
[--] innodb_stats_on_metadata: OFF
[OK] No stat updates during querying INFORMATION_SCHEMA.

-------- Security Recommendations ------------------------------------------------------------------
[OK] There are no anonymous accounts for any database users
[OK] All database users have passwords assigned
[!!] User '*****@%' has user name as password.

- A few more like these, removed for privacy.
[!!] User '*****@%' does not specify hostname restrictions.
- Several dozen more like these, removed for privacy. Restrictions are implemented via iptables.
[--] There are 618 basic passwords in the list.
[!!] User '*****@localhost' is using weak password: 123456 in a lower, upper or capitalize derivative version.
[!!] User '*****@%' is using weak password: test in a lower, upper or capitalize derivative version.

-------- CVE Security Recommendations --------------------------------------------------------------
[OK] NO SECURITY CVE FOUND FOR YOUR VERSION


-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 2d 21h 36m 2s (112M q [450.481 qps], 1M conn, TX: 225G, RX: 21G)
[--] Reads / Writes: 95% / 5%
[--] Binary logging is enabled (GTID MODE: OFF)
[--] Physical Memory : 63.0G
[--] Max MySQL memory : 41.5G
[--] Other process memory: 4.9G
[--] Total buffers: 7.7G global + 68.2M per thread (500 max threads)
[--] P_S Max memory usage: 487M
[--] Galera GCache Max memory usage: 0B

[OK] Maximum reached memory usage: 11.6G (18.42% of installed RAM)
[OK] Maximum possible memory usage: 41.5G (65.92% of installed RAM)
[OK] Overall possible memory usage with other process is compatible with memory available
[OK] Slow queries: 0% (64K/112M)
[OK] Highest usage of available connections: 10% (51/500)
[OK] Aborted connections: 0.16% (2506/1587046)
[OK] Query cache is disabled by default due to mutex contention on multiprocessor machines.
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 4M sorts)
[!!] Joins performed without indexes: 1703245
[OK] Temporary tables created on disk: 6% (998K on disk / 15M total)

[OK] Thread cache hit rate: 99% (64 created / 1M connections)
[!!] Table cache hit rate: 0% (3K open / 405K opened)
[OK] Open file limit used: 4% (1K/24K)
[OK] Table locks acquired immediately: 99% (142M immediate / 142M locks)
[OK] Binlog cache memory access: 99.99% (4902582 Memory / 4902885 Total)

-------- Performance schema ------------------------------------------------------------------------
[--] Memory used by P_S: 487.9M
[--] Sys schema isn't installed.


-------- ThreadPool Metrics ------------------------------------------------------------------------
[--] ThreadPool stat is disabled.

-------- MyISAM Metrics ----------------------------------------------------------------------------
[!!] Key buffer used: 22.7% (365M used / 1B cache)
[OK] Key buffer size / total MyISAM indexes: 1.5G/1.4G
[OK] Read Key buffer hit rate: 97.7% (1B cached / 37M reads)
[!!] Write Key buffer hit rate: 15.9% (454K cached / 72K writes)

-------- InnoDB Metrics ----------------------------------------------------------------------------

[--] InnoDB is enabled.
[--] InnoDB Thread Concurrency: 0
[OK] InnoDB File per table is activated
[!!] InnoDB buffer pool / data size: 6.0G/25.4G
[OK] Ratio InnoDB log file size / InnoDB Buffer pool size: 768.0M * 2/6.0G should be equal 25%
[OK] InnoDB buffer pool instances: 6
[--] InnoDB Buffer Pool Chunk Size not used or defined in your version
[OK] InnoDB Read buffer efficiency: 100.00% (180155088749 hits/ 180158103035 total)
[!!] InnoDB Write Log efficiency: 37.14% (922821 hits/ 2484433 total)
[OK] InnoDB log waits: 0.00% (0 waits / 1561612 writes)


-------- AriaDB Metrics ----------------------------------------------------------------------------
[--] AriaDB is disabled.

-------- TokuDB Metrics ----------------------------------------------------------------------------
[--] TokuDB is disabled.

-------- XtraDB Metrics ----------------------------------------------------------------------------
[--] XtraDB is disabled.


-------- Galera Metrics ----------------------------------------------------------------------------
[--] Galera is disabled.

-------- Replication Metrics -----------------------------------------------------------------------
[--] Galera Synchronous replication: NO
[--] This server is acting as master for 1 server(s).
[--] Binlog format: STATEMENT
[--] XA support enabled: ON
[--] Semi synchronous replication Master: Not Activated
[--] Semi synchronous replication Slave: Not Activated

[--] No replication setup for this server or replication not started.

-------- Recommendations ---------------------------------------------------------------------------
General recommendations:
Control warning line(s) into /var/log/mysql/mysql_error.log file
Control error line(s) into /var/log/mysql/mysql_error.log file
Set up a Secure Password for user@host ( SET PASSWORD FOR 'user'@'SpecificDNSorIp' = PASSWORD('secure_password'); )
Restrict Host for user@% to user@SpecificDNSorIp
4 user(s) used basic or weak password.
Adjust your join queries to always utilize indexes

Increase table_open_cache gradually to avoid file descriptor limits
Read this before increasing table_open_cache over 64: [snip - bit.ly not allowed]
Read this before increasing for MariaDB https://mariadb.com/kb/en/library/optimizing-table_open_cache/
This is MyISAM only table_cache scalability problem, InnoDB not affected.
See more details here: https://bugs.mysql.com/bug.php?id=49177
This bug already fixed in MySQL 5.7.9 and newer MySQL versions.
Beware that open_files_limit (24000) variable
should be greater than table_open_cache (3200)
Consider installing Sys schema from https://github.com/mysql/mysql-sys
Variables to adjust:

join_buffer_size (> 20.0M, or always use indexes with JOINs)
table_open_cache (> 3200)
innodb_buffer_pool_size (>= 25.4G) if possible.


What steps can I take to try and debug this problem while it isn't currently happening?
What steps can I take to try and fix the problem?

No comments:

Post a Comment

linux - How to SSH to ec2 instance in VPC private subnet via NAT server

I have created a VPC in aws with a public subnet and a private subnet. The private subnet does not have direct access to external network. S...