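Trying mysqlfailover now that it can run as a daemon

April 1, 2014

komi

Note: this is an old article, so please read it with that in mind.

Hello, this is Komiya.

I am not sure anyone still wants to use this tool, but I tested it, so I am posting the results.
This is a long article, so read it when you have some time.

Starting mysqlfailover with --daemon=start and without --force

When I tested it previously:
  it would not run without --force, and
  there was no option to start it as a daemon.
So I am checking both points again.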

・Setup:

192.168.1.133 komiya-test-mysql01 my1

192.168.1.155 komiya-test-mysql02 my2

192.168.1.150 komiya-test-mysql03 my3

192.168.1.241 komiya-test-mysql04 my4 manager

192.168.1.222 vip

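・Installation:
The packages can be downloaded from the official site and similar places.
For now I installed MySQL 5.6, the utilities, and so on with Chef.
After running ssh-copy-id and knife solo prepare,
I specified the db role in the run_list of the node file,
and running knife solo cook was enough to install the packages below, place the required configuration files,
and fill in values such as server_id and report_host automatically.
The recipe I used as a reference is here.

The related packages are listed below.
mysql-utilities is a set of tools written in Python, so mysql-connector-python is required.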

$ rpm -qa|grep -i mysql

MySQL-shared-compat-5.6.15-1.linux_glibc2.5.x86_64

MySQL-test-5.6.15-1.linux_glibc2.5.x86_64

perl-DBD-MySQL-4.013-3.el6.x86_64

mysql-utilities-1.4.1-1.el6.noarch

mysqltuner-1.1.1-1.el6.noarch

MySQL-client-5.6.15-1.linux_glibc2.5.x86_64

MySQL-server-5.6.15-1.linux_glibc2.5.x86_64

MySQL-devel-5.6.15-1.linux_glibc2.5.x86_64

mysql-connector-python-1.1.4-1.el6.noarch

mysqlreport-3.5-4.el6.noarch

・Setting up replication
server_id is set to the fourth octet of the IP address, so the values should not collide.
report_host should also be set automatically to each host's own IP.
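The server_id rule above is easy to sanity-check before rolling out a configuration. The following is a minimal sketch, not the Chef recipe itself; the function name `server_id_from_ip` is made up for illustration, and the rule only guarantees unique values while all hosts share the same /24 network.

```python
import ipaddress


def server_id_from_ip(ip):
    """Return the fourth octet of an IPv4 address, used here as server_id."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return addr.packed[3]


# The four hosts from the setup section of this article.
hosts = ["192.168.1.133", "192.168.1.155", "192.168.1.150", "192.168.1.241"]
ids = [server_id_from_ip(h) for h in hosts]

# Duplicates would appear here if two hosts shared a last octet.
duplicates = {i for i in ids if ids.count(i) > 1}
print(ids, "duplicates:", sorted(duplicates))
```

If hosts from different subnets are ever added, a duplicate last octet becomes possible, so a check like this is worth keeping next to the recipe.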

The replication-related settings are as follows.

## replication (master/slave)

log-bin=mysql-bin

log-bin-index=mysql-bin.index

binlog_format=mixed

server-id = 133

relay-log=mysqld-relay-bin

relay-log-index=mysql-relay-bin.index

log_slave_updates=1

replicate-ignore-db=mysql,information_schema,performance_schema

binlog-ignore-db=mysql,information_schema,performance_schema

skip_slave_start

read_only

#slave_net_timeout=120

## replication (for 5.6)

gtid-mode = OFF

enforce_gtid_consistency=false

master-info-repository=TABLE

relay-log-info-repository=TABLE

relay_log_recovery=ON

#sync-master-info=1

slave-parallel-workers=0

binlog-checksum=CRC32

#master-verify-checksum=1

#slave-sql-verify-checksum=1

binlog-rows-query-log_events=1

#log_bin_use_v1_row_events=ON

#sync_binlog=1

report-port=3306

report-host = 192.168.1.133

Note: the parameters needed adjusting.

gtid-mode = ON

enforce_gtid_consistency=true

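As far as I remember, mysqlfailover does not work without these.
Note also that turning GTID on means statements that are not transaction-safe can no longer be used
(for example, MyISAM tables cannot be updated together with transactional tables in the same statement or transaction, and CREATE TABLE ... SELECT is not allowed).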

sed -i 's/gtid-mode = OFF/gtid-mode = ON/g' /etc/my.cnf

sed -i 's/enforce_gtid_consistency=false/enforce_gtid_consistency=true/g' /etc/my.cnf

service mysql restart
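Before restarting, it can help to confirm that the edited file really carries both GTID settings. This is a rough sketch that parses a my.cnf fragment as plain text; `parse_mysqld_options` and `gtid_ready` are hypothetical helper names, and the parser ignores sections and include directives.

```python
def parse_mysqld_options(text):
    """Parse simple 'key = value' lines of a my.cnf fragment into a dict."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue  # skip blanks, comments, and section headers
        key, _, value = line.partition("=")
        # mysqld accepts '-' and '_' interchangeably in option names.
        options[key.strip().replace("-", "_")] = value.strip()
    return options


def gtid_ready(options):
    """True when both settings this article needed for mysqlfailover are on."""
    mode = options.get("gtid_mode", "OFF").upper()
    consistency = options.get("enforce_gtid_consistency", "false").lower()
    # A bare 'enforce_gtid_consistency' line (empty value) also enables it.
    return mode == "ON" and consistency in ("true", "on", "1", "")


before = "gtid-mode = OFF\nenforce_gtid_consistency=false\n"
after = "gtid-mode = ON\nenforce_gtid_consistency=true\n"
print(gtid_ready(parse_mysqld_options(before)), gtid_ready(parse_mysqld_options(after)))
```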

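Make host 1 the master and the others slaves

The GRANT statements that add repl and the other required accounts are included in the recipe, so I only verify them here.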

mysql> select user,host,password from mysql.user;

+------+---------------------+-------------------------------------------+

| user | host | password |

+------+---------------------+-------------------------------------------+

| root | localhost | *E8DD65E018E30F27D962FB9BFA2F4E8206DC3AF8 |

| root | komiya-test-mysql01 | *6A60A70C59535B75A79FDE4C7C55FDA55FC40A55 |

| root | 127.0.0.1 | *E8DD65E018E30F27D962FB9BFA2F4E8206DC3AF8 |

| root | ::1 | *E8DD65E018E30F27D962FB9BFA2F4E8206DC3AF8 |

| root | 192.168.% | *E8DD65E018E30F27D962FB9BFA2F4E8206DC3AF8 |

| repl | 192.168.% | *43E209EED080057E35C2630AC06D32960A46D120 |

+------+---------------------+-------------------------------------------+

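Run reset master; on hosts 1 to 3 to initialize the binary log position (done here only because there is no data yet; never do this on servers that hold data, because it causes inconsistency)

On hosts 2 and 3: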

mysql> CHANGE MASTER TO

MASTER_HOST='192.168.1.133',

MASTER_PORT=3306,

MASTER_USER='repl',

MASTER_PASSWORD='re*****',

MASTER_LOG_FILE='mysql-bin.000001',

MASTER_LOG_POS=120;

mysql> start slave;

mysql> show slave status\G

mysql> set global read_only=1;

mysql> show global variables like 'read_only';

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| read_only | ON |

+---------------+-------+

To enable auto-positioning:

stop slave;

mysql> CHANGE MASTER TO

MASTER_HOST='192.168.1.133',

MASTER_PORT=3306,

MASTER_USER='repl',

MASTER_PASSWORD='re*****',

MASTER_AUTO_POSITION = 1;

start slave;

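Note: with GTID enabled the position can be determined automatically,
  so setting up replication with Chef should stay simple and the recipe should be easy to write.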
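To illustrate why auto-positioning simplifies automation, here is a small sketch that renders both forms of the CHANGE MASTER TO statement used above. The function name `change_master_sql` and the placeholder password are assumptions made for this example; the quoting helper assumes the default SQL mode where backslash is an escape character.

```python
def _quote(value):
    """Quote a string literal for MySQL (assumes backslash escapes are enabled)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def change_master_sql(host, port, user, password, log_file=None, log_pos=None):
    """Build CHANGE MASTER TO; uses GTID auto-positioning unless a file/pos is given."""
    parts = [
        "MASTER_HOST=" + _quote(host),
        "MASTER_PORT=%d" % port,
        "MASTER_USER=" + _quote(user),
        "MASTER_PASSWORD=" + _quote(password),
    ]
    if log_file is not None and log_pos is not None:
        # Classic replication: every slave needs the exact binlog coordinates.
        parts.append("MASTER_LOG_FILE=" + _quote(log_file))
        parts.append("MASTER_LOG_POS=%d" % log_pos)
    else:
        # GTID replication: no per-host coordinates to look up.
        parts.append("MASTER_AUTO_POSITION=1")
    return "CHANGE MASTER TO\n  " + ",\n  ".join(parts) + ";"


gtid_sql = change_master_sql("192.168.1.133", 3306, "repl", "placeholder")
classic_sql = change_master_sql("192.168.1.133", 3306, "repl", "placeholder",
                                log_file="mysql-bin.000001", log_pos=120)
print(gtid_sql)
```

With the GTID form, the template needs no values that differ per slave, which is what makes it convenient in a recipe.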

Check replication on host 1

mysql> show slave hosts;

+-----------+---------------+------+-----------+--------------------------------------+

| Server_id | Host | Port | Master_id | Slave_UUID |

+-----------+---------------+------+-----------+--------------------------------------+

| 150 | 192.168.1.150 | 3306 | 133 | afee6fde-978f-11e3-9f2a-02883e765295 |

| 155 | 192.168.1.155 | 3306 | 133 | 896d4156-9846-11e3-a3d2-02619050bb48 |

+-----------+---------------+------+-----------+--------------------------------------+

2 rows in set (0.00 sec)

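Check replication from host 4 using the utilities
First, each host needs to be reachable with SSH key authentication
(an account allowed to configure replication seems to be required, so I set up the keys for root).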

# mysqlrpladmin --master=root:`cat /path_to_file`@192.168.1.133:3306 \

> --slaves=root:`cat /path_to_file`@192.168.1.155:3306,root:`cat /path_to_file`@192.168.1.150:3306 health

# Checking privileges.

# Replication Topology Health:

+----------------+-------+---------+--------+------------+---------+

| host | port | role | state | gtid_mode | health |

+----------------+-------+---------+--------+------------+---------+

| 192.168.1.133 | 3306 | MASTER | UP | ON | OK |

| 192.168.1.150 | 3306 | SLAVE | UP | ON | OK |

| 192.168.1.155 | 3306 | SLAVE | UP | ON | OK |

+----------------+-------+---------+--------+------------+---------+

# ...done.

# mysqlrplcheck --master=root:`cat /path_to_file`@192.168.1.133:3306 --slave=root:`cat /path_to_file`@192.168.1.155:3306

# master on 192.168.1.133: ... connected.

# slave on 192.168.1.155: ... connected.

Test Description Status

---------------------------------------------------------------------------

Checking for binary logging on master [pass]

Are there binlog exceptions? [WARN]

+---------+--------+----------------------------------------------+

| server | do_db | ignore_db |

+---------+--------+----------------------------------------------+

| master | | mysql,information_schema,performance_schema |

| slave | | mysql,information_schema,performance_schema |

+---------+--------+----------------------------------------------+

Replication user exists? [pass]

Checking server_id values [pass]

Checking server_uuid values [pass]

Is slave connected to master? [pass]

Check master information file [pass]

Checking InnoDB compatibility [pass]

Checking storage engines compatibility [pass]

Checking lower_case_table_names settings [pass]

Checking slave delay (seconds behind master) [pass]

# ...done.

# mysqlrplcheck --master=root:`cat /path_to_file`@192.168.1.133:3306 --slave=root:`cat /path_to_file`@192.168.1.150:3306

# master on 192.168.1.133: ... connected.

# slave on 192.168.1.150: ... connected.

Test Description Status

---------------------------------------------------------------------------

Checking for binary logging on master [pass]

Are there binlog exceptions? [WARN]

+---------+--------+----------------------------------------------+

| server | do_db | ignore_db |

+---------+--------+----------------------------------------------+

| master | | mysql,information_schema,performance_schema |

| slave | | mysql,information_schema,performance_schema |

+---------+--------+----------------------------------------------+

Replication user exists? [pass]

Checking server_id values [pass]

Checking server_uuid values [pass]

Is slave connected to master? [pass]

Check master information file [pass]

Checking InnoDB compatibility [pass]

Checking storage engines compatibility [pass]

Checking lower_case_table_names settings [pass]

Checking slave delay (seconds behind master) [pass]

# ...done.

# mysqlrplshow \

> --master=root:`cat /path_to_file`@192.168.1.133:3306 \

> --discover-slaves-login=root:`cat /path_to_file`

# master on 192.168.1.133: ... connected.

# Finding slaves for master: 192.168.1.133:3306

# Replication Topology Graph

192.168.1.133:3306 (MASTER)

+--- 192.168.1.150:3306 - (SLAVE)

+--- 192.168.1.155:3306 - (SLAVE)

・Checking the help for the mysqlfailover command

# mysqlfailover --help

------------------------------------------------

MySQL Utilities mysqlfailover version 1.4.1 (part of MySQL Workbench Distribution 6.0.0)

License type: GPLv2

Usage: mysqlfailover --master=root@localhost --discover-slaves-login=root --candidates=root@host123:3306,root@host456:3306

mysqlfailover - automatic replication health monitoring and failover

Options:

--version show program's version number and exit

--help display this help message and exit

--license display program's license and exit

--candidates=CANDIDATES

connection information for candidate slave servers for

failover in the form:

<user>[:<password>]@<host>[:<port>][:<socket>] or

<login-path>[:<port>][:<socket>]. Valid only with

failover command. List multiple slaves in comma-

separated list.

--discover-slaves-login=DISCOVER

at startup, query master for all registered slaves and

use the user name and password specified to connect.

Supply the user and password in the form

<user>[:<password>] or <login-path>. For example,

--discover-slaves-login=joe:secret will use 'joe' as

the user and 'secret' as the password for each

discovered slave.

(snip)

Passing --daemon=DAEMON literally produces an error telling you to choose from 'start', 'stop', 'restart', or 'nodetach'.

mysqlfailover: error: option --daemon: invalid choice: 'DAEMON' (choose from 'start', 'stop', 'restart', 'nodetach')

Run the following on host 4.

mysqlfailover \

--master=root:`cat /path_to_file`@192.168.1.133:3306 \

--candidate=root:`cat /path_to_file`@192.168.1.155:3306,root:`cat /path_to_file`@192.168.1.150:3306 \

--discover-slaves-login=root:`cat /path_to_file` \

--log=/tmp/failover.log \

--rpl-user=repl:re***** \

--rediscover \

--failover-mode=auto \

--daemon=start \

-v

# mysqlfailover \

> --master=root:`cat /path_to_file`@192.168.1.133:3306 \

> --candidate=root:`cat /path_to_file`@192.168.1.155:3306,root:`cat /path_to_file`@192.168.1.150:3306 \

> --discover-slaves-login=root:`cat /path_to_file` \

> --log=/tmp/failover.log \

> --rpl-user=repl:re***** \

> --rediscover \

> --failover-mode=auto \

> --daemon=start \

> -v

NOTE: Log file '/tmp/failover.log' does not exist. Will be created.

Starting failover daemon...

No status is printed to standard output, so check the log.

# tail /tmp/failover.log

2014-02-17 23:34:12 PM INFO Unregistering existing instances from slaves.

2014-02-17 23:34:12 PM INFO Registering instance on master.

2014-02-17 23:34:12 PM INFO Checking privileges.

2014-02-17 23:34:12 PM INFO Checking privileges on candidates.

2014-02-17 23:34:12 PM CRITICAL User root on 192.168.1.133 does not have sufficient privileges to execute the failover command.

2014-02-17 23:34:12 PM CRITICAL User root on 192.168.1.150 does not have sufficient privileges to execute the failover command.

2014-02-17 23:34:12 PM CRITICAL User root on 192.168.1.155 does not have sufficient privileges to execute the failover command.

2014-02-17 23:34:12 PM CRITICAL User root on 192.168.1.150 does not have sufficient privileges to execute the failover command.

2014-02-17 23:34:12 PM INFO Unregistering instance on master.

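It seems to be saying that the root account I specified lacks the privileges to perform a failover.
I did not understand why, so I searched and found Oracle's manual (in English).
MySQL Utilities (PDF manual)
It is covered around section 3.4.3 Setup Automatic Failover (p. 28).
There does not seem to be a Japanese version, but an HTML version exists (although the PDF appears to be more complete).
MySQL Utilities (HTML manual)

There was a page describing the privileges.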

3.4.3.4 Permissions Required

The user must have permissions to configure replication.

But root has ALL privileges. Is this about WITH GRANT OPTION or something?

The following describes the privileges needed by the commands that configure replication (mysqlrpladmin and so on).

3.4.2.4 Permissions Required

The m_account user needs the following privileges for the mysqlreplicate: SELECT and INSERT privileges on mysql database, REPLICATION SLAVE, REPLICATION CLIENT and GRANT OPTION. As for the slave_acc users, they need the SUPER privilege. The repl user, used as the argument for the --rpl-user option, is either created automatically or if it exists, it needs the REPLICATION SLAVE privilege.

To run the mysqlrpladmin utility with the health command, the m_account used on the master needs an extra SUPER privilege.

As for the switchover command all the users need the following privileges: SUPER, GRANT OPTION, SELECT, RELOAD, DROP, CREATE and REPLICATION SLAVE

★Summarizing the required privileges:

m_account (the user that connects to the master) needs:

SELECT and INSERT on mysql database, REPLICATION SLAVE, REPLICATION CLIENT and GRANT OPTION.

slave_acc (the user that connects to the slaves) needs:

SUPER privilege.

repl (the replication user) needs the REPLICATION SLAVE privilege. When given with the --rpl-user option, the account is created automatically (or already exists).

Every user involved in the switchover command needs:

SUPER, GRANT OPTION, SELECT, RELOAD, DROP, CREATE and REPLICATION SLAVE

Some other tips follow.

3.4.3.5 Tips and Tricks

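What was shown earlier is console mode, but mysqlfailover can also be run as a daemon.

To do that you need the --daemon option; specifically, start it with '--daemon=start'.

When run as a daemon, mysqlfailover prints nothing to the console and writes to the specified log file instead.

To stop the mysqlfailover daemon, simply use '--daemon=stop'.

No other options are needed unless --pidfile was specified at startup, in which case the same --pidfile option must be given again.

Another useful feature is that external scripts can be specified at runtime to customize the behavior.

--exec-fail-check  script to run periodically, at each predefined interval, instead of the default check

--exec-before  script to run before failover starts

--exec-after  script to run at the end of the failover process

--exec-post-failover  script to run after failover has completed (before the health report is displayed)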
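The start/stop rule from the tips can be written down as a pair of argument builders. This is only a sketch based on the options quoted in this article; the function names and the pidfile path are hypothetical, and whether '--daemon=stop' really works with no other options should be verified against the installed version (the sessions in this article pass the full option set).

```python
def start_argv(master, candidates, slaves_login, rpl_user, log, pidfile=None):
    """Argument list for starting mysqlfailover as a daemon."""
    argv = [
        "mysqlfailover",
        "--master=" + master,
        "--candidates=" + ",".join(candidates),
        "--discover-slaves-login=" + slaves_login,
        "--rpl-user=" + rpl_user,
        "--log=" + log,
        "--rediscover",
        "--failover-mode=auto",
        "--daemon=start",
    ]
    if pidfile:
        argv.append("--pidfile=" + pidfile)
    return argv


def stop_argv(pidfile=None):
    """Argument list for stopping the daemon; repeats --pidfile if one was used."""
    argv = ["mysqlfailover", "--daemon=stop"]
    if pidfile:
        argv.append("--pidfile=" + pidfile)
    return argv


# Hypothetical pidfile location, chosen only for this example.
PIDFILE = "/var/run/mysqlfailover.pid"
print(" ".join(stop_argv(PIDFILE)))
```

A list like this can be handed to `subprocess.call` from an init script, which keeps the start and stop invocations consistent with each other.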

Anyway, let's check the privileges of the current accounts.

# pt-show-grants -u root -p`cat /path_to_file`

-- Grants dumped by pt-show-grants

-- Dumped from server Localhost via UNIX socket, MySQL 5.6.15-log at 2014-02-18 15:32:11

-- Grants for 'repl'@'192.168.%'

GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'repl'@'192.168.%' IDENTIFIED BY PASSWORD '*43E209EED080057E35C2630AC06D32960A46D120';

-- Grants for 'root'@'127.0.0.1'

GRANT ALL PRIVILEGES ON *.* TO 'root'@'127.0.0.1' IDENTIFIED BY PASSWORD '*E8DD65E018E30F27D962FB9BFA2F4E8206DC3AF8' WITH GRANT OPTION;

-- Grants for 'root'@'192.168.%'

GRANT ALL PRIVILEGES ON *.* TO 'root'@'192.168.%' IDENTIFIED BY PASSWORD '*E8DD65E018E30F27D962FB9BFA2F4E8206DC3AF8';

-- Grants for 'root'@'::1'

GRANT ALL PRIVILEGES ON *.* TO 'root'@'::1' IDENTIFIED BY PASSWORD '*E8DD65E018E30F27D962FB9BFA2F4E8206DC3AF8' WITH GRANT OPTION;

-- Grants for 'root'@'komiya-test-mysql01'

GRANT ALL PRIVILEGES ON *.* TO 'root'@'komiya-test-mysql01' IDENTIFIED BY PASSWORD '*6A60A70C59535B75A79FDE4C7C55FDA55FC40A55' WITH GRANT OPTION;

GRANT PROXY ON ''@'' TO 'root'@'komiya-test-mysql01' WITH GRANT OPTION;

-- Grants for 'root'@'localhost'

GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' IDENTIFIED BY PASSWORD '*E8DD65E018E30F27D962FB9BFA2F4E8206DC3AF8' WITH GRANT OPTION;

GRANT PROXY ON ''@'' TO 'root'@'localhost' WITH GRANT OPTION;

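Because CHANGE MASTER has to be run when the topology changes, I assume
the privileges needed for mysqlreplicate and mysqlrpladmin switchover are what is required here.

According to the section translated above, the following privileges are needed.

m_account (the user that connects to the master) needs:

SELECT and INSERT on mysql database, REPLICATION SLAVE, REPLICATION CLIENT and GRANT OPTION.

slave_acc (the user that connects to the slaves) needs:

SUPER privilege.

repl (the replication user) needs:

REPLICATION SLAVE. When given with the --rpl-user option, the account is created automatically (or already exists).

Every user involved in the switchover command needs:

SUPER, GRANT OPTION, SELECT, RELOAD, DROP, CREATE and REPLICATION SLAVE

From the look of it, the grant option is missing on the master: root@'192.168.%' has ALL PRIVILEGES but no WITH GRANT OPTION. Rather than using root, I will create a dedicated failover user.
I do not want to give a grant option that works over the network to an account as obvious as root.
Until now I had only ever given the grant option to local users,
and I regarded a remotely reachable account that had it as an operational mistake.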
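The reasoning above (ALL PRIVILEGES does not include GRANT OPTION) can be checked mechanically against pt-show-grants output. The sketch below is a deliberately naive parser that only understands a single global `GRANT ... ON *.*` statement; `missing_privileges` is a name invented for this example, and the required set is the switchover list quoted from the manual.

```python
SWITCHOVER_PRIVS = {
    "SUPER", "GRANT OPTION", "SELECT", "RELOAD",
    "DROP", "CREATE", "REPLICATION SLAVE",
}


def missing_privileges(grant_statement, required):
    """Return the required privileges not covered by one 'GRANT ... ON *.*' statement."""
    stmt = grant_statement.upper()
    # Text between the leading GRANT and the first ' ON ' is the privilege list.
    priv_part = stmt.split(" ON ", 1)[0].replace("GRANT ", "", 1)
    granted = {p.strip() for p in priv_part.split(",")}
    if "ALL PRIVILEGES" in granted or "ALL" in granted:
        # ALL covers every privilege at this level except GRANT OPTION.
        granted = set(required) - {"GRANT OPTION"}
    if "WITH GRANT OPTION" in stmt:
        granted.add("GRANT OPTION")
    return set(required) - granted


# The remote root account as dumped in this article (hash shortened here).
root_remote = "GRANT ALL PRIVILEGES ON *.* TO 'root'@'192.168.%' IDENTIFIED BY PASSWORD '*E8DD'"
print(missing_privileges(root_remote, SWITCHOVER_PRIVS))
```

For the remote root account this reports GRANT OPTION as the only gap, which matches the guess made above; whether that is the exact check mysqlfailover performs internally is not confirmed here.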

Add this on the master. Since the master can change, though, it should be added on every host.

GRANT all on *.* to failover@'192.168.%' identified by "xxxxxxxx" WITH GRANT OPTION;

To be strict about it, it would look something like this.

GRANT SELECT, INSERT, REPLICATION SLAVE, REPLICATION CLIENT on *.* to m_failover@'192.168.%' identified by "xxxxxxxx" WITH GRANT OPTION;

GRANT SUPER on *.* to s_failover@'192.168.%' identified by "xxxxxxxx";

GRANT REPLICATION SLAVE on *.* to repl@'192.168.%' identified by "xxxxxxxx";

Let's start it.

mysqlfailover \

--master=failover:xxxxxxxx@192.168.1.133:3306 \

--candidate=failover:xxxxxxxx@192.168.1.155:3306 \

--discover-slaves-login=failover:xxxxxxxx \

--log=/tmp/failover.log \

--rpl-user=repl:re***** \

--rediscover \

--failover-mode=auto \

--daemon=start \

-v

Starting failover daemon...

Multiple instances of failover daemon found for master 192.168.1.133:3306.

If this is an error, restart the daemon with --force.

Failover mode changed to 'FAIL' for this instance.

Daemon will start in 10 seconds.

.........starting Daemon.

Let's look at the log.

# tail /tmp/failover.log

2014-03-27 02:06:32 AM INFO Discovering slave at 192.168.1.150:3306

2014-03-27 02:06:32 AM INFO Discovering slave at 192.168.1.155:3306

2014-03-27 02:06:32 AM INFO Master Information

2014-03-27 02:06:32 AM INFO Binary Log File: mysql-bin.000008, Position: 191, Binlog_Do_DB: N/A, Binlog_Ignore_DB: mysql,information_schema,performance_schema

2014-03-27 02:06:32 AM INFO GTID Executed Set: e97b08ca-6798-11e3-a666-02c0661dc6e6:1-64

2014-03-27 02:06:32 AM INFO Getting health for master: 192.168.1.133:3306.

2014-03-27 02:06:32 AM INFO Health Status:

2014-03-27 02:06:32 AM INFO host: 192.168.1.133, port: 3306, role: MASTER, state: UP, gtid_mode: ON, health: OK, version: 5.6.15-log, master_log_file: mysql-bin.000008, master_log_pos: 191, IO_Thread: , SQL_Thread: , Secs_Behind: , Remaining_Delay: , IO_Error_Num: , IO_Error: , SQL_Error_Num: , SQL_Error: , Trans_Behind:

2014-03-27 02:06:32 AM INFO host: 192.168.1.150, port: 3306, role: SLAVE, state: UP, gtid_mode: ON, health: OK, version: 5.6.15-log, master_log_file: mysql-bin.000008, master_log_pos: 191, IO_Thread: Yes, SQL_Thread: Yes, Secs_Behind: 0, Remaining_Delay: No, IO_Error_Num: 0, IO_Error: , SQL_Error_Num: 0, SQL_Error: , Trans_Behind: 0

2014-03-27 02:06:32 AM INFO host: 192.168.1.155, port: 3306, role: SLAVE, state: UP, gtid_mode: ON, health: OK, version: 5.6.15-log, master_log_file: mysql-bin.000008, master_log_pos: 191, IO_Thread: Yes, SQL_Thread: Yes, Secs_Behind: 0, Remaining_Delay: No, IO_Error_Num: 0, IO_Error: , SQL_Error_Num: 0, SQL_Error: , Trans_Behind: 0

The log looks as if it started properly.

Checking the process shows something like this.

# ps -ef|grep failover

root 1091 1 0 02:06 ? 00:00:00 /usr/bin/python /usr/bin/mysqlfailover --master=failover:xxxxxxxx@192.168.1.133:3306 --candidate=failover:xxxxxxxx@192.168.1.155:3306,failover:xxxxxxxx@192.168.1.150:3306 --discover-slaves-login=failover:xxxxxxxx --log=/tmp/failover.log --rpl-user=repl:re***** --rediscover --failover-mode=auto --daemon=start -v

Looking closely at the log, this line bothers me:

2014-03-27 02:16:11 AM INFO Failover mode = fail.

Why is that?
In fail mode no failover is performed, which is a problem.
I searched and found the following page.
"MySQL5.6-rcでmysqlfailoverを試してみた" (hiroi10の日記)

mysql> select * from mysql.failover_console;

+---------------+------+

| host | port |

+---------------+------+

| 192.168.1.133 | 3306 |

+---------------+------+

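Deleting this row and then starting mysqlfailover again apparently keeps the failover mode from becoming fail.
It is something like the lock file in MHA.
It looks like the log needs to be monitored for "Failover mode = fail".
Given the name, monitoring the log for just "fail" would be a poor choice, so the keyword has to be chosen carefully.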
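One way to keep the monitoring keyword precise is to match the whole "Failover mode = ..." phrase rather than the word "fail". This is a small sketch assuming the log format shown in this article; `latest_failover_mode` is a made-up helper name.

```python
import re

# Matches e.g. "... INFO Failover mode = fail." and captures the mode name.
FAILOVER_MODE_RE = re.compile(r"Failover mode = (\w+)\.")


def latest_failover_mode(log_lines):
    """Return the most recently logged failover mode, or None if never logged."""
    mode = None
    for line in log_lines:
        match = FAILOVER_MODE_RE.search(line)
        if match:
            mode = match.group(1)
    return mode


sample = [
    "2014-03-27 02:16:11 AM INFO Failover mode = fail.",
    "2014-03-27 02:16:11 AM INFO Getting health for master: 192.168.1.133:3306.",
    "2014-03-27 03:00:21 AM INFO Failover mode = auto.",
]
print(latest_failover_mode(sample))  # an alert would fire when this is "fail"
```

Because the pattern anchors on the full phrase, unrelated lines that merely contain "failover" or "Failed" do not trigger it.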

For now, stop it.

# mysqlfailover \

> --master=failover:xxxxxxxx@192.168.1.133:3306 \

> --candidate=failover:xxxxxxxx@192.168.1.155:3306 \

> --discover-slaves-login=failover:xxxxxxxx \

> --log=/tmp/failover.log \

> --rpl-user=repl:re***** \

> --rediscover \

> --failover-mode=auto \

> --daemon=stop \

> -v

Stopping failover daemon...

# ps -ef|grep failover

On the master:

mysql> delete from mysql.failover_console;

Query OK, 1 row affected (0.02 sec)

mysql> select * from mysql.failover_console;

Empty set (0.00 sec)

At this point the existing replication topology is unchanged.

mysql> show slave hosts;

+-----------+---------------+------+-----------+--------------------------------------+

| Server_id | Host | Port | Master_id | Slave_UUID |

+-----------+---------------+------+-----------+--------------------------------------+

| 150 | 192.168.1.150 | 3306 | 133 | afee6fde-978f-11e3-9f2a-02883e765295 |

| 155 | 192.168.1.155 | 3306 | 133 | 896d4156-9846-11e3-a3d2-02619050bb48 |

+-----------+---------------+------+-----------+--------------------------------------+

Start it.

# mysqlfailover \

> --master=failover:xxxxxxxx@192.168.1.133:3306 \

> --candidate=failover:xxxxxxxx@192.168.1.155:3306 \

> --discover-slaves-login=failover:xxxxxxxx \

> --log=/tmp/failover.log \

> --rpl-user=repl:re***** \

> --rediscover \

> --failover-mode=auto \

> --daemon=start \

> -v

Starting failover daemon...

It may be better to add --pidfile=.

Checking the log shows that the mode is now auto, as hoped.

# view /tmp/failover.log

2014-03-27 03:00:21 AM INFO Failover mode = auto.

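Here are the remaining items to check:
  --daemon=restart and the other modes
  failover test
  adding external scripts, for example to move the VIP
  writing an init script

・Checking whether restart and the other modes work

The choices for daemon mode are 'start', 'stop', 'restart', and 'nodetach', so I will try them all.
I also found the manual here:
MySQL :: MySQL Utilities :: 5.9.1 mysqlfailover - Automatic replication health monitoring and failover
According to the manual, 'nodetach' appears to show the console output as well.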

# ps -ef|grep failover

root 1343 1 0 03:00 ? 00:00:08 /usr/bin/python /usr/bin/mysqlfailover --master=failover:xxxxxxxx@192.168.1.133:3306 --candidate=failover:xxxxxxxx@192.168.1.155:3306 --discover-slaves-login=failover:xxxxxxxx --log=/tmp/failover.log --rpl-user=repl:re***** --rediscover --failover-mode=auto --daemon=start -v

# mysqlfailover \

> --master=failover:xxxxxxxx@192.168.1.133:3306 \

> --candidate=failover:xxxxxxxx@192.168.1.155:3306 \

> --discover-slaves-login=failover:xxxxxxxx \

> --log=/tmp/failover.log \

> --rpl-user=repl:re***** \

> --rediscover \

> --failover-mode=auto \

> --daemon=restart \

> -v

Restarting failover daemon...

Multiple instances of failover daemon found for master 192.168.1.133:3306.

If this is an error, restart the daemon with --force.

Failover mode changed to 'FAIL' for this instance.

Daemon will start in 10 seconds.

.........starting Daemon.

# ps -ef|grep failover

root 1500 1 0 03:30 ? 00:00:00 /usr/bin/python /usr/bin/mysqlfailover --master=failover:xxxxxxxx@192.168.1.133:3306 --candidate=failover:xxxxxxxx@192.168.1.155:3306 --discover-slaves-login=failover:xxxxxxxx --log=/tmp/failover.log --rpl-user=repl:re***** --rediscover --failover-mode=auto --daemon=restart -v

# view /tmp/failover.log

2014-03-27 03:30:09 AM INFO Failover mode = fail.

So restart is basically best avoided,
and if you do use it, it seems you need to add --force.

Trying it with --force

# mysqlfailover --master=failover:xxxxxxxx@192.168.1.133:3306 --candidate=failover:xxxxxxxx@192.168.1.155:3306 --discover-slaves-login=failover:xxxxxxxx --log=/tmp/failover.log --rpl-user=repl:re***** --rediscover --failover-mode=auto --daemon=restart --force -v

Restarting failover daemon...

# ps -ef|grep failover

root 1523 1 0 03:33 ? 00:00:00 /usr/bin/python /usr/bin/mysqlfailover --master=failover:xxxxxxxx@192.168.1.133:3306 --candidate=failover:xxxxxxxx@192.168.1.155:3306 --discover-slaves-login=failover:xxxxxxxx --log=/tmp/failover.log --rpl-user=repl:re***** --rediscover --failover-mode=auto --daemon=restart --force -v

# view /tmp/failover.log

2014-03-27 03:33:52 AM INFO Failover mode = auto.

That seems to work. Seeing --force in the ps output feels a bit like losing, though.

Stop it once and then try nodetach.

Stopping failover daemon...

# ps -ef|grep failover

Do not forget to delete the console row on the master as well.

mysql> select * from mysql.failover_console;

+---------------+------+

| host | port |

+---------------+------+

| 192.168.1.133 | 3306 |

+---------------+------+

1 row in set (0.00 sec)

mysql> delete from mysql.failover_console;

Query OK, 1 row affected (0.00 sec)

mysql> select * from mysql.failover_console;

Empty set (0.00 sec)

Start it with nodetach.

# mysqlfailover \

> --master=failover:xxxxxxxx@192.168.1.133:3306 \

> --candidate=failover:xxxxxxxx@192.168.1.155:3306 \

> --discover-slaves-login=failover:xxxxxxxx \

> --log=/tmp/failover.log \

> --rpl-user=repl:re***** \

> --rediscover \

> --failover-mode=auto \

> --daemon=nodetach \

> -v

Starting failover daemon...

# Discovering slaves for master at 192.168.1.133:3306

# Discovering slave at 192.168.1.150:3306

# Found slave: 192.168.1.150:3306

# Discovering slave at 192.168.1.155:3306

# Found slave: 192.168.1.155:3306

# Checking privileges.

# Checking privileges on candidates.

# Discovering slaves for master at 192.168.1.133:3306

# Attempting to contact 192.168.1.133 ... Success

# Attempting to contact 192.168.1.150 ... Success

# Attempting to contact 192.168.1.155 ... Success

# Discovering slaves for master at 192.168.1.133:3306

# Attempting to contact 192.168.1.133 ... Success

# Attempting to contact 192.168.1.150 ... Success

# Attempting to contact 192.168.1.155 ... Success

# Discovering slaves for master at 192.168.1.133:3306

# Attempting to contact 192.168.1.133 ... Success

# Attempting to contact 192.168.1.150 ... Success

# Attempting to contact 192.168.1.155 ... Success

The log keeps streaming to the console like this.
Exiting with Ctrl+C appears to stop the process.

ps -ef|grep failover

Nothing is running.

・Failover test

For now, without dealing with the VIP, I stop the mysql process on the master
and check whether the replication master is switched.

db1:

mysql> select * from mysql.failover_console;

mysql> delete from mysql.failover_console;

mysql> select * from mysql.failover_console;

db4:

ps -ef |grep failover

tail -f /tmp/failover.log

mysqlfailover \

--master=failover:xxxxxxxx@192.168.1.133:3306 \

--candidate=failover:xxxxxxxx@192.168.1.155:3306 \

--discover-slaves-login=failover:xxxxxxxx \

--log=/tmp/failover.log \

--rpl-user=repl:re***** \

--rediscover \

--failover-mode=auto \

--daemon=start \

-v

db1:

service mysql stop

db4:

tail /tmp/failover.log

2014-03-27 03:48:41 AM INFO Failed to reconnect to the master after 3 attemps.

2014-03-27 03:48:41 AM CRITICAL Master is confirmed to be down or unreachable.

2014-03-27 03:48:41 AM INFO Failover starting in 'auto' mode...

2014-03-27 03:48:41 AM INFO Checking eligibility of slave 192.168.1.155:3306 for candidate.

2014-03-27 03:48:41 AM INFO GTID_MODE=ON ... Ok

2014-03-27 03:48:41 AM INFO Replication user exists ... Ok

2014-03-27 03:48:41 AM INFO Candidate slave 192.168.1.155:3306 will become the new master.

2014-03-27 03:48:41 AM INFO Checking slaves status (before failover).

2014-03-27 03:48:41 AM WARNING Problem detected with SQL thread for slave '192.168.1.150'@'3306' that can result on a unstable topology.

2014-03-27 03:48:41 AM WARNING - SQL thread running: No

2014-03-27 03:48:41 AM WARNING - SQL error: 1146 - Worker 0 failed executing transaction 'e97b08ca-6798-11e3-a666-02c0661dc6e6:65' at master log mysql-bin.000008, end_log_pos 485; Error executing row event: 'Table 'mysql.failover_console' doesn't exist'

2014-03-27 03:48:41 AM WARNING Problem detected with SQL thread for slave '192.168.1.155'@'3306' that can result on a unstable topology.

2014-03-27 03:48:41 AM WARNING - SQL thread running: No

2014-03-27 03:48:41 AM WARNING - SQL error: 1146 - Error executing row event: 'Table 'mysql.failover_console' doesn't exist'

2014-03-27 03:48:41 AM INFO Preparing candidate for failover.

2014-03-27 03:48:41 AM INFO Reading events in relay log for slave 192.168.1.150:3306

2014-03-27 03:48:41 AM INFO Missing transactions found on 192.168.1.150:3306. SELECT gtid_subset() = 0

2014-03-27 03:48:41 AM INFO Connecting candidate to 192.168.1.150:3306 as a temporary slave to retrieve unprocessed GTIDs.

2014-03-27 03:48:41 AM INFO Waiting for candidate to catch up to slave 192.168.1.150:3306.

2014-03-27 03:48:42 AM INFO Creating replication user if it does not exist.

2014-03-27 03:48:42 AM INFO Stopping slaves.

2014-03-27 03:48:42 AM INFO Performing STOP on all slaves.

2014-03-27 03:48:42 AM WARNING Executing stop on slave 192.168.1.150:3306 WARN - slave is not configured with this master

2014-03-27 03:48:42 AM INFO Executing stop on slave 192.168.1.150:3306 Ok

2014-03-27 03:48:42 AM WARNING Executing stop on slave 192.168.1.155:3306 WARN - slave is not configured with this master

2014-03-27 03:48:42 AM INFO Executing stop on slave 192.168.1.155:3306 Ok

2014-03-27 03:48:42 AM INFO Switching slaves to new master.

2014-03-27 03:48:42 AM INFO Disconnecting new master as slave.

2014-03-27 03:48:42 AM INFO Execute on 192.168.1.155:3306: RESET SLAVE ALL

2014-03-27 03:48:42 AM INFO Starting slaves.

2014-03-27 03:48:42 AM INFO Performing START on all slaves.

2014-03-27 03:48:42 AM INFO Executing start on slave 192.168.1.150:3306 Ok

2014-03-27 03:48:42 AM INFO Checking slaves for errors.

2014-03-27 03:48:42 AM INFO 192.168.1.150:3306 status: Ok

2014-03-27 03:48:42 AM INFO Failover complete.

2014-03-27 03:48:42 AM INFO Discovering slaves for master at 192.168.1.155:3306

2014-03-27 03:48:42 AM INFO Discovering slave at 192.168.1.150:3306

2014-03-27 03:48:42 AM INFO Found slave: 192.168.1.150:3306

2014-03-27 03:48:47 AM INFO Unregistering existing instances from slaves.

2014-03-27 03:48:47 AM INFO Registering instance on new master 192.168.1.155:3306.

2014-03-27 03:48:47 AM INFO Master Information

2014-03-27 03:48:47 AM INFO Binary Log File: mysql-bin.000004, Position: 547000, Binlog_Do_DB: N/A, Binlog_Ignore_DB: mysql,information_schema,performance_schema

2014-03-27 03:48:47 AM INFO GTID Executed Set: e97b08ca-6798-11e3-a666-02c0661dc6e6:1-64

2014-03-27 03:48:47 AM INFO Getting health for master: 192.168.1.155:3306.

2014-03-27 03:48:47 AM INFO Health Status:

2014-03-27 03:48:47 AM INFO host: 192.168.1.155, port: 3306, role: MASTER, state: UP, gtid_mode: ON, health: OK, version: 5.6.15-log, master_log_file: mysql-bin.000004, master_log_pos: 547000, IO_Thread: , SQL_Thread: , Secs_Behind: , Remaining_Delay: , IO_Error_Num: , IO_Error: , SQL_Error_Num: , SQL_Error: , Trans_Behind:

2014-03-27 03:48:47 AM INFO host: 192.168.1.150, port: 3306, role: SLAVE, state: UP, gtid_mode: ON, health: OK, version: 5.6.15-log, master_log_file: mysql-bin.000004, master_log_pos: 547000, IO_Thread: Yes, SQL_Thread: Yes, Secs_Behind: 0, Remaining_Delay: No, IO_Error_Num: 0, IO_Error: , SQL_Error_Num: 0, SQL_Error: , Trans_Behind: 0

2014-03-27 03:49:05 AM INFO Discovering slaves for master at 192.168.1.155:3306

2014-03-27 03:49:05 AM INFO Discovering slave at 192.168.1.150:3306

2014-03-27 03:49:05 AM INFO Master Information

2014-03-27 03:49:05 AM INFO Binary Log File: mysql-bin.000004, Position: 547000, Binlog_Do_DB: N/A, Binlog_Ignore_DB: mysql,information_schema,performance_schema

2014-03-27 03:49:05 AM INFO GTID Executed Set: e97b08ca-6798-11e3-a666-02c0661dc6e6:1-64

2014-03-27 03:49:05 AM INFO Getting health for master: 192.168.1.155:3306.

2014-03-27 03:49:05 AM INFO Health Status:

2014-03-27 03:49:05 AM INFO host: 192.168.1.155, port: 3306, role: MASTER, state: UP, gtid_mode: ON, health: OK, version: 5.6.15-log, master_log_file: mysql-bin.000004, master_log_pos: 547000, IO_Thread: , SQL_Thread: , Secs_Behind: , Remaining_Delay: , IO_Error_Num: , IO_Error: , SQL_Error_Num: , SQL_Error: , Trans_Behind:

2014-03-27 03:49:05 AM INFO host: 192.168.1.150, port: 3306, role: SLAVE, state: UP, gtid_mode: ON, health: OK, version: 5.6.15-log, master_log_file: mysql-bin.000004, master_log_pos: 547000, IO_Thread: Yes, SQL_Thread: Yes, Secs_Behind: 0, Remaining_Delay: No, IO_Error_Num: 0, IO_Error: , SQL_Error_Num: 0, SQL_Error: , Trans_Behind: 0

2014-03-27 03:49:23 AM INFO Discovering slaves for master at 192.168.1.155:3306

2014-03-27 03:49:23 AM INFO Discovering slave at 192.168.1.150:3306

2014-03-27 03:49:23 AM INFO Master Information

2014-03-27 03:49:23 AM INFO Binary Log File: mysql-bin.000004, Position: 547000, Binlog_Do_DB: N/A, Binlog_Ignore_DB: mysql,information_schema,performance_schema

2014-03-27 03:49:23 AM INFO GTID Executed Set: e97b08ca-6798-11e3-a666-02c0661dc6e6:1-64

2014-03-27 03:49:23 AM INFO Getting health for master: 192.168.1.155:3306.

2014-03-27 03:49:23 AM INFO Health Status:

2014-03-27 03:49:23 AM INFO host: 192.168.1.155, port: 3306, role: MASTER, state: UP, gtid_mode: ON, health: OK, version: 5.6.15-log, master_log_file: mysql-bin.000004, master_log_pos: 547000, IO_Thread: , SQL_Thread: , Secs_Behind: , Remaining_Delay: , IO_Error_Num: , IO_Error: , SQL_Error_Num: , SQL_Error: , Trans_Behind:

2014-03-27 03:49:23 AM INFO host: 192.168.1.150, port: 3306, role: SLAVE, state: UP, gtid_mode: ON, health: OK, version: 5.6.15-log, master_log_file: mysql-bin.000004, master_log_pos: 547000, IO_Thread: Yes, SQL_Thread: Yes, Secs_Behind: 0, Remaining_Delay: No, IO_Error_Num: 0, IO_Error: , SQL_Error_Num: 0, SQL_Error: , Trans_Behind: 0

2014-03-27 03:49:41 AM INFO Discovering slaves for master at 192.168.1.155:3306

2014-03-27 03:49:41 AM INFO Discovering slave at 192.168.1.150:3306

2014-03-27 03:49:41 AM INFO Master Information

2014-03-27 03:49:41 AM INFO Binary Log File: mysql-bin.000004, Position: 547000, Binlog_Do_DB: N/A, Binlog_Ignore_DB: mysql,information_schema,performance_schema

2014-03-27 03:49:41 AM INFO GTID Executed Set: e97b08ca-6798-11e3-a666-02c0661dc6e6:1-64

2014-03-27 03:49:41 AM INFO Getting health for master: 192.168.1.155:3306.

2014-03-27 03:49:42 AM INFO Health Status:

2014-03-27 03:49:42 AM INFO host: 192.168.1.155, port: 3306, role: MASTER, state: UP, gtid_mode: ON, health: OK, version: 5.6.15-log, master_log_file: mysql-bin.000004, master_log_pos: 547000, IO_Thread: , SQL_Thread: , Secs_Behind: , Remaining_Delay: , IO_Error_Num: , IO_Error: , SQL_Error_Num: , SQL_Error: , Trans_Behind:

2014-03-27 03:49:42 AM INFO host: 192.168.1.150, port: 3306, role: SLAVE, state: UP, gtid_mode: ON, health: OK, version: 5.6.15-log, master_log_file: mysql-bin.000004, master_log_pos: 547000, IO_Thread: Yes, SQL_Thread: Yes, Secs_Behind: 0, Remaining_Delay: No, IO_Error_Num: 0, IO_Error: , SQL_Error_Num: 0, SQL_Error: , Trans_Behind: 0

2014-03-27 03:50:00 AM INFO Discovering slaves for master at 192.168.1.155:3306

2014-03-27 03:50:00 AM INFO Discovering slave at 192.168.1.150:3306

2014-03-27 03:50:00 AM INFO Master Information

2014-03-27 03:50:00 AM INFO Binary Log File: mysql-bin.000004, Position: 547000, Binlog_Do_DB: N/A, Binlog_Ignore_DB: mysql,information_schema,performance_schema

2014-03-27 03:50:00 AM INFO GTID Executed Set: e97b08ca-6798-11e3-a666-02c0661dc6e6:1-64

2014-03-27 03:50:00 AM INFO Getting health for master: 192.168.1.155:3306.

2014-03-27 03:50:00 AM INFO Health Status:

2014-03-27 03:50:00 AM INFO host: 192.168.1.155, port: 3306, role: MASTER, state: UP, gtid_mode: ON, health: OK, version: 5.6.15-log, master_log_file: mysql-bin.000004, master_log_pos: 547000, IO_Thread: , SQL_Thread: , Secs_Behind: , Remaining_Delay: , IO_Error_Num: , IO_Error: , SQL_Error_Num: , SQL_Error: , Trans_Behind:

2014-03-27 03:50:00 AM INFO host: 192.168.1.150, port: 3306, role: SLAVE, state: UP, gtid_mode: ON, health: OK, version: 5.6.15-log, master_log_file: mysql-bin.000004, master_log_pos: 547000, IO_Thread: Yes, SQL_Thread: Yes, Secs_Behind: 0, Remaining_Delay: No, IO_Error_Num: 0, IO_Error: , SQL_Error_Num: 0, SQL_Error: , Trans_Behind: 0

2014-03-27 03:50:18 AM INFO Discovering slaves for master at 192.168.1.155:3306

2014-03-27 03:50:18 AM INFO Discovering slave at 192.168.1.150:3306

2014-03-27 03:50:18 AM INFO Master Information

2014-03-27 03:50:18 AM INFO Binary Log File: mysql-bin.000004, Position: 547000, Binlog_Do_DB: N/A, Binlog_Ignore_DB: mysql,information_schema,performance_schema

2014-03-27 03:50:18 AM INFO GTID Executed Set: e97b08ca-6798-11e3-a666-02c0661dc6e6:1-64

2014-03-27 03:50:18 AM INFO Getting health for master: 192.168.1.155:3306.

2014-03-27 03:50:18 AM INFO Health Status:

2014-03-27 03:50:18 AM INFO host: 192.168.1.155, port: 3306, role: MASTER, state: UP, gtid_mode: ON, health: OK, version: 5.6.15-log, master_log_file: mysql-bin.000004, master_log_pos: 547000, IO_Thread: , SQL_Thread: , Secs_Behind: , Remaining_Delay: , IO_Error_Num: , IO_Error: , SQL_Error_Num: , SQL_Error: , Trans_Behind:

2014-03-27 03:50:18 AM INFO host: 192.168.1.150, port: 3306, role: SLAVE, state: UP, gtid_mode: ON, health: OK, version: 5.6.15-log, master_log_file: mysql-bin.000004, master_log_pos: 547000, IO_Thread: Yes, SQL_Thread: Yes, Secs_Behind: 0, Remaining_Delay: No, IO_Error_Num: 0, IO_Error: , SQL_Error_Num: 0, SQL_Error: , Trans_Behind: 0

# ps -ef |grep mysql

root 1599 1 0 03:46 ? 00:00:01 /usr/bin/python /usr/bin/mysqlfailover --master=failover:xxxxxxxx@192.168.1.133:3306 --candidate=failover:xxxxxxxx@192.168.1.155:3306 --discover-slaves-login=failover:xxxxxxxx --log=/tmp/failover.log --rpl-user=repl:re***** --rediscover --failover-mode=auto --daemon=start -v

db3:

mysql> show slave status\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.1.155

Master_User: repl

Master_Port: 3306

Connect_Retry: 60

Master_Log_File: mysql-bin.000004

Read_Master_Log_Pos: 547000

Relay_Log_File: mysqld-relay-bin.000002

Relay_Log_Pos: 408

Relay_Master_Log_File: mysql-bin.000004

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

Replicate_Do_DB:

Replicate_Ignore_DB: mysql,information_schema,performance_schema

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 547000

Relay_Log_Space: 613

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id: 155

Master_UUID: 896d4156-9846-11e3-a3d2-02619050bb48

Master_Info_File: mysql.slave_master_info

SQL_Delay: 0

SQL_Remaining_Delay: NULL

Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it

Master_Retry_Count: 86400

Master_Bind:

Last_IO_Error_Timestamp:

Last_SQL_Error_Timestamp:

Master_SSL_Crl:

Master_SSL_Crlpath:

Retrieved_Gtid_Set:

Executed_Gtid_Set: e97b08ca-6798-11e3-a666-02c0661dc6e6:1-64

Auto_Position: 1

1 row in set (0.00 sec)

The master has been switched over automatically.

db2:

mysql> show slave status\G

Empty set (0.00 sec)

mysql> show slave hosts;

+-----------+---------------+------+-----------+--------------------------------------+
| Server_id | Host          | Port | Master_id | Slave_UUID                           |
+-----------+---------------+------+-----------+--------------------------------------+
|       150 | 192.168.1.150 | 3306 |       155 | afee6fde-978f-11e3-9f2a-02883e765295 |
+-----------+---------------+------+-----------+--------------------------------------+

1 row in set (0.00 sec)

mysql> show master status\G

*************************** 1. row ***************************

File: mysql-bin.000004

Position: 547000

Binlog_Do_DB:

Binlog_Ignore_DB: mysql,information_schema,performance_schema

Executed_Gtid_Set: e97b08ca-6798-11e3-a666-02c0661dc6e6:1-64

1 row in set (0.00 sec)

mysql> show global variables like 'read_only';

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| read_only | ON |

+---------------+-------+

1 row in set (0.00 sec)

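The master switched over properly.
However, mysqlfailover does not appear to turn read_only OFF on the new master automatically.
As far as the help output shows, there is no option for that either.
It looks like an external script has to handle it at failover time.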

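・Restoring the original configuration

This assumes that no data has been updated in the meantime and that the hosts in the replication setup are fully consistent with each other.
On that premise, I restore the setup by running RESET MASTER; and RESET SLAVE ALL; and then setting up replication again.

No further switchover is needed for now, so first stop the mysqlfailover process.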

# mysqlfailover --master=failover:xxxxxxxx@192.168.1.155:3306 --candidate=failover:xxxxxxxx@192.168.1.150:3306 --discover-slaves-login=failover:xxxxxxxx --log=/tmp/failover.log --rpl-user=repl:re***** --rediscover --failover-mode=auto --daemon=stop -v

Stopping failover daemon...

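At first it looked as if, when stopping after a switchover, the daemon would not stop unless the --master and --candidate values were adjusted.
It turned out that with the --pidfile option, --daemon=stop works without specifying --master and the rest.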

Now restore replication.
db1:

netstat -tanp

service mysql start

mysql -u root -p

show master status\G

show slave status\G

show slave hosts;

show global variables like 'read_only';

select * from mysql.failover_console;

delete from mysql.failover_console;

select * from mysql.failover_console;

RESET MASTER;

db2:

mysql -u root -p

show master status\G

show slave status\G

stop slave;

RESET SLAVE ALL;

CHANGE MASTER TO

MASTER_HOST='192.168.1.133',

MASTER_PORT=3306,

MASTER_USER='repl',

MASTER_PASSWORD='re*****',

MASTER_AUTO_POSITION = 1;

start slave;

show slave status\G

show slave hosts;

show global variables like 'read_only';

set global read_only=1;

db3:

mysql -u root -p

show slave status\G

stop slave;

RESET SLAVE ALL;

show slave status\G

CHANGE MASTER TO

MASTER_HOST='192.168.1.133',

MASTER_PORT=3306,

MASTER_USER='repl',

MASTER_PASSWORD='re*****',

MASTER_AUTO_POSITION = 1;

start slave;

show slave status\G

show global variables like 'read_only';

set global read_only=1;

show global variables like 'read_only';

db1:

show slave hosts;

+-----------+---------------+------+-----------+--------------------------------------+
| Server_id | Host          | Port | Master_id | Slave_UUID                           |
+-----------+---------------+------+-----------+--------------------------------------+
|       155 | 192.168.1.155 | 3306 |       133 | 896d4156-9846-11e3-a3d2-02619050bb48 |
|       150 | 192.168.1.150 | 3306 |       133 | afee6fde-978f-11e3-9f2a-02883e765295 |
+-----------+---------------+------+-----------+--------------------------------------+

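The replication configuration is now back to its original state.
MASTER_AUTO_POSITION = 1; is convenient.
Deleting the row from mysql.failover_console would otherwise have to be skipped on the slaves,
so I decided to run RESET MASTER; on db1.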
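The precondition for this reset (every host has executed exactly the same transactions) could be checked before running RESET MASTER. The following is a minimal sketch under assumptions: gtid_sets_match and fetch_gtid are hypothetical helper names, credentials are assumed to come from ~/.my.cnf, and the comparison is plain string equality, which is stricter than a true GTID-set comparison.

```shell
#!/bin/sh
# Return 0 when every GTID set passed as an argument is identical to the first one.
# Plain string comparison: equivalent sets written differently count as a mismatch.
gtid_sets_match() {
    first=$1
    for s in "$@"; do
        [ "$s" = "$first" ] || return 1
    done
    return 0
}

# Hypothetical helper: read gtid_executed from one host.
# Assumes the mysql client can log in without a password prompt (~/.my.cnf).
fetch_gtid() {
    mysql -h "$1" -N -B -e 'SELECT @@GLOBAL.gtid_executed'
}

# Example with the Executed_Gtid_Set value seen on db2 and db3 in this post.
if gtid_sets_match "e97b08ca-6798-11e3-a666-02c0661dc6e6:1-64" \
                   "e97b08ca-6798-11e3-a666-02c0661dc6e6:1-64"; then
    echo "GTID sets match"
fi
```

Against live servers this would be called as gtid_sets_match "$(fetch_gtid 192.168.1.155)" "$(fetch_gtid 192.168.1.150)".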

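・Considering external scripts to move the VIP

First, the options that register scripts:

--exec-fail-check   script to run periodically, at each predefined interval of the default check

--exec-before   script to run before failover starts

--exec-after   script to run when the failover process ends

--exec-post-failover   script to run after failover has completed (health report, etc.)

It might be good to write roughly the following four:

1. Run an F/O check and shut down mysql on the master (a mon-like role; maybe unnecessary?)
   If implemented, mysqlfailover itself already checks the connection to mysql, so shutting it down when ping to the VIP fails seems the better fit
2. Before F/O starts, remove the VIP from the old master
3. When the F/O process ends, attach the VIP to the new master and turn read_only OFF
4. Report once F/O is complete and the status can be checked (possibly unnecessary if Zabbix monitoring already shows the situation)

At least 2 and 3 look necessary.
In addition, if reads are load-balanced across the slaves, some environments will also need a script that changes the balancing weights.
It is needed if doing that by hand after detection would leave a window where the load on the new master is heavy enough to cause a secondary failure.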
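The options above can be wired up so that each --exec-* option is passed only when its script is actually in place. This is a minimal sketch, not part of the original setup: add_hook and HOOK_ARGS are hypothetical names, and the script paths are the ones used later in this post.

```shell
#!/bin/sh
# Collected "--option=script" pairs to append to the mysqlfailover command line.
HOOK_ARGS=""

# Append "--opt=script" to HOOK_ARGS only when the script exists and is executable.
add_hook() {
    opt=$1
    script=$2
    if [ -x "$script" ]; then
        HOOK_ARGS="${HOOK_ARGS} ${opt}=${script}"
    else
        echo "skip ${opt}: ${script} is not executable" >&2
    fi
}

add_hook --exec-before /usr/local/bin/before_failover.sh
add_hook --exec-after  /usr/local/bin/after_failover.sh
echo "hook options:${HOOK_ARGS}"
```

The resulting ${HOOK_ARGS} would then be appended, unquoted, to the mysqlfailover start command.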

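・Writing an init script

Specifying all of these options every time is tedious, so an init script seems worth having. Let's write one.
The one I recently wrote for redis should be reusable.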

# cat /etc/init.d/mysqlfailover

------------------------------------------------------

#!/bin/sh
# Simple mysqlfailover init.d script conceived to work on Linux systems
# as it does use of the /proc filesystem.
# chkconfig: - 85 15
# description: mysqlfailover
# processname: mysqlfailover

. /etc/rc.d/init.d/functions

EXEC=/usr/bin/mysqlfailover
prog=$(basename $EXEC)
PIDFILE=/var/run/mysqld/failover.pid
LOGFILE=/tmp/failover.log
PORT=3306
fouser=failover
fopass=xxxxxxxx
rpluser=repl
rplpass=re*****
old_master=192.168.1.133
new_master=192.168.1.155
intervalsec=15
exec_failchk=/usr/local/bin/failchk.sh
exec_before=/usr/local/bin/before_failover.sh
exec_after=/usr/local/bin/after_failover.sh
exec_postfail=/usr/local/bin/post_failover.sh

start() {
    if [ -f $PIDFILE ]
    then
        echo "$PIDFILE exists, process is already running or crashed"
    else
        $EXEC --master=${fouser}:${fopass}@${old_master}:${PORT} \
            --candidate=${fouser}:${fopass}@${new_master}:${PORT} \
            --discover-slaves-login=${fouser}:${fopass} \
            --log=${LOGFILE} --pidfile=${PIDFILE} -i ${intervalsec} \
            --rpl-user=${rpluser}:${rplpass} --rediscover \
            --failover-mode=auto --daemon=start -vv --force
        # Hook options, left disabled for now:
        ##--exec-after=${exec_after} --exec-before=${exec_before} \
        ##--exec-post-failover=${exec_postfail} \
        ##--exec-fail-check=${exec_failchk} \
    fi
}

stop() {
    if [ ! -f $PIDFILE ]
    then
        echo "$PIDFILE does not exist, process is not running"
    else
        PID=$(cat $PIDFILE)
        $EXEC --log=${LOGFILE} --pidfile=${PIDFILE} \
            --daemon=stop -vv
        # Wait until the daemon's /proc entry disappears
        while [ -x /proc/${PID} ]
        do
            echo "Waiting for mysqlfailover to shutdown ..."
            sleep 1
        done
        echo "mysqlfailover stopped"
    fi
}

rh_status() {
    status $prog
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        start
        ;;
    status)
        rh_status
        ;;
    *)
        echo "Please use start or stop as first argument"
        ;;
esac

------------------------------------------------------

# chmod +x mysqlfailover

# chkconfig --add mysqlfailover

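I added it with chkconfig for now, but it probably will not start unless the replication setup is healthy,
and it will likely end up being started by hand anyway, so I plan to leave automatic startup off.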

# /etc/init.d/mysqlfailover start

Starting failover daemon...

# ps -ef|grep fail

root 1095 1 1 00:43 ? 00:00:00 /usr/bin/python /usr/bin/mysqlfailover --master=failover:xxxxxxxx@192.168.1.133:3306 --candidate=failover:xxxxxxxx@192.168.1.155:3306 --discover-slaves-login=failover:xxxxxxxx --log=/tmp/failover.log --pidfile=/var/run/mysqld/failover.pid -i 15 --rpl-user=repl:re***** --rediscover --failover-mode=auto --daemon=start -vv

# /etc/init.d/mysqlfailover status

mysqlfailover (pid 1095) is running...

# /etc/init.d/mysqlfailover stop

Stopping failover daemon...

mysqlfailover stopped

# ps -ef|grep fail

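Start, stop, and status all worked as expected.

Deleting the mysql.failover_console entry and rebuilding replication each time is a hassle,
so it is probably better to pass --force at start after all.
If the entry is not deleted, the failover mode ends up as fail.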

# /etc/init.d/mysqlfailover start

Starting failover daemon...

Multiple instances of failover daemon found for master 192.168.1.133:3306.

If this is an error, restart the daemon with --force.

Failover mode changed to 'FAIL' for this instance.

Daemon will start in 10 seconds.

.........starting Daemon.

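With --force, the failover mode stayed auto and it seems to have worked properly.

Whether multiple processes can be run, one per replication setup, arguably still needs checking,
but time is short this time, so I will test that on another occasion.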

・Writing scripts to handle the VIP and read_only

I defined the following in the init script.

exec_failchk=/usr/local/bin/failchk.sh

exec_before=/usr/local/bin/before_failover.sh

exec_after=/usr/local/bin/after_failover.sh

exec_postfail=/usr/local/bin/post_failover.sh

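At a minimum, the following need to be written.
2. Before F/O starts, remove the VIP from the old master and shut down mysql on the old master
3. When the F/O process ends, attach the VIP to the new master and turn read_only OFF
It seems they do not have to be shell scripts.

Since I am testing on AWS, re-attaching the VIP also requires talking to the API.

Commands for re-attaching a VIP by hand: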

ip addr del 10.35.31.202/23 brd 10.35.31.255 dev eth0

ip addr add 10.35.31.202/23 brd 10.35.31.255 dev eth0

For reference, the system commands called by the external script that switches the VIP under MHA:

sub start_vip() {

`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;

}

# A simple system call that disable the VIP on the old_master

sub stop_vip() {

`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;

system("ssh $ssh_user\@$orig_master_host \" $ssh_stop_mysqld \"");

}

my $ssh_start_vip = "sudo /sbin/ip addr add $vip brd $brd dev $nic";

my $ssh_stop_vip = "sudo /sbin/ip addr del $vip brd $brd dev $nic";

my $ssh_stop_mysqld = "sudo /sbin/service mysql stop";

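To move the VIP, SSH key authentication probably has to be set up.
sudo privileges are probably needed too.

First, set up SSH key authentication (as root here) from the management server 04 to each db.
db4: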

# ssh-copy-id -i ~/.ssh/id_rsa komiya-test-mysql01

# ssh-copy-id -i ~/.ssh/id_rsa komiya-test-mysql02

# ssh-copy-id -i ~/.ssh/id_rsa komiya-test-mysql03

# visudo

-----

#Defaults requiretty

-----

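I will write the scripts based on one that is built into an HA setup switching private IPs on AWS.
In that script the executing host is not a manager, so that part will probably need adjusting.
It may be necessary to define the ENIs of the old and new masters in advance.

First, attach the VIP to the old master by hand and check it.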

aws ec2 assign-private-ip-addresses \

--network-interface-id eni-27a2a945 \

--private-ip-addresses 192.168.1.222 --allow-reassignment

ip addr add 192.168.1.222/24 brd 192.168.1.255 dev eth0

Ping from another server and confirm that it gets through (ping does not reach the VIP unless it is also assigned on the AWS side).

ping 192.168.1.222
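The same ping check could serve as the basis for the fail-check idea mentioned earlier (item 1: act when ping to the VIP fails). The following is only a sketch under assumptions: vip_alive, VIP, PING and RETRIES are hypothetical names, and how mysqlfailover interprets the exit code of an --exec-fail-check script (as far as I understand, 0 means no failover is needed) should be confirmed against the documentation for the installed version.

```shell
#!/bin/sh
# Sketch of a fail-check helper: succeed (0) while the VIP still answers ping.
VIP=${VIP:-192.168.1.222}
PING=${PING:-ping}        # overridable so the logic can be exercised without a network
RETRIES=${RETRIES:-3}     # number of single-packet attempts before giving up

vip_alive() {
    i=0
    while [ "$i" -lt "$RETRIES" ]; do
        # -c 1: send one packet, -W 1: wait at most one second for the reply (Linux ping)
        "$PING" -c 1 -W 1 "$VIP" >/dev/null 2>&1 && return 0
        i=$((i + 1))
    done
    return 1
}

# Intended use as the last lines of a fail-check script:
#   vip_alive
#   exit $?
```

Retrying a few times before reporting failure keeps one lost packet from being treated as a dead master.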

vi /usr/local/bin/before_failover.sh

----------------------------------------

#!/bin/bash
# before_failover.sh : before F/O starts, remove the VIP from the old master and stop mysql on the old master
# Dependencies: mysqlfailover, after_failover.sh
# Change log: 20140331 - create komiyay

export PATH=$PATH:/usr/local/bin
export AWS_CONFIG_FILE=/root/.ec2/aws.config

datetime=`date +%Y%m%d_%H%M%S`
mailto="<recipient-email-address>"
oldmaster=192.168.1.133
newmaster=192.168.1.155
vip=192.168.1.222
subnetmask=24
brd=192.168.1.255
nic=eth0
ssh_user=root
oldmaster_eni=`aws ec2 describe-network-interfaces --filters Name=addresses.private-ip-address,Values=${oldmaster} --query 'NetworkInterfaces[].[NetworkInterfaceId]' --output text`
ssh_stop_vip="sudo /sbin/ip addr del ${vip}/${subnetmask} brd ${brd} dev ${nic}"
ssh_stop_mysqld="sudo /sbin/service mysql stop"

stop_vip() {
    echo "Disabling the VIP on old master"
    ssh ${ssh_user}@${oldmaster} "${ssh_stop_vip}"
}

stop_mysql() {
    echo "Stop mysql on old master"
    ssh ${ssh_user}@${oldmaster} "${ssh_stop_mysqld}"
}

aws_pip_unassign() {
    echo "Disabling the aws's virtual private ip address"
    aws ec2 unassign-private-ip-addresses \
        --network-interface-id ${oldmaster_eni} \
        --private-ip-addresses ${vip}|tee /tmp/res.txt
}

## main
# Check SSH connectivity first; if it fails, do only the AWS-side processing
ssh ${ssh_user}@${oldmaster} ls /etc/hosts
result_ssh=$?

if [ $result_ssh -eq 0 ];then
    aws_pip_unassign
    grep true /tmp/res.txt
    res_unassign=$?
    if [ ${res_unassign} -ne 0 ];then
        printf "Error: aws privateip unassign fail.\nfailover NG." \
            |mail -s "mysqlfailover-err_${datetime}" ${mailto}
        exit 1
    fi
    stop_vip
    result_vip=$?
    if [ ${result_vip} -ne 0 ];then
        printf "Error: stop vip is fail.\nfailover NG." \
            |mail -s "mysqlfailover-err_${datetime}" ${mailto}
        exit 1
    fi
    stop_mysql
    result_mysql=$?
    if [ ${result_mysql} -ne 0 ];then
        printf "Error: stop mysql is fail.\nfailover NG." \
            |mail -s "mysqlfailover-err_${datetime}" ${mailto}
        exit 1
    fi
else
    aws_pip_unassign
    grep true /tmp/res.txt
    res_unassign=$?
    if [ ${res_unassign} -ne 0 ];then
        printf "Error: aws privateip unassign fail.\nfailover NG." \
            |mail -s "mysqlfailover-err_${datetime}" ${mailto}
        exit 1
    fi
    printf "Warning: old master is down.\nssh NG." \
        |mail -s "mysqlfailover-info_${datetime}" ${mailto}
fi
exit 0

----------------------------------------

chmod +x /usr/local/bin/before_failover.sh

vi /usr/local/bin/after_failover.sh

----------------------------------------

#!/bin/bash
# after_failover.sh : when the F/O process ends, attach the VIP to the new master and turn read_only OFF
# Dependencies: mysqlfailover, before_failover.sh
# Change log: 20140331 - create komiyay

export PATH=$PATH:/usr/local/bin
export AWS_CONFIG_FILE=/root/.ec2/aws.config

datetime=`date +%Y%m%d_%H%M%S`
mailto="<recipient-email-address>"
oldmaster=192.168.1.133
newmaster=192.168.1.155
vip=192.168.1.222
subnetmask=24
brd=192.168.1.255
nic=eth0
ssh_user=root
mysql_user=root
# Placeholder kept from the original; it has to expand to the password
mysql_pass=`/path_to_file`
newmaster_eni=`aws ec2 describe-network-interfaces --filters Name=addresses.private-ip-address,Values=${newmaster} --query 'NetworkInterfaces[].[NetworkInterfaceId]' --output text`
ssh_start_vip="sudo /sbin/ip addr add ${vip}/${subnetmask} brd ${brd} dev ${nic}"

start_vip() {
    ssh ${ssh_user}@${newmaster} "${ssh_start_vip}"
}

# Turns read_only OFF on the new master
set_readonly() {
    mysql -u ${mysql_user} -p${mysql_pass} -h ${newmaster} -e 'set global read_only=0;'
}

aws_pip_assign() {
    echo "enabling the aws's virtual private ip address"
    aws ec2 assign-private-ip-addresses \
        --network-interface-id ${newmaster_eni} \
        --private-ip-addresses ${vip} --allow-reassignment|tee /tmp/res.txt
}

## main
set_readonly
result_ro=$?
if [ ${result_ro} -ne 0 ];then
    printf "Error: set read only off is fail.\nfailover NG." \
        |mail -s "mysqlfailover-err_${datetime}" ${mailto}
    exit 1
fi

aws_pip_assign
grep true /tmp/res.txt
result_pip=$?
if [ ${result_pip} -ne 0 ];then
    printf "Error: aws privateip assign fail.\nfailover NG." \
        |mail -s "mysqlfailover-err_${datetime}" ${mailto}
    exit 1
fi

start_vip
result_vip=$?
if [ ${result_vip} -ne 0 ];then
    printf "Error: start vip is fail.\nfailover NG." \
        |mail -s "mysqlfailover-err_${datetime}" ${mailto}
    exit 1
fi

exit 0

----------------------------------------

chmod +x /usr/local/bin/after_failover.sh

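Run each script on its own at this point.

bash -x /usr/local/bin/before_failover.sh

Mainly check the old master (whether the IP has been removed and mysql is down).

bash -x /usr/local/bin/after_failover.sh

Mainly check the new master (whether the IP is attached and read_only is off).

Modify the init script: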

# cp -p /etc/init.d/mysqlfailover{,.`date +%Y%m%d`}

# vi /etc/init.d/mysqlfailover

# diff /etc/init.d/mysqlfailover{,.`date +%Y%m%d`}

39,40c39,40

< --failover-mode=auto --daemon=start -vv --force \

< --exec-before=${exec_before} --exec-after=${exec_after}

---

> --failover-mode=auto --daemon=start -vv --force

> ##--exec-after=${exec_after} --exec-before=${exec_before} \

# service mysqlfailover start

Starting failover daemon...

# ps -ef|grep fail

root 1229 1 2 14:56 ? 00:00:00 /usr/bin/python /usr/bin/mysqlfailover --master=failover:xxxxxxxx@192.168.1.133:3306 --candidate=failover:xxxxxxxx@192.168.1.155:3306 --discover-slaves-login=failover:xxxxxxxx --log=/tmp/failover.log --pidfile=/var/run/mysqld/failover.pid -i 15 --rpl-user=repl:re***** --rediscover --failover-mode=auto --daemon=start -vv --force --exec-before=/usr/local/bin/before_failover.sh --exec-after=/usr/local/bin/after_failover.sh

# view /tmp/failover.log

Now shut down mysql on db1.

Check the following beforehand.

db1,2:

ip addr show

mysql> show slave hosts;

db2,3:

show slave status\G

show global variables like 'read_only';

db3:

ping 192.168.1.222

db4:

aws ec2 describe-network-interfaces \

--filters Name=addresses.private-ip-address,Values=192.168.1.222 \

--query 'NetworkInterfaces[].[NetworkInterfaceId]' --output text

# aws ec2 describe-network-interfaces \

--filters Name=addresses.private-ip-address,Values=192.168.1.133 \

--query 'NetworkInterfaces[].[NetworkInterfaceId]' --output text

eni-27a2a945

# aws ec2 describe-network-interfaces \

--filters Name=addresses.private-ip-address,Values=192.168.1.155 \

--query 'NetworkInterfaces[].[NetworkInterfaceId]' --output text

eni-6cb3b90e

# tail -f /tmp/failover.log

Shut down the master.
db1:

service mysql stop

Check the IPs and the other items verified beforehand once more.

db2:
The VIP is now attached to db2,

# ip addr show eth0|grep sec

inet 192.168.1.222/24 brd 192.168.1.255 scope global secondary eth0

the AWS-side private IP has also moved to db2's NIC,

# aws ec2 describe-network-interfaces \

> --filters Name=addresses.private-ip-address,Values=192.168.1.222 \

> --query 'NetworkInterfaces[].[NetworkInterfaceId]' --output text

eni-6cb3b90e

read_only on db2 has been turned off,

mysql> show global variables like 'read_only';

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| read_only | OFF |

+---------------+-------+

db3 sees db2 as its master,

mysql> show slave hosts;

+-----------+---------------+------+-----------+--------------------------------------+
| Server_id | Host          | Port | Master_id | Slave_UUID                           |
+-----------+---------------+------+-----------+--------------------------------------+
|       150 | 192.168.1.150 | 3306 |       155 | afee6fde-978f-11e3-9f2a-02883e765295 |
+-----------+---------------+------+-----------+--------------------------------------+

and the slave information on db2 has been reset.

mysql> show slave status\G

Empty set (0.00 sec)

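Just in case, run the same command on db1 to confirm the VIP is not attached there.

# ip addr show eth0|grep sec

The expected behavior was observed, so I consider the verification complete.
Shutting down the OS as well and the report script are omitted for lack of time.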
・Restoring the configuration

db4:

aws ec2 unassign-private-ip-addresses \

--network-interface-id eni-6cb3b90e \

--private-ip-addresses 192.168.1.222|tee /tmp/res.txt

aws ec2 assign-private-ip-addresses \

--network-interface-id eni-27a2a945 \

--private-ip-addresses 192.168.1.222 --allow-reassignment

aws ec2 describe-network-interfaces \

--filters Name=addresses.private-ip-address,Values=192.168.1.222 \

--query 'NetworkInterfaces[].[NetworkInterfaceId]' --output text

db2:

ip addr del 192.168.1.222/24 brd 192.168.1.255 dev eth0

ip addr show

db1:

ip addr add 192.168.1.222/24 brd 192.168.1.255 dev eth0

ip addr show

* The replication steps were covered above, but are repeated here.
db1:

netstat -tanp

service mysql start

mysql -u root -p

show slave hosts;

show global variables like 'read_only';

select * from mysql.failover_console;

delete from mysql.failover_console;

select * from mysql.failover_console;

RESET MASTER;

db2:

mysql -u root -p

show master status\G

show slave status\G

stop slave;

RESET SLAVE ALL;

CHANGE MASTER TO

MASTER_HOST='192.168.1.133',

MASTER_PORT=3306,

MASTER_USER='repl',

MASTER_PASSWORD='re*****',

MASTER_AUTO_POSITION = 1;

start slave;

show slave status\G

show slave hosts;

show global variables like 'read_only';

set global read_only=1;

db3:

mysql -u root -p

show slave status\G

stop slave;

RESET SLAVE ALL;

show slave status\G

CHANGE MASTER TO

MASTER_HOST='192.168.1.133',

MASTER_PORT=3306,

MASTER_USER='repl',

MASTER_PASSWORD='re*****',

MASTER_AUTO_POSITION = 1;

start slave;

show slave status\G

show global variables like 'read_only';

set global read_only=1;

show global variables like 'read_only';

service mysqlfailover start

ps -ef|grep fail

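Incidentally, the switchover itself was about as fast as with MHA and cut over cleanly.
However, the log showed it waiting three intervals after detecting that the master might be down, so with the defaults there seems to be a wait of about 45 seconds.
Where it beats MHA is that the slaves do not need to be configured to keep their relay logs for a certain period, and that it can run in daemon mode.
It does have the restriction of requiring 5.6 with GTID ON, but for 5.6 with GTID on, mysqlfailover may be the only HA option for now.
Or so I thought, but it seems MHA 0.56 with GTID support has been released (April 2014)!

That is quite some timing.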
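The roughly 45 seconds mentioned above is just the interval multiplied by the number of waits seen in the log. As a quick worked calculation (interval_sec and waits are illustrative variable names; 15 is the value passed with -i in the init script):

```shell
#!/bin/sh
# Rough detection delay = number of interval waits x interval length.
interval_sec=15   # interval passed with -i in the init script
waits=3           # interval waits observed in the failover log
echo "approx. detection delay: $((interval_sec * waits)) sec"
```

Lowering -i shortens the wait proportionally. That trades off against false detections, which this test did not measure.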

The log timestamps are in the middle of the night because the timezone was left at EDT.
That is all. Thank you for reading this long post.
