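Hardware: two hpe800 servers, one serial cable carrying the serial heartbeat and one crossover cable carrying the UDP heartbeat. Each machine has two network cards: one for the crossover cable, one connected to the switch.
Software: Red Hat 7.3 (also tested successfully on Mandrake 8.2), heartbeat-0.4.9.1-1.i386.rpm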
Other preparation: HPE800 (1), hostname CLUSTER-101-SERVER, IP address 192.9.100.101
HPE800 (2), hostname CLUSTER-102-SERVER, IP address 192.9.100.102
Virtual hostname CLUSTER-SERVER, IP address 192.9.100.100
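Download the latest Heartbeat package from http://www.linux-ha.org (heartbeat-0.4.9.1 at the time of writing) and install it on both machines: rpm -ivh heartbeat-0.4.9.1-1.i386.rpm
Configure the first hpe800 first. Installing the package creates the directory /etc/ha.d; the three files that matter are /etc/ha.d/ha.cf, /etc/ha.d/haresources and /etc/ha.d/authkeys.
I set up an HA cluster for http and smb. The main settings in the three files are as follows: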
/etc/ha.d/ha.cf
# If any of debugfile, logfile and logfacility are defined then they
# will be used. If debugfile and/or logfile are not defined and
# logfacility is defined then the respective logging and debug
# messages will be logged to syslog. If logfacility is not defined
# then debugfile and logfile will be used to log messages. If
# logfacility is not defined and debugfile and/or logfile are not
# defined then defaults will be used for debugfile and logfile as
# required and messages will be sent there.
#
# File to write debug messages to
debugfile /var/log/ha-debug
#
#
# File to write other messages to
#
logfile /var/log/ha-log
#
#
# Facility to use for syslog()/logger
#
logfacility local0
#
#
# keepalive: how many seconds between heartbeats
#
keepalive 2
#
# deadtime: seconds-to-declare-host-dead
#
deadtime 10
#
#
# Very first dead time (initdead)
#
# On some machines/OSes, etc. the network takes a while to come up
# and start working right after you've been rebooted. As a result
# we have a separate dead time for when things first come up.
# It should be at least twice the normal dead time.
#
initdead 120
#
# hopfudge maximum hop count minus number of nodes in config
#hopfudge 1
#
# serial serialportname ...
serial /dev/ttyS0
#
#
# Baud rate for serial ports...
#
baud 19200
#
# What UDP port to use for communication?
#
udpport 694
#
# What interfaces to heartbeat over?
#
udp eth1
#
# Set up a multicast heartbeat medium
# mcast [dev] [mcast group] [port] [ttl] [loop]
#
# [dev] device to send/rcv heartbeats on
# [mcast group] multicast group to join (class D multicast address
# 224.0.0.0 - 239.255.255.255)
# [port] udp port to sendto/rcvfrom (no real reason to differ
# from the port used for broadcast heartbeats)
# [ttl] the ttl value for outbound heartbeats. this affects
# how far the multicast packet will propagate. (0-255)
# [loop] toggles loopback for outbound multicast heartbeats.
# if enabled, an outbound packet will be looped back and
# received by the interface it was sent on. (0 or 1)
#
#
mcast eth1 225.0.0.1 694 1 1
#
# Watchdog is the watchdog timer. If our own heart doesn't beat for
# a minute, then our machine will reboot.
#
watchdog /dev/watchdog
#
# "Legacy" STONITH support
# Using this directive assumes that there is one stonith
# device in the cluster. Parameters to this device are
# read from a configuration file. The format of this line is:
#
# stonith <stonith_type> <configfile>
#
# NOTE: it is up to you to maintain this file on each node in the
# cluster!
#
#stonith baytech /etc/ha.d/conf/stonith.baytech
#
# STONITH support
# You can configure multiple stonith devices using this directive.
# The format of the line is:
# stonith_host <hostfrom> <stonith_type> <params...>
# <hostfrom> is the machine the stonith device is attached
# to or * to mean it is accessible from any host.
# <stonith_type> is the type of stonith device (a list of
# supported drives is in /usr/lib/stonith.)
# <params...> are driver specific parameters. To see the
# format for a particular device, run:
# stonith -l -t <stonith_type>
#
#
# Note that if you put your stonith device access information in
# here, and you make this file publicly readable, you're asking
# for a denial of service attack ;-)
#
#
#stonith_host * baytech 10.0.0.3 mylogin mysecretpassword
#stonith_host ken3 rps10 /dev/ttyS1 kathy 0
#stonith_host kathy rps10 /dev/ttyS1 ken3 0
#
# Tell what machines are in the cluster
# node nodename ... -- must match uname -n
node cluster-101-server
node cluster-102-server
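
The comments in the listing above say that initdead should be at least twice deadtime, and a keepalive interval only makes sense if it is shorter than deadtime. The following is a minimal sketch of a check for those two relations. The function name check_ha_timing is hypothetical and not part of Heartbeat; it only reads the three directives from an ha.cf-style file and does not validate anything else.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Heartbeat): sanity-check the
# keepalive / deadtime / initdead directives of an ha.cf-style file.
# Prints "ok" and returns 0 when the values are consistent.
check_ha_timing() {
    awk '
        # Comment lines start with "#", so their first field never matches.
        $1 == "keepalive" { k = $2 }
        $1 == "deadtime"  { d = $2 }
        $1 == "initdead"  { i = $2 }
        END {
            if (k == "" || d == "" || i == "") {
                print "missing directive"; exit 1
            }
            if (k + 0 >= d + 0) {
                print "keepalive must be smaller than deadtime"; exit 1
            }
            if (i + 0 < 2 * d) {
                print "initdead should be at least twice deadtime"; exit 1
            }
            print "ok"
        }
    ' "$1"
}
```

It would be called as check_ha_timing /etc/ha.d/ha.cf on each node; with the values shown above (2, 10 and 120) both relations hold.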

/etc/ha.d/haresources
#
#just.linux-ha.org 135.9.216.110
#
#-------------------------------------------------------------------
#
# Assuming the administrative addresses are on the same subnet...
# A little more complex case: One service address, default subnet
# and netmask, and you want to start and stop http when you get
# the IP address...
#
#just.linux-ha.org 135.9.216.110 http
#-------------------------------------------------------------------
#
# A little more complex case: Three service addresses, default subnet
# and netmask, and you want to start and stop http when you get
# the IP address...
#
#just.linux-ha.org 135.9.216.110 135.9.215.111 135.9.216.112 httpd
#-------------------------------------------------------------------
#
# One service address, with funny subnet and bcast addr
# Stop and start httpd service with the subnet address
#
#just.linux-ha.org 135.9.216.3/4/135.9.216.12 httpd
#
#-------------------------------------------------------------------
#
# An example where a shared filesystem is to be used.
# Note that multiple arguments are passed to this script using
# the delimiter '::' to separate each argument.
#
#node1 10.0.0.170 Filesystem::/dev/sda1::/data1::ext2

cluster-101-server 192.9.100.100 httpd smb
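
The only active line in this file names the node that normally owns the resources, followed by the resources themselves: here the virtual IP address and the httpd and smb services. To make that structure visible, the sketch below prints each active line of a haresources-style file in that form. The function name describe_haresources is made up for this illustration, and the parsing is deliberately simple (whitespace-separated fields, comment lines skipped).

```shell
#!/bin/sh
# Hypothetical helper: show which node owns which resources in a
# haresources-style file. Comment and blank lines are ignored.
describe_haresources() {
    grep -v '^[[:space:]]*#' "$1" | while read -r node resources; do
        # Skip blank lines.
        [ -z "$node" ] && continue
        echo "preferred node: $node"
        # The remaining fields are the resources, in the order written.
        for r in $resources; do
            echo "  resource: $r"
        done
    done
}
```

Running describe_haresources /etc/ha.d/haresources on both machines and comparing the output is one quick way to confirm that the two copies agree.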

/etc/ha.d/authkeys
# Authentication file. Must be mode 600
#
#
# Must have exactly one auth directive at the front.
# auth send authentication using this method-id
#
# Then, list the method and key that go with that method-id
#
# Available methods: crc, sha1, md5. Crc doesn't need/want a key.
#
# You normally only have one authentication method-id listed in this file
#
# Put more than one to make a smooth transition when changing auth
# methods and/or keys.
#
#
# sha1 is believed to be the "best", md5 next best.
#
# crc adds no security, except from packet corruption.
# Use only on physically secure networks.
#
auth 1
1 crc
#2 sha1 HI!
#3 md5 Hello!
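
The listing above uses crc, which the file's own comments describe as adding no security, so it only suits a physically secure link such as the crossover cable. If the heartbeat ever travels over a shared network, a sha1 key may be the better choice. The following is a rough sketch of generating such a file. It assumes /dev/urandom, head -c and od are available; the function name make_authkeys is hypothetical, and the same generated file would have to be copied to both nodes.

```shell
#!/bin/sh
# Hypothetical helper: write an authkeys file that uses sha1 with a
# random 32-hex-digit key, readable by the owner only (mode 600).
make_authkeys() {
    out=$1
    # 16 random bytes rendered as hex; assumes /dev/urandom exists.
    key=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
    # Create the file with a restrictive umask, then force mode 600.
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$out" )
    chmod 600 "$out"
}
```

For example, make_authkeys /etc/ha.d/authkeys on the first node, then copy the file to the second node while keeping its mode at 600.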

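One very important point: make sure the configuration files are identical on both machines, including smb.conf and any other configuration files of the clustered services. Shared storage raises many further issues; I have not tested that and do not have the hardware for it.
Now testing can begin. First stop the services to be clustered on both machines, because heartbeat starts them itself when it comes up (during testing there was a lag of a few seconds).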
/etc/rc.d/init.d/httpd stop
/etc/rc.d/init.d/smb stop
/etc/rc.d/init.d/heartbeat start

OK, the configuration is complete and the services should be up. If they are not, check the messages in /var/log/messages.
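
Besides reading the log, one quick check is whether the virtual address 192.9.100.100 has actually been brought up on the node that currently holds the resources. The sketch below assumes the address shows up in the output of ifconfig (typically as an alias on the interface connected to the switch). The function name has_address is hypothetical; it simply searches its standard input for the given address as a whole token.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given IPv4 address appears as a
# complete address (not as a prefix of a longer one) on standard input.
has_address() {
    # Escape the dots so they match literally in the regular expression.
    pattern=$(printf '%s' "$1" | sed 's/\./\\./g')
    grep -Eq "(^|[^0-9.])${pattern}([^0-9.]|\$)"
}

# Usage on a cluster node:
#   ifconfig | has_address 192.9.100.100 && echo "virtual IP is up here"
```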
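Testing can now begin:
To tell the machines apart, change the web server's main page, /var/www/html/index.html, on each machine so that it identifies the host, for example with the content cluster-101-server on one and cluster-102-server on the other.
From another machine, open http://192.9.100.100 (the virtual address).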
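The page shows cluster-101-server. Find a way to make cluster-101-server go down; after roughly 3-5 seconds the page changes to cluster-102-server, so the service has failed over successfully. Once cluster-101-server is back up, the page switches back to cluster-101-server with almost no delay. This improves the availability of the system.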
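
Instead of reloading the page by hand, the switch between the two nodes can be watched from a third machine with a small polling loop. This is only a sketch: it assumes curl is installed on the client (any command that prints the page for a URL would do), and the function name watch_failover and the FETCH and INTERVAL variables are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL and print a timestamped line every
# time the returned page content changes (or stops answering).
#   $1 = URL to poll, $2 = number of polls
#   FETCH    = command used to fetch the page (default: curl)
#   INTERVAL = seconds between polls (default: 1)
watch_failover() {
    url=$1
    tries=$2
    last=
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Fetch the page and flatten it to a single line.
        page=$(${FETCH:-curl -s --max-time 2} "$url" 2>/dev/null | tr -d '\n')
        [ -z "$page" ] && page="(no answer)"
        if [ "$page" != "$last" ]; then
            echo "$(date '+%H:%M:%S') $page"
            last=$page
        fi
        i=$((i + 1))
        sleep "${INTERVAL:-1}"
    done
}
```

Started as watch_failover http://192.9.100.100 120 before cluster-101-server is taken down, it would log the moment the page changes, which gives a measured failover time rather than an estimate.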

One thing I forgot: you must set the permissions of the authkeys file, otherwise the service will not start.
chmod 600 /etc/ha.d/authkeys