EDB Postgres Failover Manager Guide
EDB Postgres Failover Manager Version 3.4
January 23, 2019

by EnterpriseDB Corporation
Copyright © 2013 - 2019 EnterpriseDB Corporation. All rights reserved.
EnterpriseDB Corporation, 34 Crosby Drive Suite 201, Bedford, MA 01730, USA
T +1 781 357 3390 F +1 978 467 1307 E info@enterprisedb.com www.enterprisedb.com

Table of Contents
1 Introduction
1.1 What’s New
1.2 Typographical Conventions Used in this Guide
2 Failover Manager - Overview
2.1 Supported Platforms
2.2 Prerequisites
2.3 Tutorial - Configuring a Simple Failover Manager Cluster
3 Installing and Configuring Failover Manager
3.1 Installing an RPM Package on a RedHat, CentOS, or OEL Host
3.1.1 Installation Locations
3.2 Installing an RPM Package on a Debian or Ubuntu Host
3.3 Installing an RPM Package on a SLES Host
3.4 Extending Failover Manager Permissions
3.4.1 Running Failover Manager without sudo
3.5 Configuring Failover Manager
3.5.1 The Cluster Properties File
3.5.1.1 Specifying Cluster Properties
3.5.1.2 Encrypting Your Database Password
3.5.2 The Cluster Members File
3.6 Using Failover Manager with Virtual IP Addresses
4 Using Failover Manager
4.1 Managing a Failover Manager Cluster
4.1.1 Starting the Failover Manager Cluster
4.1.2 Adding Nodes to a Cluster
4.1.3 Changing the Priority of a Standby
4.1.4 Promoting a Failover Manager Node
4.1.5 Stopping a Failover Manager Agent
4.1.6 Stopping a Failover Manager Cluster
4.1.7 Removing a Node from a Cluster
4.2 Monitoring a Failover Manager Cluster
4.2.1 Reviewing the Cluster Status Report
4.2.2 Monitoring Streaming Replication with Postgres Enterprise Manager
4.3 Running Multiple Agents on a Single Node
4.3.1 RHEL 6.x or CentOS 6.x
4.3.2 RHEL 7.x or CentOS 7.x
5 Controlling the Failover Manager Service
5.1 Using the service Utility on RHEL 6.x and CentOS 6.x
5.2 Using the systemctl Utility on RHEL 7.x and CentOS 7.x
5.3 Using the efm Utility
6 Controlling Logging
6.1 Enabling syslog Log File Entries
7 Notifications
8 Supported Failover and Failure Scenarios
8.1 Master Database is Down
8.2 Standby Database is Down
8.3 Master Agent Exits or Node Fails
8.4 Standby Agent Exits or Node Fails
8.5 Dedicated Witness Agent Exits / Node Fails
8.6 Nodes Become Isolated from the Cluster
9 Upgrading an Existing Cluster
9.1 Un-installing Failover Manager
9.2 Performing a Database Update (Minor Version)
10 Troubleshooting
11 Appendix A - Configuring Streaming Replication
11.1 Limited Support for Cascading Replication
12 Appendix B - Configuring SSL Authentication on a Failover Manager Cluster
13 Inquiries

1 Introduction
EDB Postgres Failover Manager (EFM) is a high-availability module from EnterpriseDB
that enables a Postgres Master node to automatically fail over to a Standby node in the
event of a software or hardware failure on the Master.
This guide provides information about installing, configuring and using Failover
Manager 3.4.
This document uses Postgres to mean either the PostgreSQL or EDB Postgres Advanced
Server database. For more information about using EDB Postgres products, please visit
the EnterpriseDB website at:
http://www.enterprisedb.com/documentation

1.1 What’s New
The following changes have been made to EDB Postgres Failover Manager to create
version 3.4:
• Failover Manager now allows you to use the master.shutdown.as.failure property
to indicate that any shutdown of the agent on the master node should be treated as
a failure. For more information, see Section 3.5.1. A notification has been
added that will alert you when the master.shutdown.as.failure property is
set to true.
• Agent exit notifications are now a WARNING level; this can help draw attention to
cases where an agent has failed to restart (for instance, after a machine reboot).
For more information, see Section 7.
• Failover Manager will retry verifying that a VIP is not in use during a promotion.
For more information, see Section 3.6.

1.2 Typographical Conventions Used in this Guide
Certain typographical conventions are used in this manual to clarify the meaning and
usage of various commands, statements, programs, examples, etc. This section provides a
summary of these conventions.
In the following descriptions a term refers to any word or group of words that are
language keywords, user-supplied values, literals, etc. A term’s exact meaning depends
upon the context in which it is used.
• Italic font introduces a new term, typically, in the sentence that defines it for the
first time.
• Fixed-width (mono-spaced) font is used for terms that must be given
literally such as SQL commands, specific table and column names used in the
examples, programming language keywords, etc. For example, SELECT * FROM
emp;
• Italic fixed-width font is used for terms for which the user must
substitute values in actual usage. For example, DELETE FROM table_name;
• A vertical pipe | denotes a choice between the terms on either side of the pipe. A
vertical pipe is used to separate two or more alternative terms within square
brackets (optional choices) or braces (one mandatory choice).
• Square brackets [] denote that one or none of the enclosed term(s) may be
substituted. For example, [ a | b ], means choose one of “a” or “b” or neither
of the two.
• Braces {} denote that exactly one of the enclosed alternatives must be specified.
For example, { a | b }, means exactly one of “a” or “b” must be specified.
• Ellipses ... denote that the preceding term may be repeated. For example, [ a |
b ] ... means that you may have the sequence, “b a a b a”.

2 Failover Manager - Overview
An EDB Postgres Failover Manager (EFM) cluster is comprised of Failover Manager
processes that reside on the following hosts on a network:
• A Master node - The Master node is the primary database server that is servicing
database clients.
• One or more Standby nodes - A Standby node is a streaming replication server
associated with the Master node.
• A Witness node - The Witness node confirms assertions of either the Master or a
Standby in a failover scenario. A cluster does not need a dedicated witness node
if the cluster contains three or more nodes; if you do not have a third cluster
member that is a database host, you can add a dedicated Witness node.
Traditionally, a cluster is a single instance of Postgres managing multiple databases. In
this document, the term cluster refers to a Failover Manager cluster. A Failover Manager
cluster consists of a Master agent, one or more Standby agents, and an optional Witness
agent that reside on servers in a cloud or on a traditional network and communicate using
the JGroups toolkit.
Figure 2.1 - A FM scenario employing a Virtual IP address.

When a non-witness agent starts, it connects to the local database and checks the state of
the database:
• If the agent cannot reach the database, it will start in idle mode.
• If it finds that the database is in recovery, the agent assumes the role of standby.
• If the database is not in recovery, the agent assumes the role of master.
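The start-up decision described above can be sketched in shell. This is an illustration only and not how the agent is implemented: the function name agent_role is hypothetical, and the sketch assumes that the caller passes in the result of the PostgreSQL query SELECT pg_is_in_recovery() (t or f), or an empty string if the connection attempt failed.

```shell
# Hypothetical helper: map the result of SELECT pg_is_in_recovery()
# to the role an agent would assume at startup.
#   $1 = "t" (in recovery), "f" (not in recovery), or anything else
#        (the database could not be reached).
agent_role() {
    case "$1" in
        t) echo "standby" ;;
        f) echo "master" ;;
        *) echo "idle" ;;
    esac
}

# Example (assumes psql is on the PATH and can connect to the local database):
#   state=$(psql -At -c 'SELECT pg_is_in_recovery()' 2>/dev/null)
#   agent_role "$state"
```

Keeping the query separate from the decision makes it possible to exercise the three cases without a running database.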
In the event of a failover, Failover Manager attempts to ensure that the promoted standby
is the most up-to-date standby in the cluster; please note that data loss is possible if the
standby node is not in sync with the master node.
JGroups provides technology that allows Failover Manager to create clusters whose
member nodes can communicate with each other and detect node failures. For more
information about JGroups, visit the official project site at:
http://www.jgroups.org
Figure 2.1 illustrates a Failover Manager cluster that employs a virtual IP address. You
can use a load balancer in place of a virtual IP address if you provide your own fencing
script to re-configure the load balancer in the event of a failure. For more information
about using Failover Manager with a virtual IP address, see Section 3.6. For more
information about using a fencing script, see Section 3.5.1.

2.1 Supported Platforms
Failover Manager 3.4 is supported on EDB Postgres Advanced Server or PostgreSQL
(version 9.3 and higher) installations running on:
• CentOS 6.x and 7.x
• Red Hat Enterprise Linux 6.x and 7.x
• Oracle Enterprise Linux 6.x and 7.x
• Red Hat Enterprise Linux (IBM Power8 Little Endian or ppc64le) 7.x
• Debian 9
• SLES 12
• Ubuntu 18.04

2.2 Prerequisites
Before configuring a Failover Manager cluster, you must satisfy the prerequisites
described below.
Install Java 1.8 (or later)
Before using Failover Manager, you must first install Java (version 1.8 or later). Failover
Manager is tested with OpenJDK, and we strongly recommend installing that version of
Java. Installation instructions for Java are platform specific; for more information, visit:
https://openjdk.java.net/install/
Provide an SMTP Server
You can receive notifications from Failover Manager as specified by a user-defined
notification script, by email, or both.
• If you are using email notifications, an SMTP server must be running on each
node of the Failover Manager scenario.
• If you provide a value in the script.notification property, you can leave the
user.email field blank; an SMTP server is not required.
If an event occurs, Failover Manager invokes the script (if provided), and sends a
notification email to any email addresses specified in the user.email parameter of the
cluster properties file. For more information about using an SMTP server, visit:
https://access.redhat.com/site/documentation
For more information, see Section 3.5.1.1.
Configure Streaming Replication
Failover Manager requires that PostgreSQL streaming replication be configured between
the Master node and the Standby node or nodes. Failover Manager does not support
other types of replication.
Unless specified with the -sourcenode option, a recovery.conf file is copied from a
random standby node to the stopped master during switchover. You should ensure that
the paths within the recovery.conf files on your standby nodes are consistent before
performing a switchover. For more information about the -sourcenode option, please
see Section 4.1.4.
Please note that Failover Manager does not support automatic reconfiguration of the
standby databases after a failover if you use replication slots to manage your WAL
segments. If you use replication slots, you should set the auto.reconfigure parameter
to false, and manually reconfigure the standby servers in the event of a failover.
Modify the pg_hba.conf File
You must modify the pg_hba.conf file on the Master and Standby nodes, adding
entries that allow communication between all of the nodes in the cluster. The
following example demonstrates entries that might be made to the pg_hba.conf file on
the Master node:
# access for itself
host fmdb efm 127.0.0.1/32 md5
# access for standby
host fmdb efm 192.168.27.1/32 md5
# access for witness
host fmdb efm 192.168.27.34/32 md5
Where:
efm specifies the name of a valid database user.
fmdb specifies the name of a database to which the efm user may connect.
For more information about the properties file, see Section 3.5.1.
By default, the pg_hba.conf file resides in the data directory, under your Postgres
installation. After modifying the pg_hba.conf file, you must reload the configuration
file on each node for the changes to take effect. You can use the following command:
# systemctl reload edb-as-x
Where x specifies the Postgres version.
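When a cluster has several nodes, the pg_hba.conf entries can be generated rather than typed by hand. The following is a minimal sketch, not part of Failover Manager: the function name hba_entries is hypothetical, and it assumes IPv4 addresses with a /32 mask and md5 authentication, as in the example above.

```shell
# Hypothetical helper: print one pg_hba.conf line per node address.
#   $1 = database name, $2 = database user, remaining args = IPv4 addresses
hba_entries() {
    db=$1; user=$2; shift 2
    for addr in "$@"; do
        printf 'host %s %s %s/32 md5\n' "$db" "$user" "$addr"
    done
}

# Addresses from the example above: the node itself, a standby, and a witness.
hba_entries fmdb efm 127.0.0.1 192.168.27.1 192.168.27.34
```

Review the output before appending it to pg_hba.conf, and reload the server configuration afterwards as described above.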
Using Autostart for the Database Servers
If a Master node reboots, Failover Manager may detect the database is down on the
Master node and promote a Standby node to the role of Master. If this happens, the
Failover Manager agent on the (rebooted) Master node will not get a chance to write the
recovery.conf file; the rebooted Master node will return to the cluster as a second
Master node.
To prevent this, start the Failover Manager agent before starting the database server. The
agent will start in idle mode, and check to see if there is already a master in the cluster.
If there is a master node, the agent will verify that a recovery.conf file exists, and the
database will not start as a second master.

Ensure Communication Through Firewalls
If a Linux firewall (i.e. iptables) is enabled on the host of a Failover Manager node,
you may need to add rules to the firewall configuration that allow tcp communication
between the Failover Manager processes in the cluster. For example:
# iptables -I INPUT -p tcp --dport 7800:7810 -j ACCEPT
/sbin/service iptables save
The command shown above opens a small range of ports (7800 through 7810). Failover
Manager will connect via the port that corresponds to the port specified in the cluster
properties file.
Ensure that the db.user has Sufficient Privileges
The database user specified in the efm.properties file must have sufficient privileges
to invoke the following functions on behalf of Failover Manager:
pg_current_wal_lsn()
pg_last_wal_replay_lsn()
pg_wal_replay_pause()
pg_is_wal_replay_paused()
pg_wal_replay_resume()
For detailed information about each of these functions, please see the PostgreSQL core
documentation, available at:
https://www.postgresql.org/docs/10/static/index.html
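One way to confirm these privileges before starting an agent is to ask the server directly with the standard PostgreSQL function has_function_privilege(). The sketch below only prints the SQL statements; the function name privilege_checks is hypothetical, and the user and database names (efm, fmdb) are the ones used in the earlier pg_hba.conf example.

```shell
# Hypothetical helper: print one SQL statement per function that reports
# whether the given role may execute it.
#   $1 = database user (the db.user value)
privilege_checks() {
    for fn in pg_current_wal_lsn pg_last_wal_replay_lsn pg_wal_replay_pause \
              pg_is_wal_replay_paused pg_wal_replay_resume; do
        printf "SELECT '%s' AS func, has_function_privilege('%s', '%s()', 'EXECUTE') AS can_execute;\n" \
            "$fn" "$1" "$fn"
    done
}

# Print the statements for user efm; they could be piped to psql, for example:
#   privilege_checks efm | psql -d fmdb
privilege_checks efm
```

A can_execute value of f for any function indicates that the user needs additional privileges.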

2.3 Tutorial - Configuring a Simple Failover Manager Cluster
This tutorial describes quickly configuring a Failover Manager cluster in a test
environment. Other sections in this guide provide key information that you should read
and understand before configuring Failover Manager for a production deployment.
This tutorial assumes that:
• A database server is running and streaming replication is set up between a master
and one or two standby nodes.
• You have installed Failover Manager on each node. For more information about
installing Failover Manager, see Section 3.
The example that follows creates a cluster named efm.
You should start the configuration process on a master or standby node. Then, copy the
configuration files to other nodes to save time.
Step 1: Create Working Configuration Files
Copy the provided sample files to create EFM configuration files, and correct the
ownership:
cd /etc/edb/efm-3.4
cp efm.properties.in efm.properties
cp efm.nodes.in efm.nodes
chown efm:efm efm.properties
chown efm:efm efm.nodes
Step 2: Create an Encrypted Password
Create the encrypted password (needed for the properties file):
/usr/edb/efm-3.4/bin/efm encrypt efm
Follow the onscreen instructions to produce the encrypted version of your database
password.
Step 3: Update the efm.properties File
The cluster_name.properties file contains parameters that specify connection
properties and behaviors for your Failover Manager cluster. Modifications to property
settings are applied when Failover Manager starts.

The following properties are the minimal properties required to configure a Failover
Manager cluster. If you are configuring a production system, please see Section 3.5.1 for a
complete list of properties.
Database connection properties (needed even on the witness so it can connect to other
databases when needed):
db.user
db.password.encrypted
db.port
db.database
Owner of the data directory (usually postgres or enterprisedb):
db.service.owner
Only one of the following properties is needed. If you provide the service name, EFM
will use a service command to control the database server when necessary; if you
provide the location of the Postgres bin directory, EFM will use pg_ctl to control the
database server.
db.service.name
db.bin
The data directory in which EFM will find or create recovery.conf files:
db.recovery.conf.dir
Set to receive email notifications (the notification text is also included in the agent log):
user.email
This is the local address of the node and the port to use for EFM. Other nodes will use
this address to reach the agent, and the agent will also use this address for connecting to
the local database (as opposed to connecting to localhost). An example of the format is
included below:
bind.address=1.2.3.4:7800
Set this property to true on a witness node and false if it is a master or standby:
is.witness
If you are running on a network without access to the Internet, change this to an address
that is available on your network:
pingServerIp=8.8.8.8
When configuring a production cluster, the following properties can be either true or
false depending on your system configuration and usage. Set them both to true to
simplify startup if you're configuring an EFM test cluster.
auto.allow.hosts=true
stable.nodes.file=true
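Before starting an agent, it can be useful to confirm that the minimal properties listed in this step have values. The following is a rough sketch, not an EFM tool: the function names has_value and missing_properties are hypothetical, and the check assumes key=value lines with no spaces around the equals sign and a node that hosts a database (user.email is left out because it can be blank when a notification script is used).

```shell
# Hypothetical helper: succeed if key $2 has a non-empty value in file $1.
has_value() {
    awk -F= -v k="$2" '$1 == k && $2 != "" { found = 1 } END { exit !found }' "$1"
}

# Hypothetical check: print each minimal property that is missing or empty.
#   $1 = path to a cluster properties file
missing_properties() {
    for key in db.user db.password.encrypted db.port db.database \
               db.service.owner db.recovery.conf.dir bind.address is.witness; do
        has_value "$1" "$key" || echo "$key"
    done
    # Only one of these two properties is needed.
    has_value "$1" db.service.name || has_value "$1" db.bin \
        || echo "db.service.name or db.bin"
}

# Example:
#   missing_properties /etc/edb/efm-3.4/efm.properties
```

No output means that every property in the list has a value; it does not mean that the values are correct.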
Step 4: Update the efm.nodes File
The cluster_name.nodes file is read at startup to tell an agent how to find the rest of
the cluster or, in the case of the first node started, can be used to simplify authorization of
subsequent nodes.
Add the addresses and ports of each node in the cluster to this file. One node will act as
the membership coordinator; the list should include at least the membership coordinator's
address:
1.2.3.4:7800
1.2.3.5:7800
1.2.3.6:7800
Please note that the Failover Manager agent will not verify the content of the efm.nodes
file; the agent expects that some of the addresses in the file cannot be reached (e.g. that
another agent hasn’t been started yet). For more information about the efm.nodes file,
see Section 3.5.2.
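Because the agent does not verify the file, a quick format check can catch typing mistakes before startup. The sketch below is an assumption-laden illustration: the function name invalid_node_lines is hypothetical, and it assumes one IPv4 address:port entry per line, as in the example above, and that blank lines and lines beginning with # can be ignored. The pattern is deliberately loose and does not range-check the numbers.

```shell
# Hypothetical check: print lines that do not look like address:port.
#   $1 = path to a cluster members (nodes) file
invalid_node_lines() {
    awk '
        /^[ \t]*(#.*)?$/ { next }                 # skip blank lines and comments
        !/^([0-9]+\.)+[0-9]+:[0-9]+$/ { print }   # report anything else that is malformed
    ' "$1"
}

# Example:
#   invalid_node_lines /etc/edb/efm-3.4/efm.nodes
```

An empty result means every entry has the expected shape; it says nothing about whether the addresses are reachable.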
Step 5: Configure the Other Nodes
Copy the efm.properties and efm.nodes files to the /etc/edb/efm-3.4 directory
on the other nodes in your sample cluster. After copying the files, change the file
ownership so the files are owned by efm:efm. The efm.properties file can be the
same on every node, except for the following properties:
• Modify the bind.address property to use the node’s local address.
• Set is.witness to true if the node is a witness node. If the node is a witness
node, the properties relating to a local database installation will be ignored.
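The two per-node changes can be scripted when preparing files for several nodes. This is a minimal sketch under stated assumptions: the function name node_properties is hypothetical, and it assumes that both properties already appear in the source file as key=value lines.

```shell
# Hypothetical helper: print a copy of a properties file adjusted for one node.
#   $1 = source properties file
#   $2 = the node's address:port for bind.address
#   $3 = true or false for is.witness
node_properties() {
    sed -e "s|^bind\.address=.*|bind.address=$2|" \
        -e "s|^is\.witness=.*|is.witness=$3|" "$1"
}

# Example: write the file for a witness at 1.2.3.6 (an address from the tutorial):
#   node_properties efm.properties 1.2.3.6:7800 true > efm.properties.witness
```

The helper writes to standard output so that the original file is left untouched; remember to set the ownership of the copied file to efm:efm.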
Step 6: Start the EFM Cluster
On any node, start the Failover Manager agent. The agent is named efm-3.4; you can
use your platform-specific service command to control the service. For example, on a
CentOS or RHEL 7.x host use the command:
systemctl start efm-3.4
On a CentOS or RHEL 6.x host use the command:
service efm-3.4 start
After the agent starts, run the following command to see the status of the single-node
cluster. You should see the addresses of the other nodes in the Allowed node host
list.
/usr/edb/efm-3.4/bin/efm cluster-status efm
Start the agent on the other nodes. Run the efm cluster-status efm command on any node
to see the cluster status.
If any agent fails to start, see the startup log for information about what went wrong:
cat /var/log/efm-3.4/startup-efm.log
Performing a Switchover
If the cluster status output shows that the master and standby(s) are in sync, you can
perform a switchover with the following command:
/usr/edb/efm-3.4/bin/efm promote efm -switchover
That command will promote a standby and reconfigure the master database as a new
standby in the cluster. To switch back, run the command again.
For more information about using the efm command line tool, see Section 5.3.

3 Installing and Configuring Failover Manager
Before installing and configuring Failover Manager, you must create a Postgres
streaming replication scenario, and ensure that the nodes have sufficient permissions to
communicate with each other. You must also have credentials that allow access to the
EnterpriseDB repository.
To request credentials for the repository, visit the EnterpriseDB Advanced Downloads
page at:
https://www.enterprisedb.com/advanced-downloads
Follow the links in the EDB Failover Manager table to request credentials.
3.1 Installing an RPM Package on a RedHat, CentOS, or OEL Host
After receiving your credentials, you must create the EnterpriseDB repository
configuration file on each node of the cluster, and then modify the file to enable access.
The following steps provide detailed information about accessing the EnterpriseDB
repository; the steps must be performed on each node of the cluster:
1. Use the edb-repo package to create the repository configuration file. You can
download and invoke the edb-repo file, or use rpm or yum to create the
repository. Assume superuser privileges and use either rpm or yum to create the
EnterpriseDB repository configuration file:
rpm -Uvh http://yum.enterprisedb.com/edbrepos/edb-repo-latest.noarch.rpm
or
yum install -y http://yum.enterprisedb.com/edbrepos/edb-repo-latest.noarch.rpm
The repository configuration file is named edb.repo; it resides in
/etc/yum.repos.d.
2. Use your choice of editor to modify the repository configuration file, enabling the
[enterprisedb-tools] and the [enterprisedb-dependencies] entries.
To enable a repository, change the value of the enabled parameter to 1 and replace
the username and password placeholders in the baseurl specification with your
username and the repository password.
[enterprisedb-tools]
name=EnterpriseDB Tools $releasever - $basearch
baseurl=http://<username>:<password>@yum.enterprisedb.com/tools/redhat/rhel-$releasever-$basearch
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/ENTERPRISEDB-GPG-KEY
[enterprisedb-dependencies]
name=EnterpriseDB Dependencies $releasever - $basearch
baseurl=http://<username>:<password>@yum.enterprisedb.com/dependencies/redhat/rhel-$releasever-$basearch
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/ENTERPRISEDB-GPG-KEY
3. After modifying applicable entries in the repository configuration file, save the
configuration file and exit the editor.
Then, you can use the yum install command to install Failover Manager. For example, to
install Failover Manager version 3.4, use the command:
yum install edb-efm34
When you install an RPM package that is signed by a source that is not recognized by
your system, yum may ask for your permission to import the key to your local server. If
prompted, and you are satisfied that the packages come from a trustworthy source, enter a
y, and press Return to continue.
During the installation, yum may encounter a dependency that it cannot resolve. If it
does, it will provide a list of the required dependencies that you must manually resolve.
Failover Manager must be installed by root. During the installation process, the
installer will also create a user named efm that has sufficient privileges to invoke scripts
that control the Failover Manager service for clusters owned by enterprisedb or
postgres.
If you are using Failover Manager to monitor a cluster owned by a user other than
enterprisedb or postgres, see Section 3.4, Extending Failover Manager
Permissions.
After installing Failover Manager on each node of the cluster, you must:
1. Modify the cluster properties file on each node. For detailed information about
modifying the cluster properties file, see Section 3.5.1.
2. Modify the cluster members file on each node. For more information about the
cluster members file, see Section 3.5.2.
3. If applicable, configure and test virtual IP address settings and any scripts that are
identified in the cluster properties file.
4. Start the Failover Manager agent on each node of the cluster. For more
information about controlling the Failover Manager service, see Section 5.
3.1.1 Installation Locations
Failover Manager components are installed in the following locations:
Component                            Location
Executables                          /usr/edb/efm-3.4/bin
Libraries                            /usr/edb/efm-3.4/lib
Cluster configuration files          /etc/edb/efm-3.4
Logs                                 /var/log/efm-3.4
Lock files                           /var/lock/efm-3.4
Log rotation file                    /etc/logrotate.d/efm-3.4
sudo configuration file              /etc/sudoers.d/efm-34
Binary to access VIP without sudo    /usr/edb/efm-3.4/bin/secure

3.2 Installing an RPM Package on a Debian or Ubuntu Host
To install Failover Manager, you must also have credentials that allow access to the
EnterpriseDB repository. To request credentials for the repository, visit the
EnterpriseDB Advanced Downloads page at:
https://www.enterprisedb.com/advanced-downloads
Follow the links in the EDB Failover Manager table to request credentials.
The following steps will walk you through using the EnterpriseDB apt repository to
install Failover Manager. When using the commands, replace the username and
password with the credentials provided by EnterpriseDB.
1. Assume superuser privileges:
sudo su -
2. Configure the EnterpriseDB apt repository:
sh -c 'echo "deb https://username:password@apt.enterprisedb.com/$(lsb_release -cs)-edb/ $(lsb_release -cs) main" > /etc/apt/sources.list.d/edb-$(lsb_release -cs).list'
3. Add support to your system for secure APT repositories:
apt-get install apt-transport-https
4. Add the EDB signing key:
wget -q -O - https://username:password@apt.enterprisedb.com/edb-deb.gpg.key | apt-key add -
5. Update the repository meta data:
apt-get update
6. Install Failover Manager:
apt-get install edb-efm34

3.3 Installing an RPM Package on a SLES Host
To install Failover Manager, you must also have credentials that allow access to the
EnterpriseDB repository. To request credentials for the repository, visit the Advanced
Downloads page at:
https://www.enterprisedb.com/advanced-downloads
You can use the zypper package manager to install a Failover Manager agent on an
SLES 12 host. zypper will attempt to satisfy package dependencies as it installs a
package, but requires access to specific repositories that are not hosted at EnterpriseDB.
You must assume superuser privileges and stop any firewalls before installing Failover
Manager. Then, use the following commands to add EnterpriseDB repositories to your
system:
zypper addrepo http://zypp.enterprisedb.com/suse/epas96-sles.repo
zypper addrepo http://zypp.enterprisedb.com/suse/epas-sles-tools.repo
zypper addrepo http://zypp.enterprisedb.com/suse/epas-sles-dependencies.repo
The commands create the repository configuration files in the /etc/zypp/repos.d
directory. Then, use the following command to refresh the metadata on your SLES host
to include the EnterpriseDB repository:
zypper refresh
When prompted, provide credentials for the repository, and specify a to always trust the
provided key, and update the metadata to include the EnterpriseDB repository.
You must also add SUSEConnect and the SUSE Package Hub extension to the SLES
host, and register the host with SUSE, allowing access to SUSE repositories. Use the
commands:
zypper install SUSEConnect
SUSEConnect -r registration_number -e user_id
SUSEConnect -p PackageHub/12/x86_64
SUSEConnect -p sle-sdk/12/x86_64
Then, you can use the zypper utility to install a Failover Manager agent:
zypper install edb-efm34
For detailed information about registering a SUSE host, visit:
https://www.suse.com/support/kb/doc/?id=7016626

3.4 Extending Failover Manager Permissions
During the Failover Manager installation, the installer creates a user named efm. efm
does not have sufficient privileges to perform management functions that are normally
limited to the database owner or operating system superuser.
• When performing management functions requiring database superuser privileges,
efm invokes the efm_db_functions script.
• When performing management functions requiring operating system superuser
privileges, efm invokes the efm_root_functions script.
• When assigning or releasing a virtual IP address, efm invokes the efm_address
script.
The efm_db_functions or efm_root_functions scripts perform management
functions on behalf of the efm user.
The sudoers file contains entries that allow the user efm to control the Failover Manager
service for clusters owned by postgres or enterprisedb. You can modify a copy of
the sudoers file to grant permission to manage Postgres clusters owned by other users to
efm.
The efm-34 file is located in /etc/sudoers.d, and contains the following entries:
# Copyright EnterpriseDB Corporation, 2014-2019. All Rights
# Reserved.
#
# Do not edit this file. Changes to the file may be overwritten
# during an upgrade.
#
# This file assumes you are running your efm cluster as user
# 'efm'. If not, then you will need to copy this file.
# Allow user 'efm' to sudo efm_db_functions as either 'postgres'
# or 'enterprisedb'. If you run your db service under a
# non-default account, you will need to copy this file to grant
# the proper permissions and specify the account in your efm
# cluster properties file by changing the 'db.service.owner'
# property.
efm ALL=(postgres) NOPASSWD: /usr/edb/efm-3.4 /bin/efm_db_functions
efm ALL=(enterprisedb) NOPASSWD: /usr/edb/efm-3.4
/bin/efm_db_functions
# Allow user 'efm' to sudo efm_root_functions as 'root' to
# write/delete the PID file, validate the db.service.owner
# property, etc.
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-3.4/bin/efm_root_functions
# Allow user 'efm' to sudo efm_address as root for VIP tasks.
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-3.4/bin/efm_address
# relax tty requirement for user 'efm'
Defaults:efm !requiretty
If you are using Failover Manager to monitor clusters that are owned by users other than postgres or enterprisedb, make a copy of the efm-34 file, and modify the content to allow the user to access the efm_functions script to manage their clusters.
If an agent cannot start because of permission problems, make sure the default /etc/sudoers file contains the following line at the end of the file:
## Read drop-in files from /etc/sudoers.d (the # here does not
# mean a comment)
#includedir /etc/sudoers.d
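For a cluster owned by a user other than postgres or enterprisedb, the entries for a copied sudoers file could be generated along the following lines. This is only a sketch modeled on the efm-34 entries listed above; the owner name pgowner and the helper sudoers_entries are hypothetical, and the output should be reviewed before it is placed in /etc/sudoers.d.

```python
# Sketch: build sudoers entries for a custom database owner, modeled on
# the entries in the efm-34 file. "pgowner" is a hypothetical OS user.
EFM_BIN = "/usr/edb/efm-3.4/bin"  # script location used by the efm-34 file


def sudoers_entries(db_owner: str) -> list[str]:
    """Return sudoers lines that let 'efm' act for the given database owner."""
    if not db_owner or any(ch.isspace() for ch in db_owner):
        raise ValueError("db_owner must be a single OS user name")
    return [
        f"efm ALL=({db_owner}) NOPASSWD: {EFM_BIN}/efm_db_functions",
        f"efm ALL=(ALL) NOPASSWD: {EFM_BIN}/efm_root_functions",
        f"efm ALL=(ALL) NOPASSWD: {EFM_BIN}/efm_address",
        "Defaults:efm !requiretty",
    ]


print("\n".join(sudoers_entries("pgowner")))
```

The same owner name would then be supplied in the db.service.owner property of the cluster properties file.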

3.4.1 Running Failover Manager without sudo
By default, Failover Manager uses sudo to securely manage access to system functionality. If you choose to configure Failover Manager to run without sudo access, please note that root access is still required to:
• install the Failover Manager RPM.
• perform Failover Manager setup tasks.
To run Failover Manager without sudo, you must select a database process owner that will have privileges to perform management functions on behalf of Failover Manager. The user could be the default database superuser (for example, enterprisedb or postgres) or another privileged user. After selecting the user:
1. Use the following command to add the user to the efm group:
usermod -a -G efm enterprisedb
This should allow the user to write to /var/run/efm-3.4 and /var/lock/efm-3.4.
2. If you are reusing a cluster name, remove any previously created log files; the new user will not be able to write to log files created by the default (or other) owner.
3. Copy the cluster properties template file and the nodes template file:
su - enterprisedb
cp /etc/edb/efm-3.4/efm.properties.in directory/cluster_name.properties
cp /etc/edb/efm-3.4/efm.nodes.in directory/cluster_name.nodes
Then, modify the cluster properties file, providing the name of the user in the db.service.owner property. You must also ensure that the db.service.name property is blank; without sudo, you cannot run services without root access.
After modifying the configuration, the new user can control Failover Manager with the following command:
/usr/edb/efm-3.4/bin/runefm.sh start|stop directory/cluster_name.properties
Where directory/cluster_name.properties specifies the full path and name of the cluster properties file. Please note that the full path to the properties file must be provided whenever the non-default user is controlling agents or using the efm script.

To allow the new user to manage Failover Manager as a service, you must provide a custom script or unit file.
Failover Manager uses a binary named manage-vip that resides in /usr/edb/efm-3.4/bin/secure/ to perform VIP management operations without sudo privileges. This script uses setuid to acquire the privileges needed to manage Virtual IP addresses.
• This directory is only accessible to root and users in the efm group.
• The binary is only executable by root and the efm group.
For security reasons, we recommend against modifying the access privileges of the /usr/edb/efm-3.4/bin/secure/ directory or the manage-vip script.
For more information about using Failover Manager without sudo, visit:
https://www.enterprisedb.com/blog/running-edb-postgres-failover-manager-without-sudo

3.5 Configuring Failover Manager
Configurable Failover Manager properties are specified in two user-modifiable files:
• efm.properties
• efm.nodes
The efm.properties file contains the properties of the individual node on which it resides, while the efm.nodes file contains a list of the current Failover Manager cluster members. By default, the installer places the files in the /etc/edb/efm-3.4 directory.
Please note that all user scripts referenced in the properties file will be invoked as the Failover Manager user.
3.5.1 The Cluster Properties File
The Failover Manager installer creates a file template for the cluster properties file named efm.properties.in in the /etc/edb/efm-3.4 directory. After completing the Failover Manager installation, you must make a working copy of the template before modifying the file contents. For example, the following command copies the efm.properties.in file, creating a properties file named efm.properties:
# cp /etc/edb/efm-3.4/efm.properties.in /etc/edb/efm-3.4/efm.properties
After copying the template file, change the owner of the file to efm:
# chown efm:efm efm.properties
Please note: By default, Failover Manager expects the cluster properties file to be named efm.properties. If you name the properties file something other than efm.properties, you must modify the service script or unit file to instruct Failover Manager to use a different name.
After creating the cluster properties file, add (or modify) configuration parameter values as required. For detailed information about each property, see Section 3.5.1.1.
The property files are owned by root. The Failover Manager service script expects to find the files in the /etc/edb/efm-3.4 directory. If you move the property file to another location, you must create a symbolic link that specifies the new location.

3.5.1.1 Specifying Cluster Properties
You can use the properties listed in the cluster properties file to specify connection properties and behaviors for your Failover Manager cluster. Modifications to property settings will be applied when Failover Manager starts. If you modify a property value you must restart Failover Manager to apply the changes.
Property values are case-sensitive. Note that while Postgres uses quoted strings in parameter values, Failover Manager does not allow quoted strings in property values. For example, while you might specify an IP address in a Postgres configuration parameter as:
listen_addresses='192.168.2.47'
Failover Manager requires that the value not be enclosed in quotes:
bind.address=192.168.2.54:7800
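Because quoted values are easy to carry over from Postgres configuration habits, a small check such as the following can flag them before an agent is started. This is a minimal sketch that assumes a simple name=value file layout with # comments; the helper name find_quoted_values is hypothetical and is not part of Failover Manager.

```python
# Sketch: report properties whose values are wrapped in quotes, which
# Failover Manager does not allow in property values.
def find_quoted_values(properties_text: str) -> list[str]:
    """Return the names of properties whose value starts and ends with a quote."""
    quoted = []
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not name=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            quoted.append(name.strip())
    return quoted


sample = "bind.address='192.168.2.54:7800'\nadmin.port=7809\n"
print(find_quoted_values(sample))  # ['bind.address']
```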
Use the properties in the efm.properties file to specify connection, administrative, and operational details for Failover Manager.
Use the following properties to specify connection details for the Failover Manager cluster:
# The value for the password property should be the output from
# 'efm encrypt' -- do not include a cleartext password here. To
# prevent accidental sharing of passwords among clusters, the
# cluster name is incorporated into the encrypted password. If
# you change the cluster name (the name of this file), you must
# encrypt the password again with the new name.
# The db.port property must be the same for all nodes.
db.user=
db.password.encrypted=
db.port=
db.database=
The db.user specified must have sufficient privileges to invoke selected PostgreSQL commands on behalf of Failover Manager. For more information, please see Section 2.2.
For information about encrypting the password for the database user, see Section 3.5.1.2.
Use the db.service.owner property to specify the name of the operating system user that owns the cluster that is being managed by Failover Manager. This property is not required on a dedicated witness node.
# This property tells EFM which OS user owns the $PGDATA dir for
# the 'db.database'. By default, the owner is either 'postgres'
# for PostgreSQL or 'enterprisedb' for EDB Postgres Advanced
# Server. However, if you have configured your db to run as a
# different user, you will need to copy the /etc/sudoers.d/efm-XX
# conf file to grant the necessary permissions to your db owner.
#
# This username must have write permission to the
# 'db.recovery.conf.dir' specified below.
db.service.owner=
Specify the name of the database service in the db.service.name property if you use the service or systemctl command when starting or stopping the service.
# Specify the proper service name in order to use service
# commands rather than pg_ctl to start/stop/restart a database.
# For example, if this property is set, then 'service <name>
# restart' or 'systemctl restart <name>' (depending on OS
# version) will be used to restart the database rather than
# pg_ctl. This property is required unless db.bin is set.
db.service.name=
You should use the same service control mechanism (pg_ctl, service, or systemctl) each time you start or stop the database service. If you use the pg_ctl program to control the service, specify the location of the pg_ctl program in the db.bin property.
# Specify the directory containing the pg_ctl command, for
# example: /usr/pgsql-9.6/bin. Unless the db.service.name
# property is used, the pg_ctl command is used to
# start/stop/restart databases as needed after a failover or
# switchover. This property is required unless db.service.name
# is set.
db.bin=
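The relationship between db.service.name and db.bin described in the comments above can be checked with a short script. The following is a sketch under the assumption that the properties file uses plain name=value lines; parse_properties and db_control_mechanism are hypothetical helper names, not Failover Manager functions.

```python
# Sketch: decide which control mechanism a properties file implies.
# Per the file comments, db.service.name selects service commands and
# db.bin (the pg_ctl directory) is used when no service name is set.
def parse_properties(text: str) -> dict[str, str]:
    """Parse name=value lines, ignoring blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            name, value = line.split("=", 1)
            props[name.strip()] = value.strip()
    return props


def db_control_mechanism(props: dict[str, str]) -> str:
    """Return 'service' or 'pg_ctl'; raise if neither property is set."""
    if props.get("db.service.name"):
        return "service"
    if props.get("db.bin"):
        return "pg_ctl"
    raise ValueError("set either db.service.name or db.bin")


example = "db.service.name=\ndb.bin=/usr/pgsql-9.6/bin\n"
print(db_control_mechanism(parse_properties(example)))  # pg_ctl
```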
Use the db.recovery.conf.dir property to specify the location to which a recovery file will be written on the Master node of the cluster, and a trigger file is written on a Standby. This property is not required on a dedicated witness node.
# Specify the location of the db recovery.conf file on the node.
# On a standby node, the trigger file location is read from the
# file in this directory. After a failover, the recovery.conf
# files on remaining standbys are changed to point to the new
# master db (a copy of the original is made first). On a master
# node, a recovery.conf file will be written during failover and
# promotion to ensure that the master node can not be restarted
# as the master database.
db.recovery.conf.dir=
Use the jdbc.sslmode property to instruct Failover Manager to use SSL connections; by default, SSL is disabled.
# Use the jdbc.sslmode property to enable ssl for EFM
# connections. Setting this property to anything but 'disable'
# will force the agents to use 'ssl=true' for all JDBC database
# connections (to both local and remote databases).
# Valid values are:
#
# disable - Do not use ssl for connections.
# verify-ca - EFM will perform CA verification before allowing
# the certificate.
# require - Verification will not be performed on the server
# certificate.
jdbc.sslmode=disable
For information about configuring and using SSL, please see:
https://www.postgresql.org/docs/10/static/ssl-tcp.html
and
https://jdbc.postgresql.org/documentation/94/ssl.html
Use the user.email property to specify an email address (or multiple email addresses) that will receive any notifications sent by Failover Manager.
# Email address(es) for notifications. The value of this
# property must be the same across all agents. Multiple email
# addresses must be separated by space. If using a notification
# script instead, this property can be left blank.
user.email=
Use the notification.level property to specify the minimum severity level at which Failover Manager will send user notifications or when a notification script is called. For a complete list of notifications, please see Section 7.
# Minimum severity level of notifications that will be sent by
# the agent. The minimum level also applies to the notification
# script (below). Valid values are INFO, WARNING, and SEVERE.
# A list of notifications is grouped by severity in the user's
# guide.
notification.level=INFO
Use the script.notification property to specify the path to a user-supplied script that acts as a notification service; the script will be passed a message subject and a message body. The script will be invoked each time Failover Manager generates a user notification.
# Absolute path to script run for user notifications.
#
# This is an optional user-supplied script that can be used for
# notifications instead of email. This is required if not using
# email notifications. Either/both can be used. The script will
# be passed two parameters: the message subject and the message
# body.
script.notification=
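A notification script receives the message subject and the message body as its two parameters. The following sketch shows one possible shape for such a script, appending each notification to a log file. The log location /tmp/efm_notifications.log and the function names are assumptions made for illustration; they are not defined by Failover Manager.

```python
# Sketch of a user-supplied notification handler. Failover Manager passes
# two parameters: the message subject and the message body.
import sys
from datetime import datetime, timezone

LOG_PATH = "/tmp/efm_notifications.log"  # hypothetical log location


def format_entry(subject: str, body: str) -> str:
    """Build a single log line from a notification subject and body."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    one_line_body = " ".join(body.split())  # collapse newlines and runs of spaces
    return f"{stamp} | {subject} | {one_line_body}\n"


def main(argv: list[str]) -> int:
    """Append the notification to LOG_PATH; return a process exit code."""
    if len(argv) != 3:
        print("usage: notify.py <subject> <body>", file=sys.stderr)
        return 2
    with open(LOG_PATH, "a", encoding="utf-8") as log:
        log.write(format_entry(argv[1], argv[2]))
    return 0
```

To use the sketch as a script, the file would end with a call such as sys.exit(main(sys.argv)), and its absolute path would be supplied in the script.notification property. Because user scripts are invoked as the Failover Manager user, that user needs permission to execute the file and to write to the log location.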
The bind.address property specifies the IP address and port number of the agent on the current node of the Failover Manager cluster.
# This property specifies the ip address and port that jgroups
# will bind to on this node. The value is of the form
# <ip>:<port>.
# Note that the port specified here is used for communicating
# with other nodes, and is not the same as the admin.port below,
# used only to communicate with the local agent to send control
# signals.
# For example, <provide_your_ip_address_here>:7800
bind.address=
Use the admin.port property to specify a port on which Failover Manager listens for administrative commands.
# This property controls the port binding of the administration
# server which is used for some commands (ie cluster-status). The
# default is 7809; you can modify this value if the port is
# already in use.
admin.port=7809
Set the is.witness property to true to indicate that the current node is a witness node. If is.witness is true, the local agent will not check to see if a local database is running.
# Specifies whether or not this is a witness node. Witness nodes
# do not have local databases running.
is.witness=
The Postgres pg_is_in_recovery() function is a boolean function that reports the recovery state of a database. The function returns true if the database is in recovery, or false if the database is not in recovery. When an agent starts, it connects to the local database and invokes the pg_is_in_recovery() function. If the server responds true, the agent assumes the role of standby; if the server responds false, the agent assumes the role of master. If there is no local database, the agent will assume an idle state.
If is.witness is true, Failover Manager will not check the recovery state.
The local.period property specifies how many seconds elapse between attempts to contact the database server.
The local.timeout property specifies how long an agent will wait for a positive response from the local database server.
The local.timeout.final property specifies how long an agent will wait after the final attempt to contact the database server on the current node. If a response is not received from the database within the number of seconds specified by the local.timeout.final property, the database is assumed to have failed.
For example, given the default values of these properties, a check of the local database happens once every 10 seconds. If an attempt to contact the local database does not come back positive within 60 seconds, Failover Manager makes a final attempt to contact the database. If a response is not received within 10 seconds, Failover Manager declares database failure and notifies the administrator listed in the user.email property. These properties are not required on a dedicated witness node.
# These properties apply to the connection(s) EFM uses to monitor
# the local database. Every 'local.period' seconds, a database
# check is made in a background thread. If the main monitoring
# thread does not see that any checks were successful in
# 'local.timeout' seconds, then the main thread makes a final
# check with a timeout value specified by the
# 'local.timeout.final' value. All values are in seconds.
# Whether EFM uses single or multiple connections for database
# checks is controlled by the 'db.reuse.connection.count'
# property.
local.period=10
local.timeout=60
local.timeout.final=10
If necessary, you should modify these values to suit your business model.
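When choosing values, it can help to work out how long a local database failure may go undeclared. The following sketch applies the sequence described above (background checks every local.period seconds, a final check after local.timeout seconds without success, and a final wait of local.timeout.final seconds). It is an approximation based on that description, not a statement of Failover Manager's internal timing, and failure_timeline is a hypothetical helper.

```python
# Sketch: approximate timeline for declaring a local database failure,
# derived from local.period, local.timeout, and local.timeout.final.
def failure_timeline(local_period: int, local_timeout: int,
                     local_timeout_final: int) -> dict[str, int]:
    """Return approximate timings, in seconds, for the three properties."""
    if min(local_period, local_timeout, local_timeout_final) <= 0:
        raise ValueError("all values must be positive numbers of seconds")
    return {
        # Background checks that fit inside the local.timeout window.
        "background_checks_in_window": local_timeout // local_period,
        # The main thread makes its final check once the window has passed.
        "final_check_starts_after": local_timeout,
        # Upper bound before the database is assumed to have failed.
        "failure_declared_within": local_timeout + local_timeout_final,
    }


print(failure_timeline(10, 60, 10))
```

With the default values, the sketch reports six background checks in the window and a failure declared within roughly 70 seconds of the last successful check.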
Use the remote.timeout property to specify how many seconds an agent waits for a response from a remote database server (i.e., how long a standby agent waits to verify that the master database is actually down before performing failover).
# Timeout for a call to check if a remote database is responsive.
# For example, this is how long a standby would wait for a
# DB ping request from itself and the witness to the master DB
# before performing failover.
remote.timeout=10
Use the node.timeout property to specify the number of seconds that an agent will wait for a response from a node when determining if a node has failed. The node.timeout property value specifies a timeout value for agent-to-agent communication; other timeout properties in the cluster properties file specify values for agent-to-database communication.
# The total amount of time in seconds to wait before determining
# that a node has failed or been disconnected from this node.
#
# The value of this property must be the same across all agents.
node.timeout=50
Use the stop.isolated.master property to instruct Failover Manager to shut down the database if a master agent detects that it is isolated. When true (the default), Failover Manager will stop the database before invoking the script specified in the script.master.isolated property.
# Shut down the database after a master agent detects that it has
# been isolated from the majority of the efm cluster. If set to
# true, efm will stop the database before running the
# 'script.master.isolated' script, if a script is specified.
stop.isolated.master=true
Use the stop.failed.master property to instruct Failover Manager to attempt to shut down a master database if it cannot reach the database. If true, Failover Manager will run the script specified in the script.db.failure property after attempting to shut down the database.
# Attempt to shut down a failed master database after EFM can no
# longer connect to it. This can be used for added safety in the
# case a failover is caused by a failure of the network on the
# master node.
# If specified, a 'script.db.failure' script is run after this
# attempt.
stop.failed.master=true
Use the master.shutdown.as.failure parameter to indicate that any shutdown of the Failover Manager agent on the master node should be treated as a failure. If this parameter is set to true and the master agent stops (for any reason), the cluster will attempt to confirm if the database on the master node is running:
• If the database is reached, a notification will be sent informing you of the agent status.
• If the database is not reached, a failover will occur.
# Treat a master agent shutdown as a failure. This can be set to
# true to treat a master agent shutdown as a failure situation,
# e.g. during the shutdown of a node, accidental or otherwise.
# Caution should be used when using this feature, as it could
# cause an unwanted promotion in the case of performing master
# database maintenance.
# Please see the user's guide for more information.
master.shutdown.as.failure=false
The master.shutdown.as.failure property is meant to catch user error rather than failures, such as the accidental shutdown of a master node. The proper shutdown of a node can appear to the rest of the cluster like a user has stopped the master Failover Manager agent (for example to perform maintenance on the master database). If you set the master.shutdown.as.failure property to true, care must be taken when performing maintenance.
To perform maintenance on the master database when master.shutdown.as.failure is true, you should stop the master agent and wait to receive a notification that the master agent has failed but the database is still running. Then it is safe to stop the master database. Alternatively, you can use the efm stop-cluster command to stop all of the agents without failure checks being performed.
Use the pingServerIp property to specify the IP address of a server that Failover Manager can use to confirm that network connectivity is not a problem.
# This is the address of a well-known server that EFM can ping
# in an effort to determine network reachability issues. It
# might be the IP address of a nameserver within your corporate
# firewall or another server that *should* always be reachable
# via a 'ping' command from each of the EFM nodes.
#
# There are many reasons why this node might not be considered
# reachable: firewalls might be blocking the request, ICMP might
# be filtered out, etc.
#
# Do not use the IP address of any node in the EFM cluster
# (master, standby, or witness) because this ping server is meant
# to provide an additional layer of information should the EFM
# nodes lose sight of each other.
#
# The installation default is Google's DNS server.
pingServerIp=8.8.8.8
Use the pingServerCommand property to specify the command used to test network connectivity.
# This command will be used to test the reachability of certain
# nodes.
#
# Do not include an IP address or hostname on the end of
# this command - it will be added dynamically at runtime with the
# values contained in 'virtualIp' and 'pingServer'.
#
# Make sure this command returns reasonably quickly - test it
# from a shell command line first to make sure it works properly.
pingServerCommand=/bin/ping -q -c3 -w5
Use the auto.allow.hosts property to instruct the server to use the addresses specified in the .nodes file of the first node started to update the allowed host list. Enabling this property (setting auto.allow.hosts to true) can simplify cluster startup.
# Have the first node started automatically add the addresses
# from its .nodes file to the allowed host list. This will make
# it faster to start the cluster when the initial set of hosts
# is already known.
auto.allow.hosts=false
Use the stable.nodes.file property to instruct the server to not rewrite the nodes file when a node joins or leaves the cluster. This property is most useful in clusters with unchanging IP addresses.
# When set to true, EFM will not rewrite the .nodes file whenever
# new nodes join or leave the cluster. This can help starting a
# cluster in the cases where it is expected for member addresses
# to be mostly static, and combined with 'auto.allow.hosts' makes
# startup easier when learning failover manager.
stable.nodes.file=false
The db.reuse.connection.count property allows the administrator to specify the number of times Failover Manager reuses the same database connection to check the database health. The default value is 0, indicating that Failover Manager will create a fresh connection each time. This property is not required on a dedicated witness node.
# This property controls how many times a database connection is
# reused before creating a new one. If set to zero, a new
# connection will be created every time an agent pings its local
# database.
db.reuse.connection.count=0

The auto.failover property enables automatic failover. By default, auto.failover is set to true.
# Whether or not failover will happen automatically when the master
# fails. Set to false if you want to receive the failover notifications
# but not have EFM actually perform the failover steps.
# The value of this property must be the same across all agents.
auto.failover=true
Use the auto.reconfigure property to instruct Failover Manager to enable or disable automatic reconfiguration of remaining Standby servers after the primary standby is promoted to Master. Set the property to true to enable automatic reconfiguration (the default) or false to disable automatic reconfiguration. This property is not required on a dedicated witness node.
# After a standby is promoted, failover manager will attempt to
# update the remaining standbys to use the new master. Failover
# manager will back up recovery.conf, change the host parameter
# of the primary_conninfo entry, and restart the database. The
# restart command is contained in either the efm_db_functions or
# efm_root_functions file; default when not running db as an os
# service is:
# "pg_ctl restart -m fast -w -t <timeout> -D <directory>"
# where the timeout is the local.timeout property value and the
# directory is specified by db.recovery.conf.dir. To turn off
# automatic reconfiguration, set this property to false.
auto.reconfigure=true
Please note: primary_conninfo is a space-delimited list of keyword=value pairs.
Please note: If you are using replication slots to manage your WAL segments, automatic reconfiguration is not supported; you should set auto.reconfigure to false. For more information, see Section 2.2.
Use the promotable property to indicate that a node should not be promoted. To override the setting, use the efm set-priority command at runtime; for more information about the efm set-priority command, see Section 5.3.
# A standby with this set to false will not be added to the
# failover priority list, and so will not be available for
# promotion. The property will be used whenever an agent starts
# as a standby or resumes as a standby after being idle. After
# startup/resume, the node can still be added or removed from the
# priority list with the 'efm set-priority' command. This
# property is required for all non-witness nodes.
promotable=true

Use the minimum.standbys property to specify the minimum number of standby nodes that will be retained on a cluster; if the standby count drops to the specified minimum, a replica node will not be promoted in the event of a failure of the master node.
# Instead of setting specific standbys as being unavailable for
# promotion, this property can be used to set a minimum number
# of standbys that will not be promoted. Set to one, for
# example, promotion will not happen if it will drop the number
# of standbys below this value. This property must be the same on
# each node.
minimum.standbys=0
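The effect of minimum.standbys can be illustrated with a small calculation. The sketch below encodes one reading of the comment above: promoting a standby removes it from the standby count, and promotion is skipped if the remaining count would fall below the configured minimum. The function promotion_allowed is hypothetical and only illustrates the rule; it is not Failover Manager code.

```python
# Sketch: would promoting one standby keep the cluster at or above
# minimum.standbys? This is an interpretation of the property comment.
def promotion_allowed(current_standbys: int, minimum_standbys: int) -> bool:
    """Return True if one standby can be promoted without going below the minimum."""
    if current_standbys < 1:
        return False  # nothing available to promote
    remaining = current_standbys - 1  # the promoted node is no longer a standby
    return remaining >= minimum_standbys


for standbys, minimum in [(1, 0), (1, 1), (2, 1)]:
    print(standbys, minimum, promotion_allowed(standbys, minimum))
```

Under this reading, a cluster with one standby and minimum.standbys=1 would not promote, while a cluster with two standbys would.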
Use the recovery.check.period property to specify the number of seconds that Failover Manager will wait between checks to see if a database is out of recovery.
# Time in seconds between checks to see if a promoting database
# is out of recovery.
recovery.check.period=2
Use the auto.resume.period property to specify the number of seconds (after a monitored database fails, and an agent has assumed an idle state, or when starting in IDLE mode) during which an agent will attempt to resume monitoring that database.
# Period in seconds for IDLE agents to try to resume monitoring
# after a database failure or when starting in IDLE mode. Set to
# 0 for agents to not try to resume (in which case the
# 'efm resume <cluster>' command is used after bringing a
# database back up).
auto.resume.period=0
Failover Manager provides support for clusters that use a virtual IP. If your cluster uses a virtual IP, provide the host name or IP address in the virtualIp property; specify the corresponding prefix in the virtualIp.prefix property. If virtualIp is left blank, virtual IP support is disabled.
Use the virtualIp.interface property to provide the network interface used by the VIP.
The specified virtual IP address is assigned only to the master node of the cluster. If you specify virtualIp.single=true, the same VIP address will be used on the new master in the event of a failover. Specify a value of false to provide a unique IP address for each node of the cluster.
For information about using a virtual IP address, see Section 3.6.

# These properties specify the IP and prefix length that will be
# remapped during failover. If you do not use a VIP as part of
# your failover solution, leave the virtualIp property blank to
# disable Failover Manager support for VIP processing (assigning,
# releasing, testing reachability, etc).
#
# If you specify a VIP, the interface and prefix are required.
#
# If you specify a host name, it will be resolved to an IP address
# when acquiring or releasing the VIP. If the host name resolves
# to more than one IP address, there is no way to predict which
# address Failover Manager will use.
#
# By default, the virtualIp and virtualIp.prefix values must be
# the same across all agents. If you set virtualIp.single to
# false, you can specify unique values for virtualIp and
# virtualIp.prefix on each node.
#
# If you are using an IPv4 address, the virtualIp.interface value
# should not contain a secondary virtual ip id (do not include
# ":1", etc).
virtualIp=
virtualIp.interface=
virtualIp.prefix=
virtualIp.single=true
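The rules stated in the comments above (interface and prefix are required when a VIP is set, and an interface used with an IPv4 address should not carry a secondary id such as ":1") can be checked before starting an agent. This is a minimal sketch; vip_problems is a hypothetical helper, the sample address is made up, and treating any virtualIp value without a colon as IPv4 or a host name is a simplifying assumption.

```python
# Sketch: sanity-check the virtualIp* properties against the rules in the
# properties file comments.
def vip_problems(props: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; empty if none were found."""
    vip = props.get("virtualIp", "")
    if not vip:
        return []  # a blank virtualIp disables VIP support
    problems = []
    for name in ("virtualIp.interface", "virtualIp.prefix"):
        if not props.get(name):
            problems.append(f"{name} is required when virtualIp is set")
    interface = props.get("virtualIp.interface", "")
    # Assumption: a virtualIp without ':' is an IPv4 address or a host name.
    if ":" not in vip and ":" in interface:
        problems.append("virtualIp.interface should not include a secondary id such as ':1'")
    return problems


sample = {"virtualIp": "192.168.2.100", "virtualIp.interface": "eth0:1",
          "virtualIp.prefix": "24"}
print(vip_problems(sample))
```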
Provide paths to scripts that reconfigure your load balancer in the event of a switchover or master failure scenario. These scripts will also be invoked in the event of a standby failure.
If you are using these properties, they should be provided on every node of the cluster (master, standby, and witness). This ensures that if a database node fails, another node will call the detach script with the failed node's address.
Set the check.vip.before.promotion property to false to indicate that Failover Manager will not check to see if a VIP is in use before assigning it to a new master in the event of a failure. Please note that this could result in multiple nodes broadcasting on the same VIP address; unless the master node is isolated or can be shut down via another process, you should set this property to true.
# Whether to check if the VIP (when used) is still in use before
# promoting after a master failure. Turning this off may allow
# the new master to have the VIP even though another node is also
# broadcasting it. This should only be used in environments where
# it is known that the failed master node will be isolated or
# shut down through other means.
check.vip.before.promotion=true

Provide a script name after the script.load.balancer.attach property to identify a script that will be invoked when a node should be attached to the load balancer. Use the script.load.balancer.detach property to specify the name of a script that will be invoked when a node should be detached from the load balancer. Include the %h placeholder to represent the IP address of the node that is being attached or removed from the cluster.
# Absolute path to load balancer scripts
# The attach script is called when a node should be attached to
# the load balancer, for example after a promotion. The detach
# script is called when a node should be removed, for example
# when a database has failed or is about to be stopped. Use %h to
# represent the IP/hostname of the node that is being
# attached/detached.
#
# Example:
# script.load.balancer.attach=/somepath/attachscript %h
script.load.balancer.attach=
script.load.balancer.detach=
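To see what a script receives when a placeholder such as %h is used, the substitution can be modeled as follows. This sketch only illustrates the idea of replacing placeholders with node addresses before the script is run; it is not Failover Manager's implementation, and expand_script_command and the sample address are hypothetical.

```python
# Sketch: model how a configured script command with placeholders (%h, %p,
# %f) turns into an argument list. Illustration only.
import shlex


def expand_script_command(template: str, values: dict[str, str]) -> list[str]:
    """Split the configured command and replace known placeholder tokens."""
    return [values.get(token, token) for token in shlex.split(template)]


argv = expand_script_command("/somepath/attachscript %h", {"%h": "192.168.2.47"})
print(argv)  # ['/somepath/attachscript', '192.168.2.47']
```

In this model, the attach or detach script would read the node address from its first command-line argument.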
script.fence specifies the path to an optional user-supplied script that will be invoked during the promotion of a standby node to master node.
# absolute path to fencing script run during promotion
#
# This is an optional user-supplied script that will be run
# during failover on the standby database node. If left blank,
# no action will be taken. If specified, EFM will execute this
# script before promoting the standby.
#
# Parameters can be passed into this script for the failed master
# and new primary node addresses. Use %p for new primary and %f
# for failed master. On a node that has just been promoted, %p
# should be the same as the node's efm binding address.
#
# Example:
# script.fence=/somepath/myscript %p %f
#
# NOTE: FAILOVER WILL NOT OCCUR IF THIS SCRIPT RETURNS A NON-ZERO
# EXIT CODE.
script.fence=
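Because a non-zero exit code from the fencing script prevents failover, the script's return value needs care. The following sketch shows one way to structure such a script so that any doubt results in a non-zero code. The fencing tool /usr/local/bin/power-off-node is a hypothetical, site-specific command, and the function names are assumptions for illustration.

```python
# Sketch of a fencing script body. It expects the new primary (%p) and the
# failed master (%f) addresses, in the order given in script.fence.
import subprocess

FENCE_COMMAND = ["/usr/local/bin/power-off-node"]  # hypothetical site-specific tool


def fence(new_primary: str, failed_master: str, run=subprocess.run) -> int:
    """Fence the failed master; return 0 only if fencing succeeded."""
    if not new_primary or not failed_master or new_primary == failed_master:
        return 1  # arguments look wrong, so block the promotion
    try:
        return run(FENCE_COMMAND + [failed_master]).returncode
    except OSError:
        return 1  # fencing tool missing or not executable: block the promotion


def main(argv: list[str]) -> int:
    """Entry point for: myscript <new_primary> <failed_master>."""
    if len(argv) != 3:
        return 2
    return fence(argv[1], argv[2])
```

To use the sketch as a script, the file would end with import sys followed by sys.exit(main(sys.argv)), and the property would be set in the form script.fence=/somepath/myscript %p %f. Returning non-zero on any uncertainty is a deliberate choice here: it trades availability for protection against two masters running at once.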
Use the script.post.promotion property to specify the path to an optional user-supplied script that will be invoked after a standby node has been promoted to master.
# Absolute path to script run after promotion
#
# This is an optional user-supplied script that will be run after
# failover on the standby node after it has been promoted and
# is no longer in recovery. The exit code from this script has
# no effect on failover manager, but will be included in a
# notification sent after the script executes.
#
# Parameters can be passed into this script for the failed master
# and new primary node addresses. Use %p for new primary and %f
# for failed master. On a node that has just been promoted, %p
# should be the same as the node's efm binding address.
#
# Example:
# script.post.promotion=/somepath/myscript %f %p
script.post.promotion=
Use the script.resumed property to specify an optional path to a user-supplied script that will be invoked when an agent resumes monitoring of a database.
# Absolute path to resume script
#
# This script is run before an IDLE agent resumes
# monitoring its local database.
script.resumed=
Use the script.db.failure property to specify the complete path to an optional user-supplied script that Failover Manager will invoke if an agent detects that the database that it monitors has failed.
# Absolute path to script run after database failure
#
# This is an optional user-supplied script that will be run after
# an agent detects that its local database has failed.
script.db.failure=
Use the script.master.isolated property to specify the complete path to an optional user-supplied script that Failover Manager will invoke if the agent monitoring the master database detects that the master is isolated from the majority of the Failover Manager cluster. This script is called immediately after the VIP is released (if a VIP is in use).
# Absolute path to script run on isolated master
#
# This is an optional user-supplied script that will be run after
# a master agent detects that it has been isolated from the
# majority of the efm cluster.
script.master.isolated=

Use the script.remote.pre.promotion property to specify the path and name of a
script that will be invoked on any agent nodes not involved in the promotion
when a node is about to promote its database to master.
Include the %p placeholder to identify the address of the new primary node.
# Absolute path to script invoked on non-promoting agent nodes
# before a promotion.
#
# This optional user-supplied script will be invoked on other
# agents when a node is about to promote its database. The exit
# code from this script has no effect on Failover Manager, but
# will be included in a notification sent after the script
# executes.
#
# Pass a parameter (%p) with the script to identify the new
# primary node address.
#
# Example:
# script.remote.pre.promotion=/path_name/script_name %p
script.remote.pre.promotion=
Use the script.remote.post.promotion property to specify the path and name of a
script that will be invoked on any non-master nodes after a promotion occurs.
Include the %p placeholder to identify the address of the new primary node.
# Absolute path to script invoked on non-master agent nodes
# after a promotion.
#
# This optional user-supplied script will be invoked on nodes
# (except the new master) after a promotion occurs. The exit code
# from this script has no effect on Failover Manager, but will be
# included in a notification sent after the script executes.
#
# Pass a parameter (%p) with the script to identify the new
# primary node address.
#
# Example:
# script.remote.post.promotion=/path_name/script_name %p
script.remote.post.promotion=
Use the script.custom.monitor property to provide the name and location of an
optional script that will be invoked at regular intervals (specified in seconds
by the custom.monitor.interval property).
Use custom.monitor.timeout to specify the maximum time that the script will be
allowed to run; if script execution does not complete within the time
specified, Failover Manager will send a notification.
Set custom.monitor.safe.mode to true to instruct Failover Manager to report
non-zero exit codes from the script, but not promote a standby as a result of
an exit code.
# Absolute path to a custom monitoring script.
#
# Use script.custom.monitor to specify the location and name of
# an optional user-supplied script that will be invoked
# periodically to perform custom monitoring tasks. A non-zero
# exit value means that a check has failed; this will be treated
# as a database failure. On a master node, script failure will
# cause a promotion. On a standby node script failure will
# generate a notification and the agent will become IDLE.
#
# The custom.monitor.* properties are required if a custom
# monitoring script is specified:
#
# custom.monitor.interval is the time in seconds between
# executions of the script.
#
# custom.monitor.timeout is a timeout value in seconds for how
# long the script will be allowed to run. If script execution
# exceeds the specified time, the task will be stopped and a
# notification sent. Subsequent runs will continue.
#
# If custom.monitor.safe.mode is set to true, non-zero exit codes
# from the script will be reported but will not cause a promotion
# or be treated as a database failure. This allows testing of the
# script without affecting EFM.
#
script.custom.monitor=
custom.monitor.interval=
custom.monitor.timeout=
custom.monitor.safe.mode=
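As an illustration of the exit-code contract described above, the following
sketch shows a custom monitoring script that fails when a filesystem is too
full. The script name check_disk_usage, the DATA_DIR and MAX_PCT variables, and
the 90 percent default are all assumptions made for this example; they are not
defined by Failover Manager. While trying out a script like this, setting
custom.monitor.safe.mode to true would report failures without causing a
promotion.

```shell
#!/bin/sh
# Hypothetical custom monitor: fails when a filesystem is too full.
# DATA_DIR and MAX_PCT are assumptions for this example.

# Print the used-capacity percentage (digits only) for a directory.
used_pct() {
    df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

# Succeed (return 0) when usage $1 is at or below limit $2.
usage_ok() {
    [ "$1" -le "$2" ]
}

# Run the check only when executed under the assumed script name,
# so the functions can also be loaded and tried out by hand.
case "${0##*/}" in
    check_disk_usage*)
        usage_ok "$(used_pct "${DATA_DIR:-/}")" "${MAX_PCT:-90}" || exit 1
        ;;
esac
```

A non-zero exit from this script would be treated as a database failure unless
safe mode is enabled, so the threshold should be chosen conservatively.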
Use the sudo.command property to specify a command that will be invoked by
Failover Manager when performing tasks that require extended permissions. Use
this option to include command options that might be specific to your system
authentication.
Use the sudo.user.command property to specify a command that will be invoked by
Failover Manager when executing commands that will be performed by the database
owner.
# Command to use in place of 'sudo' if desired when efm runs
# the efm_db_functions or efm_root_functions, or efm_address
# scripts.
# Sudo is used in the following ways by efm:
#
# sudo /usr/edb/efm-<version>/bin/efm_address <arguments>
# sudo /usr/edb/efm-<version>/bin/efm_root_functions <arguments>
# sudo -u <db service owner> /usr/edb/efm-<version>/bin/efm_db_functions <arguments>
#
# 'sudo' in the first two examples will be replaced by the value
# of the sudo.command property. 'sudo -u <db service owner>' will
# be replaced by the value of the sudo.user.command property.
# The '%u' field will be replaced with the db owner.
sudo.command=sudo
sudo.user.command=sudo -u %u
Use the lock.dir property to specify an alternate location for the Failover
Manager lock file; the file prevents Failover Manager from starting multiple
(potentially orphaned) agents for a single cluster on the node.
# Specify the directory of lock file on the node. Failover
# Manager creates a file named <cluster>.lock at this location to
# avoid starting multiple agents for same cluster. If the path
# does not exist, Failover Manager will attempt to create it. If
# not specified defaults to '/var/lock/efm-<version>'
lock.dir=
Use the log.dir property to specify the location to which agent log files will
be written; Failover Manager will attempt to create the directory if the
directory does not exist.
# Specify the directory of agent logs on the node. If the path
# does not exist, Failover Manager will attempt to create it. If
# not specified defaults to '/var/log/efm-<version>'. (To store
# Failover Manager startup logs in a custom location, modify the
# path in the service script to point to an existing, writable
# directory.)
# If using a custom log directory, you must configure
# logrotate separately. Use 'man logrotate' for more information.
log.dir=
After enabling the UDP or TCP protocol on a Failover Manager host, you can
enable logging to syslog. Use the syslog.protocol parameter to specify the
protocol type (UDP or TCP) and the syslog.port parameter to specify the
listener port of the syslog host. The syslog.facility value may be used as an
identifier for the process that created the entry; the value must be between
LOCAL0 and LOCAL7.
# Syslog information. The syslog service must be listening on
# the port for the given protocol, which can be UDP or TCP.
# The facilities supported are LOCAL0 through LOCAL7.
syslog.host=localhost
syslog.port=514
syslog.protocol=UDP
syslog.facility=LOCAL1
Use the file.log.enabled and syslog.enabled properties to specify the type of
logging that you wish to implement. Set file.log.enabled to true to enable
logging to a file; enable the UDP protocol or TCP protocol and set
syslog.enabled to true to enable logging to syslog. You can enable logging to
both a file and syslog.
# Which logging is enabled.
file.log.enabled=true
syslog.enabled=false
For more information about configuring syslog logging, see Section 6.1.
Use the jgroups.loglevel and efm.loglevel parameters to specify the level of
detail logged by Failover Manager. The default value is INFO. For more
information about logging, see Section 6, Controlling Logging.
# Logging levels for JGroups and EFM.
# Valid values are: TRACE, DEBUG, INFO, WARN, ERROR
# Default value: INFO
# It is not necessary to increase these values unless debugging a
# specific issue. If nodes are not discovering each other at
# startup, increasing the jgroups level to DEBUG will show
# information about the TCP connection attempts that may help
# diagnose the connection failures.
jgroups.loglevel=INFO
efm.loglevel=INFO
Use the jvm.options property to pass JVM-related configuration information. The
default setting specifies the amount of memory that the Failover Manager agent
will be allowed to use.
# Extra information that will be passed to the JVM when starting
# the agent.
jvm.options=-Xmx128m

3.5.1.2 Encrypting Your Database Password
Failover Manager requires you to encrypt your database password before
including it in the cluster properties file. Use the efm utility (located in
the /usr/edb/efm-3.4/bin directory) to encrypt the password. When encrypting a
password, you can either pass the password on the command line when you invoke
the utility, or use the EFMPASS environment variable.
To encrypt a password, use the command:
# efm encrypt cluster_name [ --from-env ]
Where cluster_name specifies the name of the Failover Manager cluster.
If you include the --from-env option, you must export the value you wish to
encrypt before invoking the encryption utility. For example:
export EFMPASS=password
If you do not include the --from-env option, Failover Manager will prompt you
to enter the database password twice before generating an encrypted password
for you to place in your cluster property file. When the utility shares the
encrypted password, copy and paste the encrypted password into the cluster
property files.
Please note: Many Java vendors ship their version of Java with full-strength
encryption included, but not enabled due to export restrictions. If you
encounter an error that refers to an illegal key size when attempting to
encrypt the database password, you should download and enable a Java
Cryptography Extension (JCE) that provides an unlimited policy for your
platform.
The following example demonstrates using the encrypt utility to encrypt a
password for the acctg cluster:
# efm encrypt acctg
This utility will generate an encrypted password for you to place
in your EFM cluster property file:
/etc/edb/efm-3.4/acctg.properties
Please enter the password and hit enter:
Please enter the password again to confirm:
The encrypted password is: 516b36fb8031da17cfbc010f7d09359c
Please paste this into your acctg.properties file
db.password.encrypted=516b36fb8031da17cfbc010f7d09359c

Please note: the utility will notify you if a properties file does not exist.
After receiving your encrypted password, paste the password into the properties
file and start the Failover Manager service. If there is a problem with the
encrypted password, the Failover Manager service will not start:
[witness@localhost ~]# service efm-3.4 start
Starting local efm-3.4 service: [FAILED]
If you receive this message when starting the Failover Manager service, please
see the startup log (located in /var/log/efm-3.4/startup-efm.log) for more
information.
If you are using RHEL 7.x or CentOS 7.x, startup information is also available
with the following command:
systemctl status efm-3.4
To prevent a cluster from inadvertently connecting to the database of another
cluster, the cluster name is incorporated into the encrypted password. If you
modify the cluster name, you will need to re-encrypt the database password and
update the cluster properties file.
Using the EFMPASS Environment Variable
The following example demonstrates using the --from-env option when encrypting
a password. Before invoking the efm encrypt command, set the value of EFMPASS
to the password (1safepassword):
# export EFMPASS=1safepassword
Then, invoke efm encrypt, specifying the --from-env option:
# efm encrypt acctg --from-env
# 7ceecd8965fa7a5c330eaa9e43696f83
The encrypted password (7ceecd8965fa7a5c330eaa9e43696f83) is returned as a
text value; when using a script, you can check the exit code of the command to
confirm that the command succeeded. A successful execution returns 0.
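The exit-code check described above might be scripted as follows. This is a
sketch only: it assumes that efm encrypt with --from-env prints nothing but the
encrypted value, as the example suggests, and the EFM_BIN variable and the
encrypted_property function are names introduced here for illustration.

```shell
#!/bin/sh
# EFM_BIN defaults to the install path named in this guide; override if needed.
EFM_BIN="${EFM_BIN:-/usr/edb/efm-3.4/bin/efm}"

# Encrypt the password held in EFMPASS for cluster $1 and print the
# resulting properties line; return non-zero if efm encrypt fails.
encrypted_property() {
    encrypted=$("$EFM_BIN" encrypt "$1" --from-env) || return 1
    printf 'db.password.encrypted=%s\n' "$encrypted"
}
```

After exporting EFMPASS, a call such as encrypted_property acctg would print a
line suitable for pasting into acctg.properties, or return a non-zero status if
the encryption step failed.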

3.5.2 The Cluster Members File
Each node in a Failover Manager cluster has a cluster members file. When an
agent starts, it uses the file to locate other cluster members. The Failover
Manager installer creates a file template for the cluster members file named
efm.nodes.in in the /etc/edb/efm-3.4 directory. After completing the Failover
Manager installation, you must make a working copy of the template:
# cp /etc/edb/efm-3.4/efm.nodes.in /etc/edb/efm-3.4/efm.nodes
After copying the template file, change the owner of the file to efm:
chown efm:efm efm.nodes
By default, Failover Manager expects the cluster members file to be named
efm.nodes. If you name the cluster members file something other than
efm.nodes, you must modify the Failover Manager service script to instruct
Failover Manager to use the new name.
The cluster members file on the first node started can be empty; this node will
become the Membership Coordinator. On each subsequent node, the cluster members
file must contain the address and port number of the Membership Coordinator.
Each entry in the cluster members file must be listed in an address:port
format, with multiple entries separated by white space.
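The address:port format can be checked before an agent is started. The helper
below is a hypothetical sketch (valid_members is not an efm command); it only
verifies that each whitespace-separated entry ends in a numeric port, and it
does not handle comment lines that a members file might contain.

```shell
#!/bin/sh
# Hypothetical helper: check that every whitespace-separated entry in a
# cluster members line has the form address:port with a numeric port.
valid_members() {
    for entry in $1; do
        case "$entry" in
            *:*) port=${entry##*:} ;;   # text after the last colon
            *)   return 1 ;;            # no port given
        esac
        case "$port" in
            ''|*[!0-9]*) return 1 ;;    # empty or non-numeric port
        esac
    done
    return 0
}
```

An empty argument passes the check, which matches the case of the first node
started, where the file can be empty.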
The Membership Coordinator will update the contents of the efm.nodes file to
match the current members of the cluster. As agents join or leave the cluster,
the efm.nodes files on other agents are updated to reflect the current cluster
membership. If you invoke the efm stop-cluster command, Failover Manager does
not modify the file.
If the Membership Coordinator leaves the cluster, another node will assume the
role. You can use the efm cluster-status command to find the address of the
Membership Coordinator. If a node joins or leaves a cluster while an agent is
down, you must manually ensure that the file includes at least the current
Membership Coordinator.
If you know the IP addresses and ports of the nodes that will be joining the
cluster, you can include the addresses in the cluster members file at any time.
At startup, any addresses that do not identify cluster members will be ignored
unless the auto.allow.hosts property (in the cluster properties file) is set to
true. For more information, see Section 4.1.2.
If the stable.nodes.file property is set to true, the Membership Coordinator
will not update the .nodes file when cluster members join or leave the cluster;
this behavior is most useful when the IP addresses of cluster members do not
change often. For information about modifying cluster properties, see Section
3.5.1.1.
3.6 Using Failover Manager with Virtual IP Addresses
Failover Manager uses the efm_address script to assign or release a virtual IP
address.
Please note that virtual IP addresses are not supported by many cloud
providers. In those environments, another mechanism should be used (such as an
Elastic IP Address on AWS), which can be changed when needed by a fencing or
post-promotion script.
By default, the script resides in:
/usr/edb/efm-3.4/bin/efm_address
Use the following command variations to assign or release an IPv4 or IPv6 IP
address.
To assign a virtual IPv4 IP address:
# efm_address add4 interface_name IPv4_addr/prefix
To assign a virtual IPv6 IP address:
# efm_address add6 interface_name IPv6_addr/prefix
To release a virtual address:
# efm_address del interface_name IP_address/prefix
Where:
interface_name matches the name specified in the virtualIp.interface
property in the cluster properties file.
IPv4_addr or IPv6_addr matches the name specified in the virtualIp
property in the cluster properties file.
prefix matches the value specified in the virtualIp.prefix property in the
cluster properties file.
For more information about properties that describe a virtual IP address, see
Section 3.5.1.1.
You must invoke the efm_address script as the root user. The efm user is
created during the installation, and is granted privileges in the sudoers file
to run the efm_address script. For more information about the sudoers file, see
Section 3.4, Extending Failover Manager Permissions.
Testing the VIP
When using a virtual IP (VIP) address with Failover Manager, it is important to
test the VIP functionality manually before starting Failover Manager. This will
catch any network-related issues before they cause a problem during an actual
failover. The following steps test the actions that Failover Manager will take.
The example uses the following property values:
virtualIp=172.24.38.239
virtualIp.interface=eth0
virtualIp.prefix=24
pingServerCommand=/bin/ping -q -c3 -w5
Please note: the virtualIp.prefix specifies the number of significant bits in
the virtualIp address.
When instructed to ping the VIP from a node, use the command defined by the
pingServerCommand property.
1. Ping the VIP from all nodes to confirm that the address is not already in
use:
# /bin/ping -q -c3 -w5 172.24.38.239
PING 172.24.38.239 (172.24.38.239) 56(84) bytes of data.
--- 172.24.38.239 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3000ms
You should see 100% packet loss.
2. Run the efm_address add4 command on the Master node to assign the VIP and
then confirm with ip address:
# efm_address add4 eth0 172.24.38.239/24
# ip address
<output truncated>
eth0 Link encap:Ethernet HWaddr 36:AA:A4:F4:1C:40
inet addr:172.24.38.239 Bcast:172.24.38.255
...
3. Ping the VIP from the other nodes to verify that they can reach the VIP:
# /bin/ping -q -c3 -w5 172.24.38.239
--- 172.24.38.239 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.023/0.025/0.029/0.006 ms
You should see no packet loss.
4. Use the efm_address del command to release the address on the master node
and confirm that the address has been released with ip address:
# efm_address del eth0 172.24.38.239/24
# ip address
eth0 Link encap:Ethernet HWaddr 22:00:0A:89:02:8E
inet addr:10.137.2.142 Bcast:10.137.2.191
...
The output from this step should not show the VIP (172.24.38.239) on the eth0
interface.
5. Repeat step 3, this time verifying that the Standby and Witness do not see
the VIP in use:
# /bin/ping -q -c3 -w5 172.24.38.239
--- 172.24.38.239 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3000ms
You should see 100% packet loss. Repeat this step on all nodes.
6. Repeat step 2 on all Standby nodes to assign the VIP to every node. You can
ping the VIP from any node to verify that it is in use.
# efm_address add4 eth0 172.24.38.239/24
# ip address
<output truncated>
eth0 Link encap:Ethernet HWaddr 36:AA:A4:F4:1C:40
inet addr:172.24.38.239 Bcast:172.24.38.255
...
After the test steps above, release the VIP from any non-Master node before
attempting to start Failover Manager.
Please note: the network interface used for the VIP does not have to be the
same interface used for the Failover Manager agent's bind.address value. The
master agent will drop the VIP as needed during a failover, and Failover
Manager will verify that the VIP is no longer available before promoting a
standby. A failure of the bind address network will lead to master isolation
and failover.
If the VIP uses a different interface, you may encounter a timing condition
where the rest of the cluster checks for a reachable VIP before the master
agent has dropped it. In this case, EFM will retry the VIP check for the number
of seconds specified in the node.timeout property to help ensure that a
failover happens as expected.
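The ping checks in the test steps above can be wrapped in a small helper so
that the same test is easy to repeat on every node. This is a minimal sketch;
the vip_state function and the PING_CMD variable are names assumed for this
example, with PING_CMD defaulting to the pingServerCommand value shown earlier.

```shell
#!/bin/sh
# PING_CMD mirrors the pingServerCommand value used in the example above.
PING_CMD="${PING_CMD:-/bin/ping -q -c3 -w5}"

# Report whether a VIP answers pings from this node.
vip_state() {
    # PING_CMD is left unquoted on purpose so its options split into words.
    if $PING_CMD "$1" >/dev/null 2>&1; then
        echo "in use"
    else
        echo "not in use"
    fi
}
```

For example, vip_state 172.24.38.239 would be expected to print "not in use"
before the address is assigned and "in use" after the add4 step.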

4 Using Failover Manager
Failover Manager offers support for monitoring and failover of clusters with
one or more Standby servers. You can add or remove nodes from the cluster as
your demand for resources grows or shrinks.
If a Master node reboots, Failover Manager may detect the database is down on
the Master node and promote a Standby node to the role of Master. If this
happens, the Failover Manager agent on the (rebooted) Master node will not get
a chance to write the recovery.conf file; the rebooted Master node will return
to the cluster as a second Master node. To prevent this, start the Failover
Manager agent before starting the database server. The agent will start in idle
mode, and check to see if there is already a master in the cluster. If there is
a master node, the agent will verify that a recovery.conf file exists, and the
database will not start as a second master.
4.1 Managing a Failover Manager Cluster
Once configured, a Failover Manager cluster requires no regular maintenance.
The following sections provide information about performing the management
tasks that may occasionally be required by a Failover Manager Cluster.
By default, some of the commands listed below must be invoked by efm or by an
OS superuser; an administrator can selectively permit users to invoke these
commands by adding the user to the efm group. The commands are:
• efm allow-node
• efm disallow-node
• efm promote
• efm resume
• efm set-priority
• efm stop-cluster
• efm upgrade-conf
4.1.1 Starting the Failover Manager Cluster
You can start the nodes of a Failover Manager cluster in any order.
To start the Failover Manager cluster on RHEL 6.x or CentOS 6.x, assume
superuser privileges, and invoke the command:
service efm-3.4 start
To start the Failover Manager cluster on RHEL 7.x or CentOS 7.x, assume
superuser privileges, and invoke the command:
systemctl start efm-3.4
If the cluster properties file for the node specifies that is.witness is true,
the node will start as a Witness node.
If the node is not a dedicated Witness node, Failover Manager will connect to
the local database and invoke the pg_is_in_recovery() function. If the server
responds false, the agent assumes the node is a Master node, and assigns a
virtual IP address to the node (if applicable). If the server responds true,
the Failover Manager agent assumes that the node is a Standby server. If the
server does not respond, the agent will start in an idle state.
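The role decision described above can be reproduced by hand when checking what
an agent is likely to conclude about a node. The sketch below only mirrors the
logic in the text; it is not how the agent itself is implemented, and the
role_for function name is an assumption made for this example.

```shell
#!/bin/sh
# Map the text returned by pg_is_in_recovery() to the role described above.
# psql -At prints a boolean as "t" or "f"; anything else is treated as
# "no response".
role_for() {
    case "$1" in
        f) echo "Master" ;;
        t) echo "Standby" ;;
        *) echo "Idle" ;;
    esac
}
```

On a database host, the input could be obtained with a command such as
psql -At -c 'SELECT pg_is_in_recovery()', assuming connection settings are
supplied through the usual PostgreSQL environment variables.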
After joining the cluster, the Failover Manager agent checks the supplied
database credentials to ensure that it can connect to all of the databases
within the cluster. If the agent cannot connect, the agent will shut down.
If a new master or standby node joins a cluster, all of the existing nodes will
also confirm that they can connect to the database on the new node.
4.1.2 Adding Nodes to a Cluster
You can add a node to a Failover Manager cluster at any time. When you add a
node to a cluster, you must modify the cluster to allow the new node, and then
tell the new node how to find the cluster. The following steps detail adding a
node to a cluster:
1. Unless auto.allow.hosts is set to true, use the efm allow-node command to
add the IP address of the new node to the Failover Manager allowed node host
list. When invoking the command, specify the cluster name and the IP address
of the new node:
efm allow-node cluster_name ip_address

For more information about using the efm allow-node command or controlling
a Failover Manager service, see Section 5.
2. Install a Failover Manager agent and configure the cluster properties file
on the new node. For more information about modifying the properties file, see
Section 3.5.1.
3. Configure the cluster members file on the new node, adding an entry for the
Membership Coordinator. For more information about modifying the cluster
members file, see Section 3.5.2.
4. Assume superuser privileges on the new node, and start the Failover Manager
agent. To start the Failover Manager cluster on RHEL 6.x or CentOS 6.x, assume
superuser privileges, and invoke the command:
service efm-3.4 start
To start the Failover Manager cluster on RHEL 7.x or CentOS 7.x, assume
superuser privileges, and invoke the command:
systemctl start efm-3.4
When the new node joins the cluster, Failover Manager will send a notification
to the administrator email provided in the user.email property, and/or will
invoke the specified notification script.
Please note: To be a useful Standby for the current node, the node must be a
standby in the PostgreSQL Streaming Replication scenario.
4.1.3 Changing the Priority of a Standby
If your Failover Manager cluster includes more than one Standby server, you can
use the efm set-priority command to influence the promotion priority of a
Standby node. Invoke the command on any existing member of the Failover Manager
cluster, and specify a priority value after the IP address of the member.
For example, the following command instructs Failover Manager that the acctg
cluster member that is monitoring 10.0.1.9:7800 is the primary Standby (1):
efm set-priority acctg 10.0.1.9:7800 1
In the event of a failover, Failover Manager will first retrieve information
from Postgres streaming replication to confirm which Standby node has the most
recent data, and promote the node with the least chance of data loss. If two
Standby nodes contain equally up-to-date data, the node with a higher
user-specified priority value will be promoted to Master. To check the priority
value of your Standby nodes, use the command:
efm cluster-status cluster_name
Please note: The promotion priority may change if a node becomes isolated from
the cluster, and later re-joins the cluster.
4.1.4 Promoting a Failover Manager Node
You can invoke efm promote on any node of a Failover Manager cluster to start a
manual promotion of a Standby database to Master database.
Manual promotion should only be performed during a maintenance window for your
database cluster. If you do not have an up-to-date Standby database available,
you will be prompted before continuing. To start a manual promotion, assume the
identity of efm or the OS superuser, and invoke the command:
efm promote cluster_name [-switchover]
[-sourcenode <address>] [-quiet]
Where:
cluster_name is the name of the Failover Manager cluster.
Include the -switchover option to reconfigure the original Master as a Standby.
If you include the -switchover keyword, the cluster must include a master node
and at least one standby, and the nodes must be in sync.
Include the -sourcenode keyword to specify the node from which the
recovery.conf file will be copied to the master.
Include the -quiet switch to suppress notifications during switchover.
During switchover:
• A recovery.conf file is copied from an existing standby to the master node.
• The master database is stopped.
• If you are using a VIP, the address is released from the master node.
• A standby is promoted to replace the master node, and acquires the VIP.
• The address of the new master node is added to the recovery.conf file.
• The old master is restarted; the agent will resume monitoring it as a standby.
During a manual promotion, the Master agent releases the virtual IP address
before creating a recovery.conf file in the directory specified by the
db.recovery.conf.dir property. The Master agent remains running, and assumes a
status of Idle.
The Standby agent confirms that the virtual IP address is no longer in use
before pinging a well-known address to ensure that the agent is not isolated
from the network. The Standby agent runs the fencing script and promotes the
Standby database to Master. The Standby agent then assigns the virtual IP
address to the Standby node, and runs the post-promotion script (if
applicable).
Please note that this command instructs the service to ignore the value
specified in the auto.failover parameter in the cluster properties file.
To return a node to the role of master, place the node first in the promotion
list:
efm set-priority cluster_name ip_address priority
Then, perform a manual promotion:
efm promote cluster_name -switchover
For more information about using the efm utility, see Section 5.3.
4.1.5 Stopping a Failover Manager Agent
When you stop an agent, Failover Manager will remove the node's address from
the cluster members list on all of the running nodes of the cluster, but will
not remove the address from the Failover Manager Allowed node host list.
To stop the Failover Manager agent on RHEL 6.x or CentOS 6.x, assume superuser
privileges, and invoke the command:
service efm-3.4 stop
To stop the Failover Manager agent on RHEL 7.x or CentOS 7.x, assume superuser
privileges, and invoke the command:
systemctl stop efm-3.4
Until you invoke the efm disallow-node command (removing the node's address
from the Allowed node host list), you can use the service efm-3.4 start
command to restart the node at a later date without first running the
efm allow-node command again.
Please note that stopping an agent does not signal the cluster that the agent
has failed.
4.1.6 Stopping a Failover Manager Cluster
To stop a Failover Manager cluster, connect to any node of a Failover Manager
cluster, assume the identity of efm or the OS superuser, and invoke the
command:
efm stop-cluster cluster_name
The command will cause all Failover Manager agents to exit. Terminating the
Failover Manager agents completely disables all failover functionality.
Please note: when you invoke the efm stop-cluster command, all authorized node
information is lost from the Allowed node host list.
4.1.7 Removing a Node from a Cluster
The efm disallow-node command removes the IP address of a node from the
Failover Manager Allowed node host list. Assume the identity of efm or the OS
superuser on any existing node (that is currently part of the running cluster),
and invoke the efm disallow-node command, specifying the cluster name and the
IP address of the node:
efm disallow-node cluster_name ip_address
The efm disallow-node command will not stop a running agent; the service will
continue to run on the node until you stop the agent (for information about
controlling the agent, see Section 5). If the agent or cluster is subsequently
stopped, the node will not be allowed to rejoin the cluster, and will be
removed from the failover priority list (and will be ineligible for promotion).
After invoking the efm disallow-node command, you must use the efm allow-node
command to add the node to the cluster again. For more information about using
the efm utility, see Section 5.3.
4.2 Monitoring a Failover Manager Cluster
You can use either the Failover Manager efm cluster-status command or the PEM
Client graphical interface to check the current status of a monitored node of a
Failover Manager cluster.
4.2.1 Reviewing the Cluster Status Report
The cluster-status command returns a report that contains information about the
status of the Failover Manager cluster. To invoke the command, enter:
# efm cluster-status cluster_name
The following status report is for a cluster named efm that has four nodes
running:
efm cluster-status efm
Cluster Status: efm
Agent Type  Address          Agent  DB   VIP
-----------------------------------------------------
Witness     172.19.12.170    UP     N/A
Master      172.19.13.105    UP     UP   172.19.13.107*
Standby     172.19.13.113    UP     UP   172.19.13.106
Standby     172.19.14.106    UP     UP   172.19.13.108
Allowed node host list:
172.19.12.170 172.19.13.113 172.19.13.105 172.19.14.106
Membership coordinator: 172.19.12.170
Standby priority host list:
172.19.13.113 172.19.14.106
Promote Status:
DB Type     Address          XLog Loc      Info
-------------------------------------------------------
Master      172.19.13.105    0/31000140
Standby     172.19.13.113    0/31000140
Standby     172.19.14.106    0/31000140
Standby database(s) in sync with master. It is safe to promote.
The Cluster Status section provides an overview of the status of the agents
that reside on each node of the cluster:
Cluster Status: efm
Agent Type  Address          Agent  DB   VIP
-----------------------------------------------------
Witness     172.19.12.170    UP     N/A
Master      172.19.13.105    UP     UP   172.19.13.107*
Standby     172.19.13.113    UP     UP   172.19.13.106
Standby     172.19.14.106    UP     UP   172.19.13.108
The asterisk (*) after the VIP address indicates that the address is available
for connections. If a VIP address is not followed by an asterisk, the address
has been associated with the node (in the properties file), but the address is
not currently in use.
Failover Manager agents provide the information displayed in the Cluster Status
section.
The Allowed node host list and Standby priority host list provide an easy way
to tell which nodes are allowed to join the cluster, and the promotion order of
the nodes. The IP address of the Membership coordinator is also displayed in
the report:
Allowed node host list:
172.19.12.170 172.19.13.113 172.19.13.105 172.19.14.106
Membership coordinator: 172.19.12.170
Standby priority host list:
172.19.13.113 172.19.14.106
The Promote Status section of the report is the result of a direct query from
the node on which you are invoking the cluster-status command to each database
in the cluster; the query also returns the transaction log location of each
database.
Promote Status:
DB Type     Address          XLog Loc      Info
-------------------------------------------------------
Master      172.19.13.105    0/31000140
Standby     172.19.13.113    0/31000140
Standby     172.19.14.106    0/31000140
Standby database(s) in sync with master. It is safe to promote.
If a database is down (or if the database has been restarted, but the resume
command has not yet been invoked), the state of the agent that resides on that
host will be Idle. If an agent is idle, the cluster status report will include
a summary of the condition of the idle node:
-----------------------------------------------------
Idle        172.19.18.105    UP     UP   172.19.13.105
Exit Codes
The cluster status process returns an exit code that is based on the state of
the cluster:
• An exit code of 0 indicates that all agents are running, and the databases on
  the Master and Standby nodes are running and in sync.
• A non-zero exit code indicates that there is a problem. The following
  problems can trigger a non-zero exit code:
    A database is down or unknown (or has an idle agent).
    Failover Manager cannot decrypt the provided database password.
    There is a problem contacting the databases to get xlog locations.
    There is no Master agent.
    There are no Standby agents.
    One or more Standby nodes are not in sync with the Master.
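Because the report's exit code summarizes cluster health, a scheduled job can
act on it without parsing the report text. The following sketch assumes the efm
utility is installed at the path used elsewhere in this guide; the EFM_BIN
variable and the cluster_summary function are names introduced for this
example.

```shell
#!/bin/sh
# EFM_BIN defaults to the install path named in this guide; override if needed.
EFM_BIN="${EFM_BIN:-/usr/edb/efm-3.4/bin/efm}"

# Print a one-line summary for cluster $1 based only on the exit code
# of the cluster-status command.
cluster_summary() {
    if "$EFM_BIN" cluster-status "$1" >/dev/null 2>&1; then
        echo "OK: $1"
    else
        echo "PROBLEM: $1"
    fi
}
```

A non-zero result does not say which of the listed problems occurred, so a
follow-up run of efm cluster-status is still needed to read the full report.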

4.2.2 Monitoring Streaming Replication with Postgres Enterprise Manager
If you use Postgres Enterprise Manager (PEM) to monitor your servers, you can
configure the Streaming Replication Analysis dashboard (part of the PEM
graphical interface) to display the state of a Master or Standby node that is
part of a Streaming Replication scenario.
Figure 4.1 - The Streaming Replication dashboard (Master node)
The Streaming Replication Analysis dashboard (shown in Figure 4.1) displays statistical
information about activity for any monitored server on which streaming replication is
enabled. The dashboard header identifies the status of the monitored server (either
Replication Master or Replication Slave), and displays the date and time that
the server was last started, the date and time that the page was last updated, and a current
count of triggered alerts for the server.
When reviewing the dashboard for a Replication Slave (a Standby node), a label at the
bottom of the dashboard confirms the status of the server (see Figure 4.2).
Figure 4.2 - The Streaming Replication dashboard (Standby node)
By default, the PEM replication probes that provide information for the Streaming
Replication Analysis dashboard are disabled.

To view the Streaming Replication Analysis dashboard for the Master node of a
replication scenario, you must enable the following probes:
• Streaming Replication
• WAL Archive Status
To view the Streaming Replication Analysis dashboard for the Standby node of a
replication scenario, you must enable the following probes:
• Streaming Replication Lag Time
For more information about PEM, please visit the EnterpriseDB website at:
http://www.enterprisedb.com/products-services-training/products/postgres-enterprise-manager

4.3 Running Multiple Agents on a Single Node
You can monitor multiple database clusters that reside on the same host by running
multiple Master or Standby agents on that Failover Manager node. You may also run
multiple Witness agents on a single node. To configure Failover Manager to monitor
more than one database cluster, while ensuring that Failover Manager agents from
different clusters do not interfere with each other, you must:
1. Create a cluster properties file for each member of each cluster that defines a
unique set of properties and the role of the node within the cluster.
2. Create a cluster members file for each member of each cluster that lists the
members of the cluster.
3. Customize the service script (on a RHEL or CentOS 6.x system) or the unit file
(on a RHEL or CentOS 7.x system) for each cluster to specify the names of the
cluster properties and the cluster members files.
4. Start the services for each cluster.
The examples that follow use two database clusters (acctg and sales) running on the
same node:
• Data for acctg resides in /opt/pgdata1; its server is monitoring port 5444.
• Data for sales resides in /opt/pgdata2; its server is monitoring port 5445.
To run a Failover Manager agent for both of these database clusters, use the
efm.properties.in template to create two properties files. Each cluster properties file
must have a unique name. For this example, we create acctg.properties and
sales.properties to match the acctg and sales database clusters.
The following parameters must be unique in each cluster properties file:
admin.port
bind.address
db.port
db.recovery.conf.dir
virtualIp (if used)
virtualIp.interface (if used)
Within each cluster properties file, the db.port parameter should specify a unique value
for each cluster, while the db.user and db.database parameters may have the same
value or a unique value. For example, the acctg.properties file may specify:

db.user=efm_user
db.password.encrypted=7c801b32a05c0c5cb2ad4ffbda5e8f9a
db.port=5444
db.database=acctg_db
While the sales.properties file may specify:
db.user=efm_user
db.password.encrypted=e003fea651a8b4a80fb248a22b36f334
db.port=5445
db.database=sales_db
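Before starting the agents, it can help to confirm that the two properties files do not share a value for any parameter that must be unique. The following Python sketch is only an illustration: the helper names are hypothetical, the parser handles simple key=value lines only, and the list of keys is taken from the parameters listed above.

```python
# Parameters that the guide says must differ between clusters on one node.
UNIQUE_KEYS = [
    "admin.port",
    "bind.address",
    "db.port",
    "db.recovery.conf.dir",
    "virtualIp",
    "virtualIp.interface",
]


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def find_conflicts(props_a, props_b):
    """Return the unique-only keys that have the same non-empty value."""
    conflicts = []
    for key in UNIQUE_KEYS:
        a = props_a.get(key, "")
        b = props_b.get(key, "")
        if a and a == b:  # blank values (for example, an unused virtualIp) are ignored
            conflicts.append(key)
    return conflicts


ACCTG = "db.user=efm_user\ndb.port=5444\ndb.database=acctg_db\n"
SALES = "db.user=efm_user\ndb.port=5445\ndb.database=sales_db\n"

print(find_conflicts(parse_properties(ACCTG), parse_properties(SALES)))
```

With the sample values above, no conflicts are reported because db.port differs and db.user is allowed to match.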
Some parameters require special attention when setting up more than one Failover
Manager cluster agent on the same node. If multiple agents reside on the same node,
each port must be unique. Any two ports will work, but it may be easier to keep the
information clear if you use ports that are not too close to each other.
When creating the cluster properties file for each cluster, the db.recovery.conf.dir
parameter must also specify a value that is unique for each respective database cluster.
The following parameters are used when assigning the virtual IP address to a node. If
your Failover Manager cluster does not use a virtual IP address, leave these parameters
blank.
virtualIp
virtualIp.interface
virtualIp.prefix
These parameter values are determined by the virtual IP addresses being used and may or
may not be the same for both acctg.properties and sales.properties.
After creating the acctg.properties and sales.properties files, create a service
script or unit file for each cluster that points to the respective property files; this step is
platform specific. If you are using RHEL 6.x or CentOS 6.x, see Section 4.3.1; if you are
using RHEL 7.x or CentOS 7.x, see Section 4.3.2.
Please note: If you are using a custom service script or unit file, you must manually
update the file to reflect the new service name when you upgrade Failover Manager.

4.3.1 RHEL 6.x or CentOS 6.x
If you are using RHEL 6.x or CentOS 6.x, you should copy the efm-3.4 service script to
a new file with a name that is unique for each cluster. For example:
# cp /etc/init.d/efm-3.4 /etc/init.d/efm-acctg
# cp /etc/init.d/efm-3.4 /etc/init.d/efm-sales
Then edit the CLUSTER variable, modifying the cluster name from efm to acctg or
sales.
After creating the service scripts, run:
# chkconfig efm-acctg on
# chkconfig efm-sales on
Then, use the new service scripts to start the agents. For example, you can start the
acctg agent with the command:
# service efm-acctg start

4.3.2 RHEL 7.x or CentOS 7.x
If you are using RHEL 7.x or CentOS 7.x, you should copy the efm-3.4 unit file to a new
file with a name that is unique for each cluster. For example, if you have two clusters
(named acctg and sales), the unit file names might be:
/etc/systemd/system/efm-acctg.service
/etc/systemd/system/efm-sales.service
Then, edit the CLUSTER variable within each unit file, changing the specified cluster
name from efm to the new cluster name. For example, for a cluster named acctg, the
value would specify:
Environment=CLUSTER=acctg
You must also update the value of the PIDFile parameter to specify the new cluster
name. For example:
PIDFile=/var/run/efm-3.4/acctg.pid
After copying the unit files, use the following commands to enable the services:
# systemctl enable efm-acctg.service
# systemctl enable efm-sales.service
Then, use the new services to start the agents. For example, you can start the
acctg agent with the command:
# systemctl start efm-acctg
For information about customizing a unit file, please visit:
http://fedoraproject.org/wiki/Systemd#How_do_I_customize_a_unit_file.2F_add_a_custom_unit_file.3F

5 Controlling the Failover Manager Service
Each node in a Failover Manager cluster hosts a Failover Manager agent that is controlled
by a service script. By default, the service script expects to find:
• A configuration file named efm.properties that contains the properties used by
the Failover Manager service. Each node of a replication scenario must contain a
properties file that provides information about the node.
• A cluster members file named efm.nodes that contains a list of the cluster
members. Each node of a replication scenario must contain a cluster members
list.
Note that if you are running multiple clusters on a single node you will need to manually
create configuration files with cluster-specific names and modify the service script for the
corresponding clusters.
The commands that control the Failover Manager service are platform-specific; for
information about controlling Failover Manager on a RHEL 6.x or CentOS 6.x host, see
Section 5.1. If you are using RHEL 7.x or CentOS 7.x, see Section 5.2.
5.1 Using the service Utility on RHEL 6.x and CentOS 6.x
On RHEL 6.x and CentOS 6.x, Failover Manager runs as a Linux service named (by
default) efm-3.4 that is located in /etc/init.d. Each database cluster monitored by
Failover Manager will run a copy of the service on each node of the replication cluster.
Use the following service commands to control a Failover Manager agent that resides
on a RHEL 6.x or CentOS 6.x host:
service efm-3.4 start
The start command starts the Failover Manager agent on the current node. The
local Failover Manager agent monitors the local database and communicates with
Failover Manager on the other nodes. You can start the nodes in a Failover
Manager cluster in any order.
This command must be invoked by root.

service efm-3.4 stop
Stop the Failover Manager on the current node. This command must be invoked
by root.
service efm-3.4 status
The status command returns the status of the Failover Manager agent on which it
is invoked. You can invoke the status command on any node to instruct
Failover Manager to return status information. For example:
[witness@localhost ~]# service efm-3.4 status
efm-3.4 (pid 50836) is running...
service efm-3.4 help
Display online help for the Failover Manager service script.

5.2 Using the systemctl Utility on RHEL 7.x and CentOS 7.x
On RHEL 7.x and CentOS 7.x, Failover Manager runs as a Linux service named (by
default) efm-3.4.service that is located in /usr/lib/systemd/system. Each
database cluster monitored by Failover Manager will run a copy of the service on each
node of the replication cluster.
Use the following systemctl commands to control a Failover Manager agent that
resides on a RHEL 7.x or CentOS 7.x host:
systemctl start efm-3.4
The start command starts the Failover Manager agent on the current node. The
local Failover Manager agent monitors the local database and communicates with
Failover Manager on the other nodes. You can start the nodes in a Failover
Manager cluster in any order.
This command must be invoked by root.
systemctl stop efm-3.4
Stop the Failover Manager on the current node. This command must be invoked
by root.
systemctl status efm-3.4
The status command returns the status of the Failover Manager agent on which it
is invoked. You can invoke the status command on any node to instruct
Failover Manager to return status and server startup information.
[root@ONE ~]# systemctl status efm-3.4
efm-3.4.service - EnterpriseDB Failover Manager 3.4
Loaded: loaded (/usr/lib/systemd/system/efm-3.4.service; disabled; vendor preset: disabled)
Active: active (running) since Wed 2013-02-14 14:02:16 EST; 4s ago
Process: 58125 ExecStart=/bin/bash -c /usr/edb/efm-3.4/bin/runefm.sh start ${CLUSTER} (code=exited, status=0/SUCCESS)
Main PID: 58180 (java)
CGroup: /system.slice/efm-3.4.service
└─58180 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre/bin/java -cp /usr/edb/efm-3.4/lib/EFM-3.4.0.jar -Xmx128m -agentlib:jdwp=transport...

5.3 Using the efm Utility
Failover Manager provides the efm utility to assist with cluster management. The RPM
installer adds the utility to the /usr/edb/efm-3.4/bin directory when you install
Failover Manager.
efm allow-node cluster_name ip_address
Invoke the efm allow-node command to allow the specified node to join the
cluster. When invoking the command, provide the name of the cluster and the IP
address of the joining node.
This command must be invoked by efm, a member of the efm group, or root.
efm cluster-status cluster_name
Invoke the efm cluster-status command to display the status of a Failover
Manager cluster. For more information about the cluster status report, see Section
4.2.1.
efm cluster-status-json cluster_name
Invoke the efm cluster-status-json command to display the status of a
Failover Manager cluster in JSON format. While the format of the displayed
information is different than the display generated by the efm cluster-status
command, the information source is the same.
The following example is generated by querying the status of a healthy cluster
with a Witness node, a Master node, and a Standby node:
{
  "nodes": {
    "172.16.144.176": {
      "type": "Witness",
      "agent": "UP",
      "db": "N/A",
      "vip": "",
      "vip_active": false
    },
    "172.16.144.177": {
      "type": "Master",
      "agent": "UP",
      "db": "UP",
      "vip": "",
      "vip_active": false,
      "xlog": "2/77000220",
      "xloginfo": ""
    },
    "172.16.144.180": {
      "type": "Standby",
      "agent": "UP",
      "db": "UP",
      "vip": "",
      "vip_active": false,
      "xlog": "2/77000220",
      "xloginfo": ""
    }
  },
  "allowednodes": [
    "172.16.144.177",
    "172.16.144.160",
    "172.16.144.180",
    "172.16.144.176"
  ],
  "membershipcoordinator": "172.16.144.177",
  "failoverpriority": [
    "172.16.144.180"
  ],
  "minimumstandbys": 0,
  "missingnodes": [],
  "messages": []
}
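The JSON form of the report is convenient for scripts. The following Python sketch shows one way to summarize it; the function name is hypothetical, the field names are those that appear in the sample output above, and comparing xlog strings for equality is a simplification chosen for illustration rather than the test that Failover Manager itself applies.

```python
import json


def summarize_status(status_json):
    """Summarize a cluster-status-json document (field names as in the sample)."""
    status = json.loads(status_json)
    nodes = status["nodes"]
    masters = [n for n in nodes.values() if n["type"] == "Master"]
    master_xlog = masters[0]["xlog"] if masters else None
    # Standbys whose reported xlog location differs from the Master's.
    lagging = sorted(
        addr for addr, n in nodes.items()
        if n["type"] == "Standby" and n.get("xlog") != master_xlog
    )
    # Any node whose agent is not reported as UP.
    agents_down = sorted(addr for addr, n in nodes.items() if n["agent"] != "UP")
    return {
        "master_xlog": master_xlog,
        "lagging_standbys": lagging,
        "agents_down": agents_down,
    }


# Trimmed copy of the sample report, used only to exercise the function.
SAMPLE = """
{"nodes": {
   "172.16.144.176": {"type": "Witness", "agent": "UP", "db": "N/A"},
   "172.16.144.177": {"type": "Master", "agent": "UP", "db": "UP",
                      "xlog": "2/77000220"},
   "172.16.144.180": {"type": "Standby", "agent": "UP", "db": "UP",
                      "xlog": "2/77000220"}},
 "missingnodes": [], "messages": []}
"""

print(summarize_status(SAMPLE))
```

In practice the input would be the standard output of efm cluster-status-json cluster_name rather than an embedded string.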
efm disallow-node cluster_name ip_address
Invoke the efm disallow-node command to remove the specified node from
the allowed hosts list, and prevent the node from joining a cluster. Provide the
name of the cluster and the IP address of the node when calling the efm
disallow-node command. This command must be invoked by efm, a member
of the efm group, or root.
efm encrypt cluster_name [--from-env]
Invoke the efm encrypt command to encrypt the database password before
including the password in the cluster properties file. Include the --from-env option
to instruct Failover Manager to use the value specified in the EFMPASS
environment variable, and execute without user input. For more information, see
Section 3.5.1.2.
efm promote cluster_name [-switchover [-sourcenode address] [-quiet]]
The promote command instructs Failover Manager to perform a manual failover
of standby to master.
Manual promotion should only be attempted if the status command reports that
the cluster includes a Standby node that is up-to-date with the Master. If there is
no up-to-date Standby, Failover Manager will prompt you before continuing.

Include the -switchover clause to promote a standby node, and reconfigure a
master node as a standby node. Include the -sourcenode keyword, and specify
a node address to indicate the node whose recovery.conf file will be copied
to the old master node (making it a standby). Include the -quiet keyword to
suppress notifications during the switchover process.
This command must be invoked by efm, a member of the efm group, or root.
Please note that this command instructs the service to ignore the value specified in
the auto.failover parameter in the cluster properties file.
efm resume cluster_name
Invoke the efm resume command to resume monitoring a previously stopped
database. This command must be invoked by efm, a member of the efm group,
or root.
efm set-priority cluster_name ip_address priority
Invoke the efm set-priority command to assign a failover priority to a
standby node. The value specifies the order in which the new node will be used
in the event of a failover. This command must be invoked by efm, a member of
the efm group, or root.
priority is an integer value of 1 to n, where n is the number of standby nodes
in the list. Specify a value of 1 to indicate that the new node is the primary
standby, and will be the first node promoted in the event of a failover. A
priority of 0 instructs Failover Manager to not promote the standby.
efm stop-cluster cluster_name
Invoke the efm stop-cluster command to stop Failover Manager on all nodes.
This command instructs Failover Manager to connect to each node on the cluster
and instruct the existing members to shut down. The command has no effect on
running databases, but when the command completes, there is no failover
protection in place.
Please note: when you invoke the efm stop-cluster command, all authorized
node information is removed from the Allowed node host list.
This command must be invoked by efm, a member of the efm group, or root.
efm upgrade-conf cluster_name [-source directory]
Invoke the efm upgrade-conf command to copy the configuration files from
an existing Failover Manager installation, and add parameters required by a
Failover Manager 3.4 installation. Provide the name of the previous cluster when
invoking the utility. This command must be invoked with root privileges.
If you are upgrading from a Failover Manager configuration that does not use
sudo, include the -source flag and specify the name of the directory in which
the configuration files reside when invoking upgrade-conf.
efm --help
Invoke the efm --help command to display online help for the Failover Manager
utility commands.

6 Controlling Logging
Failover Manager writes and stores one log file per agent and one startup log per agent in
/var/log/cluster_name-3.4 (where cluster_name specifies the name of the
cluster).
You can control the level of detail written to the agent log by modifying the
jgroups.loglevel and efm.loglevel parameters in the cluster properties file:
# Logging levels for JGroups and EFM.
# Valid values are: TRACE, DEBUG, INFO, WARN, ERROR
# Default value: INFO
# It is not necessary to increase these values unless debugging a
# specific issue. If nodes are not discovering each other at
# startup, increasing the jgroups level to DEBUG will show
# information about the TCP connection attempts that may help
# diagnose the connection failures.
jgroups.loglevel=INFO
efm.loglevel=INFO
The logging facilities use the Java logging library and logging levels. The log levels (in
order from most logging output to least) are:
TRACE
DEBUG
INFO
WARN
ERROR
For example, if you set the efm.loglevel parameter to WARN, Failover Manager will
only log messages at the WARN level and above (WARN and ERROR).
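The threshold behavior described above can be expressed as a small Python sketch. This is an illustration of the ordering only, not Failover Manager code; the function name is hypothetical.

```python
# Log levels in order from most output to least, as listed above.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR"]


def levels_logged(threshold):
    """Return the message levels that are written for a given loglevel setting."""
    return LEVELS[LEVELS.index(threshold):]


print(levels_logged("WARN"))   # the WARN setting keeps WARN and ERROR
print(levels_logged("INFO"))   # the default setting keeps INFO, WARN and ERROR
```

Raising jgroups.loglevel or efm.loglevel toward TRACE therefore only ever adds output; it never hides messages that were already being logged.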
By default, Failover Manager log files are rotated daily, compressed, and stored for a
week. You can modify the file rotation schedule by changing settings in the log rotation
file (/etc/logrotate.d/efm-3.4). For more information about modifying the log
rotation schedule, consult the logrotate man page:
$ man logrotate

6.1 Enabling syslog Log File Entries
Failover Manager supports syslog logging. To implement syslog logging, you must
configure syslog to allow UDP or TCP connections.
To allow a connection to syslog, edit the /etc/rsyslog.conf file and uncomment the
protocol you wish to use. You must also ensure that the UDPServerRun or
InputTCPServerRun entry associated with the protocol includes the port number to which log
entries will be sent.
For example, the following configuration file entries enable UDP connections to port
514:
# Provides UDP syslog reception
$ModLoad imudp
$UDPServerRun 514
The following configuration file entries enable TCP connections to port 514:
# Provides TCP syslog reception
$ModLoad imtcp
$InputTCPServerRun 514
After modifying the syslog configuration file, restart the rsyslog service to enable the
connections:
systemctl restart rsyslog.service
After modifying the rsyslog.conf file on the Failover Manager host, you must modify the
Failover Manager properties to enable logging. Use your choice of editor to modify the
properties file (/etc/edb/efm-3.4/efm.properties.in) specifying the type of
logging that you wish to implement:
# Which logging is enabled.
file.log.enabled=true
syslog.enabled=false
You must also specify syslog details for your system. Use the syslog.protocol
parameter to specify the protocol type (UDP or TCP) and the syslog.port parameter to
specify the listener port of the syslog host. The syslog.facility value may be used as an
identifier for the process that created the entry; the value must be between LOCAL0 and
LOCAL7.
# Syslog information. The syslog service must be listening on
# the port for the given protocol, which can be UDP or TCP.
# The facilities supported are LOCAL0 through LOCAL7.
syslog.host=localhost
syslog.port=514
syslog.protocol=UDP
syslog.facility=LOCAL1
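A quick sanity check of these settings can catch typing mistakes before the agent is restarted. The Python sketch below is an illustration only: the function name is hypothetical, the accepted protocol and facility values come from the comments in the properties file above, and the port range check (1 to 65535) is an added assumption based on the valid TCP/UDP port range.

```python
VALID_PROTOCOLS = {"UDP", "TCP"}
VALID_FACILITIES = {"LOCAL%d" % i for i in range(8)}  # LOCAL0 through LOCAL7


def validate_syslog(props):
    """Return a list of problems found in the syslog.* properties."""
    errors = []
    if props.get("syslog.protocol") not in VALID_PROTOCOLS:
        errors.append("syslog.protocol must be UDP or TCP")
    if props.get("syslog.facility") not in VALID_FACILITIES:
        errors.append("syslog.facility must be LOCAL0 through LOCAL7")
    port = props.get("syslog.port", "")
    # Assumption: any valid TCP/UDP port number is acceptable.
    if not port.isdigit() or not 1 <= int(port) <= 65535:
        errors.append("syslog.port must be a port number")
    return errors


settings = {
    "syslog.host": "localhost",
    "syslog.port": "514",
    "syslog.protocol": "UDP",
    "syslog.facility": "LOCAL1",
}
print(validate_syslog(settings))
```

An empty list means that the three checked values are consistent with the constraints described in this section; it does not confirm that the syslog service is actually listening.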
For more information about modifying Failover Manager configuration files, please see
Section 3.5.
For more information about syslog, please see the syslog man page:
man syslog

7 Notifications
Failover Manager will send e-mail notifications and/or invoke a notification script when a
notable event occurs that affects the cluster. If you have configured Failover Manager to
send an e-mail notification, you must have an SMTP server running on port 25 on each
node of the cluster. Use the following parameters to configure notification behavior for
Failover Manager:
user.email
script.notification
For more information about editing the configuration properties, see Section 3.5.1.1.
The body of the notification contains details about the event that triggered the
notification, and about the current state of the cluster. For example:
EFM node: 10.0.1.11
Cluster name: acctg
Database name: postgres
VIP: ip_address (Active|Inactive)
Database health is not being monitored.
The VIP field displays the IP address and state of the virtual IP if implemented for the
node.
Failover Manager assigns a severity level to each notification. The following levels
indicate increasing levels of attention required:
INFO indicates an informational message about the agent and does not require
any manual intervention (for example, Failover Manager has started or stopped).
WARNING indicates that an event has happened that requires the administrator to
check on the system (for example, failover has occurred).
SEVERE indicates that a serious event has happened and requires the immediate
attention of the administrator (for example, failover was attempted, but was
unable to complete).
The severity level designates the urgency of the notification. A notification with a
severity level of SEVERE requires user attention immediately, while a notification with a
severity level of INFO will call your attention to operational information about your
cluster that does not require user action. Notification severity levels are not related to
logging levels; all notifications are sent regardless of the log level detail specified in the
configuration file.

You can use the notification.level property to specify the minimum severity level
that will trigger a notification; for more information, see Section 3.5.1.1.
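The effect of a minimum severity level can be sketched in a few lines of Python. This is an illustration of the ordering INFO, WARNING, SEVERE described above, not the Failover Manager implementation; the function name is hypothetical.

```python
# Severity levels in increasing order of urgency.
SEVERITIES = ["INFO", "WARNING", "SEVERE"]


def should_notify(event_level, minimum_level):
    """True when an event is at or above the configured minimum severity."""
    return SEVERITIES.index(event_level) >= SEVERITIES.index(minimum_level)


# With a minimum level of WARNING, INFO events are suppressed.
for level in SEVERITIES:
    print(level, should_notify(level, "WARNING"))
```

Setting the minimum to INFO therefore delivers every notification, while setting it to SEVERE delivers only those that need immediate attention.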
The conditions listed in the table below will trigger an INFO level notification:
Subject | Description
Executed fencing script | Executed fencing script script_name Results: script_results
Executed post-promotion script | Executed post-promotion script script_name Results: script_results
Executed remote pre-promotion script | Executed remote pre-promotion script script_name Results: script_results
Executed remote post-promotion script | Executed remote post-promotion script script_name
Executed post-database failure script | Executed post-database failure script script_name
Executed master isolation script | Executed master isolation script script_name Results: script_results
Witness agent running on node_address for cluster cluster_name | Witness agent is running.
Master agent running on node_address for | Master agent is running and database health is being monitored.
Standby agent running on node_address for | Standby agent is running and database health is being monitored.
Idle agent running on node node_address for cluster cluster_name | Idle agent is running. After starting the local database, the agent can be resumed.
Assigning VIP to node node_address | Assigning VIP VIP_address to node node_address
Releasing VIP from node node_address | Releasing VIP VIP_address from node node_address
Starting auto resume check for cluster cluster_name | The agent on this node will check every auto.resume.period seconds to see if it can resume monitoring the failed database. The cluster should be checked during this time and the agent stopped if the database will not be started again. See the agent log for more details.
Executed agent resumed script | Executed agent resumed script script_name Results: script_results
The conditions listed in the table below will trigger a WARNING level notification:
Subject | Description
Witness agent exited on node_address for | Witness agent has exited.
Master agent exited on node_address for |
Cluster cluster_name notified that master node has left | Failover is disabled for the cluster until the master agent is restarted.
Standby agent exited on node_address for |
Agent exited during promotion on node_address for cluster cluster_name |
Agent exited on node_address for cluster cluster_name | The agent has exited. This is generated by an agent in the Idle state.
Agent exited for cluster cluster_name | The agent has exited. This notification is usually generated during startup when an agent exits before startup has completed.
Virtual IP address assigned to non-master node | The virtual IP address appears to be assigned to a non-master node. To avoid any conflicts, Failover Manager will release the VIP. You should confirm that the VIP is assigned to your master node and manually reassign the address if it is not.
Virtual IP address not assigned to master node. | The virtual IP address appears to not be assigned to a master node. EDB Postgres Failover Manager will attempt to reacquire the VIP.
No standby agent in cluster for cluster cluster_name | The standbys on cluster_name have left the cluster.
Standby agent failed for cluster cluster_name | A standby agent on cluster_name has left the cluster, but the coordinator has detected that the standby database is still running.
Standby database failed for cluster cluster_name | A standby agent has signaled that its database has failed. The other nodes also cannot reach the standby database.
Standby agent cannot reach database for | A standby agent has signaled database failure, but the other nodes have detected that the standby database is still running.
Cluster cluster_name has dropped below three nodes | At least three nodes are required for full failover protection. Please add witness or agent node to the cluster.
Subset of cluster cluster_name disconnected from master | This node is no longer connected to the majority of the cluster cluster_name. Because this node is part of a subset of the cluster, failover will not be attempted. Current nodes that are visible are: node_address
Promotion has started on cluster cluster_name. | The promotion of a standby has started on cluster cluster_name.
Witness failure for cluster cluster_name | Witness running at node_address has left the cluster.
Idle agent failure for cluster cluster_name. | Idle agent running at node_address has left the cluster.
One or more nodes isolated from network for | This node appears to be isolated from the network. Other members seen in the cluster are: node_name
Node no longer isolated from network for cluster cluster_name. | This node is no longer isolated from the network.
Standby agent tried to promote, but master DB is still running | The standby EFM agent tried to promote itself, but detected that the master DB is still running on node_address. This usually indicates that the master EFM agent has exited. Failover has NOT occurred.
Standby agent started to promote, but master has rejoined. | The standby EFM agent started to promote itself, but found that a master agent has rejoined the cluster. Failover has NOT occurred.
Standby agent tried to promote, but could not verify master DB | The standby EFM agent tried to promote itself, but could not detect whether or not the master DB is still running on node_address. Failover has NOT occurred.
Standby agent tried to promote, but VIP appears to still be assigned | not because the virtual IP address (VIP_address) appears to still be assigned to another node. Promoting under these circumstances could cause data corruption.
Standby agent tried to promote, but appears to be orphaned | not because the well-known server (server_address) could not be reached. This usually indicates a network issue that has separated the standby agent from the other agents. Failover has NOT occurred.
Failover has not occurred | An agent has detected that the master database is no longer available in cluster cluster_name, but there are no standby nodes available for failover.
Potential manual failover required on cluster cluster_name. | A potential failover situation was detected for cluster cluster_name. Automatic failover has been disabled for this cluster, so manual intervention is required.
Failover has completed on cluster cluster_name | Failover has completed on cluster cluster_name.
Lock file for cluster cluster_name has been removed | The lock file for cluster cluster_name has been removed from: path_name on node node_address. This lock prevents multiple agents from monitoring the same cluster on the same node. Please restore this file to prevent accidentally starting another agent for cluster.
recovery.conf file for cluster cluster_name has been found | A recovery.conf file for cluster cluster_name has been found at: path_name on master node node_address. This may be problematic should you attempt to restart the DB on this node.
recovery_target_timeline is not set to latest in recovery.conf | The recovery_target_timeline parameter is not set to latest in the recovery.conf file. The standby server will not be able to follow a timeline change that occurs when a new master is promoted.
trigger_file path given in recovery.conf is not writable | The path provided for the trigger_file parameter in the recovery.conf file is not writable by the db_service_owner user. Failover Manager will not be able to promote the database if needed.
Promotion has not occurred for cluster cluster_name | A promotion was attempted but there is already a node being promoted: ip_address.
Standby not reconfigured after failover in | The auto.reconfigure property has been set to false for this node. The node has not been reconfigured to follow the new master node after a failover.
Could not resume replay for cluster cluster_name | Could not resume replay for standby being promoted. Manual intervention may be required. Error: error_description This error is returned if the server encounters an error when invoking replay during the promotion of a standby.
Could not resume replay for standby standby_id. | Could not resume replay for standby. Manual intervention may be required. Error: error_message.
Possible problem with database timeout values | Your remote.timeout value (value) is higher than your local.timeout value (value). If the local database takes too long to respond, the local agent could assume that the database has failed though other agents can connect. While this will not cause a failover, it could force the local agent to stop monitoring, leaving you without failover protection.
No standbys available for promotion in cluster cluster_name | The current number of standby nodes in the cluster has dropped to the minimum number: number. There cannot be a failover unless another standby node(s) is added or made promotable.
Custom monitor timeout for cluster cluster_name | The following custom monitoring script has timed out: script_name
Custom monitor 'safe mode' failure for cluster cluster_name | The following custom monitor script has failed, but is being run in "safe mode": script_name. Output: script_results
The conditionslistedin the table belowwill triggera SEVERE notification:
Subject Description
Unable to connect to DB on node_address The maximum connections limit has been reached.
Unable to connect to DB on node_address Invalid password for db.user=user_name.
Unable to connect to DB on node_address Invalid authorization specification.
Master cannot ping local database for cluster
cluster_name
The master agent can no longer reach the local database
running at node_address. Other nodes are able to
access the database remotely, so the master will not
release the VIP and/or create a recovery.conf file.
The master agent will become idle until the resume
command is run to resume monitoring the database.
Fencing script error Fencing script script_name failed to execute
successfully.
Exit Value: exit_code
Post-promotion script failed Post-promotion script script_name failed to execute
successfully.
Remote-post-promotion script failed Remote-post-promotion script script_name failed to
execute successfully
Node: node_address
Remote-pre-promotion script failed Remote-pre-promotion script script_name failed to
execute successfully
Node: node_address
Post-database failure script error Post-database failure script script_name failed to
execute successfully.
Agent resumed script error Agent resumed script script_name failed to execute
successfully.
Master isolation script failed Master isolation script script_name failed to execute
successfully.
Could not promote standby The trigger file file_name could not be created on node.
Could not promote standby. Error details:
message_details
Error creating recovery.conf file on
node_address for cluster cluster_name
There was an error creating the recovery.conf file on
master node node_address during promotion.
Promotion has continued, but requires manual
intervention to ensure that the old master node can not be
restarted. Error details: message_details

83
An unexpected error has occurred for cluster
cluster_name
An unexpected error has occurred on this node. Please
check the agent log for more information. Error:
error_details
Master database being fenced off for cluster
cluster_name
The master database has been isolated from the majority
of the cluster. The cluster is telling the master agent at
ip_address to fence off the master database to prevent
two masters when the rest of the failover manager cluster
promotes a standby.
Isolated master database shutdown. The isolated master database has been shutdown by
failover manager.
Master database being fenced off for cluster
cluster_name
The master database has been isolated from the majority
of the cluster. Before the master could finish detecting
isolation, a standby was promoted and has rejoined this
node in the cluster. This node is isolating itself to avoid
more than one master database.
Could not assign VIP to node node_address Failover manager could not assign the VIP address for
some reason.
master_or_standby database failure for
The database has failed on the specified node.
Agent is timing out for cluster
cluster_name
This agent has timed out trying to reach the local
database. After the timeout, the agent could successfully
ping the database and has resumed monitoring. However,
the node should be checked to make sure it is performing
normally to prevent a possible database or agent failure.
Resume timed out for cluster cluster_name This agent could not resume monitoring after
reconfiguring and restarting the local database. See agent
log for details.
Internal state mismatch for cluster
cluster_name
The failover manager cluster's internal state did not
match the actual state of the cluster members. This is rare
and can be caused by a timing issue of nodes joining the
cluster and/or changing their state. The problem should
be resolved, but you should check the cluster status as
well to verify. Details of the mismatch can be found in
the agent log file.
Failover has not occurred An agent has detected that the master database
is no longer available in cluster cluster_name, but
there are not enough standby nodes available for
failover.
Database in wrong state on node_address The standby agent has detected that the local database is
no longer in recovery. The agent will now become idle.
Manual intervention is required.
Database in wrong state on node_address The master agent has detected that the local database is in
recovery. The agent will now become idle. Manual
intervention is required.
Database connection failure for cluster
cluster_name
This node is unable to connect to the database running
on: node_address
Until this is fixed, failover may not work properly
because this node will not be able to check if the database
is running or not.
Standby custom monitor failure for cluster
cluster_name
The following custom monitor script has failed on a
standby node.
The agent will stop monitoring the local database.
Script location: script_name
Script output: script_results
Master custom monitor failure for cluster
cluster_name
The following custom monitor script has failed on a
master node.
EFM will attempt to promote a standby.
Script location: script_name
Script output: script_results
property_name set to true for master node The property_name property has been set to true for
this cluster. Stopping the master agent without stopping
the entire cluster will be treated by the rest of the cluster
as an immediate master agent failure. If maintenance is
required on the master database, shut down the master
agent and wait for a notification from the remaining
nodes that failover will not happen.
Load balancer attach script error Load balancer attach script script_name failed to
execute successfully.
Load balancer detach script error Load balancer detach script script_name failed to
execute successfully.
Please note: In addition to sending notices to the administrative email address, all
notifications are recorded in the cluster log file
(/var/log/efm-3.4/cluster_name.log).

8 Supported Failover and Failure Scenarios
Failover Manager monitors a cluster for failures that may or may not result in failover.
Failover Manager supports a very specific and limited set of failover scenarios. Failover
can occur:
• if the Master database crashes or is shut down.
• if the node hosting the Master database crashes or becomes unreachable.
Failover Manager makes every attempt to verify the accuracy of these conditions. If
agents cannot confirm that the Master database or node has failed, Failover Manager will
not perform any failover actions on the cluster.
Failover Manager also supports a no auto-failover mode for situations where you want
Failover Manager to monitor and detect failover conditions, but not perform an automatic
failover to a Standby. In this mode, a notification is sent to the administrator when
failover conditions are met. To disable automatic failover, modify the cluster properties
file, setting the auto.failover parameter to false (see Section 3.5.1.1).
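Before relying on this mode, it can help to confirm what the properties file actually contains. The following is a minimal sketch, not part of Failover Manager: efm_prop is a hypothetical helper, and the example path assumes a properties file named efm.properties in the /etc/edb/efm-3.4 directory.

```shell
#!/bin/sh
# efm_prop FILE KEY
# Hypothetical helper (not shipped with Failover Manager): print the value
# assigned to KEY in a Java-style properties file. Comment lines are skipped
# because they do not start with the key; if the key appears more than once,
# the last assignment is printed.
efm_prop() {
    file=$1
    # Escape dots so that a key such as auto.failover is matched literally.
    key=$(printf '%s' "$2" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}

# Example (the path is an assumption; use your cluster's properties file):
#   efm_prop /etc/edb/efm-3.4/efm.properties auto.failover
```

If the helper prints false, automatic failover is disabled and the cluster will only send notifications when failover conditions are met.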
Failover Manager will alert an administrator to situations that require administrator
intervention, but that do not merit promoting a Standby database to Master.

8.1 Master Database is Down
If the agent running on the Master database node detects a failure of the Master database,
Failover Manager begins the process of confirming the failure (see Figure 8.1).
Figure 8.1 - Confirming the Failure of the Master Database.
If the agent on the Master node detects that the Master database has failed, all agents
attempt to connect directly to the Master database. If an agent can connect to the
database, Failover Manager sends a notification about the state of the Master node. If no
agent can connect, the Master agent declares database failure and releases the VIP (if
applicable).
If no agent can reach the virtual IP address or the database server, Failover Manager
starts the failover process. The Standby agent on the most up-to-date node runs a fencing
script (if applicable), promotes the Standby database to Master database, and assigns the
virtual IP address to the Standby node. Any additional Standby nodes are configured to
replicate from the new master unless auto.reconfigure is set to false. If applicable,
the agent runs a post-promotion script.
Returning the Node to the Cluster
To recover from this scenario without restarting the entire cluster, you should:
1. Restart the database on the original Master node as a Standby database.
2. Invoke the efm resume command on the original Master node.
Returning the Node to the Role of Master
After returning the node to the cluster as a Standby, you can easily return the node to the
role of Master:
1. If the cluster has more than one Standby node, use the efm allow-node
command to set the node's failover priority to 1.
2. Invoke the efm promote -switchover command to promote the node to its
original role of Master node. For more information about the command, please
see Section 5.3.
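The two steps for returning a node to the role of Master can be sketched as a small script. This is an illustration under stated assumptions, not a supported tool: run and return_to_master are hypothetical helpers, the efm path is the 3.4 installation directory named in Section 9, and the priority step uses efm set-priority (the command Section 9.2 names for moving a node to the front of the standby list). Confirm the exact command and argument order in Section 5.3 before use.

```shell
#!/bin/sh
# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Location of the efm utility for version 3.4 (see Section 9).
EFM=/usr/edb/efm-3.4/bin/efm

# return_to_master CLUSTER NODE_ADDRESS
# Hypothetical helper. Assumes the node has already rejoined the cluster as a
# Standby (database restarted as a Standby, then efm resume invoked).
return_to_master() {
    cluster=$1
    node=$2
    # Step 1: put the node first in the failover priority list (assumed syntax).
    run "$EFM" set-priority "$cluster" "$node" 1 || return 1
    # Step 2: switch over so that the node becomes the Master again.
    run "$EFM" promote "$cluster" -switchover
}

# Example, printing the commands without running them:
#   DRY_RUN=1 return_to_master efm 146.148.46.44
```

Running with DRY_RUN=1 first lets you review the commands before they touch a live cluster.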

8.2 Standby Database is Down
If a Standby agent detects a failure of its database, the agent notifies the other agents; the
other agents confirm the state of the database (see Figure 8.2).
Figure 8.2 - Confirming the failure of a Standby Database.
After returning the Standby database to a healthy state, invoke the efm resume
command to return the Standby to the cluster.

8.3 Master Agent Exits or Node Fails
If the Failover Manager Master agent crashes or the node fails, a Standby agent will
detect the failure and (if appropriate) initiate a failover (see Figure 8.3).
Figure 8.3 - Confirming the failure of the Master Agent.
If an agent detects that the Master agent has left, all agents attempt to connect directly to
the Master database. If any agent can connect to the database, an agent sends a
notification about the failure of the Master agent. If no agent can connect, the agents
attempt to ping the virtual IP address to determine if it has been released.
If no agent can reach the virtual IP address or the database server, Failover Manager
starts the failover process. The Standby agent on the most up-to-date node runs a fencing
script (if applicable), promotes the Standby database to Master database, and assigns the
virtual IP address to the Standby node; if applicable, the agent runs a post-promotion
script. Any additional Standby nodes are configured to replicate from the new master
unless auto.reconfigure is set to false.
If this scenario has occurred because the master has been isolated from the network, the
Master agent will detect the isolation, release the virtual IP address, and create the
recovery.conf file. Failover Manager will perform the previously listed steps on the
remaining nodes of the cluster.
To recover from this scenario without restarting the entire cluster, you should:
1. Restart the original Master node.
2. Bring the original Master database up as a Standby node.
3. Start the service on the original Master node.
Please note that stopping an agent does not signal the cluster that the agent has failed.

8.4 Standby Agent Exits or Node Fails
If a Standby agent exits or a Standby node fails, the other agents will detect that it is no
longer connected to the cluster.
Figure 8.4 - Failure of Standby Agent.
When the failure is detected, the agents attempt to contact the database that resides on the
node; if the agents confirm that there is a problem, Failover Manager sends the
appropriate notification to the administrator.
If there is only one Master and one Standby remaining, there is no failover protection in
the case of a Master node failure. In the case of a Master database failure, the Master and
Standby agents can agree that the database failed and proceed with failover.

8.5 Dedicated Witness Agent Exits / Node Fails
The following scenario details the actions taken if a dedicated Witness (a node that is not
hosting a database) fails.
Figure 8.5 - Confirming the Failure of a dedicated Witness.
When an agent detects that the Witness node cannot be reached, Failover Manager
notifies the administrator of the state of the Witness (see Figure 8.5).
Note: If there is only one Master and one Standby remaining, there is no failover
protection in the case of a Master node failure. In the case of a Master database failure,
the Master and Standby agents can agree that the database failed and proceed with
failover.

8.6 Nodes Become Isolated from the Cluster
The following scenario details the actions taken if one or more nodes (a minority of the
cluster) become isolated from the majority of the cluster.
Figure 8.6 - If members of the cluster become isolated.
If one or more nodes (but less than half of the cluster) become isolated from the rest of
the cluster, the remaining cluster behaves as if the nodes have failed. The agents attempt
to discern if the Master node is among the isolated nodes; if it is, the Master fences itself off
from the cluster, while a Standby node (from within the cluster majority) is promoted to
replace it. Other Standby nodes are configured to replicate from the new master unless
auto.reconfigure is set to false.
Failover Manager then notifies an administrator, and the isolated nodes rejoin the cluster
when they are able. When the nodes rejoin the cluster, the failover priority may change.

9 Upgrading an Existing Cluster
Failover Manager provides a utility to assist you when upgrading a Failover Manager
cluster. To upgrade an existing cluster, you must:
1. Install Failover Manager 3.4 on each node of the cluster. For detailed information
about installing Failover Manager, see Section 3.
2. After installing Failover Manager, invoke the efm upgrade-conf utility to create
the .properties and .nodes files for Failover Manager 3.4. The Failover
Manager installer adds the upgrade utility (efm upgrade-conf) to the
/usr/edb/efm-3.4/bin directory. To invoke the utility, assume root
privileges, and invoke the command:
efm upgrade-conf cluster_name
The efm upgrade-conf utility locates the .properties and .nodes files of
pre-existing clusters and copies the parameter values to a new configuration file
for use by Failover Manager. The utility saves the updated copy of the
configuration files in the /etc/edb/efm-3.4 directory.
3. Modify the .properties and .nodes files for EFM 3.4, specifying any new
preferences. Version 3.4 of Failover Manager adds the following configuration
properties:
master.shutdown.as.failure
Use your choice of editor to modify any additional properties in the properties file
(located in the /etc/edb/efm-3.4 directory) before starting the service for that
node. For detailed information about property settings, see Section 3.5.
4. Use a version-specific command to stop the old Failover Manager cluster; for
example, you can use the following command to stop a version 3.4 cluster:
/usr/efm-3.4/bin/efm stop-cluster efm
5. Start the new Failover Manager service (efm-3.4) on each node of the cluster.
For more information about starting the service, see Section 4.1.1.
The following example demonstrates invoking the upgrade utility to create the
.properties and .nodes files for a Failover Manager installation:
[root@ONE efm-3.4]# /usr/edb/efm-3.4/bin/efm upgrade-conf example
Checking directory /etc/edb/efm-3.4
Processing example.properties file
jvm.options property value updated to "-Xmx128m".
The following properties were added in addition to those in
previous installed version:
master.shutdown.as.failure
Checking directory /etc/edb/efm-3.3
Processing example.nodes file
Upgrade of files is finished. The owner and group for
properties and nodes files have been set as 'efm'.
If you are using a Failover Manager configuration without sudo, include the -source
flag and specify the name of the directory in which the configuration files reside when
invoking upgrade-conf. If the directory is not the configuration default directory, the
upgraded files will be created in the directory from which the upgrade-conf command
was invoked. For more information, see Section 3.4.1.
Please note: If you are using a custom service script or unit file, you must manually
update the file to reflect the new Failover Manager service name when you perform an
upgrade.

9.1 Un-installing Failover Manager
After upgrading to Failover Manager 3.4, you can use Yum to remove previous
installations of Failover Manager. For example, use the following command to remove
Failover Manager 3.3 and any unneeded dependencies:
yum remove edb-efm33

9.2 Performing a Database Update (Minor Version)
This section describes how to perform a quick minor database version upgrade. You can
use the steps that follow to upgrade from one minor version to another (for example, from
10.1.5 to version 10.2.7), or to apply a patch release for a version.
You should first update the database server on each Standby node of the Failover
Manager cluster. Then, perform a switchover, promoting a Standby node to the role of
Master within the Failover Manager cluster. Then, perform a database update on the old
master node.
On each node of the cluster you must perform the following steps to update the database
server:
1. Stop the Failover Manager agent.
2. Stop the database server.
3. Update the database server.
4. Start the database service.
5. Start the Failover Manager agent.
For detailed information about controlling the Advanced Server service, or upgrading
your version of Advanced Server, please see the EDB Postgres Advanced Server Guide,
available at:
https://www.enterprisedb.com/resources/product-documentation
When your updates are complete, you can use the efm set-priority command to add
the old master to the front of the standby list, and then switch over to return the cluster to
its original state. For more information about efm set-priority, see Section 5.3.
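The five per-node steps above can be sketched as a script. This is a hedged illustration only: run and update_node are hypothetical helpers, the service names efm-3.4 and edb-as-10 are taken from elsewhere in this guide, the use of the service command assumes a SysV-style host such as CentOS 6.x, and the update command itself is supplied by you because it depends on how the database server was installed.

```shell
#!/bin/sh
# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# update_node UPDATE_COMMAND...
# Hypothetical helper: wraps the caller's update command in the stop/start
# sequence described above, stopping at the first step that fails.
update_node() {
    run service efm-3.4 stop || return 1          # 1. stop the Failover Manager agent
    run /etc/init.d/edb-as-10 stop || return 1    # 2. stop the database server
    run "$@" || return 1                          # 3. update the database server
    run /etc/init.d/edb-as-10 start || return 1   # 4. start the database service
    run service efm-3.4 start                     # 5. start the Failover Manager agent
}

# Example, printing the steps without running them (the package name is a
# placeholder, not a real package):
#   DRY_RUN=1 update_node yum update my-db-package
```

Run it on each Standby first, perform the switchover, and then run it on the old master, as described at the start of this section.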

10 Troubleshooting
If you receive a notification message about an unexpected error, check the
Failover Manager log file (see Section 6) for an OutOfMemory message. Failover
Manager runs with the default memory value set by this property:
# Extra information that will be passed to the JVM when starting
# the agent.
jvm.options=-Xmx128m
If you are running with less than 128 megabytes allocated, you should increase the value
and restart the Failover Manager agent.
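A quick way to perform the log check is sketched below. log_has_oom is a hypothetical helper, and the example path is an assumption based on the cluster log location given in Section 7 (/var/log/efm-3.4/cluster_name.log); see Section 6 for the log files written on your system.

```shell
#!/bin/sh
# log_has_oom LOGFILE
# Hypothetical helper: succeeds (exit status 0) if the log file contains an
# OutOfMemory message, and fails otherwise.
log_has_oom() {
    grep -q 'OutOfMemory' "$1"
}

# Example; replace cluster_name with the name of your cluster:
#   if log_has_oom /var/log/efm-3.4/cluster_name.log; then
#       echo "Consider raising -Xmx in jvm.options and restarting the agent"
#   fi
```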
Failover Manager is tested with OpenJDK; we strongly recommend using OpenJDK.
You can use the following command to check the type of your Java installation:
# java -version
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)

11 Appendix A - Configuring Streaming Replication
The following section will walk you through the process of configuring a simple two-node
replication scenario that uses streaming replication to replicate data from a Master
node to a Standby node. The replication process for larger scenarios can be complex; for
detailed information about configuration options, please see the PostgreSQL core
documentation, available at:
https://www.postgresql.org/docs/10/static/warm-standby.html#streaming-replication
In the example that follows, we will use a .pgpass file to enable md5 authentication for
the replication user; this may or may not be the safest authentication method for your
environment. For more information about the supported authentication options, please
see the PostgreSQL core documentation at:
https://www.postgresql.org/docs/10/static/client-authentication.html
The steps that follow configure a simple streaming replication scenario with one Master
node and one Standby node, each running an installation of EDB Postgres Advanced
Server. In the example:
• The Master node resides on 146.148.46.44
• The Standby node resides on 107.178.217.178
• The replication user name is edbrepuser.
The pathnames and commands referenced in the examples are for Advanced Server hosts
that reside on a CentOS 6.x host; you may have to modify paths and commands for your
configuration.
Configuring the Master Node
Connect to the master node of the replication scenario, and modify the pg_hba.conf
file (located in the data directory under your Postgres installation), adding connection
information for the replication user (in our example, edbrepuser):
host replication edbrepuser 107.178.217.178/32 md5
The connection information should specify the address of the standby node of the
replication scenario, and your preferred authentication method.
Modify the postgresql.conf file (located in the data directory, under your
Postgres installation), adding the following replication parameters and values to
the end of the file:
wal_level = hot_standby
max_wal_senders = 8
wal_keep_segments = 128
archive_mode = on
archive_command = 'cp %p /tmp/%f'
Save the configuration file and restart the server:
/etc/init.d/edb-as-10 restart
Use the sudo su - command to assume the identity of the enterprisedb database
superuser:
sudo su - enterprisedb
Then, start a psql session, connecting to the edb database:
/opt/edb/as10/bin/psql -d edb
At the psql command line, create a user with the replication attribute:
CREATE ROLE edbrepuser WITH REPLICATION LOGIN PASSWORD
'password';
Configuring the Standby Node
Connect to the Standby server, and assume the identity of the database superuser
(enterprisedb):
sudo su - enterprisedb
With your choice of editor, create a .pgpass file in the home directory of the
enterprisedb user. The .pgpass file holds the password of the replication user in
plain-text form; if you are using a .pgpass file, you should ensure that only trusted
users have access to the .pgpass file.
Add an entry that specifies connection information for the replication user:
*:5444:*:edbrepuser:password
The server will enforce restrictive permissions on the .pgpass file; use the following
command to set the file permissions:
chmod 600 .pgpass
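If you prefer to script the creation of the password file, the following sketch writes the entry and restricts the permissions in one step. write_pgpass is a hypothetical helper; the entry uses the hostname:port:database:username:password layout shown above, with wildcards for the host and database.

```shell
#!/bin/sh
# write_pgpass FILE PORT USER PASSWORD
# Hypothetical helper: append a wildcard-host, wildcard-database entry to a
# .pgpass file and make sure that only the owner can read or write the file.
write_pgpass() {
    pgpass_file=$1
    # Create the file (if missing) with owner-only permissions before any
    # password is written to it.
    ( umask 077 && : >> "$pgpass_file" ) || return 1
    chmod 600 "$pgpass_file" || return 1
    printf '*:%s:*:%s:%s\n' "$2" "$3" "$4" >> "$pgpass_file"
}

# Example, run as the enterprisedb user:
#   write_pgpass "$HOME/.pgpass" 5444 edbrepuser password
```

Creating the file under a restrictive umask avoids a window in which the password is readable by other users.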
Relinquish the identity of the database superuser:
exit
Then, assume superuser privileges:
sudo su -
You must stop the database server before replacing the data directory on the Standby
node with the data directory of the Master node. Use the command:
/etc/init.d/edb-as-10 stop
Then, delete the data directory on the Standby node:
rm -rf /opt/edb/as10/data
After deleting the existing data directory, move into the bin directory and use the
pg_basebackup utility to copy the data directory of the Master node to the Standby:
cd /opt/edb/as10/bin
./pg_basebackup -R -D /opt/edb/as10/data \
--host=146.148.46.44 --port=5444 \
--username=edbrepuser --password
The call to pg_basebackup specifies the IP address of the Master node and the name of
the replication user created on the Master node. For more information about the options
available with the pg_basebackup utility, see the PostgreSQL core documentation at:
https://www.postgresql.org/docs/10/static/app-pgbasebackup.html
When prompted by pg_basebackup, provide the password associated with the
replication user.
After copying the data directory, change ownership of the directory to the database
superuser (enterprisedb):
chown -R enterprisedb /opt/edb/as10/data
Navigate into the data directory:
cd /opt/edb/as10/data
With your choice of editor, create a file named recovery.conf (in the
/opt/edb/as10/data directory) that includes:
standby_mode = on
primary_conninfo = 'host=146.148.46.44 port=5444 user=edbrepuser sslmode=prefer sslcompression=1 krbsrvname=postgres'
trigger_file = '/opt/edb/as10/data/mytrigger'
restore_command = '/bin/true'
recovery_target_timeline = 'latest'
The primary_conninfo parameter specifies connection information for the replication
user on the master node of the replication scenario.
Change ownership of the recovery.conf file to enterprisedb:
chown enterprisedb:enterprisedb recovery.conf
Modify the postgresql.conf file (located in the data directory, under the Postgres
installation), specifying the following values at the end of the file:
wal_level = hot_standby
max_wal_senders = 8
wal_keep_segments = 128
hot_standby = on
The data file has been copied from the Master node, and will contain the replication
parameters specified previously.
Then, start the server:
/etc/init.d/edb-as-10 start
At this point, the Master node will be replicating data to the Standby node.
Confirming Replication from the Master to Standby
You can confirm that the server is running and replicating by entering the command:
ps -ef | grep postgres
If replication is running, the Standby server will echo:
501 42054 1 0 07:57 pts/1 00:00:00 /opt/PostgresPlus/9.2AS/bin/edb-postgres -D /opt/PostgresPlus/9.2AS/data
501 42055 42054 0 07:57 ? 00:00:00 postgres: logger process
501 42056 42054 0 07:57 ? 00:00:00 postgres: startup process recovering 000000010000000000000004
501 42057 42054 0 07:57 ? 00:00:00 postgres: checkpointer process
501 42058 42054 0 07:57 ? 00:00:00 postgres: writer process
501 42059 42054 0 07:57 ? 00:00:00 postgres: stats collector process
501 42060 42054 0 07:57 ? 00:00:00 postgres: wal receiver process streaming 0/4000150
501 42068 42025 0 07:58 pts/1 00:00:00 grep postgres
If you connect to the Standby with the psql client and query the
pg_is_in_recovery() function, the server will reply:
edb=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
Any entries made to the Master node will be replicated to the Standby node. The
Standby node will operate in read-only mode; while you can query the Standby server,
you will not be able to add entries directly to the database that resides on the Standby
node.
Manually Invoking Failover
To promote the Standby to become the Master node, assume the identity of the cluster
owner (enterprisedb):
sudo su - enterprisedb
Then, invoke pg_ctl:
/opt/edb/as10/bin/pg_ctl promote -D /opt/edb/as10/data/
Then, if you connect to the Standby node with psql, the server will confirm that it is no
longer a standby node:
edb=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
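The pg_is_in_recovery() check shown before and after promotion can be wrapped in a small script. This is a sketch under assumptions: node_role is a hypothetical helper, the psql path assumes the Advanced Server 10 bin directory used in this appendix, and the script must run as a user that can connect to the edb database without a prompt.

```shell
#!/bin/sh
# Path to psql; assumed to be in the Advanced Server 10 bin directory.
PSQL=${PSQL:-/opt/edb/as10/bin/psql}

# node_role
# Hypothetical helper: print "standby" if the local server is in recovery,
# or "master" if it is not.
node_role() {
    # -A (unaligned) and -t (tuples only) reduce the output to "t" or "f".
    flag=$("$PSQL" -d edb -At -c 'select pg_is_in_recovery();') || return 1
    case "$flag" in
        t) echo standby ;;
        f) echo master ;;
        *) echo "unexpected result: $flag" >&2; return 1 ;;
    esac
}

# Example:
#   node_role
```

On the Standby node in this example, the helper would be expected to report standby before pg_ctl promote is invoked and master afterwards.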
For more information about configuring and using streaming replication, please see the
PostgreSQL core documentation, available at:
https://www.postgresql.org/docs/10/static/warm-standby.html#streaming-replication

11.1 Limited Support for Cascading Replication
While Failover Manager does not provide full support for cascading replication, it does
provide limited support for simple failover in a cascading replication scenario.
Cascading replication allows a Standby node to stream to another Standby node, reducing
the number of connections (and processing overhead) to the master node.
For detailed information about configuring cascading replication, please see the
PostgreSQL documentation at:
https://www.postgresql.org/docs/10/static/warm-standby.html#cascading-replication
To use Failover Manager in a cascading replication scenario, you should modify the
cluster properties file, setting the following property values on Standby Node #2:
promotable=false
auto.reconfigure=false
In the event of a Failover, Standby Node #1 will be promoted to the role of Master node.
Should failover occur, Standby Node #2 will continue to act as a read-only replica for the
new Master node until you take actions to manually reconfigure the replication scenario
to contain 3 nodes.
In the event of a failure of Standby Node #1, you will not have failover protection, but
you will receive an email notifying you of the failure of the node.
Please note that performing a switchover and switchback to the original master may not
preserve the cascading replication scenario.

12 Appendix B - Configuring SSL Authentication on a Failover Manager Cluster
The following steps enable SSL authentication for Failover Manager. Please note that all
connecting clients will be required to use SSL authentication when connecting to any
database server within the cluster; you will be required to modify the connection methods
currently used by existing clients.
To enable SSL on a Failover Manager cluster, you must:
1. Place a server.crt and server.key file in the data directory (under your
Advanced Server installation). You can purchase a certificate signed by an authority,
or create your own self-signed certificate. For information about creating a self-signed
certificate, see the PostgreSQL core documentation at:
https://www.postgresql.org/docs/10/static/ssl-tcp.html#ssl-certificate-creation
2. Modify the postgresql.conf file on each database within the Failover Manager
cluster, enabling SSL:
ssl=on
After modifying the postgresql.conf file, you must restart the server.
3. Modify the pg_hba.conf file on each node of the Failover Manager cluster, adding
the following line to the beginning of the file:
hostnossl all all all reject
The line instructs the server to reject any connections that are not using SSL
authentication; this enforces SSL authentication for any connecting clients. For
information about modifying the pg_hba.conf file, see the PostgreSQL core
documentation at:
https://www.postgresql.org/docs/10/static/auth-pg-hba-conf.html
4. After placing the server.crt and server.key file in the data directory, convert the
certificate to a form that Java understands; you can use the command:
openssl x509 -in server.crt -out server.crt.der -outform der
For more information, see:
https://jdbc.postgresql.org/documentation/94/ssl-client.html
5. Then, add the certificate to the Java trusted certificates file:
keytool -keystore $JAVA_HOME/lib/security/cacerts -alias
alias_name -import -file server.crt.der
Where
$JAVA_HOME is the home directory of your Java installation.
alias_name can be any string, but must be unique for each certificate.
You can use the keytool command to review a list of the available certificates or
retrieve information about a specific certificate. For more information about using
the keytool command, enter:
man keytool
The certificate from each database server must be imported into the trusted
certificates file of each agent. Note that the location of the cacerts file may vary on
each system. For more information, visit:
https://jdbc.postgresql.org/documentation/94/ssl-client.html
6. Modify the efm.properties file on each node within the cluster, setting the
jdbc.sslmode property.
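Because the certificate from each database server must be imported on each agent, steps 4 and 5 are usually repeated several times. The loop below is a sketch of that repetition, not a supported tool: run and import_certs are hypothetical helpers, and deriving the alias from the certificate file name is an assumption that only works if you first copy each server.crt to a uniquely named file (for example, node1.crt). The openssl and keytool options are the ones shown in the steps above.

```shell
#!/bin/sh
# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# import_certs KEYSTORE CERT...
# Hypothetical helper: convert each certificate to DER form and import it
# into the Java trusted certificates file under an alias taken from the
# file name (node1.crt is imported with the alias node1).
import_certs() {
    keystore=$1
    shift
    for crt in "$@"; do
        alias_name=$(basename "$crt" .crt)
        run openssl x509 -in "$crt" -out "$crt.der" -outform der || return 1
        run keytool -keystore "$keystore" -alias "$alias_name" \
            -import -file "$crt.der" || return 1
    done
}

# Example, printing the commands without running them:
#   DRY_RUN=1 import_certs "$JAVA_HOME/lib/security/cacerts" node1.crt node2.crt
```

When run without DRY_RUN, keytool prompts for the keystore password and asks you to confirm each certificate.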

13 Inquiries
If you have any questions regarding EDB Postgres Failover Manager, please contact
EnterpriseDB at:
sales@enterprisedb.com
