2011-03-06

switchover for the common people

Streaming Replication in PostgreSQL 9.0 made it a joy to install a hot standby server. It gets easier if you use the excellent 2ndQuadrant clusterware repmgr. The main challenge is coming up with a proper naming schema which avoids using „master“ or „standby“, as all participants of replication are just playing roles, which may be switched; and setting up proper authorizations, ssh-keys and all.

Still there is one area, where many of the documentation point it out as an excercise for the reader: when doing the switchover, so the former standby becomes the new master – how do we inform all database users?

One solution is to have PgBouncer or PgPool running on a third server (as „running on one of the database servers“ will take PgBouncer / PgPool down when that server goes down), and have clients connect to the database via pgbouncer / pgpool 
The nice things:
  • all the additional powers of those tools – i.e. pooling connections: read the linked websites for details and more powers
  • no heckling with network infrastructure (IP, DNS)
  • at least pgbouncer is also available on windows
The not-so-nice things:
  • neading another server (that also should be kept redundand, giving an additional level to the switchover-challenge as in „switching over to the backup pgbouncer/pgpool)
  • having another tool to configure and maintain
  • authorization and encryption become more challenging through pgbouncer/pgpool being an involved third party
The second idea is to deal with the switchover on IP level: the new master can take over the IP-address of the old master

Nice about this: Clients can connect to the old IP-address

Challenges: 
  • make sure the old master is really taken of the network or switched to a third IP-address (so no quick testing with "pg_ctl stop" on the master)
  • is the DBA allowed to change IP addresses (network infrastructure may be handled by a totally different department / company)
  • how quick and reliable are the updates of the IP-MAC-caches in switches / routers, especially in WAN environments
  • hassle with database-monitoring software which checks on both roles (and be it simply PgAdmin refreshing the "SELECT * FROM repl_status")
  • are the used backup solutions provable happy with that kind of IP-changes?
Third option is doing the switchover using the dns-server: have a DNS-name-alias „actualmasterdb“ for the active server, have that alias point to the actual master server – hence the name :)
Challenges:
  • is the DBA qualified to reconfigure the DNS-server ( DNS-server may be running on a different operating system / or be an appliance)
  • how to script the reconfiguration of the DNS-Server 
  • is the DBA allowed to reconfigure the DNS-server (network infrastructure in other department / company)
  • how to update all DNS-caches accross the network (especially on the clients, intensly looking at you, windows DNS name cache)
  • can database monitoring cope with the changes? 
  • double check that backup knows whom it is backuping
The three presented solutions make you pay their – and especially those „network infrastructure in different deparment“ may be a prohibitive situation.
libpq to the rescue: since 2000-10-17, the option of a pg_service.conf file is within libpq.  

To use it, use the following receipt:
  1. create a pg_service.conf file
    # comment: ncc1701 is master at stardate 2011-03-06 
    [myactivedatabase]
    host=ncc1701
    port=5432
    database=criticalmission
  2. put that pg_service.conf on a network file share 
  3. point the environment variable PGSYSCONFDIR  to that directory (on windows setenv may come in handy), alternatively point PGSERVICEFILE to the pg_service.conf 
  4. connect to your active database via a connection parameter of „service=myactivedatabase“ in any libpq application
  5. on switchover, simply change the pg_service.conf file so that „host“ points to the promoted standby. Have the clients build up a new connection (as the old connection will be broke anyway, since the old master is gone)
  6. done.
Challenges of this solution:
  • not every libpq-application supports entering service=servicename for its connection parameters (that's a limit of those applcations connection handlers, not of libpq!); allthough, there is allways hope that expand_dbname is non-zero and you can enter „service=servicename“ for the database-name and libpq will parse it correctly 
  • solutions have to be found what to do if the file-share goes down (have a local copy of the pg_service.conf, how to update that ...) But in practice most enterprises come to a halt anyway when strategic .ppt-files - also on fileshares - are no longer available; so somebody else has allready taken care of the fileshare high availability.
disclaimer: I am the owner-manager of 2ndQuadrant Deutschland GmbH, the German branch of the international professional PostgreSQL service company 2ndQuadrant

2 comments:

  1. I think one other potential solution, to easily "swtich" the IP address, is to have a cluster manager that manages zero services but still assigns a virtual IP to the master and standby cluster. You don't need to set STONITH up because there are no services. And the (virtual) IP never changes. Of course the assumption here is that the cluster manager can be started at all with zero services.

    I have had a similar, but more a detailed, setup involving DRBD and pgbouncer in production for over 8 months.

    - Ramanujan

    ReplyDelete
  2. I especially like the .pg_service.conf solution, for the same reasons that you describe.

    Let me just add a pointer to the latest docs:
    http://www.postgresql.org/docs/9.0/static/libpq-pgservice.html

    ReplyDelete