2011-03-08

quick hack to roughly transfer table structures from Oracle to PostgreSQL

Sometimes you just wish to mirror some data in PostgreSQL, which is stored by rules you cannot influence in Oracle. (For full-scale migration also check out ora2pg, and you can also order professional migration services :) )

To transfer only a small number of table structures, I found some lines of Python really handy:
import cx_Oracle # I recommend using the unicode variant
cn=cx_Oracle.connect(u"user",u"pwd",u"tnsname")
cs=cn.cursor()
# just select one row from the table we want to mirror
# bonus point: with naming the columns we also can mirror only part of that table
cs.execute("select * from tabletomirror where rownum < 2")

# this defines a mapping between cx_Oracle types and PostgreSQL types.
# as the title states, we are talking about "roghly"
# improving the "numeric" by using precision, scale etc. is left as an exercise for the reader
# (as is mapping CLOBS and whatever else there might be .. :) )
ora2pg= {cx_Oracle.NUMBER:"numeric", cx_Oracle.UNICODE:"text", 
         cx_Oracle.STRING:"text",cx_Oracle.DATETIME:"timestamp"}


# cs.description contains a good-enough description of the oracle
# table structure
# YES, primary keys etc. are not transferred automatically ...
# that's why I call it a quick hack 
collist=[]
for oname, otyp, display_size, internal_size, \
   precision, scale, null_ok in cs.description:
    collist.append("{oname} {otype}".format(
                        oname=oname, otype=ora2pg[otyp]))

createtablecommand = """create table tabnameinpgsql as ({0});""".format(
                     ", ".join(collist))
print ctablecommand

2011-03-06

switchover for the common people

Streaming Replication in PostgreSQL 9.0 made it a joy to install a hot standby server. It gets easier if you use the excellent 2ndQuadrant clusterware repmgr. The main challenge is coming up with a proper naming schema which avoids using „master“ or „standby“, as all participants of replication are just playing roles, which may be switched; and setting up proper authorizations, ssh-keys and all.

Still there is one area, where many of the documentation point it out as an excercise for the reader: when doing the switchover, so the former standby becomes the new master – how do we inform all database users?

One solution is to have PgBouncer or PgPool running on a third server (as „running on one of the database servers“ will take PgBouncer / PgPool down when that server goes down), and have clients connect to the database via pgbouncer / pgpool 
The nice things:
  • all the additional powers of those tools – i.e. pooling connections: read the linked websites for details and more powers
  • no heckling with network infrastructure (IP, DNS)
  • at least pgbouncer is also available on windows
The not-so-nice things:
  • neading another server (that also should be kept redundand, giving an additional level to the switchover-challenge as in „switching over to the backup pgbouncer/pgpool)
  • having another tool to configure and maintain
  • authorization and encryption become more challenging through pgbouncer/pgpool being an involved third party
The second idea is to deal with the switchover on IP level: the new master can take over the IP-address of the old master

Nice about this: Clients can connect to the old IP-address

Challenges: 
  • make sure the old master is really taken of the network or switched to a third IP-address (so no quick testing with "pg_ctl stop" on the master)
  • is the DBA allowed to change IP addresses (network infrastructure may be handled by a totally different department / company)
  • how quick and reliable are the updates of the IP-MAC-caches in switches / routers, especially in WAN environments
  • hassle with database-monitoring software which checks on both roles (and be it simply PgAdmin refreshing the "SELECT * FROM repl_status")
  • are the used backup solutions provable happy with that kind of IP-changes?
Third option is doing the switchover using the dns-server: have a DNS-name-alias „actualmasterdb“ for the active server, have that alias point to the actual master server – hence the name :)
Challenges:
  • is the DBA qualified to reconfigure the DNS-server ( DNS-server may be running on a different operating system / or be an appliance)
  • how to script the reconfiguration of the DNS-Server 
  • is the DBA allowed to reconfigure the DNS-server (network infrastructure in other department / company)
  • how to update all DNS-caches accross the network (especially on the clients, intensly looking at you, windows DNS name cache)
  • can database monitoring cope with the changes? 
  • double check that backup knows whom it is backuping
The three presented solutions make you pay their – and especially those „network infrastructure in different deparment“ may be a prohibitive situation.
libpq to the rescue: since 2000-10-17, the option of a pg_service.conf file is within libpq.  

To use it, use the following receipt:
  1. create a pg_service.conf file
    # comment: ncc1701 is master at stardate 2011-03-06 
    [myactivedatabase]
    host=ncc1701
    port=5432
    database=criticalmission
  2. put that pg_service.conf on a network file share 
  3. point the environment variable PGSYSCONFDIR  to that directory (on windows setenv may come in handy), alternatively point PGSERVICEFILE to the pg_service.conf 
  4. connect to your active database via a connection parameter of „service=myactivedatabase“ in any libpq application
  5. on switchover, simply change the pg_service.conf file so that „host“ points to the promoted standby. Have the clients build up a new connection (as the old connection will be broke anyway, since the old master is gone)
  6. done.
Challenges of this solution:
  • not every libpq-application supports entering service=servicename for its connection parameters (that's a limit of those applcations connection handlers, not of libpq!); allthough, there is allways hope that expand_dbname is non-zero and you can enter „service=servicename“ for the database-name and libpq will parse it correctly 
  • solutions have to be found what to do if the file-share goes down (have a local copy of the pg_service.conf, how to update that ...) But in practice most enterprises come to a halt anyway when strategic .ppt-files - also on fileshares - are no longer available; so somebody else has allready taken care of the fileshare high availability.
disclaimer: I am the owner-manager of 2ndQuadrant Deutschland GmbH, the German branch of the international professional PostgreSQL service company 2ndQuadrant