Project

General

Profile

How To Configure Warm Standby with PITRTools » History » Version 37

Ivan Lezhnjov, 05/08/2013 07:17 AM

1 1 Ivan Lezhnjov
{{toc}}
2
3 20 Ivan Lezhnjov
h1. How To Configure Warm Standby with pitrtools
4 1 Ivan Lezhnjov
5 20 Ivan Lezhnjov
This how-to demonstrates how to configure a warm standby using PostgreSQL 8.4 and pitrtools.
6 1 Ivan Lezhnjov
7 20 Ivan Lezhnjov
h1. What Is pitrtools and Why Would You Want To Use It?
8 1 Ivan Lezhnjov
9 20 Ivan Lezhnjov
Essentially, pitrtools is a wrapper around standard tools, such as rsync and PostgreSQL's internal functionality, that makes, among other things, creating and managing standby configurations and subsequent failover to a standby a snap.
10 1 Ivan Lezhnjov
11 20 Ivan Lezhnjov
With the help of pitrtools you could do more, namely:
12 1 Ivan Lezhnjov
* secure shipping of log files to configured standby server over SSL protected link;
13
* streaming replication;
14
* enable/disable archiving without the need to restart PostgreSQL;
15
* stay informed by generating alerts based on various levels of severity of events happening during the process on both ends of configuration;
16
* automatically take a base backup, including table spaces, restore archives and purge old ones (if PostgreSQL >8.3);
17
* failover to the latest restore point and point-in-time recovery (restore using timestamps);
18
* etc.
19
20 20 Ivan Lezhnjov
Obviously, pitrtools attempts to make things simpler, more secure and easier to manage.
21 1 Ivan Lezhnjov
22
h1. Naming Conventions
23
24
Master server can be referred to also as archiver.
25
Slave is often called standby.
26
27
These names are used interchangeably.
28
29
h1. The Test Setup
30
31 20 Ivan Lezhnjov
It's a good idea to first setup pitrtools and play with it before you actually go ahead and change configuration of your production servers. To show how pitrtools is configured and work, I'll describe the entire process using a 2 hosts test setup as an example.
32 1 Ivan Lezhnjov
33
So, there are 2 hosts named @bitarena@ and @bitarena-clone@, both being virtualized instances of Debian Squeeze. I assume you're experienced enough to install Debian yourself and know how to find your way around the system.
34
35 20 Ivan Lezhnjov
@bitarena@ is designated role of master server aka archiver, @bitarena-clone@ is a slave aka standby server. pitrtools is installed on both hosts, but each host uses different tools from the package.
36 1 Ivan Lezhnjov
37
h1. The Process
38
39 20 Ivan Lezhnjov
These are major steps in actual order of execution that one would have to follow to get pitrtools-enabled setup running:
40 1 Ivan Lezhnjov
41
On master server
42
43
		* Turn on archiving
44 21 Ivan Lezhnjov
		* Install helper scripts
45 1 Ivan Lezhnjov
		* @cmd_archiver -C $CONFIG -I@
46
47
On slave server
48
49
		* @cmd_standby -C $CONFIG -I@
50
		* @cmd_standby -C $CONFIG -B@
51
		* @cmd_standby -C $CONFIG -S@
52
53 20 Ivan Lezhnjov
h1. Installing and Configuring pitrtools
54 1 Ivan Lezhnjov
55 23 Ivan Lezhnjov
If your system doesn't have a package for pitrtools download a tarball from pitrtools project page at https://public.commandprompt.com/projects/pitrtools/wiki.
56 20 Ivan Lezhnjov
57 24 Ivan Lezhnjov
Installing pitrtools is a matter of unpacking an archive file. The recommended default is `postgres' home directory.
58
59
You can put it anywhere in your filesystem as long as the files belong to and can be accessed by `postgres'.
60 1 Ivan Lezhnjov
61 20 Ivan Lezhnjov
h1. SSH Key-based Login for pitrtools
62 1 Ivan Lezhnjov
63 20 Ivan Lezhnjov
pitrtools relies on rsync and SSH heavily to do its work, e.g. making a base backup and shipping WAL log files from master to slave server -- all that happens over an SSL-protected communication channel. This is the area where pitrtools makes one's life easier, because otherwise you'd have to find a way to copy log files to a slave server somehow (most likely it would be some sort of networking mount point).
64 1 Ivan Lezhnjov
65 5 Ivan Lezhnjov
Therefore one of the prerequisites is to configure SSH key-based logins between the two hosts for @postgres@ system user that don't require a password or a passphrase.
66 1 Ivan Lezhnjov
67
*Master*
68
<pre>root@bitarena:~# su - postgres
69
70
postgres@bitarena:~$ ssh-keygen -t rsa
71
... 
72
postgres@bitarena:~$ ls -la /var/lib/postgresql/.ssh/
73
total 16
74
drwx------ 2 postgres postgres 4096 Sep 28 09:17 .
75
drwxr-xr-x 5 postgres postgres 4096 Sep 28 09:17 ..
76
-rw------- 1 postgres postgres 1675 Sep 28 09:17 id_rsa
77
-rw-r--r-- 1 postgres postgres  399 Sep 28 09:17 id_rsa.pub
78
postgres@bitarena:~#
79
</pre>
80
81
*Slave*
82
<pre>root@bitarena-clone:~# su - postgres
83
postgres@bitarena-clone:~$ ssh-keygen -t rsa
84
... </pre>
85
86
Answer all the questions and ssh-keygen will create both private and public (*.pub file) RSA keys. For now just create the key pair, one on on each host (master and slave). When done, proceed to exchanging public keys between the hosts like this:
87
88
*Master*
89 18 Ivan Lezhnjov
<pre>postgres@bitarena:~$ ssh-copy-id bitarena-clone
90
Now try logging into the machine, with "ssh 'bitarena-clone'", and check in:
91 1 Ivan Lezhnjov
92 18 Ivan Lezhnjov
  .ssh/authorized_keys
93 1 Ivan Lezhnjov
94 18 Ivan Lezhnjov
to make sure we haven't added extra keys that you weren't expecting.
95 1 Ivan Lezhnjov
96
postgres@bitarena:~$ ssh bitarena-clone
97 18 Ivan Lezhnjov
Linux bitarena-clone 2.6.32-5-686 #1 SMP Sun May 6 04:01:19 UTC 2012 i686
98 1 Ivan Lezhnjov
99
The programs included with the Debian GNU/Linux system are free software;
100
the exact distribution terms for each program are described in the
101
individual files in /usr/share/doc/*/copyright.
102
103
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
104
permitted by applicable law.
105 18 Ivan Lezhnjov
Last login: Thu Oct 18 06:07:22 2012 from bitarena.localdomain
106
postgres@bitarena-clone:~$ less .ssh/authorized_keys
107
postgres@bitarena-clone:~$ </pre>
108 1 Ivan Lezhnjov
109 18 Ivan Lezhnjov
This will copy @postgres@ system user's public SSH key on master to standby into a file @authorized_keys@, which will allow @postgres@ on master to login to slave host without a password and also in a secure fashion.
110 1 Ivan Lezhnjov
111 19 Ivan Lezhnjov
Remember that this has been done on master server only. In a similar manner, you should take care of the slave host.
112 1 Ivan Lezhnjov
113 20 Ivan Lezhnjov
It's worth noting that pitrtools runs some actions remotely. For example, when a base backup action is run on a slave host, cmd_standby script establishes SSH session to a master host and runs various psql commands to deal with checkpoints, copy log files, etc. This also requires PostgreSQL user password to access the database, and it is often needed to be entered 4 and more times for a base backup action to complete. 
114 1 Ivan Lezhnjov
115 20 Ivan Lezhnjov
Once you played around with pitrtools enough to get hang of things, you could avoid having to enter password manually each time by either using .pgpass file (which is a "standard feature of PostgreSQL":http://www.postgresql.org/docs/8.4/static/libpq-pgpass.html) or make sure there's a "trust relationship configured for localhost in @pg_hba.conf@ file":http://www.postgresql.org/docs/8.4/static/auth-methods.html.
116 19 Ivan Lezhnjov
117
For the sake of clarity, consider these examples:
118
119
.pgpass
120
<pre>
121
127.0.0.1:*:postgres:postgres:postgrespass
122
</pre>
123
124
pg_hba.conf
125
<pre>
126
host postgres postgres 127.0.0.1/32 trust
127
</pre>
128 1 Ivan Lezhnjov
129
In the console output examples you're going to see below I entered password manually, but this, of course, is really inconvenient in production environment.
130
131
h1. Master Configuration File
132
133 20 Ivan Lezhnjov
Sample configuration files are pretty good as they are with the defaults they come. Chances are you're not going to change a lot in there. Make sure you read all *.README files, though, because they contain helpful extra information about pitrtools and configuration parameters that will help you decide how to best configure your servers.
134 1 Ivan Lezhnjov
135 20 Ivan Lezhnjov
We start with the master host by editing pitrtools configuration file for master/archiver.
136 1 Ivan Lezhnjov
137 27 Ivan Lezhnjov
<pre>postgres@bitarena:~$ vim pitrtools/etc/cmd_archiver.ini
138 1 Ivan Lezhnjov
	
139
	[DEFAULT]
140
	; online or offline
141
	state: online					
142
	
143
	; The base database directory
144
	
145
	pgdata: /var/lib/postgresql/8.4/main
146
	
147
	; where to remotely copy archives
148
	r_archivedir: /var/lib/postgresql/archive
149
	
150
	; where to locally copy archives
151
	l_archivedir: /var/lib/postgresql/archive
152 25 Ivan Lezhnjov
153
	; where is ssh and what flags to add
154
	ssh: /usr/bin/ssh
155 1 Ivan Lezhnjov
	
156
	; where is rsync				
157
	rsync_bin: /usr/bin/rsync			
158
	
159
	; extra rsync flags
160
	rsync_flags: -z
161
	
162
	; option 2 or 3, if running RHEL5 or similar it is likely 2
163
	; if you are running something that ships remotely modern software
164
	; it will be 3
165
	
166
	rsync_version = 3
167
	
168
	; IP of slave					
169
	slaves: bitarena-clone
170
	
171 25 Ivan Lezhnjov
	; the user for the ssh connection to a standby
172 1 Ivan Lezhnjov
	user: postgres				
173
	
174 25 Ivan Lezhnjov
	; if rsync can't connect in 10 seconds error
175 1 Ivan Lezhnjov
	ssh_timeout: 10
176
	
177
	; command to process in ok					
178
	notify_ok: echo OK
179
	
180
	; command to process in warning
181
	notify_warning:  echo WARNING
182
	
183
	; command to process in critical
184
	notify_critical: echo CRITICAL
185
	
186
	; if you want to debug on/off only		 	
187
	debug: on
188
	
189
	; if you want ssh debug (warning noisy)
190
	ssh_debug: off
191
</pre>
192
193
194 20 Ivan Lezhnjov
Note that you can use domain names instead of IP addresses for slaves: parameter, it works either way just fine. You might also want to turn debugging on while you're learning pitrtools. It helps to see what software does, if you try to understand better how it works.
195 1 Ivan Lezhnjov
196 11 Ivan Lezhnjov
h2. Turn On and Configure Archiving
197 1 Ivan Lezhnjov
198 20 Ivan Lezhnjov
If you are not using Streaming Replication / Hot Standby, we need to turn on archiving functionality in PostgreSQL. This is purely a PostgreSQL feature, but we can also take advantage of using pitrtools thanks to flexible design of PostgreSQL.
199 2 Ivan Lezhnjov
200
<pre>postgres@bitarena:~$ vim /etc/postgresql/8.4/main/postgresql.conf
201
202
...
203
archive_mode=on
204 26 Ivan Lezhnjov
archive_command = '/var/lib/postgresql/pitrtools/bin/cmd_archiver -C /var/lib/postgresql/pitrtools/etc/cmd_archiver.ini -F %p'
205 2 Ivan Lezhnjov
...
206
</pre>
207
208
209 20 Ivan Lezhnjov
Here we explicitly turn archiving mode on and tell PostgreSQL to use @cmd_archiver@, part of pitrtools, as the archiving command. 
210 2 Ivan Lezhnjov
It must be provided with a path to a configuration file (@-C@ switch), and a path to a log file (@-F@ switch). @%p@ is substituted by PostgreSQL with the actual log file location in the file system. To learn more about how to use @cmd_archiver@ run
211
212 20 Ivan Lezhnjov
<pre>postgres@bitarena:~$ pitrtools/bin/cmd_archiver --help</pre>
213 2 Ivan Lezhnjov
214
215 20 Ivan Lezhnjov
Please, keep in mind that pitrtools isn't supposed to be run by @root@ user. Technically, it can be run by any system user other than root, and unless you have a customized configuration your default PostgreSQL user will be @postgres@, so pitrtools are expected to be run as that system user.
216 2 Ivan Lezhnjov
217
Restart PostgreSQL for changes to take effect.
218
219
<pre>postgres@bitarena:~$ /etc/init.d/postgresql restart
220
Restarting PostgreSQL 8.4 database server: main.
221
postgres@bitarena:~#
222
</pre>
223
224 12 Ivan Lezhnjov
h2. Install Helper Scripts
225 2 Ivan Lezhnjov
226 20 Ivan Lezhnjov
Apply cmd_standby.sql to the database of pitrtools user (usually @postgres@). This is required for master server only.
227 2 Ivan Lezhnjov
228 29 Ivan Lezhnjov
<pre>postgres@bitarena:~$ psql -U postgres < ~/pitrtools/scripts/cmd_standby.sql
229 2 Ivan Lezhnjov
CREATE FUNCTION
230
COMMENT
231
CREATE FUNCTION
232
CREATE FUNCTION
233
CREATE FUNCTION
234
COMMENT
235
CREATE FUNCTION
236
COMMENT
237 29 Ivan Lezhnjov
postgres@bitarena:~$</pre>
238 2 Ivan Lezhnjov
239 28 Ivan Lezhnjov
>Note: If you are running Postgres 9.2, use cmd_standby.92.sql
240 2 Ivan Lezhnjov
241 12 Ivan Lezhnjov
h2. Initialize Master Environment
242 2 Ivan Lezhnjov
243
At this point we're pretty much done configuring the master server. All that is left to do is initialize master environment by running
244
245 30 Ivan Lezhnjov
<pre>postgres@bitarena:~$ pitrtools/bin/cmd_archiver -C pitrtools/etc/cmd_archiver.ini -I
246 2 Ivan Lezhnjov
We are initializing queues, one moment.
247
248
NOTICE: init_env_func()
249
NOTICE: generate_slave_list_func()
250
NOTICE: Your slaves are: ['bitarena-clone']
251
postgres@bitarena:~$ ls -lah
252
total 40K
253
drwxr-xr-x  7 postgres postgres 4.0K Sep 28 10:25 .
254
drwxr-xr-x 34 root     root     4.0K Sep 27 02:32 ..
255
drwxr-xr-x  3 postgres postgres 4.0K Sep 24 15:10 8.4
256
drwx------  2 postgres postgres 4.0K Sep 27 02:23 .aptitude
257
drwxr-xr-x  3 postgres postgres 4.0K Sep 28 10:25 archive
258
-rw-------  1 postgres postgres 1001 Sep 28 10:03 .bash_history
259 20 Ivan Lezhnjov
drwxr-xr-x  2 postgres postgres 4.0K Sep 28 10:25 pitrtools
260 2 Ivan Lezhnjov
-rw-------  1 postgres postgres 1.4K Sep 28 05:08 .psql_history
261
drwx------  2 postgres postgres 4.0K Sep 28 09:29 .ssh
262
-rw-------  1 postgres postgres 3.6K Sep 28 10:25 .viminfo
263
postgres@bitarena:~$ ls -lah archive/bitarena-clone/
264
total 8.0K
265
drwxr-xr-x 2 postgres postgres 4.0K Sep 28 10:25 .
266
drwxr-xr-x 3 postgres postgres 4.0K Sep 28 10:25 ..
267
postgres@bitarena:~#</pre>
268
269
270
As you can see @-I@ switch tells @cmd_archiver@ to do a couple of internal actions, as well as prepare file system layout by creating necessary directories. 
271
272
@archive/@ directory has been created automatically by @cmd_archiver@ and contains sub-directory named after IP address or DNS name of a slave, @bitarena-clone@ in this example. This sub-directory is used in those circumstances when master fails to successfully transfer WAL log files to slave and stores them temporarily in this local directory. Once slave is back online, these files should be transferred to slave.
273
274
Effectively, after you've initialized master it starts trying to ship WAL files to slave. However, the slave host isn't configured yet, so the log delivery will fail and the log files should be found in @l_archivedir/slave_FQDNorIP/@ folder on master host. Once we configure slave, these will be shipped as soon as a new WAL segment is created on master.
275
276
h1. Slave aka Standby Server
277
278
Now we can prepare the slave host. SSH key-based login for @postgres@ system user should be working in both directions, from master to slave, as well as from slave to master.
279
280
Assuming you've arranged for this as it was suggested before, we can now go on and edit slave configuration file to perform first important step and initialize slave environment.
281
282 5 Ivan Lezhnjov
One of the configuration file parameters, namely @pgdata:@, will ask you to specify PostgreSQL data directory. You could look it up in @postgresql.conf@, or, If your PostgreSQL is already running, you could see this information like this:
283 2 Ivan Lezhnjov
284 31 Ivan Lezhnjov
<pre>postgres@bitarena-clone:~$ ps axuwf |grep postgre
285 2 Ivan Lezhnjov
postgres  3352  0.0  0.4  46452  5464 ?        S    09:59   0:03 /usr/lib/postgresql/8.4/bin/postgres 
286
-D /var/lib/postgresql/8.4/main -c config_file=/etc/postgresql/8.4/main/postgresql.conf
287
...</pre>
288
289
Remember, note down the path somewhere or copy it to the clipboard. We'll need it in a minute.
290
291 12 Ivan Lezhnjov
h2. Slave Configuration File
292 2 Ivan Lezhnjov
293
Now edit slave configuration file. Most defaults are good and safe options, but your setup may be different from what I have in this how-to. So, be careful to understand what you're doing.
294
295 33 Ivan Lezhnjov
<pre>postgres@bitarena-clone:~$ vim pitrtools/etc/cmd_standby.ini
296 32 Ivan Lezhnjov
[DEFAULT]
297 1 Ivan Lezhnjov
298 32 Ivan Lezhnjov
; what major version are we using?
299
pgversion: 8.4
300 1 Ivan Lezhnjov
301 32 Ivan Lezhnjov
; Used for PostgreSQL version 8.2 and below, should be set 
302
; to something > than checkpoint_segments on master
303
numarchives: 10
304 1 Ivan Lezhnjov
305 32 Ivan Lezhnjov
; Whether or not to use streaming replication.  If this is set to "on"
306
; PITRTools will configure the standby server to replicate from master
307
; using streaming replication.
308
; This can only be used with PostgreSQL 9.0 and up.
309
use_streaming_replication: no
310 1 Ivan Lezhnjov
311 32 Ivan Lezhnjov
; File to touch to end replication when using streaming replication.
312
trigger_file: /var/lib/postgresql/pitrtools/cmd_end_recovery
313 1 Ivan Lezhnjov
314 32 Ivan Lezhnjov
; User to connect to master DB while using streaming replication,
315
; ignored if not using streaming replication.
316
repl_db_user: replication
317
318
; Password for the user repl_db_user.
319
repl_db_password: replication
320
321
; sslmode to use when connecting for streaming replication.
322
; Accepted values: the same as libpq: disable, allow, prefer, require, verify-ca and verify-full
323
; Default: sslmode: prefer
324
sslmode: prefer
325
326
327
328
; Commands needed for execution
329
330
331
; absolute path to ssh, note you can also add flags such as -P for port.
332
ssh: /usr/bin/ssh
333
334
; absolute path to rsync
335
rsync: /usr/bin/rsync
336
337
; extra rsync flags
338
rsync_flags: -z
339
340
; the path to the postgres bin
341
pg_standby: /usr/lib/postgresql/8.4/bin/pg_standby
342
pg_ctl: /usr/lib/postgresql/8.4/bin/pg_ctl
343
344
; set it for 9.0 and above if streaming replication is enabled
345
pg_archivecleanup: /usr/lib/postgresql/9.0/bin/pg_archivecleanup
346
347
; path to psql on the Master
348
r_psql: /usr/lib/postgresql/8.4/bin/psql
349
350
; Generalized information
351
352
; the port postgresql runs on (master)
353
port: 5432
354
355
; IP or name of Master server
356
master_public_ip: bitarena
357
358
; the IP address we should use when processing remote shell
359
master_local_ip: 127.0.0.1
360
361
; the user performed initdb
362
user: postgres
363
364
; on or off
365
debug: on
366
367
; on or off
368
ssh_debug: off
369
370
; the timeout for ssh before we throw an alarm
371
ssh_timeout: 30
372
373
; should be the same as r_archivedir for archiver
374
archivedir: /var/lib/postgresql/archive
375
376
; where you executed initdb -D to
377
pgdata: /var/lib/postgresql/8.4/main
378
379
380
381
; Confs
382
383
; This is the postgresql.conf to be used for the failover
384
pg_conf_failover: /var/lib/postgresql/pitrtools/etc/pg_conf_failover/postgresql.conf
385
386
; This is the pg_hba.conf to be used for the failover
387
pg_hba_conf_failover: /var/lib/postgresql/pitrtools/etc/pg_conf_failover/pg_hba.conf
388
389
390
; This is the postgresql.conf to be used for Postgres when in standby mode
391
pg_conf: /var/lib/postgresql/pitrtools/etc/pg_conf/postgresql.conf
392
393
; This is the pg_hba.conf to be used for Postgres when in standby mode
394
pg_hba_conf: /var/lib/postgresql/pitrtools/etc/pg_conf/pg_hba.conf
395
396
; By default postgresql.conf and pg_hba.conf will be copied from the
397
; locations specified above to pgdata directory on failover.
398
;
399
; Uncomment the following to make postgres actually use the above conf
400
; files w/o copying them to pgdata.
401
;no_copy_conf: true
402
403
; The recovery.conf file to create when starting up
404
; Defaults to %(pgdata)/recovery.conf
405
recovery_conf: /var/lib/postgresql/8.4/main/recovery.conf
406
407
; Useful when postgresql.conf doesn't specify log destination
408
; Will be passed with -l to pg_ctl when starting the server.
409
;
410
; If you're worried about having complete logs, either make sure
411
; postgresql.conf points to a log file, or use the logfile: parameter.
412
;
413
; Otherwise postgresql will print on standard stdout and nothing
414
; will be recorded in the logs
415
;
416
logfile: /var/log/postgresql/postgresql-8.4-main.log
417
418
419
420
; Alarms
421
422
423
notify_critical: echo CRITICAL
424
notify_warning: echo WARNING
425
notify_ok: echo OK
426
427
; On failover action
428
429
; Whatever is placed here will be executed on -FS must return 0,
430
; must have execution bit for at least owner (chmod u+x)
431
432
action_failover: /var/lib/postgresql/pitrtools/scripts/failover.sh
433
</pre>
434
435
@action_failover:@ script has to exist and have permissions of at least @chmod u+x@ equivalent, it could be just a placeholder script with a simple action like this:
436
437 2 Ivan Lezhnjov
<pre>#!/bin/bash
438 20 Ivan Lezhnjov
touch /var/lib/postgresql/pitrtools/failover_happened</pre>
439 2 Ivan Lezhnjov
	
440 20 Ivan Lezhnjov
but it's meant as a way to let you do certain actions on failover, those would be very specific for each given setup. It's good to know, though, that pitrtools lets you take actions automatically when failover happens. Use this feature to make your setup more sophisticated.
441 17 Ivan Lezhnjov
442 32 Ivan Lezhnjov
In addition, when doing a failover, there are two more additional options to take into consideration, namely @pg_conf_failover:@ and @pg_hba_conf_failover:@. Both allow you to start server on failover using alternative configuration. This is meant to provide users with a way to prepare their failover scenario configuration in advance.
443 2 Ivan Lezhnjov
444 12 Ivan Lezhnjov
h2. Initialize Slave Environment
445 2 Ivan Lezhnjov
446
First stop PostgreSQL, then initialize slave environment.
447
448 34 Ivan Lezhnjov
<pre>postgres@bitarena-clone:~/pitrtools$ bin/cmd_standby -C etc/cmd_standby.ini -I
449 2 Ivan Lezhnjov
NOTICE: check_pgpid_func()
450 34 Ivan Lezhnjov
DEBUG: executing query 'SELECT * FROM cmd_get_data_dirs()' by /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no  postgres@bitarena  "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1 
451 2 Ivan Lezhnjov
DEBUG: /var/lib/postgresql/8.4/main
452 34 Ivan Lezhnjov
Standby is ready</pre>
453 2 Ivan Lezhnjov
454 20 Ivan Lezhnjov
Please, note that if @archivedir: /var/lib/postgresql/archive@ hasn't been created, you should do so manually as @postgres@ system user (or set @postgres@ user and group as the ownership information for this directory). pitrtools should do this automatically for you, but earlier versions were known not to do so. This is important, and the next step in slave configuration, which is base backup, will fail if @archivedir:@ doesn't exist.
455 3 Ivan Lezhnjov
456 12 Ivan Lezhnjov
h2. Making a Base Backup
457 3 Ivan Lezhnjov
458
Before you proceed check if @archivedir:@ exists on slave and WAL files are being shipped to it from master host. WAL files are generated and shipped only when new data is stored in the database on master host. To help simulate data flow and check whether archiving and shipping is happening, try this SQL statement on master host:
459
460
<pre>postgres@bitarena:~$ psql
461
psql (8.4.13)
462
Type "help" for help.
463
464 20 Ivan Lezhnjov
postgres=# create table testpitrtools1 as select * from pg_class, pg_description;
465 3 Ivan Lezhnjov
postgresq=# \q
466
postgres@bitarena:~$</pre>
467
	
468
You could create a couple of tables like that to generate enough WAL segments. See @archivedir:@ directory on slave to check whether any WAL files have been copied there. If they have, everything works as expected and you can try to make a base backup on slave host:
469
470 35 Ivan Lezhnjov
<pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/etc/cmd_standby.ini -B
471 3 Ivan Lezhnjov
NOTICE: check_pgpid_func()
472 35 Ivan Lezhnjov
DEBUG: executing query  'checkpoint'  by /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no  postgres@bitarena  "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1 
473 1 Ivan Lezhnjov
DEBUG: CHECKPOINT
474 35 Ivan Lezhnjov
DEBUG: executing query  'SELECT cmd_pg_start_backup()'  by /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no  postgres@bitarena  "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1 
475 1 Ivan Lezhnjov
DEBUG: cmd_pg_start_backup:  1
476 35 Ivan Lezhnjov
DEBUG: executing query 'SELECT * FROM cmd_get_data_dirs()' by /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no  postgres@bitarena  "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1 
477
DEBUG: executing query 'SELECT * FROM cmd_get_pgdata() LIMIT 1' by /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no  postgres@bitarena  "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1 
478
DEBUG: cleaning up /var/lib/postgresql/8.4/main/pg_xlog/ directory before rsync
479 3 Ivan Lezhnjov
receiving incremental file list
480 35 Ivan Lezhnjov
deleting recovery.done
481
deleting backup_label.old
482 1 Ivan Lezhnjov
./
483
backup_label
484 3 Ivan Lezhnjov
postmaster.opts
485 35 Ivan Lezhnjov
base/11917/
486
base/11917/pg_internal.init
487
base/16392/
488
base/16392/pg_internal.init
489
base/25372/
490
deleting global/pgstat.stat
491
base/25372/pg_internal.init
492 3 Ivan Lezhnjov
global/
493
global/pg_control
494 35 Ivan Lezhnjov
global/pg_internal.init
495 3 Ivan Lezhnjov
pg_clog/0000
496
pg_multixact/offsets/0000
497 35 Ivan Lezhnjov
pg_notify/
498
pg_notify/0000
499 3 Ivan Lezhnjov
pg_stat_tmp/
500
pg_stat_tmp/pgstat.stat
501 35 Ivan Lezhnjov
pg_subtrans/0000
502 3 Ivan Lezhnjov
503 35 Ivan Lezhnjov
Number of files: 1199
504
Number of files transferred: 12
505
Total file size: 30675262 bytes
506
Total transferred file size: 353574 bytes
507
Literal data: 289873 bytes
508
Matched data: 63701 bytes
509
File list size: 13862
510
File list generation time: 0.008 seconds
511 3 Ivan Lezhnjov
File list transfer time: 0.000 seconds
512 35 Ivan Lezhnjov
Total bytes sent: 1606
513
Total bytes received: 54020
514 3 Ivan Lezhnjov
515 35 Ivan Lezhnjov
sent 1606 bytes  received 54020 bytes  37084.00 bytes/sec
516
total size is 30675262  speedup is 551.46
517
DEBUG: executing query  'SELECT cmd_pg_stop_backup()'  by /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no  postgres@bitarena  "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1 
518
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
519
CONTEXT:  SQL function "cmd_pg_stop_backup" statement 1
520
DEBUG: cmd_pg_stop_backup: 1
521
Base backup finished successfully</pre>
522 3 Ivan Lezhnjov
523 20 Ivan Lezhnjov
As you can see what happens is that pitrtools puts master into backup mode, synchronizes data directories (including tablespaces, if any) from master to slave and then exits backup mode on master. If base backup action had failed before it properly finished (say, you had lost connection to slave while rsync was copying files over to it), you'd need to intervene and run manually @-Astop_basebackup:@
524 3 Ivan Lezhnjov
525 35 Ivan Lezhnjov
<pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/etc/cmd_standby.ini -Astop_basebackup
526 3 Ivan Lezhnjov
...</pre>
527
528
After that run base backup action again. Just make sure it finishes its work properly (use console output for successful base backup action above as a reference).
529
530 22 Ivan Lezhnjov
h2. Start a Standby
531
532 3 Ivan Lezhnjov
If you want a cold standby you're done. If you need a warm standby, then run:
533
534 36 Ivan Lezhnjov
<pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/etc/cmd_standby.ini -S
535 3 Ivan Lezhnjov
NOTICE: check_pgpid_func()
536 1 Ivan Lezhnjov
server starting
537 36 Ivan Lezhnjov
NOTICE: check_pgpid_func()
538
Successfully entered standby mode</pre>
539
540 37 Ivan Lezhnjov
Take a look at @logfile:@, the @/var/log/postgresql/postgresql-8.4-main.log@ in our case. A typical sequence of events will look something like this:
541 36 Ivan Lezhnjov
542
<pre>2012-10-16 02:47:31 PDT LOG:  database system was interrupted; last known up at 2012-10-16 02:44:37 PDT
543 3 Ivan Lezhnjov
2012-10-16 02:47:31 PDT LOG:  starting archive recovery
544
2012-10-16 02:47:31 PDT LOG:  restore_command = '/usr/lib/postgresql/8.4/bin/pg_standby -s5 -w0 -c -d  /var/lib/postgresql/archive %f %p %r '
545
Trigger file 		: <not set>
546
Waiting for WAL file	: 00000001.history
547
WAL file path		: /var/lib/postgresql/archive/00000001.history
548
Restoring to		: pg_xlog/RECOVERYHISTORY
549
Sleep interval		: 5 seconds
550
Max wait interval	: 0 forever
551
Command for restore	: cp "/var/lib/postgresql/archive/00000001.history" "pg_xlog/RECOVERYHISTORY"
552
Keep archive history	: 000000000000000000000000 and later
553
running restore		:cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory
554
cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory
555
cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory
556
cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory
557
not restored
558
history file not found
559
Trigger file 		: <not set>
560
Waiting for WAL file	: 000000010000000100000025.00000020.backup
561
WAL file path		: /var/lib/postgresql/archive/000000010000000100000025.00000020.backup
562
Restoring to		: pg_xlog/RECOVERYHISTORY
563
Sleep interval		: 5 seconds
564
Max wait interval	: 0 forever
565
Command for restore	: cp "/var/lib/postgresql/archive/000000010000000100000025.00000020.backup" "pg_xlog/RECOVERYHISTORY"
566
Keep archive history	: 000000000000000000000000 and later
567
running restore		: OK
568
2012-10-16 02:48:01 PDT LOG:  restored log file "000000010000000100000025.00000020.backup" from archive
569
Trigger file 		: <not set>
570
Waiting for WAL file	: 000000010000000100000025
571
WAL file path		: /var/lib/postgresql/archive/000000010000000100000025
572
Restoring to		: pg_xlog/RECOVERYXLOG
573
Sleep interval		: 5 seconds
574
Max wait interval	: 0 forever
575
Command for restore	: cp "/var/lib/postgresql/archive/000000010000000100000025" "pg_xlog/RECOVERYXLOG"
576
Keep archive history	: 000000000000000000000000 and later
577
running restore		: OK
578
579
2012-10-16 02:48:02 PDT LOG:  restored log file "000000010000000100000025" from archive
580
2012-10-16 02:48:02 PDT LOG:  automatic recovery in progress
581
2012-10-16 02:48:02 PDT LOG:  redo starts at 1/25000020, consistency will be reached at 1/2504FFC4
582
2012-10-16 02:48:03 PDT LOG:  consistent recovery state reached
583
Trigger file 		: <not set>
584
Waiting for WAL file	: 000000010000000100000026
585
WAL file path		: /var/lib/postgresql/archive/000000010000000100000026
586
Restoring to		: pg_xlog/RECOVERYXLOG
587
Sleep interval		: 5 seconds
588
Max wait interval	: 0 forever
589
Command for restore	: cp "/var/lib/postgresql/archive/000000010000000100000026" "pg_xlog/RECOVERYXLOG"
590
Keep archive history	: 000000010000000100000025 and later
591
WAL file not present yet.
592
WAL file not present yet.
593
WAL file not present yet.
594
WAL file not present yet.
595
WAL file not present yet.
596
WAL file not present yet.
597
WAL file not present yet.
598
WAL file not present yet.
599 1 Ivan Lezhnjov
WAL file not present yet.
600 3 Ivan Lezhnjov
WAL file not present yet.</pre>
601
602 36 Ivan Lezhnjov
At this point PostgreSQL armed with pitrtools on master server will be continuously shipping log files to @archivedir:@ on slave. Once shipped, the WAL files will be immediately replayed, because slave in standby mode continuously scans @archivedir:@ for new WAL files and replays them as soon as they become available (this can be seen from example above).
603 3 Ivan Lezhnjov
604
h1. Failing Over
605
606 20 Ivan Lezhnjov
Now you have a warm standby mirroring changes occurring on the master server. When your master server becomes unavailable due to any reason, you could turn this warm standby server into a production instance by simply running on standby machine a failover action as shown below. For this PostgreSQL on master must not be running, otherwise pitrtools will throw out a warning and refuse to failover.
607 3 Ivan Lezhnjov
608 20 Ivan Lezhnjov
<pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -F999
609 3 Ivan Lezhnjov
610
NOTICE: check_pgpid_func()
611
2012-10-16 02:56:20 PDT LOG:  received fast shutdown request
612
2012-10-16 02:56:20 PDT LOG:  aborting any active transactions
613
waiting for server to shut down....2012-10-16 02:56:20 PDT LOG:  shutting down
614
2012-10-16 02:56:20 PDT LOG:  database system is shut down
615
 done
616
server stopped
617
server starting
618
NOTICE: Statistics are not replicated in warm standy mode.
619
HINT: Execute ANALYZE on your databases
620
postgres@bitarena-clone:~$ 2012-10-16 02:56:22 PDT LOG:  database system was interrupted while in recovery at log time 2012-10-16 02:54:35 PDT
621
2012-10-16 02:56:22 PDT HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
622
2012-10-16 02:56:22 PDT LOG:  starting archive recovery
623
2012-10-16 02:56:22 PDT LOG:  restore_command = 'cp /var/lib/postgresql/archive/%f "%p"'
624
cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory
625
2012-10-16 02:56:23 PDT LOG:  restored log file "000000010000000100000026" from archive
626
2012-10-16 02:56:23 PDT LOG:  automatic recovery in progress
627
2012-10-16 02:56:23 PDT LOG:  redo starts at 1/2635B428, consistency will be reached at 1/27000000
628
cp: cannot stat `/var/lib/postgresql/archive/000000010000000100000027': No such file or directory
629
2012-10-16 02:56:23 PDT LOG:  could not open file "pg_xlog/000000010000000100000027" (log file 1, segment 39): No such file or directory
630
2012-10-16 02:56:23 PDT LOG:  redo done at 1/265AD744
631
2012-10-16 02:56:23 PDT LOG:  last completed transaction was at log time 2012-10-16 02:55:07.387709-07
632
2012-10-16 02:56:24 PDT LOG:  restored log file "000000010000000100000026" from archive
633
cp: cannot stat `/var/lib/postgresql/archive/00000002.history': No such file or directory
634
cp: cannot stat `/var/lib/postgresql/archive/00000003.history': No such file or directory
635
cp: cannot stat `/var/lib/postgresql/archive/00000004.history': No such file or directory
636
cp: cannot stat `/var/lib/postgresql/archive/00000005.history': No such file or directory
637
cp: cannot stat `/var/lib/postgresql/archive/00000006.history': No such file or directory
638
cp: cannot stat `/var/lib/postgresql/archive/00000007.history': No such file or directory
639
cp: cannot stat `/var/lib/postgresql/archive/00000008.history': No such file or directory
640
cp: cannot stat `/var/lib/postgresql/archive/00000009.history': No such file or directory
641
2012-10-16 02:56:24 PDT LOG:  selected new timeline ID: 9
642
cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory
643
2012-10-16 02:56:25 PDT LOG:  archive recovery complete
644
2012-10-16 02:56:26 PDT LOG:  database system is ready to accept connections
645
2012-10-16 02:56:26 PDT LOG:  autovacuum launcher started
646
647
postgres@bitarena-clone:~$</pre>
648
649
This will create @recovery.conf@ file under @pg_data:@ directory and restart PostgreSQL to enter production mode of operation.
650
651
After this, you'd be basically running a copy of production master server. Keep in mind that you would also need to change IP addresses and/or load balancing configuration, routing, firewall rules or anything else that might be in the way of establishing a successful connection to this host. This is where @action_failover:@ script could come in handy. 
652
653
Plan in advance, figure this all out to avoid any downtime before you will need to failover.
654
655
656 20 Ivan Lezhnjov
h1. Other things you could do with pitrtools and some tips
657 3 Ivan Lezhnjov
658 20 Ivan Lezhnjov
This how-to is meant to help you get started with pitrtools. pitrtools can do more, though, than just help you configure standby.
659 3 Ivan Lezhnjov
660
661
h2. Point-In-Time Recovery
662
663 20 Ivan Lezhnjov
<pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -F999 -R '2008-05-28 11:00:38.059389'
664 3 Ivan Lezhnjov
...</pre>
665
666
This is essentially a restore to a specific point in time action. In general, you can only restore to a point in time while using cold standby, because PITR will "stop" recovering at the point in time you've specified, and both warm and hot standby servers have already recovered all the WAL files they can. In this regard, it would be a good idea to have a cold standby around for disaster (at logical level) recovery.
667
668
Once this has been done, you can't choose another timestamp to restore to.
669
670
h2. Entering Standby Mode After Failover On Slave
671
672
Suppose you failed over to your standby slave, which is running now as a replacement of master for your applications. You've fixed problems with the actual master and want this slave host to enter standby mode again. Here's how you'd do it:
673
674 20 Ivan Lezhnjov
<pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -Astop
675 3 Ivan Lezhnjov
...</pre>
676
677
This will stop entire PostgreSQL service. You could also use PostgreSQL init script to achieve the same instead. If you need more fine-grained control use @pg_ctlcluser 8.4 main stop@ (see @man pg_ctlcluster@ for more details). Take a new base backup as before and enter standby mode:
678
679 20 Ivan Lezhnjov
<pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -B
680 3 Ivan Lezhnjov
...
681
682 20 Ivan Lezhnjov
postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -S
683 3 Ivan Lezhnjov
...</pre>
684
685
Again, if you want a cold standby just don't run -S action after -B.
686 4 Ivan Lezhnjov
687
h2. Alerts
688
689 20 Ivan Lezhnjov
Alerting is designed to run your custom scripts. You could easily integrate pitrtools alerting to your existing NMS be it Nagios, Zabbix, or anything else, send e-mails or take actions you decide.
690 4 Ivan Lezhnjov
691
h2. Logging
692
693
It bears repeating -- just In case you overlooked this in standby sample configuration file notes -- if your @postgresql.conf@ doesn't specify a log file to write to and you don't use @logfile:@ parameter in @cmd_standby.ini@ the output will be directed to stdout (your console) and nothing will ever be written to a log file on disk. 
694
695
If you restart PostgreSQL, that'll fix the problem, but you can avoid it in the first place by either specifying log file to write to in @postgresql.conf@ or by using @logfile:@ parameter in @cmd_standby.ini@.
696
697
h2. Troubleshooting
698
699
Set @debug:@ parameter in configuration files to @on@ and scrutinize the information. PostgreSQL log file is also a good place to look at.
700
701
h1. Getting Help
702
703 20 Ivan Lezhnjov
A very low-traffic mailing list for pitrtools can be found here http://lists.commandprompt.com/mailman/listinfo/pitrtools/
704 4 Ivan Lezhnjov
There is also consulting available from "Command Prompt":https://commandprompt.com/contact/