Project

General

Profile

How To Configure Warm Standby with PITRTools » History » Version 19

Ivan Lezhnjov, 04/24/2013 09:36 AM

1 1 Ivan Lezhnjov
{{toc}}
2
3
h1. How To Configure Warm Standby with PITRTools
4
5
This how-to demonstrates how to configure a warm standby using PostgreSQL 8.4 and PITRtools.
6
7 7 Joshua Drake
h1. What Is PITRtools and Why Would You Want To Use It?
8 1 Ivan Lezhnjov
9
Essentially, PITRtools is a wrapper around standard tools, such as rsync and PostgreSQL's internal functionality, that makes, among other things, creating and managing standby configurations and subsequent failover to a standby a snap.
10
11
With the help of PITRtools you could do more, namely:
12
* secure shipping of log files to configured standby server over SSL protected link;
13
* streaming replication;
14
* enable/disable archiving without the need to restart PostgreSQL;
15
* stay informed by generating alerts based on various levels of severity of events happening during the process on both ends of configuration;
16
* automatically take a base backup, including table spaces, restore archives and purge old ones (if PostgreSQL >8.3);
17
* failover to the latest restore point and point-in-time recovery (restore using timestamps);
18
* etc.
19
20
Obviously, PITRtools attempts to make things simpler, more secure and easier to manage.
21
22
h1. Naming Conventions
23
24
Master server can be referred to also as archiver.
25
Slave is often called standby.
26
27
These names are used interchangeably.
28
29
h1. The Test Setup
30
31 15 Ivan Lezhnjov
It's a good idea to first setup PITRtools and play with it before you actually go ahead and change configuration of your production servers. To show how PITRtools is configured and work, I'll describe the entire process using a 2 hosts test setup as an example.
32 1 Ivan Lezhnjov
33
So, there are 2 hosts named @bitarena@ and @bitarena-clone@, both being virtualized instances of Debian Squeeze. I assume you're experienced enough to install Debian yourself and know how to find your way around the system.
34
35
@bitarena@ is designated role of master server aka archiver, @bitarena-clone@ is a slave aka standby server. PITRtools is installed on both hosts, but each host uses different tools from the package.
36
37
38
h1. The Process
39
40
These are major steps in actual order of execution that one would have to follow to get PITRtools-enabled setup running:
41
42
On master server
43
44
		* Turn on archiving
45
		* @cmd_archiver -C $CONFIG -I@
46
47
On slave server
48
49
		* @cmd_standby -C $CONFIG -I@
50
		* @cmd_standby -C $CONFIG -B@
51
		* @cmd_standby -C $CONFIG -S@
52
53
h1. Installing and Configuring PITRtools
54
55
PITRtools project page can be found at https://public.commandprompt.com/projects/pitrtools. Of particular interest is a Wiki page https://public.commandprompt.com/projects/pitrtools/wiki where you can find information on how to obtain PITRtools, other useful notes and links.
56
57
You should use GIT version, because currently tarball offers an outdated version of PITRtools. I was told this is going to be fixed soon, but until then you should rely on the GIT repository. 
58
59
Note that essentially you need to have contents of this repository on both master and slave server. For the sake of brevity, I'm going to show the layout I use only on master server. You would then need to repeat the steps for slave server, because all that is going to change is the host where you clone PITRtools repository to.
60
61
62
<pre>root@bitarena ~/GIT# git clone git://github.com/commandprompt/PITRTools.git
63
Cloning into 'PITRTools'...
64
remote: Counting objects: 284, done.
65
remote: Compressing objects: 100% (183/183), done.
66
remote: Total 284 (delta 182), reused 203 (delta 101)
67
Receiving objects: 100% (284/284), 68.51 KiB, done.
68
Resolving deltas: 100% (182/182), done.
69
root@bitarena ~/GIT# ls
70
PITRTools
71
root@bitarena ~/GIT# cd PITRTools/
72
root@bitarena ~/GIT/PITRTools# ls
73
cmd_archiver             cmd_archiver.README  cmd_standby.ini.sample  cmd_standby.sql
74
cmd_archiver.ini.sample  cmd_standby          cmd_standby.README      cmd_worker.py
75
root@bitarena ~/GIT/PITRTools# mkdir -p /var/lib/postgresql/PITRtools/bin
76
root@bitarena ~/GIT/PITRTools# cp cmd_archiver cmd_standby cmd_worker.py /var/lib/postgresql/PITRtools/bin/
77
root@bitarena ~/GIT/PITRTools# cp *.ini.sample /var/lib/postgresql/PITRtools/
78
root@bitarena ~/GIT/PITRTools# cd /var/lib/postgresql/PITRtools/
79
root@bitarena /var/lib/postgresql/PITRtools# mv cmd_archiver.ini.sample cmd_archiver.ini
80
root@bitarena /var/lib/postgresql/PITRtools# mv cmd_standby.ini.sample cmd_standby.ini
81
root@bitarena /var/lib/postgresql/PITRtools# cd ~/GIT/PITRtools/
82
root@bitarena ~/GIT/PITRTools# chown -R postgres.postgres /var/lib/postgresql/PITRtools
83
root@bitarena:~#</pre>
84
85
h1. SSH Key-based Login for PITRtools
86
87
PITRtools relies on rsync and SSH heavily to do its work, e.g. making a base backup and shipping WAL log files from master to slave server -- all that happens over an SSL-protected communication channel. This is the area where PITRtools makes one's life easier, because otherwise you'd have to find a way to copy log files to a slave server somehow (most likely it would be some sort of networking mount point).
88
89 5 Ivan Lezhnjov
Therefore one of the prerequisites is to configure SSH key-based logins between the two hosts for @postgres@ system user that don't require a password or a passphrase.
90 1 Ivan Lezhnjov
91
*Master*
92
<pre>root@bitarena:~# su - postgres
93
94
postgres@bitarena:~$ ssh-keygen -t rsa
95
... 
96
postgres@bitarena:~$ ls -la /var/lib/postgresql/.ssh/
97
total 16
98
drwx------ 2 postgres postgres 4096 Sep 28 09:17 .
99
drwxr-xr-x 5 postgres postgres 4096 Sep 28 09:17 ..
100
-rw------- 1 postgres postgres 1675 Sep 28 09:17 id_rsa
101
-rw-r--r-- 1 postgres postgres  399 Sep 28 09:17 id_rsa.pub
102
postgres@bitarena:~#
103
</pre>
104
105
*Slave*
106
<pre>root@bitarena-clone:~# su - postgres
107
postgres@bitarena-clone:~$ ssh-keygen -t rsa
108
... </pre>
109
110
Answer all the questions and ssh-keygen will create both private and public (*.pub file) RSA keys. For now just create the key pair, one on on each host (master and slave). When done, proceed to exchanging public keys between the hosts like this:
111
112
*Master*
113 18 Ivan Lezhnjov
<pre>postgres@bitarena:~$ ssh-copy-id bitarena-clone
114
Now try logging into the machine, with "ssh 'bitarena-clone'", and check in:
115 1 Ivan Lezhnjov
116 18 Ivan Lezhnjov
  .ssh/authorized_keys
117 1 Ivan Lezhnjov
118 18 Ivan Lezhnjov
to make sure we haven't added extra keys that you weren't expecting.
119 1 Ivan Lezhnjov
120
postgres@bitarena:~$ ssh bitarena-clone
121 18 Ivan Lezhnjov
Linux bitarena-clone 2.6.32-5-686 #1 SMP Sun May 6 04:01:19 UTC 2012 i686
122 1 Ivan Lezhnjov
123
The programs included with the Debian GNU/Linux system are free software;
124
the exact distribution terms for each program are described in the
125
individual files in /usr/share/doc/*/copyright.
126
127
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
128
permitted by applicable law.
129 18 Ivan Lezhnjov
Last login: Thu Oct 18 06:07:22 2012 from bitarena.localdomain
130
postgres@bitarena-clone:~$ less .ssh/authorized_keys
131
postgres@bitarena-clone:~$ </pre>
132 1 Ivan Lezhnjov
133 18 Ivan Lezhnjov
This will copy @postgres@ system user's public SSH key on master to standby into a file @authorized_keys@, which will allow @postgres@ on master to login to slave host without a password and also in a secure fashion.
134 1 Ivan Lezhnjov
135 19 Ivan Lezhnjov
Remember that this has been done on master server only. In a similar manner, you should take care of the slave host.
136 1 Ivan Lezhnjov
137
It's worth noting that PITRtools runs some actions remotely. For example, when a base backup action is run on a slave host, cmd_standby script establishes SSH session to a master host and runs various psql commands to deal with checkpoints, copy log files, etc. This also requires PostgreSQL user password to access the database, and it is often needed to be entered 4 and more times for a base backup action to complete. 
138
139
Once you played around with PITRtools enough to get hang of things, you could avoid having to enter password manually each time by either using .pgpass file (which is a "standard feature of PostgreSQL":http://www.postgresql.org/docs/8.4/static/libpq-pgpass.html) or make sure there's a "trust relationship configured for localhost in @pg_hba.conf@ file":http://www.postgresql.org/docs/8.4/static/auth-methods.html.
140 19 Ivan Lezhnjov
141
For the sake of clarity, consider these examples:
142
143
.pgpass
144
<pre>
145
127.0.0.1:*:postgres:postgres:postgrespass
146
</pre>
147
148
pg_hba.conf
149
<pre>
150
host postgres postgres 127.0.0.1/32 trust
151
</pre>
152 1 Ivan Lezhnjov
153
In the console output examples you're going to see below I entered password manually, but this, of course, is really inconvenient in production environment.
154
155
h1. Master Configuration File
156
157
Sample configuration files are pretty good as they are with the defaults they come. Chances are you're not going to change a lot in there. Make sure you read all *.README files, though, because they contain helpful extra information about PITRtools and configuration parameters that will help you decide how to best configure your servers.
158
159
We start with the master host by editing PITRtools configuration file for master/archiver.
160
161
<pre>postgres@bitarena:~$ vim PITRtools/cmd_archiver.ini
162
	
163
	[DEFAULT]
164
	; online or offline
165
	state: online					
166
	
167
	; The base database directory
168
	
169
	pgdata: /var/lib/postgresql/8.4/main
170
	
171
	; where to remotely copy archives
172
	r_archivedir: /var/lib/postgresql/archive
173
	
174
	; where to locally copy archives
175
	l_archivedir: /var/lib/postgresql/archive
176
	
177
	; where is rsync				
178
	rsync_bin: /usr/bin/rsync			
179
	
180
	; extra rsync flags
181
	rsync_flags: -z
182
	
183
	; option 2 or 3, if running RHEL5 or similar it is likely 2
184
	; if you are running something that ships remotely modern software
185
	; it will be 3
186
	
187
	rsync_version = 3
188
	
189
	; IP of slave					
190
	slaves: bitarena-clone
191
	
192
	; the user that will be using scp				
193
	user: postgres				
194
	
195
	; if scp can't connect in 10 seconds error
196
	ssh_timeout: 10
197
	
198
	; command to process in ok					
199
	notify_ok: echo OK
200
	
201
	; command to process in warning
202
	notify_warning:  echo WARNING
203
	
204
	; command to process in critical
205
	notify_critical: echo CRITICAL
206
	
207
	; if you want to debug on/off only		 	
208
	debug: on
209
	
210
	; if you want ssh debug (warning noisy)
211
	ssh_debug: off
212
</pre>
213
214
215
Note that you can use domain names instead of IP addresses for slaves: parameter, it works either way just fine. You might also want to turn debugging on while you're learning PITRtools. It helps to see what software does, if you try to understand better how it works.
216
217 11 Ivan Lezhnjov
h2. Turn On and Configure Archiving
218 1 Ivan Lezhnjov
219 14 Joshua Drake
If you are not using Streaming Replication / Hot Standby, we need to turn on archiving functionality in PostgreSQL. This is purely a PostgreSQL feature, but we can also take advantage of using PITRtools thanks to flexible design of PostgreSQL.
220 2 Ivan Lezhnjov
221
<pre>postgres@bitarena:~$ vim /etc/postgresql/8.4/main/postgresql.conf
222
223
...
224
archive_mode=on
225
archive_command = '/var/lib/postgresql/PITRtools/bin/cmd_archiver -C /var/lib/postgresql/PITRtools/cmd_archiver.ini -F %p'
226
...
227
</pre>
228
229
230
Here we explicitly turn archiving mode on and tell PostgreSQL to use @cmd_archiver@, part of PITRtools, as the archiving command. 
231
It must be provided with a path to a configuration file (@-C@ switch), and a path to a log file (@-F@ switch). @%p@ is substituted by PostgreSQL with the actual log file location in the file system. To learn more about how to use @cmd_archiver@ run
232
233
<pre>postgres@bitarena:~$ PITRtools/bin/cmd_archiver --help</pre>
234
235
236 16 Ivan Lezhnjov
Please, keep in mind that PITRtools isn't supposed to be run by @root@ user. Technically, it can be run by any system user other than root, and unless you have a customized configuration your default PostgreSQL user will be @postgres@, so PITRtools are expected to be run as that system user.
237 2 Ivan Lezhnjov
238
Restart PostgreSQL for changes to take effect.
239
240
<pre>postgres@bitarena:~$ /etc/init.d/postgresql restart
241
Restarting PostgreSQL 8.4 database server: main.
242
postgres@bitarena:~#
243
</pre>
244
245 12 Ivan Lezhnjov
h2. Install Helper Scripts
246 2 Ivan Lezhnjov
247 5 Ivan Lezhnjov
Apply cmd_standby.sql to the database of PITRtools user (usually @postgres@). This is required for master server only.
248 2 Ivan Lezhnjov
249
<pre>root@bitarena:~# psql -U postgres < ~/GIT/PITRTools/cmd_standby.sql
250
CREATE FUNCTION
251
COMMENT
252
CREATE FUNCTION
253
CREATE FUNCTION
254
CREATE FUNCTION
255
COMMENT
256
CREATE FUNCTION
257
COMMENT
258
root@bitarena:~#</pre>
259
260 13 Joshua Drake
>Note: If you are running Postgres 9.2, use cmd_standby.92.sql 
261 2 Ivan Lezhnjov
262 12 Ivan Lezhnjov
h2. Initialize Master Environment
263 2 Ivan Lezhnjov
264
At this point we're pretty much done configuring the master server. All that is left to do is initialize master environment by running
265
266
<pre>postgres@bitarena:~$ PITRtools/bin/cmd_archiver -C PITRtools/cmd_archiver.ini -I
267
We are initializing queues, one moment.
268
269
NOTICE: init_env_func()
270
NOTICE: generate_slave_list_func()
271
NOTICE: Your slaves are: ['bitarena-clone']
272
postgres@bitarena:~$ ls -lah
273
total 40K
274
drwxr-xr-x  7 postgres postgres 4.0K Sep 28 10:25 .
275
drwxr-xr-x 34 root     root     4.0K Sep 27 02:32 ..
276
drwxr-xr-x  3 postgres postgres 4.0K Sep 24 15:10 8.4
277
drwx------  2 postgres postgres 4.0K Sep 27 02:23 .aptitude
278
drwxr-xr-x  3 postgres postgres 4.0K Sep 28 10:25 archive
279
-rw-------  1 postgres postgres 1001 Sep 28 10:03 .bash_history
280
drwxr-xr-x  2 postgres postgres 4.0K Sep 28 10:25 PITRtools
281
-rw-------  1 postgres postgres 1.4K Sep 28 05:08 .psql_history
282
drwx------  2 postgres postgres 4.0K Sep 28 09:29 .ssh
283
-rw-------  1 postgres postgres 3.6K Sep 28 10:25 .viminfo
284
postgres@bitarena:~$ ls -lah archive/bitarena-clone/
285
total 8.0K
286
drwxr-xr-x 2 postgres postgres 4.0K Sep 28 10:25 .
287
drwxr-xr-x 3 postgres postgres 4.0K Sep 28 10:25 ..
288
postgres@bitarena:~#</pre>
289
290
291
As you can see @-I@ switch tells @cmd_archiver@ to do a couple of internal actions, as well as prepare file system layout by creating necessary directories. 
292
293
@archive/@ directory has been created automatically by @cmd_archiver@ and contains sub-directory named after IP address or DNS name of a slave, @bitarena-clone@ in this example. This sub-directory is used in those circumstances when master fails to successfully transfer WAL log files to slave and stores them temporarily in this local directory. Once slave is back online, these files should be transferred to slave.
294
295
Effectively, after you've initialized master it starts trying to ship WAL files to slave. However, the slave host isn't configured yet, so the log delivery will fail and the log files should be found in @l_archivedir/slave_FQDNorIP/@ folder on master host. Once we configure slave, these will be shipped as soon as a new WAL segment is created on master.
296
297
298
h1. Slave aka Standby Server
299
300
Now we can prepare the slave host. SSH key-based login for @postgres@ system user should be working in both directions, from master to slave, as well as from slave to master.
301
302
Assuming you've arranged for this as it was suggested before, we can now go on and edit slave configuration file to perform first important step and initialize slave environment.
303
304 5 Ivan Lezhnjov
One of the configuration file parameters, namely @pgdata:@, will ask you to specify PostgreSQL data directory. You could look it up in @postgresql.conf@, or, If your PostgreSQL is already running, you could see this information like this:
305 2 Ivan Lezhnjov
306
<pre>root@bitarena:~# ps axuwf |grep postgre
307
root      4233  0.0  0.0   3304   756 pts/1    S+   10:56   0:00          \_ grep postgre
308
postgres  3352  0.0  0.4  46452  5464 ?        S    09:59   0:03 /usr/lib/postgresql/8.4/bin/postgres 
309
-D /var/lib/postgresql/8.4/main -c config_file=/etc/postgresql/8.4/main/postgresql.conf
310
...</pre>
311
312
Remember, note down the path somewhere or copy it to the clipboard. We'll need it in a minute.
313
314 12 Ivan Lezhnjov
h2. Slave Configuration File
315 2 Ivan Lezhnjov
316
Now edit slave configuration file. Most defaults are good and safe options, but your setup may be different from what I have in this how-to. So, be careful to understand what you're doing.
317
318
<pre>postgres@bitarena-clone:~$ vim PITRtools/cmd_standby.ini
319
	[DEFAULT]
320
	; what major version are we using?
321
	pgversion: 8.4
322
	
323
	; Used for 8.2 (8.1?), should be set to something > than checkpoint_segments on master				
324
	numarchives: 10			
325
	
326
	; Whether or not to use streaming replication.  If this is set to "on"
327
	; PITRTools will configure the standby server to replicate from master
328
	; using streaming replication.
329
	; This can only be used with PostgreSQL 9.0 and up.
330
	use_streaming_replication: off
331
	
332
	; File to touch to end replication when using streaming replication.
333
	trigger_file: /var/lib/postgresql/PITRtools/cmd_end_recovery
334
	
335
	; User to connect to master DB while using streaming replication,
336
	; ignored if not using streaming replication.
337
	repl_db_user: replication
338
	
339
	; Password for the user repl_db_user.
340
	repl_db_password: secret
341
	
342
	; sslmode to use when connecting for streaming replication. 
343
	; Accepted values: the same as libpq: disable, allow, prefer, require, verify-ca and verify-full
344
	; Default: sslmode: prefer
345
	sslmode: prefer
346
	
347
	; Commands needed for execution
348
	; absolute path to ssh
349
	ssh: /usr/bin/ssh		
350
	
351
	; absolute path to rsync
352
	rsync: /usr/bin/rsync
353
	
354
	; extra rsync flags
355
	rsync_flags: -z
356 17 Ivan Lezhnjov
357
	; Confs
358
359
	; This is the postgresql.conf to be used for the failover
360
	postgresql_conf_failover: /var/lib/postgresql/PITRtools/failover/postgesql.conf
361
362
	; This is the pg_hba.conf to be used for the failover
363
	pg_hba_conf_failover: /var/lib/postgresql/PITRtools/failover/pg_hba.conf
364 2 Ivan Lezhnjov
	
365
	; the path to to the postgres bin		
366
	pg_standby: /usr/lib/postgresql/8.4/bin/pg_standby
367
	pg_ctl: /usr/lib/postgresql/8.4/bin/pg_ctl
368
	
369
	; path to psql on the master
370
	r_psql: /usr/lib/postgresql/8.4/bin/psql	
371
	
372
	; Generalized information
373
	
374
	; the port postgresql runs on (master)
375
	port: 5432
376
	
377
	; ip or name of master server 			
378
	master_public_ip: bitarena
379
	
380
	; the ip address we should use when processing remote shell		
381
	master_local_ip: 127.0.0.1
382
	
383
	; the user performed initdb	
384
	user: postgres
385
	
386
	; on or off			
387
	debug: on
388
	
389
	; on or off
390
	ssh_debug: off
391
	
392
	; the timeout for ssh before we throw an alarm
393
	ssh_timeout: 30			
394
	
395
	; should be the same as r_archivedir for archiver
396
	archivedir: /var/lib/postgresql/archive
397
	
398
	; where you executed initdb -D to	
399
	pgdata: /var/lib/postgresql/8.4/main
400
	
401
	; Confs
402
	
403
	; This is the postgresql.conf to be used when not in standby
404
	postgresql_conf: /etc/postgresql/8.4/main/postgresql.conf		
405
	
406
	; This is the pg_hba.conf to be used when not in standby
407
	pg_hba_conf: /etc/postgresql/8.4/main/pg_hba.conf		
408
	
409
	; By default postgresql.conf and pg_hba.conf will be copied from the
410
	; locations specified above to pgdata directory on failover.
411
	;
412
	; Uncomment the following to make postgres actually use the above conf
413
	; files w/o copying them to pgdata.
414
	;no_copy_conf: true
415
	
416
	; The recovery.conf file to create when starting up
417
	; Defaults to %(pgdata)/recovery.conf
418
	recovery_conf: /var/lib/postgresql/8.4/main/recovery.conf
419
	
420
	; Useful when postgresql.conf doesn't specify log destination
421
	; Will be passed with -l to pg_ctl when starting the server.
422
	;
423
	; If you're worried about having complete logs, either make sure
424
	; postgresql.conf points to a log file, or use the logfile: parameter.
425
	;
426
	; Otherwise postgresql will print on standard stdout and nothing
427
	; will be recorded in the logs
428
	;
429
	;logfile: /var/log/postgresql/postgresql.log
430
	
431
	; Alarms
432
	
433
	notify_critical: echo CRITICAL
434
	notify_warning: echo WARNING
435
	notify_ok: echo OK
436
	
437
	; On failover action
438
	
439
	; Whatever is placed here will be executed on -FS must return 0
440
	
441
	action_failover: /var/lib/postgresql/PITRtools/failover.sh</pre>
442
443
@action_failover:@ script has to exist and have permissions of at least chmod u+x equivalent, it could be just a placeholder script with a simple action of:
444
445
<pre>#!/bin/bash
446
touch /var/lib/postgresql/PITRtools/failover_happened</pre>
447
	
448
but it's meant as a way to let you do certain actions on failover, those would be very specific for each given setup. It's good to know, though, that PITRtools lets you take actions automatically when failover happens. Use this feature to make your setup more sophisticated.
449 17 Ivan Lezhnjov
450
In addition, when doing failover, there are two more additional options to take into consideration, namely @postgresql_conf_failover:@ and @pg_hba_conf_failover:@. Both allow you to start server on failover using alternative configuration. This is meant to provide users with a way to prepare their failover scenario configuration in advance.
451 2 Ivan Lezhnjov
452 12 Ivan Lezhnjov
h2. Initialize Slave Environment
453 2 Ivan Lezhnjov
454
First stop PostgreSQL, then initialize slave environment.
455
456
<pre>postgres@bitarena-clone:~$ /etc/init.d/postgresql stop
457
postgres@bitarena-clone:~$ PITRtools/bin/cmd_standby -C PITRtools/cmd_standby.ini -I
458
NOTICE: check_pgpid_func()
459
DEBUG: executing query /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no  postgres@bitarena  "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1  by 'SELECT * FROM cmd_get_data_dirs()'
460
Password for user postgres: postgrespass
461
462
DEBUG: /var/lib/postgresql/8.4/main
463
postgres@bitarena-clone:~$</pre>
464
465
Please, note that if @archivedir: /var/lib/postgresql/archive@ hasn't been created, you should do so manually as @postgres@ system user (or set @postgres@ user and group as the ownership information for this directory). PITRtools should do this automatically for you, but earlier versions were known not to do so. This is important, and the next step in slave configuration, which is base backup, will fail if @archivedir:@ doesn't exist.
466 3 Ivan Lezhnjov
467 12 Ivan Lezhnjov
h2. Making a Base Backup
468 3 Ivan Lezhnjov
469
Before you proceed check if @archivedir:@ exists on slave and WAL files are being shipped to it from master host. WAL files are generated and shipped only when new data is stored in the database on master host. To help simulate data flow and check whether archiving and shipping is happening, try this SQL statement on master host:
470
471
<pre>postgres@bitarena:~$ psql
472
psql (8.4.13)
473
Type "help" for help.
474
475
postgres=# create table testPITRtools1 as select * from pg_class, pg_description;
476
postgresq=# \q
477
postgres@bitarena:~$</pre>
478
	
479
You could create a couple of tables like that to generate enough WAL segments. See @archivedir:@ directory on slave to check whether any WAL files have been copied there. If they have, everything works as expected and you can try to make a base backup on slave host:
480
481
<pre>postgres@bitarena-clone:~$ PITRtools/bin/cmd_standby -C PITRtools/cmd_standby.ini -B
482
NOTICE: check_pgpid_func()
483
DEBUG: executing query /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no  postgres@bitarena  "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1  by  'checkpoint' 
484
Password for user postgres: postgrespass
485
486
DEBUG: CHECKPOINT
487
DEBUG: executing query /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no  postgres@bitarena  "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1  by  'SELECT cmd_pg_start_backup()' 
488
Password for user postgres: postgrespass
489
490
DEBUG: cmd_pg_start_backup:  1
491
DEBUG: executing query /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no  postgres@bitarena  "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1  by 'SELECT * FROM cmd_get_data_dirs()'
492
Password for user postgres: postgrespass
493
494
DEBUG: executing query /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no  postgres@bitarena  "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1  by 'SELECT * FROM cmd_get_pgdata() LIMIT 1'
495
Password for user postgres: postgrespass
496
497
receiving incremental file list
498
./
499
backup_label
500
backup_label.old
501
postmaster.opts
502
base/1/
503
base/1/pg_internal.init
504
base/11564/
505
base/11564/pg_internal.init
506
base/16499/
507
base/16499/pg_internal.init
508
base/33069/
509
base/33069/33084
510
base/33069/33268
511
base/33069/33974
512
base/33069/33974_fsm
513
base/33069/33974_vm
514
base/33069/33980
515
base/33069/33993
516
base/33069/33993_fsm
517
base/33069/33993_vm
518
base/33069/33999
519
base/33069/33999_fsm
520
base/33069/pg_internal.init
521
global/
522
global/pg_auth
523
global/pg_control
524
global/pg_database
525
pg_clog/0000
526
pg_multixact/offsets/0000
527
pg_stat_tmp/
528
pg_stat_tmp/pgstat.stat
529
pg_subtrans/0001
530
531
Number of files: 1537
532
Number of files transferred: 25
533
Total file size: 189868759 bytes
534
Total transferred file size: 22162037 bytes
535
Literal data: 920320 bytes
536
Matched data: 21241717 bytes
537
File list size: 20500
538
File list generation time: 0.004 seconds
539
File list transfer time: 0.000 seconds
540
Total bytes sent: 65011
541
Total bytes received: 189901
542
543
sent 65011 bytes  received 189901 bytes  101964.80 bytes/sec
544
total size is 189868759  speedup is 744.84
545
DEBUG: executing query /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no  postgres@bitarena  "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1  by  'SELECT cmd_pg_stop_backup()' 
546
Password for user postgres: postgrespass
547
548
DEBUG: cmd_pg_stop_backup: 
549
postgres@bitarena-clone:~$</pre>
550
551
As you can see what happens is that PITRtools puts master into backup mode, synchronizes data directories (including tablespaces, if any) from master to slave and then exits backup mode on master. If base backup action had failed before it properly finished (say, you had lost connection to slave while rsync was copying files over to it), you'd need to intervene and run manually @-Astop_basebackup:@
552
553
<pre>postgres@bitarena-clone:~$ PITRtools/bin/cmd_standby -C PITRtools/cmd_standby.ini -Astop_basebackup
554
...</pre>
555
556
After that run base backup action again. Just make sure it finishes its work properly (use console output for successful base backup action above as a reference).
557
558
If you want a cold standby you're done. If you need a warm standby, then run:
559
560
<pre>postgres@bitarena-clone:~$ PITRtools/bin/cmd_standby -C PITRtools/cmd_standby.ini -S
561
NOTICE: check_pgpid_func()
562
server starting
563
postgres@bitarena-clone:~$ 2012-10-16 02:47:31 PDT LOG:  database system was interrupted; last known up at 2012-10-16 02:44:37 PDT
564
2012-10-16 02:47:31 PDT LOG:  starting archive recovery
565
2012-10-16 02:47:31 PDT LOG:  restore_command = '/usr/lib/postgresql/8.4/bin/pg_standby -s5 -w0 -c -d  /var/lib/postgresql/archive %f %p %r '
566
Trigger file 		: <not set>
567
Waiting for WAL file	: 00000001.history
568
WAL file path		: /var/lib/postgresql/archive/00000001.history
569
Restoring to		: pg_xlog/RECOVERYHISTORY
570
Sleep interval		: 5 seconds
571
Max wait interval	: 0 forever
572
Command for restore	: cp "/var/lib/postgresql/archive/00000001.history" "pg_xlog/RECOVERYHISTORY"
573
Keep archive history	: 000000000000000000000000 and later
574
running restore		:cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory
575
cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory
576
cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory
577
cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory
578
not restored
579
history file not found
580
Trigger file 		: <not set>
581
Waiting for WAL file	: 000000010000000100000025.00000020.backup
582
WAL file path		: /var/lib/postgresql/archive/000000010000000100000025.00000020.backup
583
Restoring to		: pg_xlog/RECOVERYHISTORY
584
Sleep interval		: 5 seconds
585
Max wait interval	: 0 forever
586
Command for restore	: cp "/var/lib/postgresql/archive/000000010000000100000025.00000020.backup" "pg_xlog/RECOVERYHISTORY"
587
Keep archive history	: 000000000000000000000000 and later
588
running restore		: OK
589
2012-10-16 02:48:01 PDT LOG:  restored log file "000000010000000100000025.00000020.backup" from archive
590
Trigger file 		: <not set>
591
Waiting for WAL file	: 000000010000000100000025
592
WAL file path		: /var/lib/postgresql/archive/000000010000000100000025
593
Restoring to		: pg_xlog/RECOVERYXLOG
594
Sleep interval		: 5 seconds
595
Max wait interval	: 0 forever
596
Command for restore	: cp "/var/lib/postgresql/archive/000000010000000100000025" "pg_xlog/RECOVERYXLOG"
597
Keep archive history	: 000000000000000000000000 and later
598
running restore		: OK
599
600
2012-10-16 02:48:02 PDT LOG:  restored log file "000000010000000100000025" from archive
601
2012-10-16 02:48:02 PDT LOG:  automatic recovery in progress
602
2012-10-16 02:48:02 PDT LOG:  redo starts at 1/25000020, consistency will be reached at 1/2504FFC4
603
2012-10-16 02:48:03 PDT LOG:  consistent recovery state reached
604
Trigger file 		: <not set>
605
Waiting for WAL file	: 000000010000000100000026
606
WAL file path		: /var/lib/postgresql/archive/000000010000000100000026
607
Restoring to		: pg_xlog/RECOVERYXLOG
608
Sleep interval		: 5 seconds
609
Max wait interval	: 0 forever
610
Command for restore	: cp "/var/lib/postgresql/archive/000000010000000100000026" "pg_xlog/RECOVERYXLOG"
611
Keep archive history	: 000000010000000100000025 and later
612
WAL file not present yet.
613
WAL file not present yet.
614
WAL file not present yet.
615
WAL file not present yet.
616
WAL file not present yet.
617
WAL file not present yet.
618
WAL file not present yet.
619
WAL file not present yet.
620
WAL file not present yet.
621
WAL file not present yet.</pre>
622
623
At this point PostgreSQL armed with PITRtools on master server will be continuously shipping log files to @archivedir:@ on slave. Once shipped, the WAL files will be immediately replayed, because slave in standby mode continuously scans @archivedir:@ for new WAL files and replays them as soon as they become available (this can be seen from example console output above).
624
625
h1. Failing Over
626
627
Now you have a warm standby mirroring changes occurring on the master server. When your master server becomes unavailable due to any reason, you could turn this warm standby server into a production instance by simply running on standby machine a failover action as shown below. For this PostgreSQL on master must not be running, otherwise PITRtools will throw out a warning and refuse to failover.
628
629
<pre>postgres@bitarena-clone:~$ PITRtools/bin/cmd_standby -C PITRtools/cmd_standby.ini -F999
630
631
NOTICE: check_pgpid_func()
632
2012-10-16 02:56:20 PDT LOG:  received fast shutdown request
633
2012-10-16 02:56:20 PDT LOG:  aborting any active transactions
634
waiting for server to shut down....2012-10-16 02:56:20 PDT LOG:  shutting down
635
2012-10-16 02:56:20 PDT LOG:  database system is shut down
636
 done
637
server stopped
638
server starting
639
NOTICE: Statistics are not replicated in warm standy mode.
640
HINT: Execute ANALYZE on your databases
641
postgres@bitarena-clone:~$ 2012-10-16 02:56:22 PDT LOG:  database system was interrupted while in recovery at log time 2012-10-16 02:54:35 PDT
642
2012-10-16 02:56:22 PDT HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
643
2012-10-16 02:56:22 PDT LOG:  starting archive recovery
644
2012-10-16 02:56:22 PDT LOG:  restore_command = 'cp /var/lib/postgresql/archive/%f "%p"'
645
cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory
646
2012-10-16 02:56:23 PDT LOG:  restored log file "000000010000000100000026" from archive
647
2012-10-16 02:56:23 PDT LOG:  automatic recovery in progress
648
2012-10-16 02:56:23 PDT LOG:  redo starts at 1/2635B428, consistency will be reached at 1/27000000
649
cp: cannot stat `/var/lib/postgresql/archive/000000010000000100000027': No such file or directory
650
2012-10-16 02:56:23 PDT LOG:  could not open file "pg_xlog/000000010000000100000027" (log file 1, segment 39): No such file or directory
651
2012-10-16 02:56:23 PDT LOG:  redo done at 1/265AD744
652
2012-10-16 02:56:23 PDT LOG:  last completed transaction was at log time 2012-10-16 02:55:07.387709-07
653
2012-10-16 02:56:24 PDT LOG:  restored log file "000000010000000100000026" from archive
654
cp: cannot stat `/var/lib/postgresql/archive/00000002.history': No such file or directory
655
cp: cannot stat `/var/lib/postgresql/archive/00000003.history': No such file or directory
656
cp: cannot stat `/var/lib/postgresql/archive/00000004.history': No such file or directory
657
cp: cannot stat `/var/lib/postgresql/archive/00000005.history': No such file or directory
658
cp: cannot stat `/var/lib/postgresql/archive/00000006.history': No such file or directory
659
cp: cannot stat `/var/lib/postgresql/archive/00000007.history': No such file or directory
660
cp: cannot stat `/var/lib/postgresql/archive/00000008.history': No such file or directory
661
cp: cannot stat `/var/lib/postgresql/archive/00000009.history': No such file or directory
662
2012-10-16 02:56:24 PDT LOG:  selected new timeline ID: 9
663
cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory
664
2012-10-16 02:56:25 PDT LOG:  archive recovery complete
665
2012-10-16 02:56:26 PDT LOG:  database system is ready to accept connections
666
2012-10-16 02:56:26 PDT LOG:  autovacuum launcher started
667
668
postgres@bitarena-clone:~$</pre>
669
670
This will create @recovery.conf@ file under @pg_data:@ directory and restart PostgreSQL to enter production mode of operation.
671
672
After this, you'd be basically running a copy of production master server. Keep in mind that you would also need to change IP addresses and/or load balancing configuration, routing, firewall rules or anything else that might be in the way of establishing a successful connection to this host. This is where @action_failover:@ script could come in handy. 
673
674
Plan in advance, figure this all out to avoid any downtime before you will need to failover.
675
676
677
h1. Other things you could do with PITRtools and some tips
678
679
This how-to is meant to help you get started with PITRtools. PITRtools can do more, though, than just help you configure standby.
680
681
682
h2. Point-In-Time Recovery
683
684
<pre>postgres@bitarena-clone:~$ PITRtools/bin/cmd_standby -C PITRtools/cmd_standby.ini -F999 -R '2008-05-28 11:00:38.059389'
685
...</pre>
686
687
This is essentially a restore to a specific point in time action. In general, you can only restore to a point in time while using cold standby, because PITR will "stop" recovering at the point in time you've specified, and both warm and hot standby servers have already recovered all the WAL files they can. In this regard, it would be a good idea to have a cold standby around for disaster (at logical level) recovery.
688
689
Once this has been done, you can't choose another timestamp to restore to.
690
691
h2. Entering Standby Mode After Failover On Slave
692
693
Suppose you failed over to your standby slave, which is running now as a replacement of master for your applications. You've fixed problems with the actual master and want this slave host to enter standby mode again. Here's how you'd do it:
694
695
<pre>postgres@bitarena-clone:~$ PITRtools/bin/cmd_standby -C PITRtools/cmd_standby.ini -Astop
696
...</pre>
697
698
This will stop entire PostgreSQL service. You could also use PostgreSQL init script to achieve the same instead. If you need more fine-grained control use @pg_ctlcluser 8.4 main stop@ (see @man pg_ctlcluster@ for more details). Take a new base backup as before and enter standby mode:
699
700
<pre>postgres@bitarena-clone:~$ PITRtools/bin/cmd_standby -C PITRtools/cmd_standby.ini -B
701
...
702
703
postgres@bitarena-clone:~$ PITRtools/bin/cmd_standby -C PITRtools/cmd_standby.ini -S
704
...</pre>
705
706
Again, if you want a cold standby just don't run -S action after -B.
707 4 Ivan Lezhnjov
708
h2. Alerts
709
710
Alerting is designed to run your custom scripts. You could easily integrate PITRtools alerting to your existing NMS be it Nagios, Zabbix, or anything else, send e-mails or take actions you decide.
711
712
h2. Logging
713
714
It bears repeating -- just In case you overlooked this in standby sample configuration file notes -- if your @postgresql.conf@ doesn't specify a log file to write to and you don't use @logfile:@ parameter in @cmd_standby.ini@ the output will be directed to stdout (your console) and nothing will ever be written to a log file on disk. 
715
716
If you restart PostgreSQL, that'll fix the problem, but you can avoid it in the first place by either specifying log file to write to in @postgresql.conf@ or by using @logfile:@ parameter in @cmd_standby.ini@.
717
718
h2. Troubleshooting
719
720
Set @debug:@ parameter in configuration files to @on@ and scrutinize the information. PostgreSQL log file is also a good place to look at.
721
722
h1. Getting Help
723
724
A very low-traffic mailing list for PITRtools can be found here http://lists.commandprompt.com/mailman/listinfo/pitrtools/
725
There is also consulting available from "Command Prompt":https://commandprompt.com/contact/