Project

General

Profile

How To Configure Warm Standby with PITRTools » History » Version 23

Ivan Lezhnjov, 05/07/2013 09:42 AM

1 1 Ivan Lezhnjov
{{toc}}
2
3 20 Ivan Lezhnjov
h1. How To Configure Warm Standby with pitrtools
4 1 Ivan Lezhnjov
5 20 Ivan Lezhnjov
This how-to demonstrates how to configure a warm standby using PostgreSQL 8.4 and pitrtools.
6 1 Ivan Lezhnjov
7 20 Ivan Lezhnjov
h1. What Is pitrtools and Why Would You Want To Use It?
8 1 Ivan Lezhnjov
9 20 Ivan Lezhnjov
Essentially, pitrtools is a wrapper around standard tools, such as rsync and PostgreSQL's internal functionality, that makes, among other things, creating and managing standby configurations and subsequent failover to a standby a snap.
10 1 Ivan Lezhnjov
11 20 Ivan Lezhnjov
With the help of pitrtools you could do more, namely:
12 1 Ivan Lezhnjov
* secure shipping of log files to configured standby server over SSL protected link;
13
* streaming replication;
14
* enable/disable archiving without the need to restart PostgreSQL;
15
* stay informed by generating alerts based on various levels of severity of events happening during the process on both ends of configuration;
16
* automatically take a base backup, including table spaces, restore archives and purge old ones (if PostgreSQL >8.3);
17
* failover to the latest restore point and point-in-time recovery (restore using timestamps);
18
* etc.
19
20 20 Ivan Lezhnjov
Obviously, pitrtools attempts to make things simpler, more secure and easier to manage.
21 1 Ivan Lezhnjov
22
h1. Naming Conventions
23
24
Master server can be referred to also as archiver.
25
Slave is often called standby.
26
27
These names are used interchangeably.
28
29
h1. The Test Setup
30
31 20 Ivan Lezhnjov
It's a good idea to first setup pitrtools and play with it before you actually go ahead and change configuration of your production servers. To show how pitrtools is configured and work, I'll describe the entire process using a 2 hosts test setup as an example.
32 1 Ivan Lezhnjov
33
So, there are 2 hosts named @bitarena@ and @bitarena-clone@, both being virtualized instances of Debian Squeeze. I assume you're experienced enough to install Debian yourself and know how to find your way around the system.
34
35 20 Ivan Lezhnjov
@bitarena@ is designated role of master server aka archiver, @bitarena-clone@ is a slave aka standby server. pitrtools is installed on both hosts, but each host uses different tools from the package.
36 1 Ivan Lezhnjov
37
h1. The Process
38
39 20 Ivan Lezhnjov
These are major steps in actual order of execution that one would have to follow to get pitrtools-enabled setup running:
40 1 Ivan Lezhnjov
41
On master server
42
43
		* Turn on archiving
44 21 Ivan Lezhnjov
		* Install helper scripts
45 1 Ivan Lezhnjov
		* @cmd_archiver -C $CONFIG -I@
46
47
On slave server
48
49
		* @cmd_standby -C $CONFIG -I@
50
		* @cmd_standby -C $CONFIG -B@
51
		* @cmd_standby -C $CONFIG -S@
52
53 20 Ivan Lezhnjov
h1. Installing and Configuring pitrtools
54 1 Ivan Lezhnjov
55 23 Ivan Lezhnjov
If your system doesn't have a package for pitrtools download a tarball from pitrtools project page at https://public.commandprompt.com/projects/pitrtools/wiki.
56 20 Ivan Lezhnjov
57 23 Ivan Lezhnjov
Installing pitrtools is a matter of unpacking an archive file and making sure everything which came with it is owned by system `postgres' user.
58 1 Ivan Lezhnjov
59 20 Ivan Lezhnjov
h1. SSH Key-based Login for pitrtools
60 1 Ivan Lezhnjov
61 20 Ivan Lezhnjov
pitrtools relies on rsync and SSH heavily to do its work, e.g. making a base backup and shipping WAL log files from master to slave server -- all that happens over an SSL-protected communication channel. This is the area where pitrtools makes one's life easier, because otherwise you'd have to find a way to copy log files to a slave server somehow (most likely it would be some sort of networking mount point).
62 1 Ivan Lezhnjov
63 5 Ivan Lezhnjov
Therefore one of the prerequisites is to configure SSH key-based logins between the two hosts for @postgres@ system user that don't require a password or a passphrase.
64 1 Ivan Lezhnjov
65
*Master*
66
<pre>root@bitarena:~# su - postgres
67
68
postgres@bitarena:~$ ssh-keygen -t rsa
69
... 
70
postgres@bitarena:~$ ls -la /var/lib/postgresql/.ssh/
71
total 16
72
drwx------ 2 postgres postgres 4096 Sep 28 09:17 .
73
drwxr-xr-x 5 postgres postgres 4096 Sep 28 09:17 ..
74
-rw------- 1 postgres postgres 1675 Sep 28 09:17 id_rsa
75
-rw-r--r-- 1 postgres postgres  399 Sep 28 09:17 id_rsa.pub
76
postgres@bitarena:~#
77
</pre>
78
79
*Slave*
80
<pre>root@bitarena-clone:~# su - postgres
81
postgres@bitarena-clone:~$ ssh-keygen -t rsa
82
... </pre>
83
84
Answer all the questions and ssh-keygen will create both private and public (*.pub file) RSA keys. For now just create the key pair, one on on each host (master and slave). When done, proceed to exchanging public keys between the hosts like this:
85
86
*Master*
87 18 Ivan Lezhnjov
<pre>postgres@bitarena:~$ ssh-copy-id bitarena-clone
88
Now try logging into the machine, with "ssh 'bitarena-clone'", and check in:
89 1 Ivan Lezhnjov
90 18 Ivan Lezhnjov
  .ssh/authorized_keys
91 1 Ivan Lezhnjov
92 18 Ivan Lezhnjov
to make sure we haven't added extra keys that you weren't expecting.
93 1 Ivan Lezhnjov
94
postgres@bitarena:~$ ssh bitarena-clone
95 18 Ivan Lezhnjov
Linux bitarena-clone 2.6.32-5-686 #1 SMP Sun May 6 04:01:19 UTC 2012 i686
96 1 Ivan Lezhnjov
97
The programs included with the Debian GNU/Linux system are free software;
98
the exact distribution terms for each program are described in the
99
individual files in /usr/share/doc/*/copyright.
100
101
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
102
permitted by applicable law.
103 18 Ivan Lezhnjov
Last login: Thu Oct 18 06:07:22 2012 from bitarena.localdomain
104
postgres@bitarena-clone:~$ less .ssh/authorized_keys
105
postgres@bitarena-clone:~$ </pre>
106 1 Ivan Lezhnjov
107 18 Ivan Lezhnjov
This will copy @postgres@ system user's public SSH key on master to standby into a file @authorized_keys@, which will allow @postgres@ on master to login to slave host without a password and also in a secure fashion.
108 1 Ivan Lezhnjov
109 19 Ivan Lezhnjov
Remember that this has been done on master server only. In a similar manner, you should take care of the slave host.
110 1 Ivan Lezhnjov
111 20 Ivan Lezhnjov
It's worth noting that pitrtools runs some actions remotely. For example, when a base backup action is run on a slave host, cmd_standby script establishes SSH session to a master host and runs various psql commands to deal with checkpoints, copy log files, etc. This also requires PostgreSQL user password to access the database, and it is often needed to be entered 4 and more times for a base backup action to complete. 
112 1 Ivan Lezhnjov
113 20 Ivan Lezhnjov
Once you played around with pitrtools enough to get hang of things, you could avoid having to enter password manually each time by either using .pgpass file (which is a "standard feature of PostgreSQL":http://www.postgresql.org/docs/8.4/static/libpq-pgpass.html) or make sure there's a "trust relationship configured for localhost in @pg_hba.conf@ file":http://www.postgresql.org/docs/8.4/static/auth-methods.html.
114 19 Ivan Lezhnjov
115
For the sake of clarity, consider these examples:
116
117
.pgpass
118
<pre>
119
127.0.0.1:*:postgres:postgres:postgrespass
120
</pre>
121
122
pg_hba.conf
123
<pre>
124
host postgres postgres 127.0.0.1/32 trust
125
</pre>
126 1 Ivan Lezhnjov
127
In the console output examples you're going to see below I entered password manually, but this, of course, is really inconvenient in production environment.
128
129
h1. Master Configuration File
130
131 20 Ivan Lezhnjov
Sample configuration files are pretty good as they are with the defaults they come. Chances are you're not going to change a lot in there. Make sure you read all *.README files, though, because they contain helpful extra information about pitrtools and configuration parameters that will help you decide how to best configure your servers.
132 1 Ivan Lezhnjov
133 20 Ivan Lezhnjov
We start with the master host by editing pitrtools configuration file for master/archiver.
134 1 Ivan Lezhnjov
135 20 Ivan Lezhnjov
<pre>postgres@bitarena:~$ vim pitrtools/cmd_archiver.ini
136 1 Ivan Lezhnjov
	
137
	[DEFAULT]
138
	; online or offline
139
	state: online					
140
	
141
	; The base database directory
142
	
143
	pgdata: /var/lib/postgresql/8.4/main
144
	
145
	; where to remotely copy archives
146
	r_archivedir: /var/lib/postgresql/archive
147
	
148
	; where to locally copy archives
149
	l_archivedir: /var/lib/postgresql/archive
150
	
151
	; where is rsync				
152
	rsync_bin: /usr/bin/rsync			
153
	
154
	; extra rsync flags
155
	rsync_flags: -z
156
	
157
	; option 2 or 3, if running RHEL5 or similar it is likely 2
158
	; if you are running something that ships remotely modern software
159
	; it will be 3
160
	
161
	rsync_version = 3
162
	
163
	; IP of slave					
164
	slaves: bitarena-clone
165
	
166
	; the user that will be using scp				
167
	user: postgres				
168
	
169
	; if scp can't connect in 10 seconds error
170
	ssh_timeout: 10
171
	
172
	; command to process in ok					
173
	notify_ok: echo OK
174
	
175
	; command to process in warning
176
	notify_warning:  echo WARNING
177
	
178
	; command to process in critical
179
	notify_critical: echo CRITICAL
180
	
181
	; if you want to debug on/off only		 	
182
	debug: on
183
	
184
	; if you want ssh debug (warning noisy)
185
	ssh_debug: off
186
</pre>
187
188
189 20 Ivan Lezhnjov
Note that you can use domain names instead of IP addresses for slaves: parameter, it works either way just fine. You might also want to turn debugging on while you're learning pitrtools. It helps to see what software does, if you try to understand better how it works.
190 1 Ivan Lezhnjov
191 11 Ivan Lezhnjov
h2. Turn On and Configure Archiving
192 1 Ivan Lezhnjov
193 20 Ivan Lezhnjov
If you are not using Streaming Replication / Hot Standby, we need to turn on archiving functionality in PostgreSQL. This is purely a PostgreSQL feature, but we can also take advantage of using pitrtools thanks to flexible design of PostgreSQL.
194 2 Ivan Lezhnjov
195
<pre>postgres@bitarena:~$ vim /etc/postgresql/8.4/main/postgresql.conf
196
197
...
198
archive_mode=on
199 20 Ivan Lezhnjov
archive_command = '/var/lib/postgresql/pitrtools/bin/cmd_archiver -C /var/lib/postgresql/pitrtools/cmd_archiver.ini -F %p'
200 2 Ivan Lezhnjov
...
201
</pre>
202
203
204 20 Ivan Lezhnjov
Here we explicitly turn archiving mode on and tell PostgreSQL to use @cmd_archiver@, part of pitrtools, as the archiving command. 
205 2 Ivan Lezhnjov
It must be provided with a path to a configuration file (@-C@ switch), and a path to a log file (@-F@ switch). @%p@ is substituted by PostgreSQL with the actual log file location in the file system. To learn more about how to use @cmd_archiver@ run
206
207 20 Ivan Lezhnjov
<pre>postgres@bitarena:~$ pitrtools/bin/cmd_archiver --help</pre>
208 2 Ivan Lezhnjov
209
210 20 Ivan Lezhnjov
Please, keep in mind that pitrtools isn't supposed to be run by @root@ user. Technically, it can be run by any system user other than root, and unless you have a customized configuration your default PostgreSQL user will be @postgres@, so pitrtools are expected to be run as that system user.
211 2 Ivan Lezhnjov
212
Restart PostgreSQL for changes to take effect.
213
214
<pre>postgres@bitarena:~$ /etc/init.d/postgresql restart
215
Restarting PostgreSQL 8.4 database server: main.
216
postgres@bitarena:~#
217
</pre>
218
219 12 Ivan Lezhnjov
h2. Install Helper Scripts
220 2 Ivan Lezhnjov
221 20 Ivan Lezhnjov
Apply cmd_standby.sql to the database of pitrtools user (usually @postgres@). This is required for master server only.
222 2 Ivan Lezhnjov
223 20 Ivan Lezhnjov
<pre>root@bitarena:~# psql -U postgres < ~/GIT/pitrtools/cmd_standby.sql
224 2 Ivan Lezhnjov
CREATE FUNCTION
225
COMMENT
226
CREATE FUNCTION
227
CREATE FUNCTION
228
CREATE FUNCTION
229
COMMENT
230
CREATE FUNCTION
231
COMMENT
232
root@bitarena:~#</pre>
233
234 13 Joshua Drake
>Note: If you are running Postgres 9.2, use cmd_standby.92.sql 
235 2 Ivan Lezhnjov
236 12 Ivan Lezhnjov
h2. Initialize Master Environment
237 2 Ivan Lezhnjov
238
At this point we're pretty much done configuring the master server. All that is left to do is initialize master environment by running
239
240 20 Ivan Lezhnjov
<pre>postgres@bitarena:~$ pitrtools/bin/cmd_archiver -C pitrtools/cmd_archiver.ini -I
241 2 Ivan Lezhnjov
We are initializing queues, one moment.
242
243
NOTICE: init_env_func()
244
NOTICE: generate_slave_list_func()
245
NOTICE: Your slaves are: ['bitarena-clone']
246
postgres@bitarena:~$ ls -lah
247
total 40K
248
drwxr-xr-x  7 postgres postgres 4.0K Sep 28 10:25 .
249
drwxr-xr-x 34 root     root     4.0K Sep 27 02:32 ..
250
drwxr-xr-x  3 postgres postgres 4.0K Sep 24 15:10 8.4
251
drwx------  2 postgres postgres 4.0K Sep 27 02:23 .aptitude
252
drwxr-xr-x  3 postgres postgres 4.0K Sep 28 10:25 archive
253
-rw-------  1 postgres postgres 1001 Sep 28 10:03 .bash_history
254 20 Ivan Lezhnjov
drwxr-xr-x  2 postgres postgres 4.0K Sep 28 10:25 pitrtools
255 2 Ivan Lezhnjov
-rw-------  1 postgres postgres 1.4K Sep 28 05:08 .psql_history
256
drwx------  2 postgres postgres 4.0K Sep 28 09:29 .ssh
257
-rw-------  1 postgres postgres 3.6K Sep 28 10:25 .viminfo
258
postgres@bitarena:~$ ls -lah archive/bitarena-clone/
259
total 8.0K
260
drwxr-xr-x 2 postgres postgres 4.0K Sep 28 10:25 .
261
drwxr-xr-x 3 postgres postgres 4.0K Sep 28 10:25 ..
262
postgres@bitarena:~#</pre>
263
264
265
As you can see @-I@ switch tells @cmd_archiver@ to do a couple of internal actions, as well as prepare file system layout by creating necessary directories. 
266
267
@archive/@ directory has been created automatically by @cmd_archiver@ and contains sub-directory named after IP address or DNS name of a slave, @bitarena-clone@ in this example. This sub-directory is used in those circumstances when master fails to successfully transfer WAL log files to slave and stores them temporarily in this local directory. Once slave is back online, these files should be transferred to slave.
268
269
Effectively, after you've initialized master it starts trying to ship WAL files to slave. However, the slave host isn't configured yet, so the log delivery will fail and the log files should be found in @l_archivedir/slave_FQDNorIP/@ folder on master host. Once we configure slave, these will be shipped as soon as a new WAL segment is created on master.
270
271
272
h1. Slave aka Standby Server
273
274
Now we can prepare the slave host. SSH key-based login for @postgres@ system user should be working in both directions, from master to slave, as well as from slave to master.
275
276
Assuming you've arranged for this as it was suggested before, we can now go on and edit slave configuration file to perform first important step and initialize slave environment.
277
278 5 Ivan Lezhnjov
One of the configuration file parameters, namely @pgdata:@, will ask you to specify PostgreSQL data directory. You could look it up in @postgresql.conf@, or, If your PostgreSQL is already running, you could see this information like this:
279 2 Ivan Lezhnjov
280
<pre>root@bitarena:~# ps axuwf |grep postgre
281
root      4233  0.0  0.0   3304   756 pts/1    S+   10:56   0:00          \_ grep postgre
282
postgres  3352  0.0  0.4  46452  5464 ?        S    09:59   0:03 /usr/lib/postgresql/8.4/bin/postgres 
283
-D /var/lib/postgresql/8.4/main -c config_file=/etc/postgresql/8.4/main/postgresql.conf
284
...</pre>
285
286
Remember, note down the path somewhere or copy it to the clipboard. We'll need it in a minute.
287
288 12 Ivan Lezhnjov
h2. Slave Configuration File
289 2 Ivan Lezhnjov
290
Now edit slave configuration file. Most defaults are good and safe options, but your setup may be different from what I have in this how-to. So, be careful to understand what you're doing.
291
292 20 Ivan Lezhnjov
<pre>postgres@bitarena-clone:~$ vim pitrtools/cmd_standby.ini
293 2 Ivan Lezhnjov
	[DEFAULT]
294
	; what major version are we using?
295
	pgversion: 8.4
296
	
297
	; Used for 8.2 (8.1?), should be set to something > than checkpoint_segments on master				
298
	numarchives: 10			
299
	
300
	; Whether or not to use streaming replication.  If this is set to "on"
301 20 Ivan Lezhnjov
	; pitrtools will configure the standby server to replicate from master
302 2 Ivan Lezhnjov
	; using streaming replication.
303
	; This can only be used with PostgreSQL 9.0 and up.
304
	use_streaming_replication: off
305
	
306
	; File to touch to end replication when using streaming replication.
307 20 Ivan Lezhnjov
	trigger_file: /var/lib/postgresql/pitrtools/cmd_end_recovery
308 2 Ivan Lezhnjov
	
309
	; User to connect to master DB while using streaming replication,
310
	; ignored if not using streaming replication.
311
	repl_db_user: replication
312
	
313
	; Password for the user repl_db_user.
314
	repl_db_password: secret
315
	
316
	; sslmode to use when connecting for streaming replication. 
317
	; Accepted values: the same as libpq: disable, allow, prefer, require, verify-ca and verify-full
318
	; Default: sslmode: prefer
319
	sslmode: prefer
320
	
321
	; Commands needed for execution
322
	; absolute path to ssh
323
	ssh: /usr/bin/ssh		
324
	
325
	; absolute path to rsync
326
	rsync: /usr/bin/rsync
327
	
328
	; extra rsync flags
329
	rsync_flags: -z
330 17 Ivan Lezhnjov
331
	; Confs
332
333
	; This is the postgresql.conf to be used for the failover
334 20 Ivan Lezhnjov
	postgresql_conf_failover: /var/lib/postgresql/pitrtools/failover/postgesql.conf
335 17 Ivan Lezhnjov
336
	; This is the pg_hba.conf to be used for the failover
337 20 Ivan Lezhnjov
	pg_hba_conf_failover: /var/lib/postgresql/pitrtools/failover/pg_hba.conf
338 2 Ivan Lezhnjov
	
339
	; the path to to the postgres bin		
340
	pg_standby: /usr/lib/postgresql/8.4/bin/pg_standby
341
	pg_ctl: /usr/lib/postgresql/8.4/bin/pg_ctl
342
	
343
	; path to psql on the master
344
	r_psql: /usr/lib/postgresql/8.4/bin/psql	
345
	
346
	; Generalized information
347
	
348
	; the port postgresql runs on (master)
349
	port: 5432
350
	
351
	; ip or name of master server 			
352
	master_public_ip: bitarena
353
	
354
	; the ip address we should use when processing remote shell		
355
	master_local_ip: 127.0.0.1
356
	
357
	; the user performed initdb	
358
	user: postgres
359
	
360
	; on or off			
361
	debug: on
362
	
363
	; on or off
364
	ssh_debug: off
365
	
366
	; the timeout for ssh before we throw an alarm
367
	ssh_timeout: 30			
368
	
369
	; should be the same as r_archivedir for archiver
370
	archivedir: /var/lib/postgresql/archive
371
	
372
	; where you executed initdb -D to	
373
	pgdata: /var/lib/postgresql/8.4/main
374
	
375
	; Confs
376
	
377
	; This is the postgresql.conf to be used when not in standby
378
	postgresql_conf: /etc/postgresql/8.4/main/postgresql.conf		
379
	
380
	; This is the pg_hba.conf to be used when not in standby
381
	pg_hba_conf: /etc/postgresql/8.4/main/pg_hba.conf		
382
	
383
	; By default postgresql.conf and pg_hba.conf will be copied from the
384
	; locations specified above to pgdata directory on failover.
385
	;
386
	; Uncomment the following to make postgres actually use the above conf
387
	; files w/o copying them to pgdata.
388
	;no_copy_conf: true
389
	
390
	; The recovery.conf file to create when starting up
391
	; Defaults to %(pgdata)/recovery.conf
392
	recovery_conf: /var/lib/postgresql/8.4/main/recovery.conf
393
	
394
	; Useful when postgresql.conf doesn't specify log destination
395
	; Will be passed with -l to pg_ctl when starting the server.
396
	;
397
	; If you're worried about having complete logs, either make sure
398
	; postgresql.conf points to a log file, or use the logfile: parameter.
399
	;
400
	; Otherwise postgresql will print on standard stdout and nothing
401
	; will be recorded in the logs
402
	;
403
	;logfile: /var/log/postgresql/postgresql.log
404
	
405
	; Alarms
406
	
407
	notify_critical: echo CRITICAL
408
	notify_warning: echo WARNING
409
	notify_ok: echo OK
410
	
411
	; On failover action
412
	
413
	; Whatever is placed here will be executed on -FS must return 0
414
	
415 20 Ivan Lezhnjov
	action_failover: /var/lib/postgresql/pitrtools/failover.sh</pre>
416 2 Ivan Lezhnjov
417
@action_failover:@ script has to exist and have permissions of at least chmod u+x equivalent, it could be just a placeholder script with a simple action of:
418
419
<pre>#!/bin/bash
420 20 Ivan Lezhnjov
touch /var/lib/postgresql/pitrtools/failover_happened</pre>
421 2 Ivan Lezhnjov
	
422 20 Ivan Lezhnjov
but it's meant as a way to let you do certain actions on failover, those would be very specific for each given setup. It's good to know, though, that pitrtools lets you take actions automatically when failover happens. Use this feature to make your setup more sophisticated.
423 17 Ivan Lezhnjov
424
In addition, when doing failover, there are two more additional options to take into consideration, namely @postgresql_conf_failover:@ and @pg_hba_conf_failover:@. Both allow you to start server on failover using alternative configuration. This is meant to provide users with a way to prepare their failover scenario configuration in advance.
425 2 Ivan Lezhnjov
426 12 Ivan Lezhnjov
h2. Initialize Slave Environment
427 2 Ivan Lezhnjov
428
First stop PostgreSQL, then initialize slave environment.
429
430
<pre>postgres@bitarena-clone:~$ /etc/init.d/postgresql stop
431 20 Ivan Lezhnjov
postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -I
432 2 Ivan Lezhnjov
NOTICE: check_pgpid_func()
433
DEBUG: executing query /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no  postgres@bitarena  "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1  by 'SELECT * FROM cmd_get_data_dirs()'
434
Password for user postgres: postgrespass
435
436
DEBUG: /var/lib/postgresql/8.4/main
437
postgres@bitarena-clone:~$</pre>
438
439 20 Ivan Lezhnjov
Please, note that if @archivedir: /var/lib/postgresql/archive@ hasn't been created, you should do so manually as @postgres@ system user (or set @postgres@ user and group as the ownership information for this directory). pitrtools should do this automatically for you, but earlier versions were known not to do so. This is important, and the next step in slave configuration, which is base backup, will fail if @archivedir:@ doesn't exist.
440 3 Ivan Lezhnjov
441 12 Ivan Lezhnjov
h2. Making a Base Backup
442 3 Ivan Lezhnjov
443
Before you proceed check if @archivedir:@ exists on slave and WAL files are being shipped to it from master host. WAL files are generated and shipped only when new data is stored in the database on master host. To help simulate data flow and check whether archiving and shipping is happening, try this SQL statement on master host:
444
445
<pre>postgres@bitarena:~$ psql
446
psql (8.4.13)
447
Type "help" for help.
448
449 20 Ivan Lezhnjov
postgres=# create table testpitrtools1 as select * from pg_class, pg_description;
450 3 Ivan Lezhnjov
postgresq=# \q
451
postgres@bitarena:~$</pre>
452
	
453
You could create a couple of tables like that to generate enough WAL segments. See @archivedir:@ directory on slave to check whether any WAL files have been copied there. If they have, everything works as expected and you can try to make a base backup on slave host:
454
455 20 Ivan Lezhnjov
<pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -B
456 3 Ivan Lezhnjov
NOTICE: check_pgpid_func()
457
DEBUG: executing query /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no  postgres@bitarena  "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1  by  'checkpoint' 
458
Password for user postgres: postgrespass
459
460
DEBUG: CHECKPOINT
461
DEBUG: executing query /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no  postgres@bitarena  "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1  by  'SELECT cmd_pg_start_backup()' 
462
Password for user postgres: postgrespass
463
464
DEBUG: cmd_pg_start_backup:  1
465
DEBUG: executing query /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no  postgres@bitarena  "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1  by 'SELECT * FROM cmd_get_data_dirs()'
466
Password for user postgres: postgrespass
467
468
DEBUG: executing query /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no  postgres@bitarena  "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1  by 'SELECT * FROM cmd_get_pgdata() LIMIT 1'
469
Password for user postgres: postgrespass
470
471
receiving incremental file list
472
./
473
backup_label
474
backup_label.old
475
postmaster.opts
476
base/1/
477
base/1/pg_internal.init
478
base/11564/
479
base/11564/pg_internal.init
480
base/16499/
481
base/16499/pg_internal.init
482
base/33069/
483
base/33069/33084
484
base/33069/33268
485
base/33069/33974
486
base/33069/33974_fsm
487
base/33069/33974_vm
488
base/33069/33980
489
base/33069/33993
490
base/33069/33993_fsm
491
base/33069/33993_vm
492
base/33069/33999
493
base/33069/33999_fsm
494
base/33069/pg_internal.init
495
global/
496
global/pg_auth
497
global/pg_control
498
global/pg_database
499
pg_clog/0000
500
pg_multixact/offsets/0000
501
pg_stat_tmp/
502
pg_stat_tmp/pgstat.stat
503
pg_subtrans/0001
504
505
Number of files: 1537
506
Number of files transferred: 25
507
Total file size: 189868759 bytes
508
Total transferred file size: 22162037 bytes
509
Literal data: 920320 bytes
510
Matched data: 21241717 bytes
511
File list size: 20500
512
File list generation time: 0.004 seconds
513
File list transfer time: 0.000 seconds
514
Total bytes sent: 65011
515
Total bytes received: 189901
516
517
sent 65011 bytes  received 189901 bytes  101964.80 bytes/sec
518
total size is 189868759  speedup is 744.84
519
DEBUG: executing query /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no  postgres@bitarena  "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1  by  'SELECT cmd_pg_stop_backup()' 
520
Password for user postgres: postgrespass
521
522
DEBUG: cmd_pg_stop_backup: 
523
postgres@bitarena-clone:~$</pre>
524
525 20 Ivan Lezhnjov
As you can see what happens is that pitrtools puts master into backup mode, synchronizes data directories (including tablespaces, if any) from master to slave and then exits backup mode on master. If base backup action had failed before it properly finished (say, you had lost connection to slave while rsync was copying files over to it), you'd need to intervene and run manually @-Astop_basebackup:@
526 3 Ivan Lezhnjov
527 20 Ivan Lezhnjov
<pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -Astop_basebackup
528 3 Ivan Lezhnjov
...</pre>
529
530
After that run base backup action again. Just make sure it finishes its work properly (use console output for successful base backup action above as a reference).
531
532 22 Ivan Lezhnjov
h2. Start a Standby
533
534 3 Ivan Lezhnjov
If you want a cold standby you're done. If you need a warm standby, then run:
535
536 20 Ivan Lezhnjov
<pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -S
537 3 Ivan Lezhnjov
NOTICE: check_pgpid_func()
538
server starting
539
postgres@bitarena-clone:~$ 2012-10-16 02:47:31 PDT LOG:  database system was interrupted; last known up at 2012-10-16 02:44:37 PDT
540
2012-10-16 02:47:31 PDT LOG:  starting archive recovery
541
2012-10-16 02:47:31 PDT LOG:  restore_command = '/usr/lib/postgresql/8.4/bin/pg_standby -s5 -w0 -c -d  /var/lib/postgresql/archive %f %p %r '
542
Trigger file 		: <not set>
543
Waiting for WAL file	: 00000001.history
544
WAL file path		: /var/lib/postgresql/archive/00000001.history
545
Restoring to		: pg_xlog/RECOVERYHISTORY
546
Sleep interval		: 5 seconds
547
Max wait interval	: 0 forever
548
Command for restore	: cp "/var/lib/postgresql/archive/00000001.history" "pg_xlog/RECOVERYHISTORY"
549
Keep archive history	: 000000000000000000000000 and later
550
running restore		:cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory
551
cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory
552
cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory
553
cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory
554
not restored
555
history file not found
556
Trigger file 		: <not set>
557
Waiting for WAL file	: 000000010000000100000025.00000020.backup
558
WAL file path		: /var/lib/postgresql/archive/000000010000000100000025.00000020.backup
559
Restoring to		: pg_xlog/RECOVERYHISTORY
560
Sleep interval		: 5 seconds
561
Max wait interval	: 0 forever
562
Command for restore	: cp "/var/lib/postgresql/archive/000000010000000100000025.00000020.backup" "pg_xlog/RECOVERYHISTORY"
563
Keep archive history	: 000000000000000000000000 and later
564
running restore		: OK
565
2012-10-16 02:48:01 PDT LOG:  restored log file "000000010000000100000025.00000020.backup" from archive
566
Trigger file 		: <not set>
567
Waiting for WAL file	: 000000010000000100000025
568
WAL file path		: /var/lib/postgresql/archive/000000010000000100000025
569
Restoring to		: pg_xlog/RECOVERYXLOG
570
Sleep interval		: 5 seconds
571
Max wait interval	: 0 forever
572
Command for restore	: cp "/var/lib/postgresql/archive/000000010000000100000025" "pg_xlog/RECOVERYXLOG"
573
Keep archive history	: 000000000000000000000000 and later
574
running restore		: OK
575
576
2012-10-16 02:48:02 PDT LOG:  restored log file "000000010000000100000025" from archive
577
2012-10-16 02:48:02 PDT LOG:  automatic recovery in progress
578
2012-10-16 02:48:02 PDT LOG:  redo starts at 1/25000020, consistency will be reached at 1/2504FFC4
579
2012-10-16 02:48:03 PDT LOG:  consistent recovery state reached
580
Trigger file 		: <not set>
581
Waiting for WAL file	: 000000010000000100000026
582
WAL file path		: /var/lib/postgresql/archive/000000010000000100000026
583
Restoring to		: pg_xlog/RECOVERYXLOG
584
Sleep interval		: 5 seconds
585
Max wait interval	: 0 forever
586
Command for restore	: cp "/var/lib/postgresql/archive/000000010000000100000026" "pg_xlog/RECOVERYXLOG"
587
Keep archive history	: 000000010000000100000025 and later
588
WAL file not present yet.
589
WAL file not present yet.
590
WAL file not present yet.
591
WAL file not present yet.
592
WAL file not present yet.
593
WAL file not present yet.
594
WAL file not present yet.
595
WAL file not present yet.
596
WAL file not present yet.
597
WAL file not present yet.</pre>
598
599 20 Ivan Lezhnjov
At this point PostgreSQL armed with pitrtools on master server will be continuously shipping log files to @archivedir:@ on slave. Once shipped, the WAL files will be immediately replayed, because slave in standby mode continuously scans @archivedir:@ for new WAL files and replays them as soon as they become available (this can be seen from example console output above).
600 3 Ivan Lezhnjov
601
h1. Failing Over
602
603 20 Ivan Lezhnjov
Now you have a warm standby mirroring changes occurring on the master server. When your master server becomes unavailable due to any reason, you could turn this warm standby server into a production instance by simply running on standby machine a failover action as shown below. For this PostgreSQL on master must not be running, otherwise pitrtools will throw out a warning and refuse to failover.
604 3 Ivan Lezhnjov
605 20 Ivan Lezhnjov
<pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -F999
606 3 Ivan Lezhnjov
607
NOTICE: check_pgpid_func()
608
2012-10-16 02:56:20 PDT LOG:  received fast shutdown request
609
2012-10-16 02:56:20 PDT LOG:  aborting any active transactions
610
waiting for server to shut down....2012-10-16 02:56:20 PDT LOG:  shutting down
611
2012-10-16 02:56:20 PDT LOG:  database system is shut down
612
 done
613
server stopped
614
server starting
615
NOTICE: Statistics are not replicated in warm standy mode.
616
HINT: Execute ANALYZE on your databases
617
postgres@bitarena-clone:~$ 2012-10-16 02:56:22 PDT LOG:  database system was interrupted while in recovery at log time 2012-10-16 02:54:35 PDT
618
2012-10-16 02:56:22 PDT HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
619
2012-10-16 02:56:22 PDT LOG:  starting archive recovery
620
2012-10-16 02:56:22 PDT LOG:  restore_command = 'cp /var/lib/postgresql/archive/%f "%p"'
621
cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory
622
2012-10-16 02:56:23 PDT LOG:  restored log file "000000010000000100000026" from archive
623
2012-10-16 02:56:23 PDT LOG:  automatic recovery in progress
624
2012-10-16 02:56:23 PDT LOG:  redo starts at 1/2635B428, consistency will be reached at 1/27000000
625
cp: cannot stat `/var/lib/postgresql/archive/000000010000000100000027': No such file or directory
626
2012-10-16 02:56:23 PDT LOG:  could not open file "pg_xlog/000000010000000100000027" (log file 1, segment 39): No such file or directory
627
2012-10-16 02:56:23 PDT LOG:  redo done at 1/265AD744
628
2012-10-16 02:56:23 PDT LOG:  last completed transaction was at log time 2012-10-16 02:55:07.387709-07
629
2012-10-16 02:56:24 PDT LOG:  restored log file "000000010000000100000026" from archive
630
cp: cannot stat `/var/lib/postgresql/archive/00000002.history': No such file or directory
631
cp: cannot stat `/var/lib/postgresql/archive/00000003.history': No such file or directory
632
cp: cannot stat `/var/lib/postgresql/archive/00000004.history': No such file or directory
633
cp: cannot stat `/var/lib/postgresql/archive/00000005.history': No such file or directory
634
cp: cannot stat `/var/lib/postgresql/archive/00000006.history': No such file or directory
635
cp: cannot stat `/var/lib/postgresql/archive/00000007.history': No such file or directory
636
cp: cannot stat `/var/lib/postgresql/archive/00000008.history': No such file or directory
637
cp: cannot stat `/var/lib/postgresql/archive/00000009.history': No such file or directory
638
2012-10-16 02:56:24 PDT LOG:  selected new timeline ID: 9
639
cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory
640
2012-10-16 02:56:25 PDT LOG:  archive recovery complete
641
2012-10-16 02:56:26 PDT LOG:  database system is ready to accept connections
642
2012-10-16 02:56:26 PDT LOG:  autovacuum launcher started
643
644
postgres@bitarena-clone:~$</pre>
645
646
This will create @recovery.conf@ file under @pg_data:@ directory and restart PostgreSQL to enter production mode of operation.
647
648
After this, you'd be basically running a copy of production master server. Keep in mind that you would also need to change IP addresses and/or load balancing configuration, routing, firewall rules or anything else that might be in the way of establishing a successful connection to this host. This is where @action_failover:@ script could come in handy. 
649
650
Plan in advance, figure this all out to avoid any downtime before you will need to failover.
651
652
653 20 Ivan Lezhnjov
h1. Other things you could do with pitrtools and some tips
654 3 Ivan Lezhnjov
655 20 Ivan Lezhnjov
This how-to is meant to help you get started with pitrtools. pitrtools can do more, though, than just help you configure standby.
656 3 Ivan Lezhnjov
657
658
h2. Point-In-Time Recovery
659
660 20 Ivan Lezhnjov
<pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -F999 -R '2008-05-28 11:00:38.059389'
661 3 Ivan Lezhnjov
...</pre>
662
663
This is essentially a restore to a specific point in time action. In general, you can only restore to a point in time while using cold standby, because PITR will "stop" recovering at the point in time you've specified, and both warm and hot standby servers have already recovered all the WAL files they can. In this regard, it would be a good idea to have a cold standby around for disaster (at logical level) recovery.
664
665
Once this has been done, you can't choose another timestamp to restore to.
666
667
h2. Entering Standby Mode After Failover On Slave
668
669
Suppose you failed over to your standby slave, which is running now as a replacement of master for your applications. You've fixed problems with the actual master and want this slave host to enter standby mode again. Here's how you'd do it:
670
671 20 Ivan Lezhnjov
<pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -Astop
672 3 Ivan Lezhnjov
...</pre>
673
674
This will stop entire PostgreSQL service. You could also use PostgreSQL init script to achieve the same instead. If you need more fine-grained control use @pg_ctlcluser 8.4 main stop@ (see @man pg_ctlcluster@ for more details). Take a new base backup as before and enter standby mode:
675
676 20 Ivan Lezhnjov
<pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -B
677 3 Ivan Lezhnjov
...
678
679 20 Ivan Lezhnjov
postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -S
680 3 Ivan Lezhnjov
...</pre>
681
682
Again, if you want a cold standby just don't run -S action after -B.
683 4 Ivan Lezhnjov
684
h2. Alerts
685
686 20 Ivan Lezhnjov
Alerting is designed to run your custom scripts. You could easily integrate pitrtools alerting to your existing NMS be it Nagios, Zabbix, or anything else, send e-mails or take actions you decide.
687 4 Ivan Lezhnjov
688
h2. Logging
689
690
It bears repeating -- just In case you overlooked this in standby sample configuration file notes -- if your @postgresql.conf@ doesn't specify a log file to write to and you don't use @logfile:@ parameter in @cmd_standby.ini@ the output will be directed to stdout (your console) and nothing will ever be written to a log file on disk. 
691
692
If you restart PostgreSQL, that'll fix the problem, but you can avoid it in the first place by either specifying log file to write to in @postgresql.conf@ or by using @logfile:@ parameter in @cmd_standby.ini@.
693
694
h2. Troubleshooting
695
696
Set @debug:@ parameter in configuration files to @on@ and scrutinize the information. PostgreSQL log file is also a good place to look at.
697
698
h1. Getting Help
699
700 20 Ivan Lezhnjov
A very low-traffic mailing list for pitrtools can be found here http://lists.commandprompt.com/mailman/listinfo/pitrtools/
701 4 Ivan Lezhnjov
There is also consulting available from "Command Prompt":https://commandprompt.com/contact/