How To Configure Warm Standby with PITRTools » History » Version 29
Ivan Lezhnjov, 05/08/2013 06:47 AM
1 | 1 | Ivan Lezhnjov | {{toc}} |
---|---|---|---|
2 | |||
3 | 20 | Ivan Lezhnjov | h1. How To Configure Warm Standby with pitrtools |
4 | 1 | Ivan Lezhnjov | |
5 | 20 | Ivan Lezhnjov | This how-to demonstrates how to configure a warm standby using PostgreSQL 8.4 and pitrtools. |
6 | 1 | Ivan Lezhnjov | |
7 | 20 | Ivan Lezhnjov | h1. What Is pitrtools and Why Would You Want To Use It? |
8 | 1 | Ivan Lezhnjov | |
9 | 20 | Ivan Lezhnjov | Essentially, pitrtools is a wrapper around standard tools, such as rsync and PostgreSQL's internal functionality, that makes, among other things, creating and managing standby configurations and subsequent failover to a standby a snap. |
10 | 1 | Ivan Lezhnjov | |
11 | 20 | Ivan Lezhnjov | With the help of pitrtools you could do more, namely: |
12 | 1 | Ivan Lezhnjov | * secure shipping of log files to configured standby server over SSL protected link; |
13 | * streaming replication; |
||
14 | * enable/disable archiving without the need to restart PostgreSQL; |
||
15 | * stay informed by generating alerts based on various levels of severity of events happening during the process on both ends of configuration; |
||
16 | * automatically take a base backup, including table spaces, restore archives and purge old ones (if PostgreSQL >8.3); |
||
17 | * failover to the latest restore point and point-in-time recovery (restore using timestamps); |
||
18 | * etc. |
||
19 | |||
20 | 20 | Ivan Lezhnjov | Obviously, pitrtools attempts to make things simpler, more secure and easier to manage. |
21 | 1 | Ivan Lezhnjov | |
22 | h1. Naming Conventions |
||
23 | |||
24 | Master server can be referred to also as archiver. |
||
25 | Slave is often called standby. |
||
26 | |||
27 | These names are used interchangeably. |
||
28 | |||
29 | h1. The Test Setup |
||
30 | |||
31 | 20 | Ivan Lezhnjov | It's a good idea to first setup pitrtools and play with it before you actually go ahead and change configuration of your production servers. To show how pitrtools is configured and work, I'll describe the entire process using a 2 hosts test setup as an example. |
32 | 1 | Ivan Lezhnjov | |
33 | So, there are 2 hosts named @bitarena@ and @bitarena-clone@, both being virtualized instances of Debian Squeeze. I assume you're experienced enough to install Debian yourself and know how to find your way around the system. |
||
34 | |||
35 | 20 | Ivan Lezhnjov | @bitarena@ is designated role of master server aka archiver, @bitarena-clone@ is a slave aka standby server. pitrtools is installed on both hosts, but each host uses different tools from the package. |
36 | 1 | Ivan Lezhnjov | |
37 | h1. The Process |
||
38 | |||
39 | 20 | Ivan Lezhnjov | These are major steps in actual order of execution that one would have to follow to get pitrtools-enabled setup running: |
40 | 1 | Ivan Lezhnjov | |
41 | On master server |
||
42 | |||
43 | * Turn on archiving |
||
44 | 21 | Ivan Lezhnjov | * Install helper scripts |
45 | 1 | Ivan Lezhnjov | * @cmd_archiver -C $CONFIG -I@ |
46 | |||
47 | On slave server |
||
48 | |||
49 | * @cmd_standby -C $CONFIG -I@ |
||
50 | * @cmd_standby -C $CONFIG -B@ |
||
51 | * @cmd_standby -C $CONFIG -S@ |
||
52 | |||
53 | 20 | Ivan Lezhnjov | h1. Installing and Configuring pitrtools |
54 | 1 | Ivan Lezhnjov | |
55 | 23 | Ivan Lezhnjov | If your system doesn't have a package for pitrtools download a tarball from pitrtools project page at https://public.commandprompt.com/projects/pitrtools/wiki. |
56 | 20 | Ivan Lezhnjov | |
57 | 24 | Ivan Lezhnjov | Installing pitrtools is a matter of unpacking an archive file. The recommended default is `postgres' home directory. |
58 | |||
59 | You can put it anywhere in your filesystem as long as the files belong to and can be accessed by `postgres'. |
||
60 | 1 | Ivan Lezhnjov | |
61 | 20 | Ivan Lezhnjov | h1. SSH Key-based Login for pitrtools |
62 | 1 | Ivan Lezhnjov | |
63 | 20 | Ivan Lezhnjov | pitrtools relies on rsync and SSH heavily to do its work, e.g. making a base backup and shipping WAL log files from master to slave server -- all that happens over an SSL-protected communication channel. This is the area where pitrtools makes one's life easier, because otherwise you'd have to find a way to copy log files to a slave server somehow (most likely it would be some sort of networking mount point). |
64 | 1 | Ivan Lezhnjov | |
65 | 5 | Ivan Lezhnjov | Therefore one of the prerequisites is to configure SSH key-based logins between the two hosts for @postgres@ system user that don't require a password or a passphrase. |
66 | 1 | Ivan Lezhnjov | |
67 | *Master* |
||
68 | <pre>root@bitarena:~# su - postgres |
||
69 | |||
70 | postgres@bitarena:~$ ssh-keygen -t rsa |
||
71 | ... |
||
72 | postgres@bitarena:~$ ls -la /var/lib/postgresql/.ssh/ |
||
73 | total 16 |
||
74 | drwx------ 2 postgres postgres 4096 Sep 28 09:17 . |
||
75 | drwxr-xr-x 5 postgres postgres 4096 Sep 28 09:17 .. |
||
76 | -rw------- 1 postgres postgres 1675 Sep 28 09:17 id_rsa |
||
77 | -rw-r--r-- 1 postgres postgres 399 Sep 28 09:17 id_rsa.pub |
||
78 | postgres@bitarena:~# |
||
79 | </pre> |
||
80 | |||
81 | *Slave* |
||
82 | <pre>root@bitarena-clone:~# su - postgres |
||
83 | postgres@bitarena-clone:~$ ssh-keygen -t rsa |
||
84 | ... </pre> |
||
85 | |||
86 | Answer all the questions and ssh-keygen will create both private and public (*.pub file) RSA keys. For now just create the key pair, one on on each host (master and slave). When done, proceed to exchanging public keys between the hosts like this: |
||
87 | |||
88 | *Master* |
||
89 | 18 | Ivan Lezhnjov | <pre>postgres@bitarena:~$ ssh-copy-id bitarena-clone |
90 | Now try logging into the machine, with "ssh 'bitarena-clone'", and check in: |
||
91 | 1 | Ivan Lezhnjov | |
92 | 18 | Ivan Lezhnjov | .ssh/authorized_keys |
93 | 1 | Ivan Lezhnjov | |
94 | 18 | Ivan Lezhnjov | to make sure we haven't added extra keys that you weren't expecting. |
95 | 1 | Ivan Lezhnjov | |
96 | postgres@bitarena:~$ ssh bitarena-clone |
||
97 | 18 | Ivan Lezhnjov | Linux bitarena-clone 2.6.32-5-686 #1 SMP Sun May 6 04:01:19 UTC 2012 i686 |
98 | 1 | Ivan Lezhnjov | |
99 | The programs included with the Debian GNU/Linux system are free software; |
||
100 | the exact distribution terms for each program are described in the |
||
101 | individual files in /usr/share/doc/*/copyright. |
||
102 | |||
103 | Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent |
||
104 | permitted by applicable law. |
||
105 | 18 | Ivan Lezhnjov | Last login: Thu Oct 18 06:07:22 2012 from bitarena.localdomain |
106 | postgres@bitarena-clone:~$ less .ssh/authorized_keys |
||
107 | postgres@bitarena-clone:~$ </pre> |
||
108 | 1 | Ivan Lezhnjov | |
109 | 18 | Ivan Lezhnjov | This will copy @postgres@ system user's public SSH key on master to standby into a file @authorized_keys@, which will allow @postgres@ on master to login to slave host without a password and also in a secure fashion. |
110 | 1 | Ivan Lezhnjov | |
111 | 19 | Ivan Lezhnjov | Remember that this has been done on master server only. In a similar manner, you should take care of the slave host. |
112 | 1 | Ivan Lezhnjov | |
113 | 20 | Ivan Lezhnjov | It's worth noting that pitrtools runs some actions remotely. For example, when a base backup action is run on a slave host, cmd_standby script establishes SSH session to a master host and runs various psql commands to deal with checkpoints, copy log files, etc. This also requires PostgreSQL user password to access the database, and it is often needed to be entered 4 and more times for a base backup action to complete. |
114 | 1 | Ivan Lezhnjov | |
115 | 20 | Ivan Lezhnjov | Once you played around with pitrtools enough to get hang of things, you could avoid having to enter password manually each time by either using .pgpass file (which is a "standard feature of PostgreSQL":http://www.postgresql.org/docs/8.4/static/libpq-pgpass.html) or make sure there's a "trust relationship configured for localhost in @pg_hba.conf@ file":http://www.postgresql.org/docs/8.4/static/auth-methods.html. |
116 | 19 | Ivan Lezhnjov | |
117 | For the sake of clarity, consider these examples: |
||
118 | |||
119 | .pgpass |
||
120 | <pre> |
||
121 | 127.0.0.1:*:postgres:postgres:postgrespass |
||
122 | </pre> |
||
123 | |||
124 | pg_hba.conf |
||
125 | <pre> |
||
126 | host postgres postgres 127.0.0.1/32 trust |
||
127 | </pre> |
||
128 | 1 | Ivan Lezhnjov | |
129 | In the console output examples you're going to see below I entered password manually, but this, of course, is really inconvenient in production environment. |
||
130 | |||
131 | h1. Master Configuration File |
||
132 | |||
133 | 20 | Ivan Lezhnjov | Sample configuration files are pretty good as they are with the defaults they come. Chances are you're not going to change a lot in there. Make sure you read all *.README files, though, because they contain helpful extra information about pitrtools and configuration parameters that will help you decide how to best configure your servers. |
134 | 1 | Ivan Lezhnjov | |
135 | 20 | Ivan Lezhnjov | We start with the master host by editing pitrtools configuration file for master/archiver. |
136 | 1 | Ivan Lezhnjov | |
137 | 27 | Ivan Lezhnjov | <pre>postgres@bitarena:~$ vim pitrtools/etc/cmd_archiver.ini |
138 | 1 | Ivan Lezhnjov | |
139 | [DEFAULT] |
||
140 | ; online or offline |
||
141 | state: online |
||
142 | |||
143 | ; The base database directory |
||
144 | |||
145 | pgdata: /var/lib/postgresql/8.4/main |
||
146 | |||
147 | ; where to remotely copy archives |
||
148 | r_archivedir: /var/lib/postgresql/archive |
||
149 | |||
150 | ; where to locally copy archives |
||
151 | l_archivedir: /var/lib/postgresql/archive |
||
152 | 25 | Ivan Lezhnjov | |
153 | ; where is ssh and what flags to add |
||
154 | ssh: /usr/bin/ssh |
||
155 | 1 | Ivan Lezhnjov | |
156 | ; where is rsync |
||
157 | rsync_bin: /usr/bin/rsync |
||
158 | |||
159 | ; extra rsync flags |
||
160 | rsync_flags: -z |
||
161 | |||
162 | ; option 2 or 3, if running RHEL5 or similar it is likely 2 |
||
163 | ; if you are running something that ships remotely modern software |
||
164 | ; it will be 3 |
||
165 | |||
166 | rsync_version = 3 |
||
167 | |||
168 | ; IP of slave |
||
169 | slaves: bitarena-clone |
||
170 | |||
171 | 25 | Ivan Lezhnjov | ; the user for the ssh connection to a standby |
172 | 1 | Ivan Lezhnjov | user: postgres |
173 | |||
174 | 25 | Ivan Lezhnjov | ; if rsync can't connect in 10 seconds error |
175 | 1 | Ivan Lezhnjov | ssh_timeout: 10 |
176 | |||
177 | ; command to process in ok |
||
178 | notify_ok: echo OK |
||
179 | |||
180 | ; command to process in warning |
||
181 | notify_warning: echo WARNING |
||
182 | |||
183 | ; command to process in critical |
||
184 | notify_critical: echo CRITICAL |
||
185 | |||
186 | ; if you want to debug on/off only |
||
187 | debug: on |
||
188 | |||
189 | ; if you want ssh debug (warning noisy) |
||
190 | ssh_debug: off |
||
191 | </pre> |
||
192 | |||
193 | |||
194 | 20 | Ivan Lezhnjov | Note that you can use domain names instead of IP addresses for slaves: parameter, it works either way just fine. You might also want to turn debugging on while you're learning pitrtools. It helps to see what software does, if you try to understand better how it works. |
195 | 1 | Ivan Lezhnjov | |
196 | 11 | Ivan Lezhnjov | h2. Turn On and Configure Archiving |
197 | 1 | Ivan Lezhnjov | |
198 | 20 | Ivan Lezhnjov | If you are not using Streaming Replication / Hot Standby, we need to turn on archiving functionality in PostgreSQL. This is purely a PostgreSQL feature, but we can also take advantage of using pitrtools thanks to flexible design of PostgreSQL. |
199 | 2 | Ivan Lezhnjov | |
200 | <pre>postgres@bitarena:~$ vim /etc/postgresql/8.4/main/postgresql.conf |
||
201 | |||
202 | ... |
||
203 | archive_mode=on |
||
204 | 26 | Ivan Lezhnjov | archive_command = '/var/lib/postgresql/pitrtools/bin/cmd_archiver -C /var/lib/postgresql/pitrtools/etc/cmd_archiver.ini -F %p' |
205 | 2 | Ivan Lezhnjov | ... |
206 | </pre> |
||
207 | |||
208 | |||
209 | 20 | Ivan Lezhnjov | Here we explicitly turn archiving mode on and tell PostgreSQL to use @cmd_archiver@, part of pitrtools, as the archiving command. |
210 | 2 | Ivan Lezhnjov | It must be provided with a path to a configuration file (@-C@ switch), and a path to a log file (@-F@ switch). @%p@ is substituted by PostgreSQL with the actual log file location in the file system. To learn more about how to use @cmd_archiver@ run |
211 | |||
212 | 20 | Ivan Lezhnjov | <pre>postgres@bitarena:~$ pitrtools/bin/cmd_archiver --help</pre> |
213 | 2 | Ivan Lezhnjov | |
214 | |||
215 | 20 | Ivan Lezhnjov | Please, keep in mind that pitrtools isn't supposed to be run by @root@ user. Technically, it can be run by any system user other than root, and unless you have a customized configuration your default PostgreSQL user will be @postgres@, so pitrtools are expected to be run as that system user. |
216 | 2 | Ivan Lezhnjov | |
217 | Restart PostgreSQL for changes to take effect. |
||
218 | |||
219 | <pre>postgres@bitarena:~$ /etc/init.d/postgresql restart |
||
220 | Restarting PostgreSQL 8.4 database server: main. |
||
221 | postgres@bitarena:~# |
||
222 | </pre> |
||
223 | |||
224 | 12 | Ivan Lezhnjov | h2. Install Helper Scripts |
225 | 2 | Ivan Lezhnjov | |
226 | 20 | Ivan Lezhnjov | Apply cmd_standby.sql to the database of pitrtools user (usually @postgres@). This is required for master server only. |
227 | 2 | Ivan Lezhnjov | |
228 | 29 | Ivan Lezhnjov | <pre>postgres@bitarena:~$ psql -U postgres < ~/pitrtools/scripts/cmd_standby.sql |
229 | 2 | Ivan Lezhnjov | CREATE FUNCTION |
230 | COMMENT |
||
231 | CREATE FUNCTION |
||
232 | CREATE FUNCTION |
||
233 | CREATE FUNCTION |
||
234 | COMMENT |
||
235 | CREATE FUNCTION |
||
236 | COMMENT |
||
237 | 29 | Ivan Lezhnjov | postgres@bitarena:~$</pre> |
238 | 2 | Ivan Lezhnjov | |
239 | 28 | Ivan Lezhnjov | >Note: If you are running Postgres 9.2, use cmd_standby.92.sql |
240 | 2 | Ivan Lezhnjov | |
241 | 12 | Ivan Lezhnjov | h2. Initialize Master Environment |
242 | 2 | Ivan Lezhnjov | |
243 | At this point we're pretty much done configuring the master server. All that is left to do is initialize master environment by running |
||
244 | |||
245 | 20 | Ivan Lezhnjov | <pre>postgres@bitarena:~$ pitrtools/bin/cmd_archiver -C pitrtools/cmd_archiver.ini -I |
246 | 2 | Ivan Lezhnjov | We are initializing queues, one moment. |
247 | |||
248 | NOTICE: init_env_func() |
||
249 | NOTICE: generate_slave_list_func() |
||
250 | NOTICE: Your slaves are: ['bitarena-clone'] |
||
251 | postgres@bitarena:~$ ls -lah |
||
252 | total 40K |
||
253 | drwxr-xr-x 7 postgres postgres 4.0K Sep 28 10:25 . |
||
254 | drwxr-xr-x 34 root root 4.0K Sep 27 02:32 .. |
||
255 | drwxr-xr-x 3 postgres postgres 4.0K Sep 24 15:10 8.4 |
||
256 | drwx------ 2 postgres postgres 4.0K Sep 27 02:23 .aptitude |
||
257 | drwxr-xr-x 3 postgres postgres 4.0K Sep 28 10:25 archive |
||
258 | -rw------- 1 postgres postgres 1001 Sep 28 10:03 .bash_history |
||
259 | 20 | Ivan Lezhnjov | drwxr-xr-x 2 postgres postgres 4.0K Sep 28 10:25 pitrtools |
260 | 2 | Ivan Lezhnjov | -rw------- 1 postgres postgres 1.4K Sep 28 05:08 .psql_history |
261 | drwx------ 2 postgres postgres 4.0K Sep 28 09:29 .ssh |
||
262 | -rw------- 1 postgres postgres 3.6K Sep 28 10:25 .viminfo |
||
263 | postgres@bitarena:~$ ls -lah archive/bitarena-clone/ |
||
264 | total 8.0K |
||
265 | drwxr-xr-x 2 postgres postgres 4.0K Sep 28 10:25 . |
||
266 | drwxr-xr-x 3 postgres postgres 4.0K Sep 28 10:25 .. |
||
267 | postgres@bitarena:~#</pre> |
||
268 | |||
269 | |||
270 | As you can see @-I@ switch tells @cmd_archiver@ to do a couple of internal actions, as well as prepare file system layout by creating necessary directories. |
||
271 | |||
272 | @archive/@ directory has been created automatically by @cmd_archiver@ and contains sub-directory named after IP address or DNS name of a slave, @bitarena-clone@ in this example. This sub-directory is used in those circumstances when master fails to successfully transfer WAL log files to slave and stores them temporarily in this local directory. Once slave is back online, these files should be transferred to slave. |
||
273 | |||
274 | Effectively, after you've initialized master it starts trying to ship WAL files to slave. However, the slave host isn't configured yet, so the log delivery will fail and the log files should be found in @l_archivedir/slave_FQDNorIP/@ folder on master host. Once we configure slave, these will be shipped as soon as a new WAL segment is created on master. |
||
275 | |||
276 | h1. Slave aka Standby Server |
||
277 | |||
278 | Now we can prepare the slave host. SSH key-based login for @postgres@ system user should be working in both directions, from master to slave, as well as from slave to master. |
||
279 | |||
280 | Assuming you've arranged for this as it was suggested before, we can now go on and edit slave configuration file to perform first important step and initialize slave environment. |
||
281 | |||
282 | 5 | Ivan Lezhnjov | One of the configuration file parameters, namely @pgdata:@, will ask you to specify PostgreSQL data directory. You could look it up in @postgresql.conf@, or, If your PostgreSQL is already running, you could see this information like this: |
283 | 2 | Ivan Lezhnjov | |
284 | <pre>root@bitarena:~# ps axuwf |grep postgre |
||
285 | root 4233 0.0 0.0 3304 756 pts/1 S+ 10:56 0:00 \_ grep postgre |
||
286 | postgres 3352 0.0 0.4 46452 5464 ? S 09:59 0:03 /usr/lib/postgresql/8.4/bin/postgres |
||
287 | -D /var/lib/postgresql/8.4/main -c config_file=/etc/postgresql/8.4/main/postgresql.conf |
||
288 | ...</pre> |
||
289 | |||
290 | Remember, note down the path somewhere or copy it to the clipboard. We'll need it in a minute. |
||
291 | |||
292 | 12 | Ivan Lezhnjov | h2. Slave Configuration File |
293 | 2 | Ivan Lezhnjov | |
294 | Now edit slave configuration file. Most defaults are good and safe options, but your setup may be different from what I have in this how-to. So, be careful to understand what you're doing. |
||
295 | |||
296 | 20 | Ivan Lezhnjov | <pre>postgres@bitarena-clone:~$ vim pitrtools/cmd_standby.ini |
297 | 2 | Ivan Lezhnjov | [DEFAULT] |
298 | ; what major version are we using? |
||
299 | pgversion: 8.4 |
||
300 | |||
301 | ; Used for 8.2 (8.1?), should be set to something > than checkpoint_segments on master |
||
302 | numarchives: 10 |
||
303 | |||
304 | ; Whether or not to use streaming replication. If this is set to "on" |
||
305 | 20 | Ivan Lezhnjov | ; pitrtools will configure the standby server to replicate from master |
306 | 2 | Ivan Lezhnjov | ; using streaming replication. |
307 | ; This can only be used with PostgreSQL 9.0 and up. |
||
308 | use_streaming_replication: off |
||
309 | |||
310 | ; File to touch to end replication when using streaming replication. |
||
311 | 20 | Ivan Lezhnjov | trigger_file: /var/lib/postgresql/pitrtools/cmd_end_recovery |
312 | 2 | Ivan Lezhnjov | |
313 | ; User to connect to master DB while using streaming replication, |
||
314 | ; ignored if not using streaming replication. |
||
315 | repl_db_user: replication |
||
316 | |||
317 | ; Password for the user repl_db_user. |
||
318 | repl_db_password: secret |
||
319 | |||
320 | ; sslmode to use when connecting for streaming replication. |
||
321 | ; Accepted values: the same as libpq: disable, allow, prefer, require, verify-ca and verify-full |
||
322 | ; Default: sslmode: prefer |
||
323 | sslmode: prefer |
||
324 | |||
325 | ; Commands needed for execution |
||
326 | ; absolute path to ssh |
||
327 | ssh: /usr/bin/ssh |
||
328 | |||
329 | ; absolute path to rsync |
||
330 | rsync: /usr/bin/rsync |
||
331 | |||
332 | ; extra rsync flags |
||
333 | rsync_flags: -z |
||
334 | 17 | Ivan Lezhnjov | |
335 | ; Confs |
||
336 | |||
337 | ; This is the postgresql.conf to be used for the failover |
||
338 | 20 | Ivan Lezhnjov | postgresql_conf_failover: /var/lib/postgresql/pitrtools/failover/postgesql.conf |
339 | 17 | Ivan Lezhnjov | |
340 | ; This is the pg_hba.conf to be used for the failover |
||
341 | 20 | Ivan Lezhnjov | pg_hba_conf_failover: /var/lib/postgresql/pitrtools/failover/pg_hba.conf |
342 | 2 | Ivan Lezhnjov | |
343 | ; the path to to the postgres bin |
||
344 | pg_standby: /usr/lib/postgresql/8.4/bin/pg_standby |
||
345 | pg_ctl: /usr/lib/postgresql/8.4/bin/pg_ctl |
||
346 | |||
347 | ; path to psql on the master |
||
348 | r_psql: /usr/lib/postgresql/8.4/bin/psql |
||
349 | |||
350 | ; Generalized information |
||
351 | |||
352 | ; the port postgresql runs on (master) |
||
353 | port: 5432 |
||
354 | |||
355 | ; ip or name of master server |
||
356 | master_public_ip: bitarena |
||
357 | |||
358 | ; the ip address we should use when processing remote shell |
||
359 | master_local_ip: 127.0.0.1 |
||
360 | |||
361 | ; the user performed initdb |
||
362 | user: postgres |
||
363 | |||
364 | ; on or off |
||
365 | debug: on |
||
366 | |||
367 | ; on or off |
||
368 | ssh_debug: off |
||
369 | |||
370 | ; the timeout for ssh before we throw an alarm |
||
371 | ssh_timeout: 30 |
||
372 | |||
373 | ; should be the same as r_archivedir for archiver |
||
374 | archivedir: /var/lib/postgresql/archive |
||
375 | |||
376 | ; where you executed initdb -D to |
||
377 | pgdata: /var/lib/postgresql/8.4/main |
||
378 | |||
379 | ; Confs |
||
380 | |||
381 | ; This is the postgresql.conf to be used when not in standby |
||
382 | postgresql_conf: /etc/postgresql/8.4/main/postgresql.conf |
||
383 | |||
384 | ; This is the pg_hba.conf to be used when not in standby |
||
385 | pg_hba_conf: /etc/postgresql/8.4/main/pg_hba.conf |
||
386 | |||
387 | ; By default postgresql.conf and pg_hba.conf will be copied from the |
||
388 | ; locations specified above to pgdata directory on failover. |
||
389 | ; |
||
390 | ; Uncomment the following to make postgres actually use the above conf |
||
391 | ; files w/o copying them to pgdata. |
||
392 | ;no_copy_conf: true |
||
393 | |||
394 | ; The recovery.conf file to create when starting up |
||
395 | ; Defaults to %(pgdata)/recovery.conf |
||
396 | recovery_conf: /var/lib/postgresql/8.4/main/recovery.conf |
||
397 | |||
398 | ; Useful when postgresql.conf doesn't specify log destination |
||
399 | ; Will be passed with -l to pg_ctl when starting the server. |
||
400 | ; |
||
401 | ; If you're worried about having complete logs, either make sure |
||
402 | ; postgresql.conf points to a log file, or use the logfile: parameter. |
||
403 | ; |
||
404 | ; Otherwise postgresql will print on standard stdout and nothing |
||
405 | ; will be recorded in the logs |
||
406 | ; |
||
407 | ;logfile: /var/log/postgresql/postgresql.log |
||
408 | |||
409 | ; Alarms |
||
410 | |||
411 | notify_critical: echo CRITICAL |
||
412 | notify_warning: echo WARNING |
||
413 | notify_ok: echo OK |
||
414 | |||
415 | ; On failover action |
||
416 | |||
417 | ; Whatever is placed here will be executed on -FS must return 0 |
||
418 | |||
419 | 20 | Ivan Lezhnjov | action_failover: /var/lib/postgresql/pitrtools/failover.sh</pre> |
420 | 2 | Ivan Lezhnjov | |
421 | @action_failover:@ script has to exist and have permissions of at least chmod u+x equivalent, it could be just a placeholder script with a simple action of: |
||
422 | |||
423 | <pre>#!/bin/bash |
||
424 | 20 | Ivan Lezhnjov | touch /var/lib/postgresql/pitrtools/failover_happened</pre> |
425 | 2 | Ivan Lezhnjov | |
426 | 20 | Ivan Lezhnjov | but it's meant as a way to let you do certain actions on failover, those would be very specific for each given setup. It's good to know, though, that pitrtools lets you take actions automatically when failover happens. Use this feature to make your setup more sophisticated. |
427 | 17 | Ivan Lezhnjov | |
428 | In addition, when doing failover, there are two more additional options to take into consideration, namely @postgresql_conf_failover:@ and @pg_hba_conf_failover:@. Both allow you to start server on failover using alternative configuration. This is meant to provide users with a way to prepare their failover scenario configuration in advance. |
||
429 | 2 | Ivan Lezhnjov | |
430 | 12 | Ivan Lezhnjov | h2. Initialize Slave Environment |
431 | 2 | Ivan Lezhnjov | |
432 | First stop PostgreSQL, then initialize slave environment. |
||
433 | |||
434 | <pre>postgres@bitarena-clone:~$ /etc/init.d/postgresql stop |
||
435 | 20 | Ivan Lezhnjov | postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -I |
436 | 2 | Ivan Lezhnjov | NOTICE: check_pgpid_func() |
437 | DEBUG: executing query /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no postgres@bitarena "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1 by 'SELECT * FROM cmd_get_data_dirs()' |
||
438 | Password for user postgres: postgrespass |
||
439 | |||
440 | DEBUG: /var/lib/postgresql/8.4/main |
||
441 | postgres@bitarena-clone:~$</pre> |
||
442 | |||
443 | 20 | Ivan Lezhnjov | Please, note that if @archivedir: /var/lib/postgresql/archive@ hasn't been created, you should do so manually as @postgres@ system user (or set @postgres@ user and group as the ownership information for this directory). pitrtools should do this automatically for you, but earlier versions were known not to do so. This is important, and the next step in slave configuration, which is base backup, will fail if @archivedir:@ doesn't exist. |
444 | 3 | Ivan Lezhnjov | |
445 | 12 | Ivan Lezhnjov | h2. Making a Base Backup |
446 | 3 | Ivan Lezhnjov | |
447 | Before you proceed check if @archivedir:@ exists on slave and WAL files are being shipped to it from master host. WAL files are generated and shipped only when new data is stored in the database on master host. To help simulate data flow and check whether archiving and shipping is happening, try this SQL statement on master host: |
||
448 | |||
449 | <pre>postgres@bitarena:~$ psql |
||
450 | psql (8.4.13) |
||
451 | Type "help" for help. |
||
452 | |||
453 | 20 | Ivan Lezhnjov | postgres=# create table testpitrtools1 as select * from pg_class, pg_description; |
454 | 3 | Ivan Lezhnjov | postgresq=# \q |
455 | postgres@bitarena:~$</pre> |
||
456 | |||
457 | You could create a couple of tables like that to generate enough WAL segments. See @archivedir:@ directory on slave to check whether any WAL files have been copied there. If they have, everything works as expected and you can try to make a base backup on slave host: |
||
458 | |||
459 | 20 | Ivan Lezhnjov | <pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -B |
460 | 3 | Ivan Lezhnjov | NOTICE: check_pgpid_func() |
461 | DEBUG: executing query /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no postgres@bitarena "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1 by 'checkpoint' |
||
462 | Password for user postgres: postgrespass |
||
463 | |||
464 | DEBUG: CHECKPOINT |
||
465 | DEBUG: executing query /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no postgres@bitarena "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1 by 'SELECT cmd_pg_start_backup()' |
||
466 | Password for user postgres: postgrespass |
||
467 | |||
468 | DEBUG: cmd_pg_start_backup: 1 |
||
469 | DEBUG: executing query /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no postgres@bitarena "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1 by 'SELECT * FROM cmd_get_data_dirs()' |
||
470 | Password for user postgres: postgrespass |
||
471 | |||
472 | DEBUG: executing query /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no postgres@bitarena "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1 by 'SELECT * FROM cmd_get_pgdata() LIMIT 1' |
||
473 | Password for user postgres: postgrespass |
||
474 | |||
475 | receiving incremental file list |
||
476 | ./ |
||
477 | backup_label |
||
478 | backup_label.old |
||
479 | postmaster.opts |
||
480 | base/1/ |
||
481 | base/1/pg_internal.init |
||
482 | base/11564/ |
||
483 | base/11564/pg_internal.init |
||
484 | base/16499/ |
||
485 | base/16499/pg_internal.init |
||
486 | base/33069/ |
||
487 | base/33069/33084 |
||
488 | base/33069/33268 |
||
489 | base/33069/33974 |
||
490 | base/33069/33974_fsm |
||
491 | base/33069/33974_vm |
||
492 | base/33069/33980 |
||
493 | base/33069/33993 |
||
494 | base/33069/33993_fsm |
||
495 | base/33069/33993_vm |
||
496 | base/33069/33999 |
||
497 | base/33069/33999_fsm |
||
498 | base/33069/pg_internal.init |
||
499 | global/ |
||
500 | global/pg_auth |
||
501 | global/pg_control |
||
502 | global/pg_database |
||
503 | pg_clog/0000 |
||
504 | pg_multixact/offsets/0000 |
||
505 | pg_stat_tmp/ |
||
506 | pg_stat_tmp/pgstat.stat |
||
507 | pg_subtrans/0001 |
||
508 | |||
509 | Number of files: 1537 |
||
510 | Number of files transferred: 25 |
||
511 | Total file size: 189868759 bytes |
||
512 | Total transferred file size: 22162037 bytes |
||
513 | Literal data: 920320 bytes |
||
514 | Matched data: 21241717 bytes |
||
515 | File list size: 20500 |
||
516 | File list generation time: 0.004 seconds |
||
517 | File list transfer time: 0.000 seconds |
||
518 | Total bytes sent: 65011 |
||
519 | Total bytes received: 189901 |
||
520 | |||
521 | sent 65011 bytes received 189901 bytes 101964.80 bytes/sec |
||
522 | total size is 189868759 speedup is 744.84 |
||
523 | DEBUG: executing query /usr/bin/ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no postgres@bitarena "/usr/lib/postgresql/8.4/bin/psql -A -t -Upostgres -p5432 -dpostgres -h127.0.0.1 by 'SELECT cmd_pg_stop_backup()' |
||
524 | Password for user postgres: postgrespass |
||
525 | |||
526 | DEBUG: cmd_pg_stop_backup: |
||
527 | postgres@bitarena-clone:~$</pre> |
||
528 | |||
529 | 20 | Ivan Lezhnjov | As you can see what happens is that pitrtools puts master into backup mode, synchronizes data directories (including tablespaces, if any) from master to slave and then exits backup mode on master. If base backup action had failed before it properly finished (say, you had lost connection to slave while rsync was copying files over to it), you'd need to intervene and run manually @-Astop_basebackup:@ |
530 | 3 | Ivan Lezhnjov | |
531 | 20 | Ivan Lezhnjov | <pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -Astop_basebackup |
532 | 3 | Ivan Lezhnjov | ...</pre> |
533 | |||
534 | After that run base backup action again. Just make sure it finishes its work properly (use console output for successful base backup action above as a reference). |
||
535 | |||
536 | 22 | Ivan Lezhnjov | h2. Start a Standby |
537 | |||
538 | 3 | Ivan Lezhnjov | If you want a cold standby you're done. If you need a warm standby, then run: |
539 | |||
540 | 20 | Ivan Lezhnjov | <pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -S |
541 | 3 | Ivan Lezhnjov | NOTICE: check_pgpid_func() |
542 | server starting |
||
543 | postgres@bitarena-clone:~$ 2012-10-16 02:47:31 PDT LOG: database system was interrupted; last known up at 2012-10-16 02:44:37 PDT |
||
544 | 2012-10-16 02:47:31 PDT LOG: starting archive recovery |
||
545 | 2012-10-16 02:47:31 PDT LOG: restore_command = '/usr/lib/postgresql/8.4/bin/pg_standby -s5 -w0 -c -d /var/lib/postgresql/archive %f %p %r ' |
||
546 | Trigger file : <not set> |
||
547 | Waiting for WAL file : 00000001.history |
||
548 | WAL file path : /var/lib/postgresql/archive/00000001.history |
||
549 | Restoring to : pg_xlog/RECOVERYHISTORY |
||
550 | Sleep interval : 5 seconds |
||
551 | Max wait interval : 0 forever |
||
552 | Command for restore : cp "/var/lib/postgresql/archive/00000001.history" "pg_xlog/RECOVERYHISTORY" |
||
553 | Keep archive history : 000000000000000000000000 and later |
||
554 | running restore :cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory |
||
555 | cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory |
||
556 | cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory |
||
557 | cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory |
||
558 | not restored |
||
559 | history file not found |
||
560 | Trigger file : <not set> |
||
561 | Waiting for WAL file : 000000010000000100000025.00000020.backup |
||
562 | WAL file path : /var/lib/postgresql/archive/000000010000000100000025.00000020.backup |
||
563 | Restoring to : pg_xlog/RECOVERYHISTORY |
||
564 | Sleep interval : 5 seconds |
||
565 | Max wait interval : 0 forever |
||
566 | Command for restore : cp "/var/lib/postgresql/archive/000000010000000100000025.00000020.backup" "pg_xlog/RECOVERYHISTORY" |
||
567 | Keep archive history : 000000000000000000000000 and later |
||
568 | running restore : OK |
||
569 | 2012-10-16 02:48:01 PDT LOG: restored log file "000000010000000100000025.00000020.backup" from archive |
||
570 | Trigger file : <not set> |
||
571 | Waiting for WAL file : 000000010000000100000025 |
||
572 | WAL file path : /var/lib/postgresql/archive/000000010000000100000025 |
||
573 | Restoring to : pg_xlog/RECOVERYXLOG |
||
574 | Sleep interval : 5 seconds |
||
575 | Max wait interval : 0 forever |
||
576 | Command for restore : cp "/var/lib/postgresql/archive/000000010000000100000025" "pg_xlog/RECOVERYXLOG" |
||
577 | Keep archive history : 000000000000000000000000 and later |
||
578 | running restore : OK |
||
579 | |||
580 | 2012-10-16 02:48:02 PDT LOG: restored log file "000000010000000100000025" from archive |
||
581 | 2012-10-16 02:48:02 PDT LOG: automatic recovery in progress |
||
582 | 2012-10-16 02:48:02 PDT LOG: redo starts at 1/25000020, consistency will be reached at 1/2504FFC4 |
||
583 | 2012-10-16 02:48:03 PDT LOG: consistent recovery state reached |
||
584 | Trigger file : <not set> |
||
585 | Waiting for WAL file : 000000010000000100000026 |
||
586 | WAL file path : /var/lib/postgresql/archive/000000010000000100000026 |
||
587 | Restoring to : pg_xlog/RECOVERYXLOG |
||
588 | Sleep interval : 5 seconds |
||
589 | Max wait interval : 0 forever |
||
590 | Command for restore : cp "/var/lib/postgresql/archive/000000010000000100000026" "pg_xlog/RECOVERYXLOG" |
||
591 | Keep archive history : 000000010000000100000025 and later |
||
592 | WAL file not present yet. |
||
593 | WAL file not present yet. |
||
594 | WAL file not present yet. |
||
595 | WAL file not present yet. |
||
596 | WAL file not present yet. |
||
597 | WAL file not present yet. |
||
598 | WAL file not present yet. |
||
599 | WAL file not present yet. |
||
600 | WAL file not present yet. |
||
601 | WAL file not present yet.</pre> |
||
602 | |||
603 | 20 | Ivan Lezhnjov | At this point PostgreSQL armed with pitrtools on master server will be continuously shipping log files to @archivedir:@ on slave. Once shipped, the WAL files will be immediately replayed, because slave in standby mode continuously scans @archivedir:@ for new WAL files and replays them as soon as they become available (this can be seen from example console output above). |
604 | 3 | Ivan Lezhnjov | |
605 | h1. Failing Over |
||
606 | |||
607 | 20 | Ivan Lezhnjov | Now you have a warm standby mirroring changes occurring on the master server. When your master server becomes unavailable due to any reason, you could turn this warm standby server into a production instance by simply running on standby machine a failover action as shown below. For this PostgreSQL on master must not be running, otherwise pitrtools will throw out a warning and refuse to failover. |
608 | 3 | Ivan Lezhnjov | |
609 | 20 | Ivan Lezhnjov | <pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -F999 |
610 | 3 | Ivan Lezhnjov | |
611 | NOTICE: check_pgpid_func() |
||
612 | 2012-10-16 02:56:20 PDT LOG: received fast shutdown request |
||
613 | 2012-10-16 02:56:20 PDT LOG: aborting any active transactions |
||
614 | waiting for server to shut down....2012-10-16 02:56:20 PDT LOG: shutting down |
||
615 | 2012-10-16 02:56:20 PDT LOG: database system is shut down |
||
616 | done |
||
617 | server stopped |
||
618 | server starting |
||
619 | NOTICE: Statistics are not replicated in warm standy mode. |
||
620 | HINT: Execute ANALYZE on your databases |
||
621 | postgres@bitarena-clone:~$ 2012-10-16 02:56:22 PDT LOG: database system was interrupted while in recovery at log time 2012-10-16 02:54:35 PDT |
||
622 | 2012-10-16 02:56:22 PDT HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. |
||
623 | 2012-10-16 02:56:22 PDT LOG: starting archive recovery |
||
624 | 2012-10-16 02:56:22 PDT LOG: restore_command = 'cp /var/lib/postgresql/archive/%f "%p"' |
||
625 | cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory |
||
626 | 2012-10-16 02:56:23 PDT LOG: restored log file "000000010000000100000026" from archive |
||
627 | 2012-10-16 02:56:23 PDT LOG: automatic recovery in progress |
||
628 | 2012-10-16 02:56:23 PDT LOG: redo starts at 1/2635B428, consistency will be reached at 1/27000000 |
||
629 | cp: cannot stat `/var/lib/postgresql/archive/000000010000000100000027': No such file or directory |
||
630 | 2012-10-16 02:56:23 PDT LOG: could not open file "pg_xlog/000000010000000100000027" (log file 1, segment 39): No such file or directory |
||
631 | 2012-10-16 02:56:23 PDT LOG: redo done at 1/265AD744 |
||
632 | 2012-10-16 02:56:23 PDT LOG: last completed transaction was at log time 2012-10-16 02:55:07.387709-07 |
||
633 | 2012-10-16 02:56:24 PDT LOG: restored log file "000000010000000100000026" from archive |
||
634 | cp: cannot stat `/var/lib/postgresql/archive/00000002.history': No such file or directory |
||
635 | cp: cannot stat `/var/lib/postgresql/archive/00000003.history': No such file or directory |
||
636 | cp: cannot stat `/var/lib/postgresql/archive/00000004.history': No such file or directory |
||
637 | cp: cannot stat `/var/lib/postgresql/archive/00000005.history': No such file or directory |
||
638 | cp: cannot stat `/var/lib/postgresql/archive/00000006.history': No such file or directory |
||
639 | cp: cannot stat `/var/lib/postgresql/archive/00000007.history': No such file or directory |
||
640 | cp: cannot stat `/var/lib/postgresql/archive/00000008.history': No such file or directory |
||
641 | cp: cannot stat `/var/lib/postgresql/archive/00000009.history': No such file or directory |
||
642 | 2012-10-16 02:56:24 PDT LOG: selected new timeline ID: 9 |
||
643 | cp: cannot stat `/var/lib/postgresql/archive/00000001.history': No such file or directory |
||
644 | 2012-10-16 02:56:25 PDT LOG: archive recovery complete |
||
645 | 2012-10-16 02:56:26 PDT LOG: database system is ready to accept connections |
||
646 | 2012-10-16 02:56:26 PDT LOG: autovacuum launcher started |
||
647 | |||
648 | postgres@bitarena-clone:~$</pre> |
||
649 | |||
650 | This will create @recovery.conf@ file under @pg_data:@ directory and restart PostgreSQL to enter production mode of operation. |
||
651 | |||
652 | After this, you'd be basically running a copy of production master server. Keep in mind that you would also need to change IP addresses and/or load balancing configuration, routing, firewall rules or anything else that might be in the way of establishing a successful connection to this host. This is where @action_failover:@ script could come in handy. |
||
653 | |||
654 | Plan in advance, figure this all out to avoid any downtime before you will need to failover. |
||
655 | |||
656 | |||
657 | 20 | Ivan Lezhnjov | h1. Other things you could do with pitrtools and some tips |
658 | 3 | Ivan Lezhnjov | |
659 | 20 | Ivan Lezhnjov | This how-to is meant to help you get started with pitrtools. pitrtools can do more, though, than just help you configure standby. |
660 | 3 | Ivan Lezhnjov | |
661 | |||
662 | h2. Point-In-Time Recovery |
||
663 | |||
664 | 20 | Ivan Lezhnjov | <pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -F999 -R '2008-05-28 11:00:38.059389' |
665 | 3 | Ivan Lezhnjov | ...</pre> |
666 | |||
667 | This is essentially a restore to a specific point in time action. In general, you can only restore to a point in time while using cold standby, because PITR will "stop" recovering at the point in time you've specified, and both warm and hot standby servers have already recovered all the WAL files they can. In this regard, it would be a good idea to have a cold standby around for disaster (at logical level) recovery. |
||
668 | |||
669 | Once this has been done, you can't choose another timestamp to restore to. |
||
670 | |||
671 | h2. Entering Standby Mode After Failover On Slave |
||
672 | |||
673 | Suppose you failed over to your standby slave, which is running now as a replacement of master for your applications. You've fixed problems with the actual master and want this slave host to enter standby mode again. Here's how you'd do it: |
||
674 | |||
675 | 20 | Ivan Lezhnjov | <pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -Astop |
676 | 3 | Ivan Lezhnjov | ...</pre> |
677 | |||
678 | This will stop entire PostgreSQL service. You could also use PostgreSQL init script to achieve the same instead. If you need more fine-grained control use @pg_ctlcluser 8.4 main stop@ (see @man pg_ctlcluster@ for more details). Take a new base backup as before and enter standby mode: |
||
679 | |||
680 | 20 | Ivan Lezhnjov | <pre>postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -B |
681 | 3 | Ivan Lezhnjov | ... |
682 | |||
683 | 20 | Ivan Lezhnjov | postgres@bitarena-clone:~$ pitrtools/bin/cmd_standby -C pitrtools/cmd_standby.ini -S |
684 | 3 | Ivan Lezhnjov | ...</pre> |
685 | |||
686 | Again, if you want a cold standby just don't run -S action after -B. |
||
687 | 4 | Ivan Lezhnjov | |
688 | h2. Alerts |
||
689 | |||
690 | 20 | Ivan Lezhnjov | Alerting is designed to run your custom scripts. You could easily integrate pitrtools alerting to your existing NMS be it Nagios, Zabbix, or anything else, send e-mails or take actions you decide. |
691 | 4 | Ivan Lezhnjov | |
692 | h2. Logging |
||
693 | |||
694 | It bears repeating -- just In case you overlooked this in standby sample configuration file notes -- if your @postgresql.conf@ doesn't specify a log file to write to and you don't use @logfile:@ parameter in @cmd_standby.ini@ the output will be directed to stdout (your console) and nothing will ever be written to a log file on disk. |
||
695 | |||
696 | If you restart PostgreSQL, that'll fix the problem, but you can avoid it in the first place by either specifying log file to write to in @postgresql.conf@ or by using @logfile:@ parameter in @cmd_standby.ini@. |
||
697 | |||
698 | h2. Troubleshooting |
||
699 | |||
700 | Set @debug:@ parameter in configuration files to @on@ and scrutinize the information. PostgreSQL log file is also a good place to look at. |
||
701 | |||
702 | h1. Getting Help |
||
703 | |||
704 | 20 | Ivan Lezhnjov | A very low-traffic mailing list for pitrtools can be found here http://lists.commandprompt.com/mailman/listinfo/pitrtools/ |
705 | 4 | Ivan Lezhnjov | There is also consulting available from "Command Prompt":https://commandprompt.com/contact/ |