Foundation Topics
Backup Strategy
As an administrator, it is your responsibility to develop a solid backup strategy. To create this strategy, you need to answer the following questions:
What needs to be backed up? This is a critical question because it affects the answers to the rest of the questions. While answering it, consider breaking down your filesystem into smaller components to create a more efficient backup strategy.
How often? Several factors come into play when answering this question. If you have broken down your filesystem into smaller components, you answer this question for each component, because the answer varies depending on what is being backed up.
Full or incremental? A full backup is when everything is backed up, regardless of whether any changes have been made since the last backup. An incremental backup is when a backup is performed only on the files that have changed since a previous backup. Some backup utilities allow for complex backup strategies based on several different levels of incremental backups.
Where will the backup be stored? Will you use tape devices, optical devices (CD-ROMs/DVDs), external storage devices (USB drives), or network-accessible storage locations? Each storage location has inherent advantages and disadvantages.
What backup tool will be used? The decision that you make regarding the backup tool has a significant impact on the process of backing up and restoring data. Most Linux distributions come with several tools installed by default, such as the dd and tar commands. In many cases, additional tools are freely available; you just need to install them from the distribution repository. In addition to the tools that come with the distribution, you may want to consider exploring third-party tools, which typically offer more robust solutions.
What Needs to Be Backed Up?
One of the reasons why administrators tend to use multiple partitions (or logical volumes) when installing the operating system is that doing so lends itself to developing good backup strategies. Certain directories change more often than others. By making these separate filesystems, you can make use of filesystem features to perform the backup.
For example, it is normally a good idea to back up data that is not actively being modified. This can pose challenges when backing up users’ home directories. By making /home a separate partition, the partition can then be unmounted and a backup can be performed directly from the partition. Even better: Make the /home filesystem on a logical volume and use LVM snapshots to create a “frozen” view of the filesystem in the /home directory. This allows users to continue to work on the filesystem while you back up the data.
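The snapshot approach can be sketched as follows. This is a minimal outline rather than a tested procedure for any particular system: the volume group name vg0, the logical volume name home, the 1G snapshot size, the mount point, and the archive path are all hypothetical. By default the script only prints the commands it would run; set DRY_RUN=0 and run it as root to execute them.

```shell
#!/bin/sh
# Sketch: back up /home from an LVM snapshot. All names below are assumptions.
set -eu

VG=vg0                       # hypothetical volume group
LV=home                      # hypothetical logical volume holding /home
SNAP=home_snap               # hypothetical snapshot name
MNT=/mnt/home_snap           # hypothetical mount point for the snapshot
ARCHIVE=/backup/home.tar.gz  # hypothetical destination for the archive
DRY_RUN=${DRY_RUN:-1}        # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

snapshot_backup() {
    # Create a snapshot with 1 GiB of space for changes made while it exists
    run lvcreate --snapshot --size 1G --name "$SNAP" "/dev/$VG/$LV"
    run mkdir -p "$MNT"
    # Mount the frozen view read-only
    run mount -o ro "/dev/$VG/$SNAP" "$MNT"
    # Archive the snapshot contents while users keep working in /home
    run tar -czf "$ARCHIVE" -C "$MNT" .
    run umount "$MNT"
    # Remove the snapshot once the backup is complete
    run lvremove -f "/dev/$VG/$SNAP"
}

snapshot_backup
```

The snapshot only needs enough space to hold the changes made to the original volume while the backup runs, which is why it can be much smaller than /home itself.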
This doesn’t mean that you will always make separate filesystems for each directory structure that you want to back up. In fact, in some cases, like the /etc directory, this isn’t even possible (/etc must be in the same filesystem as the / filesystem). However, whenever possible, it is generally a good idea to create separate filesystems for directory structures that you are incorporating in your backup strategy.
Table 18-2 Directories/Filesystems to Consider Including in Your Backup Strategy
Directory/Filesystem | Why You Should Consider It
/home | If your system has any regular users, this directory structure is certain to be a part of your backup strategy. On servers with no regular users, however, this directory is normally ignored when developing the backup strategy.
/usr | The /usr directory rarely changes because it is the location of most of the system’s commands, documentation, and programs. This directory structure normally only changes when new software is added to the system or when existing software is updated. Some administrators argue that you never need to back up /usr because if something goes wrong, you can always just reinstall the software. The flaw in this reasoning is that few administrators keep a list of all the software installed on all the systems they administer. So, you should include this directory in your backup strategy.
/bin | If you back up the /usr directory, consider including the /bin directory because some of the operating system software is installed in this directory structure.
/sbin | If you back up the /usr directory, consider including the /sbin directory because some of the operating system software is installed in this directory structure.
/opt | If you have a lot of third-party software installed on your system, you may consider backing up this directory. This isn’t typically the case in most Linux distributions.
/var | The primary data stored in the /var directory structure includes log files, the incoming email queue, and the print queue. The print queue should not need backing up, but log files and the email queue may be important, depending on the function of the system. Typically this filesystem is backed up on servers, but often ignored on desktop systems.
/boot | The kernel is located in this directory structure. If you install a new kernel, consider backing up this directory structure. Typically it is not backed up on a regular basis.
/lib and /lib64 | If you back up the /usr directory, consider including the /lib and /lib64 directories because the operating system libraries are installed in these directory structures. As software is added to the system, new libraries are sometimes added as well.
/etc | This directory structure is often overlooked in the backup strategy, but it is also often the directory that changes most frequently. Regular system administration tasks, such as administering software configuration files and managing user/group accounts, result in changes in the /etc directory structure. On an active system, this directory should be backed up on a regular basis. Important note: The /etc directory must be a part of the / filesystem; it cannot be a separate filesystem.
So, what directories/filesystems should you consider including in your backup strategy? Table 18-2 highlights the ones that are commonly part of a backup strategy.
Which directories/filesystems should you never back up? The following directories either are not stored on the hard drive or contain temporary information that never needs to be backed up:
/dev
/media
/mnt
/net
/proc
/srv
/sys
/var/tmp
How Often?
There is no exact rule that tells you how often to perform backups. To determine how often to perform backups, determine which directories/filesystems you are going to back up and then get an idea of how often data changes on each of them.
Based on your observations, you should be able to determine how often to perform backups. It will likely be a different schedule for different directories, and you also need to consider how often to perform full versus incremental backups.
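One rough way to get an idea of how often data changes is to count the files modified within a given number of days. The following is a minimal sketch: the function name changed_within and the sample filenames are made up for illustration, and the demonstration uses a throwaway directory (and assumes GNU touch -d) rather than a real filesystem such as /etc or /home.

```shell
#!/bin/sh
# Sketch: count files modified within the last N days under a directory.
set -eu

# changed_within DIR DAYS
# Prints the number of regular files under DIR modified less than DAYS days ago.
# (Filenames containing newlines would be miscounted; fine for a rough estimate.)
changed_within() {
    find "$1" -type f -mtime "-$2" | wc -l
}

# Demonstration on a throwaway directory with files of different ages
demo=$(mktemp -d)
touch "$demo/today.conf"
touch -d '3 days ago'  "$demo/recent.conf"
touch -d '10 days ago' "$demo/old.conf"

echo "changed in the last day:    $(changed_within "$demo" 1)"
echo "changed in the last 7 days: $(changed_within "$demo" 7)"
```

Running the same count against each directory or filesystem in your strategy over a few weeks gives you a basis for deciding which ones need daily backups and which can wait longer.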
Full or Incremental?
Not all software tools provide the flexibility to perform incremental backups. But if you are using one that does provide this feature, consider including it in your backup strategy.
If the backup tool does provide incremental backups, there are probably several different levels available. For example:
A level 0 backup would be a full backup.
A level 1 backup would back up all files that have changed since the last lower backup (level 0).
A level 2 backup would back up all files that have changed since the last lower backup (level 0 or 1).
Typically these incremental backups use the levels 1–9. So a level 9 backup would back up all files that have changed since the last lower-level backup (which could be level 0, level 1, level 2, and so on).
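The idea of levels can be tried out with GNU tar, which records what it has backed up in a snapshot file given with the -g (--listed-incremental) option. The following is a minimal sketch on a throwaway directory, assuming GNU tar; the file and directory names are hypothetical. The tar command has no numbered levels of its own, so here a level is simply a question of which snapshot file the backup starts from.

```shell
#!/bin/sh
# Sketch: level 0 and level 1 backups with GNU tar snapshot files.
set -eu

work=$(mktemp -d)            # throwaway working area (hypothetical layout)
mkdir "$work/data" "$work/restore"
echo "never changes" > "$work/data/static.txt"
echo "version 1"     > "$work/data/edited.txt"

# Level 0: the snapshot file does not exist yet, so everything is archived
tar -cf "$work/level0.tar" -g "$work/level0.snar" -C "$work/data" .

sleep 1                      # make sure later timestamps are clearly newer

echo "version 2" > "$work/data/edited.txt"
echo "brand new" > "$work/data/added.txt"

# Level 1: start from a copy of the level 0 snapshot so the original is kept;
# a later level 1 can then again measure changes against level 0
cp "$work/level0.snar" "$work/level1.snar"
tar -cf "$work/level1.tar" -g "$work/level1.snar" -C "$work/data" .

echo "Members of the level 1 archive:"
tar -tf "$work/level1.tar"

# Restore: replay the chain from the lowest level upward
tar -xf "$work/level0.tar" -g /dev/null -C "$work/restore"
tar -xf "$work/level1.tar" -g /dev/null -C "$work/restore"
```

The level 1 archive holds only the edited and added files, not static.txt, which is why the level 0 archive must be restored first.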
To better understand incremental backups, first look at Figure 18-1.
Figure 18-1 Backup strategy #1
The strategy in Figure 18-1 demonstrates a four-week backup period. Every four weeks this cycle repeats. On the first day of the period, a full (level 0) backup is performed. The next day, Monday, a level 2 backup is performed. This backs up everything that changed since the last lower number backup (level 0), essentially one day’s worth of changes.
On Tuesday, a level 3 backup is performed. This backs up everything that has changed since the last lower number backup, the level 2 performed on Monday. Each day during the week, a backup is performed that backs up the last 24 hours of changes to the directory/filesystem.
The following Sunday, a level 1 backup is performed. This backs up all changes since the last lower backup, the level 0 performed at the beginning of the cycle. Essentially, this backs up a week’s worth of changes.
The advantage of this backup plan is that the backups each night take comparatively little time. Sunday’s backups take longer each week, but the rest of the week is a relatively small backup.
The disadvantage of this backup plan is in the recovery. If the filesystem must be restored because the data was lost on Friday of the third week, then the following restores must be performed in order:
The level 0 backup
The level 1 backup performed on Sunday of week 3
The level 2 backup performed on Monday of week 3
The level 3 backup performed on Tuesday of week 3
The level 4 backup performed on Wednesday of week 3
The level 5 backup performed on Thursday of week 3
Now compare the previous backup strategy from Figure 18-1 with the backup strategy in Figure 18-2.
Figure 18-2 Backup strategy #2
With the backup strategy in Figure 18-2, you also perform a full backup on the first day of the cycle. The backups performed Monday through Saturday back up all files that have changed since Sunday. The backup performed on the following Sunday includes all files that have changed since the first backup of the cycle.
The disadvantage of this method is that each backup takes more time as the week progresses. The advantage is that the recovery process is easier and quicker. If the filesystem must be restored because the data was lost on Friday of the third week, then the following restores must be performed in order:
The level 0 backup
The level 1 backup performed on Sunday of week 3
The level 5 backup performed on Thursday of week 3
There are many other backup strategies, including the famous Tower of Hanoi, which is based on a mathematical puzzle game. The important thing to remember is that you should research the different methods and find the one that is right for your situation.
Where Will the Backup Be Stored?
There are four primary locations where you can store backup data. Table 18-3 describes each and provides some of the advantages and disadvantages that you should consider.
Table 18-3 Backup Storage Locations
Location | Advantage | Disadvantage
Tape | Low cost; medium shelf life | Slow; requires special hardware; requires a lot of maintenance
Disk | Fast; easily available | Not portable
Remote | Normally easily available; easy to have data secured offsite | Depends on network access; could be expensive; could be slow
Optical media | Decent speed; low cost; hardware easy to obtain and affordable | Low storage capacity; most often “write once,” can’t be reused
What Backup Tool Will Be Used?
The rest of this chapter explores different backup tools. The following tools are explored as they are all LPIC-2 exam objectives:
dd
tar
rsync
Amanda
Bacula
BackupPC
In addition to these backup tools, you should be aware of a few other tools used for creating and restoring files:
dump/restore: Not used as often as in the past, these tools were designed to back up and restore entire filesystems. They support both full and incremental backups, which makes them one of the few standard backup tools that have this feature.
cpio: Similar to the tar command, the cpio command can be used to merge files from multiple locations into a single archive.
gzip/gunzip: While the gzip command doesn’t provide an essential feature that you want a backup tool to provide (namely, it doesn’t merge files together), it does compress files. As a result, it could be used to compress a backup file.
bzip2/bunzip2: While the bzip2 command doesn’t provide an essential feature that you want a backup tool to provide (namely, it doesn’t merge files together), it does compress files. As a result, it could be used to compress a backup file.
zip/unzip: An advantage of this tool is that it not only merges files together and compresses them, but also uses a standard compression technique available on multiple operating systems, including many non-Linux operating systems.
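Because gzip compresses but does not merge files, it is commonly paired with a tool that does the merging. The following is a minimal sketch of that pairing on a throwaway directory; the file and directory names are made up for illustration.

```shell
#!/bin/sh
# Sketch: merge with tar, compress with gzip, then check the result.
set -eu

work=$(mktemp -d)
mkdir "$work/conf"
echo "setting=1" > "$work/conf/app.conf"
echo "setting=2" > "$work/conf/db.conf"

# tar writes the archive to stdout (-f -); gzip compresses the stream
tar -cf - -C "$work" conf | gzip > "$work/conf.tar.gz"

# gzip -t tests the integrity of the compressed file without extracting it
gzip -t "$work/conf.tar.gz" && echo "compressed archive passes gzip -t"

# gunzip -c decompresses to stdout; tar -tf - lists the archive read from stdin
gunzip -c "$work/conf.tar.gz" | tar -tf -
```

Note that gzip -t only confirms that the compressed data is intact; it says nothing about whether the files inside match what is currently on disk.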
Standard Backup Utilities
These utilities are considered standard because you can expect them to be on just about every distribution of Linux. The advantage is that not only can you use the tools to perform a backup on just about every system, but even more importantly, you can view and restore the backups on just about every system. It is frustrating and time-consuming to deal with an esoteric backup file when you lack the software needed even to determine what is in the backup.
The dd Command
The dd command is useful for backing up entire devices, whether entire hard disks, individual partitions, or logical volumes. For example, to back up an entire hard disk to a second hard disk, execute a command like the following:
[root@localhost ~]# dd if=/dev/sda of=/dev/sdb
The if option is used to specify the input device. The of option is used to specify the output device. Make sure when you execute this command that the /dev/sdb hard disk is at least as large as the /dev/sda hard disk.
What if you don’t have a spare hard disk, but you have enough room on a device (such as an external USB hard disk)? In this case, place the output into an image file:
[root@localhost ~]# dd if=/dev/sda of=/mnt/hda.img
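After creating an image, it is worth confirming that it matches the source. The following is a minimal sketch that uses a regular file as a stand-in for the device so that it can be run safely without root access; the filenames are hypothetical, and on a real system the source would be a device such as /dev/sda that is not being written to during the copy.

```shell
#!/bin/sh
# Sketch: image a "device" with dd and verify the copy.
set -eu

work=$(mktemp -d)
src="$work/fake-disk"        # stand-in for a device such as /dev/sda
img="$work/disk.img"         # stand-in for an image file on external storage

# Create 4 MiB of sample data to play the role of the source device
dd if=/dev/urandom of="$src" bs=1024 count=4096 2>/dev/null

# Copy it; a block size larger than the 512-byte default usually speeds up the copy
dd if="$src" of="$img" bs=1M 2>/dev/null

# cmp exits with status 0 only if the two are byte-for-byte identical
if cmp -s "$src" "$img"; then
    echo "image matches source"
else
    echo "image differs from source" >&2
fi
```

If the source is a mounted, active filesystem, a comparison like this may fail simply because the data changed during or after the copy.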
You can also use the dd command to back up the contents of a CD-ROM or DVD into an ISO image:
[root@localhost ~]# dd if=/dev/cdrom of=cdrom.iso
The ISO image file can be used to create more CD-ROMs. Or it can be shared via the network to make the contents of the CD-ROM easily available (rather than passing the CD-ROM disc around the office).
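It is also helpful to know that both image and ISO files can be treated as regular filesystems in the sense that they can be mounted and explored:
[root@localhost ~]# mkdir /test
[root@localhost ~]# mount -o loop /mnt/hda.img /test
Note that a direct loop mount like this works when the image holds a single filesystem (a partition image or an ISO image). An image of an entire partitioned disk also contains the partition table, so you typically need to pass an offset option to the mount command or map the partitions first (for example, with losetup -P).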
One of the advantages of the dd command is that it can back up anything on the hard disk, not just files and directories. For example, at the beginning of each disk is an area called the MBR (master boot record). On the boot disk, the MBR contains the first stage of the boot loader (such as GRUB) and the partition table. It can be useful to have a backup of this data:
[root@localhost ~]# dd if=/dev/sda of=/root/mbr.img bs=512 count=1
The bs option indicates the block size, and the count option indicates how many blocks to back up. The values of 512 and 1 make sense because the MBR size is 512 bytes.
I would suggest storing the MBR image on an external device. If the system fails to boot because of a corrupted MBR, you can boot off a recovery CD and restore the MBR with a single command:
[root@localhost ~]# dd if=mbr.img of=/dev/sda
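The MBR backup and restore cycle can be rehearsed safely on a file before trying it on a real disk. The following is a minimal sketch in which a regular file stands in for /dev/sda; the filenames are hypothetical. The conv=notrunc option is needed here only because the stand-in is a regular file, which dd would otherwise truncate to 512 bytes.

```shell
#!/bin/sh
# Sketch: save and restore the first 512-byte sector of a file-backed "disk".
set -eu

work=$(mktemp -d)
disk="$work/fake-disk"       # stand-in for /dev/sda
mbr="$work/mbr.img"

dd if=/dev/urandom of="$disk" bs=512 count=2048 2>/dev/null   # 1 MiB test disk
cp "$disk" "$work/original"                                   # reference copy

# Back up sector 0
dd if="$disk" of="$mbr" bs=512 count=1 2>/dev/null

# Simulate corruption by zeroing sector 0
# (conv=notrunc keeps dd from truncating the regular file used as a stand-in)
dd if=/dev/zero of="$disk" bs=512 count=1 conv=notrunc 2>/dev/null

# Restore sector 0 from the saved image
dd if="$mbr" of="$disk" bs=512 count=1 conv=notrunc 2>/dev/null

cmp -s "$disk" "$work/original" && echo "sector 0 restored"
```

Keep in mind that restoring all 512 bytes also restores the partition table as it was when the image was taken. If the disk has been repartitioned since then, restoring only the first 446 bytes (bs=446 count=1) puts back the boot code while leaving the current partition table in place.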
The tar Command
The tape archive command was originally designed to back up filesystems to tape devices. While many people now use the tar command to back up to nontape devices, you should be aware of how to use tape devices as well.
Tape device names in Linux follow the /dev/st* and /dev/nst* convention. The first tape device name is assigned the device name of /dev/st0, and the second tape device is accessible via the /dev/st1 device name.
The name /dev/nst0 also refers to the first tape device, but it sends a no rewind signal to the tape device. This is important for when you need to write multiple volumes to the tape. The default behavior of the tape drive is to automatically rewind when the backup is complete. If you wrote another backup to the same tape, you would end up overwriting the first backup unless you used the /dev/nst0 device name when performing the first backup.
If you are working with tape devices, you should be aware of the mt command. This command is designed to allow you to directly manipulate the tape devices, including moving from one volume to another and deleting the contents of a tape. Some common examples:
[root@localhost ~]# mt -f /dev/nst0 fsf 1    #skip forward one file (AKA, volume)
[root@localhost ~]# mt -f /dev/st0 rewind    #rewinds the tape
[root@localhost ~]# mt -f /dev/st0 status    #prints information about tape device
[root@localhost ~]# mt -f /dev/st0 erase     #erases tape in tape drive
To create a backup (AKA, a tar ball) with the tar utility, use the -c (create) option in conjunction with the -f (filename) option:
[root@localhost ~]# tar -cf /tmp/xinet.tar /etc/xinetd.d
tar: Removing leading '/' from member names
The leading / characters are removed from the filenames, so instead of backing up absolute pathnames, the pathnames are relative. This makes it easier to specify where the files are restored. Keeping the leading / would result in files always being restored to exactly the same location.
To see the contents of a tar ball, use the -t (table of contents) option in conjunction with the -f option, as shown in Example 18-1.
Example 18-1 Contents of a tar Ball Using tar -tf
[root@localhost ~]# tar -tf /tmp/xinet.tar
etc/xinetd.d/
etc/xinetd.d/rsync
etc/xinetd.d/discard-stream
etc/xinetd.d/discard-dgram
etc/xinetd.d/time-dgram
etc/xinetd.d/echo-dgram
etc/xinetd.d/daytime-stream
etc/xinetd.d/chargen-stream
etc/xinetd.d/daytime-dgram
etc/xinetd.d/chargen-dgram
etc/xinetd.d/time-stream
etc/xinetd.d/telnet
etc/xinetd.d/echo-stream
etc/xinetd.d/tcpmux-server
You often want to see detailed information when listing the contents of the tar ball. Include the -v (verbose) option to see additional information, as shown in Example 18-2.
Example 18-2 The -v Option to See Details of the tar Ball
[root@localhost ~]# tar -tvf /tmp/xinet.tar
drwxr-xr-x root/root 0 2015-11-02 11:52 etc/xinetd.d/
-rw-r--r-- root/root 332 2014-03-28 03:54 etc/xinetd.d/rsync
-rw------- root/root 1159 2013-10-07 10:35 etc/xinetd.d/discard-stream
-rw------- root/root 1157 2013-10-07 10:35 etc/xinetd.d/discard-dgram
-rw------- root/root 1149 2013-10-07 10:35 etc/xinetd.d/time-dgram
-rw------- root/root 1148 2013-10-07 10:35 etc/xinetd.d/echo-dgram
-rw------- root/root 1159 2013-10-07 10:35 etc/xinetd.d/daytime-stream
-rw------- root/root 1159 2013-10-07 10:35 etc/xinetd.d/chargen-stream
-rw------- root/root 1157 2013-10-07 10:35 etc/xinetd.d/daytime-dgram
-rw------- root/root 1157 2013-10-07 10:35 etc/xinetd.d/chargen-dgram
-rw------- root/root 1150 2013-10-07 10:35 etc/xinetd.d/time-stream
-rw------- root/root 302 2015-11-02 11:52 etc/xinetd.d/telnet
-rw------- root/root 1150 2013-10-07 10:35 etc/xinetd.d/echo-stream
-rw------- root/root 1212 2013-10-07 10:35 etc/xinetd.d/tcpmux-server
To extract all the contents of the tar ball into the current directory, use the -x (extract) option in conjunction with the -f option, as shown in Example 18-3.
Example 18-3 Using tar -xf for Extracting Contents from the tar Ball
[root@localhost ~]# cd /tmp
[root@localhost tmp]# tar -xf xinet.tar
[root@localhost tmp]# ls
backup pulse-iqQ3aLCZD30z virtual-root.MLN2pc virtual-root.zAkrYZ
etc pulse-lZAnjZ6xlqVu virtual-root.o6Mepr xinet.tar
keyring-9D6mpL source virtual-root.vtPUaj zip-3.0-1.el6.src.rpm
orbit-gdm virtual-root.7AHBKz virtual-root.y6Q4gw
orbit-root virtual-root.EaUiye virtual-root.Ye1rtc
[root@localhost tmp]# ls etc
xinetd.d
[root@localhost tmp]# ls etc/xinetd.d
chargen-dgram daytime-stream echo-dgram tcpmux-server time-stream
chargen-stream discard-dgram echo-stream telnet
daytime-dgram discard-stream rsync time-dgram
Suppose your tar ball contains thousands of files and you only need a few files. You can list the filenames at the end of the tar command to perform this partial restore:
[root@localhost tmp]# tar -xf xinet.tar etc/xinetd.d/rsync
[root@localhost tmp]# ls etc/xinetd.d
rsync
There are many options to the tar command; consult Table 18-4 to learn about some of the more useful options, including several that have already been covered.
Table 18-4 Useful tar Options
Option | Description
-A | Append the contents of other tar balls to an existing tar ball.
-c | Create a tar ball.
-C | Change to the specified directory before performing the operation.
-d | Display the differences between an existing tar ball and what is currently on the filesystem.
--delete | Delete files from the tar ball; not possible on tapes.
-j | Compress the tar ball with the bzip2 command.
-t | List the table of contents of the tar ball.
-x | Extract the contents of the tar ball.
-z | Compress the tar ball with the gzip command.
-W | Attempt to verify the tar ball after writing it. Note: One of the objectives on the exam is to verify the integrity of backup files, so you may be asked a question regarding this option.
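Several of the options in Table 18-4 can be tried together on sample data. The following is a minimal sketch, assuming GNU tar, that exercises -W, -d, -z, and -C in a throwaway directory; the filenames are made up for illustration.

```shell
#!/bin/sh
# Sketch: -W, -d, -z, and -C with GNU tar on sample data.
set -eu

work=$(mktemp -d)
mkdir "$work/src" "$work/out"
echo "alpha" > "$work/src/a.txt"
echo "beta"  > "$work/src/b.txt"

# -W re-reads the archive after writing it and compares it with the source files
# (it cannot be combined with compression options such as -z)
(cd "$work/src" && tar -cWf "$work/plain.tar" .)

# -d compares archive members with the files on disk; exit status 0 = no differences
tar -df "$work/plain.tar" -C "$work/src" && echo "archive matches filesystem"

# Change a file and compare again: tar reports the difference and exits nonzero
echo "alpha, edited" > "$work/src/a.txt"
if tar -df "$work/plain.tar" -C "$work/src"; then
    echo "no differences"
else
    echo "differences found"
fi

# -z compresses with gzip; -C on extraction chooses where the files land
tar -czf "$work/src.tar.gz" -C "$work/src" .
tar -xzf "$work/src.tar.gz" -C "$work/out"
ls "$work/out"
```

For a compressed tar ball, where -W is not available, running tar -d against the source after the backup is one way to perform a similar check.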
The rsync Command
The rsync command provides a different set of backup features than those provided by the tar and dd commands. It is designed to back up files to a remote system. It can communicate via SSH, making the backup process secure. Additionally, it only backs up files that have changed since the last backup.
For example, the command shown in Example 18-4 performs a recursive backup of the /etc/xinetd.d directory to the /backup directory of the server1 machine.
Example 18-4 The rsync Command
[root@localhost ~]# rsync -av -e ssh /etc/xinetd.d server1:/backup
root@server1's password:
sending incremental file list
xinetd.d/
xinetd.d/chargen-dgram
xinetd.d/chargen-stream
xinetd.d/daytime-dgram
xinetd.d/daytime-stream
xinetd.d/discard-dgram
xinetd.d/discard-stream
xinetd.d/echo-dgram
xinetd.d/echo-stream
xinetd.d/rsync
xinetd.d/tcpmux-server
xinetd.d/telnet
xinetd.d/time-dgram
xinetd.d/time-stream
sent 14235 bytes received 263 bytes 1159.84 bytes/sec
total size is 13391 speedup is 0.92
The options used in the previous command are as follows: -v = verbose, -a = archive, -e ssh = use ssh as the remote shell. The first argument is what to copy, and the second argument is where to copy it.
Suppose a change takes place to one of the files in the /etc/xinetd.d directory:
[root@localhost ~]# chkconfig telnet off #changes /etc/xinetd.d/telnet
Note that when the rsync command is executed again, only the modified file is transferred:
[root@localhost ~]# rsync -av -e ssh /etc/xinetd.d server1:/backup
root@server1's password:
sending incremental file list
xinetd.d/
xinetd.d/telnet
sent 631 bytes received 41 bytes 192.00 bytes/sec
total size is 13392 speedup is 19.93
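The same behavior can be observed without a remote server by synchronizing two local directories. The following is a minimal sketch in which local directories stand in for the /etc/xinetd.d directory and the server1:/backup destination; the file contents are made up, and the demonstration is skipped if the rsync command is not installed.

```shell
#!/bin/sh
# Sketch: rsync transfers only what changed (local directories stand in for server1).
set -eu

work=$(mktemp -d)
mkdir -p "$work/etc/xinetd.d" "$work/backup"
echo "disable = no"  > "$work/etc/xinetd.d/telnet"
echo "disable = yes" > "$work/etc/xinetd.d/rsync"

if command -v rsync >/dev/null 2>&1; then
    # First run: everything is new, so both files are copied.
    # The source has no trailing slash, so the xinetd.d directory itself
    # is created under backup/
    rsync -a "$work/etc/xinetd.d" "$work/backup"

    # Change one file
    echo "disable = yes" > "$work/etc/xinetd.d/telnet"

    # -n (dry run) with -i (itemize changes) shows what a second run would transfer
    rsync -ain "$work/etc/xinetd.d" "$work/backup"

    # Second run for real: only the changed file is transferred
    rsync -a "$work/etc/xinetd.d" "$work/backup"
else
    echo "rsync is not installed; skipping demonstration" >&2
fi
```

The dry run is a useful habit before any large synchronization because it shows exactly which files would be transferred without changing anything at the destination.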
Third-Party Backup Utilities
Many third-party backup utilities are available for Linux. If you are studying for the LPIC-2 certification exam, you should realize that the exam objective states “Awareness of network backup solutions such as Amanda, Bacula, and BackupPC.” This means you should understand what these solutions provide, but don’t need to know any details.
Amanda
The Advanced Maryland Automatic Network Disk Archiver (AMANDA) is an open source software tool popular on both UNIX and Linux systems. While there is a freely available community version, there is also an enterprise version that provides support (for a fee, of course).
Amanda provides a scheduler, making it easier for a system administrator to automate the backup process. It also supports writing to either a tape device or a hard disk.
Bacula
Bacula is an open source product that supports clients from different platforms, including Linux, Microsoft Windows, OS X, and UNIX. One of the compelling features of Bacula is the capability to automate backup, freeing the system administrator from this routine task.
Configuration of Bacula on the server side can be accomplished via a web interface, GUI-based tools, or command line tools.
One disadvantage of Bacula is that the format of the backup data is not compatible with other backup formats, such as the tar command’s format. This makes it difficult to deal with the backup data unless you have the Bacula tools installed on the system.
BackupPC
The BackupPC software provides a disk-to-disk solution that includes a web-based front end. Because it runs through a web interface, no client software needs to be installed. The server software provides the web interface to perform the backup.
Another advantage of BackupPC is that the server runs on many different Linux distributions as well as on several UNIX systems. The software also supports several standard protocols to transfer the data, including NFS, SSH, rsync, and SMB (Server Message Block, a Microsoft Windows protocol). This provides you with flexibility in backing up data from different client systems.
