Back up essential data without bothering with worthless data
Several methods let you sort what should and should not be backed up, provided you respect one simple principle: always back up everything needed to survive an incident, even if that means backing up too much.
Telling your backup product which files and databases to back up may seem trivial, but it can have a massive impact on your ability to recover data. Choosing the right methods comes down to making sure that everything that needs to be backed up actually is, while trying not to back up worthless data.
Include a physical server
Virtually all backup products require an initial installation and configuration at the physical server level. This means that for any of the methods in this article to work, you must first install the appropriate software and configure the necessary authorization on each physical server in the data center. It also means that every VMware or Hyper-V host (not to be confused with each VM running on it), every physical UNIX or Windows server, and every cloud service must be registered with the backup system. This initial connection and authentication must be in place before the backup system can do what is expected of it.
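As an illustration, the short Python sketch below audits a data-center inventory against the hosts already registered with the backup system and lists those still missing. The `Host` class, the inventory, and the set of registered names are hypothetical stand-ins; a real backup product exposes this information through its own console or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    kind: str  # hypothetical labels: "vmware", "hyperv", "unix", "windows", "cloud"


def unregistered_hosts(inventory, registered_names):
    """Return inventory hosts that have no agent registered/authenticated with the backup system."""
    registered = set(registered_names)
    return [h for h in inventory if h.name not in registered]


# Hypothetical inventory of physical hosts and cloud services.
inventory = [
    Host("esx01", "vmware"),
    Host("hv01", "hyperv"),
    Host("db-unix-01", "unix"),
    Host("win-fs-01", "windows"),
    Host("saas-crm", "cloud"),
]
# Hypothetical names already known to the backup server.
registered = {"esx01", "db-unix-01", "win-fs-01"}

for host in unregistered_hosts(inventory, registered):
    print(f"Not yet registered with the backup system: {host.name} ({host.kind})")
```

Until such a list is empty, none of the inclusion methods that follow can protect the VMs, databases, or file systems on the missing hosts.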
Selective inclusion
The most common method of including files, objects, or databases in a backup system is to select them manually when setting up backups for that system. Here are three examples of selective inclusion:
– Open the vCenter or Hyper-V console and manually select the VMs to back up;
– Manually select one or more databases from the list of available databases;
– Manually select one or more file systems or subdirectories.
This method is the most widely used because it matches how most people think: backing up data means choosing what you want to back up. It also helps minimize the amount of worthless data backed up, since very few people would select a test VM or database, or a file system like /tmp on UNIX. But selective inclusion does not account for change over time. If you only back up the systems you select manually, what happens when the configuration changes? For example, what happens when new VMs are added to a given VMware server? What happens when a VM is moved from VMware to Hyper-V or to the cloud? A VM that was manually selected in VMware will not be backed up automatically once its configuration changes, and a new VM will not be backed up at all until someone selects it. This is why backup experts generally warn against selective inclusion: the risk of data loss is simply too high.
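A minimal Python sketch of this failure mode, assuming each VM is identified by a hypothetical `(platform, name)` pair and that the manual selection is stored once and never updated:

```python
def unprotected(selection, inventory):
    """VMs present in the current inventory but absent from the frozen manual selection."""
    return sorted(inventory - selection)


# Day 1: the administrator manually ticks both existing VMs in vCenter.
inventory = {("vmware", "web01"), ("vmware", "erp-db")}
selection = {("vmware", "web01"), ("vmware", "erp-db")}
print(unprotected(selection, inventory))  # []

# Day 30: a new VM is created, and erp-db is migrated to Hyper-V.
inventory = {("vmware", "web01"), ("vmware", "billing"), ("hyperv", "erp-db")}
print(unprotected(selection, inventory))
# [('hyperv', 'erp-db'), ('vmware', 'billing')]
```

Nothing in the selection changed, yet two VMs silently dropped out of the backup.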
Automatic inclusion
Another very common method is to add a VMware, Hyper-V, or database server to the backup configuration and simply ask for every VM, database, or file system on it to be backed up. This is the safest inclusion method, because it ensures that every new data source is backed up. It solves the problem of selective inclusion: new virtual machines, or a VM moved from one platform to another (as long as the destination is also in the backup configuration), are backed up automatically without anyone having to intervene. Some argue that this method virtually guarantees that worthless data will be backed up. That is true to an extent, but it also guarantees that important data is backed up automatically. The worst that can happen with selective inclusion is that a truly important file system, database, or VM is not backed up. The worst that can happen with automatic inclusion is that worthless data gets backed up as well.
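Automatic inclusion can be sketched as re-discovering data sources on every run rather than storing a fixed list. The `discover` callable and the `current_state` data below are hypothetical; a real product would query the hypervisor, database server, or operating system at backup time.

```python
def automatic_inclusion(included_hosts, discover):
    """Back up every data source found on each included host, re-discovered at each run.

    `discover` is a hypothetical callable returning the current VMs, databases,
    or file systems on a host.
    """
    return sorted(
        (host, source)
        for host in included_hosts
        for source in discover(host)
    )


# Hypothetical current state of two included hosts.
current_state = {
    "esx01": ["web01", "billing"],  # "billing" was created yesterday
    "sql01": ["erp", "test_copy"],  # test data is picked up too
}

plan = automatic_inclusion(["esx01", "sql01"], lambda h: current_state.get(h, []))
for host, source in plan:
    print(host, source)
```

The new `billing` VM is protected without any manual step; the price is that `test_copy` is backed up as well.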
Selective exclusion
This technique is generally used together with automatic inclusion. The customer configures the backup system to back up every VM, database, or file system except those specifically flagged on an exclusion list. Selective exclusion lets you have your cake and eat it too: automatic inclusion ensures that all important data is backed up, while the exclusion list keeps out data known to have no value. Exclusions can be set in a user interface, where an administrator clicks and manually selects disks or databases known to be worthless. An administrator trying to save space can add test databases, disks, or file systems like /tmp to the exclusion list so that space is not wasted on them.
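A minimal sketch of selective exclusion layered on top of automatic inclusion, assuming exclusions are stored as a plain list of source names (a hypothetical representation, not any product's format):

```python
def apply_exclusions(candidates, exclusion_list):
    """Selective exclusion: drop only the sources an administrator explicitly flagged."""
    excluded = set(exclusion_list)
    return [c for c in candidates if c not in excluded]


# Everything automatic inclusion found on one hypothetical server.
discovered = ["/", "/home", "/var", "/tmp", "testdb", "erp"]
# Sources flagged as having no value.
exclusions = ["/tmp", "testdb"]

print(apply_exclusions(discovered, exclusions))
# ['/', '/home', '/var', 'erp']
```

Because the filter removes only what is explicitly listed, a source discovered later (a new database, say) is still kept by default.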
Another way to set up selective exclusion is to use wildcards or regular expressions to identify what should not be backed up. For example, you can specify *.tmp, *.bak, and *.cache as wildcard exclusion patterns, so that no file with any of these extensions is backed up. Administrators comfortable with regular expressions can write more precise rules that exclude certain types of files wherever they are located.
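The sketch below combines both kinds of rule using Python's standard `fnmatch` and `re` modules. The wildcard patterns are the ones mentioned above; the regular expression, which skips anything inside a directory named `cache` or `node_modules` at any depth, is a hypothetical example of a location-independent rule.

```python
import fnmatch
import re

WILDCARD_EXCLUSIONS = ["*.tmp", "*.bak", "*.cache"]
# Hypothetical rule: any path segment named exactly "cache" or "node_modules".
REGEX_EXCLUSIONS = [re.compile(r"(^|/)(cache|node_modules)/")]


def is_excluded(path):
    """True if the file name matches a wildcard or the full path matches a regex."""
    name = path.rsplit("/", 1)[-1]
    # fnmatchcase keeps matching case-sensitive on every platform.
    if any(fnmatch.fnmatchcase(name, pat) for pat in WILDCARD_EXCLUSIONS):
        return True
    return any(rx.search(path) for rx in REGEX_EXCLUSIONS)


paths = [
    "/srv/app/report.pdf",
    "/srv/app/report.pdf.bak",
    "/home/ana/notes.tmp",
    "/srv/web/node_modules/lib.js",
    "/var/lib/app/cache/blob.dat",
    "/data/db.cache",
    "/home/ana/cachet/file.txt",  # "cachet" is not "cache", so it is kept
]
kept = [p for p in paths if not is_excluded(p)]
print(kept)
```

Anchoring the regex on `/` boundaries is what prevents it from catching unrelated directories such as `cachet`.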
Tag-based inclusion
A more recent approach to including data in a backup is to use tags, which are common in VM environments. Tags make it possible not only to back up only the VMs that carry a given tag, but also to specify how they should be backed up. For example, you can decide that virtual machines tagged #database are backed up by a database backup policy that handles them in a particular way. The same goes for VMs tagged #fileserver, #test, and so on. You can create several backup policies, each with its own behavior, and then apply them to different VMs through tags. This is a variant of automatic inclusion, since any new VM is automatically added to the appropriate tag-based backup policy as soon as it carries the right tag. Selective exclusion can still be used alongside it to make sure worthless data is not backed up.
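Tag-based policy assignment can be sketched as follows. The tag names echo the examples above (written without the `#`), while the policy names and the rule that the first matching tag wins are assumptions made for illustration; real products define their own precedence rules.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    tags: set = field(default_factory=set)


# Checked in order; the first matching tag wins. Hypothetical policy names.
TAG_POLICIES = [
    ("test", None),  # None: deliberately not backed up (selective exclusion)
    ("database", "db-policy"),
    ("fileserver", "file-policy"),
]


def assign_policies(vms):
    """Map each VM carrying a known tag to its policy; untagged VMs get no entry."""
    plan = {}
    for vm in vms:
        for tag, policy in TAG_POLICIES:
            if tag in vm.tags:
                plan[vm.name] = policy
                break
    return plan


vms = [
    VM("erp-db", {"database"}),
    VM("fs01", {"fileserver"}),
    VM("scratch", {"test", "database"}),
    VM("new-vm"),  # not tagged yet
]
plan = assign_policies(vms)
print(plan)
# {'erp-db': 'db-policy', 'fs01': 'file-policy', 'scratch': None}
```

Note that `new-vm`, which carries no tag yet, receives no policy at all: this is exactly the gap a default inclusion policy is meant to close.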
Default inclusion
Whenever you use automatic or tag-based inclusion, you also need some kind of “catch-all” mechanism. For example, if a VM or database is not picked up by a tag or any other automatic mechanism, you still want to be sure it gets backed up. The more you rely on smart mechanisms such as tag-based inclusion, the more important a default inclusion policy becomes. If your backup system supports it, it works as follows: any VM or database that is not already selected by an automatic or tag-based policy is backed up by the default policy. That policy will obviously not be tailored to the needs of each system, but at least some backups are made. You can also monitor the default policy to see whether any systems are being backed up through it. If so, investigate why and fix the issue by moving those systems into the appropriate backup policy.
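A minimal sketch of a catch-all default policy and of the monitoring step, assuming policies are tracked in a simple dictionary that maps each source to a policy name, with `None` meaning deliberately excluded (both conventions are hypothetical):

```python
DEFAULT_POLICY = "default-policy"  # hypothetical name


def plan_with_default(all_sources, assigned):
    """Give every source not matched by another policy the catch-all default policy.

    `assigned` maps source -> policy name, or None for a deliberate exclusion.
    """
    plan = dict(assigned)
    for source in all_sources:
        plan.setdefault(source, DEFAULT_POLICY)
    return plan


def caught_by_default(plan):
    """Sources protected only by the default policy: candidates for investigation."""
    return sorted(s for s, p in plan.items() if p == DEFAULT_POLICY)


all_sources = ["erp-db", "fs01", "scratch", "new-vm", "hr-db"]
assigned = {"erp-db": "db-policy", "fs01": "file-policy", "scratch": None}

plan = plan_with_default(all_sources, assigned)
print(caught_by_default(plan))  # ['hr-db', 'new-vm']
```

Using `setdefault` means the default never overrides an explicit decision, including an explicit exclusion such as `scratch`.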
Always keep in mind this fundamental rule of backup system design: you cannot restore what was never backed up. No one has ever been fired for backing up too much data, but many people have been fired for not backing up enough. Do your best to avoid unnecessary backups, but it is better to err on the side of caution than the other way around. Worry more about data that is not being backed up than about worthless data that is. That is how you avoid what many people call a “résumé-generating event”.