
Developing a backup strategy. Data integrity methods and tools. Overview of backup systems

The book is intended for readers who are familiar with computer systems and the information technology industry and who want to expand their knowledge of storage systems and of the parts of the Windows NT architecture directly related to them. The focus is on enterprise storage systems; consumer-level systems receive less attention. The book tries to serve both software professionals with little knowledge of storage technologies and storage professionals who want a deeper understanding of the Windows NT storage architecture. At the same time, it will be of interest to any reader who wants comprehensive information on the topic.


Various backup schemes are used, for example, in a data center, and different backup categories can be combined. Backups are classified as follows:

based on architecture;

based on functionality;

based on network infrastructure.

Let's consider each type of classification in more detail.

5.3.1 Architecture-based backup classification

One type of backup classification is based on architecture. The backup depends on the objects to which it is applied and on how well the backup application supports those objects. The available architectural backup types are described in sections 5.3.1.1–5.3.1.3.

5.3.1.1 Backup at the level of disk images and logical blocks

In this case, the backup application works with blocks of data. Typically, such a backup scheme requires that all applications on the server stop accessing the data being copied. The backup application accesses the hard disk without regard to its internal structure and performs read/write operations at the logical block level.

The advantage of this type of backup is the speed of backup and restore operations, which is especially important for recovering data after critical system failures. The disadvantage is that applications, and even the operating system, must be denied access to the disk for the duration of the backup. Another disadvantage is that an excessive number of unused logical blocks is copied when backing up a partially filled disk. Some backup applications provide the logic needed to detect and skip unused logical blocks; such backups are called sparse copies of the disk image.

Finally, it is quite difficult to retrieve only a specific file or several files, unlike restoring all the data on a disk. To do this, the backup software must process the file system metadata stored on the tape and calculate the location on the tape of the required file. Some programs allow you to restore certain files from an image-level backup, but only for some operating systems. Other applications try to optimize file recovery from an image-level backup by writing file metadata to tape, such as a file location table for the FAT16 file system.

The version of NTFS that ships with Windows 2000 already stores all metadata in files, including a bitmap that describes the allocation of logical blocks. The recovery program finds the necessary metadata and from it calculates the location on the tape of each logical block of the required file. The tape is then moved in one direction only, and all the necessary sections are read in a single pass, which yields all the data needed to recover the file. Because the tape does not shuttle back and forth, not only is the recovery time shortened, but tape wear is also reduced. Backup applications of this kind include, for example, Legato Celestra.
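As a rough illustration of the single-pass idea (the function and data-structure names here are hypothetical, and the real mapping of blocks to tape offsets would come from the file system metadata stored with the image), the restore can simply sort the required blocks by their position on the tape:

def plan_single_pass_read(file_blocks, block_to_tape_offset):
    # file_blocks: logical block numbers that make up the file being restored.
    # block_to_tape_offset: mapping of each logical block to its offset on the tape,
    # derived from the file system metadata written with the image-level backup.
    plan = [(block_to_tape_offset[blk], blk) for blk in file_blocks]
    plan.sort()                      # ascending tape offsets: the tape only moves forward
    return plan

def restore_file(tape, out_file, plan, block_size=4096):
    for tape_offset, logical_block in plan:
        tape.seek(tape_offset)       # each seek is further along the tape than the last
        data = tape.read(block_size)
        out_file.seek(logical_block * block_size)
        out_file.write(data)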

Please note that sometimes the choice of backup method is limited. If a database uses a raw disk (a new volume without a file system), the only options are an image-level backup or an application-level backup (discussed in section 5.3.1.3).

5.3.1.2 File level backup

In this type of backup, the backup program uses the services of the operating system and file system. One advantage is the efficiency of restoring a specific file or set of files. Another is that the operating system and applications can continue to access the files while the backup is in progress.

This approach is not without drawbacks, however. Backups take longer, especially compared with image-level backups. If a large number of small files is being copied, the load on the operating system and file system from accessing directory metadata can be significant. In addition, there is the problem of open files, which was described earlier.

Another disadvantage is related to security. This problem occurs regardless of the backup method (image level or file level) and stems from the fact that the backup is performed under the rights of an administrator or backup operator account rather than those of the user; this is the only way to recover files belonging to different users in a single restore operation. A prerequisite is that file metadata, such as access control lists and file ownership data, be restored correctly. Solving the problem requires support from the file system and operating system APIs for setting this metadata when restoring data from a backup, and the backup and restore application must use the provided capabilities correctly.
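For example, on Windows a backup application typically enables the SeBackupPrivilege and SeRestorePrivilege privileges in its access token so that it can read files and later rewrite their security descriptors regardless of the ACLs. A minimal ctypes sketch of enabling such a privilege (error handling trimmed; the account must already hold the privilege) might look like this:

import ctypes
from ctypes import wintypes

SE_PRIVILEGE_ENABLED = 0x00000002
TOKEN_ADJUST_PRIVILEGES = 0x0020
TOKEN_QUERY = 0x0008

class LUID(ctypes.Structure):
    _fields_ = [("LowPart", wintypes.DWORD), ("HighPart", wintypes.LONG)]

class LUID_AND_ATTRIBUTES(ctypes.Structure):
    _fields_ = [("Luid", LUID), ("Attributes", wintypes.DWORD)]

class TOKEN_PRIVILEGES(ctypes.Structure):
    _fields_ = [("PrivilegeCount", wintypes.DWORD),
                ("Privileges", LUID_AND_ATTRIBUTES * 1)]

def enable_privilege(name):
    # Enable a privilege such as "SeBackupPrivilege" or "SeRestorePrivilege"
    # in the current process token.
    advapi32 = ctypes.windll.advapi32
    kernel32 = ctypes.windll.kernel32
    token = wintypes.HANDLE()
    advapi32.OpenProcessToken(kernel32.GetCurrentProcess(),
                              TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY,
                              ctypes.byref(token))
    luid = LUID()
    advapi32.LookupPrivilegeValueW(None, name, ctypes.byref(luid))
    tp = TOKEN_PRIVILEGES()
    tp.PrivilegeCount = 1
    tp.Privileges[0].Luid = luid
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED
    advapi32.AdjustTokenPrivileges(token, False, ctypes.byref(tp), 0, None, None)
    kernel32.CloseHandle(token)

enable_privilege("SeBackupPrivilege")   # read files irrespective of their ACLs
enable_privilege("SeRestorePrivilege")  # write files and restore owner/ACL metadata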

5.3.1.3 Application level backup

In this case, data backup and restore is performed at the level of an application such as Microsoft SQL Server or Microsoft Exchange, using the API provided by the application. The backup consists of a set of files and objects that represent the state of the system at a certain point in time. The main problem is that backup and restore operations are tightly coupled to the application: if the API, or the behavior of an existing API, changes with the release of a new application version, the administrator will have to move to a new version of the backup software.

Some applications use a raw disk without a file system, or write to it a single huge file that hosts the application's own metadata; Microsoft Exchange is one example. Windows XP and Windows Server 2003 support important NTFS features that make recovering such files possible: the file is restored in logical blocks, and its end is marked using a new Win32 API function called SetFileValidData.
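A hedged sketch of how that final step might look from user code (the file path and size are illustrative; SetFileValidData requires the caller to hold and enable the SeManageVolumePrivilege, and the file must already have been extended to its full size):

import ctypes
from ctypes import wintypes

kernel32 = ctypes.windll.kernel32
kernel32.CreateFileW.restype = wintypes.HANDLE   # keep the handle from being truncated

GENERIC_WRITE = 0x40000000
OPEN_EXISTING = 3
INVALID_HANDLE_VALUE = wintypes.HANDLE(-1).value

def mark_valid_length(path, valid_bytes):
    # After the file's logical blocks have been written in place, SetFileValidData
    # marks how many bytes are valid without zero-filling the rest of the file.
    handle = kernel32.CreateFileW(path, GENERIC_WRITE, 0, None, OPEN_EXISTING, 0, None)
    if handle in (None, INVALID_HANDLE_VALUE):
        raise ctypes.WinError()
    try:
        if not kernel32.SetFileValidData(wintypes.HANDLE(handle),
                                         ctypes.c_longlong(valid_bytes)):
            raise ctypes.WinError()
    finally:
        kernel32.CloseHandle(wintypes.HANDLE(handle))

# Illustrative call for a restored Exchange-style database file:
# mark_valid_length(r"C:\Restore\store.edb", 20 * 1024 ** 3)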

5.3.2 Classification of backup based on functionality

Another way to classify backup applications is by the functionality of the backup process. Note that data centers typically use at least two, and most often all, of the backup types described below: full, differential, and incremental.

5.3.2.1 Full backup

In a full backup, the complete set of files or objects and their associated metadata is copied to the backup media. The advantage is that only one set of media is needed for recovery in the event of a system failure. The disadvantage is the copy time, since all data is copied. Full backups are often performed at the disk image level or at the block level.

5.3.2.2 Differential backup

A differential backup archives all changes that have occurred since the last full backup. Since differential backups can be created at the image level or at the file level, this set of changes is either the set of changed disk blocks (for an image-level backup) or the set of changed files (for a file-level backup). The main advantage of differential backup is the significant reduction in backup time compared with a full backup. On the other hand, recovery after a failure takes longer, because it requires two restore operations: the first restores data from the full backup, the second from the differential backup.

With low-cost storage subsystems, file-level differential backup is used when applications create many small files and modify some of them after the full backup is taken. Such a backup is not appropriate, however, when the disk is used by database applications that constantly make small changes to huge database files, because a file-level backup would then copy each changed file in its entirety. An example of such an application is Microsoft Exchange.

With high-end storage subsystems, image-level differential backup can be used in any situation, including backing up database application files. The reason for this efficiency is that the subsystem stores a large amount of metadata, which makes it possible to quickly determine which disk blocks have changed since the last backup. As a result, only changed disk blocks are backed up, and the large number of unchanged blocks is not copied. Even though backup performance is better with a high-end storage subsystem, an API is still needed that allows a backup to be started at a given point in time and I/O to resume after the backup is complete; what the high-end subsystem does is reduce the amount of I/O that must be suspended during the backup.

5.3.2.3 Incremental backup

An incremental backup archives only the changes made since the previous backup. Obviously, this type of backup takes less time, because files that have not changed since the last full or incremental backup are not copied to the backup media. The disadvantage of this method is the length of the recovery operation, since it requires a set of media comprising the latest full backup plus all subsequent incremental backups.
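The difference in restore effort between the three schemes can be sketched with a small hypothetical helper that walks a backup history backwards and collects the media sets a restore would need (the labels and the history itself are invented):

def restore_chain(history):
    # history: chronological list of (label, kind) tuples,
    # kind being "full", "differential", or "incremental".
    # Returns the backup sets needed for a restore, oldest first.
    chain = []
    i = len(history) - 1
    while i >= 0:
        label, kind = history[i]
        chain.append((label, kind))
        if kind == "full":
            break                               # the latest full backup ends the chain
        if kind == "differential":
            # a differential already covers everything since the last full backup,
            # so skip straight back to that full backup
            while i > 0 and history[i - 1][1] != "full":
                i -= 1
        i -= 1                                  # incremental: keep collecting older sets
    return list(reversed(chain))

week = [("Sun", "full"), ("Mon", "incremental"), ("Tue", "incremental"),
        ("Wed", "differential"), ("Thu", "incremental")]
print(restore_chain(week))
# [('Sun', 'full'), ('Wed', 'differential'), ('Thu', 'incremental')]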

Without a high-end storage subsystem, incremental backups are performed by copying the files that have changed or been added. With high-end storage subsystem models, block-based incremental backups can be used, because enough metadata is available to identify the blocks that have changed.

5.3.3 Classification of backup based on network infrastructure

One more way to classify backups is based on the network topology and its impact on choosing the best method for backing up the connected nodes. The backup types that depend on the network infrastructure (DAS backup, NAS backup, LAN-free SAN backup, and server-independent backup) are discussed in sections 5.3.3.1 through 5.3.3.4.

5.3.3.1 DAS backup

This oldest type of backup originated in the days when storage devices were connected directly to the server. Despite the development of network storage, DAS backup remains a popular way to copy data hosted on Windows servers. The DAS backup scheme is shown in Fig. 5.3. The advantage of DAS backup is its ease of use: the application on the server reads the data from the corresponding disk volume and writes it to tape. However, DAS backup has a number of disadvantages.

Multiple tape drives are required (one for each server that needs to be backed up), which is costly; sharing a single drive among several servers is next to impossible.

High Total Cost of Ownership (TCO) because multi-tape backup requires multiple administrators.

Storing multiple tapes can be confusing.

Because data across multiple servers is often duplicated but out of sync, the same data is migrated to tape as well, so storing similar data across multiple tapes can be confusing.

Fig. 5.3. DAS backup

Last but not least, the server must process read/write data requests between the disk and the tape drive.

5.3.3.2 NAS backup

As noted in Chapter 3, the era of DAS storage ended with the advent of client/server systems, where clients and servers began to share LAN resources. This allowed for an architecture in which a tape drive connected to a server is accessed by multiple network servers.

Figure 5.4 shows a typical NAS backup scenario. The left part of the diagram shows several servers; these can be application servers or file and print servers. In the right part there is a backup server with a tape drive attached to it. This drive can be used to back up information from multiple application, file, and print servers. Thus, NAS backup lets you share a tape drive to back up data from multiple servers, resulting in lower overall costs.

NAS backup has some disadvantages.

The backup operation consumes LAN bandwidth, which often requires segmenting the LAN to redirect backup traffic onto a separate network segment.

Server uptime requirements keep growing; in other words, the time during which servers must be available to serve user requests and transactions increases. In addition, the amount of data stored on the servers grows, which requires more time to back it up.

Fig. 5.4. NAS backup

Given the urgency of these problems, backup efficiency becomes a key criterion when designing networks and determining the exact number of backup devices required.

5.3.3.3 SAN backup

The development of storage networks has led to the emergence of new concepts of backup. The new capabilities are based on the fact that the storage area network can provide sufficient bandwidth between any two devices and, depending on the topology, is able to provide simultaneous low-latency communication between several pairs of devices. On the other hand, using a Fiber Channel ring topology with more than 30 devices does not provide the ability to create multiple connections with high throughput and low latency, since the total ring bandwidth will be shared among all connected devices.

Figure 5.5 shows the architecture of a typical SAN backup application. Notice the Fiber Channel bridge: most tape drives do not support Fiber Channel (they use a parallel SCSI interface), so a bridge is needed to connect such devices. In Figure 5.5, the Windows NT servers are connected to both the local network and the storage area network.

The backup topology (see Figure 5.5) has a number of advantages.

The tape drive may be quite far from the server whose data is being backed up. Such drives are usually equipped with a SCSI interface, although recently Fiber Channel drives have become more common. This means that they can only be connected to a single SCSI bus, making it difficult for multiple servers to share the drive. Fiber Channel SANs, with multi-device support, can successfully solve sharing problems. Note that this still requires a method to correctly access the tape drive using the appropriate permissions. Examples of such methods are presented below.

Fig. 5.5. Backup via a storage area network

The zoning method allows a single server to access a tape drive at a given time. The problem is to ensure that servers comply with zoning requirements. In addition, you must ensure that a tape changer or multi-cassette drive is used correctly.

The next method is to use SCSI interface commands such as reserve and release.

Another method is to attach the tape drive to a server and share the device through special server software. Sharing a tape drive is a very attractive solution because tape drives are quite expensive. An example of such software is IBM's Tivoli product.

The LAN-free backup technology got its name because the data transfer is performed outside the LAN by means of a SAN. This reduces the load on the local network, so that applications do not suffer from network bandwidth degradation when data is backed up.

LAN-free backup enables more efficient use of resources by sharing tape drives.

LAN-free backup and restore is also more resilient to failures: if one device fails, the backup can be performed by other devices. Similarly, multiple devices can be used for data recovery, which allows more flexible resource planning.

Finally, backup and restore operations complete much faster because SANs provide faster data transfer rates.

5.3.3.4 Server-independent backup

This type of backup is sometimes referred to as serverless backup or third-party copying. Note that a server-independent backup is usually also a LAN-free backup, since it eliminates the need to move the data through a particular host. The idea behind this backup method is to use the SCSI Extended Copy command.

Server-independent backup is based on an initiative of the SNIA, which was implemented in the SCSI Extended Copy commands approved by the INCITS committee, more specifically by technical subcommittee T10 (document ANSI INCITS 351:2001, SCSI Primary Commands-2). Note that the SCSI standard had already described copy commands, but previously these commands required all SCSI devices to be connected to the same bus (the Copy command has since been deprecated; see http://www.t10.org). The Extended Copy command adds capabilities such as allowing the data source and destination to be on different SCSI buses, while fully preserving the addressing supported by the command syntax.

In a server-independent backup, the backup server can process other requests while the data is being copied by the data movement agent. The data is transferred directly from the data source to the destination, that is, to the backup media (instead of being copied from the source to the backup server and then transferred to the backup media).

Fig. 5.6. Server-independent backup

While appreciating the benefits of server-independent backup, we should not forget that data recovery is an entirely separate problem. Server-independent restore operations remain extremely rare: backups created with this technology are very often restored by traditional methods, which involve a server running some kind of backup and recovery software.

The concept of server-independent backup is illustrated in Figure 5.6. To simplify the diagram, the figure shows the minimum number of components required to illustrate the backup; in practice, storage networks have a more complex structure. Figure 5.6 shows a server running Windows connected to a Fiber Channel switch through a Fiber Channel HBA. A Fiber Channel-to-SCSI router is also used, connected to a SCSI tape drive and disk devices. The disk and tape devices do not need to be connected to the same router.

The media server application on the Windows server locates the data movement agent on the router using Plug and Play. The backup application determines additional information about the backup (disk device identifier, starting logical block, amount of data to copy, and so on). The backup server software first sends a sequence of commands to reserve the tape drive and mount the required media. Next, the media server software sends the Extended Copy command to the data movement agent running on the router. The agent coordinates the transfer of the required data and, when the copy is complete, returns status information to the backup program running on the Windows server.

Several components play an important role in the server-independent backup process, including the data source and destination, the move agent, and the backup server.

The data source is the device that contains the data to be backed up; typically an entire volume or disk partition is backed up. The data source must be directly addressable by the data movement agent (more on that later). This means that storage devices internal to a server cannot be the source of a server-independent backup, since they cannot be addressed directly from outside the server.

The data destination is usually the tape drive to which data is written; it can also be a disk if you are backing up to disk instead of tape. Tape devices are usually connected to a fabric port to avoid corrupting the data being written to tape if other parts of the SAN fail. For example, if a tape drive is connected to a shared Fiber Channel ring, an error in another device, or a device joining or leaving the ring, can stop the write and force the ring to be reinitialized, destroying the integrity of the data being written to the tape.

The data movement agent is usually built into the router firmware, since it must process the SCSI Extended Copy command, which arrives at the router as a Fiber Channel packet. Switches and hubs that process only the Fiber Channel frame header are not well suited to hosting a data movement agent, although this may change in the future.

The data movement agent is activated after receiving instructions from the media server. Most tape drives attached to a SAN are SCSI devices, so a router is required that translates packets between the Fiber Channel and SCSI interfaces. At the moment, Fiber Channel tape drives are becoming more common, and some companies, such as Exabyte, provide tape drive firmware that adds data movement agent functionality. In addition, large Fiber Channel tape libraries typically have built-in Fiber Channel-to-SCSI routers, allowing the library to use its own data movement agent. Note that the agent may also be implemented in software on a workstation or even a server. Crossroads, Pathlight (now ADIC), and Chaparral provide routers with data movers built into the firmware. A SAN can have multiple agents from multiple vendors, and such agents can coexist on the same network.

Of course, for the data movement agent to be usable, it must be discoverable (via the SCSI Report LUNs command) and properly addressable (by its WWN) from the backup server. In addition, the agent can run two backups at the same time; for example, one copy session can go to a geographically distant mirror resource, but the backup server must send two commands for this.

The backup server is responsible for all command and operation management. Let us list once again the main responsibilities of the backup server.

The server software reserves and releases the tape drive using the appropriate SCSI Reserve and Release commands.

Mounting media for backup.

Determining the exact address of the data source, the location of the data in logical blocks, and the amount of data to be backed up.

Having gathered all the necessary information, the server sends an Extended Copy command to the data movement agent. The agent then issues a sequence of read commands to the data source and writes the information to the destination.

Computer Associates, CommVault, LEGATO, and VERITAS provide programs for server-independent backup. Vendors of routers with server-independent backup features collaborate closely with software companies to make their products compatible; the issue is that, beyond the basic SCSI Extended Copy command, manufacturers use different supplementary commands.

Please note that despite the fairly mature age of server-independent backup technology, server-independent recovery support from vendors is extremely limited.

5.3.3.5 Windows Server family of operating systems and server-independent backup

Numerous promotional materials and marketing literature claim that a particular implementation of server-independent backup technology is compatible with Windows 2000. Let's look at this in more detail. The following describes each of the four components that make up a server-independent backup: the data source, the data destination, the backup server software, and the data movement agent.

In most cases, a data movement agent that runs outside a Windows NT server cannot address data stored inside that server. HBAs connected to a Windows NT server typically act as initiators and do not respond to Report LUNs commands. If the Windows NT server uses a storage device outside the server, such as a RAID array attached to a Fiber Channel switch, then that device will be visible to the data movement agent. So rather than saying that a storage device used by Windows NT cannot be a data source for a server-independent backup, it should be clarified that the data source cannot be a storage device internal to a Windows NT server.

Using the Windows NT internal storage as a data destination is also not possible because the destination must also be available to the data mover agent for addressing.

Running the backup program on a Windows computer is a good option. A bus adapter connected to a Windows server can issue a sequence of Report LUNs commands to each discovered device (LUN 0). The backup program then examines all visible devices and logical units and determines which of them can act as a third-party copy agent. Some programs report additional LUNs that are needed when issuing Extended Copy commands. Many backup programs that use additional LUNs go through a device discovery process to test the functionality of the data movement agent.

The SCSI pass-through interface (IOCTL) in Windows NT can be used to send an Extended Copy command to the data movement agent (the command is sent from a backup server running Windows NT). The Windows NT operating system has no built-in support for data movement agents; Plug and Play allows an agent to be detected, but additional drivers are needed to register it in the system registry.

One final question remains: can the data movement agent software be run on a server or workstation running Windows NT? One benefit of this solution is that the agent would be able to address and access the storage devices "visible" to the Windows server. However, a media server hosted outside Windows NT would not be able to detect storage devices attached to the computer running the data movement agent. The agent must be able to act as both an initiator and a target device for SCSI commands, and because an HBA connected to a Windows NT computer rarely acts as a target device, the Extended Copy command may never reach the data movement agent.

Note that on Windows NT, applications use the pass-through interface to issue SCSI commands (DeviceIoControl with the IoControlCode parameter set to IOCTL_SCSI_PASS_THROUGH or IOCTL_SCSI_PASS_THROUGH_DIRECT).
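As a hedged sketch of how a backup application might push an Extended Copy command through this interface (error handling is omitted; the structure layout follows the ntddscsi.h definition, while the device path, timeout, and, above all, the parameter list with its T10-defined target and segment descriptors are placeholders that depend on the actual devices):

import ctypes
from ctypes import wintypes

IOCTL_SCSI_PASS_THROUGH_DIRECT = 0x4D014
SCSI_IOCTL_DATA_OUT = 0            # data flows from the host to the device
EXTENDED_COPY_OPCODE = 0x83        # EXTENDED COPY operation code (SPC-2)

class SCSI_PASS_THROUGH_DIRECT(ctypes.Structure):
    _fields_ = [("Length", ctypes.c_ushort),
                ("ScsiStatus", ctypes.c_ubyte),
                ("PathId", ctypes.c_ubyte),
                ("TargetId", ctypes.c_ubyte),
                ("Lun", ctypes.c_ubyte),
                ("CdbLength", ctypes.c_ubyte),
                ("SenseInfoLength", ctypes.c_ubyte),
                ("DataIn", ctypes.c_ubyte),
                ("DataTransferLength", ctypes.c_ulong),
                ("TimeOutValue", ctypes.c_ulong),
                ("DataBuffer", ctypes.c_void_p),
                ("SenseInfoOffset", ctypes.c_ulong),
                ("Cdb", ctypes.c_ubyte * 16)]

def send_extended_copy(device_path, parameter_list):
    # parameter_list must already contain the target and segment descriptors
    # defined by the T10 specification for the devices involved.
    kernel32 = ctypes.windll.kernel32
    kernel32.CreateFileW.restype = wintypes.HANDLE
    GENERIC_READ, GENERIC_WRITE = 0x80000000, 0x40000000
    FILE_SHARE_READ, FILE_SHARE_WRITE, OPEN_EXISTING = 1, 2, 3

    handle = kernel32.CreateFileW(device_path, GENERIC_READ | GENERIC_WRITE,
                                  FILE_SHARE_READ | FILE_SHARE_WRITE,
                                  None, OPEN_EXISTING, 0, None)
    buf = ctypes.create_string_buffer(parameter_list, len(parameter_list))
    sptd = SCSI_PASS_THROUGH_DIRECT()
    sptd.Length = ctypes.sizeof(sptd)
    sptd.CdbLength = 16
    sptd.DataIn = SCSI_IOCTL_DATA_OUT
    sptd.DataTransferLength = len(parameter_list)
    sptd.TimeOutValue = 120
    sptd.DataBuffer = ctypes.cast(buf, ctypes.c_void_p)
    sptd.Cdb[0] = EXTENDED_COPY_OPCODE
    sptd.Cdb[10:14] = list(len(parameter_list).to_bytes(4, "big"))  # parameter list length

    returned = wintypes.DWORD(0)
    ok = kernel32.DeviceIoControl(wintypes.HANDLE(handle), IOCTL_SCSI_PASS_THROUGH_DIRECT,
                                  ctypes.byref(sptd), ctypes.sizeof(sptd),
                                  ctypes.byref(sptd), ctypes.sizeof(sptd),
                                  ctypes.byref(returned), None)
    kernel32.CloseHandle(wintypes.HANDLE(handle))
    return bool(ok)

# Illustrative use: send_extended_copy(r"\\.\Scsi1:", descriptors_blob)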

ALEXEY BEREZHNY, system administrator. Main areas of activity: virtualization and heterogeneous networks. Another hobby besides writing articles is popularizing free software.

Backup
Theory and practice. Summary

To organize the backup system most effectively, you need to build a real strategy for saving and restoring information.

Backup is an important process in the life of any IT infrastructure. It is a parachute for rescue in case of an unforeseen catastrophe, and at the same time it creates a kind of historical archive of the company's business activity over a given period of its life. Working without backups is like living under the open sky: the weather can turn bad at any moment, and there is nowhere to hide. But how do you organize backup properly, so as not to lose important data and not to spend a fantastic amount of money on it?

Usually, articles on the topic of organizing backups deal mainly with technical solutions, and only occasionally pay attention to the theory and methods of organizing data storage.

In this article, we will focus on just the opposite: the main attention is paid to general concepts, and technical means will be touched on as examples only. This will allow us to abstract from hardware and software and answer two main questions: “Why are we doing this?”, “Can we do this faster, cheaper and more reliably?”.

Goals and objectives of backup

In the process of organizing backup, two main tasks are set: infrastructure recovery in case of failures (Disaster Recovery) and maintaining a data archive in order to subsequently provide access to information for past periods.

A classic example of a Disaster Recovery backup is an image of the server's system partition created by Acronis True Image.

An example of an archive is a monthly dump of 1C databases recorded on tape and then stored in a specially designated place.

There are several factors that differentiate a quick restore backup from an archive:

  • Data retention period. For archival copies it is quite long; in some cases it is regulated not only by business requirements but also by law. For disaster recovery copies it is relatively short. Usually one or two Disaster Recovery backups (more when reliability requirements are higher) are kept, created at most a day or two apart, after which they are overwritten with fresh ones. In especially critical cases the disaster recovery backup may be updated more frequently, for example once every few hours.
  • Speed of data access. The speed of access to a long-term archive is not critical in most cases. The need to "raise data for a period" usually arises when checking documents, returning to a previous version, and so on, that is, not in emergency mode. Disaster recovery is another matter: the necessary data and services must be brought back as soon as possible, so the speed of access to the backup is an extremely important indicator.
  • The composition of the copied information. A backup usually contains only user and business data for the specified period. A disaster recovery copy contains, in addition to this data, either system images or copies of the operating system and application software settings and other information needed for recovery.

Sometimes it is possible to combine these tasks. For example, a yearly set of monthly full "snapshots" of a file server, plus changes made during the week. True Image is suitable as a tool for creating such a backup.

The most important thing is to understand clearly what the backup is for. Let me give an example: a critical SQL server failed because of a disk array failure. The right hardware was in stock, so the solution was only a matter of restoring the software and data. The company's management asks an understandable question: "When will it be working?", and is unpleasantly surprised to learn that recovery will take a full four hours. The reason is that throughout the server's entire lifetime only the databases were regularly backed up, without taking into account the need to restore the server itself with all its settings, including the DBMS software. Simply put, our heroes saved only the databases and forgot about the system.

Here is another example. Over the course of his career, a young specialist created a single copy of a file server running Windows Server 2003 with the ntbackup program, including the data and the System State, in a shared folder on another computer. Because of a lack of disk space, this copy was constantly overwritten. Some time later he was asked to restore a previous version of a multi-page report that had been corrupted when it was saved. It is clear that, with no archive history and with Shadow Copy turned off, he could not fulfill this request.

On a note

Shadow Copy, literally a "shadow copy", ensures that point-in-time copies of the file system are created in such a way that further changes to the original do not affect them. This feature allows you to create multiple hidden copies of a file over a certain period of time, as well as on-the-fly backups of files opened for writing. The Volume Shadow Copy Service is responsible for the operation of Shadow Copy.

System State, literally "the state of the system". A System State backup saves the critical components of Windows family operating systems, which makes it possible to restore a previously installed system after its destruction. When the System State is copied, the registry, boot files, and other files important to the system are saved, including those needed to restore Active Directory, the Certificate Services database, the COM+ Class Registration database, and the SYSVOL directories. In UNIX-family operating systems, an indirect analogue of copying the System State is saving the contents of the /etc and /usr/local/etc directories and other files necessary for restoring the system state.

The conclusion that follows is that you need both types of backup: for disaster recovery and for archival storage. At the same time, you must determine the list of resources to be copied, the schedule for the jobs, and where, how, and for how long the backup copies will be stored.

With small amounts of data and a not very complex IT infrastructure, you can try to combine both of these tasks in one, for example by doing a daily full backup of all disk partitions and databases. But it is still better to distinguish the two goals and select the right means for each of them. Accordingly, a different tool is used for each task, although there are also universal solutions, such as the Acronis True Image package or the ntbackup program.

It is clear that when determining the goals and objectives of backup, as well as solutions for implementation, it is necessary to proceed from business requirements.

When implementing a disaster recovery task, you can use different strategies.

In some cases it is necessary to restore the system directly to "bare metal". This can be done, for example, using Acronis True Image bundled with the Universal Restore module. In this case, the server configuration can be returned to service in a very short time; for example, a 20 GB operating system partition can realistically be brought up from a backup copy in eight minutes (provided the archive copy is reachable over a 1 Gb/s network).

In another scenario it is more expedient simply to "pour" the settings onto a freshly installed system, for example by copying configuration files from the /etc directory and others on UNIX-like systems (on Windows this roughly corresponds to copying and restoring the System State). Of course, with this approach the server cannot be put into operation until the operating system has been installed and the necessary settings restored, which takes much longer. But in any case, what Disaster Recovery should look like follows from business needs and resource constraints.

The fundamental difference between backup and hardware redundancy systems

This is another interesting question that I would like to touch on. Hardware redundancy systems introduce a degree of redundancy into the hardware in order to maintain operability in the event of a sudden failure of one of the components. A perfect example is a RAID array (Redundant Array of Independent Disks). In the event of a single disk failure, information loss can be avoided and the disk safely replaced, the data being preserved thanks to the specific organization of the disk array itself (RAID is covered in more detail in the references).

I have heard the phrase: "We have very reliable equipment, there are RAID arrays everywhere, so we do not need backups." Yes, of course, a RAID array will protect data from destruction if one hard drive fails. But it will not save you from data corruption caused by a computer virus or from inept user actions. Nor will RAID save you if the file system crashes as a result of an unexpected reboot.

By the way

The importance of distinguishing backup from hardware redundancy should be kept in mind when planning data protection, whether for an organization or for home computers.

Ask yourself why you are making copies. If we are talking about backup, we mean saving data in case of an accidental (or deliberate) harmful action. Hardware redundancy, in turn, makes it possible to preserve data, including backup copies, when equipment fails.

There are many inexpensive devices on the market today that provide reliable redundancy using RAID arrays or cloud technologies (for example, Amazon S3). It is recommended to use both types of data protection at the same time.

Andrey Vasiliev, CEO of Qnap Russia

I will give one example. There are cases when events develop according to the following scenario: a disk fails and the data is rebuilt by the redundancy mechanism, in particular using the saved checksums. At the same time, performance drops significantly, the server freezes, and control is almost lost. The system administrator, seeing no other way out, performs a cold restart of the server (in other words, presses "RESET"). As a result of such a hard reboot of a live system, file system errors appear. The best that can be expected in this case is a lengthy run of the disk checker to restore the integrity of the file system. In the worst case, you will have to say goodbye to the file system and puzzle over where, how, and in what timeframe you can restore the data and bring the server back to life.

You will not be able to avoid backup even with a clustered architecture. A failover cluster, in fact, keeps the services entrusted to it working if one of the servers fails. In the case of the above problems, such as a virus attack or data corruption due to the notorious "human factor", no cluster will save.

The only thing that can act as a partial substitute for backup in Disaster Recovery is a mirror backup server with constant data replication from the main server to the backup one (on the Primary → Standby principle). In this case, if the main server fails, its tasks will be taken over by the backup one, and you won't even have to transfer data. But such a system is quite expensive and time-consuming to organize, and do not forget about the need for constant replication.

It becomes clear that such a solution is cost-effective only for critical services with high requirements for fault tolerance and minimal recovery time. As a rule, such schemes are used in very large organizations with high turnover. And this scheme is only a partial substitute for backup, because if data is damaged by a computer virus, inept user actions, or incorrect application behavior, the data and software on both servers can be affected.

And, of course, no system of redundant redundancy will solve the problem of maintaining a data archive for a certain period.

The concept of "backup window"

Performing a backup puts a heavy load on the server being backed up, especially on the disk subsystem and network connections. In some cases, when the copying process has a sufficiently high priority, this can make certain services unavailable. In addition, copying data at the moment it is being changed is associated with significant difficulties. Of course, there are technical means to avoid such problems while maintaining data integrity, but if possible, such on-the-fly copying is best avoided.

The way out of these problems suggests itself: postpone the start of the copy process to an inactive period, when the mutual influence of the backup and the other working systems is minimal. This time period is called the "backup window". For example, for an organization working on an 8x5 formula (five eight-hour working days a week), such a "window" is usually the weekend and night hours.

For systems that work according to the 24x7 formula (24 hours a day all week), the time of minimum activity is used as such a period, when there is no high load on the servers.

Types of backup

To avoid unnecessary material costs when organizing backups, and also, if possible, not to go beyond the backup window, several backup technologies have been developed, which are used depending on the specific situation.

Full backup

It is the main and fundamental method of creating backups, in which the selected data array is copied in its entirety. This is the most complete and reliable type of backup, although the most expensive. If it is necessary to save several copies of data, the total stored volume will increase in proportion to their number. To prevent such waste, compression algorithms are used, as well as a combination of this method with other types of backup: incremental or differential. And, of course, a full backup is indispensable when you need to prepare a backup for a quick system restore from scratch.

Incremental backup

Unlike a full backup, in this case not all data (files, sectors, and so on) is copied, but only data that has changed since the last backup. Various methods can be used to determine what has changed: for example, on Windows systems, the corresponding file attribute (the archive bit) is used, which is set when a file is modified and reset by the backup program; other systems may use the file's modification date. Clearly, a scheme based on this type of backup alone is not viable unless a full backup is performed from time to time. In a full system restore, you first restore the last full backup and then sequentially "roll on" the data from the incremental copies in the order in which they were created.
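A hedged sketch of that archive-bit logic on Windows (paths are illustrative, and the copy routine itself is a hypothetical placeholder; a real backup program would also handle locked files and errors):

import ctypes
import os
import stat

FILE_ATTRIBUTE_ARCHIVE = stat.FILE_ATTRIBUTE_ARCHIVE   # set by Windows whenever a file changes

def files_to_copy(root):
    # Yield files whose archive bit is set, i.e. changed since the bit was last cleared.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_file_attributes & FILE_ATTRIBUTE_ARCHIVE:
                yield path

def clear_archive_bit(path):
    # An incremental backup clears the bit after copying; a differential backup
    # leaves it set, so changes keep accumulating until the next full backup.
    kernel32 = ctypes.windll.kernel32
    attrs = kernel32.GetFileAttributesW(path)
    kernel32.SetFileAttributesW(path, attrs & ~FILE_ATTRIBUTE_ARCHIVE)

# Illustrative incremental run:
# for path in files_to_copy(r"D:\Data"):
#     copy_to_backup(path)       # hypothetical copy routine
#     clear_archive_bit(path)    # omit this call to get differential behaviour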

What is this type of backup used for? When creating archival copies, it reduces the volume consumed on storage devices (for example, the number of tapes used). It also minimizes the time needed to complete backup jobs, which can be extremely important when the systems run on a tight 24x7 schedule or large amounts of data have to be transferred.

Incremental copying has one nuance that you need to know about: a step-by-step restore also brings back files that were deleted during the period covered by the restore. An example: a full backup is performed on weekends and an incremental backup on weekdays. A user created a file on Monday, changed it on Tuesday, renamed it on Wednesday, and deleted it on Thursday. With a sequential, step-by-step restore of the data for the week, we will end up with two files: one with the old name as of Tuesday, before the renaming, and one with the new name created on Wednesday. This happens because different incremental copies store different versions of the same file, and eventually every variant is restored. Therefore, when restoring data sequentially from an archive "as is", it makes sense to reserve extra disk space so that the deleted files also fit.

Differential backup

It differs from an incremental backup in that the data copied is everything changed since the last full backup, so the data accumulates in each new copy. On Windows systems this effect is achieved by the fact that the archive bit is not reset during differential copying, so the changed data keeps getting into the differential copy until a full backup resets the archive bits.

Due to the fact that each new copy, created in this way, contains data from the previous one, it is more convenient for full recovery of data at the time of the accident. This requires only two copies: the full one and the last of the differential ones, so you can bring the data back to life much faster than rolling all the increments in stages. In addition, this type of copying is spared from the above features of incremental copying, when, with full restoration, old files, like a Phoenix bird, are reborn from the ashes. There is less confusion.

But differential copying is significantly inferior to incremental copying in terms of space savings. Since each new copy stores the data of the previous ones, the total amount of backed-up data can become comparable to a full copy. And, of course, when planning the schedule (and calculating whether the backup process will fit into the time window), you need to take into account the time needed to create the last, largest differential copy.

Backup Topology

Let's look at what backup schemes are.

Decentralized scheme

The core of this scheme is a shared network resource (see Fig. 1), for example a shared folder or an FTP server. A set of backup programs is also needed that periodically upload information from servers and workstations, as well as from other network objects (for example, configuration files from routers), to this resource. These programs are installed on each server and work independently of each other. An undoubted advantage is the ease of implementing this scheme and its low cost. Standard tools built into the operating system, or software such as a DBMS, are suitable as the copying programs: for example, ntbackup for the Windows family, tar for UNIX-like operating systems, or a set of scripts containing built-in SQL Server commands for dumping databases into backup files. Another advantage is the ability to use a variety of programs and systems, as long as they can all reach the target resource where the backups are stored.
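As a minimal illustration of one such independent per-server script (the share path, directory, and naming are assumptions, not a recommendation), the server can archive its data locally and push the result to the common resource:

import datetime
import pathlib
import shutil
import tarfile

SOURCE = pathlib.Path(r"D:\Data")                 # what this particular server backs up
TARGET = pathlib.Path(r"\\backup-host\backups")   # the shared network resource

def nightly_backup():
    stamp = datetime.date.today().isoformat()
    archive = pathlib.Path(f"server1-{stamp}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:    # compress the data directory
        tar.add(SOURCE, arcname=SOURCE.name)
    shutil.copy2(archive, TARGET / archive.name)  # upload to the shared folder
    archive.unlink()                              # remove the local temporary file

if __name__ == "__main__":
    nightly_backup()   # scheduled independently on each server, e.g. via Task Scheduler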

The downside is the unwieldiness of this scheme. Since the programs are installed independently of one another, each has to be configured separately. It is quite difficult to account for scheduling peculiarities and to allocate time slots so as to avoid contention for the target resource. Monitoring is also difficult: the copying process on each server must be watched separately from the others, which in turn can lead to high labor costs.

Therefore, this scheme is used in small networks, and also in situations where a centralized backup scheme cannot be organized with the available tools. A more detailed description of this scheme and its practical organization can be found in the references.

Centralized Backup

Unlike the previous scheme, this case uses a clear hierarchical model that works on the principle of "client-server". In the classic version, special agent programs are installed on each computer, and a server module of the software package is installed on the central server. These systems also have a specialized server management console. The control scheme is as follows: from the console we create tasks for copying, restoring, collecting information about the system, diagnostics, and so on, and the server gives the agents the necessary instructions to perform these operations.

This is how most popular backup systems work, including Symantec Backup Exec, CA BrightStor ARCserve Backup, Bacula, and others (see Fig. 2).

In addition to various agents for most operating systems, there are developments for backing up popular databases and corporate systems, for example, for MS SQL Server, MS Exchange, Oracle Database, and so on.

For very small companies, in some cases you can try a simplified version of the centralized backup scheme without agent programs (see Fig. 3). This scheme can also be used when no agent is available for the backup software in use. Instead, the server module uses already existing services: for example, "raking out" data from hidden shared folders on Windows servers or copying files over SSH from servers running UNIX systems. This scheme has very significant limitations related to the problem of saving files that are open for writing. As a result, such open files will either be skipped and left out of the backup or copied with errors. There are various workarounds, such as re-running the job to copy only the files that were previously open, but none of them is reliable. Therefore, this scheme is suitable only in certain situations, for example in small organizations working in 5x8 mode with disciplined employees who save their changes and close files before going home. For such a truncated centralized scheme working exclusively in a Windows environment, ntbackup is a good fit; if you need a similar scheme in heterogeneous environments or exclusively among UNIX computers, I recommend looking toward BackupPC (see the references).

Figure 4. Mixed backup scheme

What is off-site?

In our turbulent and changing world, events can occur that have unpleasant consequences for the IT infrastructure and for the business as a whole: for example, a fire in the building, a burst heating pipe in the server room, or the banal theft of equipment and components. One way to avoid losing information in such situations is to store backups in a location away from the main server hardware, while also providing a fast way to get at the data needed for recovery. This method is called off-site (in other words, storing copies outside the premises of the enterprise). Two main ways of organizing this process are used.

Writing the data to removable media and physically moving it. In this case you need to take care of a means of quickly delivering the media back in case of failure, for example by storing it in a nearby building. The advantage of this method is that it can be organized without any difficulty; the downsides are the complexity of returning the media, the very need to hand the information over for storage, and the risk of damaging the media in transit.

Copy data to another location over a network link. For example, using a VPN tunnel over the Internet. The advantage in this case is that there is no need to carry media with information somewhere, the disadvantage is the need to use a sufficiently wide channel (as a rule, this is very expensive) and protect the transmitted data (for example, using the same VPN). The difficulties of transferring large amounts of data can be greatly reduced by using compression algorithms or deduplication technology.

Separately, it is worth mentioning the security measures involved in organizing data storage. First of all, you must make sure that the data carriers are kept in a secure room and take measures that prevent unauthorized persons from reading the data, for example by using an encryption system, signing non-disclosure agreements, and so on. If removable media are involved, the data on them must also be encrypted. The labeling system used should not help an attacker analyze the data: use a nondescript numbering scheme for the media rather than the names of the transferred files. When transferring data over a network, it is necessary (as already mentioned) to use secure transfer methods, such as a VPN tunnel.

We have analyzed the main points involved in organizing a backup system. The next part will look at guidelines and provide practical examples for creating an effective backup system.

  1. Description of Windows system backup, including System State - http://www.datamills.com/Tutorials/systemstate/tutorial.htm.
  2. Description of Shadow Copy - http://ru.wikipedia.org/wiki/Shadow_Copy.
  3. Acronis official website - http://www.acronis.ru/enterprise/products.
  4. Description of ntbackup - http://en.wikipedia.org/wiki/NTBackup.
  5. Berezhnoy A. Optimizing the work of MS SQL Server. // System Administrator, No. 1, 2008 - P. 14-22.
  6. Berezhnoy A. Organizing a backup system for small and medium-sized offices. // System Administrator, No. 6, 2009 - P. 14-23.
  7. Markelov A. Linux on guard of Windows. Overview and installation of the BackupPC backup system. // System Administrator, No. 9, 2004 - P. 2-6.
  8. Description of VPN - http://ru.wikipedia.org/wiki/VPN.
  9. Data Deduplication - http://en.wikipedia.org/wiki/Data_deduplication.


Every computer user knows that no system is immune to errors and even critical failures, after which it cannot be restored by conventional means. For such cases, backup and recovery programs have been developed, including utilities that create backup copies of hard drives and logical partitions. Let us consider the most popular utilities of various levels of complexity.

Backup and data recovery programs: is using them worthwhile?

Some users misunderstand what these utilities are capable of. They mistakenly believe that the easiest option is simply to copy user files to logical partitions other than the system partition. Another category of users believes that it is enough to copy the entire system partition to another location and then, in case of failure, restore it from that copy. Alas, both are wrong.

Of course, this technique is applicable to user files, but not everyone wants to clutter up another logical volume with a bunch of information or constantly keep an external drive like a USB HDD, a bunch of disks or flash drives, the capacity of which is clearly limited, at hand. And with large amounts of data, you should also take into account the time of copying from one volume to another. Backup and restore programs for both the system and partitions work a little differently. Of course, in most cases, removable media will be needed, but the created backup copy will take up many times less space.

Basic principle of operation and options for operation

As a rule, most of today's well-known and widely used utilities mainly use the principles of creating images and compressing copied data. At the same time, images are most often used specifically to create copies of the operating system, which allows you to later restore it after an unforeseen critical failure, and utilities for copying partitions or user files involve compression according to the type of archiving.

As for backup options, there can be two. In principle, almost any system backup program offers to use external media (DVD, flash drive, and so on). This is simply because, when restoring the system, you will have to boot not from the system partition but from removable media; an image on a logical partition will not be recognized.

Disk backup software is another matter. With it, you can save the necessary information on other logical partitions or, again, use removable media. But what if the hard drive in use holds hundreds of gigabytes? You cannot fit that much information on removable discs even in compressed form. As an alternative, you can use an external HDD, if one is available, of course.

As for choosing the right utility to save user files, the best solution is a scheduled file backup program. Such a utility is capable of performing this operation without user intervention, saving all changes made over a certain period of time. New data can be added to the backup, as well as old data can be removed from it. And all this in automatic mode! The advantage is obvious - after all, the user only needs to set the time interval between copy points in the settings, then everything happens without it.

Native Windows backup software

So, to begin with, let us dwell on the native tool of Windows systems. Many people think that the backup program built into Windows works, to put it mildly, not very well. Mostly they avoid it only because the utility spends too much time creating a copy, and the copy itself takes up a lot of space.

However, it has plenty of advantages. After all, who, if not Microsoft's own specialists, should know all the subtleties and nuances of the components that are essential for correctly recovering Windows? Many users clearly underestimate the capabilities of the tool built into the system; it is not for nothing that such a backup and recovery program is included in the system's standard toolset.

The easiest way to access this utility is from the standard Control Panel, where the backup and restore section is selected. Three main options are available here: creating a system image, creating a recovery disc, and backing up settings. The first two cause no difficulties, but the third is quite interesting. The system will prompt you to save a copy on removable media, having first detected the device itself; but if you look at the parameters, you can also save the copy on the network, which is ideal for local networks. So in some cases, this system backup program is a good tool for creating a backup from which Windows can later be restored.

Most Popular Utilities

Now let's look at the utilities that, according to many experts, are the most popular among users today. We note right away that it is simply impossible to consider all backup programs, so we will focus on some of them, given the level of popularity and complexity of their use. An approximate list of such utilities might look like this:

  • Acronis True Image.
  • Norton Ghost.
  • back2zip.
  • Comodo BackUp.
  • Backup4all.
  • ABC Backup Pro.
  • Active Backup Expert Pro.
  • ApBackUP.
  • File Backup Watcher Free.
  • The copier.
  • Auto backup and many others.

Now let's try to look at the top five. Please note! At the moment, we are considering backup programs that are mainly used for workstations (user computers). Solutions for server systems and networks will be discussed separately.

Acronis True Image

Of course, this is one of the most powerful and popular utilities, enjoying well-deserved success and the trust of many users, although it belongs to the entry-level class. Nevertheless, it has plenty of capabilities.

After launching the application, the user lands in the main menu, where several actions can be selected. In this case we are interested in the backup and restore section (the menu also contains additional utilities, which for obvious reasons will not be considered now). A wizard is then activated to help create the backup. In the process, you can choose exactly what to copy (a system image for restoring from scratch, files, settings, and so on). Under "Copy type" it is better to select "Incremental", because it saves space. If the capacity of the media is large enough, you can use a full copy, and to create multiple copies, a differential one. When creating a copy of the system, you will also be prompted to make a boot disk.

What is interesting is that the utility shows quite high figures for backup creation speed and compression: compressing about 20 GB of data takes 8-9 minutes on average, and the resulting copy is a little over 8 GB.

Norton Ghost

Here is another powerful utility. As usual, a wizard starts after launching the program and walks you through all the steps.

The utility is notable for being able to create a hidden partition on the hard drive where the copy is stored (and from which both data and the system can be restored). Many parameters are configurable: read verification, write mode, compression, the number of points for simultaneous access, and so on. As for performance, the application compresses the same 20 GB to just over 7.5 GB in about 9 minutes, which is quite a good result.

Back2zip

This one is a scheduled backup program. It stands out in that installation takes only a couple of seconds, and right after launch it automatically creates a new task and starts copying data, assuming that user files live in the My Documents folder. Unfortunately, that assumption is also its main disadvantage.

On first start, that task has to be deleted, after which you select the source and destination folders yourself. There is no wizard in the usual sense; everything is done from the main window. In the scheduler, the copy interval can be set from 20 minutes to 6 hours. All in all, it is the simplest solution for entry-level users.

Comodo Back Up

Here is another interesting utility that can compete even with commercial products. Its main feature is no fewer than five modes of operation and a huge number of settings.

Interestingly, the utility responds to changes in the files included in the backup in real time: as soon as a source file is changed and saved, the application immediately copies it, adding or replacing the corresponding element in the backup (a rough sketch of this change-driven copying is given below). Besides the scheduler, it can also be set to create copies at system startup or at shutdown.
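One way to implement change-driven copying in Python is with the third-party watchdog package (pip install watchdog). This is only a sketch under assumed paths; Comodo's actual mechanism is internal to the product.

    # React to file changes in real time and mirror them into a backup folder.
    import shutil
    import time
    from pathlib import Path

    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    WATCHED = Path("C:/Projects")        # hypothetical folder included in the backup
    BACKUP = Path("D:/Backups/live")     # hypothetical backup location

    class CopyOnChange(FileSystemEventHandler):
        def on_modified(self, event):
            if event.is_directory:
                return
            src = Path(event.src_path)
            dst = BACKUP / src.relative_to(WATCHED)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)       # replace the element in the backup

    if __name__ == "__main__":
        observer = Observer()
        observer.schedule(CopyOnChange(), str(WATCHED), recursive=True)
        observer.start()
        try:
            while True:
                time.sleep(1)
        finally:
            observer.stop()
            observer.join()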

Backup4all

Finally, let's look at another free utility that lets you make backups of everything you might need later, all in one go.

This utility is interesting because it can save copies not only to external or internal media but also to network locations and even FTP servers. There are quite a lot of editable parameters and settings, including four copy methods. The interface is very simple, with folders and tasks shown in an Explorer-like tree. The user can also divide the copied data into categories such as documents, drawings, and so on, and assign each project its own label. Naturally, there is a task scheduler in which you can, for example, have copies created only when processor load is low.

Solutions for server systems

There are also specialized backup programs for server systems and networks. Out of all this diversity, three of the most powerful can be singled out:

  • Symantec Backup Exec 11d System Recovery.
  • Yosemite Backup Standard Master Server.
  • Shadow Protect Small Business Server Edition.

Such utilities are considered a good backup tool for small businesses. Recovery from scratch can be performed from any workstation on the network. Most importantly, a full backup only needs to be made once; all subsequent changes are saved automatically. All of these applications have an Explorer-style interface and support remote control from any terminal on the network.

Instead of an afterword

It remains to add that by no means all backup and recovery programs capable of creating copies of systems and files, and then restoring from them, were considered here. Still, even brief information about the programs above should give many readers an idea of how this works and why it is needed. For obvious reasons, we leave the choice of software open, since it depends on the preferences of the user or the system administrator.

The backup system can work like this

How is a corporate backup different from a home backup?
Scale: infrastructure up to a petabyte. Speed: thousands of transactions per second, so, for example, you need to be able to take a backup of a database on the fly, without stopping writes. And a zoo of systems: workstations, mobile phones and tablets, user profiles in the cloud, copies of CRM/ERP databases, all of it on different operating systems and in heavily branched environments.

Below I will talk about solutions from IBM, EMC, CommVault, Symantec and what they give both to the business as a whole and to the IT department. Plus some pitfalls.

Let's look at how these backup features play out in ordinary Russian companies, including those that keep backups only against the possibility of their equipment being seized.

We are starting an educational program. Do you need a backup at all?

Usually this question is asked by people far from IT. The right question is "what kind of backup do you need?". At the beginning of this year I came across a report saying that, on average, a major data loss costs up to a third of a company's value worldwide, and up to half in the US and Europe. Simply put, the lack of a fresh backup can in some cases mean leaving the market.

Why do you need a backup at all?

Of course, to protect against failures, attacks and human stupidity. In general, the question is a little naive, but still let's look at it in a little more detail.
  • First, it protects data from loss. The main causes of loss are equipment failures, the loss of remote sites (for example, a fire in the data center), and equipment seizure. Smaller cases include lost laptops and the like.
  • The backup also protects the integrity of the data: it insures against operator errors, for example. This is the second most common cause: a person can simply ruin important data with the wrong command.
  • Third, in a corporate environment a "hot" backup may be needed to quickly redeploy services in an emergency. This matters most where continuity of IT processes is especially critical, for example for telecom operators or banks.

How do you usually arrive at complex systems?

It is simple: the company grows into them. At first simple means are used: manual copying, then scheduled scripts or utility settings, after which a server application appears to control it all. At that stage, requirements for the backup level usually arrive from the security officers or the finance department (which manages the company's risks), and that is when a real implementation begins. Each task is classified by importance and given targets; for example, billing must be brought up within 5 minutes of an incident on an active standby system in another data center, while office staff data may come back 2 hours later on prepared but mothballed equipment. At this level tight integration with applications becomes necessary, and a little later integration with hardware storage arrays as well. A small sketch of such a classification follows.
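A minimal sketch of recording such recovery targets per service, assuming the (invented) service names and numbers below:

    # Classify systems by recovery targets: how much data may be lost (RPO)
    # and how fast the service must come back (RTO).
    from dataclasses import dataclass

    @dataclass
    class BackupPolicy:
        service: str
        rpo_minutes: int    # maximum acceptable age of the last copy
        rto_minutes: int    # maximum time to bring the service back
        target: str         # where recovery happens

    POLICIES = [
        BackupPolicy("billing", rpo_minutes=5, rto_minutes=5,
                     target="hot standby in another data center"),
        BackupPolicy("office file shares", rpo_minutes=24 * 60, rto_minutes=120,
                     target="prepared but mothballed equipment"),
    ]

    for p in sorted(POLICIES, key=lambda x: x.rto_minutes):
        print(f"{p.service}: restore within {p.rto_minutes} min, "
              f"data no older than {p.rpo_minutes} min, to {p.target}")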

What does integration look like in practice?

As a rule, by the time our specialists arrive to deploy a comprehensive backup, a large company already has several backup subsystems. Most often these are already configured file backup applications and regular database dumps (for example, a nightly copy of a 1C database) stored on a separate device. There are, of course, spectacular cases. One retail chain, for example, did not back up its warehouse stock databases at all; in case of a failure it simply sent people out to take inventory.

Another example: a branch office keeps a copy of a database that is used read-only, and all data produced from it is temporary. In the event of a crash, a fresh copy of this database is requested from the parent organization, which takes three days. People sit and wait. The data is not lost, of course, but with a proper backup they could have resumed work in 20 minutes.

What is the most important thing in backup software?

Let's look at the main parameters.

Architecture
The architecture of the solution is undoubtedly important. Dividing the system into functional modules is a common practice for all enterprise backup solutions. An important point is the separation of the storage layer from the logical data management layer, as is done, for example, in CommVault Simpana - one backup job can use both disk and tape, or even cloud storage.


Backup Software Architecture Example (CommVault Simpana)

Centralized control functions.
It is important to be able to manage all operations from one place. Backing up large systems is complex, so the administrator must understand exactly what is going on. In a branched structure, for example a large data center with hundreds of systems, you cannot walk up to each one and check whether it has a backup copy. What is needed is a system that can build a report, show which data and applications are copied and which are not, highlight what needs attention, and notify the administrator about any problems.


Centralized management of the backup system

Market leaders offer systems that show what is stored where, what types of data are involved, what can be optimized, and so on. You can even make a forecast for the year ahead.
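The simplest useful form of such a report is a list of protected clients with the age of their last successful copy. A sketch, with invented client names and timestamps:

    # Flag clients whose last successful backup is older than the allowed age.
    from datetime import datetime, timedelta

    MAX_AGE = timedelta(hours=24)
    now = datetime.now()

    last_successful_backup = {              # hypothetical data from the catalog
        "sql-prod-01": now - timedelta(hours=7),
        "file-srv-07": now - timedelta(days=4),
        "exchange-02": now - timedelta(hours=6),
    }

    for client, finished in sorted(last_successful_backup.items()):
        status = "OK" if now - finished <= MAX_AGE else "ATTENTION: stale copy"
        print(f"{client:12s} last copy {finished:%Y-%m-%d %H:%M}  {status}")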

Support for specific arrays and databases
The first thing is support for storage arrays and tailoring to specific databases. You need to reach the data at the lower level and use it in more advanced functions, such as creating hardware snapshots. Backup systems already know how to drive the arrays themselves to protect data without affecting the production systems that use those arrays, or at least minimizing the load on them.

Simply put, the system should be able to copy a database on the fly while transactions are being written to it, rather than requesting the copy from the server application. In other words, it must take the data straight from the disk array, competently and imperceptibly for the application and its users.

For example, CommVault and EMC systems support almost all operating systems and commercial applications on the corporate market (in particular Oracle and Microsoft databases; CommVault also supports PostgreSQL, MySQL, Documentum, and SAP).

Deduplication architecture
Good deduplication is essential: it substantially lowers the requirements (and price) of the disk arrays and squeezes traffic very well. Roughly speaking, if the first backup of user data from virtual machines was 10 GB, each subsequent daily one can be 50-60 MB, just the difference between snapshots. At the same time, with the market leaders (more on them below), each copy appears to external systems as a separate full snapshot, as if a complete backup had been made every time. That dramatically speeds up recovery.

I would like to emphasize that in modern systems deduplication is done at the source, that is, on the system the data is taken from, which greatly reduces the load on the network channels. This is very important for extended networks that do not always have a channel wide enough to carry a full backup. A routine regular copy of a complex SAP-class system then amounts to only a couple of percent of the total database volume. A rough sketch of source-side deduplication follows.
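The core of source-side deduplication is that the client hashes data blocks and transfers only the blocks the server has never seen. Real products use more elaborate (often variable-length) chunking; this sketch only shows why daily traffic shrinks to the changed blocks:

    # Fixed-size block deduplication at the source.
    import hashlib

    BLOCK_SIZE = 128 * 1024          # illustrative block size

    def backup_file(path: str, known_hashes: set[str]) -> tuple[list[str], int]:
        """Return the file's block 'recipe' and the number of bytes actually sent."""
        recipe, sent = [], 0
        with open(path, "rb") as f:
            while block := f.read(BLOCK_SIZE):
                digest = hashlib.sha256(block).hexdigest()
                if digest not in known_hashes:   # new block: transfer and register it
                    known_hashes.add(digest)
                    sent += len(block)
                recipe.append(digest)            # the copy is described by hashes only
        return recipe, sent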

The deduplication subsystem should also scale conveniently, ideally linearly as storage nodes are added, organized into a kind of grid or cloud. The nodes should not be separate islands with their own data sets but should be joined into a single deduplication space, and it is very good if they distribute the load and process it in parallel. I will note that many customers now rush to compare deduplication ratios when evaluating products. That is not entirely the right focus: modern SATA drives are already 4 TB each, so give or take a couple of disks, every system can store the same amount of data, and it is better to buy one extra disk up front than to rebuild the whole system later to accommodate growth.

Load balancing
Such systems also provide fault tolerance and load balancing, which matters in large data centers where data volumes in a single system can reach tens or hundreds of TB. A virtualization platform, for example, may hold a very large amount of data and a large number of virtual machines. The backup system should let you build a set of servers that move the data: they receive it from the platform and write it to storage, interact with each other, and automatically redistribute the load as it rises or falls. The function is simple and obvious, but quite critical, because it affects the speed and efficiency of creating backups.

Continuity is important too: if a component fails, jobs must still complete within the backup window (usually overnight). CommVault Simpana handles this automatically when media servers or deduplication databases fail; other systems have limitations or require expensive hardware solutions. The figure shows two servers with agents working in tandem: if one breaks down, the other takes over, while both write to the same disk and share a common deduplication database:

Physical storage

Most often we are talking about storage on disk arrays, where the data gets additional protection. The first layer: important data is always stored at two independent remote sites (for example, in different data centers). The second layer: the data is spread across different drives. For example, a file of 10 blocks can be written to 11 drives, and if any one of them fails, the rest contain enough data to reconstruct the missing piece. A toy illustration of this idea follows.
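A toy illustration of the "10 blocks on 11 drives" idea using XOR parity: one extra parity block lets any single lost block be rebuilt. Real arrays use RAID or erasure coding, but the principle is the same.

    # Rebuild a lost block from the surviving blocks plus one parity block.
    from functools import reduce

    def make_parity(blocks: list[bytes]) -> bytes:
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    data_blocks = [bytes([i]) * 4 for i in range(10)]   # 10 toy data blocks
    parity = make_parity(data_blocks)                   # the 11th "drive"

    lost_index = 3                                      # pretend drive 3 failed
    survivors = data_blocks[:lost_index] + data_blocks[lost_index + 1:]
    rebuilt = make_parity(survivors + [parity])         # XOR of the rest restores it

    assert rebuilt == data_blocks[lost_index]
    print("block", lost_index, "rebuilt successfully")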

Disks and tape + "cloud"

So it turns out that tape drives are still in use. Most often, "hot" data (say, the 10 percent that matters most) is kept on disk, from where it can be retrieved quickly, and the second tier lives on tape. It is practical and cheap, and tape can store data for decades without replacing equipment: cartridges are simply removed and put on a shelf. A typical case is bank logs and other documents that must be kept for a set period. The backup system can identify such data on disk, detach it, and archive it to a tape drive, while still being able to find the information and restore it after an accident. Both full and deduplicated copies can be written; when needed, a smart system reassembles everything as if the last snapshot were a complete one. A sketch of assembling such a "synthetic full" copy is given below.
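The reassembly step is essentially a merge of the last full copy with the chain of incremental deltas. A toy sketch with invented file versions (deletions are ignored for brevity):

    # Assemble a "synthetic full" copy from a full copy plus incrementals.
    full_copy = {"a.txt": "v1", "b.txt": "v1", "c.txt": "v1"}

    incrementals = [
        {"b.txt": "v2"},                 # day 1: only b.txt changed
        {"a.txt": "v2", "d.txt": "v1"},  # day 2: a.txt changed, d.txt appeared
    ]

    synthetic_full = dict(full_copy)
    for delta in incrementals:           # replay the deltas in order
        synthetic_full.update(delta)

    print(synthetic_full)
    # {'a.txt': 'v2', 'b.txt': 'v2', 'c.txt': 'v1', 'd.txt': 'v1'}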

CommVault Simpana can also send a copy of data from corporate storage directly to the "cloud" (some of our customers do this with the CROC cloud; we even went through certification). The customer can treat this additional copy as a long-term archive and not think about the hardware needed to store it. Such a copy can also serve disaster recovery: one customer, for example, ships a copy of all virtual machines to our cloud for storage, and if the customer's main data center goes down we can start all those virtual machines on our infrastructure. Until they are actually launched, the customer pays only for capacity, which turns out to be very economical.

Direct work with users

If you have not dealt with corporate backup, you may get the impression that only the IT department restores data, and does it manually. In CommVault, for example, this is not quite the case.

There, the user can log into a portal (shown in the picture below) and retrieve exactly their own data, provided it was included in a copy. Such a portal typically also offers a search engine over backups and archives (within the user's access rights). Information security staff can access the same archive, which noticeably reduces the number of requests to the IT department along the lines of "who had such and such a document".

Yes, you understood correctly. If the user has lost a file, accidentally deleted an email, or wants to find an older version of a document for comparison, he just goes and does everything himself in a matter of seconds without unnecessary complications. And he doesn’t even call or write to the IT department.

Search deserves a separate mention. All unstructured data entering the system (files, mail, SharePoint objects, and so on) is worth indexing and making searchable, and Simpana can do this. On the one hand, users can find any object themselves by keywords through the self-service console; on the other, the security service can run targeted analysis over all this information, including looking for insider threats. The system can also set retention periods depending on the content of the data.

How fast can everything be rolled back?

Let's say we have a complex system with an Oracle database as its storage. The data is physically spread across several servers in one data center, and CommVault is used.
  • The first case: the user simply deleted data from his workstation. Either he or the administrator restores it: open the interface, select the location, and the system does the rest. The user sees a friendly web interface; the administrator can use it or the console.
  • Now the Exchange mail server goes down. The scenario is still quite simple: again, either the user or the administrator determines what data needs to be restored, connects, logs in, opens the recovery console, selects the area, and presses the restore button.
  • Now we lose today's data from the database of our large commercial application, say all purchase and sale transactions. Here the backup system turns to Oracle's RMAN mechanism (essentially a data recovery API). Since everything is already integrated, the administrator again only chooses what exactly needs to be restored; RMAN, together with the backup system, then decides what to do specifically: restore the entire database, a particular tablespace (a separate storage area), and so on. A rough sketch of driving RMAN from a script is shown after this list.
  • And now our data center explodes at night. The administrator selects another data center and rolls the latest copy onto clean equipment. The system itself assembles the most recent complete snapshot from the deduplicated data and hands the necessary information to each subsystem and application; ordinary users will probably not even notice what happened. Some data may already be present in the other data center, replicated or restored on a schedule, in which case everything is even easier and the restore does not even start from a bare system.
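A minimal sketch (not CommVault code) of driving Oracle RMAN from a script: the administrator's choice is translated into an RMAN command block and fed to the rman command-line tool. Connection details and the tablespace name are assumptions; a real backup product integrates far more deeply than this.

    # Translate a restore choice into an RMAN RUN block and execute it.
    import subprocess

    def restore(scope: str, tablespace: str | None = None) -> None:
        if scope == "database":
            body = "RESTORE DATABASE; RECOVER DATABASE;"
        elif scope == "tablespace" and tablespace:
            body = f"RESTORE TABLESPACE {tablespace}; RECOVER TABLESPACE {tablespace};"
        else:
            raise ValueError("unknown restore scope")
        script = f"RUN {{ {body} }}"
        # 'rman target /' connects to the local instance via OS authentication
        subprocess.run(["rman", "target", "/"], input=script, text=True, check=True)

    # restore("tablespace", "USERS")   # example call, commented out on purpose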

Development of systems from version to version

With the development of backup systems comes support for new commercial applications. We are talking about standard service packs within the framework of support. CommVault, for example, has a good policy of releasing compatibility updates to the current version rather than forcing you to buy the next release: this is convenient because the company's infrastructure is constantly evolving.

In new versions of the software, new features appear, such as copying in one pass, for example, with simultaneous transfer to the archive from file servers. Or relatively recently, archiving and backup operations have been combined in Exchange - now they are also done in one pass. Recently, the possibility of archiving virtual machines, which is pleasant for large cloud systems, has appeared: if the machine is not used for a long time or is turned off, then, in accordance with a set of rules, it can be deleted from the virtualization platform, and only a backup copy will remain.

Recently, clients for iOS and Android have appeared to manage copies of their workstation: handy if someone is away on a business trip and forgets a presentation, for example. Or when your laptop breaks down on the road. Here, too, you do not need to wake up the administrator at two in the morning: the user can do everything himself.

Vendors

According to the Gartner report, the leaders we actively work with include, in particular, IBM, Symantec, EMC, and CommVault.


Gartner square: leaders top-right, niche players bottom-left.

IBM Tivoli Storage Manager (TSM) is quite flexible in how an enterprise backup scheme can be set up and organized. By combining various TSM components, the customer can build exactly the functionality needed for their tasks, although this often requires more time for design and implementation. TSM is frequently used as part of end-to-end solutions based on IBM hardware and software.

EMC. As a company producing not only software but also hardware, it aims above all at integrating all of its own solutions. So if the infrastructure is largely built on CLARiiON, VNX, or Data Domain storage systems, it is worth looking at EMC backup products, which will keep the system structure uniform. The EMC Avamar product, incidentally, is also a combined software and hardware solution.

Symantec is represented in the backup market by its flagship product NetBackup, aimed at the enterprise segment, and the more lightweight Backup Exec, traditionally used in environments built mainly on Microsoft products. NetBackup is known for supporting a wide range of operating systems, DBMSs, and business applications, including those deployed in virtual environments, and it can exploit the advanced capabilities of modern storage systems. NetBackup is a good choice for an environment with a large proportion of UNIX systems. Recently, Symantec products have also been shipping as hardware and software appliances, which speeds up deployment and configuration.

CommVault. Perhaps most importantly, it is a holistic product that covers almost all potential customer needs: a unified platform that combines copying, archiving, and data access, plus traditionally good integration with virtualization platforms, deduplication, and cloud storage. As mentioned above, it also greatly unloads the IT department thanks to a sensible policy of user access rights to archive elements. CommVault is a good choice when you have a lot of heterogeneous software and hardware. In homogeneous environments based on *nix, other products may be worth considering, but in heterogeneous ones it immediately brings order to the chaos and lets you stay confident that the backup is fresh and will roll back quickly if anything happens. That peace of mind, as you probably know, is worth a lot.

In general, of course, each case must be examined on site. If you have questions about what to choose for your infrastructure, write to [email protected]; we will help you evaluate all the aspects and warn you about possible pitfalls.

What are the users of modern information systems most afraid of? We will not run surveys and compile a list of the nightmares that torment them; we will simply state that one of the first places on this gloomy list belongs to the threat of data loss. And while losing data on a home computer is in most cases merely annoying, losing information on a corporate network can be fatal both for an employee and for the company as a whole. For whoever is responsible for the backup, such a loss is practically always fatal. But how fair is that?

In modern information systems, backup is treated as a priority. Companies spend huge amounts of money on fault-tolerant disk arrays and specialized backup and storage devices and hire high-end professionals to maintain them, yet they still keep losing data. Naturally, heads roll. Often, however, the problem lies in the misuse of perfectly debugged and tuned systems; figuratively speaking, users try to hammer nails with a microscope.

In February of this year, a terrible thing happened in a large publishing holding: the data of one of the projects was lost. In this case, the following oddities were noted:

1. The folder structure of the project remained unchanged - only the files disappeared.

2. No files were found on the backup tape (backups, by the way, ran daily), although the folder structure was present in full.

Necessary measures to create a backup system

A backup system is one of the necessary conditions for ensuring business continuity. According to Gartner, 43% of companies affected by disasters that suffered a major irreversible loss of corporate data were unable to continue operating.

In order for the backup system to meet its purpose and work optimally, it is necessary to complete a full cycle of design work, which, however, is recommended for any system being created. A full cycle of work aimed at creating or upgrading a backup system, as a rule, includes the following steps:

Technical audit of the computer system for the creation or modernization of the backup system;

Development of a backup system concept: recommendations for building, modernizing, and evolving the backup system. This type of work is optional but recommended for large, dynamic systems;

Backup system design - development of technical and working documentation;

Development of a transition schedule from the old backup system to the new one. This type of work is needed when an upgrade significantly changes the existing system;

Supply and configuration of equipment and software;

Development of operating procedures: organizing the processes of running the backup system and drawing up its regulations and schedules. This work is very important: without a properly organized operating process, no system, including a backup system, will work effectively;

Drawing up a training program for the customer's personnel on data backup and recovery. For a backup system, staff training plays a special role: since its purpose is to restore data after failures, the personnel performing this procedure will be working in an emergency, short of time to bring the system back into operation. Data recovery operations must therefore become second nature to the administrators, which is achieved only through regular practice.

The investigation, traditionally for Russia, went in two directions: identifying the perpetrators and taking measures to exclude the possibility of a recurrence of a similar situation in the future.

First of all, suspicion fell on the backup software. The reason was prosaic: it is the backup software that has to walk the entire disk structure to copy information to tape, so in the event of some malfunction it is theoretically capable of destroying files. Since this assumption came from the victims, merely stating that it was impossible was clearly not enough. Leaving aside the likelihood of such a unique failure in a certified, legally purchased software product, we had to find a simple and illustrative way to convince non-specialists of the absurdity of the assumption. This is usually extremely difficult (and often impossible), but we succeeded. The point is that the backup software uses one of the domain accounts when working with files and is therefore limited in its destructive capabilities by the rights of that account. By default the local administrator account is used, which grants full access to all information stored on the server. On the one hand this approach is justified, because it rules out the situation where a backup cannot run for lack of access rights to the data being copied; on the other hand, administrator rights imply full access, including the ability to delete information. In the case under consideration, however, the backup software ran under a specially created account that had access to all the information but no ability to change it (read-only access). It was precisely this fact that allowed the IT department to prove that the backup software had nothing to do with the incident.

Once the panic subsided, an attempt was made to reconstruct what had happened and find the most plausible explanation. First, it turned out that three months before the moment in question the lost project folder had already been empty; this fact was recorded in the backup software logs and attached to the case. It was then established that the server held a completed project that no one had touched for at least three months. As a result, after the information was deleted from the server it remained on the tapes for a month (the rotation period of the magnetic media in the backup scheme used), after which the tapes were overwritten and the information was permanently lost.

Backup system requirements

Since any modern information system is built on top of a network, the backup system must also be network-based, that is, it must ensure the preservation of data coming from all network nodes. In general, the following functional requirements apply to a network backup system:

Building a system on the principle of "client-server". When applied to backup, the terminology "client-server" means the following: the component of the backup system that manages all processes and devices is called the server, and the component responsible for saving or restoring specific data is called the client. In particular, such a system should provide:

Management from dedicated computers of backups throughout the network;

Remote backup of data contained on servers and workstations;

Centralized use of backup devices.

Multiplatform. The modern information network is heterogeneous. Accordingly, the backup system must fully function in such a network, that is, it is assumed that its server part will work in various operating environments and support clients on a wide variety of hardware and software platforms.

Automation of typical operations. The backup process inevitably contains many cycles of various operations. For example, copying can be performed every day at a certain time.

Another example of a cycle is the process of overwriting information on backup media. If a daily backup is to be kept for a week, then after this period the corresponding media can be used again. This process of successively replacing backup media is called rotation.
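A minimal sketch of such a rotation: with a pool of N tapes and one tape per day, tape i is overwritten again after N days, so with 7 tapes a daily copy lives exactly one week. The pool size and dates are illustrative assumptions.

    # Which tape from the pool is written on a given day?
    from datetime import date, timedelta

    POOL_SIZE = 7                                    # daily copies kept for a week

    def tape_for(day: date, start: date) -> int:
        return (day - start).days % POOL_SIZE

    start = date(2014, 6, 2)
    for offset in range(10):
        d = start + timedelta(days=offset)
        print(d, "-> tape", tape_for(d, start))      # tape 0 is reused on day 7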

Cyclic work also includes preventive maintenance of backup devices, for example, cleaning the tape drive units of the tape drive using a special cassette after a certain period of operation.

Thus, the backup system should perform cyclical work in automatic mode and minimize the number of manual operations. In particular, it must support:

Scheduled backups;

Media rotation;

Maintenance of backup devices according to the schedule.

It should be noted that work automation is one of the key conditions for reducing the cost of maintaining a backup system.

Support for various backup modes. Suppose that every day you need to back up a set of files, for example, those contained in a single directory. As a rule, during the working day, changes are made only to individual files, making daily copying of information that has not changed since the last backup was made unnecessary. Based on this, the system should provide various backup modes, that is, support the ability to save only the information that has been changed since the creation of the previous copy.

Quick recovery of network servers after a crash. A network server can fail for various reasons, for example a failure of the system hard drive or software errors that destroy system information. Its recovery then requires reinstalling the OS, configuring devices, installing applications, and restoring the file system and user accounts. All of these operations are laborious, and errors can occur at any stage of the process.

Thus, in order to restore a server, it is necessary to have a backup copy of all information stored on it, including system data, in order to bring it to a working state as soon as possible.

Data backup in interactive (on-line) mode. Often, the information system includes various client-server applications that must operate around the clock. Examples of this are mail systems, collaboration systems (such as Lotus Notes), and SQL servers. It is impossible to back up the databases of such systems by conventional means, since they are always open. Therefore, they often have their own backup tools built in, but their use, as a rule, does not fit into the general technology adopted in the organization. Based on this, the backup system should ensure the preservation of databases of client-server applications in an interactive mode.

Advanced monitoring and management tools. To manage backup processes and monitor their status, the backup system must provide graphical monitoring and control tools and a wide range of event notifications.

So, we have established the chronology of the information loss. Now comes the far harder task of identifying the culprits. On the one hand, the backup system failed to preserve the information. On the other hand, the information sat on tape for a month and could have been restored at the user's first request; no such request came, because the project was finished and nobody was working with it. As a result everyone is right, there are no guilty parties, and there is no information either. The situation is a good example of misusing the right technology. Let's answer the question: what is the task of a backup system? Its priority is the prompt and fullest possible recovery of information after a failure. In the example at hand the failure itself was never noticed, so the data was never restored, but that can hardly be blamed on the backup administrators and service.

The situation clearly demonstrates the need for at least a two-level backup scheme: daily backup of current information and separate backup of rarely used information, in our case completed projects. Unfortunately, management rarely appreciates the need for such an approach to protecting information.

How did this sad story end? Like this:

1. It was decided to keep completed projects on DVD.

2. The magnetic media rotation period has been increased to three months.

3. A policy for storing and backing up information throughout the holding was developed and adopted.

P.S. The data was eventually found in one of the file stashes, of which there are plenty on any network.
