The concept of file archiving. An overview of data compression methods. What is the file compression ratio?
The purpose of archiving is to store information more compactly on disk and to reduce the time, and therefore the cost, of transmitting it over communication channels in computer networks. In addition, archiving greatly simplifies transferring information from one computer to another, reduces the time needed to copy it to external media, and helps protect it from unauthorized access and from computer viruses.
The main feature of archiving is information compression, that is, converting information to a form that reduces redundancy in its representation and therefore requires less memory for storage.
One or several files can be compressed and placed in a single archive file (archive), from which they can later be extracted in their original form.
An archive file (archive) is a specially organized file containing one or more files in compressed or uncompressed form, together with service information about file names, the date and time of their creation or modification, sizes, and so on.
Writing files to an archive file is called archiving (packing); extracting files from the archive is called unarchiving (unpacking).
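The packing and unpacking cycle described above can be sketched with Python's standard zipfile module. This is only an illustration: the ZIP format, the helper names pack, unpack and round_trip, and the sample file names are assumptions chosen for the example, not something the text prescribes.

```python
import tempfile
import zipfile
from pathlib import Path


def pack(files, archive_path):
    """Write name -> content pairs into one compressed archive file."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)


def unpack(archive_path):
    """Read every file back from the archive in its original form."""
    with zipfile.ZipFile(archive_path) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


def round_trip(files):
    """Pack the files into a temporary archive and unpack them again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.zip"
        pack(files, path)
        return unpack(path)


# Hypothetical sample files used only for this demonstration.
originals = {"notes.txt": b"hello " * 100, "data.csv": b"1,2,3\n" * 50}
restored = round_trip(originals)
print(restored == originals)  # True
```

The archive also stores the service information mentioned above (names, sizes, modification times), which zipfile exposes through ZipFile.infolist().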
The degree of compression achieved during archiving depends on the file format. Some formats (for example, graphic formats) are already compressed by the programs that create such files, and therefore hardly shrink when archived. Text files and database files compress best; executable program files and load modules compress less. The degree of compression also depends on the compression method.
In addition to regular archive files, you can create solid (continuous), multi-volume and self-extracting archives, as well as their combinations, for example multi-volume self-extracting, multi-volume solid, and so on.
A solid archive is an archive packed in a special way, in which all compressed files are treated as one sequential data stream.
Solid archiving greatly improves the degree of compression, especially when adding a large number of small, similar files. However, there are also disadvantages:
- existing solid archives are updated more slowly than regular archives;
- encrypted solid archives cannot be modified;
- to extract a single file from a solid archive, all the archived files that precede it must be analyzed, so extracting individual files from the middle of a solid archive is slower than extracting from a regular archive. However, if all files or the first few files are extracted, the unpacking speed is almost the same as with regular archives;
- if any file in a solid archive is damaged, none of the files following it can be extracted either. Therefore, when saving a solid archive on unreliable media, it is recommended to add recovery information.
Solid archives are best used when:
- the archive is rarely updated;
- there is no need to frequently extract one or a few files from the archive;
- a single large file is being archived;
- the degree of compression is more important than compression speed.
Files in solid archives are usually sorted by extension, but the sort order can be changed.
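The gain from treating many small, similar files as one stream can be illustrated with a rough sketch. It uses zlib rather than any real archiver's solid mode, and the fifty sample "files" are made up for the example, so the numbers only show the direction of the effect.

```python
import zlib

# Fifty small, similar "files" (hypothetical sample data).
files = [
    f"id={i};name=user{i};role=reader;status=active\n".encode() * 3
    for i in range(50)
]

# Regular archive: every file is compressed independently.
separate_size = sum(len(zlib.compress(f, 9)) for f in files)

# Solid archive: all files are compressed as one sequential data stream,
# so repetitions shared between files are found as well.
solid_size = len(zlib.compress(b"".join(files), 9))

print(separate_size, solid_size)
```

The same property explains the drawbacks listed above: to reach a file in the middle of the single stream, everything before it has to be decoded first.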
Multi-volume archives are archives consisting of several parts (volumes). Typically, volumes are used to store a large archive on multiple floppy disks or other removable media.
The first volume in the sequence has the archiver's standard extension; the extensions of subsequent volumes consist of the first letter of the archiver's extension followed by a serial number.
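The naming rule above can be sketched as a small helper. The function volume_names is hypothetical, and the two-digit serial number starting at 00 is an assumption made for the example; the text only says that a serial number follows the first letter of the extension, and real archivers may number their volumes differently.

```python
def volume_names(base, ext, count):
    """Names for `count` volumes: base.ext, then base.<first letter><serial>."""
    names = [f"{base}.{ext}"]
    for serial in range(count - 1):
        # Assumption: two-digit serial numbers that start at 00.
        names.append(f"{base}.{ext[0]}{serial:02d}")
    return names


print(volume_names("backup", "rar", 4))
# ['backup.rar', 'backup.r00', 'backup.r01', 'backup.r02']
```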
Files in existing volumes cannot be added, updated, or deleted.
A self-extracting (SFX, from the English SelF-eXtracting) archive is an archive to which an executable module is attached. This module lets you extract the files simply by running the archive as a regular program, so no additional external programs are required to extract the contents of an SFX archive. SFX archives, like any other executable files, usually have the .EXE extension, but they can still be handled like any other archive.
SFX archives are useful in cases where you need to transfer the archive to someone, but you are not sure that the recipient has the appropriate archiver to extract files.
Multi-volume and self-extracting archives can also be solid.
Programs that archive and unarchive files are called archiving programs (archivers).
Archiving programs can be compared according to the following main parameters: interface, compression methods (determining the degree of file compression), types of archives created, speed, support for other archiver formats.
When creating an archive, the archiver program automatically assigns its own extension to the archive file, for example, zip, rar, etc.
An archiving program is controlled in one of the following ways:
1. using the command line;
2. using the built-in shell and dialog panels, which allow control through menus and function keys;
3. using function key combinations in operating shells, which, as a rule, offer a choice of several DOS archiving programs or the shell's own archiver;
4. using GUI elements.
Despite the many archiving programs, a modern user, as a rule, really works with two archive formats: ZIP and RAR.
The degree of information compression depends on several factors:
First, the type of data being compressed is of great importance. Graphic and text files compress best: for them the compression ratio (the compressed size as a percentage of the original) can be from five to forty percent. Executable program files, load modules and multimedia files compress worse.
Second, the compression method is of great importance.
Third, it also matters which archiver is used. An archiver is usually chosen so that the degree of compression is as high as possible and the time needed to pack and unpack files is as short as possible.
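The dependence on the type of data can be seen in a minimal sketch. It uses zlib (a Lempel-Ziv based compressor from the Python standard library) instead of a full archiver, and random bytes stand in for data that is already compressed, such as multimedia files; both choices are assumptions made for the illustration.

```python
import os
import zlib

text = b"The quick brown fox jumps over the lazy dog. " * 200
random_bytes = os.urandom(len(text))  # stands in for already-compressed data

packed_text = zlib.compress(text, 9)
packed_random = zlib.compress(random_bytes, 9)

print(f"text:   {len(text)} -> {len(packed_text)} bytes")
print(f"random: {len(random_bytes)} -> {len(packed_random)} bytes")
```

Repetitive text shrinks to a small fraction of its size, while data without repetitions does not shrink at all and may even grow slightly because of format overhead.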
Information compression programs
Compression is performed by archiving programs. Today the four most common archivers are WinRAR, WinAce, 7-Zip and WinZip. As for the last program, it does not stand up to scrutiny.
Let's take a closer look at the WinRAR archiver. It can be associated with the following file types: RAR, ZIP, CAB, ARJ, LZH, ACE, 7-Zip, TAR, GZip, UUE, BZ2, JAR, ISO.
The program supports files of almost unlimited size (up to 8,589,934,591 GB). However, to work with files larger than 4 GB, you need to use the NTFS file system.
When choosing optimal compression settings, there are a few things to keep in mind:
Although WinRAR supports the ZIP format, in most cases it is recommended to choose RAR, which provides a higher degree of compression. Compress files to ZIP if you are not sure that the computer on which the files will be unpacked has a program that can unpack the RAR format.
You need to decide which compression method is best to use. The higher the degree of compression, the more time archiving takes, so consider the purpose for which the data is archived. For long-term storage it makes sense to wait and get the archive with maximum compression, but if you just need to send a few documents by mail, the Normal compression level is sufficient.
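The trade-off between compression level and time can be measured with a short sketch. It uses zlib compression levels as a stand-in for an archiver's method setting; the sample data and the chosen levels are assumptions, and the timings depend entirely on the machine that runs it.

```python
import time
import zlib

# Hypothetical sample data: many short, similar records.
data = b"".join(f"record {i}: value={i * i % 97}\n".encode() for i in range(50_000))

sizes = {}
for level in (1, 6, 9):  # fastest, default and maximum zlib levels
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    sizes[level] = len(packed)
    print(f"level {level}: {len(packed)} bytes in {elapsed * 1000:.1f} ms")
```

On data like this, higher levels usually produce a smaller result at the cost of more time, which matches the advice above about long-term storage versus quick mailing.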
If you need to achieve maximum file compression, use the Create solid archive option. However, it has its drawbacks. First, unpacking files from such an archive takes longer than extracting them from a regular archive. Imagine that you have two hundred files in your archive. If it was created in the usual way, you can easily extract any one of the files. If you created a solid archive, it matters where in the archive the file you need is located. If it is in the middle of the second hundred, the program has to unpack 150 files before it gets to it. Second, creating archives this way can lead to heavy losses: if the archive becomes corrupted, you lose all the files that were in it. With the usual packing method, you can extract most of the files from a damaged archive, if not all of them.
Creating a large archive can take quite a long time. WinRAR lets you estimate how much time a particular task will take with the Benchmark and hardware test option. Another reason to use this option is to detect errors that may occur during archiving on a particular computer configuration because of a hardware failure.
Among the other WinRAR settings is the ability to create self-extracting archives with a specified unpacking path. Such files do not require an archiver program on the computer on which they are to be unpacked. Such archives are called SFX archives. Their disadvantage compared to conventional archive files is a larger size, since in addition to the packed files they also contain the executable EXE module.
The contents of a RAR archive can be made invisible. To do this, in the program settings, in the Archiving with Password window, you need to check the box next to the Encrypt File Names line.
You can also set a password to open the archive. As a result of an error in transferring an archive over a local network or downloading it from the Internet, as well as due to a hardware failure or a virus attack, the archive may be damaged. WinRar allows you to determine the integrity of the data by testing the archive using the Test Archived Files option.
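The idea of testing an archive for damage can be sketched with Python's zipfile module, whose ZipFile.testzip() method re-reads every file and checks its CRC. This illustrates the principle on a ZIP archive and is not WinRAR's own Test Archived Files command; the file name, the payload and the helper first_bad_file are made up for the example.

```python
import io
import zipfile

payload = b"quarterly figures\n" * 200

buffer = io.BytesIO()
# ZIP_STORED keeps the bytes uncompressed, so the damage below hits file data only.
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
    archive.writestr("report.txt", payload)
intact = buffer.getvalue()

# Simulate a transfer error by changing one byte of the stored file data.
damaged = bytearray(intact)
damaged[intact.index(payload) + 5] ^= 0xFF
damaged = bytes(damaged)


def first_bad_file(raw):
    """Return the name of the first file whose CRC check fails, or None."""
    with zipfile.ZipFile(io.BytesIO(raw)) as archive:
        return archive.testzip()


print(first_bad_file(intact))   # None
print(first_bad_file(damaged))  # report.txt
```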
To minimize the chance of data loss, when creating WinRar archives it is recommended to use the Put Recovery Record option (this checkbox can be found on the General tab of the archive creation window).
If this has been done, then in case of damage to the archive, it can be restored.
In addition, in WinRar, you can reduce the likelihood of damage to a RAR archive by specifying the size of the information to be restored when creating it. To do this, you need to execute the Commands > Protect Archive From Damage command in the Winrar window. At the same time, the volume of the Recovery Record cannot exceed ten percent of the total size of the archive.
To repair damaged RAR archives, select the required file in the WinRar window and execute the Tools > Repair command.
WinRAR can be integrated into the context menu, not only of Explorer but also of other programs, such as the popular file manager Total Commander. This makes it possible to archive files quickly with the default settings, without opening the program window. The default settings can be changed to match the requirements you place on your archives. To do this, open the WinRAR window and execute the Options > Settings command, go to the Compression tab and click the Create Default button. The settings specified in this window are used for quick archiving. The archiving settings can also be changed from the context menu: select the Add to Archive… command. There you can set the format and the degree of compression, specify the name of the archive, and select other archiving options.
WinRar allows you to save user-defined settings to a file with a Reg extension. Later, this file can be imported into the program to reuse the given configuration. This file stores information such as the history of archives that have been created recently, default compression settings, etc.
Another handy WinRAR option is the ability to create your own bookmarks, called Favorites. The same folders on your hard drive often need to be backed up regularly. By bookmarking the location of these folders, you can quickly navigate to them in the program window and back up the necessary files and subdirectories.
General information about archiving files
The concept of file archiving

One of the most widely used types of service programs is the archiving program, designed to archive (pack) files by compressing the information stored in them.

Information compression is the process of converting information stored in a file to a form in which redundancy in its representation is reduced and, accordingly, less memory is required for storage. Information in files is compressed by eliminating redundancy in various ways, for example by simplifying codes, excluding constant bits from them, or representing repeated symbols or a repeating sequence of symbols as a repetition factor and the corresponding symbols. Various algorithms are used for such compression.

One or several files can be compressed and placed in compressed form in a so-called archive file, or archive. An archive file is a specially organized file containing one or more files in compressed or uncompressed form, together with service information about file names, the date and time of their creation or modification, sizes, and so on.

The purpose of packing files is usually to store information more compactly on disk and to reduce the time, and accordingly the cost, of transmitting it over communication channels in computer networks. In addition, packing a group of files into one archive file greatly simplifies transferring them from one computer to another, reduces the time needed to copy files to disks, protects information from unauthorized access, and helps protect against computer viruses.

The file compression ratio is characterized by the coefficient Kc, defined as the ratio of the size of the compressed file Vc to the size of the original file Vo, expressed as a percentage:

Kc = (Vc / Vo) * 100%

The compression ratio varies depending on the program used, the compression method, and the type of the source file.
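The coefficient Kc defined above is simple arithmetic and can be sketched as a function. The name compression_ratio_kc and the sample sizes are illustrative only.

```python
def compression_ratio_kc(compressed_size, original_size):
    """Kc = (Vc / Vo) * 100%, following the definition in the text."""
    if original_size <= 0:
        raise ValueError("original size must be positive")
    return compressed_size / original_size * 100


# A 200,000-byte file packed into 50,000 bytes gives Kc = 25%.
print(compression_ratio_kc(50_000, 200_000))  # 25.0
```

With this definition a smaller Kc means better compression, and Kc close to 100% means the file barely shrank.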
Graphic image files, text files and data files compress best: for them the compression ratio can reach 5 - 40%. Executable program files and load modules compress less, to 60 - 90%. Archive files hardly compress at all. Archiving programs differ in the compression methods they use, which affects the degree of compression accordingly.

Archiving (packing) is placing (loading) source files into an archive file in compressed or uncompressed form. Unarchiving (unpacking) is the process of restoring files from an archive exactly as they were before they were loaded into it. When unpacking, the files are extracted from the archive and placed on disk or in RAM. Programs that pack and unpack files are called archivers.

Large archive files can be placed on several disks (volumes). Such archives are called multi-volume. A volume is a component part of a multi-volume archive. When creating an archive of several parts, you can write its parts to several floppy disks.

The main types of archiving programs

Currently several dozen archivers are in use. They differ in their lists of functions and operating parameters, but the best of them have approximately the same characteristics. Some of the most popular programs are ARJ, PKPAK, LHA, ICE, HYPER, ZIP, PAK, ZOO and EXPAND, developed abroad, as well as AIN and RAR, developed in Russia. Usually the same program both packs and unpacks files, but in some cases different programs do this: for example, PKZIP packs files and PKUNZIP unpacks them.

Some archives do not require any program to extract their files, since the archive files themselves may contain an unpacking program. Such archive files are called self-extracting. A self-extracting archive file is a loadable, executable module that can unpack the files it contains on its own, without an archiver program. A self-extracting archive is called an SFX archive (SelF-eXtracting).
Archives of this type in MS DOS are usually created as .EXE files.

Many archivers unpack files by writing them to disk, but there are also programs designed to create a packed executable module (program). Such packing produces a program file with the same name and extension, which unpacks itself when loaded into RAM and starts immediately. The reverse conversion of the program file into unpacked format is also possible. Such archivers include PKLITE, LZEXE and UNP.

The EXPAND program, which is part of the MS DOS operating system utilities and the Windows shell, is used to unpack the files of software products supplied by Microsoft.

The RAR and AIN archivers, in addition to the usual compression mode, have a solid mode, which creates archives with a high degree of compression and a special internal structure. In such archives all files are compressed as one data stream, that is, the search area for repeated character sequences is the entire set of files loaded into the archive; therefore unpacking any file other than the first involves processing other files. Archives of this type are best used for archiving a large number of files of the same type.

Ways to control an archiver

An archiver is controlled in one of two ways:
- using the MS DOS command line, in which a run command is formed containing the name of the archiver program, the control command and its configuration keys, and the names of the archive and source files; such control is typical of the ARJ, AIN, ZIP, PAK, LHA and other archivers;
- using the built-in shell and dialog panels that appear after the program starts and allow control through menus and function keys, which creates a more comfortable working environment for the user. The RAR archiver is controlled this way.
- create archive files from individual or all files of the current directory and its subdirectories, loading up to 32000 files into one archive;
- add and replace files in the archive;
- extract and delete archive files;
- protect each of the files placed in the archive with a 32-bit cyclic code, test the archive, checking the safety of information in it;
- receive work assistance in 3 international languages;
- enter comments to files in the archive;
- save paths to files in the archive;
- save several generations (versions) of the same file in the archive;
- reorder the archive file by file sizes, names, extensions, date and time of modification, compression ratio, etc.;
- search for strings in archived files;
- restore files from damaged archives;
- create self-extracting archives both on one volume and on several volumes;
- view the contents of text files contained in the archive;
- ensure the protection of information in the archive and access to files placed in the archive with a password.
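The 32-bit cyclic code mentioned in the list above is a CRC-32 checksum stored with each file and recomputed when the archive is tested. A minimal sketch with zlib.crc32 follows; the helper is_intact and the sample contents are hypothetical, and whether a given archiver uses exactly this CRC-32 variant is an assumption.

```python
import zlib

original = b"contents of a file placed in the archive"
stored_crc = zlib.crc32(original)  # saved alongside the file when archiving


def is_intact(data, expected_crc):
    """Recompute the 32-bit cyclic code and compare it with the stored one."""
    return zlib.crc32(data) == expected_crc


damaged = b"Contents of a file placed in the archive"  # first byte changed
print(is_intact(original, stored_crc), is_intact(damaged, stored_crc))  # True False
```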
Command group | Archive function |
Placement in the archive | add files to the archive |
Placement in the archive | replace files in the archive with new versions |
Placement in the archive | add only new files to the archive |
Placement in the archive | move files to the archive |
Extraction from the archive | extract files from the archive to the current directory |
Extraction from the archive | extract files from the archive and place them in directories according to the paths specified for them |
Removal from the archive | delete files from the archive |
Service functions | full archive testing |
Service functions | display the contents of the archive without the paths to the files |
Service functions | display the contents of the archive with the paths to the files |
Service functions | copy the archive with new parameters |
Service functions | find a text string in the archive |
Purpose |
Adding files from the current directory and all its subdirectories, specifying the path to the files |
Creating a multi-volume archive file |
Protecting the created archive with a password: g<password> - the password is entered on the command line; g? - the password is entered invisibly at execution time |
Adding/replacing files, except for files whose names are specified after the key |
Requesting confirmation of the operation for each file: enter "Y" to confirm or "N" to refuse |
Creating a self-extracting archive |
Specifying the archiving method: m0 - no compression; m1 - normal compression (default); m2 - the highest compression; m3 - faster compression with a lower degree of compression; m4 - the fastest compression with the lowest degree of compression |
"Yes" is assumed as the answer to all archiver questions |
Pausing after each full screen when viewing the archive contents |
Modifier | Purpose of the modifier |
 | Specifies that the archive files of a multi-volume archive take up all the free space on the disks (volumes) |
 | Allows any number of DOS commands to be executed before a new volume is created, such as viewing, clearing or formatting the floppy disk on which the next archive file is to be written; after executing the commands, enter the EXIT command to continue archiving |
 | Forbids splitting archived files between volumes |
 | Gives a sound signal before the next volume is to be inserted |
 | Reserves free space on the first volume; the number following the r indicates the size of this space |
360, 720, 1200 | Variants of modifiers for specifying archive volume sizes |
- the ability to work in two modes - full screen interactive interface and conventional command line interface;
- support for other types of archives; in full-screen mode, RAR provides the ability to work with archives of other types (.ZIP, .ARJ, LZH), view their contents, modify and convert them;
- using the highly efficient solid compression method to obtain a high compression ratio (10 - 50% higher than usual);
- the ability to create self-extracting and multi-volume archives;
- password protection of archives.
- password encryption;
- adding file and archive comments;
- the possibility of partial or complete recovery of damaged archives;
- protection of the archive from changes;
- the ability to add to the archive information about the creator of the archive, the time and date of the last changes made to the archive.
- in command line mode;
- in full screen mode.
Function name | Purpose |
 | Add a file to the archive; if the archive does not exist, it is created |
 | View file |
 | Update files in the archive: only changed files whose old copies are in the archive are added |
 | Create archive volumes |
 | Move files to the archive |
 | Add files that are not in the archive and update those whose old copies are already in the archive |
 | Restore a corrupted archive |
 | Exit RAR |
Key | Purpose |
 | Create a solid archive |
 | View file |
 | Create an archive split into SFX volumes |
 | Create a solid archive divided into volumes |
 | Create a solid archive split into SFX volumes |
Function name | Purpose |
 | Display help information |
 | Test the archive |
 | View file |
 | Extract a file from the archive with full paths |
 | Add a comment to the archive |
 | Extract files to the current directory |
 | Convert to an SFX archive |
 | Delete files from the archive |
 | Configuration / save configuration |
 | Exit from the archive |
 | View the file with the built-in program if there is an external one |
 | Extract files to a specified directory |
 | Add comments to files |
 | Lock the archive against changes |
Work directory |

Data compression methods have a fairly long history of development, which began long before the advent of the first computer. This article attempts to give a brief overview of the main theories, concepts and their implementations, without claiming to be complete. More detailed information can be found, for example, in Krichevsky R.E., Ryabko B.Ya., Witten I.H., Rissanen J., Huffman D.A., Gallager R.G., Knuth D.E., Vitter J.S. and others.

Information compression is a problem much older than computer technology; its history usually ran in parallel with the history of information encoding and encryption. All compression algorithms operate on an input stream of information whose minimum unit is a bit and whose maximum unit is several bits, a byte, or several bytes. The goal of compression, as a rule, is to obtain a more compact output stream of information units from some initially non-compact input stream by transforming it. The main technical characteristics of compression processes and of their results are:
- the degree of compression (compress rating), or the ratio of the volumes of the source and resulting streams;
- the compression rate: the time spent compressing a certain amount of information in the input stream until an equivalent output stream is obtained from it;
- the compression quality: a value showing how densely the output stream is packed, measured by applying re-compression to it with the same or another algorithm.

There are several different approaches to the problem of information compression. Some have a very complex theoretical and mathematical basis; others rely on properties of the information stream and are algorithmically quite simple.
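Compression quality, as defined above, can be probed by compressing the output a second time. The sketch below uses zlib for both passes; the pseudo-random word text and the fixed seed are assumptions chosen only to make the example reproducible.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the sample text is reproducible
words = ["archive", "file", "volume", "stream", "packed", "data", "byte", "code"]
text = " ".join(rng.choice(words) for _ in range(20_000)).encode()

once = zlib.compress(text, 9)   # first pass over the input stream
twice = zlib.compress(once, 9)  # re-compression of the packed stream

print(len(text), len(once), len(twice))
```

If the second pass brings almost no further reduction, the first pass already packed the stream densely; a large second-pass gain would indicate poor compression quality.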
Any approach or algorithm that implements data compression is designed to reduce the volume of the output information stream, in bits, through a reversible or irreversible transformation. Therefore, first of all, by a criterion related to the nature or format of the data, all compression methods can be divided into two categories: reversible and irreversible compression.

Irreversible compression is a transformation of the input data stream in which the output stream, interpreted according to a certain information format, represents an object that is quite similar in its external characteristics to the input stream but differs from it in volume. The degree of similarity between the input and output streams is determined by how closely certain properties of the object represented by the stream (that is, of the compressed and uncompressed information, according to a specific data format) correspond. Such approaches and algorithms are used, for example, to compress raster graphics data with a low rate of byte repetition in the stream. This approach exploits the structure of the graphic file format and the possibility of presenting an image of approximately the same display quality (as perceived by the human eye) in several ways. Therefore, in addition to the degree of compression, the notion of quality arises in such algorithms: since the original image changes during compression, quality can be understood as the degree of correspondence between the original and the resulting image, assessed subjectively on the basis of the information format. For graphic files this correspondence is determined visually, although there are also intelligent algorithms and programs for this purpose. Irreversible compression cannot be used where the information structure of the input and output streams must match exactly.
This approach is implemented in popular formats for representing video and photo information, known as the JPEG and JFIF algorithms and the JPG and JIF file formats.

Reversible compression aims to reduce the volume of the output information stream without changing its information content, that is, without loss of information structure. Moreover, the input stream can be obtained from the output stream by a recovery algorithm; the recovery process is called decompression or unpacking, and only after it is the data suitable for processing according to its internal format.

In reversible algorithms, encoding as a process can be considered from a statistical point of view, which is useful not only for constructing compression algorithms but also for evaluating their effectiveness. For all reversible algorithms there is a notion of coding cost. The coding cost is the average length of a code word in bits. The coding redundancy is equal to the difference between the coding cost and the coding entropy, and a good compression algorithm should always minimize the redundancy (recall that the entropy of information is understood as a measure of its disorder). Shannon's fundamental theorem on encoding information says that "the cost of encoding is always not less than the entropy of the source, although it can be arbitrarily close to it." Therefore, for any algorithm there is always a limit to the degree of compression, determined by the entropy of the input stream.

Let us now turn to the algorithmic features of reversible algorithms and consider the most important theoretical approaches to data compression related to the implementation of coding systems and methods of information compression. The best-known simple approach and reversible compression algorithm is Run Length Encoding (RLE).
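The entropy bound mentioned above can be estimated for a concrete byte stream. The sketch below computes the Shannon entropy of the byte frequencies, under the simplifying assumption that bytes are independent of one another (an order-0 model); the function name is illustrative.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(data):
    """Shannon entropy of the byte frequencies, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


print(entropy_bits_per_symbol(b"abababab"))        # 1.0
print(entropy_bits_per_symbol(bytes(range(256))))  # 8.0
```

Under this model, a code that treats each byte separately cannot average fewer bits per byte than this value; methods that exploit repeated sequences go further by using a richer model of the stream.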
The essence of the methods of this approach is to replace chains, or runs, of repeated bytes or byte sequences with one encoding byte and a counter of the number of repetitions. The only problem with all such methods is to define how the decompressing algorithm can distinguish an encoded run from other, unencoded byte sequences in the resulting byte stream. The problem is usually solved by placing labels at the beginning of encoded runs. Such labels may be, for example, characteristic bit values in the first byte of an encoded run, values of the first byte of an encoded run, and the like. These methods are, as a rule, quite effective for compressing bitmap graphic images (BMP, PCX, TIF, GIF), since these contain quite a few long runs of repeating byte sequences. The disadvantage of the RLE method is a rather low degree of compression, or a high coding cost, for files with a small number of runs and, even worse, with a small number of repeated bytes in the runs.

Data compression that does not use the RLE method can be divided into two stages: modeling and encoding proper. These processes and the algorithms that implement them are quite independent and diverse.

Encoding is usually understood as the processing of a stream of characters (in our case, bytes or nibbles) over some alphabet, where the frequencies of occurrence of the characters in the stream differ. The goal of encoding is to convert this stream into a bit stream of minimum length, which is achieved by reducing the redundancy of the input stream by taking symbol frequencies into account. The length of the code representing a character of the stream alphabet should be proportional to the amount of information the character carries in the input stream, and the length of the stream characters in bits may not be a multiple of 8 and may even be variable.
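The run-length idea with a label byte can be sketched as follows. This is one possible scheme among many, not the format of any particular file type: the marker value 0x90, the minimum run length of 4 and the function names are assumptions made for the illustration.

```python
MARKER = 0x90  # hypothetical escape byte that labels an encoded run


def rle_encode(data):
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == byte and run < 255:
            run += 1
        if run >= 4 or byte == MARKER:
            out += bytes([MARKER, run, byte])  # label, repeat count, value
        else:
            out += bytes([byte]) * run  # short runs are copied unchanged
        i += run
    return bytes(out)


def rle_decode(data):
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == MARKER:  # the label tells the decoder a run follows
            out += bytes([data[i + 2]]) * data[i + 1]
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


packed = rle_encode(b"AAAAAABCCCCD")
print(len(packed), rle_decode(packed))  # 8 b'AAAAAABCCCCD'
```

Literal occurrences of the marker byte are themselves written as runs, which is how the decoder tells encoded runs apart from ordinary bytes; it is also why data with few repetitions can grow instead of shrinking.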
If the probability distribution of the frequencies of occurrence of characters from the alphabet of the input stream is known, an optimal coding model can be constructed. However, because of the huge number of different file formats, the task becomes much more complicated, since the frequency distribution of the data symbols is not known in advance. In that case, in general, two approaches are used.

The first consists in scanning the input stream and building the encoding from the collected statistics. This requires two passes through the file, one for scanning and collecting statistical information and the second for encoding, which somewhat limits the scope of such algorithms: it rules out the one-pass, on-the-fly coding used in telecommunication systems, where the amount of data is sometimes not known and retransmitting or re-parsing it may take an unreasonably long time. In this case the entropy scheme of the coding used is written to the output stream. This technique is known as static Huffman coding.

All compression algorithms operate on the input information stream in order to obtain a more compact output stream by some transformation. The main technical characteristics of compression processes and of their results are:
- the degree of compression: the ratio of the volumes of the initial and resulting streams;
- the compression rate: the time spent compressing a certain amount of information in the input stream until an equivalent output stream is obtained from it;
- the compression quality: a value showing how densely the output stream is packed, measured by re-compressing it with the same or another algorithm.

Algorithms that eliminate the redundancy of data recording are called data compression algorithms, or archiving algorithms. Currently there is a huge number of data compression programs, based on several basic methods.
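The two-pass scheme described above (static Huffman coding) can be sketched in a few lines. For readability the code words are kept as text strings of "0" and "1" rather than packed bits, and the code table is returned separately instead of being written to the output stream; both are simplifications made for the example.

```python
import heapq
from collections import Counter


def huffman_codes(data):
    """First pass: count symbol frequencies and derive a prefix code."""
    counts = Counter(data)
    if len(counts) == 1:  # degenerate input: give the only symbol a 1-bit code
        return {next(iter(counts)): "0"}
    # Heap items: (frequency, smallest symbol as tie-breaker, {symbol: code}).
    heap = [(freq, sym, {sym: ""}) for sym, freq in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, t1, left = heapq.heappop(heap)
        f2, t2, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, min(t1, t2), merged))
    return heap[0][2]


def huffman_encode(data, codes):
    """Second pass: replace every byte with its code word."""
    return "".join(codes[b] for b in data)


def huffman_decode(bits, codes):
    reverse = {code: sym for sym, code in codes.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in reverse:  # prefix code: the first match is the symbol
            out.append(reverse[current])
            current = ""
    return bytes(out)


data = b"aaaabbc"
codes = huffman_codes(data)
bits = huffman_encode(data, codes)
print(len(bits), "bits instead of", len(data) * 8)  # 10 bits instead of 56
```

The frequent symbol receives a 1-bit code and the two rarer ones 2-bit codes, which is the effect the coding model is meant to achieve.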
All data compression algorithms are divided into: 1) lossless compression algorithms, with which the data is restored at the receiving end without the slightest change; 2) lossy compression algorithms, which remove from the data stream information that has little effect on the essence of the data or cannot be perceived by a person at all.
There are two main lossless archiving methods: the Huffman algorithm, oriented toward compressing sequences of bytes that are not related to one another, and the Lempel-Ziv algorithm, oriented toward compressing any kind of text, that is, exploiting the repeated occurrence of "words" (sequences of bytes). Almost all popular lossless archiving programs (ARJ, RAR, ZIP, etc.) use a combination of these two methods, the LZH algorithm.
Huffman algorithm. The algorithm is based on the fact that in ordinary text some characters of the standard 256-character set occur more often than average, while others occur less often. Therefore, if common characters are recorded with short bit sequences of fewer than 8 bits and rare characters with longer ones, the total file size decreases.
Lempel-Ziv algorithm. The classical Lempel-Ziv algorithm, LZ77, named after the year of its publication, is extremely simple. It is formulated as follows: if an identical sequence of bytes has already been encountered in the data processed so far, and a record of its length and its offset from the current position is shorter than the sequence itself, then the reference (offset, length) is written to the output file instead of the sequence.
Information in archive files is compressed by eliminating redundancy in different ways, for example by simplifying the codes, by eliminating constant bits from them, or by representing repeating symbols or a repeating sequence of symbols as a repetition factor together with the corresponding symbols.
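The LZ77 rule formulated above (write an (offset, length) reference when the sequence has already been seen) can be sketched as follows. This is a deliberately naive illustration: the function names, the window size and the `min_len` threshold are assumptions, the search is a brute-force scan, and the tokens are kept as a Python list instead of being packed into bits.

```python
def lz77_encode(data: bytes, window: int = 4096,
                min_len: int = 3, max_len: int = 255) -> list:
    """Return a list of tokens: an int is a literal byte,
    a tuple (offset, length) is a reference to earlier data."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Brute-force search of the window for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_len and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_len:
            # The reference is assumed to be cheaper than the bytes it replaces.
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def lz77_decode(tokens: list) -> bytes:
    """Rebuild the data by copying referenced bytes from the output so far."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            offset, length = token
            for _ in range(length):
                out.append(out[-offset])  # byte-by-byte, so overlaps work
        else:
            out.append(token)
    return bytes(out)
```

For example, `lz77_encode(b"abcabcabcabc")` yields three literal bytes followed by the single reference `(3, 9)`: the match is allowed to overlap the current position, so one short reference reproduces nine bytes.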
Algorithms for such information compression are implemented in special archiver programs, the best known of which are arj / arjfolder, pkzip / pkunzip / winzip and rar / winrar. Either one or several files can be compressed; in compressed form they are placed in a so-called archive file, or archive. The purpose of packing files is usually to arrange information more compactly on a disk and to reduce the time and, accordingly, the cost of transferring information over communication channels in computer networks. The main indicator of the effectiveness of a particular archiver program is therefore the degree of file compression.
The degree of file compression is characterized by the coefficient Kc, defined as the ratio of the volume of the compressed file Vc to the volume of the original file Vo, expressed as a percentage (some sources use the inverse ratio):
Kc = (Vc / Vo) * 100%
With this definition, a smaller Kc means better compression. The degree of compression depends on the program used, the compression method and the type of source file. Graphic image files, text files and data files compress best, and for them Kc can reach 5 - 40%; files of executable programs and load modules compress less well, with Kc = 60 - 90%. Archive files are hardly compressed at all. This is easy to explain if you know that most archiving programs use variants of the LZ77 (Lempel-Ziv) algorithm, the essence of which is a special encoding of repeating sequences of bytes (in other words, characters). Such repetitions are most frequent in texts and bitmap graphics and practically absent in archives. In addition, archiving programs differ in how they implement the compression algorithms, which also affects the degree of compression. Some archiving programs additionally include tools aimed at reducing the compression ratio Kc.
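The coefficient Kc defined above is easy to compute in practice. The sketch below uses Python's standard `zlib` module, a DEFLATE implementation that combines LZ77 with Huffman coding, purely as a stand-in for an archiver; the sample data and the helper name `compression_ratio` are assumptions made for the example, and random bytes are used to imitate data with no repetitions, such as an existing archive.

```python
import os
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Kc = (Vc / Vo) * 100%, where a smaller value means better compression."""
    if not original:
        raise ValueError("original data must not be empty")
    return len(compressed) / len(original) * 100.0


# Highly repetitive text: many repeated byte sequences for LZ77 to find.
text = b"the quick brown fox jumps over the lazy dog. " * 200
kc_text = compression_ratio(text, zlib.compress(text, 9))

# Random bytes imitate already compressed data: almost no repetitions.
noise = os.urandom(len(text))
kc_noise = compression_ratio(noise, zlib.compress(noise, 9))

print(f"Kc for repetitive text: {kc_text:.1f}%")
print(f"Kc for random bytes:    {kc_noise:.1f}%")
```

On such input the repetitive text compresses to a small fraction of its size, while the random bytes stay at roughly 100% or slightly above because of format overhead, which matches the observation that archives hardly compress further.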
For example, the WinRAR program implements a mechanism for continuous (solid) archiving, with which compression 10 - 50% better than with conventional methods can be achieved, especially when a significant number of small files with the same type of content are packed.
The characteristics of archivers are inversely related: the higher the compression speed, the lower the degree of compression, and vice versa.
There are many archivers on the computer market. Each has its own set of supported formats, its own pros and cons, and its own circle of admirers who firmly believe that the archiver they use is the best. We will not try to dissuade anyone; we will simply try to evaluate the most popular archivers impartially in terms of functionality and efficiency. These include WinZip, WinRAR, WinAce and 7-Zip, which lead in the number of downloads from software servers. It is hardly worth considering other archivers, since the percentage of users who use them (judging by the number of downloads) is small.
The first menu item, Configuration, opens the configuration dialog box for setting the main RAR parameters (Fig. 11.3). The window contains five groups of parameters: Interface options - interface settings; Sort names - options for sorting files; Include file mask - the file inclusion mask; Compression - the compression method; Other options - other parameters.
Fig. 11.3. View of the RAR archiver configuration settings window
A parameter marked with a cross means that the corresponding function is enabled. To move from one parameter to another, press the arrow keys. To change the parameter value in the current field, press .
Working with the archiver
Let us consider the sequence of actions for the most frequently performed archiving procedures, after the RAR program has been loaded to work in full-screen mode.
Creating a new archive from multiple files
1. Select a disk by pressing a key combination .
Series encoding compression
Compression without using the RLE method
The coding process and its methods
4. File compression ratio