
Assessing the condition of hard drives using S.M.A.R.T.

A modern hard drive is a unique computer component. It is unique in that it stores service information from which you can assess the “health” of the disk. This information contains the history of changes in many parameters that the hard drive monitors during operation. No other component of the system unit provides its owner with statistics on its own operation! Given that the HDD is one of the least reliable components of a computer, such statistics can be very useful and help its owner avoid hassle and the loss of money and time.

Information about the status of the disk is available thanks to a set of technologies collectively called S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology). This set is quite extensive, but we will cover only those aspects that let you look at the S.M.A.R.T. attributes displayed in any hard drive testing program and understand what is going on with the disk.

Note that the following applies to drives with SATA and PATA interfaces. SAS, SCSI and other server drives also have S.M.A.R.T., but its presentation is very different from SATA/PATA. Besides, server disks are usually monitored not by a person but by a RAID controller, so we won’t talk about them.

So, if we open S.M.A.R.T. in any of the numerous programs, we will see approximately the following picture (the screenshot shows the S.M.A.R.T. of the Hitachi Deskstar 7K1000.C HDS721010CLA332 drive in HDDScan 3.3):

Each line displays a separate S.M.A.R.T. attribute. Attributes have more or less standardized names and a specific number, neither of which depends on the model or manufacturer of the disk.

Each S.M.A.R.T. attribute has several fields. Each field belongs to a specific class from the following: ID, Value, Worst, Threshold and RAW. Let's look at each of the classes.

ID (may also be called Number) - the identifier, i.e. the attribute number in S.M.A.R.T. technology. Different programs may name the same attribute differently, but the identifier always uniquely identifies it. This is especially useful with programs that translate the generally accepted English attribute name into Russian. Sometimes the result is such nonsense that you can tell which parameter it is only by its identifier.
Value (Current) - the current value of the attribute in “parrots” (i.e., in arbitrary units of unknown dimension). During the operation of the hard drive it can decrease, increase or remain unchanged. You cannot judge the “health” of an attribute from Value alone, without comparing it with the Threshold of the same attribute. As a rule, the smaller the Value, the worse the state of the attribute (on a new disk, all fields except RAW initially have the maximum possible value, for example 100).
Worst - the worst value that Value has reached during the entire life of the hard drive. It is also measured in “parrots”. During operation it may decrease or remain unchanged. It, too, cannot be used on its own to judge the health of an attribute; you need to compare it with Threshold.
Threshold - the value in “parrots” that the Value of the same attribute must reach for the attribute’s state to be considered critical. Simply put: if Value is greater than Threshold, the attribute is OK; if it is less than or equal to Threshold, the attribute has a problem. It is by this criterion that utilities reading S.M.A.R.T. report the state of the disk or of an individual attribute as “Good” or “Bad”. In doing so, they ignore the fact that even with Value greater than Threshold the disk may already be dying from the user’s point of view, or even be a walking corpse, so when assessing the health of a disk it is still worth looking at another field, namely RAW. However, it is precisely a Value that has fallen below Threshold that can serve as a legitimate reason for replacing the disk under warranty (as far as the warranty providers are concerned, of course): who can speak more clearly about the health of the disk than the disk itself, by showing a current attribute value worse than the critical threshold? That is, with Value greater than Threshold the disk itself considers the attribute healthy, and with Value less than or equal to Threshold it considers it sick. Obviously, if Threshold=0, the attribute state will never be considered critical. Threshold is a constant hardcoded into the disk by the manufacturer.
RAW (Data) - the most interesting, important and necessary field for evaluation. In most cases it contains not “parrots” but real values expressed in various units of measurement, directly describing the current state of the disk. The Value is derived from this field (but the algorithm by which it is derived is the manufacturer’s secret, shrouded in darkness). It is the ability to read and analyze the RAW field that makes it possible to assess the condition of the hard drive objectively.
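
The Value/Threshold rule described above can be written down in a few lines. This is only a minimal sketch of the comparison that utilities are said to perform; the function name `attribute_state` and the returned strings are my own illustration, not part of any real tool.

```python
def attribute_state(value: int, threshold: int) -> str:
    """Classify one attribute the way the Value/Threshold rule describes.

    Hypothetical helper: the name and the "ok"/"critical" labels are
    illustrative assumptions, not taken from any S.M.A.R.T. utility.
    """
    if threshold == 0:
        # With Threshold=0 the attribute is never considered critical.
        return "ok"
    # Value greater than Threshold: healthy; less than or equal: problem.
    return "ok" if value > threshold else "critical"


print(attribute_state(100, 5))  # a fresh attribute well above its threshold
print(attribute_state(5, 5))    # Value has reached Threshold
```

Note that an "ok" result here says nothing about the RAW field, which is exactly why the RAW field still has to be examined separately.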

This is what we will do now: we will go through all the most commonly used S.M.A.R.T. attributes, see what they say and what needs to be done if they are not in order.

S.M.A.R.T. attributes

Before describing the attributes and the acceptable values of their RAW fields, I will clarify that the RAW field can be of two types: current and accumulating. A current field contains the value of the attribute at the present moment; it changes periodically (for some attributes occasionally, for others many times per second, although such rapid changes are not displayed by S.M.A.R.T. readers). An accumulating field contains statistics, usually the number of occurrences of a particular event since the disk was first started.

The current type is typical for attributes for which there is no point in summing their previous readings. For example, the disk temperature display is current: its purpose is to show the current temperature, not the sum of all previous temperatures. The accumulating type is characteristic of attributes for which their whole purpose is to provide information over the entire “life” of the hard drive. For example, an attribute characterizing the operating time of a disk is cumulative, i.e., it contains the number of units of time worked by the drive over its entire history.

Let's start looking at attributes and their RAW fields.

Attribute: 01 Raw Read Error Rate

All Seagate, Samsung (starting with the SpinPoint F1 family, inclusive) and Fujitsu 2.5″ drives have huge numbers in these fields.

For other Samsung drives and all WD drives, this field is set to 0.

For Hitachi disks, this field is either 0 or changes periodically within a range from 0 to a few units.

These differences arise because all Seagate drives and some Samsung and Fujitsu drives count the values of this parameter differently from WD, Hitachi and the other Samsung drives. Errors of this kind always occur when any hard drive operates, and the drive overcomes them on its own; this is normal. It is just that on disks showing 0 or a small number in this field, the manufacturer did not consider it necessary to report the true number of these errors.

Thus, a non-zero parameter on WD and Samsung drives up to SpinPoint F1 (not inclusive) and a large parameter value on Hitachi drives may indicate hardware problems with the drive. Note that utilities may display multiple values contained in the RAW field of this attribute as one, and it will appear quite large, although this will not be correct (see below for details).

On Seagate, Samsung (SpinPoint F1 and newer) and Fujitsu drives, you can ignore this attribute.
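
The vendor-dependent reading of attribute 01 described above can be sketched as a small lookup. Everything here is illustrative: the function name, the vendor strings and the verdict labels are hypothetical, and the limit of 10 for Hitachi is my own assumption standing in for “several units”. The RAW number passed in is assumed to be a single value, not several values packed together.

```python
# Assumption: "several units" on Hitachi drives is taken here as at most 10.
HITACHI_SMALL_LIMIT = 10


def raw_read_error_verdict(vendor: str, raw: int,
                           samsung_f1_or_newer: bool = False) -> str:
    """Return "ignore", "ok", "suspicious" or "unknown" for attribute 01.

    Hypothetical helper; "fujitsu" is assumed to mean a Fujitsu 2.5-inch drive.
    """
    vendor = vendor.strip().lower()
    if vendor in ("seagate", "fujitsu"):
        # These drives report huge numbers by design.
        return "ignore"
    if vendor == "samsung":
        if samsung_f1_or_newer:
            return "ignore"
        return "ok" if raw == 0 else "suspicious"
    if vendor == "wd":
        return "ok" if raw == 0 else "suspicious"
    if vendor == "hitachi":
        return "ok" if raw <= HITACHI_SMALL_LIMIT else "suspicious"
    # The text makes no statement about other manufacturers.
    return "unknown"


print(raw_read_error_verdict("Seagate", 123456789))
print(raw_read_error_verdict("WD", 3))
```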

Attribute: 02 Throughput Performance

The parameter does not provide any information to the user and does not indicate any danger for any of its values.

Attribute: 03 Spin-Up Time

The spin-up time may vary between disks (including disks from the same manufacturer) depending on the spin-up current, the weight of the platters, the rated spindle speed, etc.

By the way, Fujitsu hard drives always show 1 in this field if there are no problems with spindle spin-up.

It says practically nothing about the health of the disk, so when assessing the condition of the hard drive, you can ignore this parameter.

Attribute: 04 Number of Spin-Up Times (Start/Stop Count)

When assessing health, ignore the attribute.

Attribute: 05 Reallocated Sector Count

Let us explain what a “reassigned sector” actually is. When a disk encounters an unreadable, hard-to-read, unwritable or hard-to-write sector during operation, it may consider the sector irreparably damaged. For precisely such cases the manufacturer provides a reserve area on each disk (on some models in the center (logical end) of the disk, on others at the end of each track, etc.). When a sector is damaged, the disk marks it as unreadable and uses a sector in the reserve area instead, making the appropriate entry in a special list of surface defects, the G-list. This operation of assigning a new sector to take over the role of the old one is called a remap or reassignment, and the sector used in place of the damaged one is called reassigned. The new sector receives the logical LBA number of the old one, and from then on, when software accesses the sector with this number (programs know nothing about reassignments!), the request is redirected to the reserve area.

Thus, even though a sector has failed, the disk capacity does not change. Clearly, this holds only for the time being, since the reserve area is not infinite. However, the reserve area may well contain several thousand sectors, and letting it run out would be very irresponsible: the disk needs to be replaced long before that.

By the way, repairmen say that Samsung drives very often do not want to perform sector reassignment.

Opinions on this attribute vary. Personally, I think that if it reaches 10, the disk must be replaced: this means a progressive degradation of the surface of the platters, of the heads, or of some other hardware, and there is no way to stop this process. By the way, according to people close to Hitachi, Hitachi itself considers a disk due for replacement once it has 5 reassigned sectors. Whether this information is official, and whether service centers follow this view, is another question. Something tells me they do not :)

Another matter is that service center employees may refuse to recognize the disk as faulty if the manufacturer’s proprietary utility reports something like “S.M.A.R.T. Status: Good” or if the attribute’s Value or Worst is greater than Threshold (in fact, the manufacturer’s utility may itself judge by this criterion). And formally they will be right. But who needs a disk whose hardware is constantly deteriorating, even if such deterioration is in the nature of hard drives and hard drive technology tries to minimize its consequences by, for example, allocating a reserve area?

Attribute: 07 Seek Error Rate

The description of the formation of this attribute almost completely coincides with the description for attribute 01 Raw Read Error Rate, with the exception that for Hitachi hard drives the normal value of the RAW field is only 0.

Thus, ignore the attribute on Seagate, Samsung SpinPoint F1 and newer, and Fujitsu 2.5″ drives; on other Samsung models, as well as on all WD and Hitachi drives, a non-zero value indicates problems, for example with a bearing.

Attribute: 08 Seek Time Performance

It does not provide any information to the user and does not indicate any danger regardless of its value.

Attribute: 09 Power On Hours Count (Power-on Time)

Doesn't say anything about the health of the drive.

Attribute: 10 (0A - hexadecimal) Spin Retry Count

Most often it does not indicate the health of the disk.

The main reasons for increasing the parameter are poor contact of the disk with the power supply or the inability of the power supply to supply the required current to the power line of the disk.

Ideally, it should be equal to 0. If the attribute value is 1-2, you can ignore it. If the value is higher, first of all you should pay close attention to the condition of the power supply, its quality, the load on it, check the contact of the hard drive with the power cable, check the power cable itself.

Of course, the disk may fail to start on the first attempt because of problems of its own, but this happens very rarely, and this possibility should be considered last.

Attribute: 11 (0B) Calibration Retry Count (Recalibration Retries)

A non-zero, or especially a growing value of the parameter may indicate problems with the disk.

Attribute: 12 (0C) Power Cycle Count

Not related to the disk state.

Attribute: 183 (B7) SATA Downshift Error Count

Does not indicate the health of the drive.

Attribute: 184 (B8) End-to-End Error

A non-zero value indicates disk problems.

Attribute: 187 (BB) Reported Uncorrected Sector Count (UNC Error)

A non-zero attribute value clearly indicates that the disk state is abnormal (in combination with a non-zero attribute value of 197) or that it previously was (in combination with a zero attribute value of 197).

Attribute: 188 (BC) Command Timeout

Such errors can occur due to poor quality cables, contacts, adapters used, extension cords, etc., as well as due to the incompatibility of the drive with a specific SATA/PATA controller on the motherboard (or a discrete one). Due to errors of this kind, BSODs are possible in Windows.

A non-zero attribute value indicates a potential problem with the disk.

Attribute: 189 (BD) High Fly Writes

In order to say why such cases occur, you need to be able to analyze S.M.A.R.T. logs, which contain information specific to each manufacturer, which is not currently implemented in publicly available software - therefore, the attribute can be ignored.

Attribute: 190 (BE) Airflow Temperature

Does not indicate the condition of the disk.

Attribute: 191 (BF) G-Sensor Shock Count (Mechanical Shock)

Relevant for mobile hard drives. On Samsung disks you can often ignore this, because they may have a very sensitive sensor that, figuratively speaking, almost reacts to the movement of air from the wings of a fly flying in the same room as the disk.

In general, the triggering of the sensor is not necessarily a sign of an impact. The counter can even grow from the drive’s own positioning of its head assembly, especially if the drive is not secured. The main purpose of the sensor is to stop a write operation when there is vibration, in order to avoid errors.

Doesn't indicate disk health.

Attribute: 192 (C0) Power Off Retract Count (Emergency Retract Count)

Does not allow you to judge the condition of the disk.

Attribute: 193 (C1) Load/Unload Cycle Count

Doesn't indicate disk health.

Attribute: 194 (C2) Temperature (HDA Temperature, HDD Temperature)

The attribute does not indicate the state of the disk, but allows you to control one of the most important parameters. My opinion: when working, try not to allow the temperature of the hard drive to rise above 50 degrees, although the manufacturer usually declares a maximum temperature limit of 55-60 degrees.

Attribute: 195 (C3) Hardware ECC Recovered

The features inherent in this attribute on different disks fully correspond to those of attributes 01 and 07.

Attribute: 196 (C4) Reallocated Event Count

Indirectly speaks about the health of the disk. The higher the value, the worse. However, it is impossible to unambiguously judge the health of a disk based on this parameter without considering other attributes.

This attribute is directly related to attribute 05. When 196 grows, 05 most often grows as well. If attribute 196 grows while attribute 05 does not, it means that during the remap attempt the candidate bad sector turned out to be a soft bad (see details below), and the disk corrected it, so the sector was considered healthy and no reassignment was necessary.

If attribute 196 is less than attribute 05, it means that during some remapping operations, several bad sectors were transferred in one go.

If attribute 196 is greater than attribute 05, it means that during some reassignment operations, soft bads were discovered that were subsequently corrected.

Attribute: 197 (C5) Current Pending Sector Count

When the disk encounters a “bad” sector during operation (for example, the sector checksum does not match the data in it), it marks the sector as a candidate for reassignment, adds it to a special internal list and increases parameter 197. It follows that the disk may have damaged sectors that it does not yet know about: after all, there may well be areas on the platters that the hard drive has not used for some time.

When there is an attempt to write to a sector, the disk first checks whether the sector is on the candidate list. If it is not, the write proceeds as usual. If it is, the sector is tested by writing and reading. If all test operations pass normally, the disk considers the sector healthy. (That is, it was a so-called “soft bad”: the bad sector arose not through the fault of the disk but for other reasons. For example, the power went out while data was being written, and the disk interrupted the write and parked the heads. As a result, the data in the sector was only partially written, while the sector checksum, which depends on that data, remained the old one, so the checksum no longer matches the data in the sector.) In this case the disk performs the originally requested write and removes the sector from the candidate list. Attribute 197 then decreases, and attribute 196 may also increase.

If testing fails, the disk performs a reassignment operation, decreasing attribute 197, increasing 196 and 05, and also makes notes in the G-list.
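
To make the bookkeeping easier to follow, here is a toy model of how the counters move when a candidate sector is written to. It is a sketch of the behavior described above, not of real firmware: the class `DiskModel`, its fields and the `surface_ok` flag (which stands in for the outcome of the write/read test) are all hypothetical. The text says attribute 196 "can" increase when a soft bad is fixed; the model assumes that it always does.

```python
from dataclasses import dataclass, field


@dataclass
class DiskModel:
    """Toy model of the pending/reallocation counters (hypothetical names)."""
    pending: set = field(default_factory=set)   # candidate list
    realloc_events: int = 0                     # attribute 196
    realloc_sectors: int = 0                    # attribute 05
    g_list: list = field(default_factory=list)  # list of surface defects

    @property
    def attr_197(self) -> int:
        # Attribute 197 is the number of sectors on the candidate list.
        return len(self.pending)

    def mark_pending(self, lba: int) -> None:
        """The disk met a sector whose checksum does not match its data."""
        self.pending.add(lba)

    def write(self, lba: int, surface_ok: bool) -> str:
        """Handle a write; surface_ok is the result of the write/read test."""
        if lba not in self.pending:
            return "written"            # not a candidate: ordinary write
        self.pending.discard(lba)       # attribute 197 decreases either way
        self.realloc_events += 1        # assumption: 196 always increases
        if surface_ok:
            return "soft bad fixed"     # sector healthy, no reassignment
        self.realloc_sectors += 1       # attribute 05 increases
        self.g_list.append(lba)         # entry in the G-list
        return "remapped"


disk = DiskModel()
disk.mark_pending(100)
disk.mark_pending(200)
print(disk.write(100, surface_ok=True), disk.attr_197)
print(disk.write(200, surface_ok=False), disk.attr_197)
print(disk.realloc_events, disk.realloc_sectors)
```

In this run attribute 196 ends up greater than attribute 05, which matches the case where a soft bad was found and corrected during a reassignment attempt.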

So, a non-zero value of the parameter indicates a problem (however, it cannot indicate whether the problem is with the disk itself).

If the value is non-zero, you should run a sequential read of the entire surface in Victoria or MHDD with the remap option enabled. During the scan the disk will inevitably come across the bad sector and try to write to it (with Victoria 3.5 and the Advanced remap option, the disk will try to write the sector up to 10 times). In this way the program triggers the “treatment” of the sector, and as a result the sector will be either fixed or reassigned.

If reading fails both with remap and with Advanced remap, it is worth trying a sequential write in the same Victoria or MHDD. Keep in mind that the write operation erases data, so be sure to make a backup first!

Sometimes the following manipulation helps when a remap cannot be performed: remove the disk’s electronics board and clean the contacts that connect the drive to the board, as they may be oxidized. Be careful when performing this procedure: it may void your warranty!

The impossibility of a remap may be due to another reason - the disk has exhausted the reserve area, and it simply has nowhere to reassign sectors.

If the value of attribute 197 is not reduced to 0 by any manipulation, you should think about replacing the disk.

Attribute: 198 (C6) Offline Uncorrectable Sector Count (Uncorrectable Sector Count)

This parameter changes only under the influence of offline testing; no program scans affect it. For operations during self-test, the behavior of the attribute is the same as attribute 197.

A non-zero value indicates problems with the disk (just like 197, without specifying who is to blame).

Attribute: 199 (C7) UltraDMA CRC Error Count

In the vast majority of cases, the causes of errors are a poor-quality data transfer cable, overclocking of the PCI/PCI-E buses of the computer, or poor contact in the SATA connector on the disk or on the motherboard/controller.

Errors during transmission over the interface and, as a result, an increasing value of the attribute can lead to the operating system switching the operating mode of the channel on which the drive is located to PIO mode, which entails a sharp drop in the read/write speed when working with it and processor load to 100% (visible in Windows Task Manager).

In the case of Hitachi hard drives of the Deskstar 7K3000 and 5K3000 series, a growing attribute may indicate incompatibility between the disk and the SATA controller. To correct the situation, you need to force the drive to switch to SATA 3 Gb/s mode.

My opinion: if there are errors, reconnect the cable at both ends; if their number grows and it is more than 10, throw away the cable and replace it with a new one or remove the overclock.

Attribute: 200 (C8) Write Error Rate (MultiZone Error Rate)

Attribute: 202 (CA) Data Address Mark Error

Attribute: 203 (CB) Run Out Cancel

The health effects are unknown.

Attribute: 220 (DC) Disk Shift

The health effects are unknown.

Attribute: 240 (F0) Head Flying Hours

The health effects are unknown.

Attribute: 254 (FE) Free Fall Event Count

The health effects are unknown.

Let us summarize the description of the attributes. Non-zero values:

When analyzing attributes, keep in mind that some S.M.A.R.T. parameters can store several values at once: for example, one for the penultimate start of the disk and one for the last. Such multi-byte parameters are logically composed of several smaller values; for example, a parameter that stores two values for the last two starts, with 2 bytes allocated to each, is 4 bytes long. Programs that interpret S.M.A.R.T. are often unaware of this and show such a parameter as one number rather than two, which sometimes causes confusion and anxiety for the owner of the disk. For example, a "Raw Read Error Rate" storing a penultimate value of 1 and a last value of 0 would look like 65536.

It should be noted that not all programs can display such attributes correctly. Many convert an attribute holding several values into the decimal system as one huge number. The correct way to display such content is either broken down by value (then the attribute consists of several separate numbers), or in hexadecimal (then the attribute looks like one number, but its components are easily distinguishable at a glance), or both at once. Examples of programs that do this correctly are HDDScan, CrystalDiskInfo and Hard Disk Sentinel.
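
The 65536 example is easy to reproduce. The sketch below splits a RAW number into fixed-size words and also prints it in hexadecimal. The function names and the default layout (two 2-byte values in a 4-byte field, as in the example above) are assumptions for illustration; how a particular drive actually packs its RAW field is up to the manufacturer.

```python
def split_raw(raw: int, word_bytes: int = 2, total_bytes: int = 4) -> list[int]:
    """Split a RAW number into equal words, most significant word first.

    Hypothetical helper; the 2-byte/4-byte defaults mirror the example
    in the text, not any documented drive layout.
    """
    mask = (1 << (8 * word_bytes)) - 1
    count = total_bytes // word_bytes
    return [(raw >> (8 * word_bytes * i)) & mask for i in reversed(range(count))]


def as_hex(raw: int, total_bytes: int = 4) -> str:
    """Show the RAW number in hexadecimal, padded to the full field width."""
    return f"{raw:0{2 * total_bytes}X}"


print(split_raw(65536))  # [1, 0]: penultimate value 1, last value 0
print(as_hex(65536))     # 00010000
```

Both views make it obvious that the “huge” decimal number is really a 1 followed by a 0.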

Let's demonstrate the difference in practice. This is what the instantaneous value of attribute 01 looks like on one of my Hitachi HDS721010CLA332 drives in Victoria 4.46b, which does not take this feature of the attribute into account:

And this is what it looks like in the “correct” HDDScan 3.3:

The advantages of HDDScan in this context are obvious, aren’t they?

If you analyze S.M.A.R.T. on different disks, you may notice that the same attributes can behave differently. For example, some S.M.A.R.T. parameters of Hitachi hard drives are reset to zero after a certain period of disk inactivity; parameter 01 has peculiarities on Hitachi, Seagate, Samsung and Fujitsu drives, and parameter 03 on Fujitsu drives. It is also known that after the disk's firmware is reflashed some parameters may be reset to 0 (for example, 199). However, such a forced reset of an attribute in no way means that the problems with the disk (if there were any) have been resolved. After all, a growing critical attribute is a consequence of problems, not their cause.

When you analyze many S.M.A.R.T. data sets, it becomes obvious that the set of attributes may differ between disks from different manufacturers and even between different models from the same manufacturer. This is due to so-called vendor-specific attributes (i.e., attributes that a particular manufacturer uses to monitor its own disks) and should not be a cause for concern. If monitoring software can read such attributes (for example, Victoria 4.46b), then on disks for which they are not intended they may show “terrible” (huge) values, and you simply should not pay attention to them. This is how, for example, Victoria 4.46b displays the RAW values of attributes that are not intended for monitoring on the Hitachi HDS721010CLA332:

A common problem is that programs cannot read a disk's S.M.A.R.T. data. With a working hard drive this can have several causes. For example, S.M.A.R.T. is very often not displayed when the drive is connected in AHCI mode. In such cases it is worth trying different programs, in particular HDDScan, which can work in this mode (although it does not always succeed), or temporarily switching the disk to IDE compatibility mode, if possible. Further, on many motherboards the controllers to which hard drives are connected are not built into the chipset or south bridge but are implemented as separate chips. In this case the DOS version of Victoria, for example, will not see a hard drive connected to such a controller, and you will need to specify it manually by pressing the [P] key and entering the number of the channel with the disk. S.M.A.R.T. often cannot be read from USB drives, because the USB controller simply does not pass the commands for reading S.M.A.R.T. S.M.A.R.T. can almost never be read from disks operating as part of a RAID array. Here, too, it makes sense to try different programs, but with hardware RAID controllers this is useless.

If, after you buy and install a new hard drive, some program (HDD Life, Hard Drive Inspector and the like) shows that the disk has 2 hours left to live, that its performance is 27%, or that its health is 19.155% (pick whichever you like), there is no need to panic. Understand the following. Firstly, you need to look at the S.M.A.R.T. attributes themselves, and not at health and performance figures that came from nowhere (although the principle of their calculation is clear: the worst indicator is taken). Secondly, when assessing S.M.A.R.T. parameters, any such program looks at how the values of the various attributes deviate from previous readings. When a new disk is first put into use, the parameters are not constant; they take some time to stabilize. The program evaluating S.M.A.R.T. sees that the attributes are changing, does its calculations, concludes that at this rate of change the drive will soon fail, and starts signaling: “Save your data!” After some time (up to a couple of months) the attributes will stabilize (if everything really is in order with the disk), the utility will collect enough data for its statistics, and as S.M.A.R.T. stabilizes, the predicted date of the disk's death will move further and further into the future. The evaluation of Seagate and Samsung drives by such programs is a separate matter altogether. Because of the peculiarities of attributes 1, 7 and 195, programs usually conclude, even for an absolutely healthy disk, that it has wrapped itself in a sheet and is crawling to the cemetery.

Please note that the following situation is possible: all S.M.A.R.T. attributes are normal, but in fact the disk has problems, although nothing reveals this yet. This is because S.M.A.R.T. technology works only “after the fact”, i.e. the attributes change only when the disk encounters problem areas during operation. Until it comes across them, it does not know about them and therefore has nothing to record in S.M.A.R.T.

So S.M.A.R.T. is a useful technology, but it must be used wisely. Moreover, even if your disk's S.M.A.R.T. is perfect and you check the disk constantly, do not count on the disk “living” for many years to come. Hard drives tend to fail so quickly that S.M.A.R.T. simply does not have time to reflect the changed state, and it also happens that the disk has obvious problems while everything in S.M.A.R.T. is fine. You could say that a good S.M.A.R.T. does not guarantee that everything is fine with the drive, but a bad S.M.A.R.T. is guaranteed to indicate problems. Moreover, even with a bad S.M.A.R.T., utilities may report the disk status as “healthy” because the critical attributes have not reached their threshold values. Therefore it is very important to analyze S.M.A.R.T. yourself, without relying on the “verbal” verdict of programs.
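
As an illustration of looking at RAW values yourself rather than trusting a “Good” verdict, here is a minimal sketch. It checks only attributes that the descriptions above call a problem when non-zero (184, 187, 197, 198) and applies my personal limit of 10 to attribute 05. The function name, the dictionary input format and the message wording are hypothetical; obtaining the RAW values from a real drive is outside the scope of the sketch.

```python
# Attributes whose non-zero RAW value is described above as a problem.
PROBLEM_IF_NONZERO = {
    184: "End-to-End Error",
    187: "Reported Uncorrected Sector Count",
    197: "Current Pending Sector Count",
    198: "Offline Uncorrectable Sector Count",
}

# The author's personal opinion for attribute 05, not an official limit.
REALLOCATED_LIMIT = 10


def review_raw(raw_by_id: dict[int, int]) -> list[str]:
    """Return warnings for a mapping {attribute ID: RAW value}.

    Hypothetical helper: attributes missing from the mapping are treated as 0.
    """
    warnings = []
    for attr_id, name in PROBLEM_IF_NONZERO.items():
        if raw_by_id.get(attr_id, 0) != 0:
            warnings.append(f"{attr_id} {name}: non-zero RAW")
    reallocated = raw_by_id.get(5, 0)
    if reallocated >= REALLOCATED_LIMIT:
        warnings.append(
            f"5 Reallocated Sector Count: RAW {reallocated} >= {REALLOCATED_LIMIT}"
        )
    return warnings


for line in review_raw({5: 12, 197: 3, 194: 38}):
    print(line)
```

An empty result does not mean the disk is healthy; it only means that none of these particular RAW values has raised a flag yet.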

Although S.M.A.R.T. technology does work, hard drives and the concept of “reliability” are so incompatible that drives are commonly regarded simply as consumables. Well, like cartridges in a printer. Therefore, to avoid losing valuable data, periodically back it up to another medium (for example, another hard drive). The optimal approach is to keep two backup copies on two different media, not counting the hard drive with the original data. Yes, this means additional costs, but believe me: recovering information from a broken HDD will cost you many times more, if not an order of magnitude more. And even professionals cannot always recover the data. In other words, the only way to ensure reliable storage of your data is to back it up.

Finally, I will mention some programs that are well suited for S.M.A.R.T. analysis and hard drive testing: HDDScan (Windows, DOS, free), MHDD (DOS, free).