Perusal, Synthesis, Bliss

May 21, 2017: trying to repair an external hard disk Toshiba MQ01ABD075 (750 GB)

Introduction

I use a Toshiba MQ01ABD075 (750 GB) in an external USB enclosure for backup purposes (see item of October 27, 2012: I did not manage to stop its load/unload cycles, so I removed it from my laptop). It has recently shown I/O errors during write operations. I have already managed to repair a hard drive with a low-level formatting operation (see item of November 18, 2012: Seagate Momentus 5400.6 (500 GB)); conversely, it has also happened that this did not work (see item of November 14, 2015: Samsung G2 HM502JX (500 GB)). I am going to try the same procedure on this one. Note that so far no Seagate hard drive has failed me completely, unlike Toshiba (see item of October 27, 2012), Western Digital (see item of July 13, 2013), and Samsung (see item of November 14, 2015).

Possible important note about Seagate hard drives

On 23/10/2015 I ordered from GrosBill:
HARD_DRIVE SEAGATE Momentus SpinPoint M9T - 2 TB - PS4 compatible 2.5 inch - 5400 rpm - 8 MB - SATA II - 9.5mm - OEM version (no box or manual)
But the label on the drive reads:
Samsung Spinpoint
Momentus
ST2000LM003
PN: HN-M201RAD/Z4
HDD Mfg by Seagate Technology LLC
So it seems that it is not 100% a Seagate product: it has some Samsung genes. By contrast, the Seagate Momentus 5400.6 mentioned above bears no mention of Samsung. This may be explained by the following:
Samsung sold its hard disk division to Seagate in April 2011. The acquisition was approved by the European Commission on 20 October 2011, the Commission not considering the purchase of this division likely to disturb competition in the storage market[3].
So we may expect that, although "Samsung" is written on it, it is really a Seagate disk under the hood. Moreover:
In 2012, Seagate bought LaCie[4].
[...]
Its main competitors are Western Digital, Hitachi GST (bought by Western Digital on 7 March 2011), Toshiba and Fujitsu.
And indeed on the LDLC website:
HDD_brands.png
According to this source (here), HGST (Hitachi GST) hard disks are by far the most reliable. I have never owned one; it could be my choice in the future, since some people are not satisfied with Seagate (here). But after some searching on the internet: almost all Hitachi models are 3.5’’, they are very expensive, and they are hard to find (only one model on LDLC, as shown above). So I would rather choose a Seagate, which is quiet.
HDD_failure.png

Applying the procedure

$ sudo badblocks /dev/sdc1
[sudo] password for jscordi: 
572523040
572523041
572523042
572523043
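badblocks reports block numbers in its default unit of 1024 bytes, counted from the start of the scanned device (here the partition /dev/sdc1, not the whole disk). The badblocks man page warns that if its output is fed to mke2fs or e2fsck, the block size must match the filesystem's, which is 4 KiB here according to the mke2fs output below. A minimal Python sketch of the conversion, assuming 1 KiB input blocks and 4 KiB filesystem blocks (the function names are hypothetical):

```python
BADBLOCKS_UNIT = 1024  # badblocks default block size (-b), in bytes
FS_BLOCK = 4096        # ext4 block size chosen by mke2fs here ("4k blocks")

def to_fs_blocks(bad_blocks, unit=BADBLOCKS_UNIT, fs_block=FS_BLOCK):
    """Map badblocks numbers (in `unit`-byte blocks) to the sorted,
    de-duplicated filesystem block numbers that contain them."""
    return sorted({(b * unit) // fs_block for b in bad_blocks})

def byte_offset(block, unit=BADBLOCKS_UNIT):
    """Byte offset of a badblocks block from the start of the partition."""
    return block * unit

bad = [572523040, 572523041, 572523042, 572523043]
print(to_fs_blocks(bad))    # prints [143130760]
print(byte_offset(bad[0]))  # prints 586263592960 (about 586 GB into the partition)
```

All four reported 1 KiB blocks fall into a single 4 KiB filesystem block. Running badblocks with `-b 4096` from the start, saving the list with `-o file` and passing it to `mke2fs -l file` or `e2fsck -l file`, avoids the conversion altogether.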
$ sudo mke2fs -cc -t ext4 /dev/sdc1
mke2fs 1.43.4 (31-Jan-2017)
/dev/sdc1 contains a crypto_LUKS file system
Proceed anyway? (y,N) y
Creating filesystem with 183143168 4k blocks and 45793280 inodes
Filesystem UUID: c2a9e383-f324-4e1b-ba2f-39d504f8aefe
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
        102400000
Testing with pattern 0xaa: done                                                 
Reading and comparing: done                                                 
Testing with pattern 0x55: done                                                 
Reading and comparing: done                                                 
Testing with pattern 0xff: done                                                 
Reading and comparing: done                                                 
Testing with pattern 0x00: done                                                 
Reading and comparing: done                                                 
Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done
$ sudo badblocks /dev/sdc1
$
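With -cc, mke2fs runs badblocks in its slower read-write mode: each of the four patterns shown above (0xaa, 0x55, 0xff, 0x00) is written over the whole partition, then read back and compared. This destroys the existing data, hence the warning about the crypto_LUKS file system. The Python sketch below mimics that logic on an in-memory stand-in for a disk; the FakeDisk class and its stuck-bit fault are purely hypothetical, chosen only to show why a block that cannot hold a pattern ends up in the list:

```python
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)  # order printed by mke2fs -cc above
BLOCK = 1024                          # badblocks default block size in bytes

class FakeDisk:
    """Hypothetical in-memory disk; one block has its lowest bit stuck at 0."""
    def __init__(self, n_blocks, stuck_block=None):
        self.n_blocks = n_blocks
        self.stuck_block = stuck_block
        self.data = bytearray(n_blocks * BLOCK)

    def write(self, block, payload):
        self.data[block * BLOCK:(block + 1) * BLOCK] = payload
        if block == self.stuck_block:
            self.data[block * BLOCK] &= 0xFE  # this bit can never be set

    def read(self, block):
        return bytes(self.data[block * BLOCK:(block + 1) * BLOCK])

def read_write_test(disk):
    """Write each pattern to every block, then read every block back and compare."""
    bad = set()
    for pattern in PATTERNS:
        payload = bytes([pattern]) * BLOCK
        for b in range(disk.n_blocks):      # "Testing with pattern ..."
            disk.write(b, payload)
        for b in range(disk.n_blocks):      # "Reading and comparing"
            try:
                ok = disk.read(b) == payload
            except OSError:                 # an unreadable block also counts as bad
                ok = False
            if not ok:
                bad.add(b)
    return sorted(bad)

print(read_write_test(FakeDisk(16, stuck_block=5)))  # prints [5]
```

Block 5 passes with 0xaa and 0x00 (lowest bit 0) but fails with 0x55 and 0xff, which is why several complementary patterns are used. Writing over every sector may also plausibly explain why the procedure sometimes "repairs" a disk: a sector the drive could not read gets rewritten, giving its firmware a chance to remap it to a spare one.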
Then I created an encrypted partition with the openSUSE disk partitioning tool, followed by a new bad block scan:
$ sudo badblocks /dev/sdc1
$
Then I backed up my computer and ran badblocks again:
$ sudo badblocks /dev/sdc1
$
No problem this time, unlike what happened with the same procedure on some other external hard disks. Let us cross our fingers and wait.
Update, two backups later: still no problem.

Update on November 27, 2017: I/O errors appear again during backup. I apply the procedure again:

$ sudo badblocks /dev/sdc1
[sudo] password for julien-scordia: 
1773500
1773501
1773504
1773505
1773506
1773508
1773509
[...]
$ sudo mke2fs -cc -t ext4 /dev/sdc1
mke2fs 1.43.5 (04-Aug-2017)
/dev/sdc1 contains a crypto_LUKS file system
Proceed anyway? (y,N) y
Creating filesystem with 183143168 4k blocks and 45793280 inodes
Filesystem UUID: a6aeca49-8dfd-45bc-8763-b1a5e8b3ff6c
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
        102400000
Testing with pattern 0xaa: done                                                 
Reading and comparing: done                                                 
Testing with pattern 0x55: done                                                 
Reading and comparing: done                                                 
Testing with pattern 0xff: done                                                 
Reading and comparing: done                                                 
Testing with pattern 0x00: done                                                 
Reading and comparing: done                                                 
Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done     
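The list of backup superblocks printed by mke2fs (identical in both runs, since the partition did not change) can be recomputed. Assuming the defaults visible here, 4 KiB blocks with 32768 blocks per group and the sparse_super feature, backups are stored in group 1 and in the groups whose number is a power of 3, 5 or 7, while group 0 holds the primary superblock. A short Python sketch (the function name is hypothetical):

```python
def backup_superblocks(total_blocks, blocks_per_group=32768):
    """Block numbers of the backup superblocks under ext2/3/4 sparse_super,
    assuming 4 KiB blocks (so group g starts at block g * blocks_per_group)."""
    n_groups = -(-total_blocks // blocks_per_group)  # ceiling division
    groups = {1}
    for base in (3, 5, 7):
        power = base
        while power < n_groups:
            groups.add(power)
            power *= base
    return [g * blocks_per_group for g in sorted(groups) if g < n_groups]

print(backup_superblocks(183143168))
```

For the 183143168 blocks of this partition (5590 groups), this yields the same 17 block numbers as mke2fs, from 32768 up to 102400000. These copies are what `e2fsck -b 32768` can fall back on if the primary superblock gets damaged.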

But this time, bad blocks are still reported right after mke2fs has finished; fewer than before, but some remain:
$ sudo badblocks /dev/sdc1
[sudo] password for julien-scordia: 
639636616
639636617
639636618
639636619
713033924
713033925
713033926
713033927
713033928
713033929
713033930
713033931
713041240
713041241
713041242
713041243
$
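The remaining bad blocks are not scattered: they form a few contiguous runs. A small Python sketch (hypothetical helper, assuming badblocks' default 1 KiB unit) that groups the list above into runs:

```python
def group_runs(blocks):
    """Group block numbers into (first, last) runs of consecutive blocks."""
    runs = []
    for b in sorted(blocks):
        if runs and b == runs[-1][1] + 1:
            runs[-1][1] = b          # extend the current run
        else:
            runs.append([b, b])      # start a new run
    return [tuple(r) for r in runs]

bad = (list(range(639636616, 639636620))
       + list(range(713033924, 713033932))
       + list(range(713041240, 713041244)))
for first, last in group_runs(bad):
    size_kib = last - first + 1      # one badblocks block = 1 KiB by default
    print(f"{first}-{last}: {size_kib} KiB, starts on a 4 KiB boundary: {first % 4 == 0}")
```

The 16 reported blocks thus amount to three runs of 4, 8 and 4 KiB, each starting on a 4 KiB boundary. This is consistent with whole 4 KiB units failing, though the sketch alone cannot say why.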
Therefore I consider the disk dead. I tried to make the "partial" backup (i.e. without the download directory) on the Seagate Momentus 5400.6 (500 GB, see item of November 18, 2012), but it is too small. So I need to buy a new backup disk.

Choosing a new hard disk

Today there are special disks designed for NAS, i.e. made to run all the time: here.
NAS_vs_DAS.png
My use is a backup every 3 days, i.e. the disk does NOT run all the time. But I may take advantage of this situation to replace my 1 TB 2.5’’ internal hard disk with a better one (quieter, larger capacity).
According to this source (here), 5400 rpm drives are quieter. There are benchmarks with noise levels, but mostly for 3.5’’ hard drives: here. My last disk was ordered on July 15, 2013:
hard_drive SEAGATE Momentus SpinPoint M8 - 2.5 inch - 1 TB - 5400 rpm - 8 MB - SATA II - 9.5mm
But it was not that quiet. Some 2.5’’ benchmarks exist, but without noise measurements: here.
Arguments for choosing the disk format: a priori, we may think that 3.5’’ HDs are more reliable than 2.5’’ ones. Moreover, I could perhaps find a Hitachi HD in this format more easily than a 2.5’’ one. But:
So it seems far better to buy a 2.5’’ Seagate. Note that there is no need to buy too much capacity (the 2.5’’ Seagate BarraCuda goes up to 5 TB for 200 €): it would not be used, and the disk may crash within 3 years, by which time I will probably be using only about 2 TB. Taking the size just above today's usage seems better; I use a bit more than 1 TB for all my data and films, so 2 TB or 3 TB is OK. I take 3 TB:
Seagate BarraCuda 3 TB (ST3000LM024) 2.5" hard drive 3 TB 5400 RPM 128 MB Serial ATA 6 Gb/s
It runs at 5400 rpm, so it should be quieter than a 7200 rpm drive (but other factors matter for noise). Update: it is too thick (twice the thickness of a 2 TB model): I sent it back to LDLC and bought a 2 TB model instead:
Seagate BarraCuda 2 TB (ST2000LM015) 2.5" 7mm hard drive 2 TB 5400 RPM 128 MB Serial ATA 6 Gb/s
This one works perfectly. I used the openSUSE YaST partitioner to create an encrypted partition, as usual. It proposes XFS as the default file system; I chose ext4 instead, since according to some of my readings XFS may lose data on power failure. To install the hard drive in my computer, I successfully used the procedure given in §14 (above). The hard disk appears to be new, as expected: after one week of use I get:
$ sudo smartctl -a /dev/sdb
[...]
=== START OF INFORMATION SECTION ===
Device Model:     ST2000LM015-2E8174
Serial Number:    ZDZ06JE5
LU WWN Device Id: 5 000c50 0a404ab0d
[...]
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   076   064   006    Pre-fail  Always       -       44556308
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       33
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   063   060   045    Pre-fail  Always       -       1935914
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       59 (86 179 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       31
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   045   035   040    Old_age   Always   In_the_past 55 (Min/Max 21/56 #94)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       62
194 Temperature_Celsius     0x0022   055   065   000    Old_age   Always       -       55 (0 14 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       46 (43 115 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1998867945
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       976311577
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

According to this source (here), the important attributes for estimating ageing are Reallocated_Sector_Ct and Current_Pending_Sector. Another source (here) describes:
a THRESH value (the limit value before performance degrades and the risk of failure becomes high: if VALUE is less than or equal to THRESH, the disk is at risk of failing. WORST is the lowest VALUE ever recorded.)
whereas the actual value of the attribute is RAW_VALUE:
a raw value RAW_VALUE (the raw value is the measured value of the attribute. For the “Temperature” attribute, it is the temperature of the hard disk.)
Here we have 62 load cycles (so no problem of overly frequent head parking), and no reallocated sectors.
Here I have THRESH = 36 and VALUE = 100 for Reallocated_Sector_Ct, and THRESH = 0, VALUE = 100 for Load_Cycle_Count and Current_Pending_Sector.
The meaning of Old_age and Pre-fail is the following:
The Old-age attribute type indicates that:
if VALUE is below THRESH, the product is at the end of its life due to normal wear.
The Pre-fail attribute type indicates that:
if VALUE is below THRESH, a failure is imminent and a replacement must be planned.
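These rules can be applied mechanically to the attribute table printed by smartctl. The Python sketch below is only an illustration, not smartmontools' own code: it assumes the column layout shown above, treats an attribute as failing when VALUE ≤ THRESH (the first quote uses ≤, the last two use <) and as failed in the past when WORST ≤ THRESH, and skips attributes whose THRESH is 0, which, as far as I understand the ATA convention, means "always passing". The function names are hypothetical:

```python
def parse_attributes(text):
    """Parse the SMART attribute rows printed by smartctl (layout shown above)."""
    attrs = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "type": fields[6],             # Pre-fail or Old_age
            "when_failed": fields[8],
            "raw": " ".join(fields[9:]),
        })
    return attrs

def assess(attr):
    """Interpret VALUE, WORST and THRESH as described in the quotes above."""
    if attr["thresh"] == 0:
        return "no threshold"
    if attr["value"] <= attr["thresh"]:
        return ("failure imminent" if attr["type"] == "Pre-fail"
                else "worn out (end of life)")
    if attr["worst"] <= attr["thresh"]:
        return "threshold crossed in the past"
    return "ok"

sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
190 Airflow_Temperature_Cel 0x0022   045   035   040    Old_age   Always   In_the_past 55 (Min/Max 21/56 #94)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""
for a in parse_attributes(sample):
    print(a["name"], "->", assess(a))
```

On the new drive, the only attribute this flags is Airflow_Temperature_Cel: its WORST (35) is below its THRESH (40), which smartctl itself reports as In_the_past, since the drive has already reached 56 °C at some point. The two ageing indicators, Reallocated_Sector_Ct and Current_Pending_Sector, both have a raw value of 0.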
On my previous hard drive (now in an external enclosure for backup), bought in 2012:
$ sudo smartctl -a /dev/sdc 
[...]
Model Family:     Seagate Samsung SpinPoint M8 (AF)
Device Model:     ST1000LM024 HN-M101MBB
Serial Number:    S2TPJ9GC401022
LU WWN Device Id: 5 0004cf 2073359cc
[...]
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       93
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   086   074   025    Pre-fail  Always       -       4470
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3162
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       15774
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       13
 12 Power_Cycle_Count       0x0032   097   097   000    Old_age   Always       -       3136
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       29
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   032   000    Old_age   Always       -       18 (Min/Max 14/68)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       12542
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       13
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       9354
So this HDD shows no more problems than the new one.
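To back this comparison with numbers, here is a small sketch using raw values copied from the two tables above: Power_On_Hours, Load_Cycle_Count (attribute 193 on the new drive, 225 on the old one), Reallocated_Sector_Ct and Current_Pending_Sector. The dictionary layout and function name are only illustrative; only the numbers come from smartctl:

```python
# Raw values copied from the two smartctl tables above.
drives = {
    "ST2000LM015 (new)": {"hours": 59, "load_cycles": 62,
                          "reallocated": 0, "pending": 0},
    "ST1000LM024 (previous)": {"hours": 15774, "load_cycles": 9354,
                               "reallocated": 0, "pending": 0},
}

def summary(d):
    """Load cycles per power-on hour, and whether any sector is remapped or pending."""
    rate = d["load_cycles"] / d["hours"]
    damaged = d["reallocated"] + d["pending"] > 0
    return round(rate, 2), damaged

for name, d in drives.items():
    rate, damaged = summary(d)
    print(f"{name}: {rate} load cycles/hour, bad sectors: {damaged}")
```

Neither drive has reallocated or pending sectors, and both park their heads about once an hour or less (1.05 and 0.59 load cycles per power-on hour).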
Note: I have not been able to make smartctl work on an old (more than 10 years old) IDE disk:
$ sudo smartctl -a /dev/sdc
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.13.0-17-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
Read Device Identity failed: scsi error unsupported scsi opcode
A mandatory SMART command failed: exiting. To continue, add one or more ’-T permissive’ options.
But garbage is obtained when using the permissive option (see also here):
$ sudo smartctl -T permissive -a /dev/sdc     
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.13.0-17-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
Read Device Identity failed: scsi error unsupported scsi opcode
=== START OF INFORMATION SECTION ===
Device Model:     ����k�h_D=��B�$��P��I���nhj�
(>$��5�h�
Serial Number:    yٴ>�‘�#�r
                           0���*��e
Add. Product Id:  "�Pקj�
Firmware Version: ���m��$Q
Rotation Rate:    21134 rpm
Form Factor:      Unknown (0xcbca)
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   Unknown(0x0abf) (unknown minor revision code: 0x7339)
Transport Type:   Unknown (0x6a27)
Local Time is:    Sun Dec 24 20:49:34 2017 CET
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 82-83 don’t show if SMART supported.
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 85-87 don’t show if SMART is enabled.
A mandatory SMART command failed: exiting. To continue, add one or more ’-T permissive’ options.