Non-bootable Windows recovery (Part 3)

Context

This is the third post in a series documenting a systematic approach to recovering a non-bootable Windows machine with Linux and open source tools. Make sure you read the first and second posts to familiarize yourself with the context and previous activities.

In a nutshell, the investigation was narrowed down to a disk that fails the S.M.A.R.T. extended self-test and misreports its Reallocated Sector Count. An initial invocation of ddrescue copied most of the unhealthy disk's content to a healthy destination in almost 7 hours. A second pass, focused on the unreadable areas only, ran for more than 40 hours and squeezed a few more sectors out of the failing disk.

It was demonstrated how to:

calculate the remaining bad sector ranges from the ddrescue map file
identify affected partitions
convert raw offsets to partition-relative sectors
find out what a bad sector range is used for by the filesystem, in this case NTFS

Previous posts stressed the importance of understanding what kind of data a bad sector range stored (such as file content, directory structure, journal, superblock or unused area) in order to evaluate the impact of permanently losing that piece of data.

During the investigation of the bad sector range, errors occurred while reading four MFT entries, 222148-222151. Addressing this potential file system corruption was postponed to a later phase of the analysis.

Carry on with identifying affected files

A total of 1081 KiB of bad sectors remained, spread across 16 areas of the disk.


[root@sysresccd /mnt/REPO]# cat bad-sector-ranges-sdb4.txt 
3733376-3733383
9217344-9217351
9220448-9220455
9240544-9240551
9324120-9324127
10061520-10061527
16845920-16845927
17032080-17032081
17032113
17032808-17032815
42682272-42682287
47500880-47500927
89082000-89082175
103508872-103509823
103707120-103707887
139004680-139004815
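The totals quoted above can be cross-checked against the range list itself. The following is a minimal sketch, not part of the original procedure: it assumes that every line holds either a single sector or an inclusive first-last range of 512-byte sectors, and the function name count_bad_sectors is made up for illustration.

```shell
# Sum the sectors listed in a range file (one "first-last" range or one
# single sector number per line). Ranges are ASSUMED to be inclusive.
count_bad_sectors() {
    local file=$1 total=0 areas=0 range first last
    while read -r range; do
        [ -z "$range" ] && continue      # skip blank lines
        first=${range%-*}                # part before the dash
        last=${range#*-}                 # part after the dash (equals first if there is no dash)
        total=$(( total + last - first + 1 ))
        areas=$(( areas + 1 ))
    done < "$file"
    echo "$areas areas, $total sectors, $(( total * 512 )) bytes"
}
```

Called as count_bad_sectors bad-sector-ranges-sdb4.txt on the list above, this should report 16 areas, 2163 sectors, 1107456 bytes, that is 1081.5 KiB, which is consistent with the 1081 KiB shown by ddrescue.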

[root@sysresccd /mnt/REPO]# cat bad-sector-ranges-sdb4.txt | while read range; do ntfscluster -s $range /dev/sda4; done > affected-files-sdb4.txt
[root@sysresccd /mnt/REPO]# cat affected-files-sdb4.txt 
Searching for sector range 3733376-3733383
Inode 112588 /ProgramData/Norton/{0C55C096-0F1D-4F28-AAA2-85EF591126E7}/NS_22.6.0.142/NCW/hlinks/ncwperfm.db.data/$DATA
* one inode found
Searching for sector range 9217344-9217351
* no inode found
Searching for sector range 9220448-9220455
* no inode found
Searching for sector range 9240544-9240551
Inode 248863 /Windows/ServiceProfiles/LocalService/AppData/Roaming/PeerNetworking/91e9854e5d4d7ac47cd0bf3ad8003817/922c235eb3d1d91e502cb02599cc24c7/grouping/edb02861.log/$DATA
* one inode found
Searching for sector range 9324120-9324127
Inode 96681 /$Extend/$UsnJrnl/$DATA($J)
* one inode found
Searching for sector range 10061520-10061527
Inode 0 /$MFT/$DATA
* one inode found
Searching for sector range 16845920-16845927
Inode 112588 /ProgramData/Norton/{0C55C096-0F1D-4F28-AAA2-85EF591126E7}/NS_22.6.0.142/NCW/hlinks/ncwperfm.db.data/$DATA
* one inode found
Searching for sector range 17032080-17032081
Inode 216164 /Windows/WinSxS/Catalogs/824be8ecd9b27e0ee92936dfba7c9f759df9d2e2999542e22d6f3bacab5b6140.cat/$DATA
* one inode found
Searching for sector 17032113
Inode 133052 /Program Files (x86)/Common Files/Adobe/OOBE/PDApp/P6/ZStringResources/es_LA/stringtable.xml/$DATA
* one inode found
Searching for sector range 17032808-17032815
Inode 75215 /Windows/WinSxS/Manifests/amd64_microsoft-windows-c..rintscan-deployment_31bf3856ad364e35_6.3.9600.16384_none_5cfa2773ab911611.manifest/$DATA
* one inode found
Searching for sector range 42682272-42682287
* no inode found
Searching for sector range 47500880-47500927
* no inode found
Searching for sector range 89082000-89082175
* no inode found
Searching for sector range 103508872-103509823
* no inode found
Searching for sector range 103707120-103707887
* no inode found
Searching for sector range 139004680-139004815
* no inode found
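The raw ntfscluster output is easier to interpret once duplicates are folded together. Here is a small sketch for that, assuming the output format shown above (lines starting with "Inode " for hits and "* no inode found" for misses); the function name summarize_affected is hypothetical.

```shell
# Condense saved ntfscluster output into a list of distinct files plus
# the number of searched ranges that matched no inode.
summarize_affected() {
    local file=$1
    # A file hit in several places (same inode and path) is listed once.
    grep '^Inode ' "$file" | sort -u
    # Count the ranges for which ntfscluster reported no owner.
    echo "ranges without inode: $(grep -c '^\* no inode found' "$file")"
}
```

Run as summarize_affected affected-files-sdb4.txt, this should list seven distinct inodes for the output above (inode 112588 appears twice) and report eight ranges without an inode.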

Interpreting the results and taking decisions

The impact of the unreadable sectors could be summarized as follows, without going into detail:

Norton Antivirus is affected: a single file is hit in two distant areas of the disk
A Windows networking-related file is affected
Two Windows Side-by-Side (WinSxS) files are affected
An Adobe OOBE Latin American Spanish language file is affected
The NTFS journal and the Master File Table are affected; the damaged MFT suggests that the list of affected files might be incomplete
Many large bad areas are reported as unallocated by the file system, but this is not a hard fact since the MFT is damaged

In order to make further progress with data recovery within a realistic timeframe, the decision was made to take a calculated risk and nuke the last 4 bad areas, which are the largest by size. This carries the risks described in the previous post, and the next post will show a more risk-averse approach to narrowing down the area whose recovery will be attempted.

The advantage of nuking the last four bad sector ranges is the ability to observe how Reallocated_Sector_Ct and Current_Pending_Sector are updated in response to the forced reallocation, and to prove the Western Digital firmware bug.

Nuking the selected bad sector ranges

The first and last sectors of a range, both relative to the partition, were used to calculate the length of the bad area. The partition's start offset on the disk (1083392 sectors here) was then added to the first sector of the range to obtain the starting position on the disk. Prior experience has shown that operating on the disk /dev/sdb is preferable to operating on the partition /dev/sdb4. Further, the output flag direct should be used to skip any caching logic, and a block size of one sector, that is, 512 bytes, has to be applied; this is also the default block size of dd, which is why no bs option appears in the commands.
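The arithmetic can be wrapped in a small helper that only prints the dd parameters and never touches the disk. This is a sketch under one stated assumption: the ranges are inclusive, as the single-sector entry 17032113 in the list suggests. Under that assumption the length is last - first + 1, one more than the plain difference used in the transcript below, so the final sector of each range would otherwise be left untouched. The function name dd_params is hypothetical.

```shell
# Print dd parameters for overwriting one partition-relative sector range
# on the whole-disk device. Nothing is written; the result is only echoed.
#   $1 = first sector, $2 = last sector (relative to the partition)
#   $3 = start of the partition on the disk, in sectors
dd_params() {
    local first=$1 last=$2 part_start=$3
    local count=$(( last - first + 1 ))    # ASSUMPTION: the range is inclusive
    local seek=$(( first + part_start ))   # absolute start sector on the disk
    echo "seek=$seek count=$count bs=512"
}
```

For example, dd_params 139004680 139004815 1083392 prints seek=140088072 count=136 bs=512. Reviewing the printed values before pasting them into a destructive dd command is a cheap safeguard against typos.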


[root@sysresccd /mnt/REPO]# echo $((139004815-139004680))
135
[root@sysresccd /mnt/REPO]# echo $((139004680+1083392))
140088072
[root@sysresccd /mnt/REPO]# dd seek=140088072 count=135 oflag=direct if=/dev/zero of=/dev/sdb
135+0 records in
135+0 records out
69120 bytes (69 kB, 68 KiB) copied, 0.427308 s, 162 kB/s
[root@sysresccd /mnt/REPO]# smartctl -a /dev/sdb
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.32-1-lts] (local build)
...
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
...
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   199   197   000    Old_age   Always       -       275
198 Offline_Uncorrectable   0x0030   199   198   000    Old_age   Offline      -       285
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   198   000    Old_age   Offline      -       289
...

[root@sysresccd /mnt/REPO]# dd skip=140088072 count=135 iflag=direct if=/dev/sdb of=/dev/null
135+0 records in
135+0 records out
69120 bytes (69 kB, 68 KiB) copied, 0.0161237 s, 4.3 MB/s

It can be seen that no error was encountered when overwriting the given sectors. The number of pending sectors decreased from 292 to 275; however, both Reallocated_Event_Count and Reallocated_Sector_Ct remained 0. With that, the false reporting of reallocations on this Western Digital Blue disk is confirmed. The careful reader will have noticed that a read test was also run on the freshly overwritten disk area, to confirm that it is readable again.
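Comparing the counters by eye across long smartctl listings is error-prone, so the relevant attributes can be filtered out and saved before and after each write. A minimal sketch, assuming the ten-column attribute table layout shown above; the function name smart_counters and the snapshot file names mentioned below are made up for illustration.

```shell
# Read `smartctl -A` output on standard input and print "name raw_value"
# for the attributes that track reallocated and pending sectors.
smart_counters() {
    # Field 1 is the attribute ID, field 2 its name, field 10 the raw value.
    awk '$1 == 5 || $1 == 196 || $1 == 197 || $1 == 198 { print $2, $10 }'
}
```

One possible use is to run smartctl -A /dev/sdb | smart_counters > smart-before.txt ahead of the dd write, repeat it into smart-after.txt afterwards, and then diff the two files.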

Next, the remaining three bad sector ranges were nuked.


[root@sysresccd /mnt/REPO]# echo count $((103707887-103707120))
count 767
[root@sysresccd /mnt/REPO]# echo first sector without offset $((103707120+1083392))
first sector without offset 104790512
[root@sysresccd /mnt/REPO]# dd seek=104790512 count=767 oflag=direct if=/dev/zero of=/dev/sdb
767+0 records in
767+0 records out
392704 bytes (393 kB, 384 KiB) copied, 0.454273 s, 864 kB/s
[root@sysresccd /mnt/REPO]# dd skip=104790512 count=767 iflag=direct if=/dev/sdb of=/dev/null
767+0 records in
767+0 records out
392704 bytes (393 kB, 384 KiB) copied, 0.0849093 s, 4.6 MB/s
[root@sysresccd /mnt/REPO]# echo count $((103508872-103509823))
count -951
[root@sysresccd /mnt/REPO]# echo first sector without offset $((103508872+1083392))
first sector without offset 104592264
[root@sysresccd /mnt/REPO]# dd seek=104592264 count=951 oflag=direct if=/dev/zero of=/dev/sdb
951+0 records in
951+0 records out
486912 bytes (487 kB, 476 KiB) copied, 0.464146 s, 1.0 MB/s
[root@sysresccd /mnt/REPO]# dd skip=104592264 count=951 iflag=direct if=/dev/sdb of=/dev/null
951+0 records in
951+0 records out
486912 bytes (487 kB, 476 KiB) copied, 0.10581 s, 4.6 MB/s
[root@sysresccd /mnt/REPO]# echo count $((89082000-89082175))
count -175
[root@sysresccd /mnt/REPO]# echo first sector without offset $((89082000+1083392))
first sector without offset 90165392
[root@sysresccd /mnt/REPO]# dd seek=90165392 count=175 oflag=direct if=/dev/zero of=/dev/sdb
175+0 records in
175+0 records out
89600 bytes (90 kB, 88 KiB) copied, 0.428108 s, 209 kB/s
[root@sysresccd /mnt/REPO]# dd skip=90165392 count=175 iflag=direct if=/dev/sdb of=/dev/null
175+0 records in
175+0 records out
89600 bytes (90 kB, 88 KiB) copied, 0.0196124 s, 4.6 MB/s
[root@sysresccd /mnt/REPO]# smartctl -A /dev/sdb
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.32-1-lts] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   199   183   051    Pre-fail  Always       -       116548
  3 Spin_Up_Time            0x0027   161   125   021    Pre-fail  Always       -       2908
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1423
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10670
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1423
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       83
193 Load_Cycle_Count        0x0032   177   177   000    Old_age   Always       -       71063
194 Temperature_Celsius     0x0022   112   107   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   197   000    Old_age   Always       -       38
198 Offline_Uncorrectable   0x0030   199   198   000    Old_age   Offline      -       285
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   198   000    Old_age   Offline      -       289

Western Digital Blue firmware pleads guilty

After nuking the four bad sector ranges, Current_Pending_Sector went down to 38 from the initial 292, but the attributes counting reallocation events and reallocated sectors both remained 0. This is hard, objective evidence that the firmware on this disk has either a bug or a fishy built-in misbehavior.

Eventually, one could overwrite the whole disk and zero out the pending sector count without leaving any trace of past reallocations, thereby making the disk look much healthier than it really is. Would selling this disk after zeroing out the pending sector count, knowing that the reallocation counters will remain zero, be an act of fraud? What about producing and selling disks which exhibit such behavior?

Second round

I turned my attention back to data recovery, to see how nuking had influenced the recovery with ddrescue.


[root@sysresccd /mnt/REPO]# mv disk-sdb-to-sda.map disk-sdb-to-sda.map.20200517T1426
[root@sysresccd ~]# ddrescue --ask --verbose --binary-prefixes --idirect --retry=20 --force /dev/sdb /dev/sda disk-sdb-to-sda.map
GNU ddrescue 1.25
About to copy 953869 MiBytes
from '/dev/sdb' [UNKNOWN] (1_000_204_886_016)
  to '/dev/sda' [UNKNOWN] (1_000_204_886_016)
Proceed (y/N)? y
    Starting positions: infile = 0 B,  outfile = 0 B
    Copy block size: 128 sectors       Initial skip size: 19584 sectors
Sector size: 512 Bytes

Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 953868 MiB, tried: 1081 KiB, bad-sector: 1081 KiB, bad areas: 16

Current status
     ipos:   2351 MiB, non-trimmed:        0 B,  current rate:       0 B/s
     opos:   2351 MiB, non-scraped:        0 B,  average rate:      88 B/s
non-tried:        0 B,  bad-sector:    68608 B,    error rate:     128 B/s
  rescued: 953869 MiB,   bad areas:       15,        run time:  3h 15m 57s
pct rescued:   99.99%, read errors:     2550,  remaining time:         n/a
                              time since last successful read:      3h 55s
Finished

[root@sysresccd /mnt/REPO]# mkdir round-2
[root@sysresccd /mnt/REPO]# mv *.txt round-2/
[root@sysresccd /mnt/REPO]# mv disk-sdb-to-sda.map.20200517T1426 round-2/

It can be clearly seen in the output that the amount of bad sectors has decreased significantly (from 1081 KiB to 68608 B), the number of bad areas has decreased from 16 to 15, and the execution time has dropped drastically, from 40 hours to around 3. Although the results improved, it should be noted that we permanently lost the content of the four bad sector ranges. These were most likely unallocated, but due to the damaged MFT there is no hard evidence for this. We took a calculated risk in order to pinpoint the firmware misbehavior.
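The figures ddrescue prints at the end of a run can also be recomputed from the map file, which makes it easy to compare rounds later. The sketch below rests on assumptions about the map file layout: comment lines start with #, data lines consist of position, size and status with both numbers in hexadecimal, and a status of - marks a bad-sector block. The function name mapfile_bad_bytes is hypothetical.

```shell
# Sum the sizes of all blocks marked '-' (bad sector) in a ddrescue map file.
mapfile_bad_bytes() {
    local file=$1 total=0 areas=0 pos size status
    while read -r pos size status; do
        case $pos in '#'*|'') continue ;; esac   # skip comments and blank lines
        # The status line (position, status, pass) has a number in the third
        # field, so it never matches '-' and is skipped here as well.
        [ "$status" = "-" ] || continue
        total=$(( total + size ))                # hex sizes are valid in shell arithmetic
        areas=$(( areas + 1 ))
    done < "$file"
    echo "bad areas: $areas, bad bytes: $total"
}
```

Running mapfile_bad_bytes disk-sdb-to-sda.map after each round, and keeping the output next to the archived map files, gives a compact record of how the damage shrinks over time.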

The next post will cover a more advanced approach to progress data recovery even further, without overwriting any sector on the failing disk. Stay tuned!

Tuxicate - linux tweaks

Wednesday, 10 June 2020