Context
This is the third post in a series dedicated to documenting a systematic approach of recovering a non-bootable Windows machine with Linux and open source tools. Make sure you read the first and second to familiarize yourself with the context and previous activities.
In a nutshell, the investigation was narrowed down to a disk that fails S.M.A.R.T. extended self test, and misreports Reallocated Sector Count
. An initial invocation of ddrescue
copied most of the unhealthy disk's content to a healthy destination in almost 7 hours, and a second pass ran for more than 40 hours to focus on the unreadable areas only, which has successfully squeezed a few more sectors out of the failing disk.
It was demonstrated, how to:
- calculate the remaining bad sector ranged from the map file of
ddrescue
- identify affected partitions
- convert raw offsets to partition-relative sectors
- find out, what one bad sector range is used for by the filesystem, in this case, NTFS
Previous posts stressed the importance of understanding, what kind of data a bad sector range stored (such as file content, directory structure, journal, superblock or unused area) in order to evaluate the impact of permanently loosing that piece of data.
During the investigation of the bad sector range there were errors reading 4 MFT entries 222148-222151. Addressing this potential file system corruption was postponed to a later phase of the analysis.
Carry on with identifying affected files
A total of 1081 kilobytes of bad sectory remained, spread across 16 areas of the disk.
[root@sysresccd /mnt/REPO]# cat bad-sector-ranges-sdb4.txt
3733376-3733383
9217344-9217351
9220448-9220455
9240544-9240551
9324120-9324127
10061520-10061527
16845920-16845927
17032080-17032081
17032113
17032808-17032815
42682272-42682287
47500880-47500927
89082000-89082175
103508872-103509823
103707120-103707887
139004680-139004815
[root@sysresccd /mnt/REPO]# cat bad-sector-ranges-sdb4.txt | while read range; do ntfscluster -s $range /dev/sda4; done > affected-files-sdb4.txt
[root@sysresccd /mnt/REPO]# cat affected-files-sdb4.txt
Searching for sector range 3733376-3733383
Inode 112588 /ProgramData/Norton/{0C55C096-0F1D-4F28-AAA2-85EF591126E7}/NS_22.6.0.142/NCW/hlinks/ncwperfm.db.data/$DATA
* one inode found
Searching for sector range 9217344-9217351
* no inode found
Searching for sector range 9220448-9220455
* no inode found
Searching for sector range 9240544-9240551
Inode 248863 /Windows/ServiceProfiles/LocalService/AppData/Roaming/PeerNetworking/91e9854e5d4d7ac47cd0bf3ad8003817/922c235eb3d1d91e502cb02599cc24c7/grouping/edb02861.log/$DATA
* one inode found
Searching for sector range 9324120-9324127
Inode 96681 /$Extend/$UsnJrnl/$DATA($J)
* one inode found
Searching for sector range 10061520-10061527
Inode 0 /$MFT/$DATA
* one inode found
Searching for sector range 16845920-16845927
Inode 112588 /ProgramData/Norton/{0C55C096-0F1D-4F28-AAA2-85EF591126E7}/NS_22.6.0.142/NCW/hlinks/ncwperfm.db.data/$DATA
* one inode found
Searching for sector range 17032080-17032081
Inode 216164 /Windows/WinSxS/Catalogs/824be8ecd9b27e0ee92936dfba7c9f759df9d2e2999542e22d6f3bacab5b6140.cat/$DATA
* one inode found
Searching for sector 17032113
Inode 133052 /Program Files (x86)/Common Files/Adobe/OOBE/PDApp/P6/ZStringResources/es_LA/stringtable.xml/$DATA
* one inode found
Searching for sector range 17032808-17032815
Inode 75215 /Windows/WinSxS/Manifests/amd64_microsoft-windows-c..rintscan-deployment_31bf3856ad364e35_6.3.9600.16384_none_5cfa2773ab911611.manifest/$DATA
* one inode found
Searching for sector range 42682272-42682287
* no inode found
Searching for sector range 47500880-47500927
* no inode found
Searching for sector range 89082000-89082175
* no inode found
Searching for sector range 103508872-103509823
* no inode found
Searching for sector range 103707120-103707887
* no inode found
Searching for sector range 139004680-139004815
* no inode found
Interpreting the results and taking decisions
The impact of the unreadable sectors could be summarized as follows, without going into detail:
- Norton Antivirus is affected, a single file at two distant areas on the disk
- Windows networking related file is affected
- Two of Windows Side-by-Side assemblies are affected
- Adobe OOBE Latin American Spanish language file is affected
- The NTFS journal and Master File Table are affected, the damaged MFT suggests that the list of affected files might be incomplete
- Many large bad areas are reported as unallocated by the file system, but this is not a hard fact since the MFT is damaged
In order to make further progress with data recover within a realistic timeframe, the decision was made to take calculated risks and nuke the last 4 bad areas, which are the largest by size. This carries risks described in the previous post, and the next post will show a more risk-avoiding approach to narrowing down the the area the recovery of which will be attempted.
The advantage of nuking the last four bad sector ranges is the ability to obverse how Reallocated_Sector_Ct
and Current_Pending_Sector
are updated in response to the forced reallocation, and prove the Western Digital firmware bug.
Nuking the selected bad sector ranges
The first and last sector relative to the partition was used to calculate the lenght of the bad area and then the offset of the partition was added to the offset of the first sector of the sector range within the partition to calculate the starting position within the disk. Prior experience has shown that operating on the disk /dev/sdb
is preferred to operating on the partition./dev/sdb4
. Further, output flag direct
should be used skip any caching logic, and a block size one sector, that is, 512 bytes has to be applied.
[root@sysresccd /mnt/REPO]# echo $((139004815-139004680))
135
[root@sysresccd /mnt/REPO]# echo $((139004680+1083392))
140088072
[root@sysresccd /mnt/REPO]# dd seek=140088072 count=135 oflag=direct if=/dev/zero of=/dev/sdb
135+0 records in
135+0 records out
69120 bytes (69 kB, 68 KiB) copied, 0.427308 s, 162 kB/s
[root@sysresccd /mnt/REPO]# smartctl -a /dev/sdb
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.32-1-lts] (local build)
...
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
...
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 199 197 000 Old_age Always - 275
198 Offline_Uncorrectable 0x0030 199 198 000 Old_age Offline - 285
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 198 000 Old_age Offline - 289
...
[root@sysresccd /mnt/REPO]# dd skip=140088072 count=135 iflag=direct if=/dev/sdb of=/dev/null
135+0 records in
135+0 records out
69120 bytes (69 kB, 68 KiB) copied, 0.0161237 s, 4.3 MB/s
It can be seen that no error was encoutered when overwriting the given sectors. The number of pendig sectors decreased from 292 ro 275, however, both Reallocated_Event_Count
and Reallocated_Sector_Ct
remained 0. With that, the false reporting of reallocations on this Western Digital Blue disk is confirmed. The careful reader should have noticed, that a read test was also done on the freshly overwritten disk area to confirm successful reallocation.
Next, the remaining three bad sector ranges were nuked.
[root@sysresccd /mnt/REPO]# echo count $((103707887-103707120))
count 767
[root@sysresccd /mnt/REPO]# echo first sector without offset $((103707120+1083392))
first sector without offset 104790512
[root@sysresccd /mnt/REPO]# dd seek=104790512 count=767 oflag=direct if=/dev/zero of=/dev/sdb
767+0 records in
767+0 records out
392704 bytes (393 kB, 384 KiB) copied, 0.454273 s, 864 kB/s
[root@sysresccd /mnt/REPO]# dd skip=104790512 count=767 iflag=direct if=/dev/sdb of=/dev/null
767+0 records in
767+0 records out
392704 bytes (393 kB, 384 KiB) copied, 0.0849093 s, 4.6 MB/s
[root@sysresccd /mnt/REPO]# echo count $((103508872-103509823))
count -951
[root@sysresccd /mnt/REPO]# echo first sector without offset $((103508872+1083392))
first sector without offset 104592264
[root@sysresccd /mnt/REPO]# dd seek=104592264 count=951 oflag=direct if=/dev/zero of=/dev/sdb
951+0 records in
951+0 records out
486912 bytes (487 kB, 476 KiB) copied, 0.464146 s, 1.0 MB/s
[root@sysresccd /mnt/REPO]# dd skip=104592264 count=951 iflag=direct if=/dev/sdb of=/dev/null
951+0 records in
951+0 records out
486912 bytes (487 kB, 476 KiB) copied, 0.10581 s, 4.6 MB/s
[root@sysresccd /mnt/REPO]# echo count $((89082000-89082175))
count -175
[root@sysresccd /mnt/REPO]# echo first sector without offset $((89082000+1083392))
first sector without offset 90165392
[root@sysresccd /mnt/REPO]# dd seek=90165392 count=175 oflag=direct if=/dev/zero of=/dev/sdb
175+0 records in
175+0 records out
89600 bytes (90 kB, 88 KiB) copied, 0.428108 s, 209 kB/s
[root@sysresccd /mnt/REPO]# dd skip=90165392 count=175 iflag=direct if=/dev/sdb of=/dev/null
175+0 records in
175+0 records out
89600 bytes (90 kB, 88 KiB) copied, 0.0196124 s, 4.6 MB/s
[root@sysresccd /mnt/REPO]# smartctl -A /dev/sdb
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.32-1-lts] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 199 183 051 Pre-fail Always - 116548
3 Spin_Up_Time 0x0027 161 125 021 Pre-fail Always - 2908
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 1423
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 086 086 000 Old_age Always - 10670
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 1423
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 83
193 Load_Cycle_Count 0x0032 177 177 000 Old_age Always - 71063
194 Temperature_Celsius 0x0022 112 107 000 Old_age Always - 31
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 197 000 Old_age Always - 38
198 Offline_Uncorrectable 0x0030 199 198 000 Old_age Offline - 285
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 198 000 Old_age Offline - 289
Westerd Digital BLUE firmware plead guilty
After nuking the four bad sector ranges, Current_Pending_Sector
went down to 38 from the initial 292, but the attributes counting reallocation events and the count of reallocated sectors remained 0. This is an objective hard evidence that the firmware on this disk has either a bug or a fishy built-in misbehavior.
Eventually, one could overwrite the whole disk and zero out the pending sector count, without leaving any trace of past reallocations, thereby letting the disk look much more healthy that it is in reality. Would selling this disk after zeroing out the pending sector count knowing that the reallocation counters will remain zero, be an act of fraud? What about producing and selling disks which expose such behavior?
Second round
I turned my attention back to data recovery, to see how nuking had influenced the recovery with ddrescue
.
[root@sysresccd /mnt/REPO]# mv disk-sdb-to-sda.map disk-sdb-to-sda.map.20200517T1426
[root@sysresccd ~]# ddrescue --ask --verbose --binary-prefixes --idirect --retry=20 --force /dev/sdb /dev/sda disk-sdb-to-sda.map
GNU ddrescue 1.25
About to copy 953869 MiBytes
from '/dev/sdb' [UNKNOWN] (1_000_204_886_016)
to '/dev/sda' [UNKNOWN] (1_000_204_886_016)
Proceed (y/N)? y
Starting positions: infile = 0 B, outfile = 0 B
Copy block size: 128 sectors Initial skip size: 19584 sectors
Sector size: 512 Bytes
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 953868 MiB, tried: 1081 KiB, bad-sector: 1081 KiB, bad areas: 16
Current status
ipos: 2351 MiB, non-trimmed: 0 B, current rate: 0 B/s
opos: 2351 MiB, non-scraped: 0 B, average rate: 88 B/s
non-tried: 0 B, bad-sector: 68608 B, error rate: 128 B/s
rescued: 953869 MiB, bad areas: 15, run time: 3h 15m 57s
pct rescued: 99.99%, read errors: 2550, remaining time: n/a
time since last successful read: 3h 55s
Finished
[root@sysresccd /mnt/REPO]# mkdir round-2
[root@sysresccd /mnt/REPO]# mv *.txt round-2/
[root@sysresccd /mnt/REPO]# mv disk-sdb-to-sda.map.20200517T1426 round-2/
It can be clearly seen in the output that bad sectors have significantly decreased (from 1081 KiB to 68608 B) and the number of bad areas decreased from 16 to 15, and execution time has drastically reduced from 40 hours to around 3. Although the results improved, it should be noted that we permanently lost the content of the four bad sector ranges which were most likely unallocated, but due to the damaged MFT, there is no hard evindence for this. We took a calculated risk in order pinpoint the firmware misbehavior.
The next post will cover a more advanced approach to progress data recovery even further, without overwriting any sector on the failing disk. Stay tuned!
No comments:
Post a Comment