Non-bootable Windows recovery (Part 2)

Context

This is the second post in a series documenting a systematic approach to recovering a non-bootable Windows machine with Linux and open source tools. Make sure you read the first post to familiarize yourself with the context and previous activities.

In a nutshell, the investigation was narrowed down to a disk that fails the S.M.A.R.T. extended self-test and misreports Reallocated Sector Count, and potentially other S.M.A.R.T. attributes, due to a firmware bug or an intentional "vendor feature".

The LBA (logical block address) of the first read error is known from the self-test log, and various implications of manually overwriting bad sectors via dd have been covered:

Understanding what kind of data a particular sector stored (such as file content, directory structure, journal, superblock or unused area) is key to assessing the risk of nuking that sector in order to force reallocation.
Repeatedly reading a bad sector may eventually succeed, but stressing a mechanically failing disk for a long time might render it unusable and make further recovery impossible.
Continued production use of a disk that seems healthy after its bad sectors were fixed manually carries a risk, as opposed to decommissioning the drive right after recovery.

First round

The decision was taken to clone the 1TB disk holding medical/business data to a new 2TB disk, and to re-purpose the 1TB disk during the recovery process. Rather than working with disk images, this particular case allowed recovering data from the failing 1TB drive /dev/sdb directly to this known-good disk. Time pressure, the lack of spare disk capacity and previous testing of the disks justified this shortcut.


[root@sysresccd ~]# ddrescue --ask --verbose --binary-prefixes --force /dev/sdb /dev/sda disk-sdb-to-sda.map
GNU ddrescue 1.25
About to copy 953869 MiBytes
from '/dev/sdb' [UNKNOWN] (1_000_204_886_016)
  to '/dev/sda' [UNKNOWN] (1_000_204_886_016)
Proceed (y/N)? y
    Starting positions: infile = 0 B,  outfile = 0 B
    Copy block size: 128 sectors       Initial skip size: 19584 sectors
Sector size: 512 Bytes

Press Ctrl-C to interrupt
     ipos:  68402 MiB, non-trimmed:        0 B,  current rate:       0 B/s
     opos:  68402 MiB, non-scraped:        0 B,  average rate: 39780 KiB/s
non-tried:        0 B,  bad-sector:   1094 KiB,    error rate:     102 B/s
  rescued: 953868 MiB,   bad areas:       25,        run time:  6h 49m 14s
pct rescued:   99.99%, read errors:     2225,  remaining time:         n/a
                              time since last successful read:  2h 44m 54s
Finished                                  

[root@sysresccd ~]# ddrescue --ask --verbose --binary-prefixes --idirect --retry=20 --force /dev/sdb /dev/sda disk-sdb-to-sda.map
GNU ddrescue 1.25
About to copy 953869 MiBytes
from '/dev/sdb' [UNKNOWN] (1_000_204_886_016)
  to '/dev/sda' [UNKNOWN] (1_000_204_886_016)
Proceed (y/N)? y
    Starting positions: infile = 0 B,  outfile = 0 B
    Copy block size: 128 sectors       Initial skip size: 19584 sectors
Sector size: 512 Bytes

Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 953868 MiB, tried: 1094 KiB, bad-sector: 1094 KiB, bad areas: 25

Current status
     ipos:  51070 MiB, non-trimmed:        0 B,  current rate:       0 B/s
     opos:  51070 MiB, non-scraped:        0 B,  average rate:       0 B/s
non-tried:        0 B,  bad-sector:   1081 KiB,    error rate:       0 B/s
  rescued: 953868 MiB,   bad areas:       16,        run time:  1d 17h 14m
pct rescued:   99.99%, read errors:    31116,  remaining time:         n/a
                              time since last successful read:     59m 58s
Retrying bad sectors... Retry 15 (forwards)^C  
  Interrupted by user

The first invocation of ddrescue executed a rather conservative sweep through the disk, copying as much data as possible without retrying bad sectors. It is worth reading the description of the algorithm that defines how the disk is processed.

The second invocation reused the map file created by the first one, which allows focusing efforts on the failed areas only. It used --idirect to bypass kernel caches, that is, to force direct disk access on each read attempt, and --retry=20 to set a limit of 20 retry passes. This second invocation was aborted after a total of 48 hours had been spent on the two invocations.

Looking at the output, it can be seen that some progress was made during the second invocation: bad areas decreased from 25 to 16, and the total size of bad sectors from 1094 KiB to 1081 KiB. My expectation at this point was that any further long-running attempt would yield rapidly diminishing returns, so it was time to understand what pragmatic options we had to get the best results before the available time window closed.

Evaluating the impact of our bad sectors

As indicated above, it is key to understand what the disk areas containing bad sectors are used for by the system. This information allows us to further reduce the scope of recovery by skipping unallocated blocks, and to prioritize used blocks based on the type of information they store.

The remaining free space on the 2TB disk was used to create an Ext4 partition to serve as an image and file repository during the recovery process; it was mounted at /mnt/REPO.


[root@sysresccd ~]# cp disk-sdb-to-sda.map /mnt/REPO/disk-sdb-to-sda.map
[root@sysresccd ~]# cd /mnt/REPO/
[root@sysresccd /mnt/REPO]# ddrescuelog --binary-prefixes -A disk-sdb-to-sda.map | grep -
# Command line: ddrescuelog --binary-prefixes -A disk-sdb-to-sda.map
# Start time:   2020-05-17 13:35:55
0xC77EA1C00     -               15 #  51070 MiB
0x92FF0000  0x00001000  - #    2351 MiB      4096  
0x13A5A8000  0x00001000  - #    5029 MiB      4096  
0x13A72C000  0x00001000  - #    5031 MiB      4096  
0x13B0FC000  0x00001000  - #    5040 MiB      4096  
0x13D9CB000  0x00001000  - #    5081 MiB      4096  
0x1541DA000  0x00001000  - #    5441 MiB      4096  
0x22328C000  0x00001000  - #    8754 MiB      4096  
0x228D72000  0x00000400  - #    8845 MiB      1024  
0x228D76200  0x00000200  - #    8845 MiB       512  
0x228DCD000  0x00001000  - #    8845 MiB      4096  
0x5379F4000  0x00002000  - #   21369 MiB      8192  
0x5CAACA000  0x00006000  - #   23722 MiB     24576  
0xABFA12000  0x00016000  - #   44026 MiB     90112  
0xC77E71000  0x00077000  - #   51070 MiB    487424  
0xC7DF3E000  0x00060000  - #   51167 MiB    393216  
0x10B3261000  0x00011000  - #   68402 MiB     69632  

[root@sysresccd /mnt/REPO]# ddrescuelog --list-blocks=- disk-sdb-to-sda.map > bad-sectors-sdb.txt

The command ddrescuelog was used to read the map file and enumerate bad areas along with their starting position and length. It can be seen that most bad areas are single 4 KiB blocks. The last 5 bad areas are much larger than the others, so eliminating some of those would significantly improve the efficiency of the recovery process, and thereby our chances of a successful recovery.
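As a cross-check of the listing above, the sizes of the bad areas can be added up directly from the map file. The following is only a minimal sketch in POSIX shell: sum_bad_bytes is a hypothetical helper name, and it assumes the usual three-column "pos size status" layout of map file data lines, with '-' marking bad-sector areas.

```shell
#!/bin/sh
# Sum the sizes of all areas marked '-' (bad sector) in a ddrescue map file.
# sum_bad_bytes is a hypothetical helper, not part of ddrescue.
sum_bad_bytes() {
    total=0
    while read -r pos size status rest; do
        case "$pos" in
            '#'*|'') continue ;;      # skip comment and blank lines
        esac
        # Data lines are "pos size status". The status line that precedes them
        # carries the pass number in its third field, so it never matches '-'.
        if [ "$status" = "-" ]; then
            total=$((total + $size))  # hex sizes such as 0x00001000 are accepted
        fi
    done < "$1"
    echo "$total"
}

# Demo on a few lines copied from the listing above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
0xC77EA1C00     -               15
0x228D72000  0x00000400  -
0x228D76200  0x00000200  -
0x228DCD000  0x00001000  -
EOF
sum_bad_bytes "$sample"    # adds up 1024 + 512 + 4096 bytes
rm -f "$sample"
```

Applied to all 16 areas listed above, the sizes add up to 1107456 bytes, that is about 1081 KiB, which agrees with the bad-sector figure reported by ddrescue.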

In addition to displaying the byte-based offset and size of bad areas, the same command was used to create a simple list of bad sectors, which was then used to calculate the offset of bad sector ranges within the affected partitions. As it turned out, only a single partition was affected.


[root@sysresccd /mnt/REPO]# gdisk -l /dev/sda
GPT fdisk (gdisk) version 1.0.5

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 1953525168 sectors, 931.5 GiB
Model: WDC WD10EFRX-68F
Sector size (logical/physical): 512/4096 bytes
Disk identifier (GUID): 848E364C-322B-45A8-8685-073A18109019
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 2048-sector boundaries
Total free space is 1533013357 sectors (731.0 GiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048          616447   300.0 MiB   2700  Basic data partition
   2          616448          821247   100.0 MiB   EF00  EFI system partition
   3          821248         1083391   128.0 MiB   0C01  Microsoft reserved ...
   4         1083392       210798591   100.0 GiB   0700  Basic data partition
   5       210798592       420513791   100.0 GiB   0700  
[root@sysresccd /mnt/REPO]# cat bad-sectors-sdb.txt | while read lba; do echo $(($lba - 1083392)); done >> bad-sectors-sdb4.txt

As the careful reader will have noticed, partition information and data are read from the physically healthy /dev/sda.

Based on the first and last bad sector it was confirmed that all bad areas are within partition /dev/sdb4, which is the "C:" partition of Windows. The last line of the command listing above shows how to translate the raw sector-based disk offsets to relative offsets within the partition.
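The arithmetic behind this translation can be checked by hand. The sketch below assumes 512-byte logical sectors and takes the start and end sector of partition 4 from the gdisk listing; byte_to_part_sector and in_partition are hypothetical helper names used for illustration only.

```shell
#!/bin/sh
# Values taken from the gdisk listing above (partition 4, 512-byte sectors).
SECTOR_SIZE=512
PART_START=1083392
PART_END=210798591

# Convert a byte offset on the disk to a sector offset within partition 4.
byte_to_part_sector() {
    echo $(( $1 / SECTOR_SIZE - PART_START ))
}

# Succeed if the byte offset falls inside partition 4.
in_partition() {
    abs=$(( $1 / SECTOR_SIZE ))
    [ "$abs" -ge "$PART_START" ] && [ "$abs" -le "$PART_END" ]
}

byte_to_part_sector 0x92FF0000      # first bad area: prints 3733376
byte_to_part_sector 0x10B3261000    # last bad area:  prints 139004680
in_partition 0x92FF0000 && in_partition 0x10B3261000 && echo "both inside partition 4"
```

The two results match the first and last starting sectors of the partition-relative sector ranges used later in this post.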

Identifying files in bad sector ranges

Identifying which files, or what part of the file system, is allocated to a given sector range could typically be done using dumpe2fs and debugfs in tandem in the case of Linux Ext file systems; in this case, however, we are dealing with NTFS. A classic universal option would have been ifind from The Sleuth Kit, an open source collection of forensics tools, but I decided to stick to utilities that are part of the already running SystemRescueCd as much as possible, to avoid any delay in my workflow. I ended up using ntfscluster to achieve the goal.
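The listing that follows works with sector ranges rather than individual sectors, and the command that produced the range file is not shown in this post. One way such a file could be derived from a sorted list of partition-relative sectors is sketched below; collapse_ranges and print_range are hypothetical helpers, not necessarily what was used here.

```shell
#!/bin/sh
# Print a single sector as "N" and a run of sectors as "FIRST-LAST".
print_range() {
    if [ "$1" -eq "$2" ]; then
        echo "$1"
    else
        echo "$1-$2"
    fi
}

# Read ascending sector numbers from stdin, one per line, and merge
# consecutive numbers into ranges.
collapse_ranges() {
    start=''
    prev=''
    while read -r n; do
        [ -n "$n" ] || continue            # ignore blank lines
        if [ -z "$start" ]; then
            start=$n                       # first sector seen
        elif [ "$n" -ne $((prev + 1)) ]; then
            print_range "$start" "$prev"   # gap found: close the current run
            start=$n
        fi
        prev=$n
    done
    if [ -n "$start" ]; then
        print_range "$start" "$prev"       # close the final run
    fi
}

# Demo with a few sector numbers from this recovery.
printf '%s\n' 17032080 17032081 17032113 17032808 17032809 | collapse_ranges
```

With the file names used in this post, the call would look like collapse_ranges < bad-sectors-sdb4.txt > bad-sector-ranges-sdb4.txt, assuming the input list is sorted in ascending order and free of duplicates.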


[root@sysresccd /mnt/REPO]# cat bad-sector-ranges-sdb4.txt 
3733376-3733383
9217344-9217351
9220448-9220455
9240544-9240551
9324120-9324127
10061520-10061527
16845920-16845927
17032080-17032081
17032113
17032808-17032815
42682272-42682287
47500880-47500927
89082000-89082175
103508872-103509823
103707120-103707887
139004680-139004815

[root@sysresccd /mnt/REPO]# ntfscluster -s 139004680-139004815 /dev/sda4
...
Inode 222099 is an extent of inode 285447.
Inode 222116 is an extent of inode 285450.
Inode 222120 is an extent of inode 285452.
Inode 222132 is an extent of inode 285454.
ntfs_mst_post_read_fixup_warn: magic: 0x788627c2  size: 1024   usa_ofs: 5592  usa_count: 63093: Invalid argument
Record 222148 has no FILE magic (0x788627c2)
ntfs_mst_post_read_fixup_warn: magic: 0x788627c2  size: 1024   usa_ofs: 5592  usa_count: 63093: Invalid argument
Record 222148 has no FILE magic (0x788627c2)
Error reading inode 222148.
ntfs_mst_post_read_fixup_warn: magic: 0x1dcddcd1  size: 1024   usa_ofs: 15035  usa_count: 45642: Invalid argument
Record 222149 has no FILE magic (0x1dcddcd1)
ntfs_mst_post_read_fixup_warn: magic: 0x1dcddcd1  size: 1024   usa_ofs: 15035  usa_count: 45642: Invalid argument
Record 222149 has no FILE magic (0x1dcddcd1)
Error reading inode 222149.
ntfs_mst_post_read_fixup_warn: magic: 0x3e9913ab  size: 1024   usa_ofs: 36947  usa_count: 6616: Invalid argument
Record 222150 has no FILE magic (0x3e9913ab)
ntfs_mst_post_read_fixup_warn: magic: 0x3e9913ab  size: 1024   usa_ofs: 36947  usa_count: 6616: Invalid argument
Record 222150 has no FILE magic (0x3e9913ab)
Error reading inode 222150.
ntfs_mst_post_read_fixup_warn: magic: 0xb9942bfd  size: 1024   usa_ofs: 52375  usa_count: 32401: Invalid argument
Record 222151 has no FILE magic (0xb9942bfd)
ntfs_mst_post_read_fixup_warn: magic: 0xb9942bfd  size: 1024   usa_ofs: 52375  usa_count: 32401: Invalid argument
Record 222151 has no FILE magic (0xb9942bfd)
Error reading inode 222151.
Inode 222177 is an extent of inode 285458.
Inode 222188 is an extent of inode 58416.
Inode 222191 is an extent of inode 241635.
...
Inode 389780 is an extent of inode 386451.
* no inode found

The command ntfscluster was run to display information on the use of the last bad sector range. According to the output, no file data stream allocation could be found within that range; however, there were errors reading 4 MFT entries, 222148-222151. At this point, it is uncertain whether or not files are allocated to the disk area in question, but there is certainly an issue with file system integrity which needs to be addressed. This issue was parked for the time being in order to assemble a list of files certainly affected by the other bad sector ranges.
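With output this verbose, the unreadable MFT records are easy to miss. A small filter can pull them out of a saved copy of the output. This is a sketch only: unreadable_mft_records is a hypothetical helper, and it assumes the messages were captured to a file together with stderr and have exactly the "Error reading inode N." form shown above.

```shell
#!/bin/sh
# Extract the record numbers from "Error reading inode N." lines on stdin,
# sorted numerically and without duplicates.
unreadable_mft_records() {
    sed -n 's/^Error reading inode \([0-9][0-9]*\).*/\1/p' | sort -n -u
}

# Demo on a few lines copied from the output above.
unreadable_mft_records <<'EOF'
Inode 222132 is an extent of inode 285454.
Record 222148 has no FILE magic (0x788627c2)
Error reading inode 222148.
Record 222149 has no FILE magic (0x1dcddcd1)
Error reading inode 222149.
Inode 222177 is an extent of inode 285458.
EOF
```

For the demo input this prints 222148 and 222149, one per line; on the full output it should list the four records 222148 to 222151.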

Read on, and check out the next post in the series.

Tuxicate - linux tweaks

Tuesday 9 June 2020