Context
This is the second post in a series documenting a systematic approach to recovering a non-bootable Windows machine with Linux and open source tools. Make sure you read the first post to familiarize yourself with the context and previous activities.
In a nutshell, the investigation was narrowed down to a disk that fails the S.M.A.R.T. extended self-test and misreports the Reallocated Sector Count, and potentially other S.M.A.R.T. attributes, due to a firmware bug or an intentional "vendor feature". The LBA (logical block address) of the first read error is known from the self-test log, and various implications of manually overwriting bad sectors via dd have been covered:
- Understanding what kind of data a particular sector stored (file content, directory structure, journal, superblock, or unused area) is key to assessing the risk of overwriting that sector in order to force reallocation
- Repeatedly reading a bad sector might eventually succeed, but stressing a mechanically failing disk for a long time might render it unusable and make further recovery impossible
- Continued production use of a disk that had bad sectors fixed manually in the past but seems healthy carries a risk, as opposed to decommissioning drives right after recovery
First round
The decision was taken to clone the 1TB disk holding medical/business data to a new 2TB disk, and to repurpose the 1TB disk during the recovery process. Rather than working with disk images, this particular case allowed recovering data from the failing 1TB drive /dev/sdb directly to this known-good disk. Time pressure, the lack of spare disk capacity, and previous testing of the disks justified this shortcut.
[root@sysresccd ~]# ddrescue --ask --verbose --binary-prefixes --force /dev/sdb /dev/sda disk-sdb-to-sda.map
GNU ddrescue 1.25
About to copy 953869 MiBytes
from '/dev/sdb' [UNKNOWN] (1_000_204_886_016)
to '/dev/sda' [UNKNOWN] (1_000_204_886_016)
Proceed (y/N)? y
Starting positions: infile = 0 B, outfile = 0 B
Copy block size: 128 sectors Initial skip size: 19584 sectors
Sector size: 512 Bytes
Press Ctrl-C to interrupt
ipos: 68402 MiB, non-trimmed: 0 B, current rate: 0 B/s
opos: 68402 MiB, non-scraped: 0 B, average rate: 39780 KiB/s
non-tried: 0 B, bad-sector: 1094 KiB, error rate: 102 B/s
rescued: 953868 MiB, bad areas: 25, run time: 6h 49m 14s
pct rescued: 99.99%, read errors: 2225, remaining time: n/a
time since last successful read: 2h 44m 54s
Finished
[root@sysresccd ~]# ddrescue --ask --verbose --binary-prefixes --idirect --retry=20 --force /dev/sdb /dev/sda disk-sdb-to-sda.map
GNU ddrescue 1.25
About to copy 953869 MiBytes
from '/dev/sdb' [UNKNOWN] (1_000_204_886_016)
to '/dev/sda' [UNKNOWN] (1_000_204_886_016)
Proceed (y/N)? y
Starting positions: infile = 0 B, outfile = 0 B
Copy block size: 128 sectors Initial skip size: 19584 sectors
Sector size: 512 Bytes
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 953868 MiB, tried: 1094 KiB, bad-sector: 1094 KiB, bad areas: 25
Current status
ipos: 51070 MiB, non-trimmed: 0 B, current rate: 0 B/s
opos: 51070 MiB, non-scraped: 0 B, average rate: 0 B/s
non-tried: 0 B, bad-sector: 1081 KiB, error rate: 0 B/s
rescued: 953868 MiB, bad areas: 16, run time: 1d 17h 14m
pct rescued: 99.99%, read errors: 31116, remaining time: n/a
time since last successful read: 59m 58s
Retrying bad sectors... Retry 15 (forwards)^C
Interrupted by user
The first invocation of ddrescue made a rather conservative sweep through the disk, copying as much data as possible without retrying bad sectors. It is worth reading the description of the algorithm that defines how the disk is processed.
The second invocation reused the map file created by the first one, which allows focusing efforts on the failed areas only. It used --idirect to bypass kernel caches, that is, to force direct disk access on each read attempt, and set a limit of 20 retries. This second invocation was aborted after the two invocations had run for a total of 48 hours.
Looking at the output, some progress was made during the second invocation: bad areas decreased from 25 to 16, and the total size of bad sectors from 1094 KiB to 1081 KiB. My expectation at this point was that any further long-running pass would yield less and less, so it was time to understand what pragmatic options remained to get the best results before the available time window closed.
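To put the diminishing returns in numbers, here is a rough back-of-the-envelope sketch that uses only the figures from the two status screens above. It assumes that the run time shown by the second invocation (1d 17h 14m) covers the retry pass alone, and the KiB values are rounded by ddrescue, so the result is an order-of-magnitude estimate at best.

```shell
# Figures copied from the two ddrescue status screens (rounded by ddrescue).
bad_before_kib=1094                         # bad-sector total after the first pass
bad_after_kib=1081                          # bad-sector total when the retry pass was interrupted
retry_minutes=$(( (24 + 17) * 60 + 14 ))    # assumed duration of the retry pass: 1d 17h 14m

recovered_kib=$(( bad_before_kib - bad_after_kib ))
# Integer division is good enough for an order-of-magnitude estimate.
bytes_per_hour=$(( recovered_kib * 1024 * 60 / retry_minutes ))

echo "recovered ${recovered_kib} KiB in ${retry_minutes} minutes (about ${bytes_per_hour} bytes/hour)"
```

A yield of a few hundred bytes per hour, compared with the tens of MiB per second of the first pass, is what makes the remaining retries hard to justify within a limited time window.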
Evaluating the impact of our bad sectors
As indicated above, it is key to understand what the disk areas containing bad sectors are used for by the system. This information allows us to further reduce the scope of recovery by skipping unallocated blocks, and to prioritize used blocks based on the type of information they store.
The remaining free space on the 2TB disk was used to create an Ext4 partition to serve as an image and file repository during the recovery process. It was mounted at /mnt/REPO.
[root@sysresccd ~]# cp disk-sdb-to-sda.map /mnt/REPO/disk-sdb-to-sda.map
[root@sysresccd ~]# cd /mnt/REPO/
[root@sysresccd /mnt/REPO]# ddrescuelog --binary-prefixes -A disk-sdb-to-sda.map | grep -
# Command line: ddrescuelog --binary-prefixes -A disk-sdb-to-sda.map
# Start time: 2020-05-17 13:35:55
0xC77EA1C00 - 15 # 51070 MiB
0x92FF0000 0x00001000 - # 2351 MiB 4096
0x13A5A8000 0x00001000 - # 5029 MiB 4096
0x13A72C000 0x00001000 - # 5031 MiB 4096
0x13B0FC000 0x00001000 - # 5040 MiB 4096
0x13D9CB000 0x00001000 - # 5081 MiB 4096
0x1541DA000 0x00001000 - # 5441 MiB 4096
0x22328C000 0x00001000 - # 8754 MiB 4096
0x228D72000 0x00000400 - # 8845 MiB 1024
0x228D76200 0x00000200 - # 8845 MiB 512
0x228DCD000 0x00001000 - # 8845 MiB 4096
0x5379F4000 0x00002000 - # 21369 MiB 8192
0x5CAACA000 0x00006000 - # 23722 MiB 24576
0xABFA12000 0x00016000 - # 44026 MiB 90112
0xC77E71000 0x00077000 - # 51070 MiB 487424
0xC7DF3E000 0x00060000 - # 51167 MiB 393216
0x10B3261000 0x00011000 - # 68402 MiB 69632
[root@sysresccd /mnt/REPO]# ddrescuelog --list-blocks=- disk-sdb-to-sda.map > bad-sectors-sdb.txt
The command ddrescuelog was used to read the map file and enumerate bad areas along with their starting position and length. In most cases, bad areas are 4K blocks. The last 5 bad areas are much larger than the others, so eliminating some of those would significantly improve the efficiency of the recovery process, and thereby our chances of a successful recovery.
In addition to displaying the byte-based offset and size of bad areas, the same command was used to create a simple list of bad sectors, which was then used to calculate the offset of bad sector ranges within the affected partitions. In fact, a single partition, as it turned out.
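As a cross-check of the ddrescuelog listing above, the bad areas can be counted and their sizes summed with a few lines of shell. This is only a sketch: the function name summarize_bad_areas is made up for illustration, and it assumes the usual mapfile layout, in which the first non-comment line is the status line and every following line holds a position, a size and a status character.

```shell
# Count the bad-sector ('-') blocks of a ddrescue mapfile read from stdin and sum their sizes.
# Prints "<number of bad areas> <total bytes>". The function name is hypothetical.
summarize_bad_areas() {
    local pos size status extra areas=0 bytes=0 seen_status_line=0
    while read -r pos size status extra; do
        # Skip blank lines and comments.
        case "$pos" in ''|'#'*) continue ;; esac
        # The first non-comment line is the status line (current position, status, pass), not a block.
        if [ "$seen_status_line" -eq 0 ]; then seen_status_line=1; continue; fi
        if [ "$status" = "-" ]; then
            areas=$(( areas + 1 ))
            bytes=$(( bytes + $size ))      # shell arithmetic accepts the 0x prefix
        fi
    done
    echo "$areas $bytes"
}

# The lines from the listing above (a complete mapfile also contains '+' blocks).
summarize_bad_areas <<'EOF'
0xC77EA1C00 - 15
0x92FF0000 0x00001000 -
0x13A5A8000 0x00001000 -
0x13A72C000 0x00001000 -
0x13B0FC000 0x00001000 -
0x13D9CB000 0x00001000 -
0x1541DA000 0x00001000 -
0x22328C000 0x00001000 -
0x228D72000 0x00000400 -
0x228D76200 0x00000200 -
0x228DCD000 0x00001000 -
0x5379F4000 0x00002000 -
0x5CAACA000 0x00006000 -
0xABFA12000 0x00016000 -
0xC77E71000 0x00077000 -
0xC7DF3E000 0x00060000 -
0x10B3261000 0x00011000 -
EOF
# prints: 16 1107456
```

The total of 1107456 bytes is 1081.5 KiB, which agrees with the 16 bad areas and 1081 KiB reported by ddrescue when the retry pass was interrupted.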
[root@sysresccd /mnt/REPO]# gdisk -l /dev/sda
GPT fdisk (gdisk) version 1.0.5
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 1953525168 sectors, 931.5 GiB
Model: WDC WD10EFRX-68F
Sector size (logical/physical): 512/4096 bytes
Disk identifier (GUID): 848E364C-322B-45A8-8685-073A18109019
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 2048-sector boundaries
Total free space is 1533013357 sectors (731.0 GiB)
Number Start (sector) End (sector) Size Code Name
1 2048 616447 300.0 MiB 2700 Basic data partition
2 616448 821247 100.0 MiB EF00 EFI system partition
3 821248 1083391 128.0 MiB 0C01 Microsoft reserved ...
4 1083392 210798591 100.0 GiB 0700 Basic data partition
5 210798592 420513791 100.0 GiB 0700
[root@sysresccd /mnt/REPO]# cat bad-sectors-sdb.txt | while read lba; do echo $(($lba - 1083392)); done >> bad-sectors-sdb4.txt
As the careful reader will have noticed, partition information and data are read from the physically healthy /dev/sda.
Based on the first and last bad sector, it was confirmed that all bad areas are within partition /dev/sdb4, which is the "C:" partition of Windows. The last line of the command listing above shows how to translate the raw sector-based disk offsets to relative offsets within the partition.
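The same translation can also be done straight from the byte offsets in the map file, which yields partition-relative sector ranges in one step. The sketch below is an alternative to the ddrescuelog --list-blocks route used here, not the exact procedure followed. The function name to_partition_ranges is hypothetical, and the 512-byte sector size and the partition start sector 1083392 are taken from the gdisk output.

```shell
# Convert bad-sector blocks ("<byte offset> <byte size> -") read from stdin into
# sector ranges relative to a partition. The function name is hypothetical.
#   $1: first sector of the partition (from gdisk)
#   $2: logical sector size in bytes (defaults to 512)
to_partition_ranges() {
    local part_start="$1" sector_size="${2:-512}"
    local pos size status extra first count last
    while read -r pos size status extra; do
        # Only bad-sector blocks qualify; this also skips comments and the status line.
        case "$pos" in '#'*) continue ;; esac
        [ "$status" = "-" ] || continue
        first=$(( $pos / sector_size - part_start ))
        count=$(( ($size + sector_size - 1) / sector_size ))
        last=$(( first + count - 1 ))
        if [ "$count" -eq 1 ]; then
            echo "$first"
        else
            echo "$first-$last"
        fi
    done
}

# First bad area from the map file: byte offset 0x92FF0000, 4096 bytes long.
echo '0x92FF0000 0x00001000 -' | to_partition_ranges 1083392 512
# prints: 3733376-3733383
```

Byte offset 0x92FF0000 is sector 4816768 on the disk. Subtracting the partition start 1083392 gives 3733376, and a 4096-byte area spans 8 sectors.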
Identifying files in bad sector ranges
Identifying which files, or which part of the file system, is allocated to a given sector range could typically be done using dumpe2fs and debugfs in tandem in the case of Linux Ext file systems. In this case, however, we are dealing with NTFS. A classic universal option would have been ifind from The Sleuth Kit, an open source collection of forensics tools, but I decided to stick to utilities that are part of the already running SystemRescueCd as much as possible, to avoid any delay in my workflow. I ended up using ntfscluster to achieve the goal.
[root@sysresccd /mnt/REPO]# cat bad-sector-ranges-sdb4.txt
3733376-3733383
9217344-9217351
9220448-9220455
9240544-9240551
9324120-9324127
10061520-10061527
16845920-16845927
17032080-17032081
17032113
17032808-17032815
42682272-42682287
47500880-47500927
89082000-89082175
103508872-103509823
103707120-103707887
139004680-139004815
[root@sysresccd /mnt/REPO]# ntfscluster -s 139004680-139004815 /dev/sda4
...
Inode 222099 is an extent of inode 285447.
Inode 222116 is an extent of inode 285450.
Inode 222120 is an extent of inode 285452.
Inode 222132 is an extent of inode 285454.
ntfs_mst_post_read_fixup_warn: magic: 0x788627c2 size: 1024 usa_ofs: 5592 usa_count: 63093: Invalid argument
Record 222148 has no FILE magic (0x788627c2)
ntfs_mst_post_read_fixup_warn: magic: 0x788627c2 size: 1024 usa_ofs: 5592 usa_count: 63093: Invalid argument
Record 222148 has no FILE magic (0x788627c2)
Error reading inode 222148.
ntfs_mst_post_read_fixup_warn: magic: 0x1dcddcd1 size: 1024 usa_ofs: 15035 usa_count: 45642: Invalid argument
Record 222149 has no FILE magic (0x1dcddcd1)
ntfs_mst_post_read_fixup_warn: magic: 0x1dcddcd1 size: 1024 usa_ofs: 15035 usa_count: 45642: Invalid argument
Record 222149 has no FILE magic (0x1dcddcd1)
Error reading inode 222149.
ntfs_mst_post_read_fixup_warn: magic: 0x3e9913ab size: 1024 usa_ofs: 36947 usa_count: 6616: Invalid argument
Record 222150 has no FILE magic (0x3e9913ab)
ntfs_mst_post_read_fixup_warn: magic: 0x3e9913ab size: 1024 usa_ofs: 36947 usa_count: 6616: Invalid argument
Record 222150 has no FILE magic (0x3e9913ab)
Error reading inode 222150.
ntfs_mst_post_read_fixup_warn: magic: 0xb9942bfd size: 1024 usa_ofs: 52375 usa_count: 32401: Invalid argument
Record 222151 has no FILE magic (0xb9942bfd)
ntfs_mst_post_read_fixup_warn: magic: 0xb9942bfd size: 1024 usa_ofs: 52375 usa_count: 32401: Invalid argument
Record 222151 has no FILE magic (0xb9942bfd)
Error reading inode 222151.
Inode 222177 is an extent of inode 285458.
Inode 222188 is an extent of inode 58416.
Inode 222191 is an extent of inode 241635.
...
Inode 389780 is an extent of inode 386451.
* no inode found
The command ntfscluster was run to display information on the use of the last bad sector range. According to the output, no file data stream allocation could be found within that range; however, there were errors reading four MFT entries, 222148-222151. At this point, it is uncertain whether files are allocated to the disk area in question, but there is certainly an issue with file system integrity that needs to be addressed. This issue was parked for the time being in order to assemble a list of files definitely affected by the other bad sector ranges.
Read on, and check out the next post in the series.