================================================================================
                         ARCHZFS RESCUE GUIDE
================================================================================

This guide covers common rescue and recovery scenarios. For quick command
reference, use: tldr <command>

Table of Contents:
  1. ZFS Recovery
  2. Data Recovery
  3. Boot Repair
  4. Windows Recovery
  5. Hardware Diagnostics
  6. Disk Operations
  7. Network Troubleshooting

================================================================================
1. ZFS RECOVERY
================================================================================

QUICK REFERENCE
---------------
  tldr zfs          # ZFS filesystem commands
  tldr zpool        # ZFS pool commands
  man zfs           # Full ZFS manual
  man zpool         # Full zpool manual

SCENARIO: Import a pool from another system
-------------------------------------------
List pools available for import:

  zpool import

Import a specific pool:

  zpool import poolname

If the pool was not cleanly exported (e.g., system crash):

  zpool import -f poolname

Import with a different name (to avoid conflicts):

  zpool import oldname newname


SCENARIO: Pool won't import - "pool may be in use"
--------------------------------------------------
Force import (use when you know it's safe):

  zpool import -f poolname

If that fails, try recovery mode. -F rewinds the pool to an earlier
transaction group, discarding the most recent writes:

  zpool import -F poolname

Last resort - import read-only to recover data:

  zpool import -o readonly=on poolname
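
The escalation above can be wrapped in a small helper. This is only a sketch:
try_import and the ZPOOL variable are names made up for this guide (ZPOOL lets
the zpool binary be swapped for a stub). It tries the read-only import before
any rewind, because read-only writes nothing to the pool, and it only runs the
-F recovery as a dry run (-n), never for real.

```shell
# Hypothetical helper: try progressively more forceful imports of pool $1.
# ZPOOL is an assumption of this sketch so the binary can be replaced by a stub.
try_import() {
  pool=$1
  zp=${ZPOOL:-zpool}
  "$zp" import "$pool" && { echo "imported: plain"; return 0; }
  "$zp" import -f "$pool" && { echo "imported: forced"; return 0; }
  # Read-only import: nothing is written to the pool.
  "$zp" import -f -o readonly=on "$pool" && { echo "imported: read-only"; return 0; }
  # Ask whether a -F rewind could help, without performing it (-n).
  "$zp" import -F -n "$pool" || true
  echo "not imported" >&2
  return 1
}
```

Usage: try_import poolname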


SCENARIO: Check pool health and repair
--------------------------------------
Check pool status:

  zpool status poolname

Start a scrub (checks all data, can take hours):

  zpool scrub poolname

Check scrub progress:

  zpool status poolname

Clear transient errors after fixing hardware:

  zpool clear poolname


SCENARIO: Recover from snapshot / Rollback
------------------------------------------
List all snapshots:

  zfs list -t snapshot

Rollback to a snapshot (destroys changes since snapshot):

  zfs rollback poolname/dataset@snapshot

If newer snapshots exist after the target, use -r (this destroys them):

  zfs rollback -r poolname/dataset@snapshot
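
Before running rollback -r it can help to see which snapshots would be
destroyed. A minimal sketch, assuming the snapshot names arrive oldest-first
on stdin; snapshots_after is a hypothetical helper name, not a ZFS command.

```shell
# Print every snapshot name that follows target $1 in the list on stdin.
# Input is expected oldest-first, one name per line.
snapshots_after() {
  awk -v target="$1" 'found { print } $0 == target { found = 1 }'
}
```

Usage:

  zfs list -H -d 1 -t snapshot -o name -s creation poolname/dataset | \
    snapshots_after poolname/dataset@snapshot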


SCENARIO: Copy data from ZFS pool
---------------------------------
Mount datasets if not auto-mounted:

  zfs mount -a

Or mount a specific dataset (note: this changes its mountpoint property):

  zfs set mountpoint=/mnt/recovery poolname/dataset
  zfs mount poolname/dataset

Copy with rsync (preserves permissions, shows progress):

  rsync -avP /mnt/recovery/ /destination/


SCENARIO: Send/Receive snapshots (backup/migrate)
-------------------------------------------------
Create a snapshot first:

  zfs snapshot poolname/dataset@backup

Send to a file (local backup):

  zfs send poolname/dataset@backup > /path/to/backup.zfs

Send with progress indicator:

  zfs send poolname/dataset@backup | pv > /path/to/backup.zfs

Send to another pool locally:

  zfs send poolname/dataset@backup | zfs recv newpool/dataset

Send to remote system over SSH:

  zfs send poolname/dataset@backup | ssh user@remote zfs recv pool/dataset

With progress and buffering for network transfers:

  zfs send poolname/dataset@backup | pv | mbuffer -s 128k -m 1G | \
    ssh user@remote "mbuffer -s 128k -m 1G | zfs recv pool/dataset"


SCENARIO: Encrypted pool - unlock and mount
-------------------------------------------
Load the encryption key (will prompt for passphrase):

  zfs load-key poolname

Or for all encrypted datasets:

  zfs load-key -a

Then mount:

  zfs mount -a
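
To see which datasets still need a key before mounting, the keystatus
property can be filtered. A small sketch, assuming tab-separated
"name value" lines as printed by zfs get -H; locked_datasets is a
hypothetical helper name.

```shell
# Read "name<TAB>keystatus" lines on stdin and print the names whose
# encryption key is not loaded.
locked_datasets() {
  awk -F '\t' '$2 == "unavailable" { print $1 }'
}
```

Usage:

  zfs get -H -o name,value -t filesystem,volume keystatus | locked_datasets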


SCENARIO: Replace failed drive in mirror/raidz
----------------------------------------------
Check which drive failed:

  zpool status poolname

Replace the drive (assuming /dev/sdc is new drive):

  zpool replace poolname /dev/old-drive /dev/sdc

Monitor resilver progress:

  zpool status poolname
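
On pools with many drives, the failed device can be picked out of the status
output. A rough sketch, assuming the usual NAME/STATE/READ/WRITE/CKSUM columns
of zpool status; failed_devices is a hypothetical helper name and the state
list is an assumption of this sketch.

```shell
# Read `zpool status` output on stdin and print devices that are not usable.
# Only rows with at least the five config columns are considered.
failed_devices() {
  awk 'NF >= 5 && $2 ~ /^(FAULTED|UNAVAIL|OFFLINE|REMOVED)$/ { print $1, $2 }'
}
```

Usage: zpool status poolname | failed_devices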


SCENARIO: See what's using a dataset (before unmount)
-----------------------------------------------------
Check what processes have files open:

  lsof /mountpoint

Or for all ZFS mounts:

  lsof | grep poolname


USEFUL ZFS COMMANDS
-------------------
  zpool status              # Pool health overview
  zpool list                # Pool capacity
  zpool history poolname    # Command history
  zfs list                  # All datasets
  zfs list -t snapshot      # All snapshots
  zfs get all poolname      # All properties
  zdb -l /dev/sdX           # Low-level pool label info


================================================================================
2. DATA RECOVERY
================================================================================

QUICK REFERENCE
---------------
  tldr ddrescue     # Clone failing drives
  tldr testdisk     # Partition/file recovery
  tldr photorec     # Recover deleted files by type
  tldr smartctl     # Check drive health

FIRST: Assess drive health before recovery
------------------------------------------
Check if drive is failing (SMART data):

  smartctl -H /dev/sdX              # Quick health check
  smartctl -a /dev/sdX              # Full SMART report

Key things to look for:
  - "PASSED" vs "FAILED" health status
  - Reallocated_Sector_Ct - bad sectors remapped (increasing = dying)
  - Current_Pending_Sector - sectors waiting to be remapped
  - Offline_Uncorrectable - sectors that couldn't be read

If SMART shows problems, STOP and use ddrescue immediately.
Do not run fsck or other tools that write to a failing drive.
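
The three counters listed above can be checked in one step. A minimal sketch
for ATA drives, assuming the attribute table layout of smartctl -A where the
raw value is the tenth column (NVMe drives report differently);
smart_warnings is a hypothetical helper name.

```shell
# Read `smartctl -A` output on stdin. Print each of the three sector
# counters whose raw value is above zero; exit status 1 if any was found.
smart_warnings() {
  awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ && $10 + 0 > 0 {
         print $2 "=" $10
         bad = 1
       }
       END { exit bad ? 1 : 0 }'
}
```

Usage: smartctl -A /dev/sdX | smart_warnings || echo "clone this drive now"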


SCENARIO: Clone a failing drive (CRITICAL - do this first!)
------------------------------------------------------------
Golden rule: NEVER work directly on a failing drive.
Clone it first, then recover from the clone.

Clone to an image file (safest):

  ddrescue -d -r3 /dev/sdX /path/to/image.img /path/to/logfile.log

  -d    = direct I/O, bypass cache
  -r3   = retry bad sectors 3 times
  logfile = allows resuming if interrupted

Clone to another drive:

  ddrescue -d -r3 /dev/sdX /dev/sdY /path/to/logfile.log

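Monitor progress: ddrescue prints its own running status. It needs a seekable
output file, so it cannot write to a pipe for pv. To check progress from
another terminal, summarize the log file:

  ddrescuelog -t /path/to/logfile.log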

Resume an interrupted clone:

  ddrescue -d -r3 /dev/sdX /path/to/image.img /path/to/logfile.log

The log file tracks what's been copied. Same command resumes.

If drive is very bad, do a quick pass first, then retry bad sectors:

  ddrescue -d -n /dev/sdX image.img logfile.log     # Fast pass, skip errors
  ddrescue -d -r3 /dev/sdX image.img logfile.log   # Retry bad sectors
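
The log file (current ddrescue releases call it a mapfile) is plain text, so
the amount rescued after each pass can be totalled by hand. A rough sketch,
assuming the usual layout: comment lines starting with '#', one
current-position line, then "pos size status" rows with hexadecimal numbers
where '+' marks rescued data. mapfile_summary is a hypothetical helper name;
the ddrescuelog tool shipped with ddrescue is the authoritative way to read
the file.

```shell
# Print "<rescued bytes> <total bytes>" for the ddrescue log file in $1.
mapfile_summary() {
  rescued=0
  total=0
  seen_status=0
  while read -r pos size status _; do
    case $pos in ''|'#'*) continue ;; esac
    # The first non-comment line is the current position, not a data block.
    if [ "$seen_status" -eq 0 ]; then seen_status=1; continue; fi
    total=$((total + $size))
    if [ "$status" = "+" ]; then rescued=$((rescued + $size)); fi
  done < "$1"
  echo "$rescued $total"
}
```

Usage: mapfile_summary logfile.log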


SCENARIO: Recover deleted files (PhotoRec)
------------------------------------------
PhotoRec recovers files by their content signatures, not filesystem metadata.
Works even if filesystem is damaged or reformatted.

Run PhotoRec (included with testdisk):

  photorec /dev/sdX            # From device
  photorec image.img           # From disk image

Interactive steps:
  1. Select the disk/partition
  2. Choose filesystem type (usually "Other" for FAT/NTFS/exFAT)
  3. Choose "Free" (unallocated) or "Whole" (entire partition)
  4. Select destination folder for recovered files
  5. Wait (can take hours for large drives)

Recovered files are named by type (e.g., f0001234.jpg) in recup_dir.*/
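
Since the original names are lost, a count per file type is a quick way to
judge what a run recovered. A small sketch; count_by_type is a hypothetical
helper name, and it simply groups files by the text after the last dot.

```shell
# Count files under directory $1 by extension, most frequent first.
count_by_type() {
  find "$1" -type f -name '*.*' |
    awk -F. '{ n[tolower($NF)]++ } END { for (e in n) print n[e], e }' |
    sort -k1,1nr -k2,2
}
```

Usage: count_by_type /path/to/destination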


SCENARIO: Recover lost partition / Fix partition table
------------------------------------------------------
TestDisk can find and recover lost partitions.

Run TestDisk:

  testdisk /dev/sdX            # From device
  testdisk image.img           # From disk image

Interactive steps:
  1. Select disk
  2. Select partition table type (Intel/PC for MBR disks, EFI GPT for GPT)
  3. Choose "Analyse" to scan for partitions
  4. "Quick Search" finds most partitions
  5. "Deeper Search" if quick search misses any
  6. Review found partitions, select ones to recover
  7. "Write" to save new partition table (or just note the info)

TestDisk can also:
  - Recover deleted files from FAT/NTFS/ext filesystems
  - Repair FAT/NTFS boot sectors
  - Rebuild NTFS MFT


SCENARIO: Recover specific file types (Foremost)
------------------------------------------------
Foremost carves files based on headers/footers.
Useful when PhotoRec doesn't find what you need.

Basic usage:

  foremost -t all -i /dev/sdX -o /output/dir
  foremost -t all -i image.img -o /output/dir

Specific file types:

  foremost -t jpg,png,gif -i image.img -o /output/dir
  foremost -t pdf,doc,xls -i image.img -o /output/dir

Supported types: jpg, gif, png, bmp, avi, exe, mpg, wav, riff,
wmv, mov, pdf, ole (doc/xls/ppt), doc, zip, rar, htm, cpp, all


SCENARIO: Can't mount filesystem - try repair
----------------------------------------------
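WARNING: Only run fsck on a COPY, not the original failing drive!
In the commands below, /dev/sdX stands for the partition (or loop device)
that holds the filesystem, e.g. /dev/sdX1, not the whole disk.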

For ext2/ext3/ext4:

  fsck.ext4 -n /dev/sdX        # Check only, no changes (safe)
  fsck.ext4 -p /dev/sdX        # Auto-repair safe problems
  fsck.ext4 -y /dev/sdX        # Say yes to all repairs (risky)

For NTFS:

  ntfsfix /dev/sdX             # Basic fixes only; schedules chkdsk in Windows

For XFS:

  xfs_repair -n /dev/sdX       # Check only
  xfs_repair /dev/sdX          # Repair

For FAT32:

  fsck.fat -n /dev/sdX         # Check only
  fsck.fat -a /dev/sdX         # Auto-repair
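
To avoid typing a repairing command by mistake, the check-only variant can be
looked up from the filesystem type. A minimal sketch; readonly_check_cmd is a
hypothetical helper name, it only prints the command, and the type names are
assumed to be the ones blkid reports (vfat for FAT32, ntfs for NTFS).

```shell
# Print the check-only command for filesystem type $1 on device $2.
# Nothing is executed; unknown types return status 1.
readonly_check_cmd() {
  fstype=$1
  dev=$2
  case $fstype in
    ext2|ext3|ext4) echo "fsck.ext4 -n $dev" ;;
    xfs)            echo "xfs_repair -n $dev" ;;
    vfat)           echo "fsck.fat -n $dev" ;;
    ntfs)           echo "ntfsfix -n $dev" ;;
    *)              echo "no read-only check known for '$fstype'" >&2; return 1 ;;
  esac
}
```

Usage:

  readonly_check_cmd "$(blkid -o value -s TYPE /dev/loop0p1)" /dev/loop0p1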


SCENARIO: Mount a disk image for file access
---------------------------------------------
Mount a full disk image (find partitions first):

  fdisk -l image.img           # List partitions and offsets

Note the "Start" sector of the partition you want and multiply it by the
sector size (512 bytes on most images; fdisk prints it as "Sector size"):

  mount -o loop,offset=$((START*512)) image.img /mnt/recovery
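
The offset arithmetic, worked as a tiny function. partition_offset is a
hypothetical helper name; the second argument is the sector size and defaults
to 512 as an assumption of this sketch.

```shell
# Byte offset of a partition: start sector ($1) times sector size ($2).
partition_offset() {
  start=$1
  sector_size=${2:-512}
  echo $(($start * $sector_size))
}
```

For example, a partition starting at sector 2048 with 512-byte sectors sits at
byte 1048576, so:

  mount -o loop,offset=$(partition_offset 2048) image.img /mnt/recovery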

Or use losetup to set up loop devices for all partitions:

  losetup -P /dev/loop0 image.img
  mount /dev/loop0p1 /mnt/recovery

For NTFS images:

  mount -t ntfs-3g -o loop,offset=$((START*512)) image.img /mnt/recovery


SCENARIO: Low-level recovery from very bad drives (safecopy)
------------------------------------------------------------
Safecopy is more aggressive than ddrescue for very damaged media.
Use when ddrescue can't make progress.

  safecopy /dev/sdX image.img

With multiple passes (increasingly aggressive):

  safecopy --stage1 /dev/sdX image.img    # Quick pass
  safecopy --stage2 /dev/sdX image.img    # Retry errors
  safecopy --stage3 /dev/sdX image.img    # Maximum recovery


DATA RECOVERY TIPS
------------------
1. STOP using a failing drive immediately - every access risks more damage
2. Clone first, recover from clone - never work on original
3. Keep the log file from ddrescue - allows resuming
4. Recover to a DIFFERENT drive - never same drive
5. For deleted files on a working drive, unmount immediately to prevent
   overwriting the deleted data
6. If drive makes clicking/grinding noises, consider professional recovery
7. For SSDs, TRIM may have already zeroed deleted blocks, which makes
   recovery harder

================================================================================
3. BOOT REPAIR
================================================================================

QUICK REFERENCE
---------------
  tldr grub-install     # Install GRUB bootloader
  tldr efibootmgr       # Manage UEFI boot entries
  tldr arch-chroot      # Chroot into installed system
  man mkinitcpio        # Rebuild initramfs

FIRST: Identify your boot mode
------------------------------
Check if system is UEFI or Legacy BIOS:

  ls /sys/firmware/efi       # If exists, you're in UEFI mode

Boot the rescue USB in the same mode as the installed system. UEFI repairs
such as efibootmgr need the rescue system itself booted in UEFI mode; if the
installed system boots in Legacy mode, repair the MBR/Legacy boot path.
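
The same check as a function, for use in scripts. boot_mode is a hypothetical
helper name; the optional argument (an alternative sysfs root) exists only so
the function can be exercised without real firmware.

```shell
# Print UEFI if <sysfs>/firmware/efi exists, otherwise BIOS.
boot_mode() {
  sysfs=${1:-/sys}
  if [ -d "$sysfs/firmware/efi" ]; then
    echo UEFI
  else
    echo BIOS
  fi
}
```

Usage: boot_mode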


SCENARIO: Chroot into broken system (preparation for most repairs)
------------------------------------------------------------------
This is the foundation for most boot repairs.

1. Find your partitions:

  lsblk -f                    # Shows filesystems and labels

2. Mount the root filesystem:

  mount /dev/sdX2 /mnt        # Replace with your root partition

   For ZFS root:

     zpool import -R /mnt zroot
     zfs mount -a

3. Mount required system directories and enter the chroot:

  mount /dev/sdX1 /mnt/boot   # EFI partition (if separate)
  mount --bind /dev /mnt/dev
  mount --bind /proc /mnt/proc
  mount --bind /sys /mnt/sys
  mount --bind /sys/firmware/efi/efivars /mnt/sys/firmware/efi/efivars
  chroot /mnt /bin/bash

   Or use arch-chroot (sets up /dev, /proc and /sys automatically; mount
   the EFI partition first):

     arch-chroot /mnt

4. Now you can run commands as if booted into the system.
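
The bind mounts from step 3 can be collected in a helper with a dry-run
switch, so the commands can be reviewed before anything is mounted. A sketch
only: run, prepare_chroot and the DRY_RUN variable are names made up for this
guide.

```shell
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Bind-mount the system directories into the chroot target ($1, default /mnt).
prepare_chroot() {
  target=${1:-/mnt}
  for fs in dev proc sys; do
    run mount --bind "/$fs" "$target/$fs" || return 1
  done
  # efivars only exists when the rescue system itself booted in UEFI mode.
  if [ -d /sys/firmware/efi/efivars ]; then
    run mount --bind /sys/firmware/efi/efivars \
      "$target/sys/firmware/efi/efivars" || return 1
  fi
}
```

Usage: DRY_RUN=1 prepare_chroot /mnt    # review, then run without DRY_RUN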


SCENARIO: Reinstall GRUB (UEFI)
-------------------------------
After chrooting into the system:

  grub-install --target=x86_64-efi --efi-directory=/boot --bootloader-id=GRUB

If EFI partition is mounted elsewhere:

  grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=GRUB

Regenerate GRUB config:

  grub-mkconfig -o /boot/grub/grub.cfg


SCENARIO: Reinstall GRUB (Legacy BIOS/MBR)
------------------------------------------
After chrooting into the system:

  grub-install --target=i386-pc /dev/sdX    # Note: device, not partition

Regenerate GRUB config:

  grub-mkconfig -o /boot/grub/grub.cfg


SCENARIO: Fix UEFI boot entries
-------------------------------
List current boot entries:

  efibootmgr -v

Delete a broken entry (replace XXXX with boot number):

  efibootmgr -b XXXX -B

Create a new boot entry:

  efibootmgr --create --disk /dev/sdX --part 1 --label "Arch Linux" \
    --loader /EFI/GRUB/grubx64.efi

Change boot order (comma-separated boot numbers):

  efibootmgr -o 0001,0002,0003

Set next boot only:

  efibootmgr -n 0001
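
The boot numbers needed for -b, -o and -n can be pulled out of the efibootmgr
listing. A minimal sketch, assuming entry lines of the form "Boot0001* Label";
list_boot_entries is a hypothetical helper name, and whatever follows the
number (label, and the device path if efibootmgr prints one) is passed through
unchanged.

```shell
# Read efibootmgr output on stdin and print "<number> <rest of line>" for
# each boot entry, skipping BootCurrent, BootOrder and similar lines.
list_boot_entries() {
  sed -n 's/^Boot\([0-9A-Fa-f]\{4\}\)[* ] \(.*\)$/\1 \2/p'
}
```

Usage: efibootmgr | list_boot_entries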


SCENARIO: Rebuild initramfs (kernel panic, missing modules)
-----------------------------------------------------------
After chrooting into the system:

List available presets:

  ls /etc/mkinitcpio.d/

Rebuild for specific kernel:

  mkinitcpio -p linux          # Standard kernel
  mkinitcpio -p linux-lts      # LTS kernel

Rebuild all:

  mkinitcpio -P

Check mkinitcpio.conf for ZFS:

  grep "^HOOKS" /etc/mkinitcpio.conf

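For ZFS, HOOKS should include 'zfs' before 'filesystems'. Keep 'keyboard'
before 'zfs' so an encryption passphrase can be typed at boot:
  HOOKS=(base udev autodetect modconf block keyboard zfs filesystems fsck)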
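
The hook order can be checked mechanically instead of by eye. A rough sketch,
assuming the HOOKS=(...) array form on a single line; zfs_hook_ok is a
hypothetical helper name and it only tests that zfs appears before
filesystems.

```shell
# Succeed if the last HOOKS=(...) line in $1 lists zfs before filesystems.
zfs_hook_ok() {
  conf=${1:-/etc/mkinitcpio.conf}
  hooks=$(sed -n 's/^HOOKS=(\(.*\)).*$/\1/p' "$conf" | tail -n 1)
  seen_zfs=0
  for h in $hooks; do
    case $h in
      zfs) seen_zfs=1 ;;
      filesystems)
        if [ "$seen_zfs" -eq 1 ]; then return 0; else return 1; fi ;;
    esac
  done
  return 1
}
```

Usage: zfs_hook_ok /etc/mkinitcpio.conf || echo "fix HOOKS, then mkinitcpio -P"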


SCENARIO: GRUB not detecting Windows (dual-boot)
------------------------------------------------
After chrooting into the system:

Enable os-prober in GRUB config:

  echo 'GRUB_DISABLE_OS_PROBER=false' >> /etc/default/grub

Mount the Windows EFI partition if not already mounted.

Regenerate GRUB config:

  grub-mkconfig -o /boot/grub/grub.cfg

os-prober should find Windows and add it to the menu.
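
Appending with echo adds a duplicate line each time it is run. A sketch of an
idempotent alternative that rewrites an existing (possibly commented-out)
setting and only appends when none is present; enable_os_prober is a
hypothetical helper name, and the temporary file next to the config is an
assumption of this sketch.

```shell
# Set GRUB_DISABLE_OS_PROBER=false in file $1 exactly once.
enable_os_prober() {
  file=${1:-/etc/default/grub}
  pattern='^#\{0,1\}[[:space:]]*GRUB_DISABLE_OS_PROBER=.*'
  if grep -q "$pattern" "$file"; then
    # Replace the existing line, uncommenting it if necessary.
    sed "s/$pattern/GRUB_DISABLE_OS_PROBER=false/" "$file" > "$file.tmp" &&
      mv "$file.tmp" "$file"
  else
    echo 'GRUB_DISABLE_OS_PROBER=false' >> "$file"
  fi
}
```

Usage: enable_os_prober /etc/default/grub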


SCENARIO: Restore Windows MBR (remove GRUB, restore Windows boot)
-----------------------------------------------------------------
If you need to remove Linux and restore Windows-only MBR:

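  ms-sys -7 /dev/sdX           # Write Windows 7 MBR

Other options:
  ms-sys /dev/sdX              # No option: inspect the current boot record
  ms-sys -w /dev/sdX           # Write an automatically selected boot record
  ms-sys -i /dev/sdX           # Write Windows Vista MBR (writes, not inspects)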


SCENARIO: Install syslinux (lightweight alternative to GRUB)
------------------------------------------------------------
For Legacy BIOS:

  syslinux-install_update -i -a -m

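For UEFI, copy the EFI binaries to the EFI partition:

  mkdir -p /boot/EFI/syslinux
  cp -r /usr/lib/syslinux/efi64/* /boot/EFI/syslinux/

Create syslinux.cfg with boot entries: /boot/syslinux/syslinux.cfg for
Legacy BIOS, /boot/EFI/syslinux/syslinux.cfg for UEFI.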


SCENARIO: Can't boot - kernel panic with ZFS
--------------------------------------------
Common causes:
1. ZFS module not in initramfs - rebuild with mkinitcpio
2. Pool name changed - check zpool.cache
3. hostid mismatch - regenerate hostid

After chrooting:

Check if ZFS hook is present:

  grep zfs /etc/mkinitcpio.conf

Regenerate hostid if needed:

  zgenhostid $(hostid)

Rebuild initramfs:

  mkinitcpio -P


SCENARIO: Emergency boot from GRUB command line
-----------------------------------------------
If GRUB loads but config is broken, press 'c' for command line:

For Linux (non-ZFS):

  set root=(hd0,gpt2)
  linux /boot/vmlinuz-linux root=/dev/sda2
  initrd /boot/initramfs-linux.img
  boot

For Linux with ZFS root:

  set root=(hd0,gpt1)
  linux /vmlinuz-linux-lts root=ZFS=zroot/ROOT/default
  initrd /initramfs-linux-lts.img
  boot

Tab completion works in GRUB command line!


BOOT REPAIR TIPS
----------------
1. Always back up your current EFI partition before making changes
2. Use 'efibootmgr -v' to see full paths and verify entries
3. Some UEFI firmwares are picky about the bootloader path -
   try /EFI/BOOT/BOOTX64.EFI as a fallback
4. If all else fails, most UEFI firmware has a boot menu (F12, F8, Esc at POST)
5. GRUB reinstall usually fixes most boot issues
6. For ZFS, the initramfs must include the zfs hook

================================================================================
4. WINDOWS RECOVERY
================================================================================

[To be added]

================================================================================
5. HARDWARE DIAGNOSTICS
================================================================================

[To be added]

================================================================================
6. DISK OPERATIONS
================================================================================

[To be added]

================================================================================
7. NETWORK TROUBLESHOOTING
================================================================================

[To be added]

================================================================================
                              END OF GUIDE
================================================================================