Add hardware diagnostics tools and rescue guide section

Packages added:
- memtester: userspace memory testing
- stress-ng: CPU/memory/IO stress testing
- lm_sensors: temperature/fan/voltage monitoring
- lshw: detailed hardware inventory
- dmidecode: SMBIOS/DMI system information
- nvme-cli: NVMe drive management
- hdparm: HDD/SSD parameter tuning

Rescue guide Section 5 covers:
- SMART disk health monitoring
- Memory testing with memtester
- System stress testing
- Temperature monitoring with sensors
- Hardware inventory commands
- Disk benchmarking
- Bad block checking
author: Craig Jennings <c@cjennings.net> 2026-01-18 14:56:15 -0600
committer: Craig Jennings <c@cjennings.net> 2026-01-18 14:56:15 -0600
commit: 752400ff7ba075efc5849725d7282a01ce3d9cd4 (patch)
tree: a2fbc2825f35f4432900d3abb7ac78ca64282558 /custom
parent: e17830103bfebd1f2ec73395abe2d8026cc11b21 (diff)
download: archangel-752400ff7ba075efc5849725d7282a01ce3d9cd4.tar.gz
archangel-752400ff7ba075efc5849725d7282a01ce3d9cd4.zip
1 files changed, 209 insertions, 1 deletions
diff --git a/custom/RESCUE-GUIDE.txt b/custom/RESCUE-GUIDE.txt
index d1de465..57753d3 100644
--- a/custom/RESCUE-GUIDE.txt
+++ b/custom/RESCUE-GUIDE.txt
@@ -842,7 +842,215 @@ WINDOWS RECOVERY TIPS
 5. HARDWARE DIAGNOSTICS
 ================================================================================
 
-[To be added]
+QUICK REFERENCE
+---------------
+  tldr smartctl       # Check drive health
+  tldr lshw           # List hardware
+  tldr hdparm         # Disk info and benchmarks
+  man memtester       # Memory testing
+  man stress-ng       # Stress testing
+
+SCENARIO: Check if a drive is failing (SMART)
+---------------------------------------------
+Quick health check:
+
+  smartctl -H /dev/sdX
+
+Full SMART report:
+
+  smartctl -a /dev/sdX
+
+For NVMe drives:
+
+  smartctl -a /dev/nvme0n1
+  nvme smart-log /dev/nvme0n1
+
+Key SMART attributes to watch:
+  - Reallocated_Sector_Ct: Bad sectors remapped (increasing = dying)
+  - Current_Pending_Sector: Sectors waiting to be remapped
+  - Offline_Uncorrectable: Unreadable sectors
+  - UDMA_CRC_Error_Count: Cable/connection issues
+  - Wear_Leveling_Count: SSD wear (normalized value drops as it wears)
+
+Run a self-test:
+
+  smartctl -t short /dev/sdX    # Quick test (~2 min)
+  smartctl -t long /dev/sdX     # Thorough test (~hours)
+
+Check test results:
+
+  smartctl -l selftest /dev/sdX
+
+
+SCENARIO: Test RAM for errors
+-----------------------------
+Option 1: Memtest86+ (from boot menu)
+  - Restart and select "Memtest86+" from the boot menu
+  - Most thorough test, runs before OS loads
+  - Let it run for at least 1-2 passes (can take hours)
+
+Option 2: memtester (from running system)
+  - Tests available RAM while system is running
+  - Can't test RAM used by kernel/programs
+
+Test 1GB of RAM (adjust based on free memory):
+
+  free -h                       # Check available memory
+  memtester 1G 1                # Test 1GB, 1 iteration
+  memtester 2G 5                # Test 2GB, 5 iterations
+
+Note: memtester can only test free RAM. For thorough testing,
+use Memtest86+ from the boot menu.
+
+
+SCENARIO: Monitor temperatures, fans, voltages
+----------------------------------------------
+First, detect and load sensor modules:
+
+  sensors-detect --auto         # Auto-detect sensors
+
+Then view readings:
+
+  sensors                       # Show all sensor data
+
+Continuous monitoring:
+
+  watch -n 1 sensors            # Update every second
+
+If sensors shows nothing, modules may need loading:
+
+  modprobe coretemp             # Intel CPU temps
+  modprobe k10temp              # AMD CPU temps
+  modprobe nct6775              # Common motherboard chip
+
+
+SCENARIO: Stress test hardware (verify stability)
+-------------------------------------------------
+Useful for:
+  - Testing used/refurbished hardware
+  - Verifying overclocking stability
+  - Burn-in testing before deployment
+  - Reproducing intermittent issues
+
+CPU stress test:
+
+  stress-ng --cpu $(nproc) --timeout 300s     # All cores, 5 min
+
+Memory stress test:
+
+  stress-ng --vm 2 --vm-bytes 1G --timeout 300s
+
+Combined CPU + memory:
+
+  stress-ng --cpu $(nproc) --vm 2 --vm-bytes 1G --timeout 600s
+
+Disk I/O stress:
+
+  stress-ng --hdd 2 --timeout 300s
+
+Monitor during stress test (in another terminal):
+
+  watch -n 1 sensors            # Watch temperatures
+  htop                          # Watch CPU/memory usage
+
+
+SCENARIO: Get detailed hardware information
+-------------------------------------------
+Full hardware report:
+
+  lshw                          # All hardware (verbose)
+  lshw -short                   # Summary view
+  lshw -html > hardware.html    # HTML report
+
+Specific components:
+
+  lshw -class processor         # CPU info
+  lshw -class memory            # RAM info
+  lshw -class disk              # Disk info
+  lshw -class network           # Network adapters
+
+BIOS/motherboard info:
+
+  dmidecode                     # All DMI tables
+  dmidecode -t bios             # BIOS info
+  dmidecode -t system           # System/motherboard
+  dmidecode -t memory           # Memory slots and modules
+  dmidecode -t processor        # CPU socket info
+
+Quick system overview:
+
+  inxi -Fxz                     # If inxi is installed
+  cat /proc/cpuinfo             # CPU details
+  cat /proc/meminfo             # Memory details
+
+
+SCENARIO: Test disk speed / benchmark
+-------------------------------------
+Basic read speed test:
+
+  hdparm -t /dev/sdX            # Buffered read speed
+  hdparm -T /dev/sdX            # Cached read speed
+
+More accurate test (run 3 times, average):
+
+  hdparm -tT /dev/sdX
+  hdparm -tT /dev/sdX
+  hdparm -tT /dev/sdX
+
+Get drive information:
+
+  hdparm -I /dev/sdX            # Detailed drive info
+
+For NVMe drives:
+
+  nvme list                     # List NVMe drives
+  nvme id-ctrl /dev/nvme0n1     # Controller info
+  nvme smart-log /dev/nvme0n1   # SMART/health data
+
+
+SCENARIO: Check for bad blocks (surface scan)
+---------------------------------------------
+WARNING: This is read-only but takes a long time on large drives.
+
+  badblocks -sv /dev/sdX
+
+With a 4096-byte block size (usually faster on large drives):
+
+  badblocks -sv -b 4096 /dev/sdX
+
+Note: For modern drives, SMART is usually more informative.
+badblocks is useful for older drives without good SMART support.
+
+
+SCENARIO: Identify unknown hardware / find drivers
+--------------------------------------------------
+List PCI devices:
+
+  lspci                         # All PCI devices
+  lspci -v                      # Verbose (with drivers)
+  lspci -k                      # Show kernel drivers
+
+List USB devices:
+
+  lsusb                         # All USB devices
+  lsusb -v                      # Verbose
+
+Find what driver a device is using:
+
+  lspci -k | grep -EA3 "Network|Ethernet"   # Network adapter driver
+  lspci -k | grep -EA3 "VGA|3D|Display"     # Graphics driver
+
+
+HARDWARE DIAGNOSTICS TIPS
+-------------------------
+1. Run SMART checks regularly - drives often show warning signs before they fail
+2. Memtest86+ (from boot menu) is more thorough than memtester
+3. Stress test new/used hardware before trusting it with data
+4. High temperatures during stress test = cooling problem
+5. Random crashes/errors often indicate RAM or power issues
+6. SMART Reallocated_Sector_Ct increasing = drive dying
+7. Back up immediately if SMART shows any warnings
+8. SSDs have limited write cycles - check Wear_Leveling_Count
 
 ================================================================================
 6. DISK OPERATIONS
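
The "Key SMART attributes to watch" list in the SMART scenario can be checked mechanically. The sketch below is not part of this commit. It assumes the usual ATA attribute table printed by `smartctl -A`, where the attribute name is the second column and the raw value is the tenth; the function name `check_smart_attrs` is hypothetical. Wear_Leveling_Count is left out on purpose, because its raw value does not mean "number of errors".

```shell
#!/bin/sh
# Hypothetical helper (not in the commit): read `smartctl -A` output on
# stdin and report watched attributes whose raw value is non-zero.
# Assumes the standard ATA table layout: name in column 2, raw value in
# column 10. NVMe drives use a different report and are not handled here.
check_smart_attrs() {
    awk '
        ($2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/) && (($10 + 0) > 0) {
            printf "WARN %s raw=%s\n", $2, $10
            bad = 1
        }
        # Exit status 1 if anything was flagged, 0 otherwise.
        END { exit (bad ? 1 : 0) }
    '
}
```

Possible use from the rescue shell:

  smartctl -A /dev/sdX | check_smart_attrs || echo "Back up this drive now"
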
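
The benchmark scenario suggests running hdparm three times and averaging. A minimal sketch of that averaging step follows, again not part of this commit. It assumes each run prints a "Timing buffered disk reads" line ending in a rate and the unit MB/sec; lines in any other unit are skipped rather than converted. The function name `avg_buffered_reads` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not in the commit): average the buffered read
# rates from several `hdparm -t` runs supplied on stdin.
# Assumes lines of the form:
#   Timing buffered disk reads: 1230 MB in  3.00 seconds = 410.00 MB/sec
avg_buffered_reads() {
    awk '
        /Timing buffered disk reads/ && $NF == "MB/sec" {
            sum += $(NF - 1)   # the rate is the second-to-last field
            runs++
        }
        END {
            if (runs == 0) exit 1   # nothing parsed
            printf "%d runs, average %.2f MB/sec\n", runs, sum / runs
        }
    '
}
```

Possible use from the rescue shell:

  for i in 1 2 3; do hdparm -t /dev/sdX; done | avg_buffered_reads
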