🐧 Day 05 — Linux Troubleshooting Drill
When something breaks on a Linux server, don’t guess and don’t panic.
Follow a runbook — a small checklist of commands that help you quickly understand:
Is the system overloaded?
Is memory full?
Is disk space low?
Is the network working?
Are there errors in logs?
Think of this like a doctor checkup for your server 🩺
Today we’ll troubleshoot the SSH service step by step.
📘 What Is a Runbook?
A runbook is a step-by-step troubleshooting routine.
It tells you:
✔ What to check
✔ Which commands to run
✔ What results mean
✔ What to do next if things get worse
This helps you stay calm during real incidents.
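A runbook can also be written down as a small script, so that every step is run the same way each time. The sketch below is one possible shape, not a standard tool: run_check is a hypothetical helper name that runs a single step and prints a one-line verdict based on the command's exit code.

```shell
# run_check LABEL COMMAND [ARGS...]
# Hypothetical helper: runs one runbook step, shows its output,
# then prints a one-line verdict based on the exit code.
run_check() {
    local label="$1"
    shift
    echo "== $label =="
    if "$@"; then
        echo "[OK] $label"
    else
        local status=$?
        echo "[FAIL] $label (exit code $status)"
        return 1
    fi
}
```

For example, run_check "Disk usage" df -h prints the df output followed by [OK] Disk usage.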
🖥 Step 1 — Check System Basics
Before troubleshooting, know your environment.
uname -a
➡️ Shows Linux kernel version and system architecture.
cat /etc/os-release
➡️ Shows Linux distribution name and version (Ubuntu, CentOS, etc.).
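If you only need the distribution name and version on one line (for an incident note, say), the file can be read from a script. This is a minimal sketch that relies on /etc/os-release being written as shell-style variable assignments; os_summary is a hypothetical helper name.

```shell
# os_summary [FILE]
# Hypothetical helper: prints "<name> <version>" from an os-release style file.
os_summary() {
    local file="${1:-/etc/os-release}"
    # Work in a subshell so NAME, VERSION_ID, etc. do not leak into the caller.
    (
        unset NAME VERSION_ID
        . "$file" || exit 1
        echo "${NAME:-unknown} ${VERSION_ID:-unknown}"
    )
}
```

Calling os_summary with no argument reads /etc/os-release. Rolling-release distributions may not set VERSION_ID, in which case the sketch prints "unknown" for the version.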
⚙️ Step 2 — Check CPU & Memory Health
We check if the system is under heavy load.
top
➡️ Live view of CPU and memory usage.
Look for CPU usage that stays above 80% or very little available memory. Press q to quit.
free -h
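➡️ Shows total, used, and available RAM in human-readable units. The "available" column is the best estimate of how much memory applications can still use.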
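ps -o pid,pcpu,pmem,comm -p "$(pgrep -d, sshd)"
➡️ Shows how much CPU and memory each SSH process is using. There is usually more than one sshd process, so pgrep -d, joins the PIDs with commas, which is the list format ps -p expects.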
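"Memory almost full" is easier to judge as a number. The sketch below computes the percentage of RAM in use from /proc/meminfo, assuming a kernel that reports MemAvailable (Linux 3.14 and later); mem_used_percent is a hypothetical helper name.

```shell
# mem_used_percent [FILE]
# Hypothetical helper: prints the percentage of RAM in use, computed as
# (MemTotal - MemAvailable) / MemTotal and truncated to a whole number.
mem_used_percent() {
    local file="${1:-/proc/meminfo}"
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total == 0) exit 1
            printf "%d\n", (total - avail) * 100 / total
        }
    ' "$file"
}
```

Run it as mem_used_percent with no arguments. What counts as too high depends on the workload, so treat any threshold you compare it against as a starting point.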
💾 Step 3 — Check Disk & Storage
Low disk space can crash services.
df -h
➡️ Shows disk space usage for every mounted filesystem. Watch the Use% column.
du -sh /var/log
➡️ Shows total size of the log directory.
Large logs can fill up the disk.
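To avoid scanning the df table by eye, the nearly full filesystems can be filtered out. This sketch parses the POSIX output format of df -P, where usage is the fifth column and the mount point the sixth; disks_over is a hypothetical helper name, and mount points containing spaces are not handled.

```shell
# disks_over THRESHOLD
# Hypothetical helper: reads `df -P` output on stdin and prints the mount
# point and usage of each filesystem at or above THRESHOLD percent.
disks_over() {
    awk -v limit="$1" '
        NR > 1 {                      # skip the header line
            use = $5
            sub(/%/, "", use)         # "91%" becomes "91"
            if (use + 0 >= limit + 0) print $6, $5
        }
    '
}
```

Typical use is df -P | disks_over 80, which prints nothing when every filesystem is below 80%.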
🌐 Step 4 — Check Network Status
We confirm that the service is listening and reachable.
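sudo ss -tlnp | grep sshd
➡️ Shows whether sshd is listening and on which port (22 by default). sudo is needed for ss to show the owning process, and because -n prints port numbers instead of service names, grepping for "ssh" without the process name finds nothing.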
curl -I http://localhost
➡️ Requests only the response headers from a local web server, if one is running. This checks HTTP, not SSH; for SSH, rely on the ss output above.
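The "is it listening?" check can also be made by port number rather than by process name, which works without sudo. The sketch below assumes the column layout of ss -tln (TCP only, so the local address is the fourth column); port_listening is a hypothetical helper name.

```shell
# port_listening PORT
# Hypothetical helper: reads `ss -tln` output on stdin and succeeds (exit 0)
# if any listening socket is bound to PORT.
port_listening() {
    awk -v port="$1" '
        NR > 1 {
            # Local address looks like 0.0.0.0:22 or [::]:22;
            # the port is whatever follows the last colon.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For example: ss -tln | port_listening 22 && echo "port 22 is listening". A listening port only proves the socket is open locally; a firewall can still block remote clients.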
📜 Step 5 — Check Logs (Most Important Step)
Logs tell us why something failed.
journalctl -u ssh -n 50
➡️ Shows the last 50 log entries for the SSH service. The unit is named ssh on Debian and Ubuntu and sshd on RHEL, CentOS, and Fedora.
tail -n 50 /var/log/syslog
➡️ Shows the latest system log messages. On RHEL-based systems the equivalent file is /var/log/messages.
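Fifty raw log lines are hard to read under pressure, so it can help to summarize them. The sketch below counts failed password attempts per source address, assuming OpenSSH's usual "Failed password for ... from ADDRESS port ..." message; failed_logins_by_ip is a hypothetical helper name.

```shell
# failed_logins_by_ip
# Hypothetical helper: reads SSH log lines on stdin and prints
# "<count> <address>" for failed password attempts, busiest source first.
failed_logins_by_ip() {
    awk '
        /Failed password/ {
            # The source address is the word right after "from".
            for (i = 1; i < NF; i++)
                if ($i == "from") { count[$(i + 1)]++; break }
        }
        END { for (ip in count) print count[ip], ip }
    ' | sort -rn
}
```

Typical use is journalctl -u ssh --since "1 hour ago" | failed_logins_by_ip. Many failures from a single address usually point to a brute-force attempt rather than a broken service.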
🔎 Example Quick Findings
After running all checks, you might find:
✔ CPU usage is normal
✔ Memory is healthy
✔ Disk space is safe
✔ SSH service is running and listening
✔ No recent errors in logs
That means these checks found nothing wrong and the server looks healthy.
🚨 If This Worsens (Next Steps)
If problems continue:
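sudo systemctl restart ssh
➡️ Restarts the SSH service (use sshd as the unit name on RHEL-based systems). Sessions that are already open normally stay connected, so keep one open until you have confirmed that new logins work.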
sudo journalctl -u ssh -f
➡️ Watch logs live while testing connections.
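sudo strace -p <PID>
➡️ Attaches to the running process and prints its system calls, which helps when the service hangs. Replace <PID> with the process ID of sshd and press Ctrl+C to detach. Attaching to another user's process requires root.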
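Restarting SSH with a broken configuration can lock you out of the server, so it is worth validating the config first. The sketch below only restarts the unit when a check command succeeds; restart_if_config_ok and the RESTART_CMD override are hypothetical names introduced here, while sshd -t is OpenSSH's built-in config test mode.

```shell
# restart_if_config_ok UNIT CHECK_COMMAND [ARGS...]
# Hypothetical helper: runs CHECK_COMMAND and restarts UNIT only if it succeeds.
restart_if_config_ok() {
    local unit="$1"
    shift
    if "$@"; then
        echo "Config check passed; restarting $unit"
        # RESTART_CMD is a hypothetical override, useful for dry runs.
        ${RESTART_CMD:-sudo systemctl restart} "$unit"
    else
        echo "Config check failed; leaving $unit untouched" >&2
        return 1
    fi
}
```

On Debian or Ubuntu this would be called as restart_if_config_ok ssh sudo sshd -t. Setting RESTART_CMD=echo first turns the restart into a dry run that only prints the unit name.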