🐧 Day 05 — Linux Troubleshooting Drill

When something breaks on a Linux server, don’t guess and don’t panic.

Follow a runbook — a small checklist of commands that help you quickly understand:

  • Is the system overloaded?

  • Is memory full?

  • Is disk space low?

  • Is the network working?

  • Are there errors in logs?

Think of this like a doctor checkup for your server 🩺

Today we’ll troubleshoot the SSH service step by step.


📘 What Is a Runbook?

A runbook is a step-by-step troubleshooting routine.

It tells you:
✔ What to check
✔ Which commands to run
✔ What the results mean
✔ What to do next if things get worse

This helps you stay calm during real incidents.
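
To make the idea concrete, here is a minimal sketch of a runbook step written as a shell function. The name run_check and the OK/FAIL labels are my own invention for illustration, not part of any standard tool.

```shell
#!/bin/sh
# run_check LABEL COMMAND [ARGS...]
# Hypothetical helper: runs one runbook step quietly and prints a one-line verdict.
run_check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        printf '[ OK ] %s\n' "$label"
    else
        printf '[FAIL] %s\n' "$label"
        return 1
    fi
}

# Example steps (commented out so that sourcing this file runs nothing):
# run_check "sshd process exists" pgrep -x sshd
# run_check "root filesystem is readable" test -r /
```

Each step gives a plain pass or fail, so during an incident you can read the results like a checklist instead of scanning raw command output.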


🖥 Step 1 — Check System Basics

Before troubleshooting, know your environment.

uname -a

➡️ Shows Linux kernel version and system architecture.

cat /etc/os-release

➡️ Shows Linux distribution name and version (Ubuntu, CentOS, etc.).
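
If you only want the distribution ID and version (handy for incident notes), something like the sketch below may help. The function name os_summary is hypothetical; it relies on /etc/os-release being written as shell-compatible KEY=value lines.

```shell
# os_summary [FILE]
# Hypothetical helper: prints "ID VERSION_ID" from an os-release style file.
# The body runs in a subshell, so the sourced variables do not leak into the caller.
os_summary() (
    file=${1:-/etc/os-release}
    [ -r "$file" ] || { echo "cannot read $file" >&2; exit 1; }
    unset ID VERSION_ID
    . "$file"
    printf '%s %s\n' "${ID:-unknown}" "${VERSION_ID:-unknown}"
)

# Usage: os_summary
```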


⚙️ Step 2 — Check CPU & Memory Health

We check if the system is under heavy load.

top

➡️ Live view of CPU and memory usage. Press q to quit.
Look for CPU that stays high (above 80%) or memory that is almost full.

free -h

➡️ Shows total, used, and available RAM in a human-readable format.
The "available" column is the best estimate of memory that programs can still claim.

ps -o pid,pcpu,pmem,comm -p "$(pgrep -d, sshd)"

➡️ Shows how much CPU and memory each SSH daemon process is using.
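
To turn "memory almost full" into a single number, here is a rough sketch that reads /proc/meminfo directly. The function name mem_available_pct is made up for this post, and it assumes a kernel that reports the MemAvailable field.

```shell
# mem_available_pct [FILE]
# Hypothetical helper: prints MemAvailable as a whole-number percentage of MemTotal.
mem_available_pct() {
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        END {
            if (total > 0) printf "%d\n", avail * 100 / total
            else exit 1
        }
    ' "${1:-/proc/meminfo}"
}

# Usage: mem_available_pct
```

A low value is a hint to go back to top and see which process is holding the memory. What counts as "low" depends on the workload.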


💾 Step 3 — Check Disk & Storage

Low disk space can crash services.

df -h

➡️ Shows disk space usage for every mounted filesystem.
Watch for anything at or near 100% in the Use% column.

sudo du -sh /var/log

➡️ Shows total size of the log directory.
Large logs can fill up the disk. sudo lets du read subdirectories that only root can open, so the total is not undercounted.
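
Scanning df output by eye is easy to get wrong under pressure. As a sketch, the function below flags only the filesystems above a limit. The name disks_over is hypothetical, and it assumes mount points contain no spaces.

```shell
# disks_over THRESHOLD
# Hypothetical helper: reads `df -P` output on stdin and prints "MOUNT USE%"
# for each filesystem whose usage is at or above THRESHOLD percent.
disks_over() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Usage: df -P | disks_over 90
```

The -P flag asks df for the portable one-line-per-filesystem format, which keeps the column positions predictable for awk.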


🌐 Step 4 — Check Network Status

We confirm that the service is listening and reachable.

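sudo ss -tulpn | grep ssh

➡️ Shows whether sshd is listening, and on which port (22 by default).
sudo is needed so ss can show process names for sockets owned by other users. Without it there is no "sshd" text for grep to match.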

curl -I http://localhost

➡️ Sends a quick HEAD request to a local web service to check if it responds.
This applies only if the server also runs a web service. It does not test SSH.
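
If you want a yes/no answer for a specific port in a script, one possible sketch is below. The function name is_listening is hypothetical, and it expects the output of ss -Htln (TCP listeners, numeric, no header), where the local address is the fourth column.

```shell
# is_listening PORT
# Hypothetical helper: reads `ss -Htln` output on stdin and succeeds
# if any socket is listening on PORT.
is_listening() {
    awk -v port="$1" '
        {
            # Field 4 is the local address; the port follows the last colon.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}

# Usage: ss -Htln | is_listening 22 && echo "port 22 is listening"
```

Unlike grepping for a process name, this does not need sudo, because it only looks at the port number.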


📜 Step 5 — Check Logs (Most Important Step)

Logs tell us why something failed.

journalctl -u ssh -n 50

➡️ Shows the last 50 log entries for the SSH service.
The unit is named ssh on Debian and Ubuntu. On RHEL, CentOS, and Fedora use sshd instead.

tail -n 50 /var/log/syslog

➡️ Shows the latest system log messages.
This file exists on Debian and Ubuntu. On RHEL-family systems the equivalent is /var/log/messages.
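
Fifty log lines are a lot to read during an incident. As a rough first pass, the sketch below counts lines that mention a likely problem. The function name count_problem_lines and the keyword list are my own assumptions; tune the keywords to your logs.

```shell
# count_problem_lines
# Hypothetical helper: reads log text on stdin and prints how many lines
# contain one of a few problem keywords (case-insensitive).
count_problem_lines() {
    # grep -c exits nonzero when the count is 0; "|| true" keeps that from
    # looking like a failure to the caller.
    grep -Eic 'error|fail|fatal|refused|denied' || true
}

# Usage: journalctl -u ssh -n 50 --no-pager | count_problem_lines
```

A count of 0 does not prove everything is fine, but a nonzero count tells you the logs are worth reading closely.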


🔎 Example Quick Findings

After running all checks, you might find:

✔ CPU usage is normal
✔ Memory is healthy
✔ Disk space is safe
✔ SSH service is running and listening
✔ No recent errors in logs

If every check passes, the server is healthy as far as this runbook can tell.


🚨 If This Worsens (Next Steps)

If problems continue:

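sudo systemctl restart ssh

➡️ Restarts the SSH service (the unit is sshd on RHEL-family systems).
Keep your current session open while you test a new connection, so a bad config cannot lock you out.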

sudo journalctl -u ssh -f

➡️ Watch logs live while testing connections.

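sudo strace -p <PID>

➡️ Attaches to a running process and prints its system calls, which helps when the service hangs.
Replace <PID> with the sshd process ID. sudo is needed because sshd runs as root.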
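
One way to make a restart less risky is to validate the configuration first. The sketch below does that with sshd -t, which checks the config file and exits nonzero on errors. The function name restart_ssh_if_valid is hypothetical, and the default unit name ssh assumes Debian or Ubuntu.

```shell
# restart_ssh_if_valid [UNIT]
# Hypothetical helper: tests the sshd configuration and restarts the
# service only if the test passes.
restart_ssh_if_valid() {
    unit=${1:-ssh}
    if sudo sshd -t; then
        sudo systemctl restart "$unit"
    else
        echo "sshd config test failed; not restarting $unit" >&2
        return 1
    fi
}

# Usage: restart_ssh_if_valid        # Debian/Ubuntu
#        restart_ssh_if_valid sshd   # RHEL, CentOS, Fedora
```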