Introduction
Running out of disk space on a server can be a frustrating experience. It leads to application malfunctions, system instability, and even crashes. Recently, I encountered a critical situation where my server's root partition (`/dev/vda3`) and multiple Docker overlay filesystems were reporting 100% utilization. This article outlines the steps I took to diagnose and resolve the issue, providing a practical guide for anyone facing a similar disk space crisis.
The Problem: 100% Disk Usage
The issue became apparent when monitoring tools alerted me to critically low disk space. Running `df -h` confirmed the problem:
```
[root@iZbp15wv3kw8nmk6nxi8y4Z ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 912M 0 912M 0% /dev
tmpfs 930M 0 930M 0% /dev/shm
tmpfs 930M 95M 836M 11% /run
tmpfs 930M 0 930M 0% /sys/fs/cgroup
/dev/vda3 40G 40G 20K 100% /
/dev/vda2 100M 5.8M 95M 6% /boot/efi
overlay 40G 40G 20K 100% /var/lib/docker/overlay2/1d2d1bfe1ffc4f6db4e6e094cbc5fea573fabb2d3ea595aef4dad90fed80d10d/merged
overlay 40G 40G 20K 100% /var/lib/docker/overlay2/fef67938518265be3b5e3097ab46c925693d918a6a87e106f9062fe08cc98ae3/merged
overlay 40G 40G 20K 100% /var/lib/docker/overlay2/8a63356428e0a5dab3f05e2457812a39b1867844e772a9ac6d202c4e1ac89b4a/merged
overlay 40G 40G 20K 100% /var/lib/docker/overlay2/8cbf2ca23251fb29e1c23b4dbc530d0f10e346eb671e6ca254ae9912051b93f9/merged
overlay 40G 40G 20K 100% /var/lib/docker/overlay2/66c638bf6da59051457776c6ade9949173f83ffb0f4394a055460ce54dc7811d/merged
overlay 40G 40G 20K 100% /var/lib/docker/overlay2/8b34ba9c8f3527a51635ce0918d52577e6cb7ccd33c163c3c795e8d4908c9270/merged
overlay 40G 40G 20K 100% /var/lib/docker/overlay2/49ababe15627beb203b86ca6ba9fe59ffea504dfc55496c735f471ee48cf2891/merged
overlay 40G 40G 20K 100% /var/lib/docker/overlay2/8e3a3e4496a2a486d399c0f520f5a808608e482bf0d50176b7fe980ef199a474/merged
tmpfs 186M 0 186M 0% /run/user/0
```
As you can see, `/dev/vda3` (the root partition) and numerous Docker `overlay` filesystems were completely full. Note that the `overlay` entries are not separate disks: they are backed by `/var/lib/docker` on `/dev/vda3`, so they mirror the root filesystem's usage. This meant no space was available for new files or even temporary operations, causing severe system instability.
Solution: A Step-by-Step Approach
Resolving this issue required a systematic approach, starting with the most likely and least disruptive solutions:
1. Reclaiming Space from Docker (Highly Recommended)
Docker is notorious for accumulating unused resources over time. These can include stopped containers, unused images (including dangling ones), volumes, and networks. Here’s how to clean them up:
- Prune Stopped Containers:

```bash
docker container prune
```

- Prune Unused Images (including dangling images):

```bash
docker image prune -a
```

  Caution: the `-a` flag removes all unused images, not just dangling ones. Ensure you don't need any of them before proceeding.

- Prune Unused Volumes:

```bash
docker volume prune
```

- Prune Unused Networks:

```bash
docker network prune
```

- One-Command Cleanup (Use with Caution):

```bash
docker system prune -a
```

  This powerful command reclaims space from all of the above. Remember that the `-a` flag removes all unused images, potentially including ones you might want to keep.
After running these commands, run `df -h` again to check whether sufficient space has been freed.
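Before pruning anything, it helps to see where Docker's space is actually going. The sketch below (assuming Docker is installed and the daemon is running) reports usage and reclaimable space per category, then performs only the most conservative cleanup step, removing stopped containers non-interactively:

```shell
# Show Docker disk usage by category (images, containers, local volumes, build cache),
# including how much is reclaimable
docker system df

# Per-image and per-container breakdown for more detail
docker system df -v

# Conservative first step: remove only stopped containers, skipping the confirmation prompt
docker container prune -f
```

Checking `docker system df` first tells you whether pruning is even worth it; if "RECLAIMABLE" is near zero, the space is being consumed elsewhere and you can move straight to step 2.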
2. Identifying and Removing Large Files/Directories
If cleaning up Docker resources doesn’t solve the problem, the next step is to locate large files or directories that can be safely removed.
- Using `du` and `sort`:

```bash
du -h / | sort -rh | head -n 20
```

  This command lists the 20 largest directories under the root (`/`) filesystem. You can adjust the number after `head -n` as needed.

- Using `ncdu` (Recommended for Interactive Navigation): Install `ncdu` if it's not already present:

  - Debian/Ubuntu: `sudo apt-get install ncdu`
  - CentOS/RHEL/Fedora: `sudo yum install ncdu`

  Then run:

```bash
ncdu /
```

  `ncdu` provides an interactive interface for browsing directory sizes. You can navigate using the arrow keys, delete files/directories with the `d` key (be extremely careful!), and quit with `q`.
- Common Targets for Cleanup:
  - Log Files: Check `/var/log` for large log files. Consider using `logrotate` to manage log sizes.
  - Core Dumps: Look in `/var/crash` or similar directories for core dump files, which can often be safely removed.
  - Temporary Files: Examine `/tmp`. While it is usually cleaned on reboot, manual cleanup might be necessary.
  - Database Files: If running a database, check the size of its data files.
  - Downloaded Files: Check your `Downloads` or similar directories.
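To turn the list of common targets into something actionable, a quick sweep for oversized files is often faster than browsing. This is a minimal sketch; `/var/log` and the 100 MB threshold are just example values to adjust:

```shell
# List files larger than 100 MB under /var/log, with human-readable sizes
# (change the path and the -size threshold to suit your system)
find /var/log -type f -size +100M -exec ls -lh {} \; 2>/dev/null
```

Running the same command against `/` finds large files anywhere, though it will take longer and produce permission noise without the `2>/dev/null`.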
3. Resizing the `/dev/vda3` Partition (Advanced and Risky)
If removing files doesn’t free up enough space, you might need to resize the root partition. This is an advanced procedure that carries a risk of data loss and should only be attempted as a last resort.
- Virtual Machines: If your server is a VM, you can usually expand the disk size through your virtualization management platform. After expanding the disk, you’ll need to extend the filesystem within the operating system.
- Physical Servers: Resizing partitions on physical servers typically involves using a partitioning tool such as `gparted` from a live CD/USB. This often requires data migration and carries a higher risk of data loss.
Always back up your important data before attempting any partition resizing.
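For the VM case, the in-OS steps after enlarging the virtual disk typically look like the following. This is a sketch under stated assumptions: an ext4 root filesystem on partition 3 of `/dev/vda`, with `growpart` (from the cloud-utils package) installed. On an XFS root you would run `xfs_growfs /` instead of `resize2fs`:

```shell
# 1. Grow partition 3 of /dev/vda to fill the newly enlarged virtual disk
sudo growpart /dev/vda 3

# 2. Grow the ext4 filesystem online to fill the enlarged partition
sudo resize2fs /dev/vda3

# 3. Confirm the new size of the root filesystem
df -h /
```

Both `growpart` and `resize2fs` can run on a mounted root filesystem, so no downtime is needed for this path, but a backup beforehand is still strongly advised.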
4. Checking for Malware
While less common, it’s possible that a malicious program is consuming disk space by constantly writing data.
- Monitor Processes: Use `top` or `htop` to monitor CPU and memory usage for suspicious processes.
- Monitor Disk I/O: Use `iotop` to identify processes with high disk write activity.
- Run a Virus Scan: Install and run a reputable antivirus scanner to check for malware.
Conclusion
Running out of disk space can be a critical issue, but with a systematic approach, it’s often solvable. Start by cleaning up Docker resources, then identify and remove large, unnecessary files. Only consider resizing partitions as a last resort and after backing up your data. Regularly monitoring disk usage and performing routine cleanup will help prevent such issues from recurring. Remember to stay vigilant and keep your systems healthy!