Your Disk Is Full but `du` Says It's Empty

dev.to

Your server is slow. You check the disk:

$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       100G   95G  0.5G  99% /

Ninety-five gigabytes used. So you go looking for what's eating it:

$ du -sh /* 2>/dev/null
...
20G  /var
2G   /home
1G   /usr

Add it all up and you get maybe 25 gigabytes. So where are the other 70? You can't find the files. You can't free the space. And the service is about to fall over.

This isn't a bug, and it isn't magic. It's how Linux handles files, and once you've seen it you'll recognize it for the rest of your career. Here's why df and du disagree, and how to fix it in two commands.

Why df and du look at different things

df and du measure the same disk from opposite ends.

df asks the filesystem directly: how many blocks are allocated, how many are free? That number lives in the filesystem's own bookkeeping (the superblock), and it updates the instant anything changes.

du does something more naive. It walks the directory tree, finds every file it can reach, and adds up their sizes. The key word is reach. du only counts files that still have a name in some directory.

That gap — between "blocks the filesystem says are used" and "files du can still find by name" — is where your missing gigabytes are hiding.
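To put a number on that gap, one approach is to ask both tools for byte counts on the same filesystem and subtract. The following is a minimal sketch, assuming GNU coreutils (for `df --output` and `du -x -B1`). The function names `fs_used_bytes`, `tree_used_bytes`, and `gap_bytes` are made up for illustration.

```shell
#!/bin/sh
# Sketch: how far apart are df and du for one mount point?
# Assumes GNU coreutils. Function names are illustrative, not standard tools.

# Bytes the filesystem reports as used (df's view).
fs_used_bytes() {
    df -B1 --output=used "$1" | tail -n 1 | tr -d ' '
}

# Bytes reachable by walking the tree (du's view).
# -x stays on one filesystem so other mounts are not counted.
tree_used_bytes() {
    du -sx -B1 "$1" 2>/dev/null | cut -f1
}

# Difference between the two views, in bytes.
gap_bytes() {
    echo $(( $1 - $2 ))
}

# Example (run as root so du can read every directory):
#   gap_bytes "$(fs_used_bytes /)" "$(tree_used_bytes /)"
```

A gap of a few percent is normal bookkeeping overhead. A gap of tens of gigabytes is the situation this article is about.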

A file you deleted can still be eating your disk

In Linux, a file isn't really the name you see in a directory. The name is just a pointer. The actual file is an inode: a small kernel record holding the permissions, the owner, the size, and — most importantly — the list of disk blocks where the data lives.

When a process opens a file, the kernel hands it a file descriptor: a live handle onto that inode. And here's the rule that explains everything:

The kernel keeps a file's blocks on disk as long as at least one thing still points to it — either a name in a directory, or an open file descriptor in a running process.

So when you rm a file, you don't necessarily delete it. You remove its name, and the inode's link count drops by one. If that was the last name and no process has the file open, the kernel frees the blocks. But if a process is still holding the file open, the kernel keeps the inode alive even though its link count is now zero. The name is gone, but the data, and the disk space it occupies, stays until the last descriptor closes.

BEFORE rm:
  /var/log/app.log ──▶ inode 12345 ──▶ [ blocks on disk: 2 GB ]
                          ▲
                          │ open fd 7 (nginx, pid 1234)

AFTER rm:
  (no name)        ──▶ inode 12345 ──▶ [ blocks on disk: 2 GB ]  ← still allocated!
                          ▲
                          │ open fd 7 (nginx, pid 1234)  ← keeps it alive

du walks /var/log, sees no app.log, and counts nothing. df looks at the blocks, sees them occupied, and counts them. Both are telling the truth. They're just looking at different things.
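You can watch this happen safely in a scratch directory. The sketch below assumes Linux with /proc mounted, and the name `demo_deleted_open` is purely illustrative. It writes 1 MiB through a descriptor, removes the name, and then shows that the data is still reachable through the descriptor.

```shell
#!/bin/sh
# Sketch: reproduce a deleted-but-open file in a scratch directory.
# Assumes Linux with /proc mounted. demo_deleted_open is an illustrative name.

demo_deleted_open() {
    dir=$(mktemp -d) || return 1
    file="$dir/app.log"

    # Open descriptor 9 on the file and write 1 MiB through it.
    exec 9>"$file"
    head -c 1048576 /dev/zero >&9

    # Remove the name. The inode survives because fd 9 is still open.
    rm "$file"

    # readlink and stat inherit fd 9 from this shell, so /proc/self/fd/9
    # in each child refers to the same unlinked inode.
    echo "name exists: $( [ -e "$file" ] && echo yes || echo no )"
    echo "link target: $(readlink /proc/self/fd/9)"
    echo "size via fd: $(stat -L -c %s /proc/self/fd/9)"

    # Closing the last descriptor is what finally frees the blocks.
    exec 9>&-
    rmdir "$dir"
}
```

Calling `demo_deleted_open` should report that the name no longer exists while the size read through the descriptor is still the full 1 MiB.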

The classic cause: log rotation that uses rm

The most common way to hit this is log rotation.

A rotation job deletes the old log and the app starts a new one — except the app already opened the old file at startup and is still writing to that open descriptor. The name is gone, but the process keeps appending. Blocks keep filling. df watches the disk shrink; du finds nothing. This can run for hours, quietly eating gigabytes, until either the disk fills or someone restarts the process.

(Deleted-but-open files are the classic, most dramatic cause of a df/du mismatch. Smaller gaps can also come from reserved root blocks, filesystem metadata, or files hidden underneath a mount point. But when the gap is large and growing, it's almost always a held-open deleted file.)

Finding the culprit: lsof +L1

lsof lists open files. The +L1 flag means "show files whose link count is less than 1" — in other words, files that have been unlinked from the filesystem but are still held open:

$ sudo lsof +L1
COMMAND  PID  USER  FD  TYPE DEVICE  SIZE/OFF    NLINK NODE  NAME
nginx   1234  www   7w  REG    8,1  2147483648    0    12345 /var/log/app.log (deleted)
java    5678  app   3w  REG    8,1   536870912    0    67890 /var/log/service.log (deleted)

Reading the columns that matter:

  • PID — the process holding the file open (1234).
  • FD — the descriptor number and mode. 7w means descriptor 7, open for writing.
  • SIZE/OFF — the size. There are your missing gigabytes: 2147483648 bytes is 2 GB.
  • NLINK — the link count. 0 means no directory anywhere still names this file.
  • NAME — the path, helpfully tagged (deleted).

Now you know exactly who's responsible: nginx, pid 1234, sitting on a deleted 2 GB log.
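When several processes show up, it helps to total the held space per process. This sketch assumes lsof's machine-readable field output (`-F`), where each line starts with a one-letter field tag: `p` for PID, `c` for command, `f` for descriptor, `s` for size. The function name `sum_deleted_by_pid` is hypothetical.

```shell
#!/bin/sh
# Sketch: total bytes of deleted-but-open files per process.
# Reads `lsof +L1 -F pcs` field output on stdin.
# sum_deleted_by_pid is an illustrative name.

sum_deleted_by_pid() {
    awk '
        /^p/ { pid = substr($0, 2) }          # start of a process set
        /^c/ { cmd[pid] = substr($0, 2) }     # command name for that PID
        /^s/ { bytes[pid] += substr($0, 2) }  # size of one open file
        END {
            # Print "PID BYTES COMMAND", one line per process.
            for (p in bytes)
                printf "%s %.0f %s\n", p, bytes[p], cmd[p]
        }
    ' | sort -k2,2nr    # biggest holder first
}

# Real use (needs root to see every process):
#   sudo lsof +L1 -F pcs | sum_deleted_by_pid
```

The first line of output is the process holding the most deleted data, which is usually the one to deal with first.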

No lsof? Read /proc directly

Minimal images and stripped-down containers often don't ship lsof. You don't need it — the same information is sitting in /proc. Every process's open descriptors live as symlinks under /proc/<pid>/fd/, and a deleted target is spelled out right there:

$ sudo ls -l /proc/1234/fd | grep deleted
l-wx------ 1 www www 64 Jun  9 02:14 7 -> /var/log/app.log (deleted)

To sweep every process at once, walk them all and keep the deleted ones:

$ sudo ls -l /proc/*/fd 2>/dev/null | grep deleted

Each hit is a held-open deleted file. You can still read its size through the descriptor — stat -L /proc/1234/fd/7 follows the handle to the deleted inode and reports how big it is. Same answer as lsof, zero extra tools.
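The sweep and the size lookup can be combined into one pass. This is a minimal sketch, assuming Linux /proc and a `stat` that supports `-L -c %s` (GNU coreutils or BusyBox). The function name `list_deleted_fds` is made up for illustration.

```shell
#!/bin/sh
# Sketch: list deleted-but-open files by reading /proc only.
# Prints "PID FD BYTES TARGET" per hit. Run as root to see all processes.
# list_deleted_fds is an illustrative name.

list_deleted_fds() {
    for link in /proc/[0-9]*/fd/*; do
        # Processes can exit mid-scan, so tolerate failures and move on.
        target=$(readlink "$link" 2>/dev/null) || continue
        case $target in
            *' (deleted)')
                # stat -L follows the descriptor to the unlinked inode.
                size=$(stat -L -c %s "$link" 2>/dev/null) || continue
                pid=${link#/proc/}; pid=${pid%%/*}
                echo "$pid ${link##*/} $size $target"
                ;;
        esac
    done
}

# Example: biggest offenders first.
#   sudo sh -c '. ./this_script.sh; list_deleted_fds' | sort -k3,3nr | head
```

One caveat: anonymous memory files (targets that start with /memfd:) also carry the (deleted) tag, and they live in RAM rather than on disk, so filter them out if they clutter the list.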

Freeing the space without a restart

Restarting nginx would close the descriptor and release the blocks. But a restart isn't always on the table — live connections, production traffic, a change window you don't have.

You can reclaim the space without touching the process, by emptying the file through its descriptor in /proc:

$ sudo truncate -s 0 /proc/1234/fd/7

/proc is a virtual filesystem the kernel exposes in memory. /proc/1234/fd/ holds one entry per open descriptor of process 1234, each a link straight to the real file behind it. Writing to /proc/1234/fd/7 reaches the file through the still-open handle, bypassing the directory entry that no longer exists. truncate -s 0 sets its length to zero. The kernel frees the blocks, df shows the space back immediately, and the process keeps running and writing to the same descriptor. One detail: if the process did not open the file in append mode, its next write lands at the old offset, so the file's apparent size jumps back up, but the range before that offset is a hole that occupies no blocks.

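If you prefer a shell one-liner, the redirect form does the same thing. The redirection is performed by the shell, not by the command, so run the whole thing in a root shell:

$ sudo sh -c ': > /proc/1234/fd/7'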

One caution: only do this to a log or other append-only file you're happy to lose the contents of. Don't truncate a database write-ahead log or anything a process depends on for recovery — you'll corrupt it.
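Because truncating the wrong descriptor is destructive, it can be worth wrapping the command in a few sanity checks. The helper below is a hypothetical sketch, not a standard tool: `reclaim_fd` refuses to act unless the descriptor points at a regular file that is already marked (deleted). It cannot tell a log from a database file, so that judgment is still yours.

```shell
#!/bin/sh
# Sketch: truncate a held-open file only if it is really deleted and regular.
# reclaim_fd is an illustrative helper, not a standard tool.

reclaim_fd() {
    pid=$1
    fd=$2
    link="/proc/$pid/fd/$fd"

    target=$(readlink "$link") || { echo "no such descriptor: $link" >&2; return 1; }

    # Refuse anything that still has a name; use normal tools for those.
    case $target in
        *' (deleted)') ;;
        *) echo "not deleted: $target" >&2; return 1 ;;
    esac

    # Refuse sockets, pipes, and devices.
    [ -f "$link" ] || { echo "not a regular file: $target" >&2; return 1; }

    before=$(stat -L -c %s "$link")
    : > "$link"     # same effect as: truncate -s 0 "$link"
    echo "freed up to $before bytes from $target"
}

# Example, from a root shell: reclaim_fd 1234 7
```

The message says "up to" because the reported size is the file's apparent length, which can exceed the blocks actually allocated.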

When it happens inside a container

Containers hit this constantly, and it's more confusing there, because df and du end up disagreeing across a namespace boundary. An app inside a container deletes its own log or temp file but keeps writing to the open descriptor. The blocks pile up in the container's writable layer, du run inside the container finds nothing, and the host's free space just quietly drops.

The trick is to look from the host, not from inside. The host sees every process in every container, so lsof +L1 (or the /proc sweep above) finds the culprit by its real host PID — even though the process thinks it's PID 1 inside its own namespace:

$ sudo lsof +L1
COMMAND   PID  USER  FD   TYPE  ...  NAME
node    28417  root  19w  REG   ...  /app/logs/out.log (deleted)

The fix is identical: sudo truncate -s 0 /proc/28417/fd/19 using the host PID, or restart the container if you can afford to. This is worth calling out because the usual reaction is to docker exec in, run du, see nothing, and conclude the host is lying. It isn't — you're looking from inside the box at a file the box can't see.
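Once you have the host PID, you may still want to know which container it belongs to. One way is to read the process's cgroup path on the host. This sketch rests on an assumption: that the runtime embeds the 64-hex-digit container ID in that path, as Docker commonly does. Other runtimes and cgroup layouts may differ, and `container_id_from_cgroup` is an illustrative name.

```shell
#!/bin/sh
# Sketch: guess which container a host PID belongs to from its cgroup path.
# Assumes the runtime embeds a 64-hex-digit container ID in the path,
# as Docker commonly does; other runtimes may differ.

container_id_from_cgroup() {
    # Reads cgroup file contents on stdin, prints the first 64-hex ID found.
    grep -oE '[0-9a-f]{64}' | head -n 1
}

# Real use with the host PID reported by lsof:
#   container_id_from_cgroup < /proc/28417/cgroup
#   docker ps --no-trunc | grep <that id>
```

If nothing is printed, the process is either not in a container or the runtime names its cgroups differently.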

"But I switched to dust / gdu — surely those catch it?"

Plain old du and df are getting replaced on a lot of machines by a wave of faster, prettier rewrites in Rust and Go. They're genuinely nice. But it's worth knowing what they do and don't change here, because it's a great way to understand the bug.

Tool     Language   Works like                        Sees deleted-but-open files?
dust     Rust       du (walks the tree)               No
gdu      Go         du (walks the tree)               No
diskus   Rust       du (walks the tree)               No
dua      Rust       du (walks the tree)               No
ncdu     C / Zig    du (walks the tree)               No
duf      Go         df (reads filesystem stats)       Shows the space, can't name the file
dysk     Rust       df (reads filesystem stats)       Shows the space, can't name the file

The split is the whole story. dust, gdu, diskus, dua, and ncdu are all tree walkers — like du, they enumerate files by name. A deleted-but-open file has no name to enumerate, so they're blind to it exactly the way du is. They'll happily tell you your tree adds up to 25 GB.

duf and dysk are filesystem readers — like df, they ask the kernel for block counts. So they'll show you the disk is full, same as df, but they can't point at the file, because the file has no entry to point at.

Faster and prettier doesn't change the model. Walking the tree can't find a thing that left the tree. When df and du disagree by a lot, the answer is still lsof +L1, every time. That's the one tool here that looks at open descriptors instead of names.

Stop it from happening again

The root problem is a rotation that deletes the file out from under a process that keeps writing. Two clean fixes:

Use copytruncate in logrotate. Instead of deleting the file, it copies the contents to an archive and then truncates the original in place — the same trick we did by hand. The process keeps its descriptor; the blocks get freed:

/var/log/app.log {
    daily
    rotate 7
    compress
    copytruncate
}

The tradeoff: there's a tiny window between the copy and the truncate where a few log lines can slip through and be lost. Fine for most logs.

Or have the app reopen its log on a signal. If losing lines isn't acceptable, rotate by moving the file and telling the process to reopen. nginx does this with nginx -s reopen; many daemons reopen on SIGHUP. The descriptor gets pointed at the new file cleanly, with no lost lines.
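For nginx, the move-and-reopen approach might look like the logrotate fragment below. Treat it as a sketch under stated assumptions: the log path and the PID file location (/run/nginx.pid) vary between distributions, and your package may already ship its own rotation config. It relies on logrotate's default behavior of renaming the file, and on nginx reopening its logs when it receives USR1, which is the signal that nginx -s reopen sends.

```
/var/log/nginx/*.log {
    daily
    rotate 7
    compress
    delaycompress
    sharedscripts
    postrotate
        # Assumed PID file location; adjust for your system.
        [ ! -f /run/nginx.pid ] || kill -USR1 "$(cat /run/nginx.pid)"
    endscript
}
```

delaycompress is there because nginx may keep writing to the renamed file for a moment before it reopens, so compressing that file immediately could lose lines. sharedscripts runs the signal once per rotation rather than once per matched file.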

And to catch it before 3 a.m., add a check to monitoring — count deleted-but-open files over 100 MB:

$ sudo lsof +L1 -F s | awk '/^s/ && substr($0,2)+0 > 104857600 {c++} END {print c+0}'

If that number climbs while free space drops, you've found this exact problem before it pages you.

What to take away

  • df reads the filesystem's block accounting; du walks the directory tree. They disagree when a file has been deleted but a process still holds it open.
  • rm removes a name, not the data. The blocks stay until the last descriptor closes.
  • Find it with lsof +L1. Read the PID, the FD, and the size.
  • Reclaim it without a restart: truncate -s 0 /proc/<pid>/fd/<fd> (not on a database file).
  • Prevent it: copytruncate in logrotate, or have the app reopen on SIGHUP.
  • The new Rust and Go tools are faster, but they walk the tree or read the filesystem just like du and df — so they share the same blind spot. lsof is what sees the file nobody else can.

The mechanism is identical everywhere, from a Raspberry Pi to a server with terabyte disks. Once you've watched df and du argue and known who was right, you'll never be confused by a "full" empty disk again.

Source: dev.to
