How to safely remove a Django model field: finding every real reference before you delete


Every Django project has at least one of these. A model with an old field that's probably not used anymore. "Probably" is the scary part. If something in production is still referencing it, deleting the column breaks the app. AttributeError, in production.

So you don't delete it. You want to, but you can't figure out how to check safely, so it just sits there. The column takes up space in every query. The field clutters the model definition. And it keeps sitting there, month after month, because the cost of confirming it's unused feels higher than the cost of leaving it.

Think about what happens when AttributeError: 'Article' object has no attribute 'summary' hits production. Users get 500 errors every time they open the page. Logs flood. Slack lights up. "Was it that deploy we just pushed?" Already considering a rollback. And the cause was deleting a field you thought was unused.

That's why nobody deletes anything. You can't be sure, so you don't. That's the right call. The problem is there was no way to get sure.

I tried the obvious thing: searching for the field name in VS Code. Hundreds of hits. I started opening them one by one and immediately noticed most aren't real references. File paths, comments, unrelated variable names. I searched for the .html field in one Django project and got 1,202 results. The actual code accessing that field: 10 results.

1,192 were noise. But you couldn't know that upfront.

Why search returns 1,202 hits

When you give up because there are too many results, that's not a failure on your part. VS Code search and grep just weren't built for this.

These tools answer "does this string appear anywhere in this file?" That's useful for a lot of things. But when you want to know "is this field actually referenced in code?", text search picks up way too much. The field name in a string literal, in a comment, as part of a filename: it all counts as a hit. Sorting through them is manual work.

Here's what those 1,202 results for .html broke down to:

| Type | Count |
| --- | --- |
| File extensions (layout.html, etc.) | 1,087 |
| Unrelated strings | 27 |
| Comments | 21 |
| Other noise | 57 |
| Actual field accesses | 10 |

You can't check 1,200 results. Giving up was the right call. The issue wasn't how you used the tool. You were using the wrong tool.

And .html isn't a special case. Say you want to delete a title field on an Article model. "title" shows up everywhere in a codebase: variable names, dictionary keys, comments, strings. Hundreds of results. Same problem.

Common field names are the worst. status, name, type, created_at: these appear in dozens of unrelated contexts throughout a typical Django project. The search that was supposed to answer a simple question becomes 40 minutes of opening files, closing files, and losing track of what you've already checked.

"I searched, couldn't check everything, left it alone." Most Django developers have been here. You want to delete the field but can't. The check isn't impossible, it's just too expensive. So the field stays, and fields like it accumulate.

This compounds over time. A project that's two years old might have thirty fields that nobody's confident about. The developers who added them have moved on. The tickets that motivated them are closed. The tests, if they exist, pass regardless of whether the field is used. There's no mechanism forcing a cleanup, just the gradual intuition that the schema is getting harder to understand.

If you know regex, you might try narrowing it: grep -rn "\bhtml\b" --include="*.py" to limit to Python files. Still the same problem. "html" as a dictionary key, in a comment — it all still hits. "Python files only" and "actual field access" are completely different things.

The more you refine the regex, the more you start wondering whether the regex itself is missing something. You end up needing to verify the verification. The tool that was supposed to save you time has become another source of doubt.
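To make the word-boundary problem concrete, here is a minimal sketch. The four-line snippet is a made-up stand-in for a real project file, and the pattern is the same \bhtml\b used above. Every line matches, but only the last one is an attribute access.

```python
import re

# Hypothetical snippet standing in for a real project file.
SOURCE = '''\
# render the html for the embed
template_name = "pages/publish.html"
context = {"html": rendered}
body = embed.html
'''

pattern = re.compile(r"\bhtml\b")

# Collect every line the word-boundary regex considers a hit.
hits = [
    (lineno, line)
    for lineno, line in enumerate(SOURCE.splitlines(), start=1)
    if pattern.search(line)
]

for lineno, line in hits:
    print(f"{lineno}: {line}")
```

A comment, a file path, a dictionary key, and one real access all look the same to the regex. Tightening the pattern removes some of them, but you are still guessing at structure from characters.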

There's also the psychological cost. You open VS Code, run the search, see 847 results for status, and your shoulders drop. You close the tab. You tell yourself you'll check it later. "Later" never comes. Nobody should have to hand-verify 847 results to answer a yes/no question.

When undeletable fields pile up, here's what actually happens.

The schema bloats. Ten, twenty unused columns accumulate. A new developer opens the model, scans the fields. "What's this one for?" They try to find out, can't, and decide not to touch it. Reasonable. But when that pattern repeats, you end up with an unspoken rule: don't touch this model.

Every migration feels slightly more risky. The unease builds until nobody touches it at all. Six months later, another developer thinks the same thing. The cycle repeats.

This is technical debt in the same way missing tests are: an unresolved cost that keeps accumulating. Development slows. Onboarding takes longer. Nobody intended this, but the project gets heavier over time.

This isn't a skills problem. The tool didn't exist yet.

Here's the situation I kept ending up in: I'd look at a field, feel like it was probably unused, open VS Code to check, get overwhelmed by results, close it, and go do something else. The field would still be there six months later. The next developer would go through the same loop. The field would still be there a year later.

At some point the schema becomes archaeology. Fields with names like legacy_content, old_slug, deprecated_flag: nobody knows what they do, nobody wants to touch them, and the project carries them forever. Every SELECT * is slightly slower. Every new developer's mental model of the data is slightly more confused.

The real cost is cognitive load. Every unused field is a small tax on everyone who reads the model. Multiply that by thirty fields and two years of new developers and you start to see why "we never clean up old fields" becomes an invisible drag on velocity.

How colref reads code structure instead of text

What you actually wanted to know was: where is obj.html referenced in code? colref returns exactly that. File paths and string contents ignored.

How does it tell the difference? Instead of treating code as a sequence of characters, it reads the code structure.

embed.html in Python means "read the html attribute of the embed object": a specific structure. "pages/publish.html" is string data, not an attribute access. Reading code structure makes that difference detectable. Only places written as object.field_name get picked up. The .html that appears inside a string is ignored.

If text search is like pressing Ctrl+F on a page, reading code structure is closer to a human reading through every line. Except it handles thousands of lines in an instant.

It scans Python code and returns only .field_name accesses, with file and line number.
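The idea can be illustrated with Python's standard ast module. This is only a sketch of the general technique, not colref's implementation (ARCHITECTURE.md describes that). The function name and the sample snippet are hypothetical.

```python
import ast

# Hypothetical snippet standing in for a real project file.
SOURCE = '''\
# render the html for the embed
template_name = "pages/publish.html"
context = {"html": rendered}
body = embed.html
'''


def find_attribute_accesses(source: str, field: str) -> list[int]:
    """Return line numbers where `<something>.<field>` is an attribute access."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Attribute) and node.attr == field
    )


print(find_attribute_accesses(SOURCE, "html"))
```

The comment, the file path, and the dictionary key never become Attribute nodes, so only the last line is reported. Note that this sketch matches .html on any object; it makes no attempt to work out which model the object belongs to.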

| Method | Hits (for .html) | What it sees |
| --- | --- | --- |
| VS Code full-text search | 3,534 | All string matches |
| grep \.html\b | 1,202 | Word-boundary matches |
| colref | 10 | Actual field accesses only |

3,534 or 1,202 becomes 10. Whether you can act on the results depends entirely on how many there are.

When you get 10 results: open each one. views.py:42 means go to that line and check whether obj.html is actually being accessed. Real reference — can't delete. Not a real reference — skip. Ten results takes 10–15 minutes.

A few notes on reviewing results. colref skips migration files. A migration only records the history of the field's existence and doesn't actively use it, so a hit there would be a false positive. You might also see test factories or fixtures that set the field value. If you delete the field and forget to update the factory, your test suite will break. That's not a reason to keep the field; it's something to clean up as part of the deletion.

When you get zero: you have a fact. "No references found in Python code." That's different from "I think it's probably unused." Move to the next step: checking getattr, templates, Admin, Forms, and Serializers. Zero from colref is the starting point, not the finish line.

The shift is from "check 1,200 things" to "check 10 things, then a handful of specific files." That's the difference between a task you'll defer indefinitely and one you'll do today.

For the technical details of how code structure is read, see ARCHITECTURE.md.

Installation

Install via pipx:

pipx install colref

Or with pip:

pip install colref

Specify the model name, field name, and your project directory:

colref check --orm django --model Embed --field html ./

Results come back as filename:line_number. Each one is something you can open directly. Ten results takes maybe ten minutes to verify. Nothing compared to scrolling through 1,202 results, losing your place, and giving up halfway through.
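If you'd rather review the hits in one pass than open each file, a small helper can print the source line behind every result. This is a sketch that assumes the output is exactly one path:line entry per line, as in the examples in this article. parse_hit and show_hits are hypothetical names, not part of colref.

```python
from pathlib import Path


def parse_hit(hit: str) -> tuple[str, int]:
    """Split 'path/to/file.py:42' into ('path/to/file.py', 42)."""
    path, _, lineno = hit.strip().rpartition(":")
    return path, int(lineno)


def show_hits(raw_output: str, root: str = ".") -> None:
    """Print the source line for every hit so they can be reviewed together."""
    for hit in raw_output.splitlines():
        if not hit.strip():
            continue  # skip blank lines in the pasted output
        path, lineno = parse_hit(hit)
        lines = (Path(root) / path).read_text(encoding="utf-8").splitlines()
        if 1 <= lineno <= len(lines):
            print(f"{path}:{lineno}: {lines[lineno - 1].strip()}")
        else:
            print(f"{path}:{lineno}: (line not found, file may have changed)")
```

Paste the tool's output into show_hits and run it from the project root. If the real output format differs from path:line, adjust parse_hit accordingly.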

A note on model names: use the class name exactly as it appears in your models file, including capitalization. Embed, not embed or EMBED. Field names are case-sensitive too: html, not HTML. If you get zero results for a field you know is used, double-check the casing first.

The ./ at the end is the path to scan. You can point it at a specific app directory if you want to narrow it down, but pointing at the project root works fine and makes sure nothing gets missed.

What zero results doesn't cover

Zero results doesn't mean "safe to delete." It means "not found in Python code."

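Dynamic access: getattr(obj, field_name) with the field name in a variable won't be detected. The grep below only catches calls where the field name appears literally on the same line, so if it comes back empty, skim the remaining getattr call sites as well. Check separately: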

grep -rn "getattr" --include="*.py" ./ | grep your_field
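For a more structured version of the same check, the standard ast module can separate getattr calls that name the field literally from calls where the name comes from a variable. This is a sketch with a hypothetical function name and sample snippet; it only recognizes plain getattr(...) calls.

```python
import ast

# Hypothetical snippet standing in for a real project file.
SOURCE = '''\
value = getattr(article, "summary")
other = getattr(article, field_name, None)
title = getattr(article, "title")
'''


def classify_getattr_calls(source: str, field: str) -> dict[str, list[int]]:
    """Sort getattr() calls into literal matches and calls needing a manual look."""
    result: dict[str, list[int]] = {"literal_match": [], "needs_review": []}
    for node in ast.walk(ast.parse(source)):
        is_getattr = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "getattr"
            and len(node.args) >= 2
        )
        if not is_getattr:
            continue
        name_arg = node.args[1]
        if isinstance(name_arg, ast.Constant) and isinstance(name_arg.value, str):
            # Literal field name: a match only if it is the field we care about.
            if name_arg.value == field:
                result["literal_match"].append(node.lineno)
        else:
            # Name comes from a variable or expression; a human has to decide.
            result["needs_review"].append(node.lineno)
    for lines in result.values():
        lines.sort()
    return result


print(classify_getattr_calls(SOURCE, "summary"))
```

The needs_review list is the part grep can't give you: calls that might touch the field, depending on what the variable holds at runtime.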

Django templates: {{ page.html }} lives in .html files. colref only scans .py. Check templates separately:

grep -rn "your_field" --include="*.html" ./
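The template grep hits the field name anywhere in the file, including asset names and plain text. A slightly narrower check looks only inside {{ ... }} and {% ... %} tags. This is a rough sketch with a hypothetical function name and sample template. It works line by line, so it misses tags that span several lines, and it misses the field when it is passed into the context under a bare name.

```python
import re

# Hypothetical template standing in for a real .html file.
TEMPLATE = '''\
<link rel="stylesheet" href="summary.css">
<h1>{{ article.title }}</h1>
<p>{{ article.summary|truncatewords:30 }}</p>
{% if article.summary %}<hr>{% endif %}
'''


def template_references(template: str, field: str) -> list[int]:
    """Return line numbers where `.field` appears inside {{ ... }} or {% ... %}."""
    tag = re.compile(r"\{\{.*?\}\}|\{%.*?%\}")
    access = re.compile(r"\." + re.escape(field) + r"\b")
    return [
        lineno
        for lineno, line in enumerate(template.splitlines(), start=1)
        if any(access.search(match.group(0)) for match in tag.finditer(line))
    ]


print(template_references(TEMPLATE, "summary"))
```

The stylesheet name on the first line is ignored because it sits outside any template tag. Treat an empty result as a hint, not proof, and still run the plain grep.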

Django Admin, Forms, and DRF Serializers: This is the easiest one to miss. None of these are detected:

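# Imports for all three examples; assumes Article is defined in this app's models.py
from django import forms
from django.contrib import admin
from rest_framework import serializers

from .models import Article

# Django Admin
class ArticleAdmin(admin.ModelAdmin):
    list_display = ['title']
    list_filter = ['title']

# Django Forms
class ArticleForm(forms.ModelForm):
    class Meta:
        model = Article
        fields = ['title']

# DRF Serializer
class ArticleSerializer(serializers.ModelSerializer):
    class Meta:
        model = Article
        fields = ['title']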

Determining which model the string 'title' in a list refers to requires tracing class inheritance, which colref doesn't handle yet. Check these separately:

grep -rn "your_field" --include="*.py" ./

This grep has the same noise problem as full-text search. Opening the Admin, Forms, and Serializer files directly is more reliable. In most projects there aren't many of them.
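If there are too many of these files to open by hand, the search can be narrowed to string lists assigned inside class bodies. This is a sketch using the standard ast module, with a hypothetical function name and sample snippet. It reports the class and attribute that hold the string but does not try to work out which model they belong to, so you still confirm that part by eye.

```python
import ast

# Hypothetical snippet standing in for admin.py, forms.py and serializers.py.
SOURCE = '''\
class ArticleAdmin(admin.ModelAdmin):
    list_display = ["title", "summary"]

class ArticleForm(forms.ModelForm):
    class Meta:
        fields = ["title"]

class ArticleSerializer(serializers.ModelSerializer):
    class Meta:
        fields = ("title", "summary")
'''


def string_field_references(source: str, field: str) -> list[tuple[str, str, int]]:
    """Find class-level list/tuple assignments that contain `field` as a string."""
    found: list[tuple[str, str, int]] = []

    def visit(cls: ast.ClassDef, outer: str) -> None:
        name = f"{outer}.{cls.name}" if outer else cls.name
        for stmt in cls.body:
            if isinstance(stmt, ast.ClassDef):
                visit(stmt, name)  # e.g. the nested Meta class
            elif isinstance(stmt, ast.Assign) and isinstance(
                stmt.value, (ast.List, ast.Tuple)
            ):
                values = [
                    e.value for e in stmt.value.elts if isinstance(e, ast.Constant)
                ]
                if field in values:
                    for target in stmt.targets:
                        if isinstance(target, ast.Name):
                            found.append((name, target.id, stmt.lineno))

    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            visit(node, "")
    return found


print(string_field_references(SOURCE, "summary"))
```

This covers list_display, list_filter, fields and similar attributes without naming them, because it looks at any list or tuple of strings in a class body. Strings built dynamically, or fields = "__all__", are outside what it can see.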

These are exactly the places a Django beginner might not think to check. You verify the views, the serializers feel obvious after you remember them, but Django Admin is easy to forget, especially if the admin configuration lives in a file you rarely open. I've seen list_display hold a reference to a field that had been "confirmed deleted" twice already. The admin file just wasn't in anyone's mental checklist.

Once you've checked all of the above and colref returns zero, that's a grounded deletion: confirmed in Python code, checked getattr, templates, Admin/Forms/Serializers. Not "I think it's probably fine."

Checking Admin/Forms/Serializers by eye sounds tedious, but in practice it takes a few minutes. These files tend to be organized by model. Open admin.py, search for the model name, check list_display and related attributes. Open serializers.py, find the relevant serializer, check fields. Open forms.py if you have one. It's not a grep problem, it's an "open three files and look" problem. That's manageable even without a tool.

colref (Python attribute accesses) + grep (dynamic patterns and templates) + manual check (Admin/Forms/Serializers) covers the vast majority of real-world Django codebases. There are edge cases colref doesn't handle yet; the Detection Patterns docs list them. For most projects, this three-part check is enough to move from "I think it's probably unused" to "I have confirmed it's unused."

The five-step procedure

# 1. Check for field accesses in Python code
colref check --orm django --model YourModel --field your_field ./

# 2. Check for dynamic access
grep -rn "getattr" --include="*.py" ./ | grep your_field

# 3. Check templates
grep -rn "your_field" --include="*.html" ./

# 4. Delete the field and generate the migration
python manage.py makemigrations --name remove_your_field

# 5. Apply to the schema
python manage.py migrate

Steps 2 and 3 are still grep; colref doesn't solve everything. But step 1 cuts 1,202 results down to 10, and that is the one step that changes the "too many results to check, left it alone" situation.

The difference between "probably unused, I think" and "zero results in Python code, no getattr, nothing in templates" is real. If something breaks in production, knowing what you checked tells you exactly where to look. You know the cause came from outside your checked scope: a dynamic reference, a template, a pattern colref doesn't handle yet. The cause is narrowed. Grounded deletion makes debugging faster when things go wrong.

One more thing about steps 4 and 5: don't skip the --name option on makemigrations. Giving the migration a descriptive name like remove_summary_field makes the history readable. Six months from now, someone scanning migration filenames can see what changed and when without opening every file.

Also: run the migration locally and make sure your test suite passes before deploying. When you're confident about a deletion it's tempting to skip the verification. Don't. If a test factory is still setting the deleted field, the tests will catch it before production does.

The whole process — run colref, check the checklist, generate the migration, run tests locally, deploy — takes maybe 30 minutes for a field that's actually unused. Compare that to leaving the field there indefinitely because you couldn't confirm it was safe to remove.

First run: try a field you know is used

If you don't have a candidate field in mind, try a field you know is used, something like title on Article:

colref check --orm django --model Article --field title ./

If title is in use, you'll get multiple results with file and line number:

app/views.py:42
app/serializers.py:18
app/templates/article_detail.py:11

Seeing what a real result looks like makes it easier to judge zero results later. Then try a field you've been wondering about. Close to zero? Move to steps 2 and 3.

From installation to first run: under five minutes. No need to read the README first. Running it is faster than reading about it.

What do you do when you get 3 results? Open all three. For each one: is this code still running in production? If a reference is inside a function that's clearly dead code, something wrapped in if False or commented out, it doesn't count. If it's live code, the field is still in use. But 3 results is a manageable number. You can make that judgment call.

What if you get 0 results? Don't stop there. Run steps 2 and 3. Zero from colref, zero from getattr grep, zero from template grep: that's three independent checks. At that point, also check your Admin, Forms, and Serializer files by eye. If all of that is clear, you have something solid to stand on.

Even without a deletion candidate right now, colref is useful for routine schema review. Scan migration history, spot something that looks unused, run colref. Zero results? It goes on the deletion candidate list. "Probably unused" becomes "not referenced in Python code" in 30 seconds.

I do this periodically on projects I maintain. Every few months I scan the migration history for fields I don't recognize, run colref on them, and build a short list. It takes maybe 20 minutes and usually turns up one or two candidates worth investigating further. Some end up staying because they're used in ways colref doesn't detect yet. But a few always turn out to be genuinely gone: references removed over time, nobody noticed, nobody cleaned it up. Those get deleted.
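The periodic review is easy to script. This sketch only builds the command lines, using the same flags shown earlier in this article. The candidate list and the colref_command helper are hypothetical.

```python
import shlex


def colref_command(model: str, field: str, path: str = "./") -> list[str]:
    """Build the argv for one colref check, using only the flags shown above."""
    return [
        "colref", "check",
        "--orm", "django",
        "--model", model,
        "--field", field,
        path,
    ]


# Hypothetical candidates collected while reading the migration history.
CANDIDATES = [("Article", "summary"), ("Article", "old_slug")]

for model, field in CANDIDATES:
    print(shlex.join(colref_command(model, field)))
```

Each printed line can be pasted into a shell, or the argv list can be handed to subprocess.run once colref is installed.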

colref is still in development. If something doesn't work or you get unexpected results, open an issue at github.com/shinagawa-web/colref. Real usage feedback is what shapes the priorities. A bug report with a concrete example is more useful than ten feature requests.

colref currently supports Django and Rails. For the roadmap, see issue #74.

If you find a reference that colref misses, something it should have flagged but didn't, that's especially useful to report. The detection gap around Admin, Forms, and Serializers is a known limitation, but there may be patterns in your codebase that nobody's encountered yet. The tool gets better with more real-world cases.

How many fields are you sitting on that you haven't been able to delete?

Source: dev.to
