Streamlit dashboards meet AI coding: an end-to-end privacy workflow

The previous articles each closed one leak. A real Streamlit + pandas dashboard leaks through five at once. This is the whole session — obfuscate, run, prompt, apply — with every channel accounted for.

Why a dashboard is the hard case

The earlier articles in this series each took one surface: the runtime workspace and .env, the transparent proxy, pandas column names.
A data dashboard is where all of them collide, because a Streamlit app is the rare Python project that is simultaneously:

Code the AI reads and edits (business logic, service classes).
A schema expressed as pandas column names (churn_probability, annual_salary).
Sample data sitting in a CSV the AI can open.
Secrets — a database URL, an API key — in .env.
Framework structure — Streamlit's st.X API and page discovery — that must survive untouched or the app won't render.

Use Claude Code, Codex or Cursor naively on this project and all five leave your machine on the first prompt. This article runs one realistic feature request — "add a churn-by-region bar chart with a CSV export" — through the full PromptCape workflow and accounts for every channel.

The project

A staffing/HR dashboard. Streamlit front end, pandas for the aggregation, SQLAlchemy for the source rows, a .env for credentials, and a fixture CSV for local runs.

attrition-dashboard/
├── Dashboard.py                 # Streamlit entry point
├── pages/
│   └── 1_📊_Capacity.py
├── staffing/
│   ├── database.py              # SQLAlchemy engine/session
│   └── service.py               # StaffingService — the business logic
├── data/
│   └── sample_attrition.csv     # fixture: real column names, fake rows
├── .env                         # DATABASE_URL, SECRET_KEY, ACTIVITY_MONTHS
└── requirements.txt

Dashboard.py, lightly trimmed:

import streamlit as st
import pandas as pd

from staffing.database import get_session
from staffing.service import StaffingService


def load_frame(year: int) -> pd.DataFrame:
    with get_session() as session:
        rows = StaffingService(session).list_assignments(year=year)
    df = pd.DataFrame(rows)
    df = df[df["employment_status"] == "active"]
    df["tenure_years"] = df["tenure_months"] / 12
    return df


def main():
    st.set_page_config(page_title="Attrition", layout="wide")
    st.title("Attrition Dashboard")
    year = st.sidebar.selectbox("Year", range(2020, 2030))

    df = load_frame(year)
    st.metric("Avg churn risk", f"{df['churn_probability'].mean():.0%}")
    st.dataframe(df, hide_index=True)


if __name__ == "__main__":
    main()

Between this file, its fixture, and its .env, the project leaks through five distinct channels, each by a different mechanism. That's the point.

The five channels, and what closes each

#	Channel	What leaks in this project	Closed by
1	Business identifiers	`StaffingService`, `list_assignments`, `load_frame`	Identifier rename → `Cls_…` / `mtd_…` (article 1)
2	Column names	`"employment_status"`, `"tenure_months"`, `"churn_probability"`	Column registry → `col_…` (article 9)
3	Sample data	`data/sample_attrition.csv` header row	Fixture header-rewrite (article 9)
4	Secrets	`.env` → `DATABASE_URL`, `SECRET_KEY`	`.env` pointer + `promptcape run` injection (article 7)
5	Framework structure	`st.set_page_config`, `st.title`, `st.metric`, `st.dataframe`, `st.sidebar`	`StreamlitDetector` exclusion list (streamlit example)

The workflow below closes all five in a single obfuscation pass, then verifies at every step that nothing broke.

Step 0 — health check on the real source

Always start green. If the source doesn't run, obfuscation noise will be indistinguishable from your own bug.

cd ~/projects/attrition-dashboard
streamlit run Dashboard.py     # renders, charts load, no traceback → good baseline

Step 1 — obfuscate

promptcape obfuscate --language python --verify .
# -> ~/.promptcape/cache/a1b2c3d4/

What the engine does in one pass, mapped to the five channels:

(1) Identifiers renamed. StaffingService → Cls_…, list_assignments → mtd_…, load_frame → mtd_…, the df/rows/session locals → fld_…. Consistent across Dashboard.py, pages/, and staffing/service.py.
(2) Columns registered. PandasColumnDetector walks the column positions — df["employment_status"], df["tenure_months"], the assigned df["tenure_years"], df['churn_probability'].mean() — and maps each to a col_… hash. df is provably a DataFrame (assigned from pd.DataFrame(rows)), so its subscripts are in scope; a session["…"] access would not be.
(3) Fixture rewritten. data/sample_attrition.csv's header row is rewritten with the same registry, so the obfuscated code's col_… names line up with the fixture and streamlit run still works on sample data.
(4) .env left behind. No .env is copied. A .env.promptcape-pointer is written with the source path and a note to use promptcape run.
(5) Streamlit API frozen. The moment import streamlit as st is seen, ~80 Streamlit API names (set_page_config, title, metric, dataframe, sidebar, selectbox, …) join the project-wide exclusion list. StaffingService-style identifiers are not on it.
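To make channels (2) and (3) concrete, here is a toy sketch, not PromptCape's PandasColumnDetector: one registry hands out deterministic hashes, an `ast` walk collects string subscripts only on names assigned from `pd.DataFrame(...)`, and the same registry rewrites a CSV header. The `Registry` class, the salted SHA-256 hash, and the eight-hex-digit format are assumptions, so the hashes it prints will not match the ones shown in this article.

```python
import ast
import csv
import hashlib
import io


class Registry:
    """One real-name -> hash table shared by every channel (hypothetical sketch)."""

    def __init__(self, salt: str = "per-project-salt"):
        self.salt = salt
        self.forward: dict[str, str] = {}

    def hash_for(self, prefix: str, name: str) -> str:
        # Deterministic: the same (prefix, name) always yields the same hash,
        # whether the name was found in code or in a CSV header.
        key = f"{prefix}:{name}"
        if key not in self.forward:
            digest = hashlib.sha256((self.salt + key).encode()).hexdigest()[:8]
            self.forward[key] = f"{prefix}_{digest}"
        return self.forward[key]


def _is_dataframe_call(node: ast.AST) -> bool:
    # Only recognises pd.DataFrame(...); a real detector infers far more cases.
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "DataFrame"
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == "pd"
    )


def detect_columns(source: str) -> set[str]:
    """Collect string subscripts, but only on names assigned from pd.DataFrame(...)."""
    tree = ast.parse(source)
    frames = {
        target.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Assign) and _is_dataframe_call(node.value)
        for target in node.targets
        if isinstance(target, ast.Name)
    }
    return {
        node.slice.value
        for node in ast.walk(tree)
        if isinstance(node, ast.Subscript)
        and isinstance(node.value, ast.Name)
        and node.value.id in frames
        and isinstance(node.slice, ast.Constant)
        and isinstance(node.slice.value, str)
    }


def rewrite_csv_header(csv_text: str, registry: Registry) -> str:
    """Hash every header cell with the shared registry; data rows pass through."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([registry.hash_for("col", cell) for cell in rows[0]])
    writer.writerows(rows[1:])
    return out.getvalue()


SOURCE = '''
import pandas as pd
df = pd.DataFrame(rows)
df = df[df["employment_status"] == "active"]
df["tenure_years"] = df["tenure_months"] / 12
user = session["user_id"]  # session is not a DataFrame: out of scope
'''

registry = Registry()
print(sorted(detect_columns(SOURCE)))  # ['employment_status', 'tenure_months', 'tenure_years']
print(rewrite_csv_header("employment_status,region\nactive,EMEA\n", registry))
```

Because code columns and fixture headers both go through `hash_for("col", name)`, the hash for employment_status is identical in both places, which is what lets the obfuscated code read the rewritten fixture.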

The --verify step does static import resolution (no execution) and reports any import that wouldn't resolve — the canonical catch being a stdlib import accidentally renamed because a user identifier collided.
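A rough approximation of that check, using only the standard library: parse the file with `ast`, collect top-level module names from `import` and `from … import` statements, and ask `importlib.util.find_spec` whether each can be located (for a top-level name it consults the finders without importing the module). The `unresolved_imports` helper and its treatment of local packages are illustrative assumptions, not PromptCape's --verify implementation.

```python
import ast
import importlib.util
from pathlib import Path


def unresolved_imports(source: str, project_root: Path) -> list[str]:
    """Return top-level module names imported by `source` that cannot be found.

    Nothing is executed: the file is only parsed, and each top-level name is
    looked up as a local module/package under project_root, then via importlib.
    """
    names: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.extend(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.append(node.module.split(".")[0])  # relative imports are skipped

    missing: list[str] = []
    for name in dict.fromkeys(names):  # de-duplicate, keep first-seen order
        if (project_root / name).is_dir() or (project_root / f"{name}.py").is_file():
            continue
        try:
            found = importlib.util.find_spec(name) is not None
        except ValueError:  # module already loaded without a __spec__
            found = True
        if not found:
            missing.append(name)
    return missing


# A stdlib import renamed by a collision (hypothetical hash) no longer resolves:
print(unresolved_imports("import json\nimport fld_3c9a0e17\n", Path(".")))
```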

What Dashboard.py looks like in the cache:

import streamlit as st
import pandas as pd

from staffing.database import get_session
from staffing.service import Cls_7b2a1f0c


def mtd_4d9e1a22(year: int) -> pd.DataFrame:
    with get_session() as fld_5a8c3e91:
        fld_e5f6a7b8 = Cls_7b2a1f0c(fld_5a8c3e91).mtd_2c1b9d0a(year=year)
    fld_9c8d7e6f = pd.DataFrame(fld_e5f6a7b8)
    fld_9c8d7e6f = fld_9c8d7e6f[fld_9c8d7e6f["col_4a1f8b2e"] == "active"]
    fld_9c8d7e6f["col_7d3e9a14"] = fld_9c8d7e6f["col_b6c2f085"] / 12
    return fld_9c8d7e6f


def main():
    st.set_page_config(page_title="Attrition", layout="wide")
    st.title("Attrition Dashboard")
    year = st.sidebar.selectbox("Year", range(2020, 2030))

    fld_9c8d7e6f = mtd_4d9e1a22(year)
    st.metric("Avg churn risk", f"{fld_9c8d7e6f['col_e2d4b7c9'].mean():.0%}")
    st.dataframe(fld_9c8d7e6f, hide_index=True)


if __name__ == "__main__":
    main()

Every st.X call is intact, so the app still renders. Every column is a hash, so the schema doesn't leak. StaffingService.list_assignments is gone, so the business logic doesn't leak. The page-config string "Attrition", the "Attrition Dashboard" title, and the metric label "Avg churn risk" stay: they're user-visible UI text, not schema, and the never-rewrite-values rule keeps them readable so the AI can reason about the layout.
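The rule that keeps st.X intact while StaffingService disappears can be sketched with `ast.NodeTransformer`. This is a toy under stated assumptions: the `MAPPING` and `EXCLUDED` sets are hypothetical slices, it renames only function, class, variable, and attribute names, excluded names always win over the mapping, and string literals are never visited. PromptCape's real renamer and exclusion list are far more complete.

```python
import ast

# Hypothetical: names the engine decided belong to the user's code.
MAPPING = {
    "StaffingService": "Cls_7b2a1f0c",
    "list_assignments": "mtd_2c1b9d0a",
    "load_frame": "mtd_4d9e1a22",
    "title": "mtd_0badc0de",  # a user helper that collides with st.title
}
# Hypothetical slice of the Streamlit exclusion list.
EXCLUDED = {"st", "set_page_config", "title", "metric", "dataframe", "sidebar"}


class Renamer(ast.NodeTransformer):
    def _new(self, name: str) -> str:
        # Excluded names win, so st.title keeps working (at the cost of
        # leaving a user helper called title readable).
        return name if name in EXCLUDED else MAPPING.get(name, name)

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self._new(node.id)
        return node

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self._new(node.name)
        self.generic_visit(node)
        return node

    def visit_ClassDef(self, node: ast.ClassDef) -> ast.ClassDef:
        node.name = self._new(node.name)
        self.generic_visit(node)
        return node

    def visit_Attribute(self, node: ast.Attribute) -> ast.Attribute:
        node.attr = self._new(node.attr)
        self.generic_visit(node)
        return node

    # No visit_Constant: string literals such as UI labels pass through untouched.
    # Import aliases are not handled in this toy.


SOURCE = '''
def load_frame(year):
    rows = StaffingService(session).list_assignments(year=year)
    st.title("Attrition Dashboard")
    return rows
'''
print(ast.unparse(Renamer().visit(ast.parse(SOURCE))))
```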

Step 2 — run the obfuscated workspace

This is the step that separates Python from Java: the developer runs the workspace, so the secrets have to arrive at launch without ever landing on disk in the cache.

cd ~/.promptcape/cache/a1b2c3d4
promptcape run streamlit run Dashboard.py

promptcape run:

Resolves the source project from the cache.
Parses ~/projects/attrition-dashboard/.env.
Spawns streamlit run Dashboard.py with cwd = cache and the child environment = OS env + the parsed .env values.
Inherits the TTY so Streamlit's browser-open and hot-reload behave normally.

os.getenv("DATABASE_URL") inside the workspace returns the real URL; the cache directory still contains no secret. The dashboard renders identically to Step 0 — same charts, same numbers — but every file an AI could read is obfuscated.
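A minimal sketch of such a launcher, assuming a plain KEY=VALUE .env (no multi-line values or ${VAR} expansion) and that .env values override OS variables of the same name, which "OS env + the parsed .env values" suggests but does not pin down. `parse_dotenv`, `child_env`, and `run_in_cache` are hypothetical names, and the cache-to-source lookup is left out: the caller passes both directories.

```python
import os
import subprocess
from pathlib import Path


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; skip blanks and comments, strip matching quotes."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def child_env(base: dict[str, str], dotenv: dict[str, str]) -> dict[str, str]:
    # OS environment first, .env on top; the secrets live only in this dict.
    return {**base, **dotenv}


def run_in_cache(cmd: list[str], cache_dir: Path, source_dir: Path) -> int:
    dotenv = parse_dotenv((source_dir / ".env").read_text(encoding="utf-8"))
    env = child_env(dict(os.environ), dotenv)
    # No stdin/stdout redirection: the child inherits the terminal, so
    # Streamlit's browser-open and hot-reload behave as usual.
    return subprocess.run(cmd, cwd=cache_dir, env=env).returncode


# Hypothetical usage, mirroring `promptcape run streamlit run Dashboard.py`:
# run_in_cache(["streamlit", "run", "Dashboard.py"],
#              Path.home() / ".promptcape/cache/a1b2c3d4",
#              Path.home() / "projects/attrition-dashboard")
```

Nothing is written into the cache directory: the secret values exist only in the child process's environment block.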

Step 3 — connect the AI (two equivalent paths)

Path A — CLI, explicit. Point Claude Code at the cache directory and work there. Apply when done.

Path B — transparent proxy in Cursor. Run the PromptCape proxy, open Cursor's terminal, type pcc (the ANTHROPIC_BASE_URL-exporting launcher). The user types prompts in plain language using real names — "add a churn chart" — the proxy translates churn → its mapping on the way out, and translates the model's reply back on the way in. The developer never sees a hash; the provider never sees a real name.

Either way, the model on the other end receives the obfuscated workspace and an obfuscated prompt.

Step 4 — the feature request

The developer asks (real names, via the proxy):

Add a bar chart of average churn probability by region, and a button to download the filtered table as CSV.

The model receives the request phrased in col_… space and works against the obfuscated frame. What it writes:

st.subheader("Churn by region")
fld_chart = (
    fld_9c8d7e6f.groupby("col_1f7b3d6a")["col_e2d4b7c9"]
    .mean()
    .reset_index()
)
st.bar_chart(fld_chart, x="col_1f7b3d6a", y="col_e2d4b7c9")

st.download_button(
    label="Download CSV",
    data=fld_9c8d7e6f.to_csv(index=False),
    file_name="attrition.csv",
    mime="text/csv",
)

The model reused the existing col_e2d4b7c9 (churn) and col_1f7b3d6a (region) columns, which it had seen in the workspace and in the translated prompt, and used st.bar_chart / st.download_button from the intact Streamlit surface. It never saw the real column names, and didn't need to.

Step 5 — apply and verify

promptcape run streamlit run Dashboard.py   # the AI's change, still in obfuscated space → renders
promptcape apply                            # reverse-map col_/fld_/mtd_/Cls_ back to real names
streamlit run Dashboard.py                  # de-obfuscated, on real source → renders

After apply, the new code reads in the developer's own vocabulary:

st.subheader("Churn by region")
chart = (
    df.groupby("region")["churn_probability"]
    .mean()
    .reset_index()
)
st.bar_chart(chart, x="region", y="churn_probability")

st.download_button(
    label="Download CSV",
    data=df.to_csv(index=False),
    file_name="attrition.csv",
    mime="text/csv",
)

Via the column registry, col_1f7b3d6a → region and col_e2d4b7c9 → churn_probability; via the identifier mapping, fld_9c8d7e6f → df. fld_chart was invented by the AI, so it has no registry entry to reverse; it ends up as whatever the developer renames it to (here, chart).
The subheader, CSV label, file name, and MIME type were strings all along, untouched from start to finish.
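A sketch of the reverse step, assuming the registry is available as a dict from hash to real name (its on-disk format is not described here, so the hypothetical `reverse_map` takes the dict directly). Hash-shaped tokens are replaced wherever they occur, including inside string literals such as "col_e2d4b7c9", because that is where column names live; prefixed names with no registry entry, like fld_chart, are returned so the developer can rename them.

```python
import re

# Assumed hash shape: prefix + underscore + eight lowercase hex digits.
HASHED = re.compile(r"\b(?:col|fld|mtd|Cls)_[0-9a-f]{8}\b")
PREFIXED = re.compile(r"\b(?:col|fld|mtd|Cls)_\w+")


def reverse_map(source: str, inverse: dict[str, str]) -> tuple[str, list[str]]:
    """Replace known hashes with real names; report prefixed names left over."""
    restored = HASHED.sub(lambda m: inverse.get(m.group(0), m.group(0)), source)
    leftovers = sorted(set(PREFIXED.findall(restored)))
    return restored, leftovers


INVERSE = {  # hypothetical slice of the registry, hash -> real name
    "fld_9c8d7e6f": "df",
    "col_1f7b3d6a": "region",
    "col_e2d4b7c9": "churn_probability",
}
snippet = 'fld_chart = fld_9c8d7e6f.groupby("col_1f7b3d6a")["col_e2d4b7c9"].mean()'
code, todo = reverse_map(snippet, INVERSE)
print(code)  # fld_chart = df.groupby("region")["churn_probability"].mean()
print(todo)  # ['fld_chart']
```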

The failure modes, consolidated

Each step has one characteristic way to fail. Knowing them turns a confusing render into a one-line fix.

Symptom	Cause	Fix
`AttributeError: module 'streamlit' has no attribute 'fld_xxx'`	A user identifier collided with an `st.X` name not on the exclusion list (`error`, `exception` are deliberately off it)	Add the name to `~/.promptcape/mappings/<hash>-exclusions.txt`
`KeyError: 'col_xxxxxxxx'` after obfuscation	A column reached the code through an un-inferrable DataFrame, so it wasn't registered — or the fixture header wasn't rewritten	`grep` the cache for the hash to see the real name in context; confirm the var is DataFrame-inferred
`ValueError: invalid literal for int() with base 10: 'REDACTED'` at launch	Ran `streamlit run` directly instead of `promptcape run`; no env vars injected	Re-run through `promptcape run`
Dashboard renders but a column label shows a hash	A `col_…` leaked into UI text (e.g., `st.write(df.columns)`)	Expected — column names are obfuscated; if you display them, display the reverse-mapped frame after `apply`, or whitelist that column
`--verify` flags a renamed stdlib import	A user identifier collided with a stdlib top-level module name	The detectors add it to the exclusion list; re-obfuscate
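For the KeyError row, the grep can also be done from Python, which helps on machines without grep. A sketch: `find_token` is a hypothetical helper, and limiting the scan to .py, .csv, and .toml files is an assumption about where hashes can appear in a Streamlit project.

```python
from collections.abc import Iterator
from pathlib import Path


def find_token(
    root: Path, token: str, suffixes: tuple[str, ...] = (".py", ".csv", ".toml")
) -> Iterator[tuple[Path, int, str]]:
    """Yield (path, line number, stripped line) for each line under root containing token."""
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if token in line:
                yield path, lineno, line.strip()


# Hypothetical usage: where does the unregistered column come from?
# for path, n, line in find_token(Path.home() / ".promptcape/cache/a1b2c3d4", "col_4a1f8b2e"):
#     print(f"{path}:{n}: {line}")
```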

The end-to-end threat boundary

What actually left the machine, and what didn't, for the whole session:

Channel	Reached the AI provider?
Business class/method names (`StaffingService.list_assignments`)	No — hashed
Column vocabulary (`churn_probability`, `annual_salary`, `region`)	No — registry-mapped
Sample data column meaning	No — fixture header rewritten
Database URL, secret key	No — never copied to the workspace, injected at subprocess launch
Streamlit structure, UI labels, the shape of the analysis	Yes — by design; this is what lets the AI be useful
Row values in the fixture (`142000`, `0.12`)	The numbers, yes; their meaning, no (the header row is hashed)

The line PromptCape draws: the AI sees enough structure to write correct code and nothing it can use to reconstruct your business. A groupby is visibly a groupby; which business columns it groups and averages is not on the wire, though any UI text you write, such as a "Churn by region" subheader, still is.
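To check the boundary against your own traffic, one option is to scan whatever outbound request text you can capture for real names from the registry and for the raw secret values from .env. A sketch: `audit_payload` is hypothetical, name matching is whole-word and case-sensitive (so paraphrases slip through), and secrets are reported only as a yes/no so the audit output never echoes them.

```python
import re


def audit_payload(
    payload: str, real_names: set[str], secret_values: set[str]
) -> tuple[set[str], bool]:
    """Return (real names found as whole words, whether any secret value appears)."""
    names = {n for n in real_names if re.search(r"\b" + re.escape(n) + r"\b", payload)}
    secret_seen = any(s and s in payload for s in secret_values)
    return names, secret_seen


payload = 'fld_9c8d7e6f.groupby("col_1f7b3d6a")  st.subheader("Churn by region")'
print(audit_payload(payload, {"region", "churn_probability"}, {"postgresql://u:p@db/hr"}))
# ({'region'}, False)
```

In the sample, the subheader string trips the check for region: exactly the kind of by-design UI-label exposure the table above lists.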

Conclusion

A Streamlit dashboard is the integration test for this whole series, because no single technique covers it. It takes identifier renaming, pandas column registration, fixture header rewriting, subprocess-launch secret injection, and a framework exclusion list — and they have to compose without stepping on each other.

The three things that make the composition work:

One pass, five channels, consistent hashing. Identifiers, columns, and fixture headers all draw from the same registry, so a name that appears as a method, a column, and a CSV header round-trips as one mapping. Independent subsystems sharing one source of truth is what keeps apply correct.
Run the workspace, don't just read it. The dashboard has to render at every step or you can't tell a real bug from an obfuscation artifact. promptcape run is what makes "verify after each step" possible without ever putting secrets on disk.
Useful requires leaking structure — so leak only structure. The AI needs to see that it's a groupby().agg() over a frame with N columns. It does not need the column names, the credentials, or the rows. Drawing the line exactly there is the entire job.

PromptCape is available for trial at https://promptcape.com/, free for 3 months with no credit card required. Streamlit, pandas, and SQLAlchemy support and the promptcape run wrapper all ship in the same JAR; point it at the dashboard directory and the language and frameworks are auto-detected.