The previous articles each closed one leak. A real Streamlit + pandas dashboard leaks through five at once. This is the whole session — obfuscate, run, prompt, apply — with every channel accounted for.
Why a dashboard is the hard case
The earlier articles in this series each took one surface: the runtime workspace and .env, the transparent proxy, pandas column names.
A data dashboard is where all of them collide, because a Streamlit app is the rare Python project that is simultaneously:
- Code the AI reads and edits (business logic, service classes).
- A schema expressed as pandas column names (churn_probability, annual_salary).
- Sample data sitting in a CSV the AI can open.
- Secrets (a database URL, an API key) in .env.
- Framework structure (Streamlit's st.X API and page discovery) that must survive untouched or the app won't render.
Use Claude Code, Codex or Cursor naively on this project and all five leave your machine on the first prompt. This article runs one realistic feature request — "add a churn-by-region bar chart with a CSV export" — through the full PromptCape workflow and accounts for every channel.
The project
A staffing/HR dashboard. Streamlit front end, pandas for the aggregation, SQLAlchemy for the source rows, a .env for credentials, and a fixture CSV for local runs.
attrition-dashboard/
├── Dashboard.py # Streamlit entry point
├── pages/
│ └── 1_📊_Capacity.py
├── staffing/
│ ├── database.py # SQLAlchemy engine/session
│ └── service.py # StaffingService — the business logic
├── data/
│ └── sample_attrition.csv # fixture: real column names, fake rows
├── .env # DATABASE_URL, SECRET_KEY, ACTIVITY_MONTHS
└── requirements.txt
Dashboard.py, lightly trimmed:
import streamlit as st
import pandas as pd

from staffing.database import get_session
from staffing.service import StaffingService


def load_frame(year: int) -> pd.DataFrame:
    with get_session() as session:
        rows = StaffingService(session).list_assignments(year=year)
        df = pd.DataFrame(rows)
    df = df[df["employment_status"] == "active"]
    df["tenure_years"] = df["tenure_months"] / 12
    return df


def main():
    st.set_page_config(page_title="Attrition", layout="wide")
    st.title("Attrition Dashboard")
    year = st.sidebar.selectbox("Year", range(2020, 2030))
    df = load_frame(year)
    st.metric("Avg churn risk", f"{df['churn_probability'].mean():.0%}")
    st.dataframe(df, hide_index=True)


if __name__ == "__main__":
    main()
Five lines of this file leak something sensitive, and they leak different things through different mechanisms. That's the point.
The five channels, and what closes each
| # | Channel | What leaks in Dashboard.py | Closed by |
|---|---|---|---|
| 1 | Business identifiers | StaffingService, list_assignments, load_frame | Identifier rename → Cls_… / mtd_… (article 1) |
| 2 | Column names | "employment_status", "tenure_months", "churn_probability" | Column registry → col_… (article 9) |
| 3 | Sample data | data/sample_attrition.csv header row | Fixture header-rewrite (article 9) |
| 4 | Secrets | .env → DATABASE_URL, SECRET_KEY | .env pointer + promptcape run injection (article 7) |
| 5 | Framework structure | st.set_page_config, st.title, st.metric, st.dataframe, st.sidebar | StreamlitDetector exclusion list (streamlit example) |
The whole workflow below is just: turn each of these on, in order, and verify nothing broke.
Step 0 — health check on the real source
Always start green. If the source doesn't run, obfuscation noise will be indistinguishable from your own bug.
cd ~/projects/attrition-dashboard
streamlit run Dashboard.py # renders, charts load, no traceback → good baseline
Step 1 — obfuscate
promptcape obfuscate --language python --verify .
# -> ~/.promptcape/cache/a1b2c3d4/
What the engine does in one pass, mapped to the five channels:
- (1) Identifiers renamed. StaffingService → Cls_…, list_assignments → mtd_…, load_frame → mtd_…, the df/rows/session locals → fld_…. Consistent across Dashboard.py, pages/, and staffing/service.py.
- (2) Columns registered. PandasColumnDetector walks the column positions (df["employment_status"], df["tenure_months"], the assigned df["tenure_years"], df['churn_probability'].mean()) and maps each to a col_… hash. df is provably a DataFrame (assigned from pd.DataFrame(rows)), so its subscripts are in scope; a session["…"] access would not be.
- (3) Fixture rewritten. data/sample_attrition.csv's header row is rewritten with the same registry, so the obfuscated code's col_… names line up with the fixture and streamlit run still works on sample data.
- (4) .env left behind. No .env is copied. A .env.promptcape-pointer is written with the source path and a note to use promptcape run.
- (5) Streamlit API frozen. The moment import streamlit as st is seen, ~80 module names (set_page_config, title, metric, dataframe, sidebar, selectbox, …) join the project-wide exclusion list. StaffingService-style identifiers are not on it.
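The scoping rule in item (2) can be illustrated with Python's standard ast module. This is only a sketch of the idea, not PromptCape's PandasColumnDetector: the function name find_dataframe_columns is hypothetical, it assumes pandas is imported as pd, it only recognizes names bound directly to pd.DataFrame(...), and it needs Python 3.9 or later (where a subscript's slice is the expression node itself).

```python
import ast

def find_dataframe_columns(source: str) -> set[str]:
    """Collect string subscripts on names assigned from pd.DataFrame(...)."""
    tree = ast.parse(source)
    frames: set[str] = set()

    # Pass 1: names bound directly to a pd.DataFrame(...) call.
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            func = node.value.func
            if (isinstance(func, ast.Attribute) and func.attr == "DataFrame"
                    and isinstance(func.value, ast.Name) and func.value.id == "pd"):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        frames.add(target.id)

    # Pass 2: string-literal subscripts on those names only.
    columns: set[str] = set()
    for node in ast.walk(tree):
        if (isinstance(node, ast.Subscript)
                and isinstance(node.value, ast.Name) and node.value.id in frames
                and isinstance(node.slice, ast.Constant)
                and isinstance(node.slice.value, str)):
            columns.add(node.slice.value)
    return columns

SAMPLE = '''
df = pd.DataFrame(rows)
df = df[df["employment_status"] == "active"]
df["tenure_years"] = df["tenure_months"] / 12
token = session["user_id"]
'''

columns = find_dataframe_columns(SAMPLE)
```

The session["user_id"] subscript is ignored because session was never bound to a DataFrame, and the string "active" is ignored because it is a value, not a subscript.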
The --verify step does static import resolution (no execution) and reports any import that wouldn't resolve — the canonical catch being a stdlib import accidentally renamed because a user identifier collided.
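A static import check of this kind can be approximated with the standard library. The sketch below is an assumption about the general technique, not the --verify implementation: unresolved_imports is a hypothetical name, it looks only at top-level module names, and it treats a matching directory or .py file under the project root as resolvable.

```python
import ast
import importlib.util
from pathlib import Path

def unresolved_imports(source: str, project_root: Path) -> list[str]:
    """Return top-level module names imported by `source` that cannot be located."""
    wanted: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            wanted.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            wanted.add(node.module.split(".")[0])

    missing = []
    for name in sorted(wanted):
        local = (project_root / name).is_dir() or (project_root / f"{name}.py").is_file()
        # find_spec on a top-level name locates the module without importing it.
        if not local and importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing

# Hypothetical case: a stdlib import was renamed to a hash by mistake.
EXAMPLE = "import os\nimport json.decoder\nfrom mod_5f3a9c1e import thing\n"
missing = unresolved_imports(EXAMPLE, Path.cwd())
```

Nothing in the checked source is executed; the only lookups are a file-system test and a finder query.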
What Dashboard.py looks like in the cache:
import streamlit as st
import pandas as pd

from staffing.database import get_session
from staffing.service import Cls_7b2a1f0c


def mtd_4d9e1a22(year: int) -> pd.DataFrame:
    with get_session() as fld_5a8c3e91:
        fld_e5f6a7b8 = Cls_7b2a1f0c(fld_5a8c3e91).mtd_2c1b9d0a(year=year)
        fld_9c8d7e6f = pd.DataFrame(fld_e5f6a7b8)
    fld_9c8d7e6f = fld_9c8d7e6f[fld_9c8d7e6f["col_4a1f8b2e"] == "active"]
    fld_9c8d7e6f["col_7d3e9a14"] = fld_9c8d7e6f["col_b6c2f085"] / 12
    return fld_9c8d7e6f


def main():
    st.set_page_config(page_title="Attrition", layout="wide")
    st.title("Attrition Dashboard")
    year = st.sidebar.selectbox("Year", range(2020, 2030))
    fld_9c8d7e6f = mtd_4d9e1a22(year)
    st.metric("Avg churn risk", f"{fld_9c8d7e6f['col_e2d4b7c9'].mean():.0%}")
    st.dataframe(fld_9c8d7e6f, hide_index=True)


if __name__ == "__main__":
    main()
Every st.X is intact (renders correctly). Every column is a hash (no schema leak). StaffingService.list_assignments is gone (no business-logic leak). The page-config string "Attrition" and the metric label "Avg churn risk" stay — they're user-visible UI text, not schema, and the never-rewrite-values rule keeps them readable so the AI can reason about the layout.
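How an exclusion list and a never-rewrite-values rule interact can be sketched with ast.NodeTransformer. Everything here is illustrative: IdentifierRenamer is a hypothetical class, the mtd_0000aaaa hash is invented to show a user method named title colliding with st.title, and the real engine may work quite differently. It requires Python 3.9+ for ast.unparse.

```python
import ast

class IdentifierRenamer(ast.NodeTransformer):
    """Rename mapped identifiers unless excluded; never touch string constants."""

    def __init__(self, mapping, excluded):
        self.mapping = mapping
        self.excluded = excluded

    def _rename(self, name):
        return name if name in self.excluded else self.mapping.get(name, name)

    def visit_Name(self, node):
        node.id = self._rename(node.id)
        return node

    def visit_Attribute(self, node):
        self.generic_visit(node)  # rename the object expression first
        node.attr = self._rename(node.attr)
        return node

    def visit_FunctionDef(self, node):
        node.name = self._rename(node.name)
        self.generic_visit(node)
        return node

    visit_ClassDef = visit_FunctionDef  # both node types carry a .name
    # No visit_Constant: string values such as "Attrition Dashboard" pass through.

MAPPING = {
    "StaffingService": "Cls_7b2a1f0c",
    "list_assignments": "mtd_2c1b9d0a",
    "title": "mtd_0000aaaa",  # hypothetical user method that collides with st.title
}
EXCLUDED = {"title", "metric", "dataframe"}  # a small slice of a Streamlit exclusion list

SOURCE = (
    'st.title("Attrition Dashboard")\n'
    "rows = StaffingService(session).list_assignments(year=year)\n"
)
renamed = ast.unparse(IdentifierRenamer(MAPPING, EXCLUDED).visit(ast.parse(SOURCE)))
```

Because attribute names cannot be tied to a type by syntax alone, the exclusion wins project-wide: title stays readable everywhere, which is the trade that keeps st.title working.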
Step 2 — run the obfuscated workspace
This is the step that separates Python from Java: the developer runs the workspace, so the secrets have to arrive at launch without ever landing on disk in the cache.
cd ~/.promptcape/cache/a1b2c3d4
promptcape run streamlit run Dashboard.py
promptcape run:
- Resolves the source project from the cache.
- Parses ~/projects/attrition-dashboard/.env.
- Spawns streamlit run Dashboard.py with cwd = cache and the child environment = OS env + the parsed .env values.
- Inherits the TTY so Streamlit's browser-open and hot-reload behave normally.
os.getenv("DATABASE_URL") inside the workspace returns the real URL; the cache directory still contains no secret. The dashboard renders identically to Step 0 — same charts, same numbers — but every file an AI could read is obfuscated.
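The launch-time injection can be sketched with subprocess alone. This is a minimal illustration under stated assumptions, not the promptcape run implementation: parse_dotenv and run_with_env are hypothetical helpers, and the parser handles only plain KEY=VALUE lines (no export prefixes, escapes, or multi-line values).

```python
import os
import subprocess
import sys
from pathlib import Path

def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def run_with_env(command, cwd: Path, dotenv_text: str, **kwargs):
    """Run `command` in `cwd` with OS env + parsed .env values held only in memory."""
    child_env = {**os.environ, **parse_dotenv(dotenv_text)}
    # With no stdin/stdout redirection the child inherits the parent's terminal.
    return subprocess.run(command, cwd=cwd, env=child_env, **kwargs)

# Demo with a placeholder value: the child sees the variable, the cwd never stores it.
demo = run_with_env(
    [sys.executable, "-c", "import os; print(os.environ['DATABASE_URL'])"],
    Path.cwd(),
    "# local settings\nDATABASE_URL=sqlite:///example.db\n",
    capture_output=True,
    text=True,
)
```

In the real workflow the command would be streamlit run Dashboard.py and the .env text would be read from the source project, while cwd points at the cache.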
Step 3 — connect the AI (two equivalent paths)
Path A — CLI, explicit. Point Claude Code at the cache directory and work there. Apply when done.
Path B — transparent proxy in Cursor. Run the PromptCape proxy, open Cursor's terminal, type pcc (the ANTHROPIC_BASE_URL-exporting launcher). The user types prompts in plain language using real names — "add a churn chart" — the proxy translates churn → its mapping on the way out, and translates the model's reply back on the way in. The developer never sees a hash; the provider never sees a real name.
Either way, the model on the other end receives the obfuscated workspace and an obfuscated prompt.
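The round trip in Path B amounts to one substitution outbound and its inverse inbound. The sketch below covers only exact identifier tokens; how the proxy matches a loose word like "churn" to its column is not described here, so that part is left out. translate is a hypothetical helper, and the two hashes are the ones that appear later in this walkthrough.

```python
import re

def translate(text: str, mapping: dict[str, str]) -> str:
    """Replace whole-word occurrences of each key with its mapped value."""
    if not mapping:
        return text
    # Longest keys first so a long name wins over a shorter overlapping one.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)

FORWARD = {"churn_probability": "col_e2d4b7c9", "region": "col_1f7b3d6a"}
REVERSE = {hashed: real for real, hashed in FORWARD.items()}

prompt = "Plot mean churn_probability grouped by region."
outbound = translate(prompt, FORWARD)   # what the provider would receive

reply = 'Use frame.groupby("col_1f7b3d6a")["col_e2d4b7c9"].mean().'
inbound = translate(reply, REVERSE)     # what the developer would read
```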
Step 4 — the feature request
The developer asks (real names, via the proxy):
Add a bar chart of average churn probability by region, and a button to download the filtered table as CSV.
What the model receives is phrased in col_… space and works against the obfuscated frame. What it writes:
st.subheader("Churn by region")
fld_chart = (
    fld_9c8d7e6f.groupby("col_1f7b3d6a")["col_e2d4b7c9"]
    .mean()
    .reset_index()
)
st.bar_chart(fld_chart, x="col_1f7b3d6a", y="col_e2d4b7c9")
st.download_button(
    label="Download CSV",
    data=fld_9c8d7e6f.to_csv(index=False),
    file_name="attrition.csv",
    mime="text/csv",
)
The model correctly reused the existing col_e2d4b7c9 (churn) and col_1f7b3d6a (region) columns — it had seen them in the file — and used st.bar_chart / st.download_button from the intact Streamlit surface. It had no idea what any of it meant, and didn't need to.
Step 5 — apply and verify
promptcape run streamlit run Dashboard.py # the AI's change, still in obfuscated space → renders
promptcape apply # reverse-map col_/fld_/mtd_/Cls_ back to real names
streamlit run Dashboard.py # de-obfuscated, on real source → renders
After apply, the new code reads in the developer's own vocabulary:
st.subheader("Churn by region")
chart = (
    df.groupby("region")["churn_probability"]
    .mean()
    .reset_index()
)
st.bar_chart(chart, x="region", y="churn_probability")
st.download_button(
    label="Download CSV",
    data=df.to_csv(index=False),
    file_name="attrition.csv",
    mime="text/csv",
)
col_1f7b3d6a → region and col_e2d4b7c9 → churn_probability via the column registry. fld_chart was AI-invented, so it has no original name to map back to; it ends up as whatever the developer renames it to (here, chart).
The CSV label, file name, and MIME type were strings all along — untouched start to finish.
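The reverse mapping, including the case of an AI-invented name such as fld_chart, can be sketched as a single regex pass that restores known tokens and reports the rest. This is an illustration, not promptcape apply: reverse_map is a hypothetical function, and the token pattern is inferred from the prefixes shown in this article.

```python
import re

# Any token carrying one of the four prefixes used in the obfuscated workspace.
TOKEN = re.compile(r"\b(?:col|fld|mtd|Cls)_\w+\b")

def reverse_map(code: str, reverse: dict[str, str]) -> tuple[str, set[str]]:
    """Restore known tokens; collect prefixed tokens that have no registry entry."""
    unmapped: set[str] = set()

    def restore(match: re.Match) -> str:
        token = match.group(0)
        if token in reverse:
            return reverse[token]
        unmapped.add(token)  # left in place for the developer to rename
        return token

    return TOKEN.sub(restore, code), unmapped

REVERSE = {
    "fld_9c8d7e6f": "df",
    "col_1f7b3d6a": "region",
    "col_e2d4b7c9": "churn_probability",
}
ai_code = 'fld_chart = fld_9c8d7e6f.groupby("col_1f7b3d6a")["col_e2d4b7c9"].mean()'
restored, unmapped = reverse_map(ai_code, REVERSE)
```

Returning the unmapped set separately gives the developer a short list of names to pick, rather than leaving stray hashes to be found later.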
The failure modes, consolidated
Each step has one characteristic way to fail. Knowing them turns a confusing render into a one-line fix.
| Symptom | Cause | Fix |
|---|---|---|
| AttributeError: module 'streamlit' has no attribute 'fld_xxx' | A user identifier collided with an st.X name not on the exclusion list (error, exception are deliberately off it) | Add the name to ~/.promptcape/mappings/<hash>-exclusions.txt |
| KeyError: 'col_xxxxxxxx' after obfuscation | A column reached the code through an un-inferrable DataFrame, so it wasn't registered, or the fixture header wasn't rewritten | grep the cache for the hash to see the real name in context; confirm the var is DataFrame-inferred |
| ValueError: invalid literal for int() with base 10: 'REDACTED' at launch | Ran streamlit run directly instead of promptcape run; no env vars injected | Re-run through promptcape run |
| Dashboard renders but a column label shows a hash | A col_… leaked into UI text (e.g., st.write(df.columns)) | Expected: column names are obfuscated; if you display them, display the reverse-mapped frame after apply, or whitelist that column |
| --verify flags a renamed stdlib import | A user identifier collided with a stdlib top-level module name | The detectors add it to the exclusion list; re-obfuscate |
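For the KeyError row, one quick diagnostic is to compare the col_… tokens the code reads against the fixture's header row. The helper below is a hypothetical sketch, not a PromptCape command; it assumes hashes are eight hex digits, as in the examples in this article, and it skips columns the code assigns itself, since those never appear in the CSV.

```python
import csv
import io
import re

def missing_fixture_columns(code: str, fixture_csv: str) -> set[str]:
    """Return col_… tokens read by `code` that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(fixture_csv)), [])
    referenced = set(re.findall(r"\bcol_[0-9a-f]{8}\b", code))
    # Columns created in code (frame["col_…"] = ...) are not expected in the fixture.
    assigned = set(re.findall(r"""\[["'](col_[0-9a-f]{8})["']\]\s*=(?!=)""", code))
    return referenced - assigned - set(header)

CODE = (
    'frame = frame[frame["col_4a1f8b2e"] == "active"]\n'
    'frame["col_7d3e9a14"] = frame["col_b6c2f085"] / 12\n'
)
# Hypothetical broken fixture: the second header was never rewritten.
FIXTURE = "col_4a1f8b2e,tenure_months\nactive,18\n"
missing = missing_fixture_columns(CODE, FIXTURE)
```

An empty result points away from the fixture and toward the other cause in that row, a column that was never registered.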
The end-to-end threat boundary
What actually left the machine, and what didn't, for the whole session:
| Channel | Reached the AI provider? |
|---|---|
| Business class/method names (StaffingService.list_assignments) | No, hashed |
| Column vocabulary (churn_probability, annual_salary, region) | No, registry-mapped |
| Sample data column meaning | No, fixture header rewritten |
| Database URL, secret key | No, never copied to the workspace; injected at subprocess launch |
| Streamlit structure, UI labels, the shape of the analysis | Yes, by design; this is what lets the AI be useful |
| Row values in the fixture (142000, 0.12) | The numbers, yes; their meaning, no (the header is hashed) |
The line PromptCape draws: the AI sees enough structure to write correct code and nothing it can use to reconstruct your business. A groupby is visibly a groupby; that it groups attrition risk by region is not on the wire.
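The last table row (values visible, meaning hidden) follows from rewriting only the header. A minimal sketch with the csv module, assuming a plain name-to-hash registry: rewrite_header is a hypothetical helper, and col_0a1b2c3d is an invented hash for annual_salary, which this article never shows.

```python
import csv
import io

def rewrite_header(csv_text: str, registry: dict[str, str]) -> str:
    """Rewrite only the header row; data rows are copied through unchanged."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return ""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([registry.get(name, name) for name in header])
    writer.writerows(reader)
    return out.getvalue()

REGISTRY = {
    "annual_salary": "col_0a1b2c3d",      # hypothetical hash
    "churn_probability": "col_e2d4b7c9",  # hash used elsewhere in this article
}
fixture = "annual_salary,churn_probability\n142000,0.12\n"
rewritten = rewrite_header(fixture, REGISTRY)
```

The numbers 142000 and 0.12 survive untouched, but nothing in the rewritten file says which one is a salary.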
Conclusion
A Streamlit dashboard is the integration test for this whole series, because no single technique covers it. It takes identifier renaming, pandas column registration, fixture header rewriting, subprocess-launch secret injection, and a framework exclusion list — and they have to compose without stepping on each other.
The three things that make the composition work:
- One pass, five channels, consistent hashing. Identifiers, columns, and fixture headers all draw from the same registry, so a name that appears as a method, a column, and a CSV header round-trips as one mapping. Independent subsystems sharing one source of truth is what keeps apply correct.
- Run the workspace, don't just read it. The dashboard has to render at every step or you can't tell a real bug from an obfuscation artifact. promptcape run is what makes "verify after each step" possible without ever putting secrets on disk.
- Useful requires leaking structure, so leak only structure. The AI needs to see that it's a groupby().agg() over a frame with N columns. It does not need the column names, the credentials, or the rows. Drawing the line exactly there is the entire job.
PromptCape ships open for trial at https://promptcape.com/ — free for 3 months, no credit card required. Streamlit, pandas, SQLAlchemy, and the promptcape run wrapper are all in the same JAR; point it at the dashboard directory and the language and frameworks are auto-detected.