Turning Manual Ops Into a 10-Minute Task

php dev.to

I once turned a 2-week manual data update process into a 10-minute automated pipeline by writing a PHP script that ingested a vendor spreadsheet, normalized everything into a temporary MySQL database, and surfaced the result in a review dashboard before pushing to production. This post is the short version of that project — the tools I used, the approach, and the outcome — for any developer staring at a tedious manual ops process and wondering whether it's worth automating. (Spoiler: it almost always is.)

TL;DR

  • Before: ~10 business days of careful manual data entry against a fragile legacy database, every six months.
  • After: ~10-minute automated run, ~30-second push to prod, single dashboard for human review.
  • Stack: PHP for ingestion and transforms, a temporary MySQL database for staging and validation, a web dashboard for human review.
  • Why it worked: The repetitive parts were genuinely repetitive (same enums, same transforms, same edge cases) and a human still got the final sign-off before anything hit production.
  • Outcome: ~90%+ reduction in customer-facing data issues, plus dev hours and company time saved every cycle.

The BEFORE Process

I worked for a marketing company whose job it was to update a major restaurant's nutrition information: ingredients, UOMs (units of measure), and caloric content. Every cycle we would get a new spreadsheet full of updates to apply to the database, which drove the website's display for 8 million customers.

This process typically took around 10 business days (2 weeks) to complete, and we repeated it roughly every 6 months. It was a lot of manual work: checking, re-checking, and typing very carefully to maintain the dwindling data integrity without introducing new issues. The system was a picky old one that had to be handled a certain way so it would correctly feed the iOS app, with legacy code and a legacy DB setup. Very tedious and exhausting to complete.

My Approach To Improve

After completing this process a handful of times, I saw common assumptions I could make that would shortcut the time we needed and automate many of the repetitive tasks. This would also increase the validity of the data (no human error) and make the final results easier to check, since everything could be verified against the original source of truth: the spreadsheet.

Tools I Used

The best tools I had at the time were PHP as the scripting language for the specific tasks, and a temporary MySQL database to help check and manipulate the data and speed things along.

The Solution

I wrote logic in PHP to ingest the spreadsheet data, match fields against common enums per category, and apply transforms to specific labels and description content. The result was piped into the final database only after it had been tested, reviewed on a dashboard for quality, and marked ready for production.
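To make the ingest-and-normalize step concrete, here is a minimal PHP sketch. The enum map, column names, and transform rules below are illustrative placeholders, not the real ones from the project:

```php
<?php
// Canonical units of measure the legacy DB accepted (hypothetical set);
// anything that doesn't map onto the enum is flagged for human review.
const UOM_ENUM = [
    'g' => 'g', 'gram' => 'g', 'grams' => 'g',
    'mg' => 'mg',
    'oz' => 'oz', 'ounce' => 'oz',
];

function normalizeRow(array $row): array
{
    $uomKey = strtolower(trim($row['uom']));
    return [
        'item'     => trim($row['item']),
        // Map free-text units onto the enum; null marks the row for review.
        'uom'      => UOM_ENUM[$uomKey] ?? null,
        // Calories must be a non-negative integer, or the row is suspect.
        'calories' => ctype_digit(trim((string) $row['calories']))
                        ? (int) $row['calories']
                        : null,
    ];
}

$clean = normalizeRow(['item' => ' Fries ', 'uom' => 'Grams', 'calories' => '320']);
// $clean is ['item' => 'Fries', 'uom' => 'g', 'calories' => 320]
```

In the real pipeline, rows that normalized with a `null` anywhere went straight to the dashboard instead of the staging tables.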

Essentially the fix was to let the computer do as much processing as it could, have a human verify its work when done, and then automatically apply it to the target system, without the tedium of checking things one by one.

The Result

With my script and DB system, a single run took only about 10 minutes to process the data and display the final values. We could review everything on a web page, adjust anything that was off, and then push to the prod DB in around 30 seconds. This saved the devs hassle, saved the company money and time, and cut customer-facing issues by roughly 90%+. Not a bad outcome in the end. It is one of the projects I am most proud of to date, and it started as my own idea. This is the kind of thing I love to do.

FAQ

Why a temporary MySQL database instead of validating in code?

Spreadsheets have repetition, contradictions, and edge cases that are far easier to spot with SQL than with imperative validation code. A staging table with constraints catches duplicates and bad references immediately, and the dashboard can run any ad-hoc query against it before production gets touched.
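A small sketch of the idea, using an in-memory SQLite database via PDO for portability (the real project used MySQL, but the principle is identical): declare the invariants once on the staging table and let the database reject violations at insert time.

```php
<?php
// Staging table with declared invariants: duplicate item codes and
// out-of-enum units are rejected by the DB, not by hand-written checks.
// Table and column names are illustrative, not from the real project.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec(<<<SQL
CREATE TABLE staging_items (
    item_code TEXT PRIMARY KEY,                          -- no duplicates
    uom       TEXT NOT NULL CHECK (uom IN ('g','mg','oz')),
    calories  INTEGER NOT NULL CHECK (calories >= 0)
)
SQL);

$insert = $db->prepare('INSERT INTO staging_items VALUES (?, ?, ?)');
$insert->execute(['FRY-01', 'g', 320]);

try {
    $insert->execute(['FRY-01', 'g', 320]);   // duplicate item_code
} catch (PDOException $e) {
    echo "rejected duplicate\n";
}
```

Once the data survives the constraints, the dashboard is free to run any ad-hoc `SELECT` against the staging table before production is touched.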

Why keep a human in the loop if the script is reliable?

The data feeds an iOS app used by 8 million customers. A bad row in production isn't a bug — it's a customer-facing nutrition error. The 30 seconds it takes a human to scan a dashboard is cheap insurance against the kind of mistake nobody wants to explain in a postmortem.

Could you have used a more modern stack?

Sure — Python with pandas would have been a natural fit, and Postgres would have given me more flexibility. But PHP and MySQL were what the company already ran, and the entire project shipped without asking anyone for new infrastructure. That's part of why it actually got built.

What would you change if you did it again today?

I'd add automated diff-vs-previous-cycle reports so reviewers see only what changed, version-control the transform rules, and write per-row confidence scores so the dashboard can highlight low-confidence entries first. The core architecture — ingest → stage → review → push — would stay exactly the same.
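The diff-vs-previous-cycle idea is simple enough to sketch. Assuming both cycles are keyed by an item code (a hypothetical layout, not the project's actual schema), the reviewer only needs to see rows that are new or changed:

```php
<?php
// Hypothetical diff helper: compare this cycle's normalized rows against
// last cycle's and keep only the rows that are new or changed, so the
// dashboard shows a short list instead of the full dataset.
function diffCycles(array $previous, array $current): array
{
    $changed = [];
    foreach ($current as $code => $row) {
        if (!isset($previous[$code]) || $previous[$code] !== $row) {
            $changed[$code] = $row;
        }
    }
    return $changed;
}

$prev = ['A' => ['cal' => 100], 'B' => ['cal' => 200]];
$curr = ['A' => ['cal' => 100], 'B' => ['cal' => 210], 'C' => ['cal' => 50]];
$toReview = diffCycles($prev, $curr);
// $toReview contains only 'B' (changed) and 'C' (new)
```

Deleted items (present last cycle, missing now) would need a second pass in the other direction, since they also deserve a reviewer's attention.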
