When an API stopped returning JSON, I switched to Selenium and added AI summaries


I built a parser around the DNB Business Directory API. At first, everything worked fine — simple requests, JSON responses, clean and fast.

Then it suddenly stopped working.

My script started getting empty or unusable responses, even though the same requests still worked perfectly in the browser. Status codes were often 200, but the data was missing or incomplete.

After trying different headers, sessions, retries, and delays, it became clear that this wasn't a normal API issue. The most likely cause was anti-bot filtering.

What I changed

Instead of trying to bypass it at the HTTP level, I switched to Selenium.

The new approach:

open the site in a real browser
search companies by keyword + country
paginate through results
collect company profile links
parse data directly from rendered pages

This worked immediately because it behaves like a real user session.
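The browser flow above can be sketched roughly like this. The search URL pattern and CSS selector are illustrative assumptions, not the real DNB markup, so they would need adapting to the actual page structure:

```python
import time


def collect_profile_links(driver, base_url, keyword, country, max_pages=3):
    """Search by keyword + country, paginate, and gather profile links.

    Assumptions: the ?q=&country=&page= query pattern and the
    a.company-profile-link selector are placeholders for the real site.
    """
    links = []
    for page in range(1, max_pages + 1):
        driver.get(f"{base_url}/search?q={keyword}&country={country}&page={page}")
        time.sleep(2)  # let the page render, like a real user session would
        anchors = driver.find_elements("css selector", "a.company-profile-link")
        page_links = [a.get_attribute("href") for a in anchors]
        if not page_links:  # empty page -> no more results, stop paginating
            break
        links.extend(page_links)
    return links


if __name__ == "__main__":
    # selenium is imported lazily so the helper stays importable without it
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        print(collect_profile_links(driver, "https://example.com", "logistics", "DE"))
    finally:
        driver.quit()
```

Because the helper only talks to the driver through `get`, `find_elements`, and `get_attribute`, it works with any Selenium 4 driver (Chrome, Firefox, remote grid).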

Then I added AI

After collecting company data, I wanted to understand what these companies actually do.

So I added a second stage:

scrape the company website (home + a few internal pages)
clean the text
send it to Groq
generate a short summary + list of services
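A minimal sketch of the clean-and-summarize stage, using only the standard library for text cleaning. The Groq model name and the prompt wording are assumptions, not taken from the original project:

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def clean_text(html, max_chars=8000):
    """Strip markup and collapse whitespace before sending text to the LLM."""
    parser = _TextExtractor()
    parser.feed(html)
    text = re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()
    return text[:max_chars]  # keep the prompt within a sane size


def summarize(client, site_text, model="llama-3.1-8b-instant"):
    """Ask Groq for a short summary + services list (model name is an assumption)."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Summarize this company website in 2-3 sentences, then list its services."},
            {"role": "user", "content": site_text},
        ],
    )
    return resp.choices[0].message.content
```

`summarize` takes the Groq client as a parameter, so the same cleaning code can be reused with any OpenAI-compatible chat client.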

I also added a simple keyword-based filter to detect risky content (gambling, adult, etc.) before sending data to the LLM.
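The risk filter can be as small as a whole-word keyword check; the keyword list below is an illustrative sample, not the full production list:

```python
import re

# Illustrative sample of risky keywords; the real list would be larger
RISKY_KEYWORDS = {
    "casino", "betting", "poker",  # gambling
    "adult", "escort",             # adult content
}


def is_risky(text):
    """Return True if any risky keyword appears as a whole word (case-insensitive)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return not words.isdisjoint(RISKY_KEYWORDS)
```

Matching whole words rather than substrings avoids flagging innocent text like "industrial casings" just because it contains "casino"-adjacent letters.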

Final pipeline

Selenium → company profiles → websites → multi-page scraping → AI summaries → Excel
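The final Excel step might look like this, assuming openpyxl as the writer (the original pipeline may use pandas or another library instead):

```python
def to_rows(companies):
    """Flatten company dicts into a header row plus one row per company."""
    header = ["name", "profile_url", "website", "summary", "risky"]
    rows = [header]
    for company in companies:
        rows.append([company.get(key, "") for key in header])
    return rows


def export_excel(companies, path="companies.xlsx"):
    # openpyxl imported lazily since it is a third-party dependency
    from openpyxl import Workbook

    wb = Workbook()
    ws = wb.active
    for row in to_rows(companies):
        ws.append(row)
    wb.save(path)
```

Keeping `to_rows` separate from the file writing makes the column layout easy to test and to swap for a CSV or database sink later.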

Result

Interestingly, the workaround ended up being more useful than the original solution.

Instead of just collecting structured data, I now also get:

a quick description of each company
basic content classification

Demo / project

https://dibara512.github.io/my-site/

Source: dev.to
