SEO tools are useful, but many of them are either too heavy, too expensive, or too dependent on dashboards.
So I built seoextract, a simple Python CLI package that audits a website directly from the terminal and produces a clear SEO report with scores, grades, and actionable issues.
The goal was simple:
«Enter a website URL. Get an SEO audit. Understand what needs to be fixed.»
What is "seoextract"?
"seoextract" is a Python-based SEO audit tool that crawls a website and checks common SEO problems such as:
- Missing or weak title tags
- Missing meta descriptions
- Thin content
- Missing canonical tags
- Poor internal linking
- Missing schema markup
- Missing viewport meta tag
- Basic page-level SEO quality
It then calculates a score and grade for the website.
Example command:
seoextract audit https://www.python.org --max-pages 1
Output:
SEO Audit Complete
URL: https://www.python.org
Pages crawled: 1
Site score: 75.0
Grade: B
Total issues: 3
Critical: 0
Warnings: 2
Info: 1
Why I Built It
Most beginners learn SEO as a checklist:
- Add a title
- Add a meta description
- Use headings properly
- Add internal links
- Add schema markup
- Improve content length
But when building real websites, manually checking every page becomes boring and repetitive.
I wanted to build something that could automate the basic audit process.
At the same time, I wanted this project to help me improve my Python skills, especially in:
- Web scraping
- CLI development
- Package structuring
- SEO rule design
- Publishing Python packages to PyPI
That is how "seoextract" started.
How It Works
The workflow is straightforward.
First, the user runs a command from the terminal:
seoextract audit https://example.com
Then "seoextract" performs these steps:
- Fetches the page HTML
- Parses the page content
- Extracts SEO-related elements
- Applies built-in SEO rules
- Calculates a score
- Displays a structured audit report
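The parse and extract steps above can be sketched roughly as follows. This is an illustration only, not seoextract's actual code: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs, and the names SeoSignalParser and extract_signals are hypothetical.

```python
from html.parser import HTMLParser


class SeoSignalParser(HTMLParser):
    """Collects a few SEO-related elements from one HTML document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = None  # None means the tag was never seen
        self.canonical = None
        self.has_viewport = False
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name == "description":
                self.meta_description = attrs.get("content") or ""
            elif name == "viewport":
                self.has_viewport = True
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is collected here.
        if self._in_title:
            self.title += data


def extract_signals(html):
    """Parse an HTML string and return the collected signals as a dict."""
    parser = SeoSignalParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "meta_description": parser.meta_description,
        "canonical": parser.canonical,
        "has_viewport": parser.has_viewport,
        "links": parser.links,
    }
```

Keeping extraction separate from the rules means each rule only has to look at a plain dictionary, which makes the checks easy to test without touching the network.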
The tool checks both technical SEO signals and content-level signals.
For example, if a page has no meta description, the tool reports it as an issue and gives a fix suggestion:
[WARNING] Missing Meta Description
fix: Add a meta description between 50–160 characters summarising the page.
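A rule like this one could be written as a small function that returns an issue object or nothing. The sketch below is an assumption about the general shape, not the package's real implementation: Issue, check_meta_description, and format_issue are hypothetical names, and the extra length check with its INFO severity is added purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    severity: str  # "CRITICAL", "WARNING", or "INFO"
    title: str
    fix: str


def check_meta_description(description, min_len=50, max_len=160):
    """Return an Issue when the description is missing or out of range, else None."""
    if description is None or not description.strip():
        return Issue(
            "WARNING",
            "Missing Meta Description",
            f"Add a meta description between {min_len}–{max_len} characters "
            "summarising the page.",
        )
    length = len(description.strip())
    if length < min_len or length > max_len:
        # Hypothetical second rule: the article only shows the "missing" case.
        return Issue(
            "INFO",
            "Meta Description Length",
            f"Description is {length} chars. Aim for {min_len}–{max_len} characters.",
        )
    return None


def format_issue(issue):
    """Render an issue in the two-line style used by the report."""
    return f"[{issue.severity}] {issue.title}\nfix: {issue.fix}"
```

For example, print(format_issue(check_meta_description(None))) prints the two-line warning in the same layout as the report.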
Example Audit
Running:
seoextract audit https://example.com
May return something like:
Audit Summary
site_score : 59.0
grade : D
pages_crawled : 1
total_issues : 7
safe_browsing : True
Detected issues:
[WARNING] Title Too Short
fix: Title is 14 chars. Expand to at least 50 characters.
[WARNING] Missing Meta Description
fix: Add a meta description between 50–160 characters summarising the page.
[WARNING] Thin Content
fix: Page has only 21 words. Aim for at least 300 words of meaningful content.
[INFO] Missing Canonical Tag
fix: Add a canonical tag to prevent duplicate content issues.
[INFO] Poor Internal Linking
fix: Add at least 2 internal links to help search engines discover related pages.
[INFO] No Schema Markup
fix: Add Schema.org structured data to improve search result appearance.
This makes the report beginner-friendly because it does not just say what is wrong. It also explains how to fix each issue.
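One way a score and grade could be derived from a list of issues is sketched below. The penalty weights and grade boundaries are assumptions, not seoextract's real values. The boundaries were picked only so that the grades agree with the two sample reports in this post (75.0 is a B, 59.0 is a D), and the weights will not reproduce the tool's scores in general.

```python
# Hypothetical penalty per issue severity; the real weights are not given in this post.
PENALTIES = {"CRITICAL": 20.0, "WARNING": 10.0, "INFO": 5.0}

# Hypothetical grade boundaries as (minimum score, grade), highest first.
GRADE_BOUNDARIES = [(90.0, "A"), (75.0, "B"), (60.0, "C"), (40.0, "D")]


def page_score(severities):
    """Start from 100 and subtract a penalty per issue, never going below 0."""
    penalty = sum(PENALTIES[s] for s in severities)
    return max(0.0, 100.0 - penalty)


def site_score(page_scores):
    """Average the page scores, rounded to one decimal place."""
    if not page_scores:
        return 0.0
    return round(sum(page_scores) / len(page_scores), 1)


def grade(score):
    """Map a numeric score to a letter grade using the boundaries above."""
    for minimum, letter in GRADE_BOUNDARIES:
        if score >= minimum:
            return letter
    return "F"
```

A subtractive model like this is easy to explain to beginners, since every reported issue maps to a visible drop in the score.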
Features
The current version includes:
- Website SEO auditing from the terminal
- Page crawling with max-page control
- SEO issue detection
- Score and grade calculation
- Human-readable fix suggestions
- CLI interface
- PyPI package support
The command format is simple:
seoextract audit <url> --max-pages <n>
Example:
seoextract audit https://www.python.org --max-pages 1
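A command of this shape can be wired up with the standard library's argparse. The sketch below is hypothetical: this post does not say which CLI library seoextract uses, and the default of 1 for --max-pages is an assumption.

```python
import argparse


def build_parser():
    """Build a parser for: seoextract audit <url> --max-pages <n>"""
    parser = argparse.ArgumentParser(prog="seoextract")
    subparsers = parser.add_subparsers(dest="command", required=True)

    audit = subparsers.add_parser("audit", help="Audit a website and print an SEO report")
    audit.add_argument("url", help="Website URL to audit")
    audit.add_argument(
        "--max-pages",
        type=int,
        default=1,  # assumed default; the real default is not documented here
        help="Maximum number of pages to crawl",
    )
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.max_pages < 1:
        raise SystemExit("--max-pages must be at least 1")
    # A real implementation would start the crawl and print the report here.
    print(f"Auditing {args.url} (max pages: {args.max_pages})")
    return 0
```

Passing argv explicitly, as in main(["audit", "https://www.python.org", "--max-pages", "1"]), makes the command easy to exercise in tests without spawning a subprocess.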
What I Learned While Building It
This project taught me that even a simple CLI tool needs proper structure.
At first, web scraping looks like it can be done in just a few lines of Python using BeautifulSoup.
But a real tool needs more than that.
It needs:
- Input validation
- Error handling
- HTML parsing
- URL normalization
- Rule-based checks
- Scoring logic
- Clean terminal output
- Package configuration
- CLI command registration
That is why a proper project structure matters.
A 5-line script can scrape a title.
But a package should be reliable, reusable, and understandable.
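As one example of that extra work, input validation and URL normalization could look something like the sketch below. The function name normalize_url and the specific choices (defaulting to HTTPS, dropping fragments, treating an empty path as "/") are assumptions for illustration, not the package's documented behaviour.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(raw):
    """Validate a user-supplied URL and return a normalized form."""
    raw = raw.strip()
    if "://" not in raw:
        raw = "https://" + raw  # assumption: default to HTTPS when no scheme is given

    parts = urlsplit(raw)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"Unsupported URL scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError(f"URL has no host: {raw!r}")

    host = parts.hostname  # urlsplit already lowercases the hostname
    if ":" in host:
        host = f"[{host}]"  # restore brackets around IPv6 literals
    if parts.port:
        host = f"{host}:{parts.port}"

    path = parts.path or "/"
    # The fragment is dropped because it does not change which page is fetched.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))
```

Normalizing early means the crawler can compare URLs as plain strings, which helps it avoid visiting the same page twice.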
Why This Project Matters
"seoextract" is not trying to replace advanced SEO platforms.
Instead, it is useful for:
- Beginners learning SEO
- Developers checking their websites
- Students building portfolio projects
- Freelancers auditing small websites
- Python learners practicing real-world CLI tools
It is a practical project because it combines programming with a real business use case.
SEO is not just a technical topic. It connects directly to traffic, visibility, leads, and marketing.
That makes this project more useful than a basic toy script.
Future Improvements
Future versions could include:
- LLM-based SEO suggestions
- Better scoring rules
- Export to JSON, CSV, or PDF
- More detailed page reports
- Broken link detection
- Keyword density analysis
- Image alt text checking
- Sitemap detection
- Robots.txt analysis
- Better multi-page crawling
One important improvement I am planning is to add optional LLM support so the tool can generate more detailed recommendations based on the page content.
For example, instead of only saying:
Missing meta description
It could suggest:
Suggested meta description:
Learn Python programming with tutorials, documentation, downloads, and community resources from Python.org.
That would make the tool much more useful for real users.
Final Thoughts
Building "seoextract" helped me understand how real developer tools are structured.
It is not just about writing scraping code.
It is about turning code into a usable product:
- A CLI command
- A package
- A report system
- A scoring engine
- A tool that someone else can install and use
This project started as a simple SEO checker, but it became a strong learning experience in Python packaging, CLI development, and practical automation.
If you are learning Python, I highly recommend building small CLI tools like this.
They force you to think beyond code and start thinking like a product builder.