How to Scrape Goodreads Data in 2026: Complete Guide


Why Scrape Goodreads?

Goodreads is the world's largest book community — 150+ million members, 3.5 billion books catalogued, and millions of reviews. Whether you're building a book recommendation engine, analyzing reading trends, tracking author performance, or researching the publishing market, Goodreads data is invaluable.

But Goodreads has no public API (they shut it down in December 2020). That means scraping is the only way to access structured book data at scale.

In this guide, I'll show you how to scrape Goodreads books, reviews, and author data using Python — including a ready-to-use solution that handles anti-bot detection, pagination, and data formatting.

What Data Can You Extract from Goodreads?

Here's what's available:

  • Book details: title, author, ISBN/ISBN-13, publisher, publication date, page count, edition, format
  • Ratings & reviews: average rating, total ratings, total reviews, star distribution, individual review text
  • Author info: name, bio, follower count, book count
  • Genres & shelves: genre tags, popular shelves, reading lists
  • Search results: keyword search, genre browsing, bestseller lists
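To keep your pipeline honest about which of these fields you actually rely on, it can help to pin the record shape down in code. Here's a minimal sketch as a Python TypedDict — the field names are assumptions for illustration and may differ from the scraper's actual output keys:

```python
from typing import List, TypedDict

class BookRecord(TypedDict, total=False):
    """Sketch of a scraped Goodreads book record (field names assumed)."""
    title: str
    author: str
    isbn: str
    isbn13: str
    publisher: str
    publishDate: str      # ISO date string, e.g. "2021-05-04"
    pages: int
    rating: float         # average rating, 0-5
    ratingsCount: int
    reviewsCount: int
    genres: List[str]
    description: str
    url: str

# total=False means every field is optional, which matches scraping
# reality: any given page may be missing publisher, ISBN, etc.
book: BookRecord = {"title": "Dune", "rating": 4.27}
```

Using `total=False` is deliberate: scraped records are rarely complete, so downstream code should treat every field as optional.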

Method 1: DIY with Python + BeautifulSoup

You can scrape Goodreads yourself with requests and BeautifulSoup:

import requests
from bs4 import BeautifulSoup
import json

def scrape_goodreads_book(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    }
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # These class names change whenever Goodreads ships a redesign,
    # so guard against missing elements instead of crashing.
    def text_of(selector):
        node = soup.select_one(selector)
        return node.text.strip() if node else None

    title = text_of("h1.Text__title1")
    author = text_of("span.ContributorLink__name")
    rating = text_of("div.RatingStatistics__rating")

    return {
        "title": title,
        "author": author,
        "rating": float(rating) if rating else None,
        "url": url
    }

# Example usage
book = scrape_goodreads_book("https://www.goodreads.com/book/show/5907.The_Hobbit")
print(json.dumps(book, indent=2))

The problem: Goodreads uses heavy JavaScript rendering, dynamic class names, and aggressive rate limiting. Your DIY scraper will break within weeks as selectors change, and you'll get blocked after a few hundred requests.

Method 2: Using the Apify Goodreads Scraper (Recommended)

A more reliable approach is using a managed scraper that handles anti-bot detection, retries, and proxy rotation automatically.

The Goodreads Scraper on Apify extracts structured book data with zero configuration:

from apify_client import ApifyClient

# Initialize the Apify client
client = ApifyClient("YOUR_APIFY_TOKEN")

# Configure the scraper
run_input = {
    "searchTerms": ["science fiction 2026"],
    "maxResults": 50,
    "includeReviews": True
}

# Run the actor
run = client.actor("cryptosignals/goodreads-scraper").call(run_input=run_input)

# Fetch results
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['title']} by {item['author']}: {item['rating']}/5 ({item['ratingsCount']} ratings)")

Install the client first:

pip install apify-client

Sample Output

Here's what the structured output looks like:

{
  "title": "Project Hail Mary",
  "author": "Andy Weir",
  "rating": 4.52,
  "ratingsCount": 1245678,
  "reviewsCount": 89432,
  "isbn": "0593135202",
  "isbn13": "9780593135204",
  "pages": 496,
  "publisher": "Ballantine Books",
  "publishDate": "2021-05-04",
  "genres": ["Science Fiction", "Fiction", "Audiobook", "Space"],
  "description": "Ryland Grace is the sole survivor on a desperate...",
  "url": "https://www.goodreads.com/book/show/54493401"
}

Clean, structured JSON with every field you need — no parsing HTML, no broken selectors, no proxy management.

Use Cases for Goodreads Data

1. Book Recommendation Engines

Scrape ratings, genres, and review sentiment to build collaborative filtering models. Combine with user shelf data to find "readers who liked X also liked Y" patterns.
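The "readers who liked X also liked Y" idea boils down to co-occurrence counting: books that appear together on many readers' shelves are likely related. Here's a minimal sketch — the shelf data below is made up for illustration, and in practice it would come from scraped shelf or review data:

```python
from collections import Counter
from itertools import combinations

# Each inner list represents one reader's shelf (made-up sample data).
shelves = [
    ["The Hobbit", "Dune", "Project Hail Mary"],
    ["Dune", "Project Hail Mary", "Foundation"],
    ["The Hobbit", "Foundation"],
]

# Count how often each pair of books is shelved together.
pair_counts = Counter()
for shelf in shelves:
    for a, b in combinations(sorted(set(shelf)), 2):
        pair_counts[(a, b)] += 1

def also_liked(title, n=3):
    """Rank books most often shelved together with `title`."""
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a == title:
            scores[b] += count
        elif b == title:
            scores[a] += count
    return [t for t, _ in scores.most_common(n)]

print(also_liked("Dune"))
```

Real systems normalize these counts (e.g. cosine similarity over a reader-book matrix) so popular books don't dominate, but raw co-occurrence is a reasonable first baseline.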

2. Publishing Market Research

Track which genres are trending, which debut authors are gaining traction, and what publication formats (hardcover vs. ebook vs. audio) are growing. Invaluable for publishers and literary agents.

3. Author Analytics

Monitor an author's rating trajectory over time, track review sentiment, compare performance across titles. Useful for marketing teams and self-published authors.

4. Academic Research

Study reading trends, cultural preferences across regions, or the impact of book-to-film adaptations on ratings. Goodreads data has been used in hundreds of published papers.

5. Competitive Intelligence for Booksellers

Track competitor titles' performance, identify underserved niches, and optimize inventory based on real reader demand rather than publisher push.

Cost Comparison: Goodreads Data Sources

| Method | Cost | Reliability | Speed |
| --- | --- | --- | --- |
| DIY scraper | Free (your time) | Low — breaks often | Slow — rate limited |
| Goodreads API | Dead (shut down 2020) | N/A | N/A |
| Apify Goodreads Scraper | $0.01/result, first 100 free | High — maintained | Fast — parallel |
| Data brokers | $200-500/dataset | Medium | One-time dump |
| Manual collection | Free | High | Extremely slow |

At $0.01 per result with the first 100 free, scraping 1,000 books costs $9 (900 paid results). That's roughly the price of a single paperback.
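The pricing is simple enough to sanity-check in a couple of lines (per-result price and free allowance here are taken from the table above):

```python
def run_cost(results, price_per_result=0.01, free_results=100):
    """Cost of one scraper run: only results beyond the free tier are billed."""
    return max(0, results - free_results) * price_per_result

print(run_cost(1000))  # 900 paid results -> 9.0
print(run_cost(50))    # fully within the free tier -> 0.0
```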

Advanced: Scraping Goodreads Reviews at Scale

Reviews are the most valuable Goodreads data for NLP and sentiment analysis. Here's how to extract them:

from apify_client import ApifyClient
import pandas as pd

client = ApifyClient("YOUR_APIFY_TOKEN")

# Scrape reviews for a specific book
run_input = {
    "bookUrls": ["https://www.goodreads.com/book/show/5907.The_Hobbit"],
    "includeReviews": True,
    "maxReviews": 500
}

run = client.actor("cryptosignals/goodreads-scraper").call(run_input=run_input)

# Load into pandas for analysis
results = list(client.dataset(run["defaultDatasetId"]).iterate_items())
df = pd.DataFrame(results)

# Basic rating breakdown
print(f"Average rating: {df['rating'].mean():.2f}")
print(f"5-star reviews: {len(df[df['rating']==5])}")
print(f"1-star reviews: {len(df[df['rating']==1])}")
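If you'd rather not pull in pandas, the star distribution can be computed with the standard library alone. The review records below are made up for illustration; real ones would come from a scraper run:

```python
from collections import Counter

# Made-up sample review records (real data would come from the scraper).
reviews = [
    {"rating": 5, "text": "Loved it"},
    {"rating": 5, "text": "A masterpiece"},
    {"rating": 3, "text": "Fine"},
    {"rating": 1, "text": "Not for me"},
]

# Count reviews per star level, then print shares from 5 stars down to 1.
dist = Counter(r["rating"] for r in reviews)
total = len(reviews)
for stars in range(5, 0, -1):
    count = dist.get(stars, 0)
    print(f"{stars} stars: {count} ({count / total:.0%})")
```

This distribution is the usual starting point before heavier NLP: a book with a U-shaped (polarized) distribution reads very differently from one with the same average but a tight peak.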

Tips for Scraping Goodreads Effectively

  1. Start with search terms, not URLs. The scraper can find books by keyword, which is faster than collecting individual book URLs.

  2. Use the free tier to test. Every run includes 100 free results — enough to validate your data pipeline before committing.

  3. Export to CSV for spreadsheets. Apify lets you download results as CSV, JSON, or Excel directly from the dashboard.

  4. Schedule recurring scrapes. Set up daily or weekly runs to track how ratings and review counts change over time.

  5. Respect the platform. Don't scrape faster than necessary. The managed scraper handles rate limiting automatically.
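For tip 3, you can also export locally instead of downloading from the dashboard. Here's a sketch using the standard-library `csv` module — the sample record is made up, and in practice `items` would be the list returned by `iterate_items()`:

```python
import csv

def save_books_csv(items, path):
    """Write a list of dicts to CSV, using the first item's keys as columns.

    extrasaction="ignore" silently drops fields that later items have
    but the first item lacks, which is common with scraped records.
    """
    if not items:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=list(items[0].keys()), extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(items)

# Made-up sample record for illustration.
save_books_csv(
    [{"title": "The Hobbit", "author": "J.R.R. Tolkien", "rating": 4.29}],
    "books.csv",
)
```

Keying the columns off the first item is a simplification; for production exports you'd want the union of keys across all items.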

Getting Started

  1. Create a free Apify account
  2. Go to the Goodreads Scraper
  3. Enter your search terms or book URLs
  4. Click Start and get structured data in minutes

No credit card needed for the free tier. First 100 results per run are always free.


Built by CryptoSignals on Apify. Have questions or feature requests? Open an issue on the actor page.

