
How to Scrape Amazon Reviews: Complete Guide (2026)

Learn how to extract Amazon customer reviews at scale. Covers the data available, methods, Python code, legal considerations, and how to use review data for business intelligence.

Amazon Scraping Team · 5 min read

Amazon reviews are a goldmine of customer intelligence. For any product, thousands of customers have documented exactly what they love, hate, and wish was different. Businesses use scraped review data for sentiment analysis, product development, competitive intelligence, and marketing.

This guide covers how Amazon review scraping works, what data you can extract, and how to use it effectively.

What Data Can You Extract from Amazon Reviews?

Amazon review pages contain rich, structured data:

| Field | Description |
| --- | --- |
| Review title | Short headline written by reviewer |
| Review body | Full review text |
| Star rating | 1–5 stars |
| Reviewer name | Display name (public) |
| Review date | Date review was submitted |
| Verified purchase | Whether buyer actually purchased the product |
| Helpful votes | Number of users who found the review helpful |
| Total votes | Total votes on the review |
| Reviewer location | Country of reviewer (sometimes shown) |
| Vine review | Whether it's an Amazon Vine programme review |
| Review images | Customer-uploaded images with the review |
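
As a rough illustration, the fields above could be held in a typed record like the one below. The class and attribute names are our own choices for this sketch, not names Amazon uses.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class Review:
    """One scraped review. Attribute names are illustrative, not Amazon's."""
    asin: str
    title: Optional[str] = None
    body: Optional[str] = None
    rating: Optional[float] = None          # 1.0 to 5.0
    reviewer_name: Optional[str] = None     # public display name only
    review_date: Optional[str] = None       # ISO date string, e.g. '2025-03-14'
    verified_purchase: bool = False
    helpful_votes: int = 0
    total_votes: Optional[int] = None       # not always available
    reviewer_country: Optional[str] = None  # only sometimes shown
    vine: bool = False
    image_urls: list[str] = field(default_factory=list)


example = Review(asin='B09G3HRMVB', title='Works well', rating=4.0,
                 verified_purchase=True, helpful_votes=3)
record = asdict(example)  # plain dict, ready for json.dump
```

Optional fields default to None or an empty value, so a review that lacks a location or images still produces a complete record.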

Why Businesses Scrape Amazon Reviews

1. Competitive Product Intelligence

Read your competitors' 1-star and 2-star reviews. They are free customer research: they tell you exactly what the market wants that competitors aren't delivering.

2. Sentiment Analysis

With hundreds of reviews scraped, you can run NLP analysis to identify:

  • Most common complaints (negative sentiment clusters)
  • Most praised features (positive sentiment clusters)
  • Feature gaps mentioned repeatedly

3. Review Monitoring for Your Own Products

Get alerted to new negative reviews faster than checking manually. A sudden spike in 1-star reviews often signals a product defect or fulfilment issue.
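
A minimal sketch of spike detection, assuming each review has already been converted to a numeric `rating` and a `datetime.date` under `date` (the function name, window, and thresholds are illustrative choices, not tuned values). It compares each day's 1-star count with the average of the preceding days:

```python
from collections import Counter
from datetime import timedelta


def one_star_spike_days(reviews, window=7, factor=3.0, min_count=3):
    """Return the dates whose 1-star count far exceeds the recent average.

    Assumes review dicts with a numeric 'rating' and a datetime.date 'date'.
    """
    daily = Counter(r['date'] for r in reviews if r['rating'] == 1)
    if not daily:
        return []

    # Walk every calendar day in range so that quiet days count as zero.
    first, last = min(daily), max(daily)
    days = [first + timedelta(days=i) for i in range((last - first).days + 1)]

    spikes = []
    for i, day in enumerate(days):
        previous = [daily[d] for d in days[max(0, i - window):i]]
        # With no history (the first day), the baseline is zero.
        baseline = sum(previous) / len(previous) if previous else 0.0
        if daily[day] >= min_count and daily[day] > factor * baseline:
            spikes.append(day)
    return spikes
```

The `min_count` floor keeps a single bad review on a quiet product from triggering an alert.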

4. Marketing Copy

The language customers use in positive reviews is your best marketing copy. It reflects how real buyers describe the benefits — use it in your own listing and ad copy.

5. Fake Review Detection

Analyse review patterns to spot review manipulation by competitors: sudden bursts of 5-star reviews, unverified purchases, similar language patterns.
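
One of these signals, similar language, can be approximated with the standard library. The sketch below flags pairs of review bodies whose text is nearly identical. The 0.8 threshold is an arbitrary starting point, and a high score is only a hint, not proof of manipulation.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_review_pairs(reviews, threshold=0.8):
    """Return (i, j, ratio) for pairs of review bodies that are near-duplicates."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(reviews), 2):
        text_a = (a.get('body') or '').lower()
        text_b = (b.get('body') or '').lower()
        if not text_a or not text_b:
            continue  # skip reviews without body text
        ratio = SequenceMatcher(None, text_a, text_b).ratio()
        if ratio >= threshold:
            pairs.append((i, j, round(ratio, 2)))
    return pairs
```

Comparing every pair grows quadratically with the number of reviews, so this suits a few hundred reviews per product rather than a full catalogue.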

How Amazon Review Scraping Works

Amazon paginates its reviews. Each ASIN (Amazon Standard Identification Number, the product ID) typically has:

  • A star rating summary page (ratings breakdown by 1–5 stars)
  • Paginated review pages (10 reviews per page)
  • Filter options (by star rating, verified only, with images, etc.)
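
The pagination and filters above map onto URL query parameters. The sketch below builds such a URL. `reviewerType` and `pageNumber` are the parameters used in this guide's scraper, while `filterByStar`, `mediaType`, and the value `avp_only_reviews` are assumptions based on what the site's own filter links appear to use, and they may change without notice.

```python
from urllib.parse import urlencode


def build_reviews_url(asin, page=1, star=None, verified_only=False,
                      with_media=False, domain='www.amazon.com'):
    """Build a review-page URL for one ASIN with optional filters."""
    params = {
        # 'avp_only_reviews' is an assumed value for the verified-only filter
        'reviewerType': 'avp_only_reviews' if verified_only else 'all_reviews',
        'pageNumber': page,
    }
    if star is not None:
        params['filterByStar'] = star  # assumed values such as 'one_star'
    if with_media:
        params['mediaType'] = 'media_reviews_only'  # assumed value
    return f'https://{domain}/product-reviews/{asin}?{urlencode(params)}'


url = build_reviews_url('B09G3HRMVB', page=2, star='one_star')
```

Keeping URL construction in one function means a parameter change only has to be fixed in one place.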

A complete scraper needs to:

  1. Identify the total review count
  2. Calculate the number of pages
  3. Iterate through all pages with delays
  4. Parse each review's fields
  5. Handle anti-bot detection (the review endpoint is heavily protected)
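
Steps 2 and 5 can be sketched without any network access. The helpers below are illustrative: the function names are ours, and the status codes and marker strings used to recognise a block page are assumptions, not documented Amazon behaviour.

```python
import math
import random

REVIEWS_PER_PAGE = 10  # page size stated above


def pages_needed(total_reviews, per_page=REVIEWS_PER_PAGE, max_pages=None):
    """Number of review pages to request for a given review count."""
    pages = math.ceil(total_reviews / per_page)
    return min(pages, max_pages) if max_pages is not None else pages


def backoff_delay(attempt, base=5.0, cap=120.0):
    """Seconds to wait before retry number `attempt` (0-based).

    Exponential backoff with full jitter: a random wait between zero and
    base * 2**attempt, never more than `cap`.
    """
    return random.uniform(0, min(cap, base * 2 ** attempt))


def looks_blocked(status_code, html):
    """Heuristic block check. Status codes and markers are assumptions."""
    if status_code in (403, 429, 503):
        return True
    lowered = html.lower()
    return 'captcha' in lowered or 'robot check' in lowered
```

A page that returns HTTP 200 can still be a challenge page, which is why the check looks at the body as well as the status code.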

Python Example — Basic Review Scraper

import requests
from bs4 import BeautifulSoup
import json
import time
import random

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/124.0.0.0 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.9',
}

def scrape_reviews_page(asin: str, page: int = 1) -> list[dict]:
    """
    Scrape a single page of reviews for a given ASIN.
    Returns a list of review dicts.
    """
    url = (
        f'https://www.amazon.com/product-reviews/{asin}'
        f'?reviewerType=all_reviews&pageNumber={page}'
    )
    
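    response = requests.get(url, headers=HEADERS, timeout=15)
    if response.status_code != 200:
        # A non-200 status usually means the request was blocked or throttled,
        # not that the product has no reviews, so log it before giving up.
        print(f'HTTP {response.status_code} on page {page} for ASIN {asin}')
        return []
    
    # The 'lxml' parser needs the lxml package installed; the built-in
    # 'html.parser' is a drop-in alternative.
    soup = BeautifulSoup(response.content, 'lxml')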
    reviews = []
    
    for review_el in soup.select('[data-hook="review"]'):
        # Extract fields
        title_el    = review_el.select_one('[data-hook="review-title"]')
        body_el     = review_el.select_one('[data-hook="review-body"]')
        rating_el   = review_el.select_one('[data-hook="review-star-rating"]')
        date_el     = review_el.select_one('[data-hook="review-date"]')
        verified_el = review_el.select_one('[data-hook="avp-badge"]')
        helpful_el  = review_el.select_one('[data-hook="helpful-vote-statement"]')
        
        reviews.append({
            'asin':      asin,
            'title':     title_el.text.strip() if title_el else None,
            'body':      body_el.text.strip() if body_el else None,
            'rating':    rating_el.text.strip() if rating_el else None,
            'date':      date_el.text.strip() if date_el else None,
            'verified':  verified_el is not None,
            'helpful':   helpful_el.text.strip() if helpful_el else '0',
        })
    
    return reviews


def scrape_all_reviews(asin: str, max_pages: int = 10) -> list[dict]:
    all_reviews = []
    
    for page in range(1, max_pages + 1):
        print(f'Scraping page {page} for ASIN {asin}...')
        page_reviews = scrape_reviews_page(asin, page)
        
        if not page_reviews:
            print(f'No reviews on page {page}, stopping.')
            break
        
        all_reviews.extend(page_reviews)
        time.sleep(random.uniform(3, 7))  # Be respectful
    
    return all_reviews


# Usage
asin = 'B09G3HRMVB'
reviews = scrape_all_reviews(asin, max_pages=5)

with open(f'{asin}_reviews.json', 'w', encoding='utf-8') as f:
    json.dump(reviews, f, indent=2, ensure_ascii=False)

print(f'Scraped {len(reviews)} reviews for {asin}')
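
The scraper above stores rating, helpful votes, and date as raw display text. A possible clean-up step is sketched below. It assumes the English amazon.com formats "4.0 out of 5 stars", "12 people found this helpful" (or "One person found this helpful"), and "Reviewed in the United States on March 3, 2024". Other marketplaces and languages will need different patterns. The function and key names are our own.

```python
import re
from datetime import datetime


def parse_rating(text):
    """'4.0 out of 5 stars' -> 4.0, or None if no number is found."""
    match = re.match(r'\s*(\d+(?:\.\d+)?)', text or '')
    return float(match.group(1)) if match else None


def parse_helpful(text):
    """'1,234 people found this helpful' -> 1234; 'One person ...' -> 1."""
    text = (text or '').strip().lower()
    if text.startswith('one '):
        return 1
    match = re.match(r'(\d[\d,]*)', text)
    return int(match.group(1).replace(',', '')) if match else 0


def parse_date(text):
    """Return (country, date) from the assumed 'Reviewed in ... on ...' text."""
    match = re.search(r'Reviewed in (?:the )?(.+?) on (.+)$', (text or '').strip())
    if not match:
        return None, None
    country, raw_date = match.groups()
    try:
        return country, datetime.strptime(raw_date, '%B %d, %Y').date()
    except ValueError:
        return country, None


def normalise_review(raw):
    """Add typed fields next to the raw strings, leaving the originals intact."""
    country, reviewed_on = parse_date(raw.get('date'))
    return {
        **raw,
        'rating_value': parse_rating(raw.get('rating')),
        'helpful_count': parse_helpful(raw.get('helpful')),
        'country': country,
        'date_iso': reviewed_on.isoformat() if reviewed_on else None,
    }
```

The typed values go under new keys, so code that expects the raw strings keeps working.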

Running Sentiment Analysis on Reviews

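Once you have scraped reviews, a simple first pass is to count the words that appear most often in 1-star and 2-star reviews. It is not full sentiment analysis, but it surfaces recurring complaints quickly: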

from collections import Counter
import re

def find_common_complaints(reviews: list[dict], top_n: int = 20) -> list:
    """Find most-mentioned words in 1-2 star reviews."""
    negative = [r for r in reviews if r['rating'] 
                and r['rating'].startswith(('1', '2'))]
    
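    # Combine all negative review text; 'body' is None when the element was
    # missing, so fall back to an empty string to keep join() from failing
    all_text = ' '.join(r.get('body') or '' for r in negative).lower()
    
    # Remove stopwords (simplified). The regex keeps only words of four or
    # more letters, so the shorter stopwords are filtered out before this set
    stopwords = {'the','a','an','is','it','in','and','or','to','this','that',
                 'was','for','of','with','my','i','but','not','very','so','be'}
    words = re.findall(r'\b[a-z]{4,}\b', all_text)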
    meaningful = [w for w in words if w not in stopwords]
    
    return Counter(meaningful).most_common(top_n)

complaints = find_common_complaints(reviews)
print('Most common words in negative reviews:')
for word, count in complaints:
    print(f'  {word}: {count}')
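
Single words lose context ("working" could come from "stopped working" or "working great"). As a possible next step, the sketch below counts two-word phrases in low-rated reviews. It assumes the raw rating strings produced by the scraper above, such as "1.0 out of 5 stars". The function name and the small stopword list are illustrative.

```python
import re
from collections import Counter

PHRASE_STOPWORDS = {'the', 'a', 'an', 'is', 'it', 'in', 'and', 'or', 'to',
                    'this', 'that', 'was', 'for', 'of', 'with', 'my', 'i',
                    'but', 'not', 'very', 'so', 'be', 'after', 'on', 'at'}


def common_phrases(reviews, max_stars=2, top_n=10):
    """Most frequent two-word phrases in reviews rated max_stars or lower."""
    phrases = Counter()
    for r in reviews:
        rating = r.get('rating') or ''
        # The raw rating text is assumed to start with the star digit.
        if not rating[:1].isdigit() or int(rating[0]) > max_stars:
            continue
        words = re.findall(r"[a-z']+", (r.get('body') or '').lower())
        # Pair each word with its neighbour, within one review only.
        for first, second in zip(words, words[1:]):
            if first not in PHRASE_STOPWORDS and second not in PHRASE_STOPWORDS:
                phrases[f'{first} {second}'] += 1
    return phrases.most_common(top_n)
```

Phrases are built per review, so the last word of one review is never paired with the first word of the next.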

Scale Considerations

| Volume | Recommended Approach |
| --- | --- |
| < 5,000 reviews | DIY Python scraper |
| 5,000 – 100,000 reviews | Python + proxy rotation |
| 100,000 – 1M reviews | Managed scraping service |
| 1M+ reviews | Enterprise managed service |
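
For the middle tier, proxy rotation can be as simple as cycling through a list of endpoints. The sketch below uses only the standard library so it runs without extra packages. The proxy URLs are hypothetical placeholders, and the function names are our own.

```python
import urllib.request
from itertools import cycle

# Hypothetical proxy endpoints (example.com placeholders, not real servers).
PROXY_URLS = [
    'http://proxy-1.example.com:8000',
    'http://proxy-2.example.com:8000',
]
_proxy_cycle = cycle(PROXY_URLS)


def next_proxy_config():
    """Return the next proxy as a scheme-to-URL mapping, round-robin."""
    proxy = next(_proxy_cycle)
    return {'http': proxy, 'https': proxy}


def fetch_via_proxy(url, headers=None, timeout=15):
    """Fetch url through the next proxy in the rotation; return the body bytes."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler(next_proxy_config())
    )
    request = urllib.request.Request(url, headers=headers or {})
    with opener.open(request, timeout=timeout) as response:
        return response.read()
```

With the `requests` library used earlier in this guide, the same mapping can be passed as the `proxies` argument of `requests.get`.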

Important Notes on Review Data

  1. Only scrape public reviews, meaning reviews that are visible without logging in
  2. Don't store personally identifiable data beyond reviewer display names (which are public)
  3. If operating in the EU, document your legitimate interest under GDPR for processing review data
  4. Amazon heavily protects the review endpoint, so expect higher block rates than on product pages
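
One way to enforce note 2 in code is to keep an explicit allow-list of fields and drop everything else before storage. This is a minimal sketch: the field names follow the scraper in this guide, and `reviewer_name` and `profile_url` are hypothetical names for data a fuller scraper might collect.

```python
# Fields that may be stored; anything not listed here is discarded.
ALLOWED_FIELDS = {'asin', 'title', 'body', 'rating', 'date',
                  'verified', 'helpful', 'reviewer_name'}


def strip_unlisted_fields(review):
    """Return a copy of the review containing only allow-listed fields."""
    return {key: value for key, value in review.items()
            if key in ALLOWED_FIELDS}


stored = strip_unlisted_fields({
    'asin': 'B09G3HRMVB',
    'body': 'Does the job.',
    'profile_url': 'https://example.com/profile/123',  # hypothetical field
})
```

An allow-list fails safe: a field added to the scraper later is not stored until someone decides it should be.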

Our Amazon Review Scraping Service

For large-scale review extraction, our Amazon review scraper delivers:

  • All review fields (title, body, rating, date, verified, helpful votes)
  • Bulk extraction across thousands of ASINs
  • All star rating filters
  • Vine review identification
  • Review image URLs
  • Clean JSON or CSV delivery
  • All 12+ Amazon marketplaces

Get a free quote with a sample review dataset for your target ASINs.

Amazon Scraping Team
Data Extraction Specialists · 10+ Years Experience

Our team of senior data engineers and web scraping specialists has delivered over 500 million records across 12+ Amazon marketplaces. We write about scraping techniques, eCommerce data strategy, and Amazon market intelligence based on real-world project experience.