Amazon reviews are a goldmine of customer intelligence. For any popular product, thousands of customers have documented exactly what they love, hate, and wish were different. Businesses use scraped review data for sentiment analysis, product development, competitive intelligence, and marketing.
This guide covers how Amazon review scraping works, what data you can extract, and how to use it effectively.
What Data Can You Extract from Amazon Reviews?
Amazon review pages contain rich, structured data:
| Field | Description |
|---|---|
| Review title | Short headline written by reviewer |
| Review body | Full review text |
| Star rating | 1–5 stars |
| Reviewer name | Display name (public) |
| Review date | Date review was submitted |
| Verified purchase | Whether Amazon has marked the reviewer as a verified buyer of the product |
| Helpful votes | Number of users who found the review helpful |
| Total votes | Total votes on the review |
| Reviewer location | Country of reviewer (sometimes shown) |
| Vine review | Whether it's an Amazon Vine programme review |
| Review images | Customer-uploaded images with the review |
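One way to keep these fields consistent across a pipeline is to give each review a typed record. The sketch below is only an illustration: the class name `Review` and its attribute names are assumptions chosen to mirror the table, not a schema Amazon publishes.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class Review:
    """One scraped review. Field names are illustrative, not an Amazon schema."""
    asin: str
    title: Optional[str] = None
    body: Optional[str] = None
    rating: Optional[float] = None       # 1.0 to 5.0
    reviewer_name: Optional[str] = None
    date: Optional[str] = None           # kept as the raw page text
    verified: bool = False
    helpful_votes: int = 0
    location: Optional[str] = None       # only sometimes shown
    vine: bool = False
    image_urls: list[str] = field(default_factory=list)


# Fields that were not found on the page simply keep their defaults.
example = Review(asin='B09G3HRMVB', title='Works well', rating=4.0, verified=True)
record = asdict(example)  # plain dict, ready for json.dump
```

Converting to a plain dict with `asdict` keeps the record easy to write out as JSON or CSV later.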
Why Businesses Scrape Amazon Reviews
1. Competitive Product Intelligence
Read your competitors' 1-star and 2-star reviews. They are free customer research: they tell you exactly what the market wants that competitors aren't delivering.
2. Sentiment Analysis
With hundreds of reviews scraped, you can run NLP analysis to identify:
- Most common complaints (negative sentiment clusters)
- Most praised features (positive sentiment clusters)
- Feature gaps mentioned repeatedly
3. Review Monitoring for Your Own Products
Get alerted to new negative reviews faster than checking manually. A sudden spike in 1-star reviews often signals a product defect or fulfilment issue.
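A minimal sketch of such an alert, assuming each review has already been converted to a dict with a `datetime.date` under `'date'` and a numeric `'rating'`. The function name, the 7-day window, and the thresholds are hypothetical defaults to tune for your own volumes.

```python
from collections import Counter
from datetime import date, timedelta


def find_one_star_spikes(reviews, window=7, factor=3.0, min_count=5):
    """Flag days whose 1-star count is well above the trailing daily average."""
    daily = Counter(r['date'] for r in reviews if r['rating'] == 1)
    if not daily:
        return []
    start, end = min(daily), max(daily)
    # Build a continuous calendar so days with no 1-star reviews count as zero.
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    spikes = []
    for i, day in enumerate(days):
        if i < window:
            continue  # not enough history for a baseline yet
        baseline = sum(daily[d] for d in days[i - window:i]) / window
        if daily[day] >= min_count and daily[day] > factor * baseline:
            spikes.append(day)
    return spikes


# Ten quiet days with one 1-star review each, then a burst of eight.
sample = [{'date': date(2024, 1, d), 'rating': 1} for d in range(1, 11)]
sample += [{'date': date(2024, 1, 11), 'rating': 1} for _ in range(8)]
sample += [{'date': date(2024, 1, 11), 'rating': 5} for _ in range(3)]
spikes = find_one_star_spikes(sample)
```

The `min_count` guard stops low-volume listings from raising an alert every time a single 1-star review follows a quiet week.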
4. Marketing Copy
The language customers use in positive reviews is often your best source of marketing copy. It reflects how real buyers describe the benefits, so reuse it in your own listing and ad copy.
5. Fake Review Detection
Analyse review patterns to spot review manipulation by competitors: sudden bursts of 5-star reviews, unverified purchases, similar language patterns.
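Two of these signals are simple to compute once reviews are in memory. The sketch below assumes each review is a dict with a numeric `'rating'`, a boolean `'verified'`, and a text `'body'`. The function names and the 0.8 similarity cutoff are illustrative choices, and a high score is a prompt for manual inspection rather than proof of manipulation.

```python
import re
from itertools import combinations


def _tokens(text):
    """Lower-case word set used for the similarity comparison."""
    return set(re.findall(r'[a-z]+', text.lower()))


def jaccard(a, b):
    """Share of distinct words two texts have in common (0.0 to 1.0)."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def suspicious_signals(reviews, similarity_cutoff=0.8):
    """Summarise unverified share and near-duplicate wording among 5-star reviews."""
    five_star = [r for r in reviews if r['rating'] == 5]
    unverified = [r for r in five_star if not r['verified']]
    # Pairwise comparison is quadratic, so keep this to a few thousand reviews.
    lookalikes = [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(five_star), 2)
        if jaccard(a['body'], b['body']) >= similarity_cutoff
    ]
    return {
        'five_star_count': len(five_star),
        'unverified_share': len(unverified) / len(five_star) if five_star else 0.0,
        'lookalike_pairs': lookalikes,
    }


sample = [
    {'rating': 5, 'verified': False, 'body': 'Great product works perfectly love it'},
    {'rating': 5, 'verified': False, 'body': 'Great product, works perfectly, love it!'},
    {'rating': 5, 'verified': True, 'body': 'Battery lasts two days on a single charge'},
    {'rating': 1, 'verified': True, 'body': 'Broke after a week'},
]
signals = suspicious_signals(sample)
```

The indices in `lookalike_pairs` refer to positions within the 5-star subset, not the full review list.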
How Amazon Review Scraping Works
Amazon review pages are paginated. Each ASIN typically has:
- A star rating summary page (ratings breakdown by 1–5 stars)
- Paginated review pages (10 reviews per page)
- Filter options (by star rating, verified only, with images, etc.)
A complete scraper needs to:
- Identify the total review count
- Calculate the number of pages
- Iterate through all pages with delays
- Parse each review's fields
- Handle anti-bot detection (the review endpoint is heavily protected)
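The page calculation in the steps above is a ceiling division by the page size. A minimal sketch, assuming the 10-reviews-per-page figure holds and that you have already read the total review count from the summary page; the helper names are hypothetical.

```python
import math

REVIEWS_PER_PAGE = 10  # page size stated above; treat as an assumption


def page_count(total_reviews, per_page=REVIEWS_PER_PAGE):
    """Number of pages needed to cover every review."""
    return math.ceil(total_reviews / per_page)


def review_page_urls(asin, total_reviews):
    """List the review page URLs to visit for one ASIN."""
    base = f'https://www.amazon.com/product-reviews/{asin}'
    return [
        f'{base}?reviewerType=all_reviews&pageNumber={page}'
        for page in range(1, page_count(total_reviews) + 1)
    ]


urls = review_page_urls('B09G3HRMVB', total_reviews=25)
```

Building the URL list up front makes it easy to log progress and to resume a run from the page where it stopped.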
Python Example — Basic Review Scraper
```python
import requests
from bs4 import BeautifulSoup
import json
import time
import random

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/124.0.0.0 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.9',
}


def scrape_reviews_page(asin: str, page: int = 1) -> list[dict]:
    """
    Scrape a single page of reviews for a given ASIN.
    Returns a list of review dicts.
    """
    url = (
        f'https://www.amazon.com/product-reviews/{asin}'
        f'?reviewerType=all_reviews&pageNumber={page}'
    )
    response = requests.get(url, headers=HEADERS, timeout=15)
    if response.status_code != 200:
        return []

    soup = BeautifulSoup(response.content, 'lxml')
    reviews = []
    for review_el in soup.select('[data-hook="review"]'):
        # Extract fields
        title_el = review_el.select_one('[data-hook="review-title"]')
        body_el = review_el.select_one('[data-hook="review-body"]')
        rating_el = review_el.select_one('[data-hook="review-star-rating"]')
        date_el = review_el.select_one('[data-hook="review-date"]')
        verified_el = review_el.select_one('[data-hook="avp-badge"]')
        helpful_el = review_el.select_one('[data-hook="helpful-vote-statement"]')
        reviews.append({
            'asin': asin,
            'title': title_el.text.strip() if title_el else None,
            'body': body_el.text.strip() if body_el else None,
            'rating': rating_el.text.strip() if rating_el else None,
            'date': date_el.text.strip() if date_el else None,
            'verified': verified_el is not None,
            'helpful': helpful_el.text.strip() if helpful_el else '0',
        })
    return reviews


def scrape_all_reviews(asin: str, max_pages: int = 10) -> list[dict]:
    all_reviews = []
    for page in range(1, max_pages + 1):
        print(f'Scraping page {page} for ASIN {asin}...')
        page_reviews = scrape_reviews_page(asin, page)
        if not page_reviews:
            print(f'No reviews on page {page}, stopping.')
            break
        all_reviews.extend(page_reviews)
        time.sleep(random.uniform(3, 7))  # Be respectful
    return all_reviews


# Usage
asin = 'B09G3HRMVB'
reviews = scrape_all_reviews(asin, max_pages=5)
with open(f'{asin}_reviews.json', 'w', encoding='utf-8') as f:
    json.dump(reviews, f, indent=2, ensure_ascii=False)
print(f'Scraped {len(reviews)} reviews for {asin}')
```
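The scraper above stores the rating and helpful-vote fields as the raw text shown on the page. Before analysis you will probably want numbers. The helpers below are a sketch that assumes English-language strings such as `4.0 out of 5 stars` and `12 people found this helpful`; both the function names and those text formats are assumptions, and other marketplaces will need their own patterns.

```python
import re


def parse_rating(raw):
    """'4.0 out of 5 stars' -> 4.0; None when no number is found."""
    match = re.search(r'\d+(?:\.\d+)?', raw or '')
    return float(match.group()) if match else None


def parse_helpful(raw):
    """'12 people found this helpful' -> 12; 'One person ...' -> 1; else 0."""
    text = (raw or '').strip().lower()
    if text.startswith('one'):
        return 1
    match = re.search(r'\d[\d,]*', text)
    return int(match.group().replace(',', '')) if match else 0


rating = parse_rating('4.0 out of 5 stars')
votes = parse_helpful('1,204 people found this helpful')
```

Returning `None` for a missing rating, rather than 0, keeps unparsed reviews out of averages instead of silently dragging them down.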
Running Sentiment Analysis on Reviews
Once you have scraped reviews, a simple first step towards sentiment analysis is to count the words that appear most often in 1-star and 2-star reviews:
```python
from collections import Counter
import re


def find_common_complaints(reviews: list[dict], top_n: int = 20) -> list:
    """Find most-mentioned words in 1-2 star reviews."""
    negative = [r for r in reviews if r['rating']
                and r['rating'].startswith(('1', '2'))]
    # Combine all negative review text; a missing body is stored as None,
    # so fall back to an empty string before joining
    all_text = ' '.join([r.get('body') or '' for r in negative]).lower()
    # Remove stopwords (simplified)
    stopwords = {'the','a','an','is','it','in','and','or','to','this','that',
                 'was','for','of','with','my','i','but','not','very','so','be'}
    # Keep only words of four or more letters
    words = re.findall(r'\b[a-z]{4,}\b', all_text)
    meaningful = [w for w in words if w not in stopwords]
    return Counter(meaningful).most_common(top_n)


complaints = find_common_complaints(reviews)
print('Most common words in negative reviews:')
for word, count in complaints:
    print(f'  {word}: {count}')
```
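Single words lose context: "battery" alone does not say whether the battery is good or bad. Counting two-word phrases is a small step up. This is a sketch rather than a full NLP pipeline, and `common_phrases` is a hypothetical helper that takes a list of review body strings.

```python
import re
from collections import Counter


def common_phrases(texts, top_n=10):
    """Count adjacent word pairs (bigrams) across a list of review texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r'[a-z]+', text.lower())
        # Pair each word with the one that follows it in the same review.
        counts.update(zip(words, words[1:]))
    return [(' '.join(pair), n) for pair, n in counts.most_common(top_n)]


negative_bodies = [
    'Stopped working after two weeks',
    'It stopped working on day three',
    'The battery stopped working',
]
top = common_phrases(negative_bodies, top_n=1)
```

No stopword filtering is applied here, so on real data you may want to drop pairs made up entirely of filler words.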
Scale Considerations
| Volume | Recommended Approach |
|---|---|
| < 5,000 reviews | DIY Python scraper |
| 5,000 – 100,000 reviews | Python + proxy rotation |
| 100,000 – 1M reviews | Managed scraping service |
| 1M+ reviews | Enterprise managed service |
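For the proxy rotation tier, the simplest scheme is round-robin: each request goes out through the next proxy in a fixed list. The sketch below only plans the assignment and builds the mapping that `requests.get` accepts through its `proxies` argument. The proxy URLs and helper names are hypothetical placeholders.

```python
from itertools import cycle

# Hypothetical proxy endpoints; replace with proxies you control or rent.
PROXIES = [
    'http://proxy-1.example.com:8080',
    'http://proxy-2.example.com:8080',
]


def assign_proxies(urls, proxies):
    """Pair each URL with the next proxy in round-robin order."""
    return list(zip(urls, cycle(proxies)))


def proxy_settings(proxy_url):
    """Build the mapping that requests.get accepts via `proxies`."""
    return {'http': proxy_url, 'https': proxy_url}


plan = assign_proxies(['page-1', 'page-2', 'page-3'], PROXIES)
# Each planned request would then be sent as, for example:
#   requests.get(url, headers=HEADERS, proxies=proxy_settings(proxy), timeout=15)
```

Round-robin spreads load evenly but does nothing about proxies that are already blocked, so a production setup would also need to retire failing proxies.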
Important Notes on Review Data
- Only scrape public reviews — reviews visible without logging in are fair game
- Don't store personally identifiable data beyond reviewer display names (which are public)
- If you operate in the EU, document your legitimate interest under GDPR for processing review data
- Amazon heavily protects the review endpoint — expect higher block rates than product pages
Our Amazon Review Scraping Service
For large-scale review extraction, our Amazon review scraper delivers:
- All review fields (title, body, rating, date, verified, helpful votes)
- Bulk extraction across thousands of ASINs
- All star rating filters
- Vine review identification
- Review image URLs
- Clean JSON or CSV delivery
- All 12+ Amazon marketplaces
Get a free quote with a sample review dataset for your target ASINs.
Our team of senior data engineers and web scraping specialists has delivered over 500 million records across 12+ Amazon marketplaces. We write about scraping techniques, eCommerce data strategy, and Amazon market intelligence based on real-world project experience.