The build-vs-buy memo
Don't Scrape Zillow.
Use the API.
Building your own Zillow scraper means residential proxies, captcha solvers, brittle HTML selectors, and a ban-and-rebuild cycle. We've already done that work. Hit one REST endpoint instead.
● 50 free calls / month · no card required
The cost of building it yourself
DIY scraper vs $29/mo API
The all-in monthly cost of running your own Zillow scraper at moderate volume (~5,000 pulls/mo).
| Line item | Build it yourself | realestateinvestingapi.com |
|---|---|---|
| Residential proxy budget | $200–$800 / mo | — |
| Captcha solver service | $50–$300 / mo | — |
| Developer time (build + maintain) | 40–80h / mo @ $75/hr = $3,000–$6,000 | ~2h integration, one-time |
| Cloud + headless browser instances | $80–$250 / mo | — |
| Ban risk | Days of downtime per quarter | Our problem, not yours |
| Schema breakage | ~Every 6 weeks, full rewrite | Versioned OpenAPI 3.1 spec |
| Monthly total | $3,330 – $7,350+ | $29 |
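The monthly total can be checked by summing the four dollar-denominated rows (ban risk and schema breakage are not priced). A minimal sketch; the dictionary keys and function name are illustrative, and the figures are the table's own estimates:

```python
# Monthly DIY line items as (low, high) in USD, copied from the table above.
# Developer time is hours per month multiplied by the $75/hr rate.
DIY_COSTS = {
    "residential_proxies": (200, 800),
    "captcha_solver": (50, 300),
    "developer_time": (40 * 75, 80 * 75),
    "cloud_and_headless_browsers": (80, 250),
}


def monthly_total(costs):
    """Return the (low, high) sum across all line items."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


low, high = monthly_total(DIY_COSTS)
print(f"DIY monthly total: ${low:,} to ${high:,}")
```

Developer time dominates both ends of the range, so the estimate is most sensitive to the hours-per-month assumption.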
What you'd have to build
The 5-step Zillow scraper from scratch
Each step is its own project. You'd own all five forever.
- 01 Build a proxy rotator
  Buy residential IPs from Bright Data / Oxylabs. Rotate per request, blacklist banned IPs, retry on 403s.
- 02 Bypass press-and-hold captcha
  Zillow uses PerimeterX. Plug in a captcha-solver API (~$2 per 1,000 solves) and detect challenges before they tank your pipeline.
- 03 Drive a headless browser
  Playwright with stealth plugins. Random UA + viewport + mouse-jitter. Block analytics network calls to look human.
- 04 Maintain HTML selectors
  data-testid attributes change ~every 6 weeks. Build a monitoring layer that detects schema drift before it hits prod.
- 05 Scale it horizontally
  1k pulls/day is one box. 100k/day is a Kubernetes cluster with proxy pools, queue, dead-letter, alerting, oncall rotation.
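To give a sense of what step 01 alone involves, here is a rough sketch of the rotate, blacklist, and retry-on-403 logic. The names `ProxyPool` and `fetch_with_rotation` are hypothetical, and `fetch` is an injected callable assumed to return a `(status_code, body)` pair, so the sketch runs without any network access:

```python
import itertools


class ProxyPool:
    """Round-robin pool that skips proxies marked as banned."""

    def __init__(self, proxies):
        self._proxies = list(proxies)
        self._banned = set()
        self._cycle = itertools.cycle(self._proxies)

    def next(self):
        # Try each proxy at most once per call before giving up.
        for _ in range(len(self._proxies)):
            proxy = next(self._cycle)
            if proxy not in self._banned:
                return proxy
        raise RuntimeError("all proxies are banned")

    def ban(self, proxy):
        self._banned.add(proxy)


def fetch_with_rotation(url, pool, fetch, max_attempts=3):
    """Call fetch(url, proxy); on a 403, ban that proxy and retry with the next."""
    for _ in range(max_attempts):
        proxy = pool.next()
        status, body = fetch(url, proxy)
        if status == 403:
            pool.ban(proxy)
            continue
        return body
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

A production version would also need ban expiry, per-proxy health scoring, and backoff, which is where the ongoing maintenance cost comes from.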
Same job, two stacks
200 lines of brittle Python — or one cURL call.
Build it yourself
import asyncio
import random

from bs4 import BeautifulSoup
from playwright.async_api import async_playwright

PROXIES = [
    "http://user:pass@45.12.55.10:10001",
    "http://user:pass@45.12.55.11:10001",
    # … hundreds more residential IPs
]

UA_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64) AppleWebKit/537.36 …",
    # … rotate per request
]


async def scrape(zpid):
    proxy = random.choice(PROXIES)
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            proxy={"server": proxy},
            args=["--disable-blink-features=AutomationControlled"],
        )
        ctx = await browser.new_context(user_agent=random.choice(UA_POOL))
        page = await ctx.new_page()
        await page.goto(f"https://www.zillow.com/homedetails/{zpid}_zpid/")

        # check for press-and-hold captcha
        if await page.query_selector("#px-captcha"):
            await solve_captcha(page)  # ← external service, $$/solve (not defined here)
            await page.reload()

        await page.wait_for_selector("[data-testid='price']", timeout=15000)
        html = await page.content()
        await browser.close()

    soup = BeautifulSoup(html, "html.parser")
    # selectors break ~every 6 weeks
    price = soup.select_one("[data-testid='price']").text
    beds = soup.select_one("[data-testid='bed-bath-item']").text
    # … 40 more selectors
    return {"price": price, "beds": beds}  # … plus the other parsed fields

# this works for ~3 weeks before bans escalate, then rebuild

Use the API
curl -X POST https://api.realestateinvestingapi.com/v1/zillow \
  -H "Authorization: Bearer reia_live_••••••••" \
  -d '{"action":"propertyDetails","params":{"zpid":"29453621"}}'

On the legal question
A short, plain-factual note
We operate as a search aggregator. We respect robots.txt, we don't bypass authentication, and we rate-limit our upstream calls — the same posture a search engine takes when indexing public pages.
Public-data scraping has been litigated in US courts, most notably in hiQ Labs v. LinkedIn. The 9th Circuit held that scraping publicly accessible web data likely does not violate the Computer Fraud and Abuse Act (CFAA). That ruling doesn't override a platform's Terms of Use, which create contractual (not criminal) obligations.
We're not your lawyer. We're not anybody's lawyer. You should consult your own counsel about your specific use case — particularly if you're republishing data, building a directly competitive product, or operating in a regulated industry.
Pricing
$29/mo replaces a $7k/mo scraper stack
Free
Kick the tires. No card required.
$0/mo
50 calls included · hard cap
- 50 API calls / month
- All 30 endpoints
- Hard cap — no overages
- Community support
Starter
Solo wholesalers and side projects.
$29/mo
1,000 calls included · then $0.010/call
- 1,000 API calls / month
- All 30 endpoints
- $0.01 per call after
- Email support
- Most popular
Growth
Internal tools, dashboards, lead engines.
$99/mo
10,000 calls included · then $0.005/call
- 10,000 API calls / month
- All 30 endpoints
- $0.005 per call after
- Priority email support
- Webhook delivery
Scale
Funded prop-tech and high-volume teams.
$299/mo
50,000 calls included · then $0.003/call
- 50,000 API calls / month
- All 30 endpoints
- $0.003 per call after
- 99.9% uptime SLA
- Slack-shared support channel
All plans · 99.9% uptime SLA · OpenAPI 3.1 spec · scrape.do failover · US-based servers
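To estimate a bill at a given volume, the tiers above reduce to a base price plus per-call overage. A rough sketch, assuming overage is billed per call beyond the included quota and that the Free plan simply stops at its hard cap; `PLANS`, `monthly_cost`, and `cheapest_plan` are illustrative names, not part of the API:

```python
from decimal import Decimal

# (base price in USD, included calls, overage per call or None for a hard cap),
# copied from the pricing tiers above.
PLANS = {
    "Free": (Decimal("0"), 50, None),
    "Starter": (Decimal("29"), 1_000, Decimal("0.010")),
    "Growth": (Decimal("99"), 10_000, Decimal("0.005")),
    "Scale": (Decimal("299"), 50_000, Decimal("0.003")),
}


def monthly_cost(plan, calls):
    """Return the monthly bill for a plan, or None if the volume exceeds a hard cap."""
    base, included, overage = PLANS[plan]
    extra = max(0, calls - included)
    if extra == 0:
        return base
    if overage is None:
        return None  # hard cap: this volume is not available on the plan
    return base + extra * overage


def cheapest_plan(calls):
    """Return (plan name, cost) for the lowest bill at the given call volume."""
    costs = {name: monthly_cost(name, calls) for name in PLANS}
    available = {name: cost for name, cost in costs.items() if cost is not None}
    return min(available.items(), key=lambda item: item[1])


for volume in (50, 5_000, 20_000, 60_000):
    name, cost = cheapest_plan(volume)
    print(f"{volume:>6,} calls/mo: {name} at ${cost:.2f}")
```

Under these assumptions Starter and Growth cost the same at 8,000 calls per month, and Scale becomes the cheapest option a little above 50,000.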
FAQ
Build-vs-buy questions
priceHistory returns the full list/sale timeline. zestimateHistory returns 60 months of Zestimate values. taxHistory covers annual assessment + tax bill records.
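A sketch of how those three history calls might be prepared, assuming they are `action` values that take the same `{"action": ..., "params": {"zpid": ...}}` envelope as the `propertyDetails` cURL example. The helper names and the `Content-Type` header are assumptions; check the OpenAPI spec for the actual parameters:

```python
import json
import urllib.request

API_URL = "https://api.realestateinvestingapi.com/v1/zillow"

# Assumed to be valid action names, based on the FAQ answer above.
HISTORY_ACTIONS = ("priceHistory", "zestimateHistory", "taxHistory")


def build_request(api_key, action, zpid):
    """Build (but do not send) a POST request for one action."""
    payload = {"action": action, "params": {"zpid": zpid}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption: JSON body expected
        },
        method="POST",
    )


def history_requests(api_key, zpid):
    """Return one prepared request per history action, keyed by action name."""
    return {action: build_request(api_key, action, zpid) for action in HISTORY_ACTIONS}
```

Each prepared request can then be sent with `urllib.request.urlopen(request)`; note that this would count as three API calls against the monthly quota, one per action.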