Download match dataset
Fetch recent match IDs, fetch each match JSON, and make reruns resume-friendly.
Run after each small change. Tiny loops win.
uv run python -m src.scout
src/scout/ (match-v5 fetch)
data/raw/matches/
60–120 minutes
Do this (suggested order)
- Load your puuid (from Step 02’s saved JSON).
- Fetch recent match IDs (match-v5) and save data/raw/match_ids_<puuid>.json.
- Loop through IDs and download match details into data/raw/matches/<matchId>.json.
- Make the run resumable: if the file exists, skip it.
- Print counts: ids fetched, files found, new downloaded.
You’ll practice
- Iterate through IDs safely
- Resume behavior: skip files you already have
- Inspect nested JSON without getting lost
Explainers (for context, not homework)
- Routing values (platform vs regional) — Match-v5 wants regional hosts
- HTTP + JSON in 12 minutes — What to print when it fails
- Caching sanity — Why reruns should be calm
- Tracebacks (reading errors) — Fix the real line, not the vibes
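The routing explainer comes down to a lookup: match-v5 is served from regional hosts, so a platform code such as na1 has to be mapped to its region before you build a URL. Here is a minimal sketch, assuming the commonly documented groupings. The names PLATFORM_TO_REGION and regional_host are illustrative, not part of this project, and the table is deliberately incomplete, so check it against Riot's routing documentation.

```python
# Assumed platform -> regional routing groups (partial list; verify against Riot's docs).
PLATFORM_TO_REGION = {
    "na1": "americas", "br1": "americas", "la1": "americas", "la2": "americas",
    "kr": "asia", "jp1": "asia",
    "euw1": "europe", "eun1": "europe", "tr1": "europe", "ru": "europe",
}


def regional_host(platform: str) -> str:
    """Return the regional base URL that match-v5 calls should use."""
    region = PLATFORM_TO_REGION[platform.lower()]  # KeyError = unknown platform
    return f"https://{region}.api.riotgames.com"


print(regional_host("euw1"))
```

An unknown platform raises KeyError on purpose: a loud failure is easier to debug than a silent [] from the wrong host.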
Build
- Fetch IDs for PUUID
- Save to data/raw/match_ids_<puuid>.json
- For each ID, fetch match JSON
- Save to data/raw/matches/<matchId>.json
- Add a limit (e.g., most recent 20 or 50)
- If match file exists, skip download
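The first two Build items (fetch IDs, save them) might look roughly like this. It is a sketch using only the standard library. The helper names match_ids_url, fetch_json, and save_match_ids are made up for illustration, and the endpoint path, the start/count parameters, and the X-Riot-Token header are assumptions based on the public match-v5 API that you should confirm in the developer portal.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path


def match_ids_url(base_url: str, puuid: str, count: int = 20) -> str:
    """Build the 'match IDs by PUUID' URL with a hard limit on results."""
    params = urllib.parse.urlencode({"start": 0, "count": count})
    return f"{base_url}/lol/match/v5/matches/by-puuid/{puuid}/ids?{params}"


def fetch_json(url: str, api_key: str):
    """GET a URL with the API key header and decode the JSON body."""
    req = urllib.request.Request(url, headers={"X-Riot-Token": api_key})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def save_match_ids(ids: list[str], puuid: str, raw_dir: Path = Path("data/raw")) -> Path:
    """Write the ID list to <raw_dir>/match_ids_<puuid>.json and return the path."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    out = raw_dir / f"match_ids_{puuid}.json"
    out.write_text(json.dumps(ids, indent=2), encoding="utf-8")
    return out
```

Keeping the URL builder separate from the network call means you can print the exact URL before any request is sent.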
Check yourself
- Print number of match IDs fetched
- Print how many match files exist
- Print how many new matches downloaded today
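One possible way to produce those three numbers is to read them back from disk rather than trusting in-memory variables. This is a sketch: count_report is a hypothetical helper, and the new-download count is assumed to be a counter your own loop maintains.

```python
import json
from pathlib import Path


def count_report(ids_path: Path, matches_dir: Path, new_downloaded: int) -> str:
    """Summarise IDs fetched, match files on disk, and new downloads this run."""
    ids = json.loads(ids_path.read_text(encoding="utf-8"))
    files_found = len(list(matches_dir.glob("*.json")))
    return (
        f"ids fetched={len(ids)}, "
        f"files found={files_found}, "
        f"new downloaded={new_downloaded}"
    )
```

Typical use at the end of a run would be print(count_report(ids_path, Path("data/raw/matches"), new)).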
If it breaks
- Using platform routing instead of regional for match-v5
- Getting [] because of wrong routing/filters
- Crashing on one bad match instead of skipping
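For the last failure mode, the usual fix is to wrap each download in its own try/except so one bad match is recorded and the loop moves on. Here is a minimal sketch: download_all and the download_one callback are hypothetical names, and catching Exception broadly is a deliberate simplification you may want to narrow.

```python
from typing import Callable, Iterable


def download_all(match_ids: Iterable[str], download_one: Callable[[str], None]) -> list[str]:
    """Try every ID; record failures instead of letting one abort the run."""
    failed = []
    for match_id in match_ids:
        try:
            download_one(match_id)
        except Exception as exc:  # broad on purpose: keep the loop alive
            print(f"FAILED {match_id}: {exc!r}")
            failed.append(match_id)
    return failed
```

Returning the failed IDs lets you print them at the end and retry just those on the next run.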
Hints (spoilers)
Hint: peek safely (JSON inspection ladder)
Inspect JSON like stairs, not a dive: print top-level keys → print info keys → print participant count. Stop there and decide your next question.
The ladder
print(match.keys())
print(match['info'].keys())
print('participants:', len(match['info']['participants']))
Bigger hint: find yourself (don’t guess the participant)
Don’t guess which participant is “you”. Match JSON contains 10 participants; you want the one whose puuid matches yours.
A tiny find-the-index move
parts = match['info']['participants']
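# A default of None avoids a bare StopIteration if your puuid is not in this match.
idx = next((i for i, p in enumerate(parts) if p.get('puuid') == my_puuid), None)
me = parts[idx] if idx is not None else None
Unblock-me: empty match ID list (print the host + params)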
If you get [], don’t spiral. Print the base URL (host included) and the params. Most of the time the cause is a wrong routing value or an overly strict filter.
Two prints that answer 80% of questions
print('HOST:', base_url)
print('PARAMS:', params)
Unblock-me: rate limits (429 = you’re too fast, not too dumb)
If you hit 429, you’re not failing—you’re speedrunning. Fetch fewer matches, add a small sleep + retry, and rely on your cache.
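The sleep-and-retry idea can be sketched as a small wrapper around whatever function performs the request. This assumes the request is made with urllib, which raises urllib.error.HTTPError on a 429. The name with_retry and its parameters are illustrative, and the fixed delay is a simplification.

```python
import time
import urllib.error
from typing import Callable


def with_retry(fetch: Callable[[], object], retries: int = 3, delay: float = 1.0,
               sleep: Callable[[float], None] = time.sleep):
    """Call fetch(); on HTTP 429, wait and try again up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            # Anything other than 429, or running out of attempts, is re-raised.
            if err.code != 429 or attempt == retries:
                raise
            sleep(delay)
```

The sleep function is injectable so the wrapper can be exercised without actually waiting.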
The calming move
limit to 20–50 matches
sleep 0.5–1.0s between requests
cache everything
Expected raw files
data/
  raw/
    match_ids_<puuid>.json
    matches/
      <matchId>.json
The JSON inspection ladder
Ask a tiny question, print a tiny answer, repeat.
print(match.keys())
print(match['info'].keys())
print('participants:', len(match['info']['participants']))
Resume-friendly behavior (what you want to see)
SKIP (exists) <matchId>
DOWNLOAD <matchId>
DONE: new=12, skipped=38
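A loop that produces output in that shape might look like the sketch below. The function download_matches and the fetch_match callback are hypothetical. Plug in whatever function actually calls the API and returns the match as a dict.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def download_matches(match_ids: Iterable[str], fetch_match: Callable[[str], dict],
                     out_dir: Path = Path("data/raw/matches")) -> tuple[int, int]:
    """Download each match once; files already on disk are skipped."""
    out_dir.mkdir(parents=True, exist_ok=True)
    new = skipped = 0
    for match_id in match_ids:
        path = out_dir / f"{match_id}.json"
        if path.exists():
            print(f"SKIP (exists) {match_id}")
            skipped += 1
            continue
        print(f"DOWNLOAD {match_id}")
        # fetch_match runs before the file is opened, so a failed fetch
        # does not leave an empty file that a later run would skip.
        path.write_text(json.dumps(fetch_match(match_id)), encoding="utf-8")
        new += 1
    print(f"DONE: new={new}, skipped={skipped}")
    return new, skipped
```

Running it twice with the same IDs should report every match as new the first time and every match as skipped the second.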