Mack's LoL Scout
Step 04

Download match dataset

Fetch recent match IDs, fetch each match JSON, and make reruns resume-friendly.

Run

Run after each small change. Tiny loops win.

uv run python -m src.scout

You will touch
  • src/scout/ (match-v5 fetch)
  • data/raw/matches/
Time

60–120 minutes

Do this (suggested order)

  1. Load your puuid (from Step 02’s saved JSON).
  2. Fetch recent match IDs (match-v5) and save data/raw/match_ids_<puuid>.json.
  3. Loop through IDs and download match details into data/raw/matches/<matchId>.json.
  4. Make the run resumable: if the file exists, skip it.
  5. Print counts: ids fetched, files found, new downloaded.
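
Steps 3 to 5 can be sketched as one small loop. This is a minimal sketch, not the course's reference solution: download_matches is a made-up name, and fetch_match is a hypothetical callable you would supply (it takes a match ID and returns the match as a dict).

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def download_matches(
    match_ids: Iterable[str],
    out_dir: Path,
    fetch_match: Callable[[str], dict],  # hypothetical: returns one match as a dict
) -> dict:
    """Download each match once; skip IDs whose file already exists."""
    out_dir.mkdir(parents=True, exist_ok=True)
    ids = list(match_ids)
    new = skipped = 0
    for match_id in ids:
        path = out_dir / f"{match_id}.json"
        if path.exists():
            print("SKIP (exists)", match_id)
            skipped += 1
            continue
        print("DOWNLOAD     ", match_id)
        match = fetch_match(match_id)
        # Write to a temp file, then rename, so an interrupted run
        # never leaves a half-written <matchId>.json behind.
        tmp = path.with_suffix(".json.tmp")
        tmp.write_text(json.dumps(match), encoding="utf-8")
        tmp.replace(path)
        new += 1
    files_found = len(list(out_dir.glob("*.json")))
    print(f"DONE: ids={len(ids)}, files={files_found}, new={new}, skipped={skipped}")
    return {"ids": len(ids), "files": files_found, "new": new, "skipped": skipped}
```

Passing the fetch function in keeps the loop testable without touching the network: hand it a fake that returns a small dict and run it twice to watch the second run skip everything.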

You’ll practice

  • Iterate through IDs safely
  • Resume behavior: skip files you already have
  • Inspect nested JSON without getting lost

Build

Match IDs
  • Fetch IDs for PUUID
  • Save to data/raw/match_ids_<puuid>.json
Match details
  • For each ID, fetch match JSON
  • Save to data/raw/matches/<matchId>.json
  • Add a limit (e.g., most recent 20 or 50)
Resumable runs
  • If match file exists, skip download
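
To make the Build list concrete, here is a hedged sketch of the two match-v5 URLs, with the limit passed as the count query parameter. The helper names are invented for illustration. The sketch only builds URLs; sending the request (with your API key, for example in the X-Riot-Token header) is left to your own fetch code.

```python
from urllib.parse import urlencode

# match-v5 is served from regional routes, not platform routes like "na1".
REGIONAL_ROUTES = {"americas", "asia", "europe", "sea"}


def regional_base_url(region: str) -> str:
    """Return the regional host, or fail loudly on a platform value."""
    region = region.lower()
    if region not in REGIONAL_ROUTES:
        raise ValueError(
            f"match-v5 needs a regional route {sorted(REGIONAL_ROUTES)}, got {region!r}"
        )
    return f"https://{region}.api.riotgames.com"


def match_ids_url(region: str, puuid: str, count: int = 20, start: int = 0) -> str:
    """URL for the most recent match IDs of one player (count is the limit)."""
    params = urlencode({"start": start, "count": count})
    return f"{regional_base_url(region)}/lol/match/v5/matches/by-puuid/{puuid}/ids?{params}"


def match_url(region: str, match_id: str) -> str:
    """URL for the full JSON of a single match."""
    return f"{regional_base_url(region)}/lol/match/v5/matches/{match_id}"
```

Failing early on a platform value such as na1 turns the most common routing mistake into a readable error instead of a confusing empty list.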

Check yourself

  • Print number of match IDs fetched
  • Print how many match files exist
  • Print how many new matches were downloaded in this run
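
One way to get the first two numbers straight from disk, as a sketch. It assumes the match IDs file is a plain JSON list of strings; dataset_counts is a made-up helper name. The "new in this run" number is easiest to count inside your download loop.

```python
import json
from pathlib import Path


def dataset_counts(ids_path: Path, matches_dir: Path) -> dict:
    """Count saved match IDs, match files on disk, and IDs with no file yet."""
    # Assumption: the IDs file holds a JSON list like ["NA1_1", "NA1_2"].
    ids = json.loads(ids_path.read_text(encoding="utf-8")) if ids_path.exists() else []
    on_disk = {p.stem for p in matches_dir.glob("*.json")}
    missing = [m for m in ids if m not in on_disk]
    return {
        "ids_fetched": len(ids),
        "files_found": len(on_disk),
        "still_missing": len(missing),
    }
```

If still_missing is 0 after a run, the resume logic has nothing left to do.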

If it breaks

  • Using platform routing (e.g., na1) instead of regional routing (e.g., americas) for match-v5
  • Getting [] because of wrong routing/filters
  • Crashing on one bad match instead of skipping
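
For the last item, a small sketch of "skip, don't crash": run a per-match handler inside try/except and collect the failures for later. run_each and handle_one are hypothetical names; handle_one stands in for whatever you do per match (fetch, save, inspect).

```python
from typing import Callable, Iterable, List, Tuple


def run_each(
    match_ids: Iterable[str],
    handle_one: Callable[[str], None],  # hypothetical per-match step
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Call handle_one for every ID; record failures instead of stopping."""
    ok: List[str] = []
    failed: List[Tuple[str, str]] = []
    for match_id in match_ids:
        try:
            handle_one(match_id)
        except Exception as exc:  # broad on purpose: one bad match must not end the run
            print("FAILED", match_id, repr(exc))
            failed.append((match_id, repr(exc)))
            continue
        ok.append(match_id)
    return ok, failed
```

Printing the failed list at the end gives you a short to-do list of IDs to inspect by hand.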

Hints (spoilers)

Hint: peek safely (JSON inspection ladder)

Inspect JSON like stairs, not a dive: print top-level keys → print info keys → print participant count. Stop there and decide your next question.

The ladder

print(match.keys())
print(match['info'].keys())
print('participants:', len(match['info']['participants']))

Bigger hint: find yourself (don’t guess the participant)

Don’t guess which participant is “you”. Match JSON usually contains 10 participants (some game modes have a different count); you want the one whose puuid matches yours.

A tiny find-the-index move

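parts = match['info']['participants']
# next(..., None) returns None instead of raising StopIteration when no puuid matches
idx = next((i for i, p in enumerate(parts) if p.get('puuid') == my_puuid), None)
me = parts[idx] if idx is not None else None

Unblock-me: empty match ID list (print the host + params)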

If you get [], don’t spiral. Print the base URL (host included) and the params. Most of the time: wrong routing value or filters.

Two prints that answer 80% of questions

print('HOST:', base_url)
print('PARAMS:', params)

Unblock-me: rate limits (429 = you’re too fast, not too dumb)

If you hit 429 (HTTP “Too Many Requests”), you’re not failing, you’re speedrunning. Fetch fewer matches, add a small sleep and a retry, and rely on your cache so finished matches are never requested twice.

The calming move

limit to 20–50 matches
sleep 0.5–1.0s between requests
cache everything
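
The sleep + retry part could look like the sketch below, using only the standard library. get_json is a made-up helper name, and the retry counts and sleep lengths are guesses to tune. It assumes that a Retry-After header, when present, holds a number of seconds. The opener and sleep parameters exist only so you can swap in fakes when testing.

```python
import json
import time
import urllib.error
import urllib.request


def get_json(url, headers, max_tries=4, base_sleep=1.0,
             opener=urllib.request.urlopen, sleep=time.sleep):
    """GET a URL and parse JSON, sleeping and retrying when the server answers 429."""
    for attempt in range(1, max_tries + 1):
        req = urllib.request.Request(url, headers=headers)
        try:
            with opener(req) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            # Only 429 is retried; any other error (or the last attempt) is re-raised.
            if err.code != 429 or attempt == max_tries:
                raise
            retry_after = err.headers.get("Retry-After") if err.headers else None
            # Assumption: Retry-After is in seconds; otherwise wait a bit longer each time.
            wait = float(retry_after) if retry_after else base_sleep * attempt
            print(f"429: sleeping {wait:.1f}s (attempt {attempt}/{max_tries})")
            sleep(wait)
```

Combined with skipping files you already have, a rerun after a rate-limit stall only pays for the matches that are still missing.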

Expected raw files

data/
  raw/
    match_ids_<puuid>.json
    matches/
      <matchId>.json

The JSON inspection ladder

Ask a tiny question, print a tiny answer, repeat.

print(match.keys())
print(match['info'].keys())
print('participants:', len(match['info']['participants']))

Resume-friendly behavior (what you want to see)

SKIP (exists) <matchId>
DOWNLOAD     <matchId>
DONE: new=12, skipped=38