Mack's LoL Scout
Step 05

Minimal CSV extraction

Create one tidy CSV row per game—for you only—without flattening the universe.

Run

Run after each small change. Tiny loops win.

uv run python -m src.scout

You will touch
  • src/scout/ (CSV extraction)
  • data/derived/

Time

60–120 minutes

Do this (suggested order)

  1. Load all data/raw/matches/*.json files.
  2. For each match, find your participant by matching puuid.
  3. Build one row dict per match with a small set of columns (add columns one at a time).
  4. Write data/derived/matches.csv with consistent headers.
  5. Run sanity checks: row count, win values, game_start order when sorted, no all-null columns.
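Steps 1 and 2 could look like the sketch below. It is a minimal version that assumes the Riot Match-V5 layout the row skeleton in the hints also uses: participants live under match['info']['participants'], each with a puuid, and the match ID sits at match['metadata']['matchId']. load_matches and find_me are hypothetical names, not part of any library.

```python
import json
from pathlib import Path


def load_matches(raw_dir: Path) -> list[tuple[Path, dict]]:
    """Load every *.json file in raw_dir, sorted by filename for stable output."""
    matches = []
    for path in sorted(raw_dir.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            matches.append((path, json.load(f)))
    return matches


def find_me(match: dict, puuid: str) -> dict:
    """Return the single participant whose puuid is yours; fail loudly otherwise."""
    hits = [p for p in match["info"]["participants"] if p.get("puuid") == puuid]
    if len(hits) != 1:
        match_id = match.get("metadata", {}).get("matchId", "?")
        raise ValueError(f"{match_id}: expected 1 participant with your puuid, found {len(hits)}")
    return hits[0]
```

Usage: loop with for path, match in load_matches(Path("data/raw/matches")) and call find_me(match, MY_PUUID), where MY_PUUID is a placeholder for your own puuid. Raising on zero or multiple hits is deliberate: silently picking the wrong participant is much harder to spot later than a crash now.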

You’ll practice

  • Extract stable fields from nested JSON
  • Design a derived dataset
  • Write CSV with one consistent type per column
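Why types need care: CSV stores only text. This small round trip with Python's standard csv module shows what actually comes back when you read a row you wrote with an int, a bool, and a missing value.

```python
import csv
import io

# Write one row containing an int, a bool, and a None.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["kills", "win", "game_duration_s"])
writer.writeheader()
writer.writerow({"kills": 7, "win": True, "game_duration_s": None})

# Read it back: every value is now a string, and None became "".
buf.seek(0)
row_back = next(csv.DictReader(buf))
print(row_back)  # {'kills': '7', 'win': 'True', 'game_duration_s': ''}
```

So a column only "keeps" its type if every row writes the same representation; whatever reads the file later has to convert it back.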

Explainers (for context, not homework)

Build

Read
  • Load all data/raw/matches/*.json
Select your row
  • One row per match for your participant
Write
  • data/derived/matches.csv with required columns

Check yourself

  • CSV exists and has one row per match file processed
  • Values look sane (no all-null columns)
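Both checks fit in a few lines of standard-library Python. This sketch assumes every *.json file in the raw folder was processed; if you skip some files on purpose, compare against your own processed count instead. check_csv is a hypothetical name.

```python
import csv
from pathlib import Path


def check_csv(csv_path: Path, raw_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means these basic checks passed."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        columns = reader.fieldnames or []
    problems = []

    # One row per match file.
    n_files = len(list(raw_dir.glob("*.json")))
    if len(rows) != n_files:
        problems.append(f"row count {len(rows)} != match files {n_files}")

    # A column that is blank in every row usually means a wrong key or path.
    for col in columns:
        if rows and all(not row.get(col) for row in rows):
            problems.append(f"column {col!r} is empty in every row")
    return problems
```

Usage: print(check_csv(Path("data/derived/matches.csv"), Path("data/raw/matches"))) after each run; an empty list means you can move on.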

If it breaks

  • Selecting the wrong participant
  • CSV headers shifted by one column relative to the values
  • Mixing strings and ints in the same column
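One way to make misaligned headers impossible is to write through csv.DictWriter with a single fixed fieldnames list, so the header and every row come from the same source. The column order here is the suggested starter list from this step; write_matches_csv is a hypothetical name.

```python
import csv
from pathlib import Path

# One list drives both the header row and the order of values in every row.
FIELDNAMES = ["match_id", "game_start", "champion", "win",
              "kills", "deaths", "assists", "game_duration_s"]


def write_matches_csv(rows: list[dict], out_path: Path) -> None:
    out_path.parent.mkdir(parents=True, exist_ok=True)  # create data/derived/ if missing
    with out_path.open("w", newline="", encoding="utf-8") as f:
        # extrasaction="raise" (the default) rejects any key not in FIELDNAMES;
        # restval="" writes a blank for a column a row does not have.
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, restval="", extrasaction="raise")
        writer.writeheader()
        writer.writerows(rows)
```

A row with a typo'd key raises ValueError instead of quietly shifting data, and a row missing an optional column just gets a blank cell.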

Hints (spoilers)

Hint: build it like mini-quests (one column at a time)

Build the CSV like a checklist: get match_id + game_start + win first, then add one column at a time. If a new column breaks things, you know exactly which one did it.
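To make "one column at a time" literal, you can keep a list of (name, getter) pairs and add one line per new column. This sketch starts with the three columns suggested above and assumes the Match-V5 layout; COLUMNS and build_row are hypothetical names. If a getter fails, the error names the column.

```python
from typing import Any, Callable

# Each entry is (column name, function of (match, me)). New column = new line.
COLUMNS: list[tuple[str, Callable[[dict, dict], Any]]] = [
    ("match_id", lambda match, me: match["metadata"]["matchId"]),
    ("game_start", lambda match, me: match["info"].get("gameStartTimestamp")),
    ("win", lambda match, me: bool(me.get("win"))),
]


def build_row(match: dict, me: dict) -> dict:
    row = {}
    for name, getter in COLUMNS:
        try:
            row[name] = getter(match, me)
        except Exception as exc:
            # Report which column broke so you know exactly which one did it.
            raise RuntimeError(f"column {name!r} failed: {exc!r}") from exc
    return row
```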

Bigger hint: row skeleton (small, repeatable)

A reasonable starting row

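# Assumes: match is one parsed match JSON, match_id is its ID
# (e.g. match['metadata']['matchId']), and me is your participant dict.
row = {
  'match_id': match_id,
  'game_start': match['info'].get('gameStartTimestamp'),  # epoch milliseconds
  'win': bool(me.get('win')),
  'champion': me.get('championName'),
  'kills': me.get('kills'),
  'deaths': me.get('deaths'),
  'assists': me.get('assists'),
}
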
Bigger hint: missing fields (use .get and keep moving)

Not every field exists in every match. Use .get() with defaults and allow “optional columns” to be blank instead of crashing.
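Chained .get() calls work for one level, but nested paths get noisy. A small helper can follow a path of keys and list indexes and return a default at the first missing step. dig is a hypothetical name, not a standard function.

```python
from typing import Any


def dig(data: Any, *keys: Any, default: Any = None) -> Any:
    """Follow keys/indexes into nested JSON; return default if any step is missing."""
    for key in keys:
        try:
            data = data[key]
        except (KeyError, IndexError, TypeError):
            # KeyError: missing dict key; IndexError: list too short;
            # TypeError: hit None or the wrong container type.
            return default
    return data
```

Usage: dig(match, "info", "gameDuration", default="") gives a blank cell instead of a crash when the field is absent.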

Unblock-me: mixed types (make win consistently 0/1 or True/False)

If some rows store win as a boolean (true) and others as the string "true", pandas will hate you later. Pick one representation and stick to it.
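A normalizer applied right before writing keeps the column uniform. This sketch picks 0/1 (True/False works just as well) and returns None for anything unrecognized so you can count oddities instead of guessing. normalize_win is a hypothetical name.

```python
from typing import Optional


def normalize_win(value: object) -> Optional[int]:
    """Map common spellings of a win flag to 1/0; None if unrecognized."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return int(value)
    if isinstance(value, (int, float)) and value in (0, 1):
        return int(value)
    if isinstance(value, str):
        text = value.strip().lower()
        if text in ("true", "1"):
            return 1
        if text in ("false", "0"):
            return 0
    return None
```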

Expected derived file

data/
  derived/
    matches.csv

Suggested starter columns

Keep it small. You can always add more later.

match_id
game_start
champion
win
kills
deaths
assists
game_duration_s
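game_duration_s needs one extra step. Riot's Match-V5 field notes say gameDuration is in seconds when gameEndTimestamp is present (patch 11.20 and later) and in milliseconds when it is absent (older games). The sketch below follows that rule; double-check it against a couple of your own raw files. The function name mirrors the column and is otherwise hypothetical.

```python
from typing import Optional


def game_duration_s(info: dict) -> Optional[int]:
    """Game length in seconds from a Match-V5 info dict, or None if missing."""
    duration = info.get("gameDuration")
    if duration is None:
        return None
    if "gameEndTimestamp" in info:
        return int(duration)  # newer matches: already seconds
    return int(duration) // 1000  # older matches: milliseconds
```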

Sanity checks (quick)

- Row count == match files processed
- game_start is numeric and strictly increases when sorted (a repeat suggests a duplicate match)
- win uses a single representation: only 0/1 or only True/False
- champion looks like names, not IDs
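The value checks can be scripted too. This sketch reads matches.csv with the standard csv module, so every value arrives as a string. It interprets "increases when sorted" as: game_start is all digits (epoch milliseconds) and strictly increasing once sorted numerically. check_values is a hypothetical name.

```python
import csv
from pathlib import Path


def check_values(csv_path: Path) -> list[str]:
    """Value-level sanity checks on matches.csv; returns a list of problems."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    problems = []

    # win: one representation for the whole column, either 0/1 or True/False.
    win_values = {row["win"] for row in rows}
    if not (win_values <= {"0", "1"} or win_values <= {"True", "False"}):
        problems.append(f"win has mixed or unexpected values: {sorted(win_values)}")

    # game_start: digits only, then strictly increasing once sorted numerically.
    starts = [row["game_start"] for row in rows]
    if not all(s.isdigit() for s in starts):
        problems.append("game_start has blank or non-numeric values")
    else:
        ordered = sorted(int(s) for s in starts)
        if any(a >= b for a, b in zip(ordered, ordered[1:])):
            problems.append("game_start repeats after sorting (duplicate match?)")

    # champion: names like 'Ahri', not numeric IDs like '103'.
    if any(row["champion"].isdigit() for row in rows):
        problems.append("champion contains numeric IDs")
    return problems
```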