Disk cache + update mode
Befriend the rate limit dragon: reuse local JSON, only fetch when you mean it.
Run after each small change. Tiny loops win.
Run: uv run python -m src.scout
Edit: src/scout/ (your fetch_json helper); cache lives in data/cache/
60–90 minutes
Do this (suggested order)
- Create data/cache/.
- Design a deterministic cache key from url + params (sorted).
- Write fetch_json(url, params, *, update=False): reuse cached files by default, fetch on update.
- Save metadata with each response (url, params, fetched_at, status_code).
- Add tiny logs so you can see behavior: CACHE HIT / CACHE MISS / NETWORK.
You’ll practice
- Deterministic cache keys
- Store metadata (fetched_at, params, status)
- Design an “update mode” switch
Explainers (for context, not homework)
- Caching sanity — Raw vs derived, update mode mindset
- Terminal + folders — If your cache writes to weird places
Build
- fetch_json(url, params) checks data/cache first
- Cache files store url, params, fetched_at, status_code, json/text
- Update mode OFF → never hit the network if a cache file exists
- Update mode ON → fetch fresh and merge into the local dataset (minimal sketch below)
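A minimal sketch of that shape, assuming the requests library and the data/cache/ layout shown further down. Everything except the fetch_json name is a placeholder, and the "merge into the local dataset" step is left to you.
import hashlib, json
from datetime import datetime, timezone
from pathlib import Path
import requests

CACHE_DIR = Path("data/cache")

def _cache_key(url, params):
    raw = json.dumps({"url": url, "params": params or {}}, sort_keys=True,
                     separators=(",", ":")).encode("utf-8")
    return hashlib.sha1(raw).hexdigest()

def fetch_json(url, params=None, *, update=False):
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    key = _cache_key(url, params)
    path = CACHE_DIR / f"{key}.json"

    if path.exists() and not update:
        # Update mode OFF: trust whatever is on disk (see the error-caching hint
        # for a stricter rule about non-200 records).
        print(f"CACHE HIT {key}")
        return json.loads(path.read_text(encoding="utf-8")).get("json")

    if not path.exists():
        print(f"CACHE MISS {key}")
    print(f"NETWORK {url}")
    resp = requests.get(url, params=params, timeout=30)
    record = {
        "url": url,
        "params": params or {},
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "status_code": resp.status_code,
        "json": resp.json() if resp.status_code == 200 else None,
        "text": None if resp.status_code == 200 else resp.text,
    }
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    resp.raise_for_status()  # error responses are saved above for debugging, then surfaced
    return record["json"]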
Check yourself
- Run twice: first run shows CACHE MISS, second is all CACHE HIT and makes ~0 network calls.
If it breaks
- Filename collisions
- Caching error responses forever
- Forgetting query params in the cache key
Hints (spoilers)
Hint: safe filenames (hash it and move on with your life)
Cache key should include URL + sorted params. If you hash, life is easier. If you don’t hash, be very polite to filenames.
A simple key idea
Not required, but hashing prevents “filename too long” sadness.
import hashlib, json
payload = {'url': url, 'params': params}
raw = json.dumps(payload, sort_keys=True, separators=(',', ':')).encode('utf-8')
cache_key = hashlib.sha1(raw).hexdigest()
Bigger hint: params must be part of the key
The same endpoint with different params is a different request. If params aren’t in the key, you will cache the wrong thing and your code will “randomly” lie later.
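A quick check you can run to convince yourself, reusing the hashing idea from above (cache_key and the URL here are made up for the demo):
import hashlib, json

def cache_key(url, params):
    raw = json.dumps({"url": url, "params": params}, sort_keys=True,
                     separators=(",", ":")).encode("utf-8")
    return hashlib.sha1(raw).hexdigest()

same_url = "https://api.example.com/items"
print(cache_key(same_url, {"page": 1}) == cache_key(same_url, {"page": 2}))  # False: different params, different cache files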
Bigger hint: error caching (save receipts, but don’t treat them as a win)
Save error responses (status + body text) for debugging, but don’t treat them as “good cached data” forever.
Decide what counts as a real hit (usually: status_code == 200).
A tiny rule you can log
if cached.status_code != 200: treat as MISS (and maybe refetch in update mode)
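Spelled out as code (the field name follows the suggested cache file shape below; is_real_hit is an invented name):
def is_real_hit(record):
    # A saved error is a receipt for debugging, not usable data: count it as a MISS.
    return record.get("status_code") == 200

print(is_real_hit({"status_code": 200, "json": {"ok": True}}))            # True
print(is_real_hit({"status_code": 503, "text": "Service Unavailable"}))   # False -> refetch in update mode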
Unblock-me: cache lives on disk, so paths matter
If your program writes cache files into a weird place, you're probably running from the wrong directory. Run from the project root (the folder that contains data/).
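If you'd rather not depend on the working directory at all, you can anchor the cache path to the source file instead. This assumes the helper sits two levels below the project root (e.g. somewhere in src/scout/); adjust the parents index to your actual layout.
from pathlib import Path

# .../src/scout/<module>.py -> parents[2] = project root (assumed layout)
PROJECT_ROOT = Path(__file__).resolve().parents[2]
CACHE_DIR = PROJECT_ROOT / "data" / "cache"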
Cache folder (expected)
data/
  cache/
    <cache_key>.json
Cache file shape (suggested)
You can change names, but keep the idea: metadata + payload.
{
"url": "...",
"params": {"...": "..."},
"fetched_at": "2025-12-15T00:00:00Z",
"status_code": 200,
"json": {"...": "..."}
}
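A tiny scan like this (field names matching the suggestion above) tells you quickly whether your existing cache files actually carry the metadata:
import json
from pathlib import Path

REQUIRED = {"url", "params", "fetched_at", "status_code"}

for path in sorted(Path("data/cache").glob("*.json")):
    record = json.loads(path.read_text(encoding="utf-8"))
    missing = REQUIRED - record.keys()
    if missing:
        print(f"{path.name}: missing {sorted(missing)}")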
Tiny log lines that make debugging easy
CACHE HIT <cache_key>
CACHE MISS <cache_key>
NETWORK <url>