Disk cache + update mode
Befriend the rate limit dragon: reuse local JSON, only fetch when you mean it.
Run after each small change. Tiny loops win.
Run: uv run python -m src.scout
Edit: src/scout/ (your fetch_json helper); cache lives in data/cache/
60–90 minutes
Do this (suggested order)
- Create data/cache/.
- Design a deterministic cache key from url + params (sorted).
- Write fetch_json(url, params, *, update=False): reuse cached files by default, fetch on update.
- Save metadata with each response (url, params, fetched_at, status_code).
- Add tiny logs so you can see behavior: CACHE HIT / CACHE MISS / NETWORK.
You’ll practice
- Deterministic cache keys
- Store metadata (fetched_at, params, status)
- Design an “update mode” switch
Explainers (for context, not homework)
- Caching sanity — Raw vs derived, update mode mindset
- Terminal + folders — If your cache writes to weird places
Build
- fetch_json(url, params) checks data/cache first
- Cache files store url, params, fetched_at, status_code, json/text
- Update mode OFF → never hit the network if a cache file exists
- Update mode ON → fetch fresh and merge into the local dataset (minimal sketch below)
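A minimal sketch of that shape, assuming the requests library and the data/cache/ layout shown further down. Everything except the fetch_json name is a placeholder, and the "merge into the local dataset" step is left to you.
import hashlib, json
from datetime import datetime, timezone
from pathlib import Path
import requests

CACHE_DIR = Path("data/cache")

def _cache_key(url, params):
    raw = json.dumps({"url": url, "params": params or {}}, sort_keys=True,
                     separators=(",", ":")).encode("utf-8")
    return hashlib.sha1(raw).hexdigest()

def fetch_json(url, params=None, *, update=False):
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    key = _cache_key(url, params)
    path = CACHE_DIR / f"{key}.json"

    if path.exists() and not update:
        # Update mode OFF: trust whatever is on disk (see the error-caching hint
        # for a stricter rule about non-200 records).
        print(f"CACHE HIT {key}")
        return json.loads(path.read_text(encoding="utf-8")).get("json")

    if not path.exists():
        print(f"CACHE MISS {key}")
    print(f"NETWORK {url}")
    resp = requests.get(url, params=params, timeout=30)
    record = {
        "url": url,
        "params": params or {},
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "status_code": resp.status_code,
        "json": resp.json() if resp.status_code == 200 else None,
        "text": None if resp.status_code == 200 else resp.text,
    }
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    resp.raise_for_status()  # error responses are saved above for debugging, then surfaced
    return record["json"]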
Check yourself
- Run twice: first run shows CACHE MISS, second is all CACHE HIT and makes ~0 network calls.
If it breaks
- Filename collisions
- Caching error responses forever
- Forgetting query params in the cache key
Hints (spoilers)
Hint: safe filenames (hash it and move on with your life)
Cache key should include URL + sorted params. If you hash, life is easier. If you don’t hash, be very polite to filenames.
A simple key idea
Not required, but hashing prevents “filename too long” sadness.
import hashlib, json
payload = {'url': url, 'params': params}
raw = json.dumps(payload, sort_keys=True, separators=(',', ':')).encode('utf-8')
cache_key = hashlib.sha1(raw).hexdigest()
Bigger hint: params must be part of the key
The same endpoint with different params is a different request. If params aren’t in the key, you will cache the wrong thing and your code will “randomly” lie later.
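A quick check you can run to convince yourself, reusing the hashing idea from above (cache_key and the URL here are made up for the demo):
import hashlib, json

def cache_key(url, params):
    raw = json.dumps({"url": url, "params": params}, sort_keys=True,
                     separators=(",", ":")).encode("utf-8")
    return hashlib.sha1(raw).hexdigest()

same_url = "https://api.example.com/items"
print(cache_key(same_url, {"page": 1}) == cache_key(same_url, {"page": 2}))  # False: different params, different cache files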
Bigger hint: error caching (save receipts, but don’t treat them as a win)
Save error responses (status + body text) for debugging, but don’t treat them as “good cached data” forever.
Decide what counts as a real hit (usually: status_code == 200).
A tiny rule you can log
if cached.status_code != 200: treat as MISS (and maybe refetch in update mode)
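Spelled out as code (the field name follows the suggested cache file shape below; is_real_hit is an invented name):
def is_real_hit(record):
    # A saved error is a receipt for debugging, not usable data: count it as a MISS.
    return record.get("status_code") == 200

print(is_real_hit({"status_code": 200, "json": {"ok": True}}))            # True
print(is_real_hit({"status_code": 503, "text": "Service Unavailable"}))   # False -> refetch in update mode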
Unblock-me: cache lives on disk, so paths matter
If your program writes cache files into a weird place, you're probably running from the wrong directory. Run from the project root (the folder that contains data/).
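If you'd rather not depend on the working directory at all, you can anchor the cache path to the source file instead. This assumes the helper sits two levels below the project root (e.g. somewhere in src/scout/); adjust the parents index to your actual layout.
from pathlib import Path

# .../src/scout/<module>.py -> parents[2] = project root (assumed layout)
PROJECT_ROOT = Path(__file__).resolve().parents[2]
CACHE_DIR = PROJECT_ROOT / "data" / "cache"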
Cache folder (expected)
data/
  cache/
    <cache_key>.json
Cache file shape (suggested)
You can change names, but keep the idea: metadata + payload.
{
"url": "...",
"params": {"...": "..."},
"fetched_at": "2025-12-15T00:00:00Z",
"status_code": 200,
"json": {"...": "..."}
}
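A tiny scan like this (field names matching the suggestion above) tells you quickly whether your existing cache files actually carry the metadata:
import json
from pathlib import Path

REQUIRED = {"url", "params", "fetched_at", "status_code"}

for path in sorted(Path("data/cache").glob("*.json")):
    record = json.loads(path.read_text(encoding="utf-8"))
    missing = REQUIRED - record.keys()
    if missing:
        print(f"{path.name}: missing {sorted(missing)}")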
Tiny log lines that make debugging easy
CACHE HIT <cache_key>
CACHE MISS <cache_key>
NETWORK <url>