Answer

Replace `json.load(f)` with streaming JSON parsing using the `ijson` library:

```python
import ijson

with open(json_path, 'rb') as f:  # binary mode: ijson expects bytes
    for msg in ijson.items(f, 'item'):
        # process one message at a time
        msg_id = msg['id']
        for att in msg.get('attachments', []):
            pass  # ...
```

For a top-level JSON array `[{msg1}, {msg2}, ...]`, `ijson.items(f, 'item')` yields each element without loading the full array. Memory usage drops from 800MB+ to ~180MB for an 855MB file.

Install: `pip install ijson`. The C backend (yajl2_cffi) is faster, but the pure-Python fallback works fine for most cases.
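To try the pattern on a small input before pointing it at the real file, something like the sketch below may help. The names `iter_messages`, `count_attachments`, and the sample payload are hypothetical and not from the original code. The `json.load` fallback is only an assumption so the script still runs where `ijson` is not installed; it does not stream. Passing `use_float=True` (available in ijson 3.1 and later, as far as I know) makes non-integer numbers come back as `float` instead of `decimal.Decimal`.

```python
import io
import json

try:
    import ijson  # third-party; install with: pip install ijson
except ImportError:  # assumption: degrade gracefully when ijson is missing
    ijson = None


def iter_messages(fp):
    """Yield one message at a time from a binary file holding a JSON array."""
    if ijson is not None:
        # 'item' selects each element of the top-level array.
        # use_float=True yields float rather than decimal.Decimal.
        yield from ijson.items(fp, 'item', use_float=True)
    else:
        # Non-streaming fallback: parses the whole document in memory.
        yield from json.load(fp)


def count_attachments(fp):
    """Return {message id: attachment count}, visiting messages one by one."""
    counts = {}
    for msg in iter_messages(fp):
        counts[msg['id']] = len(msg.get('attachments', []))
    return counts


# Hypothetical sample data standing in for the real export file.
sample = io.BytesIO(
    b'[{"id": 1, "attachments": [{"name": "a.png", "size": 1.5}]},'
    b' {"id": 2}]'
)
result = count_attachments(sample)
print(result)
```

Only the small `counts` dict outlives each loop iteration, which is what keeps memory roughly flat on the streaming path. If you accumulate the messages themselves in a list, you are back to holding the whole file in memory.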

