Two root causes are working together:

**1. "Omit if uncertain" rules are interpreted as permission to skip**

Models (especially in batch mode, where they're trying to be efficient) treat optional node types as low-priority. Abstract types like Pattern require genuine inference ("what class of problem is this?") rather than direct extraction, and any ambiguity in the prompt gives the model an exit. It takes the exit.

**2. Weaker models optimize for concrete, extractable nodes**

Haiku-tier models are better at pulling literal facts (error messages, package names) than at abstracting to patterns. When batching 20+ pairs in one call, quality on abstract nodes degrades further as the model optimizes for throughput.

**Fixes:**

**a) Make critical nodes required, not optional.** Change the prompt from "include if present" to "always required: every problem must have a Pattern via INSTANCE_OF." For Pattern specifically, add the instruction "ask yourself 'what class of problem is this?' and create that Pattern node."

**b) Provide a complete worked example.** Include one full example in the prompt showing the exact output you expect, including the abstract node and its edge. Models match format much more reliably than they follow abstract rules. The example should demonstrate the full chain:

```json
{"type": "Problem", "description": "Drizzle ORM inArray() throws when passed an empty array"}
{"type": "Pattern", "description": "ORM helper throws or produces invalid SQL when given an empty collection"}
// edge: Problem INSTANCE_OF Pattern
```

**c) Use a stronger model for extraction.** The quality gap between Haiku and Sonnet on structured abstract reasoning is significant. The cost difference on a small graph is negligible; the data quality difference is not.

After applying these, verify with a spot-check: count INSTANCE_OF edges in the graph. If the count is near zero, the extraction is still skipping Pattern nodes.
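The spot-check can be a few lines of script. The sketch below assumes the graph has been exported as a list of node dicts and a list of edge dicts; the `id`, `source`, and `target` field names and the `pattern_coverage` helper are hypothetical, so adapt them to however your graph is actually stored. It counts INSTANCE_OF edges and lists the Problem nodes that still have no Pattern.

```python
def pattern_coverage(nodes, edges):
    """Return (number of INSTANCE_OF edges, ids of Problems with no Pattern)."""
    node_type = {n["id"]: n["type"] for n in nodes}
    instance_of = 0
    linked_problems = set()
    for e in edges:
        if e["type"] != "INSTANCE_OF":
            continue
        instance_of += 1
        # Only count the link if it really goes Problem -> Pattern.
        if (node_type.get(e["source"]) == "Problem"
                and node_type.get(e["target"]) == "Pattern"):
            linked_problems.add(e["source"])
    orphans = [n["id"] for n in nodes
               if n["type"] == "Problem" and n["id"] not in linked_problems]
    return instance_of, orphans


# Tiny illustrative graph (field names are assumptions, not a real schema).
nodes = [
    {"id": "p1", "type": "Problem",
     "description": "Drizzle ORM inArray() throws when passed an empty array"},
    {"id": "pat1", "type": "Pattern",
     "description": "ORM helper throws or produces invalid SQL when given an empty collection"},
    {"id": "p2", "type": "Problem", "description": "placeholder problem with no pattern"},
]
edges = [{"source": "p1", "type": "INSTANCE_OF", "target": "pat1"}]

count, orphans = pattern_coverage(nodes, edges)
print(f"INSTANCE_OF edges: {count}; Problems without a Pattern: {orphans}")
```

Listing the orphaned Problems, rather than only counting edges, tells you which inputs to re-run after changing the prompt or the model.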
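Fixes (a) and (b) can be combined into one prompt fragment. This is only a sketch of how that might look: `build_extraction_prompt`, the `RULES` wording, and the overall prompt layout are hypothetical, and the worked example is the Drizzle one from above. Your real prompt will also need whatever schema and edge format your extractor already uses.

```python
import json

# Required-node rule plus the self-questioning instruction (fix a).
RULES = (
    "Pattern is always required: every problem must have a Pattern via INSTANCE_OF.\n"
    "Ask yourself 'what class of problem is this?' and create that Pattern node.\n"
)

# One complete worked example, including the abstract node (fix b).
WORKED_EXAMPLE = [
    {"type": "Problem",
     "description": "Drizzle ORM inArray() throws when passed an empty array"},
    {"type": "Pattern",
     "description": "ORM helper throws or produces invalid SQL when given an empty collection"},
]


def build_extraction_prompt(source_text: str) -> str:
    """Assemble rules, the worked example, and the input into one prompt string."""
    example = "\n".join(json.dumps(node) for node in WORKED_EXAMPLE)
    return (
        f"{RULES}\n"
        "Example output:\n"
        f"{example}\n"
        "// edge: Problem INSTANCE_OF Pattern\n\n"
        f"Input:\n{source_text}\n"
    )


prompt = build_extraction_prompt("<text to extract from goes here>")
print(prompt)
```

Keeping the rule and the example in named constants makes it easy to confirm that every extraction call, batched or not, carries both.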