Answer


Two separate problems compound here: the model skipping optional nodes, and the model confusing node types when it does emit them. Both need prompt-level fixes.

1. Make abstract node types non-optional with explicit obligation language

Instead of "omit if not clearly present", use language that makes the type feel required:

- Pattern: the recurring abstract pattern this problem is an instance of (required — always abstract)
  Pattern is not optional: every Problem should link to a Pattern via INSTANCE_OF.
  Ask yourself "what class of problem is this?" and create that Pattern node.

The word "required" and the explicit reasoning prompt ("ask yourself...") together improve consistency. Models tend to treat "optional" as "skip when uncertain", even when the signal is clear.
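
A prompt change like this is easiest to trust when it is checked against actual output. The following is a minimal sketch, assuming the extraction result is available as plain lists of node and edge dicts with `id`, `type`, `source`, and `target` keys. That shape and the function name `problems_missing_pattern` are assumptions for illustration, not part of the original prompt. It lists every Problem node that has no INSTANCE_OF edge to a Pattern node.

```python
from typing import Dict, List


def problems_missing_pattern(nodes: List[Dict[str, str]],
                             edges: List[Dict[str, str]]) -> List[str]:
    """Return ids of Problem nodes with no INSTANCE_OF edge to a Pattern node."""
    # Assumed shape: nodes carry "id" and "type"; edges carry
    # "source", "target", and "type".
    pattern_ids = {n["id"] for n in nodes if n["type"] == "Pattern"}
    linked = {
        e["source"]
        for e in edges
        if e["type"] == "INSTANCE_OF" and e["target"] in pattern_ids
    }
    return [
        n["id"] for n in nodes
        if n["type"] == "Problem" and n["id"] not in linked
    ]


nodes = [
    {"id": "p1", "type": "Problem"},
    {"id": "p2", "type": "Problem"},
    {"id": "pat1", "type": "Pattern"},
]
edges = [{"source": "p1", "target": "pat1", "type": "INSTANCE_OF"}]
print(problems_missing_pattern(nodes, edges))  # ['p2']
```

Running this over a batch of extractions before and after the wording change gives a rough count of how often the Pattern node is still being skipped.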

2. Add disambiguation rules for easily confused node types

Component and Concept are the most commonly conflated pair. The model will emit things like Component: "LLM context window (128k tokens on GPT-4o)" when it should be Component: "GPT-4o" + Concept: "large language model (LLM)".

Fix with explicit definitions and negative examples:

- Component: a specific, named product — a proper noun / brand name with no qualifiers or specs.
  Good: "PostgreSQL", "Redis", "GPT-4o", "Claude".
  Bad: "PostgreSQL Query Engine", "LLM context window (128k tokens on GPT-4o)".
  When a Component has a specification detail (e.g. token limit, memory size), discard it — it is not a node.

- Concept: an abstract technology category or capability class — the kind of thing a Component IS.
  Good: "ORM (Object-Relational Mapper)", "large language model (LLM)", "context window", "vector database".
  A Component or Package links to a Concept via IMPLEMENTATION_OF.
  Ask "what kind of thing is X?"; the answer is the Concept.
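
The conflated example above can double as a small regression fixture for the prompt. This is only a sketch: the `Node` tuple, the `EXPECTED` set, and `missing_nodes` are hypothetical names, and the expected nodes are the two named in the example (Component "GPT-4o" and Concept "large language model (LLM)").

```python
from typing import NamedTuple, Set


class Node(NamedTuple):
    kind: str  # node type, e.g. "Component" or "Concept"
    name: str


# Expected nodes for the example, as described in the text above.
EXPECTED: Set[Node] = {
    Node("Component", "GPT-4o"),
    Node("Concept", "large language model (LLM)"),
}


def missing_nodes(actual: Set[Node]) -> Set[Node]:
    """Return the expected nodes that the extraction did not emit."""
    return EXPECTED - actual


# The conflated output: one Component that mixes product, concept, and spec.
conflated = {Node("Component", "LLM context window (128k tokens on GPT-4o)")}
print(missing_nodes(conflated) == EXPECTED)  # True: both expected nodes are absent
```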

3. Add a canonicalization rule as a negative test

- Component descriptions must be canonical product names only — strip all qualifiers, role descriptions,
  specs, and context. If a description contains parenthetical specs or role words, it is wrong.

This gives the model a self-check it can apply before emitting a node.
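
The same negative test can also be run outside the model as a post-extraction check. Below is a rough heuristic sketch, not a complete validator: the `ROLE_WORDS` list and the function name `canonicalization_problems` are assumptions made for illustration, and a real word list would need tuning so that legitimate product names are not flagged.

```python
import re
from typing import List

# Hypothetical role/qualifier words; adjust for your own domain.
ROLE_WORDS = {"engine", "window", "layer", "module", "service"}


def canonicalization_problems(name: str) -> List[str]:
    """Return reasons a Component name looks non-canonical (empty if none)."""
    problems = []
    # Parenthetical specs such as "(128k tokens on GPT-4o)".
    if "(" in name or ")" in name:
        problems.append("parenthetical")
    # Role words such as "Engine" in "PostgreSQL Query Engine".
    words = {w.lower() for w in re.findall(r"[A-Za-z]+", name)}
    hits = sorted(words & ROLE_WORDS)
    if hits:
        problems.append("role words: " + ", ".join(hits))
    return problems


for candidate in ["PostgreSQL", "GPT-4o", "PostgreSQL Query Engine",
                  "LLM context window (128k tokens on GPT-4o)"]:
    print(candidate, "->", canonicalization_problems(candidate))
```

With this word list, the two "Good" names pass and the two "Bad" names from the Component definition are flagged.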

Why this works

The model isn't "forgetting"; it is uncertain about what counts as a valid instance of the type. Obligation language removes the uncertainty escape hatch. Negative examples provide concrete boundaries. The canonicalization rule works as a negative test that lets the model catch its own errors before emitting a node.