LLM batch extraction silently drops optional node types: how to ensure consistent structured output

I'm running batch LLM extraction over Q&A pairs to populate a knowledge graph. The prompt defines several node types (Problem, RootCause, Solution, Pattern, Symptom, Component, etc.) and asks the model to extract the relevant ones. Some node types are marked as optional ("omit if not clearly present").
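To make the setup concrete, here is a minimal sketch of how such a prompt might be assembled. The node type names come from the list above; which ones are flagged optional, the `NodeType` class, and the `build_prompt` helper are all assumptions for illustration, not the actual pipeline code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    optional: bool = False  # optional types get the "omit if..." hint


# Names are from the post; the optional flags are assumed for illustration.
NODE_TYPES = [
    NodeType("Problem"),
    NodeType("RootCause"),
    NodeType("Solution"),
    NodeType("Pattern", optional=True),
    NodeType("Symptom", optional=True),
    NodeType("Component", optional=True),
]


def build_prompt(question: str, answer: str) -> str:
    """Render the extraction instructions for one Q&A pair (hypothetical helper)."""
    lines = ["Extract the following node types from the Q&A pair:"]
    for node_type in NODE_TYPES:
        suffix = " (omit if not clearly present)" if node_type.optional else ""
        lines.append(f"- {node_type.name}{suffix}")
    lines.append("")
    lines.append(f"Q: {question}")
    lines.append(f"A: {answer}")
    return "\n".join(lines)
```

The "omit if not clearly present" wording gives the model an explicit way out for every optional type, which is the part of the prompt the behavior described here hinges on.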

In practice, the model consistently skips optional node types even when they clearly apply, especially abstract types like Pattern. The graph ends up missing entire categories of nodes that downstream systems depend on (e.g. PageRank scoring that requires Pattern nodes).
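One way to confirm and quantify the gap is to tally which node types appear across a batch of extraction results. The sketch below assumes each result is a list of dicts with a `"type"` key; that output shape, `EXPECTED_TYPES`, and both function names are hypothetical.

```python
from collections import Counter

# Node types named in the post; the real schema may include more.
EXPECTED_TYPES = {"Problem", "RootCause", "Solution", "Pattern", "Symptom", "Component"}


def type_coverage(batch):
    """Return, per expected type, the fraction of results containing it.

    `batch` is a list of extraction results; each result is a list of
    node dicts with a "type" key (assumed output shape).
    """
    counts = Counter()
    for nodes in batch:
        # Count each type at most once per result.
        counts.update({node["type"] for node in nodes})
    total = len(batch)
    return {t: (counts[t] / total if total else 0.0) for t in sorted(EXPECTED_TYPES)}


def missing_types(batch, min_rate=0.0):
    """List expected types whose coverage is at or below `min_rate`."""
    return [t for t, rate in type_coverage(batch).items() if rate <= min_rate]
```

Running this over a full batch shows whether a type such as Pattern is absent everywhere or only rare, which helps tell a prompt problem from genuinely sparse data.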

How do you get reliable, consistent structured output from batch LLM extraction?