CVE-2022-40304 libxml2 dict corruption via entity reference cycles
0d67ab7c-bf20-4671-b6e8-4f8362f5bece
CVE-2022-40304 in libxml2 v2.9.14: when crafted XML contains a cycle of internal entity references, the parser's cycle-detection path mutates the entity's content buffer in place via ent->content[0] = 0. However, xmlCreateEntity() in entities.c stores short (< 5 byte) entity content, plus ExternalID and SystemID, by calling xmlDictLookup(dict, ...), which returns pointers into the document dict's string storage. That storage is meant to be immutable. The in-place zeroing therefore corrupts the dict: the stored hash no longer matches the mutated string, later xmlDictLookup calls can return wrong results, and the xmlDictOwns() ownership test in xmlFreeEntity becomes unreliable, producing double frees, use-after-free, and heap corruption. Sites of the dangerous write: parser.c:167 (in xmlParserEntityCheck, after XML_ERR_ENTITY_LOOP), plus parser.c:2727, 2786, 4066, and 7273.
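The aliasing problem described above can be reproduced in miniature. The following is a minimal sketch using a toy interning table, not libxml2's xmlDict: every name in it (ToyDict, toy_lookup, toy_demo_corruption) is hypothetical, and the hash function is an arbitrary choice. It only illustrates why writing through a pointer returned by an interning lookup leaves the stored hash out of sync with the string.

```c
#include <assert.h>   /* for callers that assert on the demo result */
#include <stdlib.h>
#include <string.h>

/* Toy string-interning table: a stand-in for a dict, NOT libxml2's xmlDict. */
#define TOY_BUCKETS 8

typedef struct ToyEntry {
    struct ToyEntry *next;
    unsigned long hash;   /* computed once, at insertion time */
    char *str;            /* callers must treat this as read-only */
} ToyEntry;

typedef struct {
    ToyEntry *buckets[TOY_BUCKETS];
} ToyDict;

static unsigned long toy_hash(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return the interned copy of s, inserting it on first use. */
static char *toy_lookup(ToyDict *d, const char *s) {
    unsigned long h = toy_hash(s);
    ToyEntry *e;
    for (e = d->buckets[h % TOY_BUCKETS]; e != NULL; e = e->next)
        if (e->hash == h && strcmp(e->str, s) == 0)
            return e->str;
    e = malloc(sizeof(*e));
    if (e == NULL) return NULL;
    e->str = malloc(strlen(s) + 1);
    if (e->str == NULL) { free(e); return NULL; }
    strcpy(e->str, s);
    e->hash = h;
    e->next = d->buckets[h % TOY_BUCKETS];
    d->buckets[h % TOY_BUCKETS] = e;
    return e->str;
}

static void toy_free(ToyDict *d) {
    for (int i = 0; i < TOY_BUCKETS; i++) {
        ToyEntry *e = d->buckets[i];
        while (e != NULL) {
            ToyEntry *next = e->next;
            free(e->str);
            free(e);
            e = next;
        }
        d->buckets[i] = NULL;
    }
}

/* Returns 1 if the in-place write damaged the table, -1 on allocation failure. */
static int toy_demo_corruption(void) {
    ToyDict d = {{NULL}};
    char *content = toy_lookup(&d, "&b;");  /* short entity content, interned */
    char *other = toy_lookup(&d, "&b;");    /* a second user gets the same pointer */
    if (content == NULL || other == NULL) { toy_free(&d); return -1; }
    int shared = (content == other);

    content[0] = 0;                         /* the cycle-detection style write */

    /* The old entry still carries the hash of "&b;" but now holds "",
       so this lookup misses it and inserts a second entry. */
    char *again = toy_lookup(&d, "&b;");
    int result = shared && again != NULL && again != content && other[0] == '\0';
    toy_free(&d);
    return result;
}
```

Calling toy_demo_corruption() from a test harness shows the three effects at once: the two users share one buffer, the second user sees its string silently emptied, and a later lookup of the original string no longer finds the original entry.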
Running git log --all --oneline | grep CVE-2022-40304 in repos/libxml2 found commit 1b41ec4e "[CVE-2022-40304] Fix dict corruption caused by entity reference cycles".
3) git show 1b41ec4e showed that the diff modifies xmlCreateEntity and xmlFreeEntity in entities.c to stop using the dict for content/ExternalID/SystemID/orig and to use xmlStrdup instead.
4) Read entities.c:100-202 to see the pre-patch code: lines 178-189 store ExternalID, SystemID, and content via xmlDictLookup whenever a dict is present (content only when its length is < 5).
5) Grepped for content\[0\] = 0 to find the mutation sites: parser.c:167 inside xmlParserEntityCheck is the cycle-detection path, plus parser.c:2727, 2786, 4066, and 7273.
6) Read parser.c:138-180 to confirm that xmlParserEntityCheck zeroes content when the xmlStringDecodeEntities call reports XML_ERR_ENTITY_LOOP. This is exactly the write through a dict-aliased pointer that corrupts the dict.
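The direction of the fix described in step 3 (private copies instead of dict-owned strings) can be sketched as follows. This is not the code from commit 1b41ec4e: SketchEntity, sketch_strdup, and the other names are hypothetical, and sketch_strdup merely stands in for xmlStrdup. The point is only that once an entity owns its buffer, the zeroing write cannot reach shared storage, and freeing needs no ownership test.

```c
#include <assert.h>   /* for callers that assert on the demo result */
#include <stdlib.h>
#include <string.h>

/* Hypothetical entity record; the real xmlEntity has many more fields. */
typedef struct {
    char *content;    /* privately owned copy, never dict storage */
} SketchEntity;

/* Stand-in for xmlStrdup: returns a heap copy the caller owns. */
static char *sketch_strdup(const char *s) {
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL) memcpy(copy, s, n);
    return copy;
}

static SketchEntity *sketch_create_entity(const char *content) {
    SketchEntity *ent = malloc(sizeof(*ent));
    if (ent == NULL) return NULL;
    ent->content = sketch_strdup(content);   /* copy instead of interning */
    if (ent->content == NULL) { free(ent); return NULL; }
    return ent;
}

/* No ownership check needed: the entity always owns its content. */
static void sketch_free_entity(SketchEntity *ent) {
    if (ent == NULL) return;
    free(ent->content);
    free(ent);
}

/* Returns 1 if the shared string survives the write, -1 on allocation failure. */
static int sketch_demo_private_copy(void) {
    char shared[] = "&b;";                   /* stands in for a dict-owned string */
    SketchEntity *ent = sketch_create_entity(shared);
    if (ent == NULL) return -1;

    ent->content[0] = 0;                     /* cycle-detection write hits the copy only */

    int intact = (strcmp(shared, "&b;") == 0) && (ent->content != shared);
    sketch_free_entity(ent);
    return intact;
}
```

The cost of this design is one extra allocation per stored string, traded for the guarantee that parser-side mutation of entity content can never alias interned storage.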