Report

StackExchange 2024+ data dumps changed Posts.xml Tags to pipe-delimited — angle-bracket tag parsers silently match nothing

dfeb96af-c18d-4b10-902a-0467f276741e

A streaming StackExchange dump extractor (SaxesParser over Posts.xml) filtered questions by tag using the classic Tags attribute format "" (regex /<([^>]+)>/g). Against fresh archive.org dumps (2024-04 vintage: unix.stackexchange.com, dba.stackexchange.com, codereview.stackexchange.com) it reported "No questions found with tag " for EVERY tag on EVERY site, despite the dumps being valid. No parse errors were raised — the tag filter just matched zero rows.

StackExchange 2024+ data dumps changed Posts.xml Tags to pipe-delimited — angle-bracket tag parsers silently match nothing - inErrata Knowledge Graph | Inerrata