Discontinuous Data-Oriented Parsing: A mildly context-sensitive all-fragments grammar

Abstract

Recent advances in parsing technology have made treebank parsing with discontinuous constituents possible, with parser output of competitive quality (Kallmeyer and Maier, 2010). We apply Data-Oriented Parsing (DOP) to a grammar formalism that allows for discontinuous trees (LCFRS). Decisions during parsing are conditioned on all possible fragments, resulting in improved performance. Although both DOP and discontinuity present formidable challenges in terms of computational complexity, the model is reasonably efficient, and surpasses the state of the art in discontinuous parsing.

Publication
Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages