Information extraction from semi-structured resources: a two-phase finite state transducers approach. (English)
Bouchou-Markhoff, Béatrice (ed.) et al., Implementation and application of automata. 16th international conference, CIAA 2011, Blois, France, July 13‒16, 2011. Proceedings. Berlin: Springer (ISBN 978-3-642-22255-9/pbk). Lecture Notes in Computer Science 6807, 282-289 (2011).
Summary: The paper presents a new method, based on finite state transducers, for extracting information from semi-structured resources. The method has two clearly distinguished phases. The first, a pre-processing phase, relies strongly on analysis of the document structure and is used to locate data records in the text. The second phase applies finite state transducers built for extracting the information. The transducers can be modified to achieve the desired efficiency and can be reused to extract information from other pre-processed documents. We conclude that even untagged text can be treated as semi-structured, provided its structure can be successfully pre-processed. As a result, we extracted data from free-form encyclopedia text and created a fully structured database of the genotype and phenotype characteristics of organisms.
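The two phases described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout (entries separated by blank lines, a name followed by a colon), the transition table, the field names `name`, `value` and `unit`, and the sample text are all hypothetical and chosen only to show the idea of locating records first and then running a small transducer over each one.

```python
import re

# Hypothetical tokenizer: words, numbers, and the colon that ends a name.
TOKEN = re.compile(r"[A-Za-z]+|\d+(?:\.\d+)?|:")

def locate_records(text):
    """Phase 1 (pre-processing): use document structure to find records.
    Assumption: records are separated by blank lines."""
    return [block.strip() for block in re.split(r"\n\s*\n", text) if block.strip()]

def classify(token):
    """Map a token to the input symbol class the transducer reads."""
    if token == ":":
        return "COLON"
    if token[0].isdigit():
        return "NUM"
    return "WORD"

# Phase 2: transducer as a table (state, input class) -> (next state, output field).
# An output field of None means the token is consumed without emitting anything.
TRANSITIONS = {
    ("name", "WORD"): ("name", "name"),   # words before the colon form the name
    ("name", "COLON"): ("body", None),
    ("body", "WORD"): ("body", None),     # free text is skipped
    ("body", "NUM"): ("num", "value"),    # a number starts a measurement
    ("num", "WORD"): ("body", "unit"),    # the word after a number is its unit
}

def transduce(record):
    """Run the transducer over one record and collect the emitted fields."""
    state, fields = "name", {}
    for token in TOKEN.findall(record):
        # Undefined transitions keep the current state and emit nothing.
        state, field = TRANSITIONS.get((state, classify(token)), (state, None))
        if field is not None:
            fields.setdefault(field, []).append(token)
    return {key: " ".join(tokens) for key, tokens in fields.items()}

# Illustrative input only; not data from the paper.
sample = "Bacillus subtilis: rod shaped cells, 4 um long.\n\nEscherichia coli: rod shaped, 2 um long."
rows = [transduce(record) for record in locate_records(sample)]
```

Because the transducer is just a table, it can be edited or swapped without touching the pre-processing step, which mirrors the reuse the summary mentions.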