Published by Pablo Hermoso de Mendoza González
02/05/2009
Berners-Lee's vision of the Semantic Web is hindered by a chicken-and-egg problem, which can best be solved by bootstrapping: creating enough structured data to motivate the development of applications. We believe that autonomously "semantifying Wikipedia" is the best way to bootstrap. We choose Wikipedia as the initial data source because it is comprehensive, high-quality, modestly sized, and contains enough manually derived structure to bootstrap an autonomous, self-supervised process. In this talk I will present our success to date in this endeavor: a novel approach for self-supervised learning of CRF information extractors; automatic construction of a comprehensive ontology via statistical-relational learning; vast improvements in extraction recall through shrinkage over this on...