Talks and Poster Presentations (with Proceedings-Entry):
"WPPS: A novel and comprehensive framework for web page understanding and information extraction";
Talk: IADIS International Conference WWW/Internet 2012,
- 10-21-2012; in: "Proceeding of the International Conference IADIS WWW/Internet",
B. White, P. Isaias (eds.);
In this paper, we present WPPS, a new, highly configurable Java-based framework for developing efficient and robust methods that address problems in the fields of web page understanding and information extraction. Furthermore, we introduce the representation of a web page as a unified ontological model (UOM), describing its different aspects such as layout, visual features, interface, DOM tree, and its logical structure, as well as their features and relations.
An API provided for the development of new methods makes it possible to combine a declarative approach, represented by a set of inference rules and SPARQL queries, with an object-oriented approach. The latter is realised by providing the level of abstraction needed to work with ontological concepts as Java classes. This abstraction is implemented via the software design pattern "bridged adapter", which is introduced in this paper.
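The abstract does not spell out the "bridged adapter" pattern, so the following is only a minimal sketch of one plausible reading: an adapter exposes an ontology individual as a typed Java object, while a bridge keeps that adapter independent of the underlying triple storage. All names here (TripleStore, InMemoryTripleStore, OntologyAdapter, NavigationMenu, MenuItem, and the predicates "hasItem" and "label") are hypothetical and are not taken from the WPPS API.

```java
import java.util.ArrayList;
import java.util.List;

// Demo: build a tiny model and read it through typed Java objects.
class BridgedAdapterSketch {
    public static void main(String[] args) {
        TripleStore store = new InMemoryTripleStore();
        store.add("menu1", "hasItem", "item1");
        store.add("menu1", "hasItem", "item2");
        store.add("item1", "label", "Home");
        store.add("item2", "label", "Contact");

        NavigationMenu menu = new NavigationMenu("menu1", store);
        for (MenuItem item : menu.items()) {
            System.out.println(item.label()); // prints "Home", then "Contact"
        }
    }
}

// Bridge side (assumption): adapters depend only on this interface,
// so the storage backend can be swapped without touching them.
interface TripleStore {
    void add(String subject, String predicate, String object);
    List<String> objects(String subject, String predicate);
}

// Hypothetical stand-in for a real ontology store.
class InMemoryTripleStore implements TripleStore {
    private final List<String[]> triples = new ArrayList<>();

    @Override
    public void add(String subject, String predicate, String object) {
        triples.add(new String[] {subject, predicate, object});
    }

    @Override
    public List<String> objects(String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (String[] t : triples) {
            if (t[0].equals(subject) && t[1].equals(predicate)) {
                result.add(t[2]);
            }
        }
        return result;
    }
}

// Adapter side (assumption): wraps one ontology individual as a Java object.
abstract class OntologyAdapter {
    protected final String id;
    protected final TripleStore store;

    OntologyAdapter(String id, TripleStore store) {
        this.id = id;
        this.store = store;
    }
}

class MenuItem extends OntologyAdapter {
    MenuItem(String id, TripleStore store) {
        super(id, store);
    }

    // Returns the first "label" value, or an empty string if none exists.
    String label() {
        List<String> labels = store.objects(id, "label");
        return labels.isEmpty() ? "" : labels.get(0);
    }
}

class NavigationMenu extends OntologyAdapter {
    NavigationMenu(String id, TripleStore store) {
        super(id, store);
    }

    // Wraps every "hasItem" object as a typed MenuItem.
    List<MenuItem> items() {
        List<MenuItem> items = new ArrayList<>();
        for (String itemId : store.objects(id, "hasItem")) {
            items.add(new MenuItem(itemId, store));
        }
        return items;
    }
}
```

In a setup like this, declarative queries could run against the store directly, while method code works with NavigationMenu and MenuItem objects and never handles raw triples.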
We illustrate the framework with an example scenario involving web page navigation menus. The framework and the UOM have demonstrated their efficiency in the ABBA and TAMCROW projects.
Created from the Publication Database of the Vienna University of Technology.