Conception method
The WebContent platform is built around a Semantically-Driven Service-Oriented Architecture that:
- provides an open and extensible model for data sharing and an (extensible) set of service definitions based on that model;
- provides application-building guidelines and open-source development toolkits in Java and C++;
- is designed with scalability in mind; and
- is fully compliant with the main Web standards.
Each application is built around a set of services delivered as components by different providers. The platform presents a service integration infrastructure: each service is thoroughly defined and can exchange data with all other services in a format defined for, and adapted to, the targeted type of application. WebContent applications are typically designed and implemented as Web portal applications, using portlets to build rich and configurable front-end graphical user interfaces.
Each service is independent from the others. It is defined by a contract specifying its interface and conditions of use, and the specification of its interface is normalized. The intention is to open this normalization process to a broader community. Similar components, providing the same kind of service but possibly based on different technologies, can be implemented by different suppliers. We consider two broad classes of services: core services form a common base for application development, while application services realize functions related to actual information extraction and processing.
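The idea of one normalized contract with several interchangeable implementations can be illustrated with a minimal Java sketch. The names `TokenizerService`, `WhitespaceTokenizer`, `WordTokenizer` and `TokenCounter` are hypothetical and are not part of the WebContent toolkits; they only show a consumer written against the contract rather than against a particular supplier's component.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical contract: the normalized interface shared by all tokenizer services. */
interface TokenizerService {
    List<String> tokenize(String text);
}

/** Hypothetical supplier A: splits on runs of whitespace and keeps punctuation. */
class WhitespaceTokenizer implements TokenizerService {
    @Override
    public List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }
}

/** Hypothetical supplier B: keeps only runs of letters and digits. */
class WordTokenizer implements TokenizerService {
    @Override
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }
}

/** A consumer that depends on the contract only, never on a concrete supplier. */
class TokenCounter {
    static int count(TokenizerService service, String text) {
        return service.tokenize(text).size();
    }
}
```

Either implementation can be passed to `TokenCounter.count` without changing the consumer, which is the property the contract is meant to guarantee.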
To accomplish a particular business objective through a complex process, several atomic services may have to be combined. For that, we use a WS-BPEL (Web Services Business Process Execution Language) engine. BPEL describes a complex business process using standard operations such as service invocations, conditional blocks, loops and variable assignments. In particular, service calls may be "pipelined", with a service response used directly as input to the next service. High flexibility is achieved through multi-level service composition and runtime endpoint selection. A runnable process may be specified using a graphical BPEL editor. Adding, updating or removing a service implementation has no functional impact on the overall business process: the platform only needs to update its composite process, using BPEL's ability to assign the endpoint reference dynamically before invocation.
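Pipelining and runtime endpoint selection can be pictured with a small Java analogy. This is not BPEL and does not use any BPEL engine API; `EndpointRegistry` and `CompositeProcess` are hypothetical names, and services are reduced to string-to-string functions for the sake of the sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry mapping a logical service name to its current implementation. */
class EndpointRegistry {
    private final Map<String, Function<String, String>> endpoints = new HashMap<>();

    void bind(String serviceName, Function<String, String> implementation) {
        endpoints.put(serviceName, implementation);
    }

    Function<String, String> resolve(String serviceName) {
        Function<String, String> implementation = endpoints.get(serviceName);
        if (implementation == null) {
            throw new IllegalStateException("No endpoint bound for " + serviceName);
        }
        return implementation;
    }
}

/** A composite process: the sequence of logical steps is fixed, the endpoints are not. */
class CompositeProcess {
    private final EndpointRegistry registry;
    private final String[] steps;

    CompositeProcess(EndpointRegistry registry, String... steps) {
        this.registry = registry;
        this.steps = steps;
    }

    String run(String input) {
        String message = input;
        for (String step : steps) {
            // The endpoint is resolved just before invocation, so a rebinding
            // takes effect on the next run without touching the process itself.
            message = registry.resolve(step).apply(message);
        }
        return message;
    }
}
```

Each step's response becomes the next step's input, and replacing an implementation amounts to calling `bind` again under the same logical name.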
The WebContent platform uses Web service and Semantic Web technologies to provide:
- a set of application services to collect, process and exploit structured and unstructured content;
- a set of core services for the management of available resources, in particular storage and querying;
- a middleware and a connector model enabling services to communicate and ensuring their technical interoperability;
- a reference data model to normalize exchanges between services and ensure their semantic interoperability; and
- a suite of tools to realize, integrate, deploy, orchestrate and test the services.
The Orchestration Architecture
The orchestration architecture is built around an Enterprise Service Bus (ESB). In this context, the warehouse's Web services are connected to the bus, and the bus manages their interaction (or composition) according to a high-level imperative specification expressed in WS-BPEL. The bus is responsible for routing messages to and from the Web services, and for ensuring orchestration, i.e., that interactions between services take place in the desired order. The bus-centered architecture is well suited to applications in which the Web services to be used are precisely identified, and interconnected via the bus, before the application is activated.
To facilitate service interoperability with controlled quality of service, the platform includes an ESB that allows the integration of heterogeneous components as service providers. We use Java Business Integration (JBI), a commonly used specification to implement ESBs. It makes it possible to compose heterogeneous components using normalized deployment, requesting, and messaging paradigms. JBI defines two classes of internal components for an ESB:
- Service Engines that implement core services such as XSLT transformations, orchestration, scheduling, load balancing, etc.
- Binding Components (BC) that are used to interoperate with external components using a specific protocol, for example a SOAP BC, a mail BC, an XMPP BC, a CORBA BC, etc.
The ESB can thus be extended to support any messaging protocol and to implement a gateway to other platforms or middleware technologies. Currently, the SOAP binding component is the one commonly used in WebContent applications, since the integrated components are Web Services compatible.
Each supported service is exposed on the bus through an endpoint that a consumer can use to reach it without knowing where it is hosted.
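The role of binding components and bus endpoints can be sketched in a few lines of Java. This is a toy model and deliberately not the JBI API: `BusMessage`, `Binding`, `LineBinding`, `KeyValueBinding` and `ToyBus` are hypothetical names, and the two "protocols" are invented text formats chosen only to show that different external formats are normalized before the bus routes them to an endpoint.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical normalized message: the only format that travels on the bus. */
class BusMessage {
    final String endpoint;
    final String body;

    BusMessage(String endpoint, String body) {
        this.endpoint = endpoint;
        this.body = body;
    }
}

/** Hypothetical binding: translates one external format into a normalized message. */
interface Binding {
    BusMessage normalize(String rawRequest);
}

/** Invented format 1: the first line names the endpoint, the rest is the body. */
class LineBinding implements Binding {
    @Override
    public BusMessage normalize(String rawRequest) {
        int newline = rawRequest.indexOf('\n');
        if (newline < 0) {
            throw new IllegalArgumentException("expected an endpoint line followed by a body");
        }
        return new BusMessage(rawRequest.substring(0, newline).trim(),
                              rawRequest.substring(newline + 1));
    }
}

/** Invented format 2: endpoint=...;body=... */
class KeyValueBinding implements Binding {
    @Override
    public BusMessage normalize(String rawRequest) {
        Map<String, String> fields = new HashMap<>();
        for (String pair : rawRequest.split(";")) {
            int equals = pair.indexOf('=');
            if (equals > 0) {
                fields.put(pair.substring(0, equals).trim(), pair.substring(equals + 1));
            }
        }
        if (!fields.containsKey("endpoint") || !fields.containsKey("body")) {
            throw new IllegalArgumentException("expected endpoint=...;body=...");
        }
        return new BusMessage(fields.get("endpoint"), fields.get("body"));
    }
}

/** Toy bus: services are exposed under an endpoint name; callers never see their location. */
class ToyBus {
    private final Map<String, UnaryOperator<String>> services = new HashMap<>();

    void expose(String endpoint, UnaryOperator<String> service) {
        services.put(endpoint, service);
    }

    String dispatch(Binding binding, String rawRequest) {
        BusMessage message = binding.normalize(rawRequest);
        UnaryOperator<String> service = services.get(message.endpoint);
        if (service == null) {
            throw new IllegalArgumentException("Unknown endpoint: " + message.endpoint);
        }
        return service.apply(message.body);
    }
}
```

Supporting a further protocol in this sketch means adding another `Binding` implementation; the bus and the exposed services stay unchanged.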
The current version of the platform uses the open-source ESB PetALS. Because the platform complies with JBI, however, other JBI-compliant buses may be preferred.
The WebContent data exchange model
To guarantee the full interoperability of services, it does not suffice to normalize protocols and service interfaces. It is also necessary to define the structure and semantics of the data shared or exchanged by services. The WebContent data model must be able to describe all kinds of source documents that go through the textual information extraction process, and be extensible to other kinds of media. To reach this goal, all nineteen partners of the initial project contributed to the elaboration of the model by checking its successive versions against the specific needs of the provided services and of the applications built. The model has since been extended through other R&D projects, and other academic and industrial organizations are now using it (eWokHub, Vitalas, VIGIEs).
The model defines an XML format. WebContent resources can thus be stored in, and queried via, an XML repository. Their structure closely matches that of the associated source documents. Annotations are expressed in RDF and serialized in RDF/XML, which makes it easy to handle semantic data and associate it with each unit of the document.
The figure shows a UML class diagram displaying the elements of the model. Each class inherits from the Resource class and is thus identified by a URI. RDF annotations use this URI to refer to specific units in the documents. Documents are composed of MediaUnits. Recursive structures can be captured using ComposedMediaUnits. Other elements include Text for leaf textual segments and elements to describe tables (with a terminology inspired by XHTML). Multimedia documents may also be represented using different types of BinaryMediaUnits. A mechanism allowing the asynchronous transfer of binary data between services is supported. The different kinds of Segments are used to annotate parts of media units.
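The class relationships described above can be approximated in Java as follows. Only the class names `Resource`, `Document`, `MediaUnit`, `ComposedMediaUnit`, `Text` and `Segment` come from the model; every field, the character-offset representation of a segment, the `plainText` and `covered` helpers, and the `Annotation` stand-in for an RDF statement are assumptions made for this sketch and may differ from the actual XML implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Root of the hierarchy: every element of the model is identified by a URI. */
abstract class Resource {
    final String uri;

    Resource(String uri) {
        this.uri = uri;
    }
}

/** Marks part of a media unit; character offsets are an assumption of this sketch. */
class Segment extends Resource {
    final int start;
    final int end;

    Segment(String uri, int start, int end) {
        super(uri);
        this.start = start;
        this.end = end;
    }
}

abstract class MediaUnit extends Resource {
    final List<Segment> segments = new ArrayList<>();

    MediaUnit(String uri) {
        super(uri);
    }

    abstract String plainText();
}

/** Leaf textual unit. */
class Text extends MediaUnit {
    final String content;

    Text(String uri, String content) {
        super(uri);
        this.content = content;
    }

    @Override
    String plainText() {
        return content;
    }

    /** Returns the text covered by a segment (hypothetical helper). */
    String covered(Segment segment) {
        return content.substring(segment.start, segment.end);
    }
}

/** Captures recursive structure by holding other media units. */
class ComposedMediaUnit extends MediaUnit {
    final List<MediaUnit> children = new ArrayList<>();

    ComposedMediaUnit(String uri) {
        super(uri);
    }

    @Override
    String plainText() {
        List<String> parts = new ArrayList<>();
        for (MediaUnit child : children) {
            parts.add(child.plainText());
        }
        return String.join("\n", parts);
    }
}

class Document extends Resource {
    final List<MediaUnit> mediaUnits = new ArrayList<>();

    Document(String uri) {
        super(uri);
    }
}

/** Hypothetical stand-in for one RDF statement whose subject is a resource URI. */
class Annotation {
    final String subjectUri;
    final String predicate;
    final String value;

    Annotation(String subjectUri, String predicate, String value) {
        this.subjectUri = subjectUri;
        this.predicate = predicate;
        this.value = value;
    }
}
```

Because annotations point at URIs rather than at objects, they can be stored or transmitted separately from the document structure and still be matched back to the unit or segment they describe.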
The combination of data and annotations in the data model allows for a better decoupling between services, each service receiving all the data it needs to do its processing in a single call. It also closely follows the way data are processed in the applications. This model does not impose any restriction on the way the data is actually stored and processed by each service. In fact, some WebContent applications use an XML/RDF store and SPARQL to query the data, while others use the annotations as pure metadata attached to documents in a full-text index. For binary or large data, annotations can be received separately.
An XML implementation of the model, together with examples, documentation and software utilities, is available to the community in a free development kit.
Service features
Each service is:
- independent (provided as a unit);
- defined by a contract (interface + conditions of use);
- normalized (specification of the interface); and
- possibly implemented several times.
Given these features, there are two kinds of services:
- "Business" services realize the functions identified in the functional map.
- Technical services are a common base for application development.
The WebContent "Business" services
The figure lists all the business services that will be provided by the WebContent platform. The reference section of this site will contain an identity card for each of them with its precise API, its implementors, its legal status, etc.