Funny how terms just seem to emerge out of nowhere, isn’t it? I came across the term Semantic Integration the other day, via this posting from Dare Obasanjo: Semantic Integration and XML. Since then I keep tripping over it.
Basically semantic integration seems to involve using RDF/OWL to define mappings between XML vocabularies.
Recently I’ve been involved in co-ordinating a number of large scale data migrations to help integrate several systems. This inevitably involved exploring the business models in the affected applications to define a mapping layer that would allow the data to be cleanly migrated. Also inevitably, this mapping ended up being expressed as a bunch of (hairy!) procedural code. It would have been nice to be able to express that mapping in a more declarative way, if for no other reason than that it’s easier to understand and to debug when problems crop up. An example of not being able to see the wood for the trees.
I found that details such as syntactic variations in field formats (e.g. different date formats) were actually the most trivial aspect. And with appropriate data typing definitions in a semantic mapping layer, the need for these conversions could also be captured.
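To make the "declarative mapping with data typing" idea a bit more concrete, here is a minimal Python sketch. It is not the code from those migrations: the field names (`CUST_NAME`, `JOIN_DT`, `CR_LIMIT`), the `CUSTOMER_MAPPING` table and the helper functions are all made up for illustration. The point is only that the mapping lives in a data structure you can read at a glance, with the date conversion captured as a typed converter rather than buried in procedural code.

```python
from datetime import datetime


def parse_date(fmt):
    """Build a converter that turns a date string in `fmt` into ISO 8601."""
    def convert(value):
        return datetime.strptime(value, fmt).date().isoformat()
    return convert


# Hypothetical mapping: target field -> (source field, converter).
# The whole migration rule set is visible here, in one place.
CUSTOMER_MAPPING = {
    "full_name": ("CUST_NAME", str.strip),
    "joined_on": ("JOIN_DT", parse_date("%d/%m/%Y")),
    "credit_limit": ("CR_LIMIT", float),
}


def apply_mapping(record, mapping):
    """Apply a declarative mapping to a single source record."""
    return {
        target: convert(record[source])
        for target, (source, convert) in mapping.items()
    }


# Illustrative legacy row, not real data.
legacy_row = {"CUST_NAME": "  Ada Lovelace ", "JOIN_DT": "10/12/2003", "CR_LIMIT": "2500"}
migrated = apply_mapping(legacy_row, CUSTOMER_MAPPING)
print(migrated)
```

The generic `apply_mapping` function never changes; debugging a bad migration means reading the table, not stepping through code.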
Approaching integration in this way involves stepping back from the surface syntax and looking at the business model it’s capturing. It doesn’t really matter whether the “syntax” is angle-brackets or a bunch of database DDL. XSLT isn’t the right way to capture these relationships; it’s just a transform, a means to apply the mapping. And XSLT isn’t necessarily the best approach anyway: transforms can get big and ugly really quickly.
Perhaps a better comparison would be semantic integration and architectural forms. The idea behind architectural forms is that one can define an architectural schema (i.e. the business entities) and mappings to this schema from other syntaxes. In the different flavours of RSS there are many elements that mean the same thing but have different names, etc. Using architectural forms you could define a “super-schema” that normalises this data into a suitable form for your application to build upon. The idea is that you don’t write your application to use a specific syntax, you write it against the architectural form. I’ve got more notes on this in my wiki.
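As a rough illustration of the “super-schema” idea applied to RSS, the sketch below maps a few element names from RSS 2.0 and RSS 1.0 (with Dublin Core) onto one set of architectural names. This is not an implementation of architectural forms proper, just the normalisation step written in Python. The `ITEM_FORM` table and the names `title`, `published` and `summary` are my own assumptions about what an application might want.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
RSS1 = "http://purl.org/rss/1.0/"

# Hypothetical architectural schema: architectural name -> concrete
# element names that carry the same meaning in different RSS flavours.
ITEM_FORM = {
    "title": ["title", f"{{{RSS1}}}title", f"{{{DC}}}title"],
    "published": ["pubDate", f"{{{DC}}}date"],
    "summary": ["description", f"{{{RSS1}}}description"],
}


def normalise_item(item):
    """Map one <item> element onto the architectural names."""
    result = {}
    for name, candidates in ITEM_FORM.items():
        for tag in candidates:
            child = item.find(tag)
            if child is not None and child.text:
                result[name] = child.text.strip()
                break
    return result


rss2_item = ET.fromstring(
    "<item><title>Hello</title>"
    "<pubDate>Mon, 08 Dec 2003 10:00:00 GMT</pubDate></item>"
)
rss1_item = ET.fromstring(
    '<item xmlns="http://purl.org/rss/1.0/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<title>Hello</title><dc:date>2003-12-08T10:00:00Z</dc:date></item>"
)

print(normalise_item(rss2_item))
print(normalise_item(rss1_item))
```

The application only ever sees `title`, `published` and `summary`. Note that this only normalises names: the two date values are still in different formats, which is exactly the kind of thing a typed mapping layer would also need to capture.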
It’s possible to implement architectural forms in many ways including simple SAX filters or XSLT. AF proponents will be quick to point out though that architectural forms is not a general purpose transformation tool, it’s more than that.
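For the SAX filter route, something along these lines would do the simple renaming case. It is a sketch using Python’s standard `xml.sax` module; the `RENAMES` table and the `RenameFilter` class are hypothetical, and a real architectural forms processor does considerably more than rename elements.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

# Hypothetical mapping from document element names to architectural names.
RENAMES = {"pubDate": "published", "description": "summary"}


class RenameFilter(XMLFilterBase):
    """SAX filter that renames elements as events pass through it."""

    def startElement(self, name, attrs):
        super().startElement(RENAMES.get(name, name), attrs)

    def endElement(self, name):
        super().endElement(RENAMES.get(name, name))


def rename(xml_text):
    """Run xml_text through the filter and return the serialised result."""
    out = io.StringIO()
    filt = RenameFilter(xml.sax.make_parser())
    filt.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    filt.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()


result = rename("<item><title>Hello</title><pubDate>2003-12-08</pubDate></item>")
print(result)
```

Because the filter sits between the parser and the application’s handler, the application code never needs to know which names the source document actually used.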
I need to do more reading on semantic integration (pointers to examples would be interesting) to be able to compare its approach with architectural forms more closely. However, my gut feeling is that AFs are still “too close” to the syntax, which limits what you can express. For more complex mappings you need more control than simply renaming elements and attributes, hiding sub-trees, etc. And in this area RDF/OWL provides a great deal more flexibility, and the technique is not limited to XML.
I actually helped architect a commercial semantic integration toolkit. We debated, but did not use, RDF/OWL for a variety of reasons — EXPRESS and STEP were more applicable at the time and more widely understood by customers. RDF/OWL and potentially DAML may be more appropriate today, though I wonder about performance and ease of adoption by end users.
There are a few companies that attempt to tackle semantic integration/mediation, but few businesses realize yet that they need a semantic integration toolkit. It’s a poorly understood area, and one that requires changing the way one thinks about integration problems.
I’m confident that the level of abstraction at which people view information is constantly increasing.
Take a look at this product comparison for some actual products in this space:
TQ0303_Semantic%20Integration.PDF