Getting Your Data House in Order: the Case for Ontologies, Taxonomies and Data Dictionaries

Having recently moved from Australia back to the UK, I am currently living in a chaotic house filled with unpacked moving boxes. Even the ample double garage is wall-to-wall with clutter and moving detritus. There are a number of things that vex me about the boxes and the mess:

  1. There are boxes from my 2010 Singapore to Australia move, as yet unpacked after 5 years. (Small comfort that there are none remaining from my 2002 UK to Singapore move!)
  2. Because I used two different movers, the labels on the boxes do not match and follow different numbering systems.
  3. The boxes are labelled with generic tags like “Kitchen: Implements” or “Hallway: Cupboard”.
  4. A lot of the boxes contain things that bear no resemblance to what the tag says; last time I looked I did not keep my natty business socks in the kitchen cupboards, nor the air pump for my camping mattresses in the en-suite bathroom.

Predictably, the only way I will be able to get some semblance of functional order into the arrangement of my new home is to go through each box one by one, large and small, recent and not so recent. The recognition that we need to organise things and information if we are to make sense of our world can be traced back to Aristotle! The problem is that I’ve got my business as usual (BAU) life to run: places to go, people to see and not a whole lot of time for box sorting. It is not surprising, then, that spending an inordinate amount of time upside down inside cardboard containers, unpacking, identifying and ordering all my goodies (in small piles on the floor), is neither an appealing nor a productive use of my time and energies. “Mixed container, Check contents carefully” is not a great help!

Financial services firms, much like households with their stuff, “collect” vast amounts of data for a variety of business purposes; data that needs to be used, moved, stored and re-used. Data that is moved without adequate categorisation and stored without adequate labelling will, like the goodies in my boxes, ultimately become a huge burden that prevents the conduct of revenue-generating business. As long ago as 2002 the Delphi Group noted, in a milestone report, that “Our ability to create information has substantially outpaced our ability to retrieve relevant information.” The data that should enable business execution becomes a black hole of business prevention. Without the systematic implementation and use of Ontologies, Taxonomies and Data Dictionaries, firms end up needing to spend unsustainable amounts of time and money on data archaeology just to keep the business going.

Ontology and Taxonomy

Two big words, but what do they mean, and how can they provide practical utility in the context of data management?

A simple dictionary definition informs us that an ontology is the systematic arrangement of all of the important categories of objects or concepts which exist in a field of discourse, showing the relations between them. When complete, an ontology is a categorisation of all of the concepts in a field of knowledge, including the objects and all of the properties, relations, and functions needed to define the objects and specify their actions. A simplified ontology may contain only a hierarchical classification i.e. a taxonomy; showing the type subsumption relations (incorporating something under a more general category) between concepts in the field of discourse.

A taxonomy, as noted above, is a systematic arrangement of objects or concepts showing the relations between them. In particular, it is a hierarchical arrangement of types in which categories of objects are classified as sub-types of more abstract categories, starting from one or a small number of top categories and descending to more specific types through an arbitrary number of levels. An ontology usually contains a taxonomy as one of its important principles of organisation.
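To make the idea of subsumption concrete, here is a minimal Python sketch of a taxonomy held as a child-to-parent mapping. The category names are borrowed from my moving boxes and, like the helper functions `ancestors` and `is_a`, are purely illustrative assumptions rather than any standard.

```python
# Hypothetical taxonomy: each type maps to its more general parent type.
# A top category has no parent (None).
PARENT = {
    "HouseholdItem": None,
    "Clothing": "HouseholdItem",
    "Socks": "Clothing",
    "BusinessSocks": "Socks",
    "CampingGear": "HouseholdItem",
    "AirPump": "CampingGear",
}


def ancestors(term):
    """Return the more general categories above `term`, nearest first."""
    chain = []
    parent = PARENT[term]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def is_a(term, category):
    """True if `term` is `category` itself or is subsumed by it."""
    return term == category or category in ancestors(term)
```

With this in place, `is_a("BusinessSocks", "Clothing")` holds while `is_a("AirPump", "Clothing")` does not, which is exactly the kind of question a box label should be able to answer.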

It is important to note at this stage that an ontology differs from a data dictionary in that it avoids “technical” terms and objectProperty-style casing, for example socks.BusinessSocks.

The key here is that, as per the EDM Council, the focus of ontologies must be on the “unambiguous shared meaning” of data content. This is because the meaning must represent real concepts (i.e. products, clients, accounts, legal entities, processes, etc.) and real obligations (contractual requirements, transaction guarantees, counterparty processes, etc.) that are the critical factors of input into business and operational processes. In other words, as with the labelling of moving boxes, the practice of superior and sustainable data management must be built on the foundations of rich, descriptive, complete and shared meanings, assigned to the data we use.

How the Described Data Hits The Road

“So, all very cool, but what about an example?” I hear you ask. In the Legal Entity data management context, the plethora of global regulations across KYC/AML, OTC trade management and various tax regimes requires the sourcing, validation and management of data-sets that run to hundreds of distinct pieces of information. For a “standard” KYC process, firms are now collecting upwards of one-hundred-and-seventy individual data elements related to the client they are onboarding. A small example of these would be:

  • Name of the Company
  • The country the company is incorporated in
  • What sort of business activity the company undertakes
  • The industry code that represents the type of activity the company undertakes

Thinking ontologically, these specific pieces of information could broadly be categorised as information related to:

  1. Corporate Registration & Legal Formation and
  2. Business Information & Activities

The plain English description of these categories and their relationships when pertaining to the description of a specific company could be expressed as “A company, as a legal entity, must be registered. A company is registered in a specific country. A company conducts a specific type of business activity. Types of business activities have been assigned standard industry codes.”
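One common way to hold such statements in software is as subject-relation-object triples. The Python sketch below is only an illustration of that idea: the relation names (`is_a`, `registered_in`, `conducts`, `assigned`) and the helper functions are assumptions made for this example, not terms from any published ontology.

```python
# Each plain-English statement becomes a (subject, relation, object) triple.
# Relation names are illustrative assumptions.
TRIPLES = [
    ("Company", "is_a", "LegalEntity"),
    ("Company", "registered_in", "Country"),
    ("Company", "conducts", "BusinessActivity"),
    ("BusinessActivity", "assigned", "IndustryCode"),
]


def relations_of(concept):
    """Return (relation, object) pairs whose subject is `concept`."""
    return [(rel, obj) for subj, rel, obj in TRIPLES if subj == concept]


def describe(concept):
    """Render the relationships of `concept` as simple sentences."""
    return [
        f"{concept} {rel.replace('_', ' ')} {obj}"
        for rel, obj in relations_of(concept)
    ]
```

Calling `describe("Company")` gives back short sentences such as "Company registered in Country", showing that the structured form and the English description carry the same meaning.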

The relationships described above in English can be viewed as simple boxes with relationship connectors. When we need to define these elements for programmatic use, we need to extend our system of labelling to include the next layer of the semantic spectrum: a data dictionary that gives each piece of information its own identifier. In this example we can take “Company” from our ontology as our top-level concept:

  • Company.legalName
  • Company.formation.incorporationCountry
  • Company.businessActivity
  • Company.businessActivity.IndustryCode
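As a rough sketch of how these identifiers might be put to work, the Python snippet below stores them in a small data dictionary and checks an incoming record against it. The assignment of each identifier to a category, the `check_record` helper and the sample record are all assumptions invented for illustration; a real KYC data dictionary would hold far more elements and metadata.

```python
# Hypothetical data dictionary: dotted identifier -> plain-English meaning
# and the ontology category it is assumed to belong to.
DATA_DICTIONARY = {
    "Company.legalName": {
        "meaning": "Name of the company",
        "category": "Corporate Registration & Legal Formation",
    },
    "Company.formation.incorporationCountry": {
        "meaning": "The country the company is incorporated in",
        "category": "Corporate Registration & Legal Formation",
    },
    "Company.businessActivity": {
        "meaning": "What sort of business activity the company undertakes",
        "category": "Business Information & Activities",
    },
    "Company.businessActivity.IndustryCode": {
        "meaning": "The industry code for the type of activity",
        "category": "Business Information & Activities",
    },
}


def check_record(record):
    """Split a flat record's keys into unknown labels and missing elements."""
    unknown = sorted(set(record) - set(DATA_DICTIONARY))
    missing = sorted(set(DATA_DICTIONARY) - set(record))
    return unknown, missing


# Invented sample record with one mislabelled field ("Company.name").
record = {
    "Company.name": "Example Widgets Ltd",
    "Company.formation.incorporationCountry": "GB",
    "Company.businessActivity": "Widget manufacturing",
}
unknown, missing = check_record(record)
```

Here the mislabelled `Company.name` is reported as unknown, while `Company.legalName` and the industry code are reported as missing: the data equivalent of finding socks in a box marked “Kitchen”.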

Starting to make sense?

Benefits of Data Congruence

Even if it makes logical sense, what are the realisable benefits of using and deploying a series of increasingly precise, semantically expressive definitions for data elements? In summary these are:

  • Greatly facilitates data integration
  • Supports enterprise-scale business process automation
  • Enables consolidated views across firms and the broader industry
  • Makes it possible to define concepts and terms globally, without ambiguity
  • Allows for mapping to XBRL (eXtensible Business Reporting Language), the business language used by major regulators to standardise financial reporting terms. Importantly, XBRL itself allows universal communication through metadata taxonomies, which capture and define financial concepts, terms and relationships for regulatory reporting.

Overall, we are looking at a bagful of goodies: improved data quality, easier navigation, more efficient data searches, improved information sharing, essential support for interoperability and integration and, very importantly, a better data user experience.

Hear The Chorus Sing!

At iMeta Technologies, our product development has been focussed on creating a flexible data model designed to comfortably consume and use data created or changed by a semantic data management system. The configuration-driven data model is easily updated without costly or time-consuming code changes and without the overhead of new release cycles.

As part of ensuring this fit-for-purpose and future-proof product design, I have in recent weeks had a number of encouraging conversations with industry participants in the US and here in the United Kingdom. The global discussion is very much focussed on aligning thoughts on both the ongoing need for, and the current momentum behind, the evolution of a Semantics Ecosystem, i.e. a system that facilitates the ongoing coordination and alignment between leaders in industry, academia, government and technology for shared semantics, data visualisation and executable business rules!

By Mark Bands, Head of Product Strategy and Regulatory Intelligence