Our societies produce knowledge and data at an ever-increasing pace. This knowledge and data are generated independently by autonomous individuals or companies. They are heterogeneous, and exploiting them jointly requires connecting them.
However, data and knowledge have to evolve in response to changes in what they represent, changes in the context in which they are used, and connections to new data and knowledge sources. These sources are currently mostly maintained by hand. As they grow and become more interconnected, this becomes less sustainable. But if knowledge does not evolve, it will freeze, leading to certain obsolescence.
Beyond the production of knowledge on the semantic web and linked data, this problem applies to any domain in which knowledge is produced in a way usable by computers. For instance, smart cities or the internet of things produce a wealth of changing data. The knowledge about this data has to evolve continuously to remain up-to-date as new data sources are encountered and conditions are changing. Knowledge must evolve organically with the life of its users.
This problem lies in the lack of autonomous evolution of heterogeneous knowledge. No one waits for knowledge to be perfect before using it and agents and societies cannot be interrupted for upgrading their knowledge. Hence, knowledge has to be situated, i.e. considered with respect to its use (called situation), and evolve continuously, i.e. without interruption.
mOeX addresses the seamless evolution of knowledge representations in individuals and populations. The question at the core of our proposal is how to make knowledge representations evolve continuously in the presence of environment changes and new knowledge sources. Currently, no satisfactory solution to this problem exists.
To tackle this problem, we start from two specific hypotheses:
Based on these hypotheses, we study populations of agents sharing knowledge through interaction. The interactions may be carried out through precisely specified modalities (which may involve direct knowledge exchange, talking, acting together or in presence). After interacting, when they discover that constraints have changed, agents will not relearn knowledge from scratch. Instead, adaptation operators, taking into account the current knowledge and other constraints, will adapt it to the new constraints. We study how knowledge evolves when these populations:
The highly difficult problem is not to have procedures allowing such agents to converge towards a common state of knowledge, but to characterise this state by the properties satisfied by the resulting knowledge. Such properties may, for instance, be:
What is radically new here is that these problems are approached from the standpoint of the resulting knowledge representations. The work of mOeX will contribute to answering the following questions:
Our ambition is to spark a new approach to knowledge evolution that we call cultural knowledge evolution. It designs, studies, and experiments with mechanisms for making knowledge representations serendipitously evolve through their use. This should enable developing and sharing complex knowledge in a more robust way.
Now is the right time to start such a research programme: on the one hand, developments on the semantic web provide us with proven knowledge representation formalisms and tools which have been designed for sharing knowledge; on the other hand, work on experimental cultural evolution provides a solid methodology for carrying out this type of research. This approach has not been applied yet to knowledge representation directly. Both fields are mature enough to be associated.
To investigate the foundations of situated knowledge evolution we need an approach that:
Thus, mOeX will develop a unique combination of knowledge representation and experimental cultural evolution methods. Knowledge representation provides formal models of knowledge; experimental cultural evolution provides a well-defined framework for studying situated evolution. We do not intend to replace symbolic representation, but to complement it.
The reasons why these approaches are well adapted are the following:
Our methodology involves the following three tasks interacting together in a constant feedback:
Finally, to ensure the repeatability and reusability of experiments, we aim to develop a software platform supporting this approach.
Our cultural knowledge evolution work currently focusses on alignment evolution. Alignment repair experiments have revealed that, by playing simple interaction games, agents can effectively repair random networks of ontologies or even create new alignments.
Alignments between ontologies may be established by agents that hold these ontologies, attempt to communicate, and take appropriate action when communication fails. We have tested this approach on alignment repair, i.e. the improvement of incorrect alignments. For that purpose, we performed a series of experiments in which agents react to mistakes in alignments. Agents may use ontology alignments to communicate when they represent knowledge with different ontologies: alignments help reclassify objects from one ontology to the other. Such alignments may be provided by dedicated algorithms [Da Silva 2017a, 2018a], but their accuracy is far from satisfactory. Yet agents have to proceed. Agents only know about their own ontologies and their alignments with others, and they act in a fully decentralised way. They can take advantage of their experience to evolve alignments: upon communication failure, they adapt the alignments to avoid repeating the same mistake.
Such repair experiments have been performed [Euzenat 2014c] and revealed that, by playing simple interaction games, agents can effectively repair random networks of ontologies.
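As a rough illustration of such a game, the following Python sketch lets an agent translate an object through a randomly chosen applicable correspondence and delete that correspondence when the other agent disagrees. The toy ontologies (`classes_a`, `classes_b`), the reading of a correspondence as a subsumption between two classes, and all function names are assumptions made for this example. Only the simplest operator, delete, is shown; the sketch does not reproduce the protocol of the cited experiments.

```python
import random

# Hypothetical toy ontologies: each one classifies an integer object.
def classes_a(obj: int) -> set[str]:
    return {"Small"} if obj < 4 else {"Large"}

def classes_b(obj: int) -> set[str]:
    return {"Low"} if obj < 4 else {"High"}

def play_game(obj: int, alignment: set[tuple[str, str]], rng: random.Random) -> bool:
    """Play one game on `obj`; return True on success, False on failure.

    A correspondence (a, b) is read as "class a of the first ontology is
    subsumed by class b of the second" (an assumption of this sketch).
    On failure, the faulty correspondence is removed: the 'delete' operator.
    """
    applicable = sorted(c for c in alignment if c[0] in classes_a(obj))
    if not applicable:
        return True  # nothing to translate with, hence nothing to repair
    source, target = rng.choice(applicable)
    if target in classes_b(obj):
        return True
    alignment.discard((source, target))
    return False

def repair(alignment: set[tuple[str, str]], rounds: int = 1000, seed: int = 0) -> int:
    """Play `rounds` games on random objects; return the number of failures."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(rounds):
        obj = rng.randrange(8)
        if not play_game(obj, alignment, rng):
            failures += 1
    return failures

# Start from an alignment containing two correct and two incorrect correspondences.
alignment = {("Small", "Low"), ("Small", "High"), ("Large", "Low"), ("Large", "High")}
failures = repair(alignment)
print(sorted(alignment), failures)
```

Each incorrect correspondence can fail at most once before it is deleted, so the number of failures is bounded by the number of incorrect correspondences. With delete alone, an alignment can only shrink, which is why richer operators are needed to recover correspondences.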
We repeated these experiments and, using new measures, showed that the quality of previous results was underestimated. We introduced new adaptation operators (refine, addjoin and refadd) that improve on those previously considered (delete, replace and add). We also allowed agents to go beyond the initial operators in two ways [Euzenat 2017a]: they can generate new correspondences when they discard incorrect ones, and they can provide less precise answers. The combination of these modalities satisfies the following properties:
The results above show 100% precision for all adaptation operators, i.e. all the correspondences in the alignments were correct. However, the alignments were still missing some correspondences and thus did not achieve 100% recall. We had conjectured that this was due to a phenomenon called reverse shadowing [Euzenat 2017a], which prevents specific correspondences from being found.
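For reference, precision, recall and F-measure can be computed as in the sketch below. It uses the standard set-based definitions against a reference alignment; the measures used in the cited experiments may differ (for instance by taking entailed correspondences into account), and the sample data is invented.

```python
def evaluate(alignment: set, reference: set) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) of `alignment` against `reference`."""
    correct = len(alignment & reference)
    # Convention assumed here: an empty alignment contains no error.
    precision = correct / len(alignment) if alignment else 1.0
    recall = correct / len(reference) if reference else 1.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Invented example: every correspondence found is correct, but one is missing.
reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
found = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}
precision, recall, f_measure = evaluate(found, reference)
print(precision, recall, f_measure)
```

In this example precision is 1.0 and recall is 0.75, which mirrors the situation described above: nothing incorrect remains, but some correspondences are not found.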
We introduced a new adaptation modality, strengthening, to test the reverse shadowing hypothesis. The strengthening modality replaces a successful correspondence by one of its subsumed correspondences covering the current instance. This modality differs from those developed so far because it leads agents to adapt their alignment when the game played has been a success (previously, adaptation always followed a failure). We defined three variants of this modality, depending on whether the agent chooses the most general, the most specific or a random such correspondence.
We showed experimentally that strengthening does not interfere with the other modalities as long as the add operator is used. This means that all properties of the previous adaptation operators are preserved. Moreover, as expected, recall was greatly increased, to the point that some operators achieve 99% F-measure. However, the agents still do not reach 100% recall.
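The three variants of strengthening can be sketched as follows. This is a minimal illustration under stated assumptions: the target ontology is a tree of classes (the hypothetical `PARENT` map), and the subsumed correspondences of (source, target) are obtained by replacing the target with one of its subclasses that contains the current instance. Names and data are invented for the example.

```python
import random

# Hypothetical target ontology: child class -> parent class (a tree rooted at "Thing").
PARENT = {"Animal": "Thing", "Bird": "Animal", "Penguin": "Bird"}

def ancestors_or_self(cls: str) -> list[str]:
    """Classes containing an instance of `cls`, most specific first."""
    chain = [cls]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def strengthen(corr, instance_class, strategy, rng=None):
    """Replace a successful correspondence by a more specific one.

    `corr` is (source, target); `instance_class` is the most specific class
    of the current instance in the target ontology.
    """
    source, target = corr
    chain = ancestors_or_self(instance_class)
    if target not in chain:
        raise ValueError("the correspondence did not succeed on this instance")
    # Classes strictly below `target` that still cover the instance.
    candidates = chain[:chain.index(target)]
    if not candidates:
        return corr  # already as specific as possible for this instance
    if strategy == "most_specific":
        chosen = candidates[0]
    elif strategy == "most_general":
        chosen = candidates[-1]
    elif strategy == "random":
        chosen = (rng or random).choice(candidates)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return (source, chosen)

general = strengthen(("Oiseau", "Thing"), "Penguin", "most_general")
specific = strengthen(("Oiseau", "Thing"), "Penguin", "most_specific")
print(general, specific)
```

The strengthened correspondence may turn out to be incorrect for other instances; in that case a later failure would trigger one of the repair operators.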
The work on expansion suggests that, with the expansion modality, agents could develop alignments from scratch. We explored the use of expanding repair operators for that purpose. When starting from empty alignments, agents fail to create them as they have nothing to repair. Hence, we introduced the capability for agents to risk adding new correspondences when no existing one is useful [Euzenat 2017b]. We compared and discussed the results provided by this modality and showed that, thanks to this generative capability, agents reach better results than without it in terms of the accuracy of their alignments. When starting with empty alignments, alignments reach the same quality level as when starting with random alignments, thus providing a reliable way for agents to build alignments from scratch through communication. Past an initial phase in which the figures reflect the initial conditions, the evolution curves of both approaches (random and empty alignments) superimpose nearly exactly. This supports, a posteriori, the experiments with random initialisation.
We started taking the population standpoint on experimental cultural evolution. For that purpose we introduced the concept of population within the experiments. So far, a population is characterised as a set of agents sharing the same ontology. Such agents play the same alignment repair games as before with agents of other populations.
We explored how closely these operators resemble logical dynamics. We developed a variant of Dynamic Epistemic Logic to capture the dynamics of the cultural alignment repair game. The ontologies are modelled as knowledge and alignments as beliefs in a variant of plausibility-based dynamic epistemic logic. The dynamics of the game is captured through the (public) announcement of the game outcome, and the adaptation operators are defined through conservative upgrades, i.e. modalities that transform models by reordering world plausibility. This allowed us to formally establish some limitations and redundancy of the operators [van den Berg 2019a]. More precisely, for a complete logical reasoner, the operators are redundant and some may be inconsistent with the agent's knowledge.
These results hold for one agent in the game but not necessarily for the other that may not know the classes by which the alignment is repaired, nor the relations between them. The former can be dealt with by declaring that agents are aware of the signature of both ontologies (public signature assumption) but this does not allow ontologies to evolve. We are currently investigating partial semantics as a more dynamic alternative solution to this problem.
Cultural evolution may be studied at a 'macro' level, inspired by population dynamics, or at a 'micro' level, inspired by genetics. The replicator-interactor model generalises the genotype-phenotype distinction of genetic evolution. We considered how it can be applied to cultural knowledge evolution experiments [Euzenat 2019a]. More specifically, we consider knowledge as the replicator and the behaviour it induces as the interactor. We showed that this requires addressing problems concerning transmission. We discussed the introduction of horizontal transmission within the replicator-interactor model and/or differential reproduction within cultural evolution experiments.
The benchmarks, results and software are available at http://lazylav.gforge.inria.fr.
At this stage, we have developed a workable methodology and tools to investigate cultural knowledge evolution, and specifically alignment repair. We have gathered a wide range of adaptation operators and modalities. We can now work towards experimenting with how operators and modalities can be selected by agents. We can also consider how they may be linked to 'cultural values' and what consequences this entails.
|Publications on cultural knowledge evolution|
We are continuing our work on link keys for data interlinking in two specific directions:
Link keys can also be thought of as axioms in a description logic. As such, they can contribute to infer ABox axioms, such as links, terminological axioms, or other link keys. This has important practical applications, such as link key inference, link key consistency and link key redundancy checking. Yet, no reasoning support existed for link keys.
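To make concrete how a link key entails links, the sketch below generates links between two small data sets. It assumes one common reading of link keys: a link key is a set of property pairs, and two instances are linked when, for every pair, they share at least one value. The data representation (dictionaries of value sets), the function name and the sample data are assumptions of this example, and class restrictions are left out.

```python
Description = dict[str, set[str]]  # property name -> set of values

def generate_links(link_key: list[tuple[str, str]],
                   source: dict[str, Description],
                   target: dict[str, Description]) -> set[tuple[str, str]]:
    """Return the pairs of instances that agree on every property pair of `link_key`."""
    if not link_key:
        raise ValueError("a link key needs at least one pair of properties")
    links = set()
    for s_id, s in source.items():
        for t_id, t in target.items():
            # Agreement on (p, q): the value sets of p and q intersect.
            if all(s.get(p, set()) & t.get(q, set()) for p, q in link_key):
                links.add((s_id, t_id))
    return links

# Invented data: two instances share a name, only one also shares the birth year.
source = {
    "a:1": {"name": {"Victor Hugo"}, "birth": {"1802"}},
    "a:2": {"name": {"Victor Hugo"}, "birth": {"1950"}},
}
target = {
    "b:x": {"label": {"Victor Hugo", "V. Hugo"}, "born": {"1802"}},
    "b:y": {"label": {"Emile Zola"}, "born": {"1840"}},
}
links = generate_links([("name", "label"), ("birth", "born")], source, target)
print(links)
```

A reasoner supporting link keys would derive such links as ABox axioms together with the other consequences of the ontologies, which this brute-force comparison does not attempt.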
We previously extended the tableau method designed for the ALC description logic to support reasoning with link keys in ALC [Gmati 2016a]. We showed how this extension enables combining link keys with classical terminological reasoning with and without ABox and TBox and generating non-trivial link keys. We further extended the method and proved that the extended method terminates and is sound and complete, and that its complexity is 2ExpTime.
A first method has been designed to extract and select link keys from two classes; it deals with multiple values but not object values [Atencia 2014b]. Moreover, the extraction step has been rephrased in formal concept analysis (FCA), which makes it possible to generate link keys across relational tables [Atencia 2014d]. We also used pattern structures, an extension of FCA with ordered structures, to reformulate this problem [Abbas 2019a].
We have extended this latter work so that it can deal with multiple object values when the data set is cycle free. This encoding does not necessarily generate the optimal link keys. Hence, we use relational concept analysis (RCA), an extension of FCA taking relations between concepts into account [Atencia 2019z]. We show that a new expression of this problem is able to extract the optimal link keys even in the presence of cyclic dependencies. Moreover, the proposed process does not require information about the alignments of the ontologies to find out from which pairs of classes to extract link keys.
We implemented these methods and evaluated them by reproducing the experiments made in previous studies [Vizzini 2017a]. This shows that the method extracts the expected results, but also exposes scalability issues, which were equally expected.
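The FCA view of link key extraction mentioned above can be illustrated on a toy example. One possible reading, assumed here, is that the objects of the formal context are pairs of instances, the attributes are pairs of properties, and a pair of instances has an attribute when the two instances share a value for it; the non-empty concept intents are then taken as candidate link keys. The brute-force sketch below computes these intents as the closure under intersection of the agreement sets. It ignores object values and relations, and all names and data are invented.

```python
Description = dict[str, set[str]]  # property name -> set of values

def agreement(s: Description, t: Description) -> frozenset:
    """Property pairs (p, q) on which the two instances share a value."""
    return frozenset((p, q) for p, vs in s.items() for q, vt in t.items() if vs & vt)

def candidate_link_keys(source: dict[str, Description],
                        target: dict[str, Description]) -> set[frozenset]:
    """Non-empty agreement sets of instance pairs, closed under intersection."""
    intents = {agreement(s, t) for s in source.values() for t in target.values()}
    intents.discard(frozenset())
    closed = set(intents)
    frontier = set(intents)
    while frontier:
        new = set()
        for x in frontier:
            for y in closed:
                z = x & y
                if z and z not in closed:
                    new.add(z)
        closed |= new
        frontier = new
    return closed

source = {
    "a1": {"name": {"Ann"}, "city": {"Paris"}},
    "a2": {"name": {"Bob"}, "city": {"Paris"}},
}
target = {
    "b1": {"label": {"Ann"}, "town": {"Paris"}},
    "b2": {"label": {"Bob"}, "town": {"Lyon"}},
}
candidates = candidate_link_keys(source, target)
for key in sorted(candidates, key=sorted):
    print(sorted(key))
```

Comparing all pairs of instances is quadratic in the size of the data sets, which gives an idea of why scalability is a concern for such methods.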
We investigated the use of link keys taking advantage of ontologies. This can be carried out in two different directions: exploiting the ontologies under which data sets are published, and extracting link keys using ontology constructors for combining attribute and class names.
Following the first approach, we extended our existing algorithms to extract link keys involving inverse (
For certain data sets, it may be necessary to use several link keys, even on the same pair of classes, for retrieving a more complete link set. We introduced operators to combine link keys over the same pair of classes, investigated their relations and extended measures to evaluate their quality. We specifically proposed strategies to extract disjunctions from RDF data and apply existing quality measures to evaluate them. We experimented with these strategies showing their benefits [Atencia 2019c].
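The benefit of combining link keys can be seen on a small example. The sketch below assumes that a disjunction of link keys generates the union of the links generated by each of them, and that a single link key links two instances when they share a value for each of its property pairs. Function names and data are invented for illustration.

```python
Description = dict[str, set[str]]  # property name -> set of values
LinkKey = list[tuple[str, str]]    # pairs of (source property, target property)

def links_for(link_key: LinkKey,
              source: dict[str, Description],
              target: dict[str, Description]) -> set[tuple[str, str]]:
    """Links generated by one link key: agreement on every property pair."""
    return {
        (s_id, t_id)
        for s_id, s in source.items()
        for t_id, t in target.items()
        if link_key and all(s.get(p, set()) & t.get(q, set()) for p, q in link_key)
    }

def links_for_disjunction(link_keys: list[LinkKey],
                          source: dict[str, Description],
                          target: dict[str, Description]) -> set[tuple[str, str]]:
    """Links generated by a disjunction: union over the individual link keys."""
    links: set[tuple[str, str]] = set()
    for key in link_keys:
        links |= links_for(key, source, target)
    return links

# Invented data: one book is identified by a code, the other by title and author.
source = {
    "a:1": {"isbn": {"123"}, "title": {"Les Misérables"}},
    "a:2": {"title": {"Germinal"}, "author": {"Zola"}},
}
target = {
    "b:1": {"code": {"123"}},
    "b:2": {"name": {"Germinal"}, "writer": {"Zola"}},
}
by_code = [("isbn", "code")]
by_title_author = [("title", "name"), ("author", "writer")]
combined = links_for_disjunction([by_code, by_title_author], source, target)
print(sorted(combined))
```

Neither link key alone retrieves both links here, whereas their disjunction does; the price is that any incorrect link generated by either key is also inherited.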
|Publications on data interlinking|
Initial project proposal (2016): pdf
Synthesis report: 2016-2019
Activity report 2017: pdf, html; 2018: pdf, html; 2019: pdf, html;
Publications: our paper section (from which references are taken)
Short introduction in French: Bulletin de l'AFIA 105:40-43, 2019
mOeX is building on top of the results of the Exmo project whose pages may provide some background information on previous work.