Jérôme Euzenat, Yasser Bourahla, 08/2022 Revised 12/2023
This notebook contains code and results for the paper 'Measuring and controlling diversity'. It has been reengineered to separate the code from the notebook.
If you see this notebook as a simple HTML page, then it has been generated by the notebook2022 found in this archive.
This is not a maintained software package (no git repo) but all the code is available as a python file (kdiv.py).
It is provided under the MIT License.
Here are the 7 distributions of the paper (a,b,c,d,e,f,g) of 5 ontologies (A, B, C, D, E) among 10 agents. They are encoded as arrays.
We provide three extra distributions (h, i, j) for experimentation.
The distances between the 5 ontologies are coded into arrays.
So there is no programmatic connection between knowledge distance and diversity: the distances are given as data rather than computed.
Such measures may be found in:
The code for computing various diversity measures is provided in the knowledge-diversity python file.
Each measure implements the signature: diversity( distrib, dissimilarity ): float
The measures are:
- structdist: computes the average distance between the categories of the distribution;
- calcdiam: computes the diameter of the distribution;
- median: computes the median of the distribution.
The entropy-based diversity measures are provided in two flavours:
- entropy (additional parameter q): computes the generalised entropy-based diversity measure. This is the initial naïve version;
- diversity (additional parameter q): a better implementation of the entropy-based diversity measure, which also handles the limit case $q=1$.
The normalised versions are now those which have been reimplemented by Adrien Bonardel (see this notebook).
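To make the signature and the limit case $q=1$ concrete, here is a minimal sketch of a similarity-sensitive diversity of order q in the Leinster-Cobbold form. It is not the code of kdiv.py. It assumes that dissimilarities lie in [0, 1] and that similarity is taken as 1 minus dissimilarity. The names `sketch_diversity` and `unstruct` are hypothetical.

```python
import math

def sketch_diversity(distrib, dissimilarity, q=2.0):
    """Similarity-sensitive diversity of order q (Leinster-Cobbold form).

    distrib: number of agents per ontology, e.g. [2, 2, 2, 2, 2].
    dissimilarity: square matrix of distances, assumed to lie in [0, 1].
    Assumption: similarity is 1 - dissimilarity.
    """
    total = sum(distrib)
    p = [n / total for n in distrib]
    k = len(p)
    # Ordinariness of each category: (Zp)_i = sum_j (1 - d_ij) * p_j
    zp = [sum((1.0 - dissimilarity[i][j]) * p[j] for j in range(k))
          for i in range(k)]
    support = [i for i in range(k) if p[i] > 0]
    if math.isclose(q, 1.0):
        # Limit case q = 1: exp(-sum_i p_i * log((Zp)_i))
        return math.exp(-sum(p[i] * math.log(zp[i]) for i in support))
    s = sum(p[i] * zp[i] ** (q - 1.0) for i in support)
    return s ** (1.0 / (1.0 - q))

# Unstructured dissimilarity: any two distinct ontologies are at distance 1.
unstruct = [[0.0 if i == j else 1.0 for j in range(5)] for i in range(5)]
print(sketch_diversity([2, 2, 2, 2, 2], unstruct, q=2))   # five equally used ontologies
print(sketch_diversity([10, 0, 0, 0, 0], unstruct, q=1))  # a single ontology
```

With the unstructured dissimilarity, this reduces to the usual Hill numbers: a uniform distribution over five ontologies has diversity close to 5 for every q, and a distribution concentrated on one ontology has diversity 1.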
Finally the results to be found in Table 2 of the paper are gathered here.
These results include, in addition to those submitted:
Here is an attempt to induce a partial order on distributions from the orders given by the diversity measures.
The algorithm is quite simple:
Note: Tom Leinster mentions that he restricts this to $q\geq 0$ (for reasons he does not explain, but which are discussed on page 121 of his book).
The result is as follows:
As can be observed from these results, the different distributions cannot be totally ordered.
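A minimal sketch of one way to obtain such a partial order, assuming a dominance criterion: a distribution is placed below another only if its diversity is lower or equal for every value of q considered. Both this criterion and the names `hill`, `dominated` and `partial_order` are assumptions made for illustration. The sketch uses the unstructured diversity (Hill numbers) only, and the example distributions are illustrative, not those of the paper.

```python
import math

def hill(counts, q):
    """Unstructured diversity (Hill number) of order q."""
    total = sum(counts)
    p = [n / total for n in counts if n > 0]
    if math.isclose(q, 1.0):
        # Limit case q = 1: exponential of the Shannon entropy
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

def dominated(x, y, qs, eps=1e-9):
    """True if x is no more diverse than y for every q in qs."""
    return all(hill(x, q) <= hill(y, q) + eps for q in qs)

def partial_order(distribs, qs):
    """Pairs (a, b) of names such that a is dominated by b, with a != b."""
    return {(a, b) for a in distribs for b in distribs
            if a != b and dominated(distribs[a], distribs[b], qs)}

# Illustrative distributions (not those of the paper).
examples = {"u": [2, 2, 2, 2, 2], "x": [6, 1, 1, 1, 1], "y": [4, 4, 2, 0, 0]}
order = partial_order(examples, qs=[0, 1, 2])
print(sorted(order))
```

In this example, x and y are both dominated by u, but they are incomparable with each other: x is more diverse than y for q=0 and q=1, and less diverse for q=2. This is the kind of crossing that prevents a total order.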
We start with a distribution and generate distributions with lower diversity. Ideally, it should be possible to start with a high diversity distribution. Then we want to achieve some levels of diversity. This is always with respect to a specific diversity measure.
For that purpose, the algorithm modifies the distribution one agent at a time. At each stage, it chooses the modification that decreases the diversity the least (the choice is local, so the overall sequence is not guaranteed to be optimal).
It can be called by
selectdistribs( [2,2,2,2,2], unstructdist, 4 )
which provides a sequence of 4 distributions, starting from the [2,2,2,2,2] distribution, whose diversity levels are evenly spread (with respect to the diversity computed with the unstructured distance and $q=2$).
It returns the distributions and their (non normalised) diversity level.
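The one-agent-at-a-time step can be sketched as follows. This is a minimal illustration, not the selectdistribs function of kdiv.py: it only performs the local descent and does not select evenly spread levels. The names `hill2` and `greedy_descent` are hypothetical, and the measure is assumed to be the unstructured diversity of order $q=2$.

```python
def hill2(counts):
    """Unstructured diversity of order q = 2 (inverse Simpson index)."""
    total = sum(counts)
    return 1.0 / sum((n / total) ** 2 for n in counts)

def greedy_descent(start, measure, eps=1e-12):
    """Move one agent at a time, each time choosing the move that lowers
    the diversity the least, until no move lowers it any further."""
    current = list(start)
    path = [(tuple(current), measure(current))]
    while True:
        level = path[-1][1]
        best = None
        for i in range(len(current)):
            if current[i] == 0:
                continue  # no agent to move away from ontology i
            for j in range(len(current)):
                if i == j:
                    continue
                cand = list(current)
                cand[i] -= 1
                cand[j] += 1
                d = measure(cand)
                # Keep the candidate with the highest diversity below the current level
                if d < level - eps and (best is None or d > best[1] + eps):
                    best = (cand, d)
        if best is None:
            return path
        current = best[0]
        path.append((tuple(current), best[1]))

for distrib, level in greedy_descent([2, 2, 2, 2, 2], hill2):
    print(distrib, round(level, 3))
```

Starting from [2,2,2,2,2], the descent ends when all agents share one ontology, which is the minimum of this measure. A selection of evenly spread levels could then be made among the distributions of the returned path.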
The result of the selectdistribs call above is:
Something interesting in these modest results: