Adrien Bonnardel, 12/2023
This notebook presents work on computing knowledge diversity.
It extends the work on 'Measuring and controlling diversity' [Bourahla 2022], which can be found here. Some cells of that notebook are reproduced below to display the data.
If you see this notebook as a simple HTML page, then it has been generated by the notebook found in this archive.
This is not a maintained software package (no git repo) but all the code is available as a python file (kdiv.py).
It is provided under the MIT License.
The diversity measure implements entropy-based diversity (parameters: `distribution`, `similarity`, `q`; result: float) following Tom Leinster's work [Leinster 2021].
This $q$-parametrised measure computes a diversity index from a distribution ($p$) of a population into categories over which a similarity measure ($Z$) is defined.
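Concretely, the order-$q$ similarity-sensitive diversity of [Leinster 2021] is $^qD^Z(p) = \big(\sum_{i:p_i>0} p_i (Zp)_i^{q-1}\big)^{1/(1-q)}$, with limit cases at $q=1$ and $q=\infty$. A minimal sketch in NumPy; the function name and signature here are illustrative and need not match those of kdiv.py:

```python
import numpy as np

def diversity(distribution, similarity, q):
    """Similarity-sensitive diversity of order q (sketch after [Leinster 2021])."""
    p = np.asarray(distribution, dtype=float)
    Z = np.asarray(similarity, dtype=float)
    zp = Z @ p                      # "ordinariness" of each category
    support = p > 0                 # only categories with non-zero proportion count
    if q == 1:                      # limit case: geometric-mean form
        return float(np.prod(zp[support] ** (-p[support])))
    if np.isinf(q):                 # limit case q = infinity
        return float(1.0 / np.max(zp[support]))
    return float(np.sum(p[support] * zp[support] ** (q - 1)) ** (1.0 / (1 - q)))
```

With $Z$ the identity matrix (all categories fully dissimilar), this reduces to the classical Hill numbers: a uniform distribution over $n$ categories has diversity $n$ for every $q$.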
We recall below the distributions and similarities used in [Notebook].
Here are the 7 distributions of the paper (a, b, c, d, e, f, g), distributing 5 ontologies (A, B, C, D, E) among 10 agents. They are encoded as arrays.
We provide three extra distributions (h, i, j) for experimentation.
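The encoding is straightforward: one proportion per ontology. The values below are hypothetical, for illustration only; the actual distributions a–j are defined in the notebook cells.

```python
import numpy as np

# Hypothetical examples: 10 agents distributed over 5 ontologies (A..E).
# Each entry is the proportion of agents holding that ontology, so with
# 10 agents every entry is a multiple of 0.1 and each array sums to 1.
a = np.array([0.2, 0.2, 0.2, 0.2, 0.2])   # uniform over the 5 ontologies
h = np.array([0.6, 0.1, 0.1, 0.1, 0.1])   # one dominant ontology

for p in (a, h):
    assert np.isclose(p.sum(), 1.0)       # a distribution must sum to 1
```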
The distances between the 5 ontologies are coded into arrays.
Hence there is no programmatic connection between knowledge distance and diversity.
Such measures may be found in:
Hereafter these distances $d$ are turned into a similarity $Z=e^{-d}$.
`kdiv.distsim( distance )` computes this transformation.
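A minimal sketch of this transformation, applied entrywise with NumPy (an illustrative reimplementation, not the actual kdiv.distsim code):

```python
import numpy as np

def distsim(distance):
    """Turn a distance matrix d into a similarity matrix Z = exp(-d),
    applied entrywise (sketch of the transformation described above)."""
    return np.exp(-np.asarray(distance, dtype=float))

# Since d(i, i) = 0, the diagonal of Z is 1: maximal self-similarity.
d = np.array([[0.0, 1.0],
              [1.0, 0.0]])
Z = distsim(d)
```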
In order to normalise diversity it is necessary to find what the maximal diversity $Dmax$ is.
It has been shown that $Dmax$ and the distribution achieving it are independent of $q$ [Leinster 2021]; $Dmax$ depends only on $Z$.
The process for finding $Dmax$ involves in principle:
This is implemented by the function kdiv.find_max_dist, which takes the similarity $Z$ as argument and returns a tuple with $Dmax$, $B$, $w_B$, $Z_B$, $p_B$, for $B$ the basis of the maximal distribution.
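The subset search can be sketched as follows. This is an illustrative brute-force version, not the actual kdiv.find_max_dist code; it follows the principle described above: for each subset $B$ of categories, solve $Z_B w = \mathbf{1}$, and when the weighting $w$ is non-negative, $|Z_B| = \sum_i w_i$ is a candidate for $Dmax$ with maximising distribution $p_B = w / |Z_B|$.

```python
from itertools import combinations
import numpy as np

def find_max_dist(Z):
    """Brute-force search for Dmax over all category subsets (sketch).

    Returns a tuple mirroring the one described in the text:
    (Dmax, B, w_B, Z_B, p_B).
    """
    n = Z.shape[0]
    best = None
    for size in range(1, n + 1):
        for B in combinations(range(n), size):
            ZB = Z[np.ix_(B, B)]
            try:
                w = np.linalg.solve(ZB, np.ones(size))  # weighting of Z_B
            except np.linalg.LinAlgError:
                continue                                # singular submatrix: skip
            if np.all(w >= 0):                          # valid candidate
                mag = w.sum()                           # magnitude |Z_B|
                if best is None or mag > best[0]:
                    p = np.zeros(n)
                    p[list(B)] = w / mag                # maximising distribution
                    best = (mag, B, w, ZB, p)
    return best
```

For the identity similarity (fully dissimilar categories), the full set wins and the maximising distribution is uniform.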
We display below, side by side for each distance: the plots of the non-normalised and normalised measures as they vary with $q$, and the optimal distribution $p_B$.
As can be seen:
The results are also displayed as a table featuring, for each distribution, its diversity and normalised diversity across similarities.
They extend those of Table 2 in the initial paper.
The previous algorithm has a time complexity of $O(n^3)$: indeed, we have to compute all the submatrices.
By [Leinster 2021, Proposition 10.3], if the matrix is positive semi-definite, then $Dmax = |Z|$. To determine whether a matrix is positive semi-definite, we can either check that all its eigenvalues are $\geq 0$, or attempt a Cholesky decomposition, which raises an error when the matrix is not positive semi-definite. But in np.linalg, both (eigvals and cholesky) have the same time complexity, $O(n^3)$, which is the same as our current algorithm.
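Both checks can be sketched with np.linalg (note that `np.linalg.cholesky` strictly requires positive *definiteness*, so a small diagonal shift is added here to tolerate the semi-definite boundary; the tolerance value is an assumption):

```python
import numpy as np

def is_psd(Z, tol=1e-10):
    """Check positive semi-definiteness of a symmetric matrix two ways."""
    # 1. Eigenvalue check: all eigenvalues >= 0 (up to tolerance).
    by_eigvals = bool(np.all(np.linalg.eigvalsh(Z) >= -tol))
    # 2. Cholesky: raises LinAlgError when the (shifted) matrix is not
    #    positive definite.
    try:
        np.linalg.cholesky(Z + tol * np.eye(Z.shape[0]))
        by_cholesky = True
    except np.linalg.LinAlgError:
        by_cholesky = False
    return by_eigvals and by_cholesky
```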
If the similarity matrix has no 0 on the diagonal, which is expected of similarity matrices, it is possible to use the Gauss-Seidel method to approximate the weighting iteratively, in lower time complexity, which seems to be around $O(n^{2.376})$.
We have used the implementation of the method provided here.
For 5 categories (here, the ontologies), the example shows that almost the same weight values can be found in around 45-50 tests, while the other algorithm, provided by Remark 7.2, takes 125.
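The iterative approach can be sketched as a generic Gauss-Seidel solver for the weighting equation $Zw = \mathbf{1}$ (this is not the implementation linked above, and the example matrix is hypothetical):

```python
import numpy as np

def gauss_seidel(Z, b, iterations=50, tol=1e-8):
    """Gauss-Seidel iteration for Z x = b (sketch).

    Requires a non-zero diagonal, which similarity matrices provide
    (Z[i, i] = 1).
    """
    n = len(b)
    x = np.zeros(n)
    for _ in range(iterations):
        x_prev = x.copy()
        for i in range(n):
            # Use already-updated components x[:i] within the same sweep.
            s = Z[i, :i] @ x[:i] + Z[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / Z[i, i]
        if np.max(np.abs(x - x_prev)) < tol:   # converged
            break
    return x

# Weighting of a small similarity matrix: solve Z w = 1.
Z = np.array([[1.0, 0.3],
              [0.3, 1.0]])
w = gauss_seidel(Z, np.ones(2))
```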
The code right below computes the execution time of the three methods for finding $Dmax$ and the maximising distribution.
We can see that the Gauss-Seidel method is more efficient than the previous one on $5 \times 5$ matrices.
Remarkably, the computation on the whole $Z$ (last two columns) provides the same results as the others, even for the graph-based semantic distance (the '-0.01' is also produced by the Gauss-Seidel method but is hidden in the computation).
[Bourahla 2022] Yasser Bourahla, Jérôme David, Jérôme Euzenat, Meryem Naciri, Measuring and controlling knowledge diversity, Proc. 1st JOWO workshop on formal models of knowledge diversity (FMKD), Jönköping (SE), 2022 https://moex.inria.fr/files/papers/bourahla2022c.pdf
[Leinster 2021] Tom Leinster, Entropy and diversity: the axiomatic approach, Cambridge university press, Cambridge (UK), 2021 https://arxiv.org/pdf/2012.02113.pdf
[Notebook] Knowledge diversity notebook: https://moex.inria.fr/software/kdiv/index.html