Talk given at JFRB 2014, June 2014, IHP, Paris
Introduction and context State of the art Conclusion & Perspectives
Learning possibilistic networks from data: a survey
Maroua Haddad1,2, Nahla Ben Amor1 and Philippe Leray2
1 Laboratoire de Recherche Opérationnelle de Décision et de Contrôle de Processus (LARODEC), ISG Tunis, Tunisia
2 Laboratoire d’Informatique de Nantes Atlantique (LINA), UMR CNRS 6241, Université de Nantes, France
Maroua Haddad Learning possibilistic networks from data 1/21
Outline ...
1. Introduction and context
2. State of the art
   2.1. Possibilistic networks, inference, sampling
   2.2. Parameter learning
   2.3. Structure learning
3. Conclusion & Perspectives
Background
Possibility theory [Zadeh, 1978], [Dubois and Prade, 1980]
another uncertainty theory, complementing probability theory in order to handle uncertain, imprecise and missing information

Possibility distribution
π : Ω → [0, 1], normalized so that the max is equal to 1, not the integral

Possibility measure Π: is A coherent with π?
Π(A) = max_{ω∈A} π(ω)

Necessity measure N: is A certainly implied by π?
N(A) = 1 − Π(¬A)
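As a minimal sketch (not part of the talk), both measures can be computed directly from a discrete possibility distribution, stored here as a Python dict from worlds to possibility degrees:

```python
def possibility(pi, A):
    """Pi(A) = max over omega in A of pi(omega)."""
    return max(pi[w] for w in A)

def necessity(pi, A):
    """N(A) = 1 - Pi(complement of A)."""
    complement = set(pi) - set(A)
    return 1.0 - (possibility(pi, complement) if complement else 0.0)

# A normalized distribution: the max is 1, the sum need not be.
pi = {"w1": 1.0, "w2": 0.6, "w3": 0.2}
print(possibility(pi, {"w1", "w2"}))  # 1.0
print(necessity(pi, {"w1", "w2"}))    # 0.8
```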
[Figure: a possibility distribution π over R, normalized at 1, with an example value π(x) = 0.6]
Background
Possibilistic scale
Two distinct understandings of a possibility distribution:
- numerical interpretation (PROD): a possibility distribution may encode a piece of imprecise knowledge about a situation, as in approximate reasoning (quantitative possibility theory)
- ordinal interpretation (MIN): only the ordering over possibility values matters, rather than their exact values, usually in order to describe some user preferences (qualitative possibility theory)
Background

[Figure: three possibility distributions π1, π2, π3 over R, all normalized at 1, ranging from complete knowledge to total ignorance]

Specificity
π1 is more specific than π2, and π2 is more specific than π3: the distributions range from complete knowledge to total ignorance.
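The specificity ordering admits a simple pointwise test, sketched here for discrete distributions over a shared domain (an illustration, not from the talk): π1 is at least as specific as π2 iff π1(ω) ≤ π2(ω) for every ω.

```python
def at_least_as_specific(pi1, pi2):
    """pi1 <= pi2 pointwise: pi1 is at least as specific as pi2."""
    return all(pi1[w] <= pi2[w] for w in pi1)

pi1 = {"a": 1.0, "b": 0.0, "c": 0.0}  # complete knowledge
pi2 = {"a": 1.0, "b": 0.5, "c": 0.1}
pi3 = {"a": 1.0, "b": 1.0, "c": 1.0}  # total ignorance
print(at_least_as_specific(pi1, pi2))  # True
print(at_least_as_specific(pi2, pi3))  # True
print(at_least_as_specific(pi3, pi1))  # False
```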
Learning from data

Certain data:

C   S   R   W
c1  s2  r3  w1
c3  s1  r2  w2
c1  s2  r3  w3
c2  s1  r2  w3
c2  s2  r2  w1
c1  s2  r1  w1
c2  s1  r2  w3
c1  s2  r3  w3
c3  s1  r3  w2
c2  s1  r1  w3
c2  s1  r2  w1

Which kind of data?
- certain data
- imprecise data
- possibilistic data
Learning from data

Imprecise data (some cells contain a set of candidate values, or are missing):

C   S      R      W
c1  s2     r3     w1
c3  s1     r2     w2
c1  s2     r3     w3
c2  s1,s2  r2     w2,w3
c2  s2     r1,r2  w1
c1  s2     r1
c2  s1     r2     w3
c1  s2     r3     w3
c3  s1     r3     w2
c2  s1     r2,r3  w3
c2  s1,s2  r2     w1,w2,w3

Which kind of data?
- certain data
- imprecise data
- possibilistic data
Learning from data

Possibilistic data (some cells contain a possibility distribution over the variable’s values):

C   S        R            W
c1  s2       r3           w1
c3  s1       r2           w2
c1  s2       r3           w3
c2  [1,0.8]  r2           [1,1,1]
c2  s2       [1,1,1]      w1
c1  s2       r1           [0.2,0.5,1]
c2  s1       r2           w3
c1  s2       r3           w3
c3  s1       r3           w2
c2  s1       [0.1,0.9,1]  w3
c2  [0.2,1]  r2           [1,1,1]

Which kind of data?
- certain data
- imprecise data
- possibilistic data
Possibilistic networks [Fonck, 1994]
[Figure: the classical Sprinkler network: Cloudy → Sprinkler, Cloudy → Rain, and Sprinkler, Rain → Wet Grass, with possibility tables Π(C), Π(S|C), Π(R|C) and Π(W|S,R)]

Product-based possibilistic networks
π(ω |p A) = π(ω)/Π(A) if ω ∈ A, and 0 otherwise.

Min-based possibilistic networks
π(ω |m A) = 1 if π(ω) = Π(A) and ω ∈ A; π(ω) if π(ω) < Π(A) and ω ∈ A; and 0 otherwise.

Possibilistic chain rule
π(V1, ..., VN) = ⊗_{i=1..N} Π(Vi | Pa(Vi)), where ⊗ is the product (PROD networks) or the min (MIN networks).
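The two conditioning rules above can be sketched for a discrete distribution (a dict from worlds to possibility degrees, with A a set of worlds):

```python
def condition(pi, A, mode="prod"):
    """Product-based or min-based possibilistic conditioning on event A."""
    Pi_A = max(pi[w] for w in A)  # Pi(A)
    out = {}
    for w, v in pi.items():
        if w not in A:
            out[w] = 0.0                      # impossible outside A
        elif mode == "prod":
            out[w] = v / Pi_A                 # pi(w) / Pi(A)
        else:
            out[w] = 1.0 if v == Pi_A else v  # min-based rule
    return out

pi = {"w1": 1.0, "w2": 0.6, "w3": 0.2}
print(condition(pi, {"w2", "w3"}, "prod"))  # w2 -> 1.0, w3 -> 0.2/0.6
print(condition(pi, {"w2", "w3"}, "min"))   # w2 -> 1.0, w3 -> 0.2
```

Note how both rules renormalize so that the conditioned distribution again has max 1.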
Possibilistic inference
- junction tree [Fonck, 1992], [Borgelt and Kruse, 1998] (PROD, MIN)
- anytime propagation [Ben Amor et al., 2003] (MIN)
- compilation techniques [Ayachi et al., 2010] (PROD, MIN)
- loopy belief propagation [Ajroud and Benferhat, 2014] (MIN)
Simulation
Sampling π
In the quantitative possibilistic framework, by combining Monte Carlo random sampling and α-cuts:
- certain data [Chanas and Nowakowski, 1988]
- imprecise data [Guyonnet, 2003]

And for a PROD possibilistic network?
- forward sampling (used for BNs) seems OK for certain data
- but how to deal with imprecise data, i.e. sampling data from Xi when its parents do not have a certain value?
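The α-cut idea can be sketched as follows (an illustrative assumption, not any of the cited algorithms): draw α uniformly in (0, 1] and return the α-cut, which is in general an imprecise, set-valued observation.

```python
import random

def alpha_cut_sample(pi, rng=random):
    """Draw alpha ~ U(0,1] and return the alpha-cut {w : pi(w) >= alpha}."""
    alpha = rng.random() or 1e-12  # random() is in [0,1); avoid alpha = 0
    return {w for w, v in pi.items() if v >= alpha}

pi = {"w1": 1.0, "w2": 0.6, "w3": 0.2}
samples = [alpha_cut_sample(pi) for _ in range(10)]
# Every cut contains the fully possible world w1, since pi("w1") = 1.
print(all("w1" in s for s in samples))  # True
```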
Parameter learning
Objective
For a given structure, how can we estimate Π(Xi | Pa(Xi))?
- satisfying the Maximum Uncertainty Principle (MUP) [Klir, 1990]: when a problem solution is undetermined, the possible solution with the highest uncertainty should be chosen
- in possibility theory: minimize non-specificity
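One standard non-specificity measure is the Higashi-Klir U-uncertainty (an assumption here, since the talk does not fix a particular measure): sort the possibility degrees decreasingly and weight successive drops by log2 of the rank.

```python
from math import log2

def non_specificity(pi_values):
    """U(pi) = sum_i (pi_(i) - pi_(i+1)) * log2(i), degrees sorted decreasingly."""
    p = sorted(pi_values, reverse=True) + [0.0]
    return sum((p[i] - p[i + 1]) * log2(i + 1) for i in range(len(pi_values)))

print(non_specificity([1.0, 0.0, 0.0]))  # 0.0: complete knowledge
print(non_specificity([1.0, 1.0, 1.0]))  # log2(3): total ignorance
```

Complete knowledge thus scores 0, and total ignorance over n values scores log2(n), matching the specificity scale of the earlier slides.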
... by using Probability-Possibility transformation
Direct transformations
- several existing transformations with different properties: [Klir and Parviz, 1992], [Dubois et al., 1993, 2004], [Mouchaweh et al., 2006], [Bouguelid, 2007]
- applicable to the joint possibility distribution

Parameter learning?
- certain data → joint probability distribution
- transformation into a joint possibility distribution
- then marginalization in order to find the conditional possibility distributions

Drawbacks
- working with the joint distributions is not efficient
- supposing that the probability estimation is perfect is not realistic
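As an illustration of one of the transformations cited above (a sketch of the Dubois-Prade transformation, for a finite probability distribution): each value receives as possibility the total probability of the values no more probable than it.

```python
def dubois_prade(p):
    """pi(x_i) = sum of all p_j with p_j <= p_i (ties grouped together)."""
    return [sum(q for q in p if q <= v) for v in p]

print(dubois_prade([0.5, 0.3, 0.2]))  # [1.0, 0.5, 0.2]
```

The most probable value always receives possibility 1, so the result is a normalized possibility distribution consistent with p.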
... by using Probability-Possibility transformation
Confidence interval [De Campos and Huete, 2001]
- mixture of the [Klir and Parviz, 1992] and [Dubois et al., 1993, 2004] transformations
- deals with two probability distributions, min and max
- (+) applicable to local conditional possibility distributions

Drawbacks
- (−) does not preserve the joint possibility distribution
... directly from data
Using a probability transformation with confidence intervals?
[Masson and Denœux, 2006], certain data
- consider a confidence interval for the estimation of the joint probability distribution
- find all the possibility distributions compatible with these constraints
- ... but sometimes returns no solution

Using possibilistic histograms?
[Joslyn, 1994]
- certain data, imprecise data, interval data
- conditional distributions
- (+) satisfies the Minimum Non-Specificity principle

Conclusion
- no existing solution for parameter learning
- some possible solutions (possibilistic histograms) for PROD networks
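A minimal sketch of the histogram idea for certain data (an assumption in the spirit of [Joslyn, 1994], not his exact estimator): normalize counts by the maximum count, so that the most frequent value gets possibility 1.

```python
from collections import Counter

def possibilistic_histogram(samples):
    """Possibility degree of each value = its count / the maximum count."""
    counts = Counter(samples)
    m = max(counts.values())
    return {v: c / m for v, c in counts.items()}

data = ["c2", "c1", "c2", "c2", "c1", "c3"]
hist = possibilistic_histogram(data)
print(hist["c2"])  # 1.0: the most frequent value, c1 -> 2/3, c3 -> 1/3
```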
Structure learning
Constraint-based methods
- no existing algorithm
- how to measure possibilistic dependency?
  - possibilistic mutual information (imprecise data) [Borgelt and Kruse, 2003]
  - possibilistic χ2 (imprecise data) [Borgelt and Kruse, 2003]

Score-based methods
- adaptation of BN algorithms for certain data [Gebhardt and Kruse, 1996] or imprecise data [Borgelt and Gebhardt, 1997], [Borgelt and Kruse, 2003]
- (−) discordance between global and local scores
- (−) does not take the Minimum Non-Specificity principle into account

Hybrid methods
- POSSCAUSE algorithm (certain data) [Sangüesa et al., 1998]
- (−) incoherence in the definition of the independence test (a different formula in each paper)

Conclusion
- existing but not satisfying solutions for PROD networks
- many open questions: for instance, what is Markov equivalence here?
In practice
Applications
- automotive industry [Gebhardt and Kruse, 1995]
- aerospace [Kruse and Borgelt, 1998]
- information retrieval [Boughanem et al., 2008]

Implementations (possibilistic inference)
- POSSINFER [Gebhardt and Kruse, 1995]
- Pulcinella [Saffiotti and Umkehrer, 1991]
- Possibilistic Networks Toolbox for Matlab [Ben Amor, 2012]
Conclusion
Possibilistic networks
- another uncertainty theory, devoted to imprecision modelling
- possibilistic networks have essentially been studied in terms of possibilistic inference
- many open problems remain for the learning task
- are MIN networks better dedicated to expert elicitation???
Perspectives
- new learning algorithms (parameters + structure) based on consistent theoretical properties
- validation? we need benchmarks, or data generation from a possibilistic distribution
- Matlab implementation
- using more sophisticated imperfect data (imprecise data, possibilistic data, interval data...)
Any questions?