Christian VIARD-GAUDIN ([email protected]), IRCCyN Seminar, 6 January 2000, Institut de Recherche en Communications et Cybernétique de Nantes

Slide 1
Christian VIARD-GAUDIN ([email protected])
IRCCyN Seminar, 6 January 2000
Institut de Recherche en Communications et Cybernétique de Nantes

Slide 2: Image and Video Communications team
- Multimedia and Networks
- Video and Multimedia
- Handwriting and Documents
- Psychovisual modeling
[Diagram of an example probabilistic model with transition probabilities 0.8, 0.7, 0.3, 0.2, 0.6, 0.4, 0.15, 0.1, 0.25, 0.35.]

Slide 3: OUTLINE
- Tutorial on hidden Markov models
- Application to handwritten text recognition

Slide 4: Part One: Hidden Markov Models (HMMs)
A real-world signal modeled as a parametric random process.
Origins and scope:
- Late 1960s (Baum, 1967): basic theory.
- Late 1980s: large, widespread understanding; application to speech recognition.
- Used to model the behavior of speech, characters, temperature, the stock market, ..., by means of a statistical approach.

Slide 5: Hidden Markov Model (1)
An HMM is a double stochastic process:
1) an underlying stochastic process generates a sequence of states q_1, q_2, ..., q_t, ..., q_T,
where t is a discrete, regularly spaced time index, T is the length of the sequence, q_t ∈ Q = {q_1, q_2, ..., q_N}, and N is the number of possible states.

Slide 6: Markov Chain Hypotheses
1) First order: the probabilistic description is truncated to the current state and its predecessor:
P[q_t = q_j | q_{t-1} = q_i, q_{t-2} = q_k, ...] = P[q_t = q_j | q_{t-1} = q_i]
2) Stationarity: the probabilities are time invariant:
P[q_t = q_j | q_{t-1} = q_i] = a_ij, with 1 ≤ i, j ≤ N.
This defines a square N x N matrix A = {a_ij} (state transition probability matrix), where a_ij ≥ 0 and Σ_j a_ij = 1.
3) An initial state distribution π = {π_i} must also be defined: π_i = P[q_1 = q_i], with 1 ≤ i ≤ N, where π_i ≥ 0 and Σ_i π_i = 1.
The value of each state is unobservable, but...

Slide 7: Hidden Markov Model (2)
An HMM is a double stochastic process:
1) an underlying stochastic process generates a sequence of states q_1, q_2, ..., q_t, ..., q_T, but...
2) each state emits an observation according to a second stochastic process: x_t ∈ X = {x_1, x_2, ..., x_M},
where x_i is a discrete symbol and M is the number of symbols.

Slide 8: Observation Hypothesis
The observation x_t depends only on the present state q_t:
P[x_t = x_j | q_t = q_i] = b_ij.
This defines an N x M observation probability matrix B = {b_ij}, where b_ij ≥ 0 and Σ_j b_ij = 1.
A complete specification of an HMM (λ) requires:
- two model parameters, N and M;
- a specification of the symbols to observe;
- three probability measures: π, A, B.
λ = (π, A, B)

Slide 9: Example 1 (not hidden, just a discrete Markov model)
Let us play with this model now!
Model of the weather in Nantes: N = M = 3; state = observation; Q = {q_1 = rain, q_2 = cloudy, q_3 = sunny}; t is sampled every day, at noon for instance.
A_Nantes = {a_ij} = [transition matrix shown on the slide; values not recovered in this transcript].

Slide 10: Nantes Weather Model
Given that the weather today (t = 1) is sunny (q_1 = q_3), answer these questions:
- What will be the weather tomorrow (t = 2)?
- What is the probability of rain for the day after tomorrow (t = 3)? And for the day after (t = 4)?
- What is the probability that the coming week will be sun-sun-rain-rain-sun-cloudy-sun?

Slide 11: Nantes Weather Model
Two more questions:
- What is the probability of rain for d consecutive days (e.g. d = 3)?
- What is the average number of consecutive sunny days? Cloudy days? Rainy days?

Slide 12: Nantes Weather Model: Answers
- What will be the weather tomorrow? Read it directly from the row of A corresponding to today's state (sunny).
- What is the probability of rain for the day after tomorrow? Use a trellis over the states R, C, S: propagate today's state through A for two steps. [Numeric trellis values not recovered.]

Slide 13: Nantes Weather Model
- And for the day after? Just extend the trellis from the previous values (q_1: rain, q_2: cloudy, q_3: sunny), giving P(q_4 = q_1).
- What is the probability that the coming week will be sun-sun-rain-rain-sun-cloudy-sun?
P(q_3, q_3, q_1, q_1, q_3, q_2, q_3) = a_33 · a_31 · a_11 · a_13 · a_32 · a_23, since today is known to be sunny.
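To make the chain-rule and trellis computations above concrete, here is a minimal Python sketch for the (non-hidden) weather chain. The transition values are illustrative placeholders, since the matrix printed on the slide was not recovered; only the structure of the computation is meant to match the slides.

```python
import numpy as np

# Illustrative transition matrix for Q = {rain, cloudy, sunny};
# the actual values from the slide were not recovered, these are placeholders.
A = np.array([[0.4, 0.3, 0.3],   # from rain
              [0.2, 0.6, 0.2],   # from cloudy
              [0.1, 0.1, 0.8]])  # from sunny
RAIN, CLOUDY, SUNNY = 0, 1, 2

def sequence_probability(states, A, start):
    """P(q_2, ..., q_T | q_1 = start) via the first-order chain rule."""
    p, prev = 1.0, start
    for s in states:
        p *= A[prev, s]
        prev = s
    return p

# Probability that the coming week is sun-sun-rain-rain-sun-cloudy-sun,
# given that today (t = 1) is sunny:
week = [SUNNY, RAIN, RAIN, SUNNY, CLOUDY, SUNNY]   # days 2..7
print(sequence_probability(week, A, start=SUNNY))

# Distribution d days ahead is the row of A^(d-1) for today's state:
print(np.linalg.matrix_power(A, 2)[SUNNY])         # weather distribution at t = 3
```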
Slide 14: Nantes Weather Model
What is the probability that rain lasts for d consecutive days (e.g. d = 3)?
More generally, the probability of staying in state q_i for exactly d steps and then leaving is
P(q_i, q_i, ..., q_i, q_{j≠i}) = (a_ii)^{d-1} (1 - a_ii),
hence P(q_1, q_1, q_1, q_{j≠1}) = (a_11)^2 (1 - a_11).

Slide 15: Nantes Weather Model
What is the average number of consecutive sunny days? Cloudy days? Rainy days?
The expected duration in state q_i is Σ_d d (a_ii)^{d-1} (1 - a_ii) = 1 / (1 - a_ii).

Slide 16: Nantes Weather Model
Topological graph representation: q_1 = rain, q_2 = cloudy, q_3 = sunny, with the transition probabilities (0.4, 0.3, 0.6, 0.2, 0.1, 0.8) attached to the edges of the graph.
This is an ergodic model: from any state, all other states are reachable.

Slide 17: Example 2: Extension to HMM
Coin-tossing experiment:
- We do not see how the tossing is performed: one coin, multiple coins, biased coins?
- We only have access to the result, a sequence of tosses.
- In that case, M = 2 and X = {x_1 = Head, x_2 = Tail}.
- Observation sequence: (x_1, x_2, ..., x_T) = (H, H, T, T, T, H, ..., H).
The problems are:
- How to build a model to explain the observed sequence?
- What are the states?
- How many states?

Slide 18: Coin-tossing experiment
Model 1: assume only one biased coin; 2 states (N = 2), Q = {q_1 = H, q_2 = T}.
Model topology: two states (Head, Tail) with transition probabilities P(H) and 1 - P(H).
Only one parameter is needed, P(H); it defines matrix A.
Model 2: assume two biased coins; 2 states (N = 2), Q = {q_1 = Coin1, q_2 = Coin2}; 2 different observations (M = 2), X = {x_1 = H, x_2 = T}.

Slide 19: Coin-tossing experiment
Model topology: states Coin1 and Coin2 with self-loops a_11 and a_22 and cross transitions 1 - a_11 and 1 - a_22.
State transition probabilities (matrix A) and observation symbol probabilities (matrix B).
4 parameters are required to define this model (A: 2, B: 2).

Slide 20: Coin-tossing experiment
Model 3: assume three biased coins; 3 states (N = 3), Q = {q_1 = Coin1, q_2 = Coin2, q_3 = Coin3}; 2 different observations (M = 2), X = {x_1 = H, x_2 = T}.
Model topology: fully connected graph with transitions a_11, a_12, a_13, a_21, a_22, a_23, a_31, a_32, a_33.
Observation symbol probabilities: matrix B.
9 independent parameters are required to define this model (A: 6, B: 3).

Slide 21: Coin-tossing experiment
Model 3: consider the following example:
- observation probabilities: matrix B [values not recovered in this transcript];
- state transition probabilities: all equal to 1/3 (matrix A);
- initial state probabilities: all equal to 1/3 (vector π).

Slide 22: Coin-tossing experiment (model 3)
1. You observe X = (H, H, H, H, T, H, T, T, T, T). Which state sequence Q most likely generates X? What is the joint probability P(X, Q | λ) of the observation sequence and the state sequence?
2. What is the probability that the observation sequence came entirely from state q_1?

Slide 23: Coin-tossing experiment (model 3)
Worked answers for X = (H, H, H, H, T, H, T, T, T, T) and for the probability that the whole sequence came from state q_1. [The numeric details shown on the slide were not recovered.]
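A small sketch of the joint-probability computations asked for on slides 22 and 23. The emission matrix B below is an invented placeholder (the slide's values were not recovered); the point is the structure of P(X, Q | λ) and why, with uniform transitions, the most likely state sequence can be chosen symbol by symbol.

```python
import numpy as np

states = ["Coin1", "Coin2", "Coin3"]
pi = np.full(3, 1/3)                 # initial state probabilities (slide 21)
A  = np.full((3, 3), 1/3)            # all transitions equal to 1/3 (slide 21)
# Emission probabilities P(H), P(T) per coin: placeholders only,
# the slide's actual B matrix was not recovered.
B  = np.array([[0.50, 0.50],
               [0.75, 0.25],
               [0.25, 0.75]])
H, T = 0, 1
X = [H, H, H, H, T, H, T, T, T, T]   # observed toss sequence (slide 22)

def joint_prob(X, Q, pi, A, B):
    """P(X, Q | lambda) = pi[q1]*B[q1,x1] * prod_t A[q_{t-1},q_t]*B[q_t,x_t]."""
    p = pi[Q[0]] * B[Q[0], X[0]]
    for t in range(1, len(X)):
        p *= A[Q[t-1], Q[t]] * B[Q[t], X[t]]
    return p

# Probability that the whole sequence came from state q1 (question 2):
print(joint_prob(X, [0] * len(X), pi, A, B))

# Because pi and A are uniform, the most likely state sequence simply picks,
# at each step, the coin with the highest emission probability for x_t:
Q_best = [int(np.argmax(B[:, x])) for x in X]
print([states[q] for q in Q_best], joint_prob(X, Q_best, pi, A, B))
```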
Slide 24: Example of a non-ergodic model (left-right model)
States q_s → q_1, q_2, q_3 → q_f, with the transition probabilities 0.2, 0.9, 0.6, 0.8, 0.4, 0.1, 0.7, 0.1 shown on the edges of the graph.
Three states plus one starting state q_s and one final state q_f; q_s and q_f are non-emitting states.
Assume there are 2 symbols to observe: X = {x_1 = a, x_2 = b}.
Initial state probabilities, state transition probabilities and observation symbol probabilities P(a|q_1), ..., P(b|q_3) are given in tables on the slide [values not recovered].

Slide 25: Likelihood of an observation sequence
The most probable state sequence with this model is q_2, q_3, resulting in the symbol sequence bb. But this sequence can also be generated by other state sequences, such as q_1, q_2.
Computation of the likelihood of an observation sequence: given X = aaa, compute the likelihood for this model, P(aaa | λ).
The likelihood P(X | λ) is given by the sum over all possible ways to generate X.

Slide 26: Using HMMs for pattern recognition
Using an HMM for pattern recognition consists in finding, among a set of K models, the model λ_i that maximizes the likelihood of the observation having been generated by that model:
λ_max = arg max_i P(X | λ_i), for i = 1, ..., K.
Character recognition:
- Small lexicon: as many HMM models as words.
- Otherwise, letters are individually modeled by an HMM, and the letter models can be concatenated to form word models.
Word model for PARIS [figure: the letter models P, A, R, I, S chained together].

Slide 27: The three basic problems for HMMs
- Problem 1: Recognition. Given X = (x_1, x_2, ..., x_T) and the various models λ_i, how to efficiently compute P(X | λ)? → Forward-Backward algorithm.
- Problem 2: Analysis. Given X = (x_1, x_2, ..., x_T) and a model, find the optimal state sequence: how can we uncover the sequence of states corresponding to a given observation? → Viterbi algorithm.
- Problem 3: Learning. Given X = (x_1, x_2, ..., x_T), estimate the model parameters λ = (π, A, B) that maximize P(X | λ): how do we adjust the model parameters? → Baum-Welch algorithm.

Slide 28: Problem 1: Recognition
How to efficiently compute P(X | λ)? X = x_1 x_2 ... x_t ... x_T is the observation sequence.
There exist several paths Q that allow X to be obtained (joint probability):
P(X | λ) = Σ_Q P(X, Q | λ) = Σ_Q P(X | Q, λ) × P(Q | λ),
where P(X | Q, λ) depends only on the observation probabilities (matrix B) and P(Q | λ) depends only on the state transition probabilities (A).
A path is defined by a sequence of states q_1 q_2 ... q_t ... q_T:
P(X | Q, λ) = P(x_1 x_2 ... x_T | q_1 q_2 ... q_T, λ)
= P(x_1 | q_1 ... q_T, λ) × P(x_2 | x_1, q_1 ... q_T, λ) × ... × P(x_T | x_{T-1} ... x_1, q_1 ... q_T, λ)
= P(x_1 | q_1, λ) × P(x_2 | q_2, λ) × ... × P(x_T | q_T, λ), as x_t depends only on q_t.

Slide 29: Problem 1: Recognition (2)
P(Q | λ) = P(q_1 q_2 ... q_T | λ)
= P(q_1 | λ) × P(q_2 | q_1, λ) × ... × P(q_T | q_{T-1} ... q_1, λ)
= P(q_1 | λ) × P(q_2 | q_1, λ) × ... × P(q_T | q_{T-1}, λ), as we assume a first-order HMM.
At last, by replacing:
P(X | λ) = Σ over all paths q_1 ... q_T of π_{q_1} b_{q_1}(x_1) a_{q_1 q_2} b_{q_2}(x_2) ... a_{q_{T-1} q_T} b_{q_T}(x_T).
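A direct implementation of this summation over all state paths, written out only to make the next slide's complexity argument tangible; the two-state model at the bottom is a placeholder, not one of the seminar's models.

```python
import itertools
import numpy as np

def likelihood_bruteforce(X, pi, A, B):
    """P(X | lambda) summed over every one of the N**T state paths.
    Exponential in T: only usable for tiny examples."""
    N, T = len(pi), len(X)
    total = 0.0
    for Q in itertools.product(range(N), repeat=T):
        p = pi[Q[0]] * B[Q[0], X[0]]
        for t in range(1, T):
            p *= A[Q[t-1], Q[t]] * B[Q[t], X[t]]
        total += p
    return total

# Tiny illustrative model (placeholder numbers, not the slide's):
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],   # P(a|q1), P(b|q1)
               [0.2, 0.8]])  # P(a|q2), P(b|q2)
a, b = 0, 1
print(likelihood_bruteforce([a, a, a], pi, A, B))   # P(aaa | lambda)
```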
Slide 30: What about the computational complexity?
- Number of multiplications for one path: (T-1) + 1 + 1 + (T-2) = 2T - 1.
- Number of paths Q: N^T.
- Total number of multiplications: (2T-1) N^T.
- Total number of additions: N^T - 1.
How long does it take? Assume N = 23, T = 15 (word-check application):
number of operations ≈ 2T·N^T = 30 × 23^15 ≈ 10^22.
Assume 1 Gops (10^9 operations per second): number of seconds = 10^22 / 10^9 = 10^13; number of days = 10^13 / (3600 × 24) ≈ 10^8; number of years ≈ 10^8 / 365 ≈ 10^5.
FIND SOMETHING ELSE!!!

Slide 31: Forward-Backward algorithm
Use a trellis structure to carry out the computations: at each node of the trellis, store the forward variable α_t(i), with
α_t(i) = P(x_1 x_2 ... x_t, q_t = q_i | λ),
which is the probability of the partial observation sequence up to time t and of being in state q_i at that same time.
Algorithm in 3 steps:
1. Initialization: α_1(i) = P(x_1, q_1 = q_i | λ) = P(x_1 | q_1 = q_i, λ) × P(q_1 = q_i | λ) = b_i(x_1) π_i.
2. Recursion: α_{t+1}(j) = [Σ_{i=1..N} α_t(i) a_ij] b_j(x_{t+1}), with 1 ≤ j ≤ N and 1 ≤ t ≤ T-1.
3. Termination: P(X | λ) = Σ_{i=1..N} α_T(i).

Slide 32: Forward-Backward algorithm
[Trellis figure: at time t, the states q_1, ..., q_i, ..., q_N hold α_t(1), ..., α_t(i), ..., α_t(N); they feed state q_j at time t+1 through the transitions a_1j, ..., a_ij, ..., a_Nj.]
Recursion: α_{t+1}(j) = [Σ_i α_t(i) a_ij] b_j(x_{t+1}), with 1 ≤ j ≤ N and 1 ≤ t ≤ T-1.
Total number of multiplications: N(N+1)(T-1) + N. Total number of additions: N(N-1)(T-1).
Assume N = 23, T = 15: roughly N²T ≈ 10^4 operations, instead of 10^22.

Slide 33: Problem 2: Analysis
How can we uncover the sequence of states corresponding to a given observation X?
Choose the most likely path: find the path (q_1, q_2, ..., q_T) that maximizes the probability P(q_1, q_2, ..., q_T | X, λ).
Solution by dynamic programming: an inductive algorithm that keeps the best possible state sequence ending in state q_i at time t → Viterbi algorithm.

Slide 34: Viterbi algorithm
Define δ_t(i) = max over q_1, q_2, ..., q_{t-1} of P(q_1, q_2, ..., q_t = q_i, x_1, x_2, ..., x_t | λ);
δ_t(i) is the highest probability along a single path ending in state q_i at time t.
By induction: δ_{t+1}(k) = max_{1≤i≤N} [δ_t(i) a_ik] · b_k(x_{t+1}), with 1 ≤ k ≤ N.
Also memorize ψ_{t+1}(k) = arg max_i (δ_t(i) a_ik).
[Trellis figure: time 1, 2, ..., t-1, t, t+1, ..., T-1, T; observations x_1, x_2, ..., x_T; states 1, 2, ..., i, j, k, ..., N; the transitions a_1k, a_ik, a_Nk enter node (t+1, k), and ψ_{t+1}(k) = j.]
The optimal state sequence is traced back from max_{1≤i≤N} [δ_T(i)].

Slide 35: Viterbi algorithm
1. Initialization. For 1 ≤ i ≤ N: δ_1(i) = π_i × b_i(x_1); // or, in the log domain, -ln(π_i) - ln(b_i(x_1)); ψ_1(i) = 0.
2. Recursive computation. For 2 ≤ t ≤ T, for 1 ≤ j ≤ N:
δ_t(j) = max_{1≤i≤N} [δ_{t-1}(i) a_ij] · b_j(x_t); // or δ_t(j) = min_i [δ_{t-1}(i) - ln(a_ij)] - ln(b_j(x_t));
ψ_t(j) = arg max_i (δ_{t-1}(i) a_ij); // or ψ_t(j) = arg min_i [δ_{t-1}(i) - ln(a_ij)].
3. Termination. P* = max_{1≤i≤N} [δ_T(i)]; // or P* = min_i [δ_T(i)];
q*_T = arg max_{1≤i≤N} [δ_T(i)]; // or arg min_i [δ_T(i)].
4. Backtracking. For t = T-1 down to 1: q*_t = ψ_{t+1}(q*_{t+1}).
Hence P* (or exp(-P*) in the log-domain version) gives the required state-optimized probability, and Q* = (q_1*, q_2*, ..., q_T*) is the optimal state sequence.
(A compact implementation sketch of the forward recursion and of this decoder is given after slide 36.)

Slide 36: Problem 3: Learning
How do we adjust the model parameters λ = (π, A, B)? → Baum-Welch algorithm:
1. Let the initial model be λ_0.
2. Compute a new model λ based on λ_0 and on the observation X.
3. If log P(X | λ) - log P(X | λ_0) < Delta, stop.
4. Else set λ_0 ← λ and go to step 2.
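For reference, a compact NumPy sketch of the forward recursion (slides 31-32) and of the Viterbi decoder (slides 34-35). Variable names follow the slides; the model at the bottom reuses the placeholder values from the brute-force sketch above, so the forward result can be checked against the exhaustive sum.

```python
import numpy as np

def forward(X, pi, A, B):
    """alpha[t, i] = P(x_1..x_t, q_t = q_i | lambda); returns P(X | lambda)."""
    N, T = len(pi), len(X)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, X[0]]                   # initialization
    for t in range(1, T):                        # recursion
        alpha[t] = (alpha[t-1] @ A) * B[:, X[t]]
    return alpha[-1].sum()                       # termination

def viterbi(X, pi, A, B):
    """Most likely state sequence and its probability (max instead of sum)."""
    N, T = len(pi), len(X)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, X[0]]
    for t in range(1, T):
        scores = delta[t-1][:, None] * A         # delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, X[t]]
    q = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                # backtracking
        q.append(int(psi[t][q[-1]]))
    return q[::-1], delta[-1].max()

# Same placeholder two-state model over symbols {a, b} as in the brute-force sketch:
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
a, b = 0, 1
X = [a, a, a]
print(forward(X, pi, A, B))   # equals the brute-force sum, in O(N^2 T) operations
print(viterbi(X, pi, A, B))   # optimal state path and its probability
```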
Slide 37: Joint probability
P(a, b) = P(a | b) × P(b) = P(b | a) × P(a)

Slide 38: PART TWO: HANDWRITING RECOGNITION
ON-LINE versus OFF-LINE.

Slide 39: OFF-LINE VERSUS ON-LINE
Pen-down, pen-up, intersection, retraced stroke.

Slide 40: RECOGNITION
From the 2D image of a word to a 1D observation sequence for the HMM: graph modeling and off-line ordering (OrdRec).
Possible observation units: letters, graphemes, pixel columns.

Slide 41: A SUITABLE MODEL FOR GLOBAL OPTIMIZATION
1. Definition of a node. 2. Original graph.
Traveling salesman problem: search for a Hamiltonian cycle.
Model: inter links and intra links.
[Figure: a word decomposed into numbered segments 1-8 grouped into regions R1, R2, R3.]

Slide 42: A SUITABLE MODEL FOR GLOBAL OPTIMIZATION
1. Definition of a node. 2. Original graph. 3. Complete graph. 4. Final graph. 5. Valuation of the graph. 6. Hamiltonian path.
Link types: inter link, intra link, completion link, start and end links.
[Figure: the same segments 1-8 and the resulting graph.]

Slide 43: Overview of the recognition system
Feature Extraction → Feature Vectors → Symbol Mapping → Observations: 1D sequence of symbols → HMM (state transition probabilities, observation symbol probabilities, initial state probabilities) → Result, e.g. "rugby: -2.15".

Slide 44: RECOGNITION SYSTEM
On-line file / word image → graph → on-line or off-line ordering → reference lines → normalization → segmentation → set of segments → sequence of oriented segments and of pen-up/pen-down events → feature extraction → vector quantization → symbol sequence (e.g. 125 32 368 56 98 225 118) → HMM → likelihoods for each word of the dictionary.

Slide 45: Examples of clusters
Core line. Number of clusters: 300.

Slide 46: IRONOFF: RESULTS
A dual on-line and off-line database:
- 25,451 isolated characters;
- 31,346 cursive words;
- about 700 different writers (64% men and 34% women);
- average writer age: 26.5 years.
Dictionary: 197 words. Training: 20,898 words. Test: 10,448 words.

Slide 47: IRONOFF construction
Step 1: on-line acquisition. Step 2: off-line acquisition. Step 3: matching process.
On-line data and gray-level image.

Slide 48: IRONOFF samples

Slide 49: COMPARISON of OffLineSeg and OnLineSeg (OrdRec approach)
[Results chart for OffLineSeg; values not recovered.]

Slide 50: COMPARISON of OnLineSeg and OnLinePt
[Results chart; values not recovered.]

Slide 51: EXAMPLES: Top(1) correct

Slide 52: EXAMPLES: Top(1) correct

Slide 53: EXAMPLES
- Errors due to a pre-processing step: normalization of the word.
- Annotation errors: word unknown in a given letter case.
- Errors due to the limitations of the model.

Slide 54: EXAMPLES
Further examples of the same error categories: pre-processing (word normalization), annotation (word unknown in a given letter case), limitations of the model.

Slide 55: END

Slide 56: CONCATENATION OF LETTER MODELS TO DEFINE WORD MODELS (LEGO APPROACH)
Example word "cet": initial state, final state, and non-emitting states between the letter models.
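To illustrate the LEGO idea of slide 56, here is a schematic sketch of how letter models could be chained into a word model through their junction states. The data structures, the exit probability and the letter models are illustrative placeholders, not the system's actual implementation.

```python
import numpy as np

class LetterHMM:
    """A small left-right letter model: emitting states only."""
    def __init__(self, name, A, B):
        self.name, self.A, self.B = name, A, B   # A: n x n, B: n x M

def concatenate(letters, p_exit=0.3):
    """Chain letter models into one word model.

    Leaving the last state of one letter (with probability p_exit, an
    illustrative placeholder) routes directly into the first state of the
    next letter, playing the role of a non-emitting junction."""
    sizes = [m.A.shape[0] for m in letters]
    N = sum(sizes)
    A = np.zeros((N, N))
    B = np.vstack([m.B for m in letters])
    offset = 0
    for k, m in enumerate(letters):
        n = sizes[k]
        A[offset:offset + n, offset:offset + n] = m.A
        if k + 1 < len(letters):                 # junction to the next letter
            A[offset + n - 1, offset + n] = p_exit
            A[offset + n - 1, offset + n - 1] = 1.0 - p_exit
        offset += n
    pi = np.zeros(N)
    pi[0] = 1.0                                  # the word starts in its first letter
    return pi, A, B

# Word model built from three 2-state letter models (placeholder values):
A_letter = np.array([[0.6, 0.4], [0.0, 1.0]])
B_letter = np.full((2, 4), 0.25)                 # uniform over 4 symbols
letters = [LetterHMM(c, A_letter.copy(), B_letter.copy()) for c in "cet"]
pi, A, B = concatenate(letters)
print(A.shape, B.shape)                          # (6, 6) (6, 4)
```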
Slide 57: WORD MODEL, LEGO APPROACH
The word model is built by chaining sub-models between an initial state q_D and a final state q_F (sub-models d, g, t and "dia", joined through the non-emitting division states div_0, div_+, divDia_0, div_F and divDia_F; 15 entries in total).
[The slide shows the corresponding textual model definition and its transition-structure matrix of 0/1/2 entries; this raw dump is not reproduced here.]

Slide 58: THE ORDREC APPROACH
- Pen-downs and stroke order.
- Pen-downs and pen-ups.
- Pen-ups.
- Segment graph.
- Stroke graph.

Slide 59: Features
Segment features:
- Dimensions of the segment's enclosing rectangle: L_x(s) and L_y(s).
- Ordinate, relative to the baseline, of the center of the enclosing rectangle: y(s).
- Segment orientation: D_x(s) and D_y(s).
- Left and right extensions of the segment: O_g(s) and O_d(s).
Pen-up/pen-down features:
- Ordinate of the center of the pen-up/pen-down vector: y(lp).
- Direction of the pen-up/pen-down vector: D_x(lp) and D_y(lp).
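As a rough illustration of the segment features listed on slide 59, a sketch of how such quantities might be computed from a segment's endpoints. The exact definitions used in the system (normalizations, sign conventions, and the extensions O_g(s) and O_d(s)) are not given in the slides, so everything here is an assumption-laden approximation.

```python
from dataclasses import dataclass
import math

@dataclass
class Segment:
    x1: float; y1: float   # start point
    x2: float; y2: float   # end point

def segment_features(s: Segment, baseline_y: float):
    """Approximate versions of the slide-59 segment features.
    The left/right extensions O_g(s), O_d(s) are omitted because their
    definition is not specified in the slides."""
    Lx = abs(s.x2 - s.x1)                  # enclosing-rectangle width  L_x(s)
    Ly = abs(s.y2 - s.y1)                  # enclosing-rectangle height L_y(s)
    cy = (s.y1 + s.y2) / 2.0
    y_rel = cy - baseline_y                # center ordinate relative to the baseline y(s)
    length = math.hypot(s.x2 - s.x1, s.y2 - s.y1) or 1.0
    Dx = (s.x2 - s.x1) / length            # orientation D_x(s)
    Dy = (s.y2 - s.y1) / length            # orientation D_y(s)
    return {"Lx": Lx, "Ly": Ly, "y": y_rel, "Dx": Dx, "Dy": Dy}

print(segment_features(Segment(0, 0, 3, 4), baseline_y=1.0))
```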