Linear Discriminant Analysis

Linear discriminant analysis is not necessary to build a recognizer, but it is very helpful for improving the recognition accuracy. We will not explain the theory behind LDA here, only this much: LDA finds a transformation matrix A such that, if every feature vector x is multiplied by A, the ratio of the determinants of the total-scatter matrix and the within-scatter matrix is maximized. The total scatter measures the diversity of all data, and the within scatter measures the average diversity of the data that belong to the same class. Thus finding the LDA matrix means moving the data that belong to the same class a bit closer together, and moving the data of different classes a bit further apart.
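In symbols, this is the standard Fisher criterion (stated here for reference; the notation is ours, not quoted from the Janus documentation). With global mean mu, class means mu_k, total scatter T and within-class scatter W:

    T = \sum_{x} (x - \mu)(x - \mu)^{\top}, \qquad
    W = \sum_{k} \sum_{x \in \text{class } k} (x - \mu_k)(x - \mu_k)^{\top}

    A^{*} = \arg\max_{A} \frac{\det\!\left(A\,T\,A^{\top}\right)}{\det\!\left(A\,W\,A^{\top}\right)}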
If you are interested in more details about the theoretical background of LDA, have a look at a good textbook or at the relevant papers. In the rest of this page, we will only address how LDA matrices are computed with Janus.

Computing an LDA Matrix

Janus offers the object class LDA. The first part of an LDA computation is to create an LDA object and to define which acoustic model belongs to which LDA class. Usually we use one class for each Gaussian codebook. Generally we find that computing an LDA for a greater number of classes gives better performance, so we recompute the LDA after switching from a context-independent to a context-dependent system. The usual number of codebooks for a context-independent system is three times the number of monophones, because we use three codebooks per monophone. After making the step to a context-dependent system we usually end up with thousands of codebooks. At that point we compute a new LDA matrix, which will discriminate better than the one that was computed with the context-independent system.
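To make the bookkeeping concrete, here is a minimal Python sketch of the class setup, one LDA class per Gaussian codebook. This is an illustration only, not Janus code; the monophone names and the feature dimension are invented, and in Janus the classes would be registered on the LDA object itself.

    import numpy as np

    dim = 16                                       # feature dimension (assumed)
    monophones = ["A", "E", "I", "S", "T"]         # hypothetical monophone set
    # three codebooks per monophone: begin, middle, end
    codebooks = [p + s for p in monophones for s in ("-b", "-m", "-e")]

    class_index = {name: k for k, name in enumerate(codebooks)}
    counts = np.zeros(len(codebooks))              # frames seen per class
    sums = np.zeros((len(codebooks), dim))         # running sums for class means
    print(len(codebooks), "LDA classes")           # 3 x number of monophones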

When an LDA object is initialized and the classes are defined, we can start training the scatter matrices. Maybe 'training' is not the right term, and 'computing' would be a better one, but considering that the main loop Janus runs during LDA computation is exactly the same as during regular maximum-likelihood training of its Gaussian mixtures, the term 'training' fits LDA fine, too. So all that is done during training is loading an utterance, getting a path from somewhere (running a Viterbi alignment or loading labels from file), and accumulating the scatter matrices. When all the training utterances have been processed we end up with the two matrices mentioned above; the LDA object also stores the mean vector for each class, and the counts, i.e. the number of feature vectors that belonged to each class.

The computation of the LDA matrix from the scatter matrices is called 'simultaneous diagonalization' and can be done in Janus by calling the corresponding command. At the end, all that is left to do is to store the LDA matrix and the counts. We can use the counts later for extracting sample vectors into separate files for the k-means or neural-gas algorithm.
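The following Python sketch shows the two computational pieces just described: accumulating counts, class means and scatter matrices over labeled frames, and then the simultaneous diagonalization itself. Janus does all of this internally with its own commands; this sketch only illustrates the underlying math (the labels would come from a Viterbi path or label files, and the output file name is made up).

    import numpy as np
    from scipy.linalg import eigh

    def accumulate_scatter(frames, labels, num_classes):
        """Accumulate per-class counts, means, and the two scatter matrices.
        frames: (N, d) feature vectors; labels: (N,) LDA-class index per frame."""
        d = frames.shape[1]
        counts = np.bincount(labels, minlength=num_classes).astype(float)
        means = np.zeros((num_classes, d))
        W = np.zeros((d, d))                       # within-class scatter
        for k in range(num_classes):
            xk = frames[labels == k]
            if len(xk) == 0:
                continue
            means[k] = xk.mean(axis=0)
            cen = xk - means[k]
            W += cen.T @ cen
        cen = frames - frames.mean(axis=0)
        T = cen.T @ cen                            # total scatter
        return counts, means, W, T

    def simultaneous_diagonalization(T, W):
        """Solve the generalized eigenproblem T v = lambda W v; the eigenvectors,
        sorted by decreasing eigenvalue, form the rows of the LDA matrix."""
        eigvals, eigvecs = eigh(T, W)              # W must be positive definite
        order = np.argsort(eigvals)[::-1]
        return eigvecs[:, order].T                 # rows = discriminative directions

    # toy usage: 3 classes in 4 dimensions, class-dependent shift
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 3, size=300)
    frames = rng.normal(size=(300, 4)) + labels[:, None]
    counts, means, W, T = accumulate_scatter(frames, labels, 3)
    A = simultaneous_diagonalization(T, W)
    np.savetxt("ldaMatrix.txt", A)                 # store the LDA matrix

The resulting A satisfies A W A^T = I and makes A T A^T diagonal, which is exactly what 'simultaneous diagonalization' means; keeping only the first few rows of A gives a dimension-reducing, discriminative transform.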