Starting Up a First Recognizer

In this step we will check whether the creation of the description files went all right. We will start up the newly created environment and have a look at some features, some weights, and some Viterbi paths.

The Startup Script

Let's jump right into it (in the step3 directory):
[FeatureSet fs] setDesc @/home/islpra0/IslData/featDesc 
fs setAccess @/home/islpra0/IslData/featAccess

[CodebookSet cbs fs] read ../step2/codebookSet 
[DistribSet dss cbs] read ../step2/distribSet
[PhonesSet ps] read ../step2/phonesSet 
[Tags tags] read ../step2/tags 
[Tree dst ps:phones ps tags dss] read ../step2/distribTree 

SenoneSet sns [DistribStream str dss dst] 

[TmSet tms] read ../step2/transitionModels 
[TopoSet tps sns tms] read ../step2/topologies 
[Tree tpt ps:phones ps tags tps] read ../step2/topologyTree 

fs FMatrix LDAMatrix 
fs:LDAMatrix.data bload ../IslData/ldaISLci.bmat
You've seen the creation of all these objects before, so there should be nothing new for you, except that this time we are not giving life to the objects by adding all items explicitly ourselves. Instead, we just read the previously written description files. The feature description and access files we use here are not the ones we have just created, but those used by the system which collected the acoustic parameters we load in the next section. For the same reason we need to load an LDA matrix, which is done in the last two lines. It is used in the feature description file to bring the extracted features to the same dimensionality as the acoustic parameters we want to load.

So start a Janus process, and have it execute the above lines. If everything works without problems, then the environment files should be fine.
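If you would like a quick check beyond the mere absence of error messages, you can ask some of the objects what they contain. Like the database object we will use further below, Janus objects can usually be invoked without arguments to list their contents (take this as an assumption for the objects shown here; the exact output format may differ):

puts [ps]
puts [cbs]
puts [dss]

The phone sets, codebook names, and distribution names should match what you defined in step 2.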

Loading the Acoustic Parameters

Now let's go for some acoustic parameters (weights). Remember that the given archive contained some generic weights. Fortunately they are accumulators for codebooks with 16 16-dimensional vectors, which were trained using the same preprocessing as we are using in our feature description file. If this were not the case, we would have to write a suitable feature description, or start with random weights, or use some labels and continue at a stage of this tutorial where we already have labels. But let's not complicate things too much; let's just be happy that we can use the generic weights right away.
Well, actually, we have to do some minor modifications. The given weights file has parameters for 16 phonemes (the same as ours) plus the two phonemes SIL and GARBAGE. All have three codebooks, because the recognizer which created the weights used three subphone segments (beginning, middle, and end). So we will simply ignore the segments in the weights file that we don't need. Remember that we are using the underscore for the silence phone and the plus character for the garbage phone. So if we simply loaded the weights, Janus wouldn't know what to do with the weights for "SIL". Similar problems can occur when you want to initialize the weights for a recognizer in a new language: your phoneme set will very likely not match the 16 generic phones in this generic weights file. To cope with such problems, Janus offers a set of rewrite rules, in which you can define what your system's name is for a model name in the weights file. Do the following:
RewriteSet rws 
rws add GARBAGE-m +-m 
cbs configure -rewriteSet rws 
Now you have defined a rewrite set with one rule, which will interpret the name "GARBAGE-m" as "+-m". The configure command tells the codebook set object to use the newly created rewrite set.
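More rules can be added in exactly the same way, one per model name. This is what you would do when bootstrapping a new language, where your phoneme names will hardly match the generic ones. The rules below are only an illustration (they assume a silence codebook called _-m and use two made-up phone names); substitute the names your own system actually defines:

rws add SIL-m _-m
rws add AX-m  @-m
rws add IY-m  i:-m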

The generic weights file contains codebook accumulators, i.e. the stuff that is collected during training. To get some ready-to-use weights we'll have to do the following:

cbs createAccus 
cbs loadAccus ../IslData/codebookAccus.gz
cbs update

First we create an accumulator object for every codebook, then we load the generic weights, and finally we tell the codebook set object to update its parameters according to its accumulators.

After loading the weights accumulator file you should get the following messages from Janus:

INFO codebook.c(4615) 124 accumulators were found in the file 
INFO codebook.c(4616) 83 accumulators were loaded 
INFO codebook.c(4617) 83 codebooks were defined
INFO codebook.c(4618) 41 codebooks were undefined 
INFO codebook.c(4619) 0 codebooks had no accumulator 
INFO codebook.c(4620) 0 refN mismatches occurred 
INFO codebook.c(4621) 0 dimN mismatches occurred 
INFO codebook.c(2713) 0 subN mismatches occurred

This means that there were 124 accumulators in the file, 83 of which were loaded. That is fine, because that is the number of codebooks we have, so all codebooks have been loaded. For 41 of the 124 accumulators no codebook was defined (these were the SIL-b, SIL-e, GARBAGE-b, GARBAGE-e, and the -b segments of the other phonemes). There were no mismatches in the size of the codebooks (refN = number of reference vectors, dimN = number of dimensions, subN = number of accumulators per codebook).

Now that the loading of the codebook accumulators went smoothly, we can save the actual weights into a file. In this weights file our own phoneme names will be used, so we won't have to use rewrite rules any more:

cbs save codebookWeights 

Looking at the Acoustic Parameters

Let's have a look at some of the codebooks, now. Type:
showDSS dss 
This command is named after "show distribution set", which was its primary purpose, but it also displays the codebooks of a distribution as grayscale patterns (light patterns = low values, dark patterns = high values).

Double-click on some of the available distribution names (they are the same as the codebook names, according to your system design). We do not have any mixture weight distributions yet, only codebooks, so the display of the distributions is a flat rectangle. But the codebooks do show something. Admittedly, the view is not exactly breathtaking, but with a little good will you will be able to notice that the silence codebook is a bit lighter than average, which means that the spectral energy values are lower. And looking at the S codebook you might notice that the energy in the higher frequencies is a bit greater (upper pixels are darker) than in the lower frequencies.

Having a Look at some Features

We've already had a look at some features, so this won't be new to you. We'll just do it again, because this time we can use our database and our feature access rule. Do the following:

[DBase db] open ../step1/db.dat ../step1/db.idx -mode r
fs eval [db get alex_waibel.1] 
featshow fs LDA

It is not much easier than the "fs eval" command we used earlier, but it shows how things should be done. In many cases your scripts will not explicitly load recordings; instead you will just tell some procedures to do so, and you will not want to bother with how the feature set gets its features. It will be enough to define a rule once.
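As a small sketch of what such a procedure could look like, the two commands above can be wrapped into a few lines of Tcl (the procedure name loadFeatures is made up; the commands inside are exactly the ones we just used):

proc loadFeatures {utt} {
    fs eval [db get $utt]
}

loadFeatures alex_waibel.1
featshow fs LDA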

Having a Look at some Viterbi Paths

With a little additional work we can even compute our first Viterbi alignment. This is useful to see whether the loaded weights are worth anything. If the resulting Viterbi path is too edgy, i.e. if a few states get most of the speech frames, then the weights are not really useful. Once you have completed the above steps, you don't have to repeat them all over again; instead you can load the saved weights with

cbs load codebookWeights

after creating the codebook set.
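A later session could thus look like this sketch, where startup.tcl is just a placeholder name for a file holding the object creation lines from the beginning of this step:

source startup.tcl        ;# recreates fs, cbs, dss, sns, ... as shown above
cbs load codebookWeights  ;# restores the saved weights, no accumulators needed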

To be able to run a Viterbi alignment, we'll first have to create some more objects, namely a dictionary, an acoustic model set, an HMM, and a path object:

[Dictionary diction ps:PHONES tags] read /home/islpra0/IslData/dict
HMM hmm diction [AModelSet amo tpt ROOT] 
Path path 
The dictionary object diction should be self-explanatory. The Path object path will be used to hold the Viterbi alignment path, and the HMM object hmm will hold the entire HMM topology of an utterance. The acoustic model set (AModelSet) named amo is an object that has little to show; it maintains a collection of the ways phonemes can be modeled, including their topologies and acoustic units (senones).

Once these objects are created we can define the following little procedure:

proc viterbi {utt} {
    set uttInfo [db get $utt]
    makeArray arr $uttInfo
    hmm make $arr(text) -optWord SIL
    return [path viterbi hmm -eval $uttInfo]
}
It accepts one argument, utt, which is an utterance ID. It then gets the information about this utterance that is stored in the database object db. The makeArray command makes an array arr out of the list uttInfo. This array has two elements: arr(text) contains the transcription of the utterance, and arr(utt) contains the utterance ID. If we had created a richer database earlier with more information, that information would also be part of the array. The make command lets the hmm object build all its internal structures for the entire utterance's topology, which consists of three subobjects: a word-graph, a phone-graph, and a state-graph. The hmm is given as an argument to the viterbi method of the path object. The option "-eval $uttInfo" triggers the automatic creation of the features needed for the Viterbi alignment. Don't confuse the viterbi method (which is an internal hard-coded Janus function that can be applied to path objects) with the Tcl viterbi procedure that we have just defined.

With this procedure defined, we can just do the following:

puts [viterbi alex_waibel.1] 
displayLabels path hmm 
You can repeat it for other utterances. You can get a list of all utterance IDs by just typing
db 
If the displayed Viterbi paths look reasonably smooth, then the weights should be usable.
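To get an impression over more than one utterance you can combine the two: assuming the list returned by the database object is an ordinary Tcl list, a simple loop prints the Viterbi score for every utterance:

foreach utt [db] {
    puts "$utt: [viterbi $utt]"
}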