Generally, the recognition rate of an automatic speech recognition (ASR) system depends largely on the discriminability of the feature vectors representing the input speech signal. To improve the recognition rate, it is therefore desirable to increase the discriminating power of the feature vectors.
In this thesis, we propose a linear transformation of the feature vectors that aims to raise the recognition rate of an ASR system by increasing their discriminating power. Using the relative entropy between phoneme classes (the phoneme being the unit of recognition), the proposed method shortens within-class distances between feature vectors while lengthening between-class distances. The method is based on the observation that as the relative entropy between two classes of feature vectors grows, so do their dissimilarity and the discriminating power between them. The proposed transformation matrix is derived as follows: first, the objective function is defined as a function of the divergence, i.e., the average relative entropy between classes; then, the objective function is maximized by an iterative learning algorithm, the natural gradient ascent method, which yields the optimal linear transformation matrix.
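As a rough illustration of this optimization scheme (not the thesis implementation), the sketch below maximizes a divergence objective over a linear transform W with a natural-gradient ascent step. The Gaussian class models, the symmetric-KL form of the divergence, the natural-gradient form grad @ W.T @ W, and all dimensions and learning rates are illustrative assumptions.

```python
# Sketch: divergence maximization over a linear feature transform W.
# Assumes Gaussian class models and Amari's natural gradient for a square matrix.
import jax
import jax.numpy as jnp

def average_divergence(W, means, covs):
    """Average symmetric KL divergence between Gaussian classes after y = W x."""
    d = W.shape[0]
    total, n_pairs = 0.0, 0
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            mi, mj = W @ means[i], W @ means[j]
            Si, Sj = W @ covs[i] @ W.T, W @ covs[j] @ W.T
            Si_inv, Sj_inv = jnp.linalg.inv(Si), jnp.linalg.inv(Sj)
            dm = mi - mj
            total += 0.5 * (jnp.trace(Si_inv @ Sj + Sj_inv @ Si) - 2 * d) \
                   + 0.5 * dm @ (Si_inv + Sj_inv) @ dm
            n_pairs += 1
    return total / n_pairs

def natural_gradient_step(W, means, covs, lr=1e-3):
    """One ascent step; for a square W the natural gradient is grad @ W.T @ W."""
    grad = jax.grad(average_divergence)(W, means, covs)
    return W + lr * grad @ W.T @ W

# Toy usage: three classes of 4-dimensional features.
key = jax.random.PRNGKey(0)
means = [jax.random.normal(jax.random.fold_in(key, i), (4,)) for i in range(3)]
covs = [jnp.eye(4) for _ in range(3)]
W = jnp.eye(4)
for _ in range(100):
    W = natural_gradient_step(W, means, covs)
```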
To examine the effect of the proposed method on discriminating power, two sets of experiments are performed on the TIMIT corpus: a simple phoneme classification experiment using a Euclidean distance measure, and a recognition experiment with an ASR system. The results are compared with those of well-known methods such as PCA, LDA, and Li’s method, and show an improvement of at least 0.28%.
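A minimal sketch of a Euclidean-distance phoneme classification of the kind used to probe discriminability is given below: each transformed feature vector is assigned to the phoneme class whose transformed mean is nearest. The data shapes, class count, and nearest-class-mean rule are illustrative assumptions, not the exact experimental setup.

```python
# Sketch: nearest-class-mean classification with a Euclidean distance measure.
import jax.numpy as jnp

def classify_nearest_mean(W, frames, class_means):
    """Assign each feature vector to the class with the closest transformed mean."""
    y = frames @ W.T                       # transformed features: y = W x
    centers = class_means @ W.T            # transformed class means
    # squared Euclidean distances, shape (n_frames, n_classes)
    dists = jnp.sum((y[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return jnp.argmin(dists, axis=1)

# Toy usage: 10 frames of 4-dimensional features, 3 phoneme classes.
frames = jnp.ones((10, 4))
class_means = jnp.zeros((3, 4)).at[0].set(1.0)
labels = classify_nearest_mean(jnp.eye(4), frames, class_means)
```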