INTRODUCTION

26 September 2025


The VQ-based speaker identification system consists of two processes: training and identification. In the training process, speaker models known as codebooks are trained separately, using sets of feature vectors extracted from labelled utterances (Soong, 1987). The most widely used training method is the k-means algorithm (Duda, 1973), of which the LBG algorithm is a commonly used extension (Linde, 1980). A codebook is a set of cluster mean vectors known as codevectors.

In the identification process, an unknown input utterance is analysed into a sequence of test vectors, and the task is to identify the speaker given the vector sequence and the codebooks. This task is carried out in two steps: determining the distortions between vectors and codebooks, and using a classifier to select the best speaker based on these distortions. In the first step, the nearest neighbour rule is applied: the distortion between a vector and a codebook is the distance from the vector to its nearest codevector in that codebook. In the second step, the distortion between the vector sequence and a codebook is defined as the average of the distortions obtained for the vectors in the sequence. Given the average distortions for all speakers, the nearest prototype classifier selects the speaker whose average distortion is minimum (Rosenberg, 1992).
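
The two identification steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the system's actual implementation: the Euclidean distance, the function names, and the toy codebooks are all assumptions made for the example, and the codebooks are taken as already trained.

```python
import math

def distance(x, c):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))

def vector_distortion(x, codebook):
    """Step 1, nearest neighbour rule: distance from x to its closest codevector."""
    return min(distance(x, c) for c in codebook)

def average_distortion(vectors, codebook):
    """Step 2: mean of the per-vector distortions over the whole sequence."""
    return sum(vector_distortion(x, codebook) for x in vectors) / len(vectors)

def identify(vectors, codebooks):
    """Nearest prototype classifier: pick the speaker with minimum average distortion."""
    scores = {spk: average_distortion(vectors, cb) for spk, cb in codebooks.items()}
    return min(scores, key=scores.get), scores

# Hypothetical toy data: two speakers, each with two 2-D codevectors.
codebooks = {
    "speaker_a": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_b": [(5.0, 5.0), (6.0, 6.0)],
}
test_vectors = [(0.1, 0.0), (0.9, 1.2), (0.2, 0.1)]

best, scores = identify(test_vectors, codebooks)
print(best, scores)
```

In a real system the vectors would be acoustic feature vectors and each codebook would come from LBG training; only the decision logic is shown here.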

Variations of the identification process and classifiers exist, such as:

· Using the k-nearest neighbour rule rather than the nearest neighbour rule (Cover, 1967);

· Using a fuzzy classifier in which a fuzzy membership function associated with the vector sequence is defined for the speaker model (Pal, 1977);

· Combining the fuzzy c-means algorithm and the nearest prototype classifier as a fuzzy approach to the nearest prototype classifier (Bezdek, 1977);

· A fuzzy k-nearest neighbour algorithm (Keller, 1985);

· Using more robust statistics, such as the median instead of the mean, in computing the distortion (Rosenberg, 1992);

· A fuzzy generalised nearest prototype classifier (Kuncheva, 1997).
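
As a small illustration of one variation in the list above, the following sketch contrasts the mean and the median when pooling per-vector distortions into a sequence-level score. The distortion values are invented for the example, and `sequence_distortion` is a hypothetical helper name.

```python
import statistics

def sequence_distortion(distortions, robust=False):
    """Pool per-vector distortions with the mean, or with the median if robust=True."""
    if robust:
        return statistics.median(distortions)
    return statistics.mean(distortions)

# Hypothetical per-vector distortions; the fourth value plays the role of an outlier frame.
per_vector = [0.2, 0.3, 0.25, 9.0, 0.22]

mean_score = sequence_distortion(per_vector)
median_score = sequence_distortion(per_vector, robust=True)
print(mean_score, median_score)
```

A single outlying frame pulls the mean far from the typical distortion, while the median stays close to it, which is the motivation for the robust variant.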

Fuzzy approaches to speech and speaker recognition have achieved high recognition accuracy (Pal, 1977; Tseng, 1987; Choi, 1996; Tran, 1998a; Tran, 1998b; Tran, 1999a; Tran, 1999b). In this paper, we therefore propose an alternative fuzzy classifier for the VQ-based speaker identification system. The training process is unchanged: codebooks are trained by the LBG method, a hard clustering algorithm. In the first step of the identification process, the distortion between a test vector and a codebook is computed in two ways: using the nearest neighbour rule described above, and using the fuzzy k-nearest neighbour rule. In the latter case, the distortion is the minimum value of the fuzzy objective function used in the fuzzy c-means method (Bezdek, 1987). Interestingly, the expression for this minimum value is very similar to that of the distance defined in the prototype-based minimum error classifier for speech recognition (Katagiri, 1998). In the second step, a fuzzy membership function associated with each vector is determined from the distortions obtained in the first step. The fuzzy membership associated with the vector sequence is then defined as the average of the memberships obtained for the vectors in the sequence. Using the fuzzy nearest prototype classifier, the speaker whose average membership is maximum is selected. Experiments on identifying 16 speakers from the TI46 speech corpus in text-independent mode show that the fuzzy nearest prototype classifier gives better results than the nearest prototype classifier.
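
The fuzzy classifier outlined above can be sketched as follows. For fixed codevectors, minimising the fuzzy c-means objective Σ_k u_k^m d_k² over memberships that sum to one gives the closed form (Σ_k (d_k²)^(-1/(m-1)))^(-(m-1)), which the code uses as the fuzzy distortion. Everything else is an assumption made for illustration: the squared Euclidean distance, the fuzzifier m = 2, the optional restriction to the k nearest codevectors, and in particular the FCM-style membership across speakers, since this introduction does not give the exact membership function.

```python
def sq_dist(x, c):
    """Squared Euclidean distance (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(x, c))

def fuzzy_distortion(x, codebook, m=2.0, k=None, eps=1e-12):
    """Minimum of sum_k u_k^m d_k^2 subject to sum_k u_k = 1, in closed form.

    If k is given, only the k nearest codevectors are used (an assumption).
    eps keeps the result positive when x coincides with a codevector.
    """
    d2 = sorted(sq_dist(x, c) + eps for c in codebook)
    if k is not None:
        d2 = d2[:k]
    p = 1.0 / (m - 1.0)
    return sum(d ** (-p) for d in d2) ** (-(m - 1.0))

def memberships(distortions, m=2.0):
    """ASSUMED FCM-style membership of one vector in each speaker model (sums to 1)."""
    p = 1.0 / (m - 1.0)
    weights = {spk: d ** (-p) for spk, d in distortions.items()}
    total = sum(weights.values())
    return {spk: w / total for spk, w in weights.items()}

def identify_fuzzy(vectors, codebooks, m=2.0, k=None):
    """Fuzzy nearest prototype classifier: maximum average membership wins."""
    avg = {spk: 0.0 for spk in codebooks}
    for x in vectors:
        d = {spk: fuzzy_distortion(x, cb, m, k) for spk, cb in codebooks.items()}
        for spk, u in memberships(d, m).items():
            avg[spk] += u / len(vectors)
    return max(avg, key=avg.get), avg

# Hypothetical toy data: two speakers, each with two 2-D codevectors.
codebooks = {
    "speaker_a": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_b": [(5.0, 5.0), (6.0, 6.0)],
}
test_vectors = [(0.1, 0.0), (0.9, 1.2), (0.2, 0.1)]

best, avg_membership = identify_fuzzy(test_vectors, codebooks)
print(best, avg_membership)
```

With this closed form the fuzzy distortion never exceeds the squared distance to the nearest codevector, because every codevector contributes to the sum rather than only the closest one.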