Introduction

26 September 2025

Three well-known statistical methods used in speech and speaker recognition are vector quantisation (VQ), the Gaussian mixture model (GMM), and the hidden Markov model (HMM) [1]. In the VQ method, a hard clustering procedure [2] is performed and the feature vector space is partitioned into distinct regions. The algorithm used is the k-means algorithm [3] or a variant of it, the LBG algorithm [4]. Unlike the VQ method, the GMM method uses a family of Gaussian probability density functions (pdfs) to model classes, allowing a more realistic "soft" mapping of feature vectors into more than one class [5]. In the HMM method, on the other hand, the temporal structure of vector sequences is modelled by Markov chains [6]. Estimating the parameters of these three methods from unlabelled training vectors is known as parametric unsupervised learning, and the EM algorithm proposed by Dempster, Laird, and Rubin [7] is a general approach to the iterative computation of maximum-likelihood estimates. EM-type algorithms are simple to implement and, under general conditions, converge monotonically in terms of the log-likelihood of the observed-data model [8].
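The contrast between the hard mapping of VQ and the soft mapping of the GMM, and the monotone behaviour of EM, can be illustrated with a minimal sketch. It assumes one-dimensional features and two classes; the function names, the toy data, and the initial parameters are all hypothetical choices made for illustration and are not taken from the cited works.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def hard_assign(x, centroids):
    """VQ-style hard mapping: index of the single nearest centroid."""
    return min(range(len(centroids)), key=lambda k: (x - centroids[k]) ** 2)

def soft_assign(x, weights, means, variances):
    """GMM-style soft mapping: posterior probability of every component."""
    dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    return [d / total for d in dens]

def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the observed data under the mixture."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )

def em_step(data, weights, means, variances):
    """One EM iteration for a 1-D GMM (no safeguard against variance collapse)."""
    # E-step: posterior of each component for each observation.
    resp = [soft_assign(x, weights, means, variances) for x in data]
    new_w, new_m, new_v = [], [], []
    # M-step: re-estimate weight, mean, and variance of each component.
    for k in range(len(weights)):
        nk = sum(r[k] for r in resp)
        mk = sum(r[k] * x for r, x in zip(resp, data)) / nk
        vk = sum(r[k] * (x - mk) ** 2 for r, x in zip(resp, data)) / nk
        new_w.append(nk / len(data))
        new_m.append(mk)
        new_v.append(vk)
    return new_w, new_m, new_v

# Toy data (assumed for illustration): two groups near 0 and near 4.
data = [0.1, -0.3, 0.4, 0.0, 3.8, 4.2, 4.5, 3.9]

# A point at 1.5 gets one label under VQ but a probability for each class under a GMM.
label = hard_assign(1.5, [0.0, 4.0])
posteriors = soft_assign(1.5, [0.5, 0.5], [0.0, 4.0], [1.0, 1.0])

# Track the log-likelihood over a few EM iterations.
params = ([0.5, 0.5], [0.0, 1.0], [1.0, 1.0])
history = [log_likelihood(data, *params)]
for _ in range(10):
    params = em_step(data, *params)
    history.append(log_likelihood(data, *params))
```

In this sketch `label` names a single class, whereas `posteriors` spreads the same point over both classes, and `history` records a log-likelihood that should not decrease from one iteration to the next.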

One of the most widely used approaches to cluster analysis is fuzzy c-means (FCM). The FCM algorithm is an expansion of the k-means algorithm (the hard c-means algorithm); it was first introduced by Dunn [9] and generalised by Bezdek [10]. Gustafson and Kessel [11] proposed a modification of the FCM algorithm that accounts for the geometric shapes of clusters. Gath and Geva [12] defined an exponential distance for the FCM algorithm to obtain a fuzzy approach to maximum likelihood estimation.
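As a rough illustration of how FCM relaxes the hard c-means assignment, the sketch below alternates the standard FCM membership and centre updates on one-dimensional data with the squared Euclidean distance. The function names, the toy data, the initial centres, and the choice of fuzziness m = 2 are assumptions made for this example only; the Gustafson-Kessel and Gath-Geva distances mentioned above are not implemented here.

```python
def fcm_memberships(data, centres, m=2.0):
    """Membership of every point in every cluster, for fuzziness m > 1."""
    power = 1.0 / (m - 1.0)  # exponent applied to inverse squared distances
    memberships = []
    for x in data:
        d2 = [(x - c) ** 2 for c in centres]
        if any(d == 0.0 for d in d2):
            # Point coincides with one or more centres: share membership among them.
            zeros = [d == 0.0 for d in d2]
            row = [1.0 / sum(zeros) if z else 0.0 for z in zeros]
        else:
            inv = [(1.0 / d) ** power for d in d2]
            total = sum(inv)
            row = [v / total for v in inv]
        memberships.append(row)
    return memberships

def fcm_centres(data, memberships, m=2.0):
    """Cluster centres as means weighted by memberships raised to the power m."""
    centres = []
    for k in range(len(memberships[0])):
        w = [row[k] ** m for row in memberships]
        centres.append(sum(wi * x for wi, x in zip(w, data)) / sum(w))
    return centres

# Toy data (assumed for illustration): two groups near 0 and near 4.
data = [0.1, -0.3, 0.4, 0.0, 3.8, 4.2, 4.5, 3.9]
centres = [0.0, 1.0]  # arbitrary starting centres
for _ in range(50):
    u = fcm_memberships(data, centres, m=2.0)
    centres = fcm_centres(data, u, m=2.0)
```

Each row of `u` sums to one, so a point can belong partly to both clusters. As m approaches one the memberships approach the 0/1 assignments of hard c-means, while larger values of m make them more evenly spread.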

The EM and FCM algorithms are the most widely used approaches to cluster analysis, so a unified algorithm covering both deserves study. Pioneering work on such a unified algorithm was initiated by Hathaway [13]. It has been shown that the EM algorithm used for estimating model parameters may be viewed as a fuzzy clustering technique. Ambroise, Dang and Govaert [14] have recently contributed to this work in spatial clustering and Markov random fields. However, the above studies can only be regarded as an FCM-based approach to the EM algorithm in which the degree of fuzziness takes only the value of one. Values greater than one have not been considered in those approaches, although such values have played an important role in fuzzy clustering applications. We therefore propose an FCM-based approach to the EM algorithm. A class of fuzzy EM algorithms for all possible degrees of fuzziness, including the EM algorithm itself, has been considered [15]. We have also proposed the fuzzy HMM [16] and the fuzzy GMM [17][18][19]. Furthermore, we have applied a robust clustering method proposed by Dave [20] to the fuzzy HMM [21] and the fuzzy GMM [22]. We integrate all of this work into a fuzzy approach to statistical models for speech and speaker recognition. To present the unified fuzzy approach in this paper, we begin with the fuzzy EM algorithm and show how to obtain the fuzzy HMM, the fuzzy GMM, and especially the fuzzy VQ, as well as the FCM. The application of Dave's robust clustering method is also presented here as a theoretical demonstration of further modifications. Some speech and speaker recognition results are reported here as an experimental demonstration.