Project description:
In this project, you need to design, implement, and evaluate an English vowel phoneme identification system using
signal processing. English has 13 basic vowel phonemes, while Arabic has only six. The following 13 words contain
the 13 vowels in the form /h V d/, where V is the vowel.
heed, hid, head, had, hard, hudd, hod, heard, hoard, hood, who’d, hade, hide
The English vowels differ in their magnitude spectrum (the magnitude of the Fourier transform). The general shape of the
spectrum is distinctive for each vowel. Moreover, the first two formant frequencies (F1 and F2) alone can be used
to recognize each vowel. The formant frequencies are the frequencies corresponding to the peaks of the magnitude
spectrum of the vowel. Therefore, F1 is the frequency corresponding to the first peak of the spectrum, F2 is the
frequency corresponding to the second peak, and so on. In most cases, and depending on the context, the vowels
also differ in their duration. A simple system can identify English vowels by estimating F1 and F2 for a given
input vowel and then comparing them against the stored F1 and F2 values of the 13 vowels.
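The simple F1/F2 system described above could be sketched roughly as follows in Python with NumPy. This is only an illustration under stated assumptions: the function names, the min_sep_hz parameter, and the layout of the reference table are hypothetical, and the two strongest spectral peaks are used as a crude stand-in for F1 and F2. On real speech the raw FFT peaks are individual harmonics, so the spectrum would probably need smoothing before peak picking.

```python
import numpy as np

def estimate_f1_f2(signal, fs, min_sep_hz=300.0):
    """Return the two strongest spectral peaks, sorted by frequency (crude F1, F2)."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()                  # remove any DC offset
    mag = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    peaks = []
    for _ in range(2):
        k = int(np.argmax(mag))                      # strongest remaining bin
        peaks.append(float(freqs[k]))
        # Assumption: suppress a neighbourhood so the next pick is a different peak.
        mag[np.abs(freqs - freqs[k]) < min_sep_hz] = 0.0
    f1, f2 = sorted(peaks)
    return f1, f2

def nearest_vowel(f1, f2, table):
    """table maps a word to its stored (F1, F2) pair; return the closest word."""
    return min(table, key=lambda w: (table[w][0] - f1) ** 2 + (table[w][1] - f2) ** 2)
```

The reference table would be filled with F1 and F2 values measured from the training recordings; no values are assumed here.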
Method1: Correlation: use cross-correlation to compare the signal segment of the unknown vowel with the signal
template (reference) of each of the 13 vowels; the recognized vowel is the one that gives the highest
correlation. To get better performance, you need to align the signal segments as accurately as possible.
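One possible way to sketch the correlation method in Python with NumPy is shown below. The function names and the dictionary of templates are hypothetical. Taking the peak of the normalized cross-correlation over all lags is an assumption made here to handle alignment; the brief does not prescribe a particular alignment procedure.

```python
import numpy as np

def peak_normalized_xcorr(segment, template):
    """Peak of the cross-correlation over all lags, scaled to at most 1."""
    segment = np.asarray(segment, dtype=float)
    template = np.asarray(template, dtype=float)
    segment = segment - segment.mean()
    template = template - template.mean()
    denom = np.linalg.norm(segment) * np.linalg.norm(template)
    if denom == 0.0:
        return 0.0                                   # a silent input matches nothing
    xcorr = np.correlate(segment, template, mode="full")
    return float(xcorr.max() / denom)

def recognize_by_correlation(segment, templates):
    """templates maps a word to its reference signal; return (best word, all scores)."""
    scores = {word: peak_normalized_xcorr(segment, ref)
              for word, ref in templates.items()}
    return max(scores, key=scores.get), scores
```

Normalizing by the signal energies keeps a loud template from winning only because of its amplitude.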
Method2: Fourier transform:
Now, you need to develop the same system (English vowel identification system), but this time using the Fourier
transform. That is, instead of taking the cross-correlation, you capture some details of the spectrum by quantizing the
spectrum computed with the FFT. In this way, each vowel is represented by a feature vector. These vectors can be used to identify vowels more
accurately than just looking at the two formant frequencies, F1 and F2. In addition, you can append the vowel duration to
the feature vectors. You then need to compute the mean vector of each vowel and store these in a lookup
table. To recognize the input vowel, you compute its vector and match it against the stored vectors. The one that gives
the best match (highest similarity, or smallest distance) is the recognized vowel. You can use cosine similarity, Euclidean distance, etc. as the matching
measure.
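The Fourier transform method could be sketched in Python with NumPy along these lines. The function names, the choice of 16 equal-width bands up to 4 kHz, and the use of cosine similarity are illustrative assumptions, not requirements of the brief.

```python
import numpy as np

def fft_features(signal, fs, n_bands=16, fmax=4000.0):
    """Quantize the magnitude spectrum into n_bands average magnitudes."""
    signal = np.asarray(signal, dtype=float)
    mag = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    edges = np.linspace(0.0, fmax, n_bands + 1)
    feats = np.zeros(n_bands)
    for b in range(n_bands):
        in_band = (freqs >= edges[b]) & (freqs < edges[b + 1])
        if in_band.any():
            feats[b] = mag[in_band].mean()
    return feats

def build_lookup(train_feats):
    """train_feats maps a word to a list of feature vectors; store the mean of each."""
    return {word: np.mean(np.stack(vecs), axis=0) for word, vecs in train_feats.items()}

def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0.0 else 0.0

def recognize(vec, lookup):
    """Return the word whose stored mean vector is most similar to vec."""
    return max(lookup, key=lambda word: cosine_similarity(vec, lookup[word]))
```

If the vowel duration is appended to the vectors, it would likely need scaling so that it neither dominates nor disappears next to the spectral values.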
Method3: Filter bank:
A filter bank is an alternative to the Fourier transform for estimating the spectral features. Instead of using the
Fourier transform, you need to design a set of bandpass filters covering the signal bandwidth, pass the vowel
segment through each filter, and compute the output. By this, you will get one output from each filter, which represents
the average power of the vowel in that filter's band. These outputs are then used as the feature vector, and a matching
system, similar to the one described above for the Fourier transform method, is used to build the vowel recognizer.
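A filter bank of this kind could be sketched in Python with NumPy as below. The windowed-sinc FIR design, the 101-tap length, and the function names are assumptions chosen for illustration; any other bandpass design (for example IIR filters) would fit the description equally well.

```python
import numpy as np

def bandpass_fir(low_hz, high_hz, fs, n_taps=101):
    """Windowed-sinc band-pass: difference of two ideal low-pass responses.

    n_taps is assumed odd so the filter is symmetric about its centre tap.
    """
    n = np.arange(n_taps) - (n_taps - 1) / 2.0
    hi = 2.0 * high_hz / fs
    lo = 2.0 * low_hz / fs
    h = hi * np.sinc(hi * n) - lo * np.sinc(lo * n)
    return h * np.hamming(n_taps)

def filterbank_features(signal, fs, edges, n_taps=101):
    """Average output power of each band between consecutive entries of edges (Hz)."""
    signal = np.asarray(signal, dtype=float)
    feats = []
    for low_hz, high_hz in zip(edges[:-1], edges[1:]):
        y = np.convolve(signal, bandpass_fir(low_hz, high_hz, fs, n_taps), mode="same")
        feats.append(np.mean(y ** 2))                # average power in this band
    return np.array(feats)
```

The resulting vector can be fed to the same mean-vector lookup and matching step used for the Fourier transform method.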
Dataset:
You will be provided with sufficient samples of the 13 English vowels spoken by native English speakers
(10 samples for each word). You need to divide this data into two subsets: training and testing. For
example, you can use 7 samples for training and 3 for testing for each vowel. You then build your system (all three
methods) on the training recordings and use the testing recordings to test your system and find its accuracy
(the percentage of correctly identified vowels out of the total number of testing vowels).
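The split and the accuracy measure could be sketched in plain Python as follows. The function names, the random shuffle with a fixed seed, and the dictionary layout (word mapped to its list of recordings) are assumptions for illustration.

```python
import random

def split_per_vowel(recordings, n_train=7, seed=0):
    """Split each word's recordings into training and testing lists."""
    rng = random.Random(seed)                        # fixed seed for a repeatable split
    train, test = {}, {}
    for word, items in recordings.items():
        items = list(items)
        rng.shuffle(items)
        train[word] = items[:n_train]
        test[word] = items[n_train:]
    return train, test

def accuracy_percent(predicted, actual):
    """Percentage of test vowels whose predicted word equals the true word."""
    if len(actual) == 0 or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)
```

With 10 recordings per word and n_train=7, each word contributes 3 test recordings, so the 13 words give 39 test vowels in total.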
Dear Client
I have read your project description and noticed that I am the only one who can make your project successful.
I have more than 10 years of experience in the Java/Python/C/C++ languages and have developed 100+ big and small projects using these languages.
I can complete your project on time and with high quality, and I will try to add more creative utilities to your project.
Thank you!