The flashcards below were created by a user on FreezingBlue Flashcards.
What are the levels of speech?
- Linguistic for speaker
- Physiological for speaker
- Acoustic (sound waves)
- Physiological for listener
- Linguistic for listener
Which level is the easiest to study?
Linear source-filter theory
- expresses articulatory-acoustic relationships
- *one of the most important/best theories in our field
What is involved in speech production?
- need a power source (breath support)
- we get a complex periodic signal from the vocal folds (they vibrate in 3 different ways and come together)
- speech is changed by changing the shapes of your cavities
What is the source of sound for speech?
- vocal folds (vibration)
- *for some consonants, the source is more complex (can be in the vocal tract or a combination of both- voiceless sounds)
What makes you sound like you?
shapes of pharynx, nasal, and oral/mouth cavities
What is the filter for speech?
- vocal tract (frequency dependent like all filters)
- resonator (air filled cavity)
What does the resonator do for you?
- natural frequencies change in resonator (ear does everything for you so you can perceive differences)
- 3-6 syllables per second
How are the source and filter related?
- they are assumed to be independent of each other (an assumption made for convenience)
- this implies that you can change the output of the vocal folds without changing the vocal tract and vice-versa
What do the vocal folds and vocal tract give you?
- vocal folds- fundamental frequency, harmonics, and amplitude changes
- vocal tract- articulation
How are vowels modeled?
as a tube closed at one end and open at the other
What is the formula to calculate where the resonant frequencies will be?
- Fn = (2n-1)c/(4l)
- Fn = resonant frequency
- n = integer (if looking for the 1st resonance you put 1, if looking for the 2nd you put 2, etc)
- c = speed of sound
- l = length of the tube
What is the first resonant frequency for a tube length of 17 cm and a speed of sound of 34,000 cm/s?
- Fn = (2n-1)c/(4l)
- F1 = (2*1-1)*34000/(4*17) = 500 Hz
- *the longer the tube the lower the resonant frequencies, the shorter the tube the higher the resonant frequencies
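The calculation above can be repeated for the higher resonances. The following is a minimal sketch, assuming an ideal uniform tube closed at one end; the function name `tube_resonances` is made up for illustration.

```python
def tube_resonances(length_cm, speed_cm_per_s=34000.0, count=3):
    """Resonances of a uniform tube closed at one end: Fn = (2n-1)c/(4l)."""
    return [(2 * n - 1) * speed_cm_per_s / (4.0 * length_cm)
            for n in range(1, count + 1)]

# 17 cm tube, speed of sound 34,000 cm/s (values from the card above)
print(tube_resonances(17.0))  # [500.0, 1500.0, 2500.0]
```

The resonances fall at odd multiples of the first one, so they are spaced 1000 Hz apart for this tube.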
How many resonances are there for a tube?
- we only need to consider the first 3 or 4 (the model is valid to only about 5 kHz)
What happens when the shape of the tube changes going from one vowel to another?
resonant frequencies change
Why doesn't changing the frequency/energy of the source of vibration change the resonant frequencies of the pipe/vocal tract?
the source and filter are independent of one another
What are formant frequencies?
- resonant frequencies of vowels
- *do NOT confuse with fundamental frequency!
How do a curved tube (vocal tract) and a straight tube (model) behave out to 5 kHz?
- they behave identically acoustically
- the curve only begins to affect acoustic signals with a short wavelength (frequencies above about 5 kHz)
What happens if the tube has uniform cross sectional area?
the resonances are equally spaced
Does all of the energy come from the source or filter?
- from the source (vocal fold vibration for vowels)
What does changing the length of the tube do?
- changes the resonance frequencies
- influenced by age and sex
- l = 14.5 cm for females
- l = 8.75 cm for children
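To see how much the tube length matters, the two lengths above can be put into the quarter-wave formula. This is a rough sketch that assumes a uniform tube and a speed of sound of 34,000 cm/s; the variable names are illustrative only.

```python
SPEED_OF_SOUND_CM_PER_S = 34000.0  # assumed value, same as the 17 cm example

tube_lengths_cm = {"female": 14.5, "child": 8.75}

# First resonance of a uniform tube closed at one end: F1 = c / (4l)
first_resonance_hz = {
    who: SPEED_OF_SOUND_CM_PER_S / (4.0 * length)
    for who, length in tube_lengths_cm.items()
}

for who, hz in first_resonance_hz.items():
    print(f"{who}: {hz:.0f} Hz")
# female: 586 Hz
# child: 971 Hz
```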
What does every formant/resonant/natural frequency have?
its own frequency, amplitude, and bandwidth
How are different vowels modeled?
acoustically by different vocal tract shapes
Phonetically, how are vowels distinguished?
position of the tongue
What happens if a constriction is placed on the tube/vocal tract?
the resonances change
What happens if you change the articulation?
you change the vocal tract shape and, with it, the resonance frequencies, amplitudes, and bandwidths
The output energy of a vowel is the product of:
- the source energy
- the size and shape of the resonator
- the radiation characteristics (adds 6 dB/octave)
- the radiation boost is a constant +6 dB/octave as frequency increases; combined with the source slope, the output spectrum actually falls at -6 dB/octave
What are glottal source characteristics for vowels?
- vocal fold vibration is periodic
- fo or F0 is used to indicate the vocal fundamental frequency
- the amplitude of the harmonics decreases by 12 dB/octave (a slope of -12 dB/octave)
What gives you amplitude changes?
- changing only the source and not the filter (the resonant frequencies stay the same)
What are filter characteristics for vowels?
- the vocal tract is a dynamic filter (changes constantly)
- it is frequency dependent
- it has, theoretically, an infinite number of resonances (only care about 1st 3 or 4 for vowels)
- each resonance has a center frequency, an amplitude, and a bandwidth
- for speech, these resonances are called formants
- formants are numbered in succession from the lowest (F1, F2, F3, etc)
- the formants together form the transfer function (input-output relationship; formants become physically evident only when energized)
Which harmonic has the highest amplitude?
the one closest to a formant frequency of the vowel
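Reading this card as "the harmonic closest to a formant frequency gets the biggest boost from the filter", the harmonic can be found as sketched below. The function name and the example values (F0 = 120 Hz, formant at 500 Hz) are assumptions for illustration.

```python
def closest_harmonic(f0_hz, formant_hz):
    """Return (harmonic number, harmonic frequency) nearest to a formant."""
    number = max(1, round(formant_hz / f0_hz))  # harmonics are multiples of F0
    return number, number * f0_hz

# Harmonics of 120 Hz near 500 Hz are 480 Hz (4th) and 600 Hz (5th)
print(closest_harmonic(120.0, 500.0))  # (4, 480.0)
```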
What is radiation characteristic?
- acoustic effect when a sound leaves a small area and enters a large one (like speaker)
- the effect is to raise the slope of the spectrum by +6 dB/octave
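The source slope (-12 dB/octave) and the radiation characteristic (+6 dB/octave) add when expressed in dB. The sketch below works out the net slope; it deliberately ignores the formant peaks of the filter, and the names are illustrative.

```python
import math

SOURCE_SLOPE_DB_PER_OCTAVE = -12.0    # glottal source spectrum
RADIATION_SLOPE_DB_PER_OCTAVE = 6.0   # radiation characteristic at the lips

def level_change_db(f_low_hz, f_high_hz, slope_db_per_octave):
    """Level change between two frequencies for a given spectral slope."""
    octaves = math.log2(f_high_hz / f_low_hz)
    return slope_db_per_octave * octaves

net_slope = SOURCE_SLOPE_DB_PER_OCTAVE + RADIATION_SLOPE_DB_PER_OCTAVE
print(net_slope)                                  # -6.0
print(level_change_db(100.0, 400.0, net_slope))   # -12.0 (two octaves)
```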
*What are the acoustic phonetic relationships for vowels?
- F1 is inversely related to tongue height (raise tongue, low F1 and vice versa)
- F2 is directly related to tongue advancement (back vowels have low F2, front vowels have high F2)
- lip rounding lowers all formant frequencies (because you're making the vocal tract longer)
- you can calculate how close a person is to the sound they are trying to make
What does perturbation mean?
What is the perturbation theory?
- volume velocity variations reflect the way air particles vibrate at a particular point in the vocal tract (how the air is passing through vocal folds)
- at some points, vibration is minimal (nodes); at others, maximal (antinodes)
- for F1, the antinode is at the open end of the tube (mouth) and the node is at the closed end (vocal folds)
- for F2, there are 2 antinodes and 2 nodes, etc
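The node and antinode positions can be worked out for an idealized tube. This sketch assumes a uniform tube closed at the vocal folds and open at the mouth, where the volume velocity of resonance n varies as sin((2n-1)πx/(2L)) with x measured from the closed end; that standing-wave expression is an assumption not stated on the cards.

```python
from fractions import Fraction

def antinodes(n):
    """Volume-velocity maxima for resonance n, as fractions of tube length."""
    return [Fraction(2 * k - 1, 2 * n - 1) for k in range(1, n + 1)]

def nodes(n):
    """Volume-velocity minima for resonance n, as fractions of tube length."""
    return [Fraction(2 * k, 2 * n - 1) for k in range(0, n)]

# Position 0 is the closed end (vocal folds); position 1 is the open end (mouth)
print([str(p) for p in antinodes(1)], [str(p) for p in nodes(1)])  # ['1'] ['0']
print([str(p) for p in antinodes(2)], [str(p) for p in nodes(2)])  # ['1/3', '1'] ['0', '2/3']
```

Every resonance has a node at position 0 and an antinode at position 1, and resonance n has n of each.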
Where is there always an antinode?
at the open end of the tube (the mouth)
Where is there always a node?
at the closed end of the tube (the vocal folds)
What happens when there is a constriction near a node?
formant frequency will increase
What happens when there is a constriction near an antinode?
formant frequency will decrease
Perturbation theory, if a change in cross sectional area is applied (a perturbation):
- the acoustic effect depends on proximity to a node or an antinode (antinode = lower freq.; node = higher freq.)
- lip constrictions lower all formant frequencies
- laryngeal constrictions raise all formant frequencies
What do amplitudes depend on?
If F1 is lowered (raised), what happens to A1?
it lowers (rises)
If 2 formant frequencies move closer together:
both peaks increase in amplitude
How do you raise or lower formant frequencies?
change articulators (3-6 syllables per second)
What are source-filter interactions?
- the source and filter are assumed to be independent of one another
- BUT some vocal tract shapes may affect vocal fold vibration:
- singer's formant (to be heard over background noise)
- high impedance constrictions require greater subglottal air pressure
- vocal tract - vocal fold coupling during open phase of vibratory cycle
What can the linear source-filter theory be used to describe?
the acoustics of consonants as well as vowels
Why, for consonants, is the source not always at the level of the vocal folds?
- some sources are in the vocal tract
- these sources are aperiodic
- durations and amplitudes also are different from vowels
What does the source-filter theory give us?
a series of expectations for the acoustic characteristics for consonants
How are fricatives modeled?
as a tube with a very severe constriction
What are characteristics of fricatives?
- the air exiting the constriction is turbulent
- zeros or antiformants can be found in the spectrum
- because of the turbulence, there is no periodicity unless accompanied by voicing
What are characteristics of nasal consonants?
- velopharyngeal port is open and the oral cavity is completely blocked at some point
- the side-branch resonator produces antiformants (zeros)
- the overall vocal tract is longer than for vowels
- oral formants, nasal formants, nasal antiformants
- nasal murmur
What are characteristics of stops?
- the tube model is not altered very much
- time domain is critical
- there is a complete closure of the vocal tract somewhere
- pressure builds up behind the closure
- rapid release
- articulation results in a burst and transitions
What does analog mean?
storing ALL the information on a wave
What does digital mean?
sampling the wave at specific times and storing only those few points (it connects the dots for you and doesn't record the amplitudes in between)
What is a spectrograph?
- an instrument that can capture the dynamics of speech
- acoustic signals vary only in frequency, amplitude and time; the sound spectrograph captures all of these
What is a spectrogram?
the output (usually a hardcopy) of a spectrograph
What is a wide-band filter good for?
looking at formant frequencies
What is a narrow-band filter good for?
looking at harmonics and fundamental frequency
What do black areas of a spectrogram indicate?
the highest amplitudes (the most intense energy)
What do white areas on a spectrogram indicate?
the noise floor
What do shades of gray in a spectrogram indicate?
- amplitudes between highest amplitude and noise floor
- the more intense the signal is at a particular frequency and time, the darker the trace
What is the Nyquist theorem?
- in order to represent a signal faithfully, it must be sampled at a rate of at least twice its highest frequency
- if you don't pick the right sampling rate, you don't get accurate output (if you get the wrong output, all your measurements are wrong)
What is presampling or brickwall filtering?
- removes all of the energy above the Nyquist frequency
- the clinician/researcher determines the Nyquist frequency
- some knowledge of speech and of speech and language disorders is required
What is aliasing?
- when the output doesn't match the input
- happens when you don't follow the Nyquist rule (energy above half the sampling rate shows up as false lower frequencies)
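A small sketch of what aliasing does to a pure tone: a frequency above half the sampling rate "folds back" and appears as a lower frequency. The function names and the 10,000 Hz sampling rate are chosen for illustration.

```python
def nyquist_frequency(sampling_rate_hz):
    """Highest frequency that can be represented at this sampling rate."""
    return sampling_rate_hz / 2.0

def apparent_frequency(signal_hz, sampling_rate_hz):
    """Frequency a pure tone appears to have after sampling (no presampling filter)."""
    folded = signal_hz % sampling_rate_hz
    return min(folded, sampling_rate_hz - folded)

print(nyquist_frequency(10000.0))            # 5000.0
print(apparent_frequency(3000.0, 10000.0))   # 3000.0 (below Nyquist, unchanged)
print(apparent_frequency(7000.0, 10000.0))   # 3000.0 (above Nyquist, aliased)
```

This is why the presampling filter has to remove energy above the Nyquist frequency before the signal is sampled.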
What are discrete numbers?
dots along wave (not continuous measurement)
What is sampling rate?
how many times per second you take a discrete number (a sample)
What is sampling?
how many times per second the amplitude will be recorded
What does sampling for digital signal processing do?
- analog-to-digital conversion
- signal must be sampled at the Nyquist rate or higher
- sampling rate decides the times at which the signal will be sampled
- sampling converts the acoustic signal into a series of numbers
- instead of amplitudes at all instances of time, no matter how small the time interval, amplitudes in the digital world exist only at the sampling interval
What happens to the samples determined by the sample rate?
they are chopped into discrete numbers (converting amplitude variations into discrete numbers)
What is quantization?
- discrete number of amplitude levels
- the more quantizer levels available, the more closely the discrete signal represents the original analog signal (the more levels, the smaller the interval between them)
- in our applications, 16-bit quantizers over a 20-volt range are typical (this yields an amplitude resolution of about 300 microvolts and a signal-to-noise ratio of 96 dB)
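The 300 microvolt and 96 dB figures can be checked with a few lines. This sketch assumes the usual rule of thumb that the signal-to-noise ratio of an ideal quantizer is about 20·log10(number of levels), roughly 6 dB per bit.

```python
import math

bits = 16
voltage_range = 20.0               # volts, from the card above

levels = 2 ** bits                 # 65536 amplitude levels
step_microvolts = voltage_range / levels * 1e6
snr_db = 20 * math.log10(levels)   # rule-of-thumb estimate, about 6 dB per bit

print(levels)                      # 65536
print(round(step_microvolts, 1))   # 305.2
print(round(snr_db, 1))            # 96.3
```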
What happens after A/D (analog to digital) conversion?
- the signal is stored as a stream of numbers
- time is recovered from the sample index and the sampling rate (time = index / sampling rate)
- the amplitude is the stored number (quantization process)
- in this form, many operations can be performed (you can do anything you want)
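As a small illustration of how time is tied to the sample index, the sketch below converts an index into seconds. The function name and the 10,000 Hz sampling rate are assumptions for the example.

```python
def sample_time_s(index, sampling_rate_hz):
    """Time (in seconds) of the sample stored at a given index."""
    return index / sampling_rate_hz

# At 10,000 samples per second, sample number 2500 was taken at 0.25 s
print(sample_time_s(2500, 10000))  # 0.25
```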
What is involved in a waveform display?
- duration measurements (speech changes gradually)
- signal editing
- amplitude measurements (rms is most common)
- vocal fundamental frequency
- *some consistent rules need to be adopted for duration and signal editing
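Since rms is named as the most common amplitude measurement, here is a minimal sketch of how it is computed from stored samples. The test signal (a 100 Hz sine at 8000 samples per second) is made up for illustration.

```python
import math

def rms(samples):
    """Root-mean-square amplitude: square, average, then take the square root."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# One second of a 100 Hz sine wave with peak amplitude 1
sampling_rate_hz = 8000
signal = [math.sin(2 * math.pi * 100 * n / sampling_rate_hz)
          for n in range(sampling_rate_hz)]

print(round(rms(signal), 4))  # 0.7071 (peak amplitude divided by the square root of 2)
```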
What is a digital spectrograph?
a series of spectra based on the FFT (fast Fourier transform) or LPC (linear predictive coding)
How is amplitude depicted in a digital spectrograph?
as shades of gray
What is an example of a digital spectrograph?
- does the work for us
What is linear predictive coding (LPC)?
- you can predict where the next dot (amplitude) will be based on previous samples (as few as 10 to 15 previous samples are required)
- speech does not generally vary wildly from sample to sample (highly predictable)
What is the equation for LPC?
- y(n) = a0 + a1*x(n-1) + a2*x(n-2) + ...
- y(n) = amplitude of the next sample
- x(n-1), x(n-2), ... = the previous samples
- a = estimates of the resonances of vocal tract (can represent sections of vocal tract)
- allows you to talk on the phone (can guess what speech will be so it only has to transfer so many numbers)
- individuals with voice/hearing problems have problems with being understood on the phone
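The idea of predicting the next sample from a weighted sum of previous samples can be tried on a pure tone. This is only a sketch: real LPC estimates 10 to 15 coefficients from the speech itself, whereas here two coefficients are set by hand using the identity sin((n+1)w) = 2·cos(w)·sin(nw) − sin((n−1)w). The names and the 200 Hz test tone are assumptions.

```python
import math

def predict_next(previous, coefficients, a0=0.0):
    """Weighted sum of past samples; coefficients[0] weights the most recent one."""
    return a0 + sum(a * x for a, x in zip(coefficients, reversed(previous)))

sampling_rate_hz = 8000.0
w = 2 * math.pi * 200.0 / sampling_rate_hz       # 200 Hz tone, radians per sample
signal = [math.sin(w * n) for n in range(50)]

coefficients = [2 * math.cos(w), -1.0]           # hand-set a1 and a2 for a pure tone
predicted = predict_next(signal[:49], coefficients)
error = abs(predicted - signal[49])

print(error < 1e-9)  # True: the next sample is predicted almost exactly
```

Because so few numbers describe the signal, only the coefficients need to be transmitted, which is the point made about telephone speech.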
What is a wideband spectrogram?
- short time window (0.005, 0.007, 0.009 s)
- good for measuring formant frequencies (of vowels)
What is a narrowband spectrogram?
- long time window (0.1, 0.05 s)
- good for showing and measuring harmonics
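The link between window length and what a spectrogram shows can be sketched with the rule of thumb that the analysis bandwidth is roughly 1 divided by the window duration. That rule and the names below are assumptions; the exact value depends on the window shape.

```python
def analysis_bandwidth_hz(window_ms):
    """Approximate frequency resolution for a window of the given length."""
    return 1000.0 / window_ms

def resolves_harmonics(window_ms, f0_hz):
    """Harmonics are F0 apart, so they separate only if the bandwidth is narrower."""
    return analysis_bandwidth_hz(window_ms) < f0_hz

print(analysis_bandwidth_hz(5.0))        # 200.0 (wideband)
print(analysis_bandwidth_hz(50.0))       # 20.0 (narrowband)
print(resolves_harmonics(5.0, 100.0))    # False: harmonics smear into formant bands
print(resolves_harmonics(50.0, 100.0))   # True: individual harmonics are visible
```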