Formants and their extraction

N.B. I wrote this article in the process of participating in Project Spectra. We originally envisioned including resonance-related exercises at the time, but due to limitations on our access to professional speech therapist input, knowledge, and implementation we only managed to finish pitch-based exercises.

What are formants?

A formant is a concentration of acoustic energy around a particular frequency in the speech wave. There are several formants, each at a different frequency, roughly one in each 1000Hz band for average men. The corresponding range for average women is one formant every 1100Hz. The true range depends on the actual length of the vocal tract. Each formant corresponds to a resonance mode of the vocal tract.

Seen this way, the sound spectra look like mountain landscapes and the formants appear as peaks, a metaphor that is often used for formants. [12]

The frequency of the first formant is mostly determined by the height of the tongue body:

high F1 = low vowel (i.e., high frequency F1 = low tongue body)
low F1 = high vowel (i.e., low frequency F1 = high tongue body)

The frequency of the second formant is mostly determined by the frontness/backness of the tongue body:

high F2 = front vowel
low F2 = back vowel

https://web.archive.org/web/20190213064736/https://home.cc.umanitoba.ca/~krussll/phonetics/acoustic/formants.html

F3: The lower of the formant frequency, the rounder shape of the lip e.g. /U/, /uù/, but F3 is not as frequently used as F1 and F2.

Since the 1950s, and even before, attempts have been made to visualize the vowel space by plotting at least F1 and F2 after collecting large corpuses of data. (Keywords to search for: vowel chart, vowel diagram) One such corpus with 1520 samples of American English[1] was collected by Peterson & Barney in 1952. This particular dataset can be found in various open source libraries such as Praat (http://web.archive.org/web/20190831081419/https://raw.githubusercontent.com/praat/praat/master/dwtools/Table_extensions.cpp) or packages in CRAN.

…, men, women, and children have vocal tracts of markedly different sizes, so that naturally their formants are different. Yet we identify a child’s vowels correctly in spite of this. An /i/ said by a man and an /i/ said by a woman are felt to be “the same sound” and are equated, as far as phonetic quality goes, by the phonetician. On the usual F1/F2 plot they have quite different positions, but on an “articulatory” vowel-diagram they have the same position. Peterson has suggested that a more realistic acoustic diagram is achieved by plotting the ratio of F1 to F3 along the vertical axis and the ratio of F2 to F3 along the horizontal axis, all values being expressed in mels. Then men’s, women’s and children’s “same” vowels are claimed to come out with approximately the same positions.[2]

First foray into Haskell.

Defining the combination formula (nCr) recursively.

1
2
3
4
5
Prelude> let ncr n k | k == 0 = 1 | n == k = 1 | otherwise = ncr (n-1) k + ncr (n-1) (k-1)
Prelude> ncr 3 2
3
Prelude> ncr 15 4
1365

This works because http://www.cs.nott.ac.uk/~vxc/g51mcs/ch05_combinatorics.pdf , page 9.

How I understand RSA

wpid-wp-1402514522645-1024x735.jpeg Euler’s totient function is defined as the number of positive integers relatively prime to n (including 1). E.g. φ(12) = 4 ( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), and φ(15) = 8. http://www.thescienceforum.com/mathematics/14111-modular-multiplicative-inverse-context-rsa.html https://en.wikibooks.org/wiki/Algorithm_Implementation/Mathematics/Extended_Euclidean_algorithm https://docs.google.com/viewer?url=www.math.utah.edu/~fguevara/ACCESS2013/Euclid.pdf as to why the extended Euclid’s algo can be used to find the modular multiplicative inverse wpid-wp-1402514649564-1024x767.jpeg wpid-wp-1402514730357-1024x491.jpeg https://docs.google.com/viewer?url=ftp://ftp.rsasecurity.com/pub/pkcs/pkcs-1/pkcs-1v2-1.pdf https://docs.google.com/viewer?url=ftp://ftp.rsasecurity.com/pub/rsalabs/rsa_algorithm/rsa-oaep_spec.pdf In particular, ^ page 9. OAEP is the padding scheme ( http://crypto.stackexchange.com/questions/10145/rsa-pcks1-v2-1-rsaes-oaep-algorithm http://crypto.stackexchange.com/questions/2074/rsa-oaep-input-parameters ) , whereas I2OSP and OS2IP (on page 4); what really helped things come full circle for me is realizing how they represent arbitrary data as an integer (first converting it to an octet string). Without further do, let’s test our generated keys by encrypting and decrypting the number 521 (any number smaller than 527, our modulus, will do. ( http://stackoverflow.com/questions/10061626/message-length-restriction-in-rsa ) )

1
2
3
4
5
6
7
8
$ python
Python 2.7.5+ (default, Feb 27 2014, 19:37:08)
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print 521**41 % 527
346
>>> print 346**281 % 527
521

Note that if we try with a larger number…

1
2
3
4
>>> print 1000**41 % 527
411
>>> print 411**281 % 527
473

Nope. What really helped things come full circle once again (or full sphere..) http://en.wikipedia.org/wiki/Pretty_Good_Privacy#Confidentiality http://superuser.com/questions/383732/how-does-ssh-encryption-work It makes more sense to use a symmetric encryption algorithm with high throughput to encrypt the data first, then use PKI to encrypt and transfer the key. And that is how the world works.

The Strong Induction Axiom

These are notes taken from the Lec 3 | MIT 6.042J Mathematics for Computer Science, Fall 2010 video lecture, which can be found at https://www.youtube.com/watch?v=NuGDkmwEObM#t=4658

What exactly does the Strong in Strong Induction mean? Let $P(n)$ be any predicate. if $P(0)$ is true then $\forall n (P(0) \wedge P(1) \wedge \ldots \wedge P(n)) \Rightarrow P(n+1)$ is true, then $\forall n : P(n)$ is true. Remember the normal induction axiom? Let $P(n)$ be a predicate. If $P(0)$ is True and $\forall n \in \mathbb{N} : (P(n) \Rightarrow P(n+1))$ is True, then $\forall n \in \mathbb{N}$ $P(n)$ is True. The difference is that you’re assuming that all the cases starting from the base case til $n$ are true, in order to prove that $P(n+1)$ is true. The Unstacking game is demonstrated: here’s a brief recap (no, actually, this is just another LaTex exercise for me)

The score one would get is (4*4)+(2*2)+(3*1)+(1*1)+(1*1)+(2*1)+(1*1) = 28. There must be a strategy to beat this, is there?

(7*1)+(6*1)+(5*1)+(4*1)+(3*1)+(2*1)+(1*1) = 28. Now one should immediately jump to the conclusion (if one is an MIT student) that all combinations lead to a score of 28. But one is not if one is reading this, so: Theorem: All strategies $S(n)$ for the Unstacking Game with $n$ starting blocks lead to the same score. Proof by strong induction: Inductive Hypothesis: $P(n)$ is the Theorem. Base Case: $S(1)$ is zero (One can’t play the game with 0 blocks). Inductive step: Assume $P(1), P(2), \ldots, P(n)$ in order to prove $P(n+1)$. With $(n+1)$ blocks, for $1 \leq k \leq n$:

Our score for $P(n+1)$ is $k(n+1-k)+P(k)+P(n+1-k)$ (think recursion, after each division of blocks each block is an instance of the Game). One has hit a dead end here, as one is trying to prove that $P(n+1)$ is dependent on $n$, not $k$. Honestly this example (IMHO) is rather haphazard, as the next step, which not so intuitively, is to make a good guess as to what $S(n)$ might be, and in classic MIT intuition style, our first guess yields $\frac{n(n-1)}{2}$. Testing a few values of $n$.. $S(2) = 1$, $S(3) = 3$, $S(4) = 6$, $S(8) = 28$. Let’s put $P(n)$ through its runs again…. Base case, where $P(n) = \frac{n(n-1)}{2}$: $P(1) = 0$. Going back to the previous expression which we derived for the score, now that we have an expression for $P(n)$: $k(n+1-k) + \frac{k(k-1)}{2} + \frac{(n+1-k)(n-k)}{2}$ Simplifying this expression (aka plugging it into WolframAlpha…) results in $\frac {n(n+1)}{2}$, which is $S(n+1)$. In the words of the professor, “The $k$ disappears.“ We have worked backwards (to my understanding), deriving an expression for the score, reaching a dead end, where we have to stop assuming ($k$), come up with an expression for $P(n)$, and filling in the blank to prove $P$(or $S$)$(n+1)$.

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×