Formants and their extraction

N.B. I wrote this article in the process of participating in Project Spectra. We originally envisioned including resonance-related exercises at the time, but due to limitations on our access to professional speech therapist input, knowledge, and implementation we only managed to finish pitch-based exercises.

What are formants?

A formant is a concentration of acoustic energy around a particular frequency in the speech wave. There are several formants, each at a different frequency, roughly one in each 1000 Hz band for average men. For average women the corresponding figure is roughly one formant every 1100 Hz. The true spacing depends on the actual length of the vocal tract. Each formant corresponds to a resonance mode of the vocal tract.
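As a rough illustration of why the spacing depends on vocal tract length, a common textbook idealization (an assumption here, not something taken from the sources cited in this article) treats the vocal tract as a uniform tube closed at the glottis and open at the lips. Its resonances fall at odd multiples of c / 4L. The tract lengths and the speed of sound below are illustrative round numbers, not measurements.

```python
def tube_formants(length_m, count=3, speed_of_sound=350.0):
    """Resonances (Hz) of an idealized uniform tube closed at one end.

    F_n = (2n - 1) * c / (4 * L); neighbouring resonances are c / (2 * L) apart.
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]


def formant_spacing(length_m, speed_of_sound=350.0):
    """Distance in Hz between neighbouring resonances of the same tube."""
    return speed_of_sound / (2.0 * length_m)


if __name__ == "__main__":
    # 0.175 m and 0.159 m are assumed, illustrative tract lengths.
    for label, length in (("longer tract", 0.175), ("shorter tract", 0.159)):
        formants = ", ".join(f"{f:.0f}" for f in tube_formants(length))
        print(f"{label}: {formants} Hz, spacing {formant_spacing(length):.0f} Hz")
```

With the assumed 0.175 m tube the first three resonances land at about 500, 1500 and 2500 Hz, i.e. one per 1000 Hz band. A real vocal tract is not a uniform tube, which is exactly why F1 and F2 move with tongue position.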

Seen this way, the sound spectra look like mountain landscapes and the formants appear as peaks, a metaphor that is often used for formants. [12]

The frequency of the first formant is mostly determined by the height of the tongue body:

high F1 = low vowel (i.e., high frequency F1 = low tongue body)
low F1 = high vowel (i.e., low frequency F1 = high tongue body)

The frequency of the second formant is mostly determined by the frontness/backness of the tongue body:

high F2 = front vowel
low F2 = back vowel

https://web.archive.org/web/20190213064736/https://home.cc.umanitoba.ca/~krussll/phonetics/acoustic/formants.html

F3: the lower the formant frequency, the more rounded the lips (e.g. /ʊ/, /uː/), but F3 is not used as frequently as F1 and F2.

Since the 1950s, and even before, attempts have been made to visualize the vowel space by plotting at least F1 and F2 after collecting large corpora of data. (Keywords to search for: vowel chart, vowel diagram.) One such corpus with 1520 samples of American English[1] was collected by Peterson & Barney in 1952. This particular dataset can be found in various open source libraries such as Praat (http://web.archive.org/web/20190831081419/https://raw.githubusercontent.com/praat/praat/master/dwtools/Table_extensions.cpp) or packages in CRAN.

…, men, women, and children have vocal tracts of markedly different sizes, so that naturally their formants are different. Yet we identify a child’s vowels correctly in spite of this. An /i/ said by a man and an /i/ said by a woman are felt to be “the same sound” and are equated, as far as phonetic quality goes, by the phonetician. On the usual F1/F2 plot they have quite different positions, but on an “articulatory” vowel-diagram they have the same position. Peterson has suggested that a more realistic acoustic diagram is achieved by plotting the ratio of F1 to F3 along the vertical axis and the ratio of F2 to F3 along the horizontal axis, all values being expressed in mels. Then men’s, women’s and children’s “same” vowels are claimed to come out with approximately the same positions.[2]
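A minimal sketch of the ratio plot described in the quote, under two loudly stated assumptions: the mel conversion uses the widely used 2595 * log10(1 + f / 700) approximation (the quote does not say which mel formula was meant), and the formant triples are illustrative /i/ values close to commonly quoted Peterson & Barney averages rather than numbers taken from this article.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mels with the 2595 * log10(1 + f / 700) approximation.

    Assumption: the quoted text does not say which mel formula was used.
    """
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def ratio_coordinates(f1, f2, f3):
    """Return (F2/F3, F1/F3), with every formant expressed in mels."""
    m1, m2, m3 = (hz_to_mel(f) for f in (f1, f2, f3))
    return m2 / m3, m1 / m3


# Illustrative /i/ formants in Hz as (F1, F2, F3); treat as example inputs.
SPEAKERS = {
    "man": (270, 2290, 3010),
    "woman": (310, 2790, 3310),
    "child": (370, 3200, 3730),
}

if __name__ == "__main__":
    for who, (f1, f2, f3) in SPEAKERS.items():
        x, y = ratio_coordinates(f1, f2, f3)
        print(f"{who:6s} F2/F3 = {x:.3f}  F1/F3 = {y:.3f}")
```

For these inputs the horizontal coordinate varies far less across speakers than raw F2 in Hz does, which is the effect the quote claims; whether it holds across a whole corpus is something to check against real data.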

Hosting a static site quickly as a Tor hidden service with docker-compose

This sample bakes a private key into the resulting docker image that contains the Tor daemon. The only things you need to edit are the args and volumes in docker-compose.yml.

docker-compose.yml:

version: "3"
services:
  hidden_service:
    # we want to pass in the details of our hs AT BUILD TIME..
    build:
      context: .
      dockerfile: Dockerfile.hidden_service
      args:
        TARGET_PORT: 8123
        ONION_HOSTNAME: abcdefghijklmnop.onion
        ONION_PRIVATE_KEY: -----BEGIN RSA PRIVATE KEY-----\nMIIC[REDACTED]\n...\n...\n-----END RSA PRIVATE KEY-----
    restart: always
  web_host:
    image: nginx:alpine
    volumes:
      - "~/my/static_site:/usr/share/nginx/html"
    ports:
      - "8123:80"
    restart: always

ONION_PRIVATE_KEY is what belongs in /var/lib/tor/hidden_service/private_key; ONION_HOSTNAME is what belongs in /var/lib/tor/hidden_service/hostname.

Dockerfile.hidden_service

FROM alpine:latest
# Build-time details of the hidden service, passed in from docker-compose.yml
# (TARGET_PORT is declared but not used below; torrc targets web_host:80 directly)
ARG TARGET_PORT
ARG ONION_HOSTNAME
ARG ONION_PRIVATE_KEY
RUN apk update && apk add bind-tools && apk add curl && apk add \
    tor \
    --update-cache --repository http://dl-3.alpinelinux.org/alpine/edge/testing/ \
    && rm -rf /var/cache/apk/*
EXPOSE 9050
RUN mkdir -p /etc/tor
RUN chown -R tor /etc/tor
# Forward port 80 of the onion address to the web_host container
RUN printf 'HiddenServiceDir /var/lib/tor/hidden_service\nHiddenServicePort 80 web_host:80\n' > /etc/tor/torrc
RUN mkdir -p /var/lib/tor/hidden_service
RUN chmod 700 /var/lib/tor/hidden_service
# echo -e turns the literal \n sequences in the key into real newlines
RUN echo -e "$ONION_PRIVATE_KEY" > /var/lib/tor/hidden_service/private_key
# RUN cat /var/lib/tor/hidden_service/private_key
RUN chmod 600 /var/lib/tor/hidden_service/private_key
RUN echo "${ONION_HOSTNAME}" > /var/lib/tor/hidden_service/hostname
RUN chown -R tor /var/lib/tor/hidden_service
USER tor
ENTRYPOINT [ "tor" ]
CMD [ "-f", "/etc/tor/torrc" ]

Dockerfile.web_host

FROM nginx:alpine

Copy these three into a folder, then do docker-compose up from within said folder.

nginx proxy_pass based on domain

Nginx supports multiple server blocks listening on the same port and picks between them by matching the request's Host header against server_name; this is how virtual hosts work. Thus we simply proxy_pass each virtual host to our desired target.

server {
    listen 80;
    server_name soundcloud.com;
    location / {
        proxy_pass http://soundcloud.com;
    }
}
server {
    listen 80;
    server_name api.soundcloud.com;
    location / {
        proxy_pass http://api.soundcloud.com;
    }
}

https://stackoverflow.com/questions/21064401/route-different-proxy-based-on-subdomain-request-in-nginx
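To make the selection rule concrete, here is a small Python model of how the two server blocks above are chosen. It is only a sketch: the names SERVER_BLOCKS and pick_upstream are hypothetical, and it handles exact server_name matches only, ignoring the wildcard and regex names nginx also supports.

```python
# Ordered like the server blocks in the config: (server_name, proxy_pass target)
SERVER_BLOCKS = [
    ("soundcloud.com", "http://soundcloud.com"),
    ("api.soundcloud.com", "http://api.soundcloud.com"),
]


def pick_upstream(host_header, blocks=SERVER_BLOCKS):
    """Return the proxy target for a request's Host header.

    Simplified model: exact server_name matches only. When nothing matches,
    fall back to the first block, mirroring nginx's default-server rule
    when no block is marked default_server.
    """
    host = host_header.split(":", 1)[0].strip().lower()  # drop an optional :port
    for server_name, target in blocks:
        if host == server_name:
            return target
    return blocks[0][1]


if __name__ == "__main__":
    for header in ("api.soundcloud.com", "SoundCloud.com:80", "example.org"):
        print(header, "->", pick_upstream(header))
```

The fallback matters in practice: a request with an unexpected Host header is still served by the first block unless a catch-all server is defined.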

Moving House

Previously I had this blog hosted on Red Hat's OpenShift PaaS using WordPress. I haven't been blogging frequently for about 2 years, and I missed Red Hat's memo about the migration (https://blog.openshift.com/migrate-to-v3-v2-eol/) because I registered using an obscure email from a shell account without webmail. Lo and behold, my blog has vanished, although thankfully I have the entire SQL dump (and most of the crucial media) from the beginning of this year. To minimize the chance of this ever happening again (and to minimize the effort of getting my site up and running again, knock on wood), SSGs seem like the best solution since they keep the content in a portable, versionable format, although I would have preferred a Ghost blog.

Hexo caught my eye over other static site generators due to:

  • Complete compatibility with Octopress plugins
  • hexo server -d watches the filesystem and automatically generates new content upon reload
  • Prebuilt themes like the one I’m using now have the basics of what a nerdy blog needs OOTB:
    • $\textbf{MathJax}$
    • Instant Font Awesome support with an npm install hexo-tag-fontawesome
    • SEO basics like a configurable sitemap and newfangled generator plugins for stuff like Google AMP which will close the gap between CMSes
  • One line deploy to Git, (S)FTP, among other targets

nVidia CUDA samples on Ubuntu 16.04 LTS

Here they are in case anyone else needs to download them separately (and the rest of cuda-repo-ubuntu1504-7-5-local_7.5-18_amd64.deb):

https://drive.google.com/open?id=0B_SnrcTvZzIXX2dkM0pwT2E3U2s
https://mega.nz/#F!dVBghK7J!6nvh-XvvoiqqeGp144jouw

The file you’re looking for is var/cuda-repo-7-5-local/cuda-samples-7-5_7.5-18_amd64.deb

To extract and compile the samples, first make sure your nVidia GPU is active if you’re using Optimus, e.g. by using

sudo prime-switch nvidia

or otherwise;

nvidia-smi

should show your GPU’s details. Then run:

ar x cuda-samples-7-5_7.5-18_amd64.deb
tar -xf data.tar.gz
cd /usr/local/cuda-7.5/samples/5_Simulations/smokeParticles/
CUDA_PATH=/usr CUDA_SEARCH_PATH=/usr/lib/x86_64-linux-gnu/ make -j5

First Foray into MIPS Assembly

Task: Print Hello World 10 times.

    .data
hello_str: .asciiz "Hello World!\n"
    .text
    .globl main
main:
    subu $sp, $sp, 4 # create a word on the stack
    sw $ra, 4($sp) # store the return address
    # put main function code here
    li $t0, 10 # the number at which we want to end our loop.
    li $t1, 0 # start counting from 0; we are going to increment this counter 10 times.
    li $v0, 4 # set $v0 to print_string; http://courses.missouristate.edu/kenvollmar/mars/Help/SyscallHelp.html
    la $a0, hello_str # load the string
loop:
    beq $t1, $t0, end # if t1 == 10 we are done
    syscall # execute the function described by $v0 (print_string)
    addi $t1, $t1, 1 # add 1 to t1
    j loop # jump back to the top
end:
    li $v0, 10 # set $v0 to exit
    syscall


Concatenating FLV files with ffmpeg

First foray into Haskell.

Defining the combination formula (nCr) recursively.

Prelude> let ncr n k | k == 0 = 1 | n == k = 1 | otherwise = ncr (n-1) k + ncr (n-1) (k-1)
Prelude> ncr 3 2
3
Prelude> ncr 15 4
1365

This works because of the identity C(n, k) = C(n-1, k) + C(n-1, k-1), with C(n, 0) = C(n, n) = 1 as base cases; see http://www.cs.nott.ac.uk/~vxc/g51mcs/ch05_combinatorics.pdf , page 9.
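For comparison, the same recursion can be sketched in Python. Two things here are additions rather than part of the Haskell one-liner: a guard for k outside 0..n (the original definition recurses forever on such inputs) and memoization via functools.lru_cache so repeated subproblems are not recomputed. math.comb needs Python 3.8 or later and is used only as a cross-check.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def ncr(n, k):
    """Binomial coefficient via C(n, k) = C(n-1, k) + C(n-1, k-1)."""
    if k < 0 or k > n:
        return 0  # guard: without it the recursion never reaches a base case
    if k == 0 or k == n:
        return 1
    return ncr(n - 1, k) + ncr(n - 1, k - 1)


if __name__ == "__main__":
    print(ncr(3, 2))
    print(ncr(15, 4))
    print(ncr(15, 4) == comb(15, 4))
```

Without the cache the recursion does exponential work, which is fine for the small inputs shown in the session above but gets slow quickly as n grows.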

How to fix popping on audio start/stop/resume with Intel HDA audio in Linux

See https://wiki.archlinux.org/index.php/Alsa#Pops_when_starting_and_stopping_playback for the fix. My ears were nearly wrecked by this awfulness. On a side note, the Conexant Audio CX20751/2 isn’t the best integrated sound card around. Even lowly Realteks have far better dynamic range and equalization. There is virtually no soundstage and the mids are very muffled, even with headphones. It can be found on some lower-end ThinkPads (e.g. S440/E440); considering that these are cheap SMB laptops, it’s not surprising that they had to cut corners somewhere.
