I have a keen interest in the intersection of software and language, so I’ve been collecting these resources on speech recognition and natural language processing for you to enjoy too.

Hidden Markov model (HMM)

Lemma: same stem, part of speech, rough word sense
Token vs type: a token is one occurrence of a word in running text, a type is a distinct word
N = number of tokens
V = vocabulary = set of types
|V| is the size of the vocabulary
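
To make the token/type distinction concrete, here is a minimal Python sketch that computes N and |V| for a short string. The function name token_type_counts and the tokenization (lowercasing, then splitting on whitespace) are my own simplifying assumptions for illustration; real tokenizers also handle punctuation, contractions and so on.

```python
def token_type_counts(text):
    """Return (N, V) for text: the number of tokens and the set of types.

    Assumption: tokens are lowercased, whitespace-separated strings.
    This is a deliberate simplification, not a real tokenizer.
    """
    tokens = text.lower().split()   # every occurrence counts as a token
    vocabulary = set(tokens)        # V: the distinct types
    return len(tokens), vocabulary


n, vocab = token_type_counts("the cat sat on the mat")
print(n)           # N = 6, because "the" is counted twice
print(len(vocab))  # |V| = 5, because "the" is one type
```

N grows with every word read, while |V| only grows when a word not seen before appears.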

