## Videos: Lectures on n-grams and Language Modelling

## Gist of what the videos cover

D. Refaeli, 11 months ago (edited)

This could get a bit confusing, so here is a summary of what I understood from this chapter on LM (language modeling), in broad strokes:

- P(Sentence) – we want to model the probability of a sentence occurring.
- Markov assumption – to simplify the model, we assume that only the k (= 0, 1, 2, …) previous words affect the probability of the next word.
- N-gram – this leads to unigram (k=0 previous words), bigram (k=1 previous word), trigram (k=2 previous words), …, n-gram models.
- MLE (maximum likelihood estimation) – we can compute estimates of the probabilities from our (training) data.
- Zeros – but what if the test data contains a uni/bi/tri/n-gram that never occurred in the training data? It will zero out our probabilities…
- Solution: smoothing – we can use smoothing to fix the zero problem.
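
The steps above can be sketched with a toy bigram model. This is a minimal illustration, assuming hypothetical `<s>`/`</s>` sentence-boundary markers and a two-sentence corpus (not from the videos):

```python
import math
from collections import Counter

# Toy training corpus; <s> and </s> are assumed boundary markers.
corpus = [["<s>", "i", "like", "tea", "</s>"],
          ["<s>", "i", "like", "coffee", "</s>"]]

unigram_counts = Counter(w for sent in corpus for w in sent)
bigram_counts = Counter((w1, w2) for sent in corpus
                        for w1, w2 in zip(sent, sent[1:]))

def p_mle(w2, w1):
    """MLE bigram estimate: count(w1, w2) / count(w1)."""
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

def p_sentence(sent):
    """Markov (bigram) approximation: product of bigram probabilities."""
    return math.prod(p_mle(w2, w1) for w1, w2 in zip(sent, sent[1:]))
```

Here `p_mle("like", "i")` is 1.0 ("like" always follows "i" in this corpus), and any bigram absent from training, e.g. `("like", "juice")`, gets probability 0, which is exactly the zero problem smoothing addresses.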

6.1. Unigram smoothing – use the unknown-words method for unseen unigrams; this requires a separate held-out/development set.
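
One common way to realize the unknown-words method is to replace rare training words with a placeholder token before estimating counts. The threshold of 2 and the token name `<UNK>` below are conventional choices, not taken from the videos:

```python
from collections import Counter

train = ["i", "like", "tea", "i", "like", "coffee"]
counts = Counter(train)

# Replace words seen fewer than 2 times with <UNK>, so the model
# learns a probability for "unknown word" that it can reuse at test
# time for any word it has never seen.
train_unk = [w if counts[w] >= 2 else "<UNK>" for w in train]
```

After this mapping, `train_unk` is `["i", "like", "<UNK>", "i", "like", "<UNK>"]`, and test-time words outside the vocabulary are mapped to `<UNK>` as well.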

6.2. Bigram/trigram/n-gram smoothing – the simplest method is add-1 (Laplace) smoothing; there are also add-k and add-prior variants.
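
A minimal sketch of add-k smoothing for bigrams, using made-up toy counts and an assumed vocabulary size of 4 (the word "juice" is deliberately unseen):

```python
from collections import Counter

# Illustrative counts, not from the videos.
bigram_counts = Counter({("i", "like"): 2, ("like", "tea"): 1})
unigram_counts = Counter({"i": 2, "like": 2, "tea": 1})
V = 4  # assumed vocabulary size, including the unseen word "juice"

def p_add_k(w2, w1, k=1.0):
    # (count(w1, w2) + k) / (count(w1) + k*V); k=1 is add-1 (Laplace)
    # smoothing, smaller k gives the add-k variant.
    return (bigram_counts[(w1, w2)] + k) / (unigram_counts[w1] + k * V)
```

The unseen bigram now gets nonzero probability: `p_add_k("juice", "like")` is (0+1)/(2+4) = 1/6.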

6.3. Stupid backoff – if the trigram count is 0, back off to the bigram; if the bigram count is 0, back off to the unigram, etc. Used with very large corpora (i.e. training data); note it produces scores rather than true probabilities.
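
A sketch of stupid backoff on toy counts. The backoff factor 0.4 is the value suggested in the original Brants et al. paper; the counts here are illustrative assumptions:

```python
from collections import Counter

trigram_counts = Counter({("i", "like", "tea"): 1})
bigram_counts = Counter({("i", "like"): 2, ("like", "tea"): 1})
unigram_counts = Counter({"i": 2, "like": 2, "tea": 1})
total_words = sum(unigram_counts.values())

def stupid_backoff(w3, w2, w1, alpha=0.4):
    # Returns a score, not a normalized probability: whenever the
    # higher-order count is zero, back off one order and multiply by
    # a fixed factor alpha.
    if trigram_counts[(w1, w2, w3)] > 0:
        return trigram_counts[(w1, w2, w3)] / bigram_counts[(w1, w2)]
    if bigram_counts[(w2, w3)] > 0:
        return alpha * bigram_counts[(w2, w3)] / unigram_counts[w2]
    return alpha * alpha * unigram_counts[w3] / total_words
```

For example, the seen trigram scores 1/2 directly, while `stupid_backoff("tea", "like", "you")` backs off to the bigram ("like", "tea") and scores 0.4 × 1/2 = 0.2.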

6.4. Interpolation – instead of backing off, use a linear combination of all the different n-grams (i.e. λ1\*P(trigram) + λ2\*P(bigram) + …). For this you need to estimate the n-gram probabilities on the training data and the λ's on the held-out/development data.
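
The linear combination itself is one line; the λ weights below are placeholder values standing in for weights you would tune on held-out data (they must be non-negative and sum to 1):

```python
def p_interp(p_tri, p_bi, p_uni, lambdas=(0.6, 0.3, 0.1)):
    # Weighted mix of the trigram, bigram, and unigram estimates.
    # The lambdas are tuned on held-out/development data.
    l1, l2, l3 = lambdas
    return l1 * p_tri + l2 * p_bi + l3 * p_uni
```

Because every word has a nonzero unigram probability, the interpolated estimate is nonzero even when the trigram and bigram estimates are 0.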

6.5. Good-Turing – an advanced smoothing method that reallocates probability mass based on the counts of counts (how many n-grams occurred once, twice, etc.).

6.6. Absolute discounting – Good-Turing can be approximated by simply subtracting a fixed discount from each count; this method can be combined with interpolation smoothing.
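
A sketch of absolute discounting interpolated with the unigram distribution. The discount d = 0.75 is a commonly used value, and the toy counts are illustrative assumptions:

```python
from collections import Counter

bigram_counts = Counter({("i", "like"): 2, ("like", "tea"): 1,
                         ("like", "coffee"): 1})
unigram_counts = Counter({"i": 2, "like": 2, "tea": 1, "coffee": 1})
total_words = sum(unigram_counts.values())

def p_abs_discount(w2, w1, d=0.75):
    # Subtract a fixed discount d from every seen bigram count and
    # redistribute the freed-up mass via the unigram distribution.
    c = bigram_counts[(w1, w2)]
    n_types = sum(1 for (a, _) in bigram_counts if a == w1)
    lam = d * n_types / unigram_counts[w1]  # interpolation weight
    return (max(c - d, 0) / unigram_counts[w1]
            + lam * unigram_counts[w2] / total_words)
```

With these counts, `p_abs_discount("tea", "like")` is (1 − 0.75)/2 + 0.75 × (1/6) = 0.25.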

6.7. Kneser-Ney smoothing – takes absolute discounting combined with interpolation, but instead of the plain unigram probability it uses the continuation probability (i.e. how likely a word is to appear as a continuation: after how many distinct words does it occur?).
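
The continuation probability can be sketched directly from bigram type counts. The counts below are illustrative, built around the classic "San Francisco" example: "francisco" is frequent but only ever follows "san", so its continuation probability is low:

```python
from collections import Counter

# Illustrative bigram counts (assumed, not from the videos).
bigram_counts = Counter({("san", "francisco"): 5, ("i", "like"): 2,
                         ("you", "like"): 1, ("like", "tea"): 1})

def p_continuation(w):
    # Fraction of distinct bigram *types* that end in w — i.e. how
    # many different words w follows — rather than its raw frequency.
    n_preceding = sum(1 for (_, b) in bigram_counts if b == w)
    return n_preceding / len(bigram_counts)
```

Despite its count of 5, "francisco" follows only one word, so `p_continuation("francisco")` is 1/4, while "like" follows two distinct words and gets 2/4.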
