## Videos: Lectures on n-grams and Language Modelling

## Gist of what the videos cover

D. Refaeli, 11 months ago (edited)

This could get a bit confusing, so here is a summary of what I understood from this chapter on LM (language modeling), in broad strokes:

- P(Sentence) – we want to model the probability of a sentence occurring.
- Markov assumption – to simplify the model, we assume that only the k (= 0, 1, 2, …) previous words affect the probability of the next word.
- N-gram – this leads to unigram (k=0 previous words), bigram (k=1 previous word), trigram (k=2 previous words), …, n-gram models.
- MLE (maximum likelihood estimation) – we can compute estimates of the probabilities from our (training) data.
- Zeros – but what if the test data contains a uni/bi/tri/n-gram that never occurred in the training data? It will zero out our probabilities…
- Solution: smoothing – we can use smoothing to fix the zero problem.
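
The steps above can be sketched with a toy bigram model. This is a minimal illustration, assuming hypothetical `<s>`/`</s>` sentence-boundary markers and a two-sentence corpus (not from the videos):

```python
import math
from collections import Counter

# Toy training corpus; <s> and </s> are assumed boundary markers.
corpus = [["<s>", "i", "like", "tea", "</s>"],
          ["<s>", "i", "like", "coffee", "</s>"]]

unigram_counts = Counter(w for sent in corpus for w in sent)
bigram_counts = Counter((w1, w2) for sent in corpus
                        for w1, w2 in zip(sent, sent[1:]))

def p_mle(w2, w1):
    """MLE bigram estimate: count(w1, w2) / count(w1)."""
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

def p_sentence(sent):
    """Markov (bigram) approximation: product of bigram probabilities."""
    return math.prod(p_mle(w2, w1) for w1, w2 in zip(sent, sent[1:]))
```

Here `p_mle("like", "i")` is 1.0 ("like" always follows "i" in this corpus), and any bigram absent from training, e.g. `("like", "juice")`, gets probability 0, which is exactly the zero problem smoothing addresses.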

6.1. Unigram smoothing – use the unknown-words method for unseen unigrams; this requires a separate held-out/development set.
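
One common way to realize the unknown-words method is to replace rare training words with a placeholder token before estimating counts. The threshold of 2 and the token name `<UNK>` below are conventional choices, not taken from the videos:

```python
from collections import Counter

train = ["i", "like", "tea", "i", "like", "coffee"]
counts = Counter(train)

# Replace words seen fewer than 2 times with <UNK>, so the model
# learns a probability for "unknown word" that it can reuse at test
# time for any word it has never seen.
train_unk = [w if counts[w] >= 2 else "<UNK>" for w in train]
```

After this mapping, `train_unk` is `["i", "like", "<UNK>", "i", "like", "<UNK>"]`, and test-time words outside the vocabulary are mapped to `<UNK>` as well.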

6.2. Bigram/trigram/n-gram smoothing – the simplest method is add-1 (Laplace) smoothing; there are also add-k and add-prior variants.
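
A minimal sketch of add-k smoothing for bigrams, using made-up toy counts and an assumed vocabulary size of 4 (the word "juice" is deliberately unseen):

```python
from collections import Counter

# Illustrative counts, not from the videos.
bigram_counts = Counter({("i", "like"): 2, ("like", "tea"): 1})
unigram_counts = Counter({"i": 2, "like": 2, "tea": 1})
V = 4  # assumed vocabulary size, including the unseen word "juice"

def p_add_k(w2, w1, k=1.0):
    # (count(w1, w2) + k) / (count(w1) + k*V); k=1 is add-1 (Laplace)
    # smoothing, smaller k gives the add-k variant.
    return (bigram_counts[(w1, w2)] + k) / (unigram_counts[w1] + k * V)
```

The unseen bigram now gets nonzero probability: `p_add_k("juice", "like")` is (0+1)/(2+4) = 1/6.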

6.3. Stupid backoff – if the trigram count is 0, back off to the bigram; if the bigram count is 0, back off to the unigram, etc. Used with very large corpora (i.e. training data); note it produces scores rather than true probabilities.
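
A sketch of stupid backoff on toy counts. The backoff factor 0.4 is the value suggested in the original Brants et al. paper; the counts here are illustrative assumptions:

```python
from collections import Counter

trigram_counts = Counter({("i", "like", "tea"): 1})
bigram_counts = Counter({("i", "like"): 2, ("like", "tea"): 1})
unigram_counts = Counter({"i": 2, "like": 2, "tea": 1})
total_words = sum(unigram_counts.values())

def stupid_backoff(w3, w2, w1, alpha=0.4):
    # Returns a score, not a normalized probability: whenever the
    # higher-order count is zero, back off one order and multiply by
    # a fixed factor alpha.
    if trigram_counts[(w1, w2, w3)] > 0:
        return trigram_counts[(w1, w2, w3)] / bigram_counts[(w1, w2)]
    if bigram_counts[(w2, w3)] > 0:
        return alpha * bigram_counts[(w2, w3)] / unigram_counts[w2]
    return alpha * alpha * unigram_counts[w3] / total_words
```

For example, the seen trigram scores 1/2 directly, while `stupid_backoff("tea", "like", "you")` backs off to the bigram ("like", "tea") and scores 0.4 × 1/2 = 0.2.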

6.4. Interpolation – instead of backing off, use a linear combination of all the different n-grams (i.e. λ1\*P(trigram) + λ2\*P(bigram) + …). For this you need to estimate the n-gram probabilities on the training data and the λ's on the held-out/development data.
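
The linear combination itself is one line; the λ weights below are placeholder values standing in for weights you would tune on held-out data (they must be non-negative and sum to 1):

```python
def p_interp(p_tri, p_bi, p_uni, lambdas=(0.6, 0.3, 0.1)):
    # Weighted mix of the trigram, bigram, and unigram estimates.
    # The lambdas are tuned on held-out/development data.
    l1, l2, l3 = lambdas
    return l1 * p_tri + l2 * p_bi + l3 * p_uni
```

Because every word has a nonzero unigram probability, the interpolated estimate is nonzero even when the trigram and bigram estimates are 0.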

6.5. Good-Turing – an advanced smoothing method that reallocates probability mass based on the counts of counts (how many n-grams occurred once, twice, etc.).

6.6. Absolute discounting – Good-Turing can be approximated by simply subtracting a fixed discount from each count; this method can be combined with interpolation smoothing.
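
A sketch of absolute discounting interpolated with the unigram distribution. The discount d = 0.75 is a commonly used value, and the toy counts are illustrative assumptions:

```python
from collections import Counter

bigram_counts = Counter({("i", "like"): 2, ("like", "tea"): 1,
                         ("like", "coffee"): 1})
unigram_counts = Counter({"i": 2, "like": 2, "tea": 1, "coffee": 1})
total_words = sum(unigram_counts.values())

def p_abs_discount(w2, w1, d=0.75):
    # Subtract a fixed discount d from every seen bigram count and
    # redistribute the freed-up mass via the unigram distribution.
    c = bigram_counts[(w1, w2)]
    n_types = sum(1 for (a, _) in bigram_counts if a == w1)
    lam = d * n_types / unigram_counts[w1]  # interpolation weight
    return (max(c - d, 0) / unigram_counts[w1]
            + lam * unigram_counts[w2] / total_words)
```

With these counts, `p_abs_discount("tea", "like")` is (1 − 0.75)/2 + 0.75 × (1/6) = 0.25.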

6.7. Kneser-Ney smoothing – takes absolute discounting combined with interpolation, but instead of the plain unigram probability it uses the continuation probability (i.e. how likely a word is to appear as a continuation: after how many distinct words does it occur?).
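
The continuation probability can be sketched directly from bigram type counts. The counts below are illustrative, built around the classic "San Francisco" example: "francisco" is frequent but only ever follows "san", so its continuation probability is low:

```python
from collections import Counter

# Illustrative bigram counts (assumed, not from the videos).
bigram_counts = Counter({("san", "francisco"): 5, ("i", "like"): 2,
                         ("you", "like"): 1, ("like", "tea"): 1})

def p_continuation(w):
    # Fraction of distinct bigram *types* that end in w — i.e. how
    # many different words w follows — rather than its raw frequency.
    n_preceding = sum(1 for (_, b) in bigram_counts if b == w)
    return n_preceding / len(bigram_counts)
```

Despite its count of 5, "francisco" follows only one word, so `p_continuation("francisco")` is 1/4, while "like" follows two distinct words and gets 2/4.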
