Perplexity in Markov N-Gram Models

While implementing the perplexity function for Markov n-gram models as described on page 14 of Jurafsky & Martin's SLP (to appear; PDF), I ran into floating-point overflow and underflow issues and had to derive equations that avoid them.

Here is my solution in detail (PDF) and its bigram implementation (C#).
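For readers who just want the gist of the problem: multiplying many small probabilities together underflows to zero long before the N-th root is taken. A common way around this is to sum logarithms and exponentiate only at the end. The sketch below shows that general idea in Python rather than C#. It is not the libcorsis code and not necessarily the same equations as in the PDF; the names `bigram_perplexity` and `bigram_prob` are made up for illustration, and N is assumed to be the number of bigram transitions.

```python
import math


def bigram_perplexity(tokens, bigram_prob):
    """Perplexity of a token sequence under a bigram model, in log space.

    bigram_prob(prev, word) is a hypothetical callback returning
    P(word | prev) as a float in [0, 1].
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to form a bigram")

    log_sum = 0.0  # running sum of log P(w_i | w_{i-1})
    n = 0          # number of bigram transitions (assumed definition of N)
    for prev, word in zip(tokens, tokens[1:]):
        p = bigram_prob(prev, word)
        if p <= 0.0:
            # An unseen bigram with zero probability makes perplexity infinite.
            return math.inf
        log_sum += math.log(p)
        n += 1

    # PP = exp(-(1/N) * sum(log p)), equivalent to (prod p) ** (-1/N)
    return math.exp(-log_sum / n)
```

With a constant probability of 0.1 over 2000 transitions, the plain product `0.1 ** 2000` underflows to `0.0` in double precision, while the log-space version still comes out at about 10.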

It took me a lot of whiteboarding and a few hours to figure this one out, but the resulting libcorsis code is 100% free of foreign intellectual property ^_^.

I am now looking for ways to analyze distributions graphically, and I am testing different encapsulations of probability values and how they interact with the public API.
