Perplexity in Markov N-Gram Models
While implementing the perplexity function for Markov n-gram models as described on page 14 of the draft ("to appear") edition of Jurafsky & Martin's Speech and Language Processing, I ran into floating-point overflow and underflow issues and had to work out equations to avoid them. It took a lot of whiteboarding and a few hours to figure out, but the resulting libcorsis code is 100% foreign-intellectual-property-free ^_^. I am now looking into ways to analyze distributions graphically and testing different encapsulations of probability values and their interaction with the public API.

Here is my solution in detail, followed by a bigram implementation.
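The underflow comes from the definition itself: perplexity is the inverse probability of the test set, normalized by its length, and the raw probability product of a long test set falls below the smallest representable double long before the corpus ends. The standard cure is to move the computation into log space. A sketch of that rewrite for the bigram case (this is the textbook log-space form, not necessarily the exact equations in libcorsis; boundary handling via a start marker w_0 = <s> follows SLP's convention):

    PP(W) = P(w_1 w_2 \cdots w_N)^{-\frac{1}{N}}
          = \Big( \prod_{i=1}^{N} P(w_i \mid w_{i-1}) \Big)^{-\frac{1}{N}}

Taking base-2 logarithms inside and exponentiating back out gives an equivalent expression in which only a sum of modest-sized log values is ever accumulated:

    PP(W) = 2^{\,-\frac{1}{N} \sum_{i=1}^{N} \log_2 P(w_i \mid w_{i-1})}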
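To make the idea concrete, here is a minimal sketch in Python (libcorsis is a different codebase, so treat this as an illustration rather than the actual library code; `bigram_prob` is a hypothetical callable standing in for whatever conditional-probability lookup your model provides):

```python
import math

def bigram_perplexity(tokens, bigram_prob):
    """Perplexity of `tokens` under a bigram model, computed in log space.

    `bigram_prob(prev, word)` is a hypothetical callable returning
    P(word | prev); plug in your own model here. Assumes `tokens`
    already includes any sentence-boundary markers you want scored.
    """
    transitions = list(zip(tokens, tokens[1:]))  # the N scored bigrams
    n = len(transitions)
    log_sum = 0.0
    for prev, word in transitions:
        p = bigram_prob(prev, word)
        if p <= 0.0:
            # An unsmoothed zero makes the product zero, so perplexity is infinite.
            return math.inf
        log_sum += math.log2(p)
    # Same value as (prod P)^(-1/N), but the tiny product is never formed,
    # so nothing underflows to 0.0 along the way.
    return 2.0 ** (-log_sum / n)

# Sanity check: a uniform model over a 10-word vocabulary should give PP = 10,
# even over a sequence long enough that the raw product would underflow.
print(bigram_perplexity(["a"] * 10000, lambda prev, w: 0.1))  # ~10.0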