Performance Optimizations for Frequency Lists


TT5 -> TT8 performance difference

I ran a performance test on one of the remotely accessible computers at the University of Heidelberg (2 physical/4 logical CPUs, 2 GB RAM).

The test consisted of creating a frequency list from the Helsinki Corpus (9.793 KB, single text file) and then sorting it.
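For readers who want to try a comparable measurement, here is a minimal sketch of timing the two phases separately. It is not how Tenka Text or WordSmith Tools were measured (the post does not say how the timings were taken); the helper names `timed` and `run_benchmark`, the whitespace tokenization and the single-run wall-clock timing are all assumptions made for illustration.

```python
import time
from collections import Counter


def timed(fn, *args):
    """Run fn(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def run_benchmark(text):
    """Time list creation and sorting separately; returns seconds per phase."""
    # Phase 1: create the frequency list (splitting on whitespace is a
    # simplification, not the tokenizer of any real tool).
    freq, create_s = timed(lambda: Counter(text.split()))
    # Phase 2: sort the list (here: plain alphabetical order).
    _, sort_s = timed(lambda: sorted(freq.items()))
    return create_s, sort_s, create_s + sort_s
```

Calling `run_benchmark(open(path, encoding="utf-8").read())` on a corpus file of your own would give three numbers comparable in kind to the create, sort and total figures reported here. A single run is noisy, so repeating the measurement a few times is advisable.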

As you can see below, my optimization efforts seem to have paid off well.

WordSmith Tools 4.0.0.374 took about 13 seconds to create and sort a word list into:
  • Alphabetical order
  • Frequency order
    • Alphabetical order between types with the same frequency value
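The sort orders listed above can be sketched in a few lines. This is only an illustration of the idea, not the code of either program: the regex tokenizer, the lower-casing and the choice of descending frequency are assumptions, and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple


def build_frequency_list(text: str) -> Counter:
    """Count how often each word type occurs (naive, case-folded tokens)."""
    return Counter(re.findall(r"\w+", text.lower()))


def sort_alphabetically(freq: Counter) -> List[Tuple[str, int]]:
    """Order the (type, frequency) pairs by type."""
    return sorted(freq.items(), key=lambda item: item[0])


def sort_by_frequency(freq: Counter) -> List[Tuple[str, int]]:
    """Order by descending frequency, alphabetically among equal frequencies."""
    # The tuple key makes the alphabetical order the tie-breaker.
    return sorted(freq.items(), key=lambda item: (-item[1], item[0]))
```

The composite key is what provides the alphabetical subsorting between types with the same frequency value: without the second element, the order of tied types would depend on the order in which they were first counted.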

TT5 (svn revision 17, binary release: 2006-11-25) required 2,51 seconds to create and 10,07 seconds to sort the list.

TT8 (svn revision 66) needed roughly 1,10 seconds to create and roughly 2,40 seconds to sort the list.

Performance Comparison Table (times in seconds; ~ marks approximate values, ? marks uncertain ones)

          WS 4.0.0.374   TT5 (SVN17)   TT8 (SVN62)   TT8.1 (SVN64)   TT8.2 (SVN66)
Create    11?            2,51~         1,33~         1,30~           1,10~
Sort      2?             10,07~        2,40~         2,40~           2,40~
Total     13~            12,58~        3,73~         3,70            3,50~
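To put the table into relative terms, the speedup factors can be worked out from the reported timings. The sketch below simply divides the TT5 figures by the TT8.2 figures; the dictionary layout and function names are assumptions for illustration, and the WordSmith column is left out because its create/sort split is marked as uncertain.

```python
# Timings in seconds, copied from the table; decimal commas are written as
# decimal points and the "~" (approximate) markers are dropped.
TIMINGS = {
    "TT5 (SVN17)":   {"create": 2.51, "sort": 10.07},
    "TT8 (SVN62)":   {"create": 1.33, "sort": 2.40},
    "TT8.1 (SVN64)": {"create": 1.30, "sort": 2.40},
    "TT8.2 (SVN66)": {"create": 1.10, "sort": 2.40},
}


def total(version: str) -> float:
    """Create time plus sort time for one version."""
    t = TIMINGS[version]
    return t["create"] + t["sort"]


def speedup(baseline: str, candidate: str, phase: str = "total") -> float:
    """How many times faster `candidate` is than `baseline` for a phase."""
    if phase == "total":
        return total(baseline) / total(candidate)
    return TIMINGS[baseline][phase] / TIMINGS[candidate][phase]


if __name__ == "__main__":
    for phase in ("create", "sort", "total"):
        factor = speedup("TT5 (SVN17)", "TT8.2 (SVN66)", phase)
        print(f"{phase}: {factor:.2f}x")
```

Taken at face value, the numbers put TT8.2 at roughly 2.3 times faster than TT5 for creating the list, about 4.2 times faster for sorting it, and about 3.6 times faster overall. Since the timings are approximate single measurements, these factors are rough indications only.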


Please note that, due to the restrictions on my user account, I had no permission to create native images of TT5 or TT8 on the test computer, so both were JIT-compiled at run time. This was not the case in a previous performance test post, where everything ran as native code. TT7 did not perform any alphabetical subsorting between types with the same frequency value.

A Word on Terminology

TT5, TT7, TT8 and a future TTx are all short names for source code revisions in the SVN repository. Tenka Text is still in the pre-alpha stage of development.
