Performance Optimizations for Frequency Lists

January 28, 2007

TT5 -> TT8 performance difference

I ran a performance test on one of the remotely accessible computers at the University of Heidelberg (2 physical/4 logical CPUs and 2 GB RAM).

The test consisted of creating a frequency list from the Helsinki Corpus (9.793 KB, single text file) and then sorting it.
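
The two timed phases, creating the list and sorting it, can be sketched roughly as follows. This is an illustration in Python, not Tenka Text's actual code: the tokenizer (lowercased runs of the letters a-z), the function names and the tiny in-memory sample are all assumptions made for the example.

```python
import re
import time
from collections import Counter


def create_frequency_list(text):
    """Count how often each word type occurs (hypothetical tokenizer)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


def time_phases(text):
    """Return (list sorted by frequency, create seconds, sort seconds)."""
    start = time.perf_counter()
    freq = create_frequency_list(text)
    created = time.perf_counter()
    # Highest frequency first; no tie-breaking in this simple version.
    ordered = sorted(freq.items(), key=lambda item: item[1], reverse=True)
    done = time.perf_counter()
    return ordered, created - start, done - created


# Stand-in for the corpus file, which is not reproduced here.
sample = "the cat and the dog and the bird"
ordered, create_s, sort_s = time_phases(sample)
print(ordered[0], create_s, sort_s)
```

Measuring the two phases separately, as in the table further down, shows which of them an optimization actually affected.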

As the figures below show, my optimization efforts seem to have paid off: the total time dropped from about 12,58 seconds with TT5 to about 3,50 seconds with TT8.

WordSmith Tools 4.0.0.374 took about 13 seconds to create and sort a word list into:

- Alphabetical order
- Frequency order
  - Alphabetical order between types with the same frequency value
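
The sort orders listed above can be expressed with a composite sort key. The sketch below is only an illustration: the sample counts are made up, and the assumption that "frequency order" means highest frequency first is mine, not something stated by either tool.

```python
from collections import Counter

# Hypothetical sample counts; a real list would come from the corpus.
freq = Counter({"the": 3, "and": 2, "cat": 1, "bird": 1, "dog": 1})

# Alphabetical order: sort by the word type itself.
alphabetical = sorted(freq.items(), key=lambda item: item[0])

# Frequency order (assumed descending); types with the same
# frequency value are sub-sorted alphabetically.
by_frequency = sorted(freq.items(), key=lambda item: (-item[1], item[0]))

print(alphabetical)
print(by_frequency)
```

Using the tuple `(-count, word)` as the key handles both criteria in a single pass, so no second sort over the tied groups is needed.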

TT5 (SVN revision 17, binary release: 2006-11-25) required about 2,51 seconds to create the list and about 10,07 seconds to sort it.

TT8 (SVN revision 66) needed about 1,10 seconds to create the list and about 2,40 seconds to sort it.

Performance Comparison Table (times in seconds)

         WS 4.0.0.374   TT5 SVN17   TT8 SVN62   TT8.1 SVN64   TT8.2 SVN66
Create   11?            2,51~       1,33~       1,30~         1,10~
Sort     2?             10,07~      2,40~       2,40~         2,40~
Total    13~            12,58~      3,73~       3,70~         3,50~
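
To put the table into relative terms, the speedup from TT5 SVN17 to TT8.2 SVN66 can be worked out with a few lines of Python. The timings are copied from the table (with decimal commas written as points) and are approximate, so the ratios are only indicative; the helper names are made up for this sketch.

```python
# Approximate timings in seconds, taken from the table.
timings = {
    "TT5 SVN17": {"create": 2.51, "sort": 10.07},
    "TT8.2 SVN66": {"create": 1.10, "sort": 2.40},
}


def total(name):
    """Create time plus sort time for one revision."""
    return timings[name]["create"] + timings[name]["sort"]


def speedup(phase):
    """How many times faster TT8.2 is than TT5 in the given phase."""
    return timings["TT5 SVN17"][phase] / timings["TT8.2 SVN66"][phase]


create_ratio = speedup("create")
sort_ratio = speedup("sort")
total_ratio = total("TT5 SVN17") / total("TT8.2 SVN66")
print(round(create_ratio, 2), round(sort_ratio, 2), round(total_ratio, 2))
```

On these numbers the sort phase improved the most, by roughly a factor of four, while list creation got a little more than twice as fast.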

Please note that, due to restrictions on my user account, I had no permission to create native images of TT5 or TT8 on the test computer, so both were JIT-compiled at run time. This was not the case in a previous performance test post, where everything ran as native images. TT7 did not perform any alphabetical subsorting between types with the same frequency value.

A Word on Terminology

TT5, TT7, TT8 and any future TTx are short names for source code revisions in the SVN repository. Tenka Text is still in the pre-alpha stage of development.
