Performance Optimizations for Frequency Lists
TT5 -> TT8 performance difference
I ran a performance test on one of the remotely accessible computers at the University of Heidelberg (2 physical / 4 logical CPUs and 2 GB of RAM).
The test consisted of creating a frequency list from the Helsinki Corpus (9.793 KB, single text file) and then sorting it.
As you can see below, my optimization efforts seem to have paid off well.
WordSmith Tools 4.0.0.374 took about 13 seconds to create and sort a word list into:
- Alphabetical order
- Frequency order
- Alphabetical order between types with the same frequency value
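To make the three sort orders concrete, here is a minimal Python sketch of building a frequency list and sorting it. It is only an illustration: the tokenizer, the function names and the sample sentence are my own assumptions, and it says nothing about how WordSmith Tools or Tenka Text implement this internally.

```python
import re
from collections import Counter

def build_frequency_list(text):
    # Hypothetical tokenizer: lowercase, then take runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

def sort_alphabetical(freq):
    # Alphabetical order by type.
    return sorted(freq.items(), key=lambda item: item[0])

def sort_by_frequency(freq):
    # Descending frequency; types with the same frequency
    # are subsorted alphabetically.
    return sorted(freq.items(), key=lambda item: (-item[1], item[0]))

sample = "the cat and the dog and the bird"
freq = build_frequency_list(sample)
alphabetical = sort_alphabetical(freq)
by_frequency = sort_by_frequency(freq)

for word, count in by_frequency:
    print(word, count)
```

Using a single composite sort key handles the frequency order and the alphabetical subsort in one pass, instead of sorting twice.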
TT5 (SVN revision 17, binary release: 2006-11-25) required 2,51 seconds to create the list and 10,07 seconds to sort it.
TT8 (SVN revision 66) needed about 1,10 seconds to create the list and about 2,40 seconds to sort it.
Performance Comparison Table
Time (s) | WS 4.0.0.374 | TT5 SVN17 | TT8 SVN62 | TT8.1 SVN64 | TT8.2 SVN66 |
Create | 11? | 2,51~ | 1,33~ | 1,30~ | 1,10~ |
Sort | 2? | 10,07~ | 2,40~ | 2,40~ | 2,40~ |
Total | 13~ | 12,58~ | 3,73~ | 3,70~ | 3,50~ |
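For a rough sense of the gain, the speedup from TT5 to TT8.2 can be worked out from the table. The short Python sketch below does the arithmetic. The timings are copied from the table with decimal commas converted to points, and the dictionary layout and function name are just my own illustration. Since the measurements are approximate, the ratios are too.

```python
# Timings in seconds, taken from the comparison table
# (decimal commas converted to decimal points).
timings = {
    "TT5 SVN17": {"create": 2.51, "sort": 10.07},
    "TT8.2 SVN66": {"create": 1.10, "sort": 2.40},
}

def speedup(before, after):
    # How many times faster "after" is compared to "before".
    return before / after

old = timings["TT5 SVN17"]
new = timings["TT8.2 SVN66"]

create_speedup = speedup(old["create"], new["create"])
sort_speedup = speedup(old["sort"], new["sort"])
total_speedup = speedup(old["create"] + old["sort"],
                        new["create"] + new["sort"])

print(f"create: {create_speedup:.2f}x")
print(f"sort:   {sort_speedup:.2f}x")
print(f"total:  {total_speedup:.2f}x")
```

Under these numbers, list creation is a bit more than twice as fast, sorting a bit more than four times as fast, and the total roughly 3.6 times as fast.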
Please note that, because of the restrictions on my user account, I had no permission to create native images of TT5 or TT8 on the test computer, so both were JIT-compiled at run time. This was not the case in a previous performance test post, where everything was native. TT7 did not perform any alphabetical subsorting between types with the same frequency value.
A Word on Terminology
TT5, TT7, TT8 and any future TTx are all short names for source code revisions in the SVN repository. Tenka Text is still in the pre-alpha stage of development.