Posts

Showing posts from January 28, 2007

Performance Optimizations for Frequency Lists

TT5 -> TT8 performance difference

I conducted a performance test on one of the remotely accessible computers of the University of Heidelberg (2 physical/4 logical CPUs and 2 GB RAM). The test consisted of creating a frequency list from the Helsinki Corpus (9.793 KB, single text file) and then sorting it. As you can see below, my optimization efforts seem to have paid off well.

WordSmith Tools 4.0.0.374 took about 13 seconds to create a word list and sort it into alphabetical order, frequency order, and alphabetical order between types with the same frequency value.

TT5 (svn revision 17, binary release: 2006-11-25) required 2,51 seconds to create the list and 10,07 seconds to sort it into alphabetical order and frequency order.

TT8 (svn revision 66) needed 1,10~ seconds to create the list and 2,40~ seconds to sort it into alphabetical order, frequency order, and alphabetical order between types with the same frequency value.

* Performance Comparison Table WS4 0.0.374 TT5 SVN17 TT8 SVN62 TT8.1 S
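For readers unfamiliar with the task being timed, here is a minimal Python sketch of building a frequency list and sorting it in the three orders mentioned above. It is only an illustration and not the TT5/TT8 or WordSmith implementation: the tokenization rule (lowercased runs of letters) and all function names are assumptions made for this example.

```python
import re
from collections import Counter

def build_frequency_list(text):
    # Assumed tokenization: lowercase the text and treat every run of
    # letters as one token. Real tools may tokenize differently.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(tokens)

def sort_alphabetical(freq):
    # Alphabetical order by type (word form).
    return sorted(freq.items())

def sort_by_frequency(freq):
    # Frequency order, most frequent first; ties keep first-seen order
    # because Python's sort is stable.
    return sorted(freq.items(), key=lambda item: -item[1])

def sort_by_frequency_then_alpha(freq):
    # Frequency order, with alphabetical order between types that
    # share the same frequency value.
    return sorted(freq.items(), key=lambda item: (-item[1], item[0]))
```

A call such as `sort_by_frequency_then_alpha(build_frequency_list(open(path, encoding="utf-8").read()))` would produce the combined ordering for a single text file, where `path` is a placeholder for the corpus file.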

演歌 - Enka

While investing some time in improving my new frequency list implementation, I also started to watch some new Japanese dramas from Winter 2007: Erai Tokoro ni Totsuide Shimatta!, Hana Yori Dango 2, and Enka* no Joou. I just fell in love with the songs in Enka no Joou and decided to cut and retime the subs of one of the enkas in it. Here you can download this wonderful song. Two great open-source programs helped me do the work in under 2 minutes: VirtualDub and Aegisub. Both are extremely easy to use!