Showing posts from June 3, 2007

Multi-threading, Part of Speech, Matoed 2005

I am busy with my studies and cannot spend much time on development for the time being. What little I do is test some ideas and the design of the class library. One of the tests I conducted was to see how much of a performance gain my single-file frequency listing routines might achieve from multi-threading on multi-core processors. So I wrote some methods that split a 140 MB file into segments, each of which can be processed on its own thread, with the per-segment counts added together once all threads return. You can see the code here. The result: creating a frequency list for a 140 MB file using four threads on a quad-core machine can be 83% faster than running the same operation on a single thread.

You should expect to see this finding taken into account in the next release. Another major change I'm planning for the next release is making the Segmenter, Clusterer and FrequencyList classes generic, so that 'print' as a verb can be analyzed as a separate entity from