Multi-threading, Part of Speech, Matoed 2005
I am busy with my studies and cannot spend much time on development for the time being. The little I do is testing some ideas and the design of the class library.
One of the tests I conducted was to see how much of a performance gain my single-file frequency listing routines might achieve on multi-core processors with multi-threading. So I wrote some methods that split a 140 MB file into segments, each of which is processed on its own thread, with the partial results added together once all threads return. You can see the code here. The result: creating a frequency list for a 140 MB file using four threads on a quad-core machine can be 83% faster than running the same operation on a single thread.
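For readers who want the idea without opening the linked code, here is a minimal sketch of the split, count and merge pattern in Python. It is not the code from the class library: the function names (split_at_whitespace, count_words, frequency_list) are made up for illustration, and it works on an in-memory string instead of a file on disk. Also note that in CPython the global interpreter lock keeps CPU-bound threads from running in parallel, so this shows the structure of the approach and should not be expected to reproduce the speed-up above.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_at_whitespace(text: str, parts: int) -> list[str]:
    """Cut text into roughly equal segments without splitting a word."""
    segments = []
    start = 0
    n = len(text)
    for i in range(1, parts):
        end = max(start, n * i // parts)
        # Move forward to the next whitespace so no word straddles two segments.
        while end < n and not text[end].isspace():
            end += 1
        segments.append(text[start:end])
        start = end
    segments.append(text[start:])
    return segments


def count_words(segment: str) -> Counter:
    """Build the frequency list for one segment (hypothetical helper)."""
    return Counter(segment.split())


def frequency_list(text: str, workers: int = 4) -> Counter:
    """Count each segment on its own worker thread, then add the results."""
    segments = split_at_whitespace(text, workers)
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, segments):
            total.update(partial)
    return total
```

The important detail is the segment boundary: cutting at a fixed byte or character offset could split a word in two and produce two bogus entries, so each cut is pushed forward to the next whitespace before the segment is handed to a thread.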
You should expect to see this discovery taken into account in the next release.
Another major change I'm planning for the next release is making the Segmenter, Clusterer and FrequencyList classes generic, so that 'print' as a verb can be analyzed as a separate entity from 'print' as a noun, and so on. Automatic part-of-speech recognition is not something I'm planning to implement in the near future, but I want to make sure that the architecture is ready for anything beyond simple strings.
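To make the idea concrete, here is a rough sketch of what a frequency list that is generic over its item type could look like, written in Python. The class name FrequencyList is borrowed from the library, but everything else (the TaggedWord type, the add_all and count methods) is an assumption made up for this illustration and not the planned API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Generic, Hashable, Iterable, TypeVar

T = TypeVar("T", bound=Hashable)


@dataclass(frozen=True)
class TaggedWord:
    """Hypothetical token type: a word plus its part-of-speech tag."""
    text: str
    pos: str


class FrequencyList(Generic[T]):
    """Counts any hashable item type, not only plain strings."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def add_all(self, items: Iterable[T]) -> None:
        self._counts.update(items)

    def count(self, item: T) -> int:
        return self._counts[item]


# With plain strings, both uses of 'print' fall into the same entry.
plain: FrequencyList[str] = FrequencyList()
plain.add_all(["print", "print"])

# With tagged words, the verb and the noun are counted separately.
tagged: FrequencyList[TaggedWord] = FrequencyList()
tagged.add_all([TaggedWord("print", "verb"), TaggedWord("print", "noun")])
```

Because the tagged token is frozen, it gets value-based equality and hashing, so two tokens are the same entry only when both the word and the tag match. The counting code itself does not need to know anything about parts of speech.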
Oh, by the way, I found a photo of me on the net, taken in 2005 at the election of the Mannheim Turkish Students' Association.