Reflection Emit

I have recently been working on integrating reflection emit into Tenka Text. The code so far is in Builder.cs.

You can use reflection emit to define and compile types at runtime. A pretty amazing way of using this technology is to declare an abstract class at compile time and provide its implementation at runtime.
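As a minimal sketch of that idea (this is not the code in Builder.cs; the names `Answerer`, `AnswererImpl` and `EmitDemo` are made up for illustration, and the static `AssemblyBuilder.DefineDynamicAssembly` call assumes a reasonably recent .NET), an abstract class can be given a concrete subclass at runtime like this:

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical abstract class; only the contract is known at compile time.
public abstract class Answerer
{
    public abstract int Answer();
}

public static class EmitDemo
{
    public static Answerer Build()
    {
        // A dynamic assembly that lives only in memory.
        AssemblyBuilder asm = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("DemoDynamic"), AssemblyBuilderAccess.Run);
        ModuleBuilder module = asm.DefineDynamicModule("DemoModule");

        // Define a concrete class deriving from the abstract base.
        TypeBuilder type = module.DefineType(
            "AnswererImpl",
            TypeAttributes.Public | TypeAttributes.Class,
            typeof(Answerer));

        // Override Answer() with a body that just returns a constant.
        MethodBuilder method = type.DefineMethod(
            "Answer",
            MethodAttributes.Public | MethodAttributes.Virtual | MethodAttributes.HideBySig,
            typeof(int),
            Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldc_I4, 42);
        il.Emit(OpCodes.Ret);
        type.DefineMethodOverride(method, typeof(Answerer).GetMethod("Answer"));

        // No constructor is defined, so a default one is supplied on creation.
        Type built = type.CreateType();
        return (Answerer)Activator.CreateInstance(built);
    }
}
```

Calling `EmitDemo.Build().Answer()` then runs the emitted IL through the ordinary virtual call, so the rest of the program only ever sees the abstract type.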



Tenka Text is going to use reflection emit to provide custom segmentation. Using reflection emit instead of a simpler approach has clear advantages, chiefly in performance.

I quote my friend Mike Scott from the documentation of his WordSmith Tools 4 here:

[...] you may wish to allow certain additional characters within a word. For example, in English, the apostrophe in father's is best included as a valid character as it will allow processing to deal with the whole word instead of cutting it off short. (If you change language to French you might not want apostrophes to be counted as acceptable mid-word characters.)

Examples:

' (only apostrophes allowed in the middle of a word)

'% (both apostrophes and percent symbols allowed in the middle of a word)

'_ (both apostrophes and underscore characters allowed in the middle of a word)

You can include up to 10.

If you want to allow fathers' too, check the allow to end of word box. If this is checked, any of these symbols will be allowed at either end of a word as long as the character isn't all by itself (as in " ' ").

The limit stated in this quote ("You can include up to 10.") makes the inherent limitation of the programmatically simple implementation in WordSmith Tools 4 clear. It is an array-based approach, so the time spent in loops grows with the size of the array of custom characters: every character of the text may have to be compared against every entry, which is a linear cost per character rather than a constant one.
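To make the cost concrete, here is a rough sketch of what an array-based check might look like. It is only my assumption of the general shape, not WordSmith's actual code, and `ArrayLookup` and `IsWordChar` are hypothetical names:

```csharp
using System;

public static class ArrayLookup
{
    // Returns true if c is a letter or digit, or one of the
    // user-configured extra characters (for example ' or %).
    public static bool IsWordChar(char c, char[] extras)
    {
        if (char.IsLetterOrDigit(c)) return true;

        // Linear scan: in the worst case every entry is compared,
        // and this runs once for every character of the text.
        for (int i = 0; i < extras.Length; i++)
        {
            if (extras[i] == c) return true;
        }
        return false;
    }
}
```

With ten extra characters the loop is cheap, but the scan runs for every character of the corpus, so the cost adds up as the list grows.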

To overcome this limitation, Tenka Text will emit IL code at runtime, generated according to the settings the user has specified. Some of the tests I performed last week with two character sets of different sizes confirmed that the runtime compilation approach is the way to go. I will publish more on the performance gap between the two approaches later.
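The following is a small sketch of the general technique, not Tenka Text's implementation: it bakes the user's characters into IL as constants using a `DynamicMethod`. The names `EmittedLookup` and `Compile` are hypothetical.

```csharp
using System;
using System.Reflection.Emit;

public static class EmittedLookup
{
    // Builds a delegate that returns true when its argument is one of
    // the given extra characters. The characters are emitted as IL
    // constants, so no array or loop exists at call time.
    public static Func<char, bool> Compile(char[] extras)
    {
        var dm = new DynamicMethod("IsExtraChar", typeof(bool), new[] { typeof(char) });
        ILGenerator il = dm.GetILGenerator();
        Label match = il.DefineLabel();

        foreach (char c in extras)
        {
            il.Emit(OpCodes.Ldarg_0);         // push the input character
            il.Emit(OpCodes.Ldc_I4, (int)c);  // push the constant to compare with
            il.Emit(OpCodes.Beq, match);      // jump to "match" if they are equal
        }

        il.Emit(OpCodes.Ldc_I4_0);            // no constant matched: return false
        il.Emit(OpCodes.Ret);

        il.MarkLabel(match);
        il.Emit(OpCodes.Ldc_I4_1);            // a constant matched: return true
        il.Emit(OpCodes.Ret);

        return (Func<char, bool>)dm.CreateDelegate(typeof(Func<char, bool>));
    }
}
```

Used as `var isExtra = EmittedLookup.Compile(new[] { '\'', '%' });`, the delegate is built once per settings change and then called for every character. This sketch still emits one comparison per configured character; it only removes the loop and array access. A real generator could emit something smarter, such as range checks or a jump table.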
