Levenshtein Distance Algorithm: Fastest Implementation in C#
While reading some interesting stuff about minimum edit distances in preparation for today's lecture (ECL/ICL), which is just about 45 minutes ahead in time as I'm writing this, I had the chance to test 5 different implementations of the Levenshtein minimum edit distance algorithm.
I'll get into details later but let me announce the winner!
And the winner is ... gLDp!
gLDp is a funny display name for a levenshtein implementation from a C project.
original implementation in C: | levenshtein.c |
my C# port: | libcorsis code |
C vs CIL vs C#
Now I want to get mercilessly picky with my own port and today's C# compilers.
The ternary conditional expressions in my port (lines: #516, #524, #533) are there to circumvent the following restriction:
// valid C
int x = 0;
int y = 0;
int z = 0;
z += x == y;
x, y and z are initialized 0 and in the final line z gets incremented by 1.
This is valid code in C but causes a compile-time error in C#: C# does not let you evaluate boolean expressions as integers (false 0, true 1). So to circumvent this type safety restriction in syntax, I use ternary conditional expressions.
// valid C#
int x = 0;
int y = 0;
int z = 0;
z += x == y ? 1 : 0;
Depending on whether C# compilers special-case this situation, these TCEs may or may not mean a very slight performance overhead.
On the other hand, the native typed assembly language of .NET: CIL treats boolean values on stack as integers. I'll add some CIL code to illustrate this later. For libcorsis, I am eventually going to switch to such an assembly version.
So what have I learned today?
- C# is a true C descendant.
- If you can't C# a piece of C, you can always CIL it.
Comments