Levenshtein Distance Algorithm: Fastest Implementation in C#

While reading some interesting stuff about minimum edit distances in preparation for today's lecture (ECL/ICL), which is just about 45 minutes ahead in time as I'm writing this, I had the chance to test 5 different implementations of the Levenshtein minimum edit distance algorithm.

Here is a screenshot first:

I'll get into details later but let me announce the winner!

And the winner is ... gLDp!

gLDp is a funny display name for a levenshtein implementation from a C project.

original implementation in C:C levenshtein.c
my C# port:C# libcorsis code




C vs CIL vs C#

Now I want to get mercilessly picky with my own port and today's C# compilers.

The ternary conditional expressions in my C# port (lines: #516, #524, #533) are there to circumvent the following restriction:

// valid C
int x = 0;
int y = 0;
int z = 0;
z += x == y;

x, y and z are initialized 0 and in the final line z gets incremented by 1.

This is valid code in C but causes a compile-time error in C#: C# does not let you evaluate boolean expressions as integers (false 0, true 1). So to circumvent this type safety restriction in syntax, I use ternary conditional expressions.

// valid C#
int x = 0;
int y = 0;
int z = 0;
z += x == y ? 1 : 0;

Depending on whether C# compilers special-case this situation, these TCEs may or may not mean a very slight performance overhead.

On the other hand, the native typed assembly language of .NET: CIL treats boolean values on stack as integers. I'll add some CIL code to illustrate this later. For libcorsis, I am eventually going to switch to such an assembly version.

So what have I learned today?

  1. C# is a true C descendant.
  2. If you can't C# a piece of C, you can always CIL it.

Comments

Popular posts from this blog

Levenshtein Distance Algorithm: Fastest Implementation in C#

WordSmith Tools 5.0, Tenka Text in China

Mono 1.2.5 binaries for Solaris 10/x86