Posts

New Segmenter Compiler: Benihime, 紅姫

A quick update on API refactorings! Here is a snapshot of what the code examined in the last post would look like with the refactorings and improvements I have made so far: Before: the code examined in the last post After: For segmenting streams into words, for example, one could also use something like sed on GNU/Linux, the regular expression implementation of some programming language, or whatever. So why am I such an otaku? Why not just go the naive and easy way? Well, I am a performance and control freak, CIL is great fun, and I feel 'pleasure' writing assembly code for a VM. But most importantly, Benihime, 紅姫 makes the perfect training ground for learning language and compiler design. Prior to getting into deep hack mode on Benihime, 紅姫, I had no idea about the differences between 'expressions', 'statements', 'branches' or 'stacks'. Implementing complex boolean expressions in conditional statements like if((c && pm) ...

New Segmenter Compiler: Benihime, 紅姫

An interesting design concern has brought the development of my new segmenter compiler to a temporary standstill for tonight: parallelization. I was trying to refactor and improve the design of my new segmenter compiler Benihime, 紅姫. (It is named after Urahara's sword from the Japanese anime Bleach. Benihime means crimson princess; what could be a more suitable name for a state-of-the-art segmenter? ^o^ theheee~~) One of my major concerns with the new implementation was decoupling the flow control logic from the segmenter builder and the flow direction. This is essential for being able to reuse the same logic to compile, for example, two segmenters that run in opposite directions. Let me illustrate the problem with the help of a file I happened to submit as practice for our introduction to computational linguistics course just 5 days ago:
// /home/sert/Projects/hw1/hw1/Main.cs created with MonoDevelop
//
// project created on 10/21/2007 at 3:19 AM
using System;
using System.IO;
using...

New Project Openings!

Corsis is seeking code reviewers, documentation writers and framework designers. If you want to work with us on non-commercial, voluntary terms, apply for a position today! Project Openings Corsis started as an open-source answer to commercial corpus analysis software and now constitutes a test bed for research in various subjects such as compiler design, performance optimization and parallel computation.

Character Frequency Calculations

sample plain text in Devanāgarī script
character frequency analysis results
source code
sample program binaries
*You can use the sample program not only with Devanāgarī but with other scripts as well. See example.
A student of final year B.Tech at National Institute of Technology, Hamirpur, India wrote to me today: (emphasis added)
I came across Tenka-text while searching through the net for existing concordance softwares. I wish to develop a similar kind of free software for the Dev Nagri script which is the script for Hindi, our national Language. Initially I plan to develop just the character frequency calculation functionality which is not provided by any currently available product. Could you please guide me a little on how to go about this. I shall value your suggestions very much.
To which I first replied:
... I'm at work right now and thus do not have access to my development tools and personal library and because of this cannot reply with an example C# program immediately...
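For readers who just want the idea, here is a minimal sketch of a character frequency count in C#. It is not the sample program linked in the post; the class name CharFrequency and its Count method are made up for illustration. It assumes UTF-8 input, skips whitespace and control characters, and counts UTF-16 code units, so in Devanāgarī text dependent vowel signs (matras) are counted separately from the consonants they attach to.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of Tenka Text or Corsis.
public static class CharFrequency
{
    // Counts how often each UTF-16 code unit occurs in the text read from `reader`.
    // Assumption of this sketch: whitespace and control characters are ignored.
    public static Dictionary<char, int> Count(TextReader reader)
    {
        var counts = new Dictionary<char, int>();
        int next;
        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;
            if (char.IsWhiteSpace(c) || char.IsControl(c))
                continue;

            int seen;
            counts.TryGetValue(c, out seen); // seen stays 0 for a new character
            counts[c] = seen + 1;
        }
        return counts;
    }

    public static void Main(string[] args)
    {
        Console.OutputEncoding = Encoding.UTF8;

        // Read the file named on the command line, or standard input if none is given.
        TextReader reader = args.Length > 0
            ? new StreamReader(args[0], Encoding.UTF8)
            : Console.In;

        var sorted = new List<KeyValuePair<char, int>>(Count(reader));
        // Most frequent characters first.
        sorted.Sort((a, b) => b.Value.CompareTo(a.Value));

        foreach (var pair in sorted)
            Console.WriteLine("U+{0:X4}\t{1}\t{2}", (int)pair.Key, pair.Key, pair.Value);
    }
}
```

Each output line holds the code point, the character itself and its count, separated by tabs. Counting whole syllables instead of code units would need a text-element enumerator such as System.Globalization.StringInfo, which this sketch leaves out.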

New Logo/Wordmark Concept and Links

I just wanted to post a new logo/wordmark concept I've come up with recently. It looks too damn serious ^__^ right? But I like it that way (*^o^*) By the way, I happened to find three interesting new links to Tenka Text. The first one is on the links page of the Collection of Electronic Resources in Translation Technologies from the University of Ottawa, where they call my software "Un nouveau concurrent de WordSmith que vous pouvez télécharger gratuitement chez vous !" ("A new competitor of WordSmith which you can download for free!"). The second one is on the tools page of the Institut für Dokumentologie und Editorik at the University of Cologne. The third one is on the tools page of the Dutch Language Union. It's very encouraging to see something you do as a hobby get such recognition. ^__^

Japanese Dramas - Fall 2007

Two dramas I'm looking forward to watching in the new season: モップガール - Mop Girl (wiki) ジョシデカ! - Joshi Deka! (wiki)

Mono 1.2.5 binaries for Solaris 10/x86

I think I've finally managed to botch it all up. I created a JRE-style zipped package, which you can download here (1, 2) at your own risk ;). Whoa, it's taken me 4 days just to get this far with Mono on Solaris. ^o^ I wanna sleep like Rilakkuma now hehe... These binaries are compiled for 32-bit x86 processors. They also run on 64-bit x86-64 processors. They are not built to run Windows Forms applications. To extract the archive use: "gtar -zxvf"