HashSet - a new high-performance set collection from the Orcas October CTP
I tested the performance of several set implementations. A set is an unordered collection of unique elements.

Time required to create a set of unique words with different implementations (corpus size: 278,675 tokens of 21,828 types):

- A cheap set implementation that derives from the BCL generic list collection and imposes a Contains(T item) check on each add / insert operation [SVN]: ~21.20 seconds
- A set implementation that is a reflected / disassembled partial copy of the BCL generic list collection and performs manually inlined Contains checks on each add / insert operation: ~20.70 seconds
- System.Collections.Specialized.StringCollection, guarded with if (!set.Contains(word)) set.Add(word); : ~18.96 seconds
- HashSet from the Orcas October CTP: ~0.17 seconds ^_^

Just who can beat this? I immediately decided to switch to the new generic HashSet from the BCL guys. Tenka.Text will greatly benefit from this development, especially when sorting word-frequency dictionaries* on their frequencies.

(* Your ...
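
For anyone who wants to try the comparison themselves, here is a rough sketch of the two extremes: a list guarded by a Contains check versus the generic HashSet. It is not the code behind the numbers above. The class and method names (SetBenchmark, CountUniqueWithList, CountUniqueWithHashSet) and the synthetic corpus are made up for illustration, and the corpus is deliberately smaller than the real one. The gap comes from List<T>.Contains scanning the list linearly on every call, while HashSet<T>.Add does a hash lookup.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class SetBenchmark
{
    // Hypothetical helper: list-backed "set", linear Contains scan before every add.
    public static int CountUniqueWithList(IEnumerable<string> tokens)
    {
        var list = new List<string>();
        foreach (string word in tokens)
        {
            if (!list.Contains(word)) list.Add(word);
        }
        return list.Count;
    }

    // Hypothetical helper: HashSet<T>.Add ignores duplicates itself (it returns false for them).
    public static int CountUniqueWithHashSet(IEnumerable<string> tokens)
    {
        var set = new HashSet<string>();
        foreach (string word in tokens)
        {
            set.Add(word);
        }
        return set.Count;
    }

    public static void Main()
    {
        // Synthetic stand-in corpus (an assumption, not the original data):
        // 50,000 tokens drawn from 5,000 distinct types.
        var tokens = new List<string>();
        for (int i = 0; i < 50000; i++)
        {
            tokens.Add("w" + (i % 5000));
        }

        Stopwatch timer = Stopwatch.StartNew();
        int listCount = CountUniqueWithList(tokens);
        timer.Stop();
        Console.WriteLine("List + Contains: {0} types, {1} ms", listCount, timer.ElapsedMilliseconds);

        timer = Stopwatch.StartNew();
        int setCount = CountUniqueWithHashSet(tokens);
        timer.Stop();
        Console.WriteLine("HashSet:         {0} types, {1} ms", setCount, timer.ElapsedMilliseconds);
    }
}
```

Both helpers should report the same number of types; only the elapsed time differs, and the exact figures will depend on your machine and corpus.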