The better word count extension for BlogEngine.NET

What does it do?

It displays the number of words in a post.

How did this extension come about?

The other day I found a word count extension for BlogEngine.NET that displays the number of words in a post, so I installed it. But it only works well with posts in Western languages such as English. If you publish a post in an Asian language, the count is wrong. For example, when I posted a 2500-word article in Chinese, the extension reported only 17 words! After reading its source code, I understood the algorithm: it simply splits the article into pieces at each space and counts the pieces. That is fine for Western languages, but in Asian languages such as Chinese the words run together with no spaces between them. So I decided to improve the extension to make it compatible with Asian languages.
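
To illustrate the problem, here is a minimal sketch of the space-splitting approach described above. It is not the original extension's code. The class and method names (NaiveWordCount, Count) are made up for this example, and I am assuming the split happens on the space character only.

```csharp
using System;

public static class NaiveWordCount
{
    // Hypothetical helper, not taken from the original extension:
    // splits the text on spaces and counts the non-empty pieces.
    public static int Count(string text)
    {
        var pieces = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return pieces.Length;
    }
}
```

With this sketch, NaiveWordCount.Count("The quick brown fox") gives 4, but NaiveWordCount.Count("我今天写了一篇很长的文章") gives 1, because the 12 Chinese characters contain no space and are treated as a single piece.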

How do you install this BETTER version of the extension?

WordCount.cs (2.45 kb)

Download WordCount.cs from the link above and put it in your blog directory at ~\App_Code\Extensions\WordCount.cs.

Then add this piece of code to your PostView.ascx file under your theme directory. For example: ~\themes\Standard\PostView.ascx.

<%= WordCount.GetWordCount(Post) %>

How did you improve the original word count extension?

I only changed the algorithm that counts the words. I did not invent the algorithm myself; the C# implementation is as follows:

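    /// Calculate word count.
    /// Requires: using System.Text.RegularExpressions;
    private static int CalculateWordCount(string article)
    {
        // Split the article into pieces on whitespace.
        var sec = Regex.Split(article, @"\s");
        int count = 0;
        foreach (var si in sec)
        {
            // Each run of characters in U+0000..U+00FF counts as one word.
            int ci = Regex.Matches(si, @"[\u0000-\u00ff]+").Count;
            // Each character above U+00FF (e.g. a Chinese character) counts as one word.
            foreach (var c in si)
                if ((int)c > 0x00FF) ci++;

            count += ci;
        }

        return count;
    }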
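
To try the counting rule outside of BlogEngine.NET, here is a small self-contained sketch. It uses the same two regular expressions and the same 0x00FF threshold as the snippet above, but the wrapper class name (WordCountDemo) and the public modifier are my own additions for the example and are not part of WordCount.cs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCountDemo
{
    // Same counting rule as the extension's snippet, exposed publicly
    // (an assumption made for this demo) so it can be called directly.
    public static int CalculateWordCount(string article)
    {
        int count = 0;
        foreach (var piece in Regex.Split(article, @"\s"))
        {
            // One word per run of characters in U+0000..U+00FF.
            int ci = Regex.Matches(piece, @"[\u0000-\u00ff]+").Count;
            // One word per character above U+00FF.
            foreach (var c in piece)
                if (c > 0x00FF) ci++;
            count += ci;
        }
        return count;
    }
}
```

WordCountDemo.CalculateWordCount("The quick brown fox") returns 4, CalculateWordCount("我今天写了一篇很长的文章") returns 12, and the mixed text "BlogEngine.NET 很好用" returns 4 (one Latin run plus three Chinese characters). Note that full-width punctuation such as "，" or "。" is also above U+00FF, so under this rule each such mark is counted as a word too.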

A little more description:

It evolved from Word Count 2.5, made by GENiALi. The original version has a critical bug: it does NOT work well with Asian languages. For example, it displays only 17 words for a 2500-word Chinese post. The cause is its word counting algorithm, which treats every run of characters separated by spaces as a word, but text in Asian languages such as Chinese is not separated by spaces at all! So I improved the word counting algorithm to make it compatible with posts in Asian languages.
