Elegant string searches

A long time ago I wrote a simple game.

The current version of the game is here: wordgame.

The current version was to give me a chance to play with canvas and Ajax.

Anyway, I wrote the original version a long time ago - and what I needed to do was find all words in a dictionary that could be made from the 9-letter grid. The original version was written in C and at the time the only option was to work with arrays (no hashes, STL, sets etc to help).

After a few attempts, I “discovered” the idea of using prime numbers. I have since noticed that this idea is not original. In fact it is similar to Godel numbering. Today I came across this blog entry google-interviewing-story.

One of the comments mentioned an idea that should be more efficient than using primes. The idea is to use two arrays. We will assume we have a ‘secret’ word and a dictionary full of words we want to search. Our first array contains the frequency of letters in our secret word and the second array is the frequency of the letters in the word we are checking.

Here is the python code for the prime method:

dictionary = '../scrabble.txt'

secret = 'chocolate'
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31,
    37, 41, 43, 47, 53, 59, 61, 67, 71, 73,
    79, 83, 89, 97, 101]

def prime_val(ch):
    return primes[ord(ch.lower()) - ord('a')]

def get_val(word):
    total = 1
    for ch in word:
        total *= prime_val(ch)

    return total

magic = get_val(secret)
for word in open(dictionary):
    word = word.rstrip('\n')
    if magic % get_val(word) == 0:
        print word

Here is the code using arrays:

#!/usr/bin/python

dictionary = '../scrabble.txt'
secret = 'chocolate'

def xord(ch):
    return ord(ch) - ord('a')

def get_word_array(word):
    word_array = [0] * 26

for letter in word:
    word_array[xord(letter)] += 1
return word_array


magic_array = get_word_array('chocolate')

for word in open(dictionary):
    word = word.rstrip('\n')
    word_array = get_word_array(word)
    magic_clone = magic_array[:]

if min([magic_clone[i] - word_array[i] for i in range(26)]):
    print word

So, which method was faster?

Here are the timings (on my iMac)

First program using primes:

    real    0m1.391s
    user    0m1.355s
    sys     0m0.020s

The more efficient program:

    real    0m3.569s
    user    0m3.509s
    sys     0m0.024s

So, the first program would appear to be noticeably quicker when implemented in Python. There was a good argument why the second program should be quicker.

I guess that it comes down to the language used for the implementation and the specifics of the problem. If this story has a moral, it would probably be this: Never assume that one algorithm will be more efficient than another when applied to a specific problem with a specific language.