Elegant string searches
A long time ago I wrote a simple game.
The current version of the game is here: wordgame.
The current version was to give me a chance to play with canvas and Ajax.
Anyway, I wrote the original version a long time ago - and what I needed to do was find all words in a dictionary that could be made from the 9-letter grid. The original version was written in C and at the time the only option was to work with arrays (no hashes, STL, sets etc to help).
After a few attempts, I “discovered” the idea of using prime numbers. I have since noticed that this idea is not original. In fact it is similar to Godel numbering. Today I came across this blog entry google-interviewing-story.
One of the comments mentioned an idea that should be more efficient than using primes. The idea is to use two arrays. We will assume we have a ‘secret’ word and a dictionary full of words we want to search. Our first array contains the frequency of letters in our secret word and the second array is the frequency of the letters in the word we are checking.
Here is the python code for the prime method:
dictionary = '../scrabble.txt'
secret = 'chocolate'
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31,
37, 41, 43, 47, 53, 59, 61, 67, 71, 73,
79, 83, 89, 97, 101]
def prime_val(ch):
return primes[ord(ch.lower()) - ord('a')]
def get_val(word):
total = 1
for ch in word:
total *= prime_val(ch)
return total
magic = get_val(secret)
for word in open(dictionary):
word = word.rstrip('\n')
if magic % get_val(word) == 0:
print word
Here is the code using arrays:
#!/usr/bin/python
dictionary = '../scrabble.txt'
secret = 'chocolate'
def xord(ch):
return ord(ch) - ord('a')
def get_word_array(word):
word_array = [0] * 26
for letter in word:
word_array[xord(letter)] += 1
return word_array
magic_array = get_word_array('chocolate')
for word in open(dictionary):
word = word.rstrip('\n')
word_array = get_word_array(word)
magic_clone = magic_array[:]
if min([magic_clone[i] - word_array[i] for i in range(26)]):
print word
So, which method was faster?
Here are the timings (on my iMac)
First program using primes:
real 0m1.391s
user 0m1.355s
sys 0m0.020s
The more efficient program:
real 0m3.569s
user 0m3.509s
sys 0m0.024s
So, the first program would appear to be noticeably quicker when implemented in Python. There was a good argument why the second program should be quicker.
I guess that it comes down to the language used for the implementation and the specifics of the problem. If this story has a moral, it would probably be this: Never assume that one algorithm will be more efficient than another when applied to a specific problem with a specific language.