Jan 19 2012
<script src=”gaddag.js” />
Words with Friends on Facebook is a recent addiction and I was wondering what it would take to automate the suggestion of valid words given a player’s rack and a letter on the board that the player wants to hook into. A quick search led me to the DAWG and GADDAG data structures which are most commonly used for problems like this. GADDAG is better suited for the Scrabble (or Words with Friends) situation and is faster than a DAWG based approach even if it requires about 5 times more space on an average.
I spent some time reading Steven Gordon’s paper from 1994 in which he presented the GADDAG data structure. The link is at the end of this post. For understanding how the GADDAG data structure works, let’s look at the Trie (which can be considered a special kind of DAWG) based representation of the words CAR, CARE and CARREL:

In the image, EOW stands for End-of-word. Tracing the paths in the tree, one can arrive at the words represented by the tree. This saves space as it groups words with common prefixes which results in fewer nodes. If someone wants all words starting with ‘C’, they just have to traverse the subtree starting with ‘C’. This tree can also answer queries to find all words with a specific prefix. However, answering queries such as “all words containing E” is not as straightforward. While one can traverse the entire tree looking for subtrees with an ‘E’, it is not efficient and for large trees such as those that store entire dictionaries for word games like Scrabble, the search time would be very high. This is where the GADDAG comes in.
For representing a word, the GADDAG data structure has as many subtrees as letters in that word. In other words, as many subtrees as the number of prefixes in that word. For example, to represent the word CARE, the first subtree would be for C>ARE. Here, C is the prefix and > separates the prefix from the rest of the word. However, the subsequent subtrees would be for AC>RE, RAC>E and ERAC>, where the prefixes “CA”, “CAR” and “CARE” are reversed before being stored in the tree. This is better explained with a diagram:

To reconstruct the word, one reads the letters before > in reverse and the letters appearing after > in correct order. This is the key to GADDAG. For example, to find all words containing an ‘R’, one descends the subtree rooted at R, reads RAC and reverses it to get CAR and then reads E, appending it to form CARE. Extending this to our original example of CAR, CARE and CARREL, we get the tree shown below.

This is the key to GADDAG. To find all words which contain ‘E’, one just has to descend the subtree rooted at ‘E’.
I wrote a Javascript based implementation of GADDAG which is based on John Resig’s work on Trie. Given our example words CAR, CARE and CARREL, John’s JSON Trie structure looks like:
{ "C": { "A": { "R": { "$": 0, "E": 0, "R": { "E": { "L": 0 } } } } } }
In this structure, 0 or $:0 is used to indicate EOW. The output JSON gets complicated really fast as the number of words increase. I used Chris Nielsen’s (amazing) JSON visualizer while implementing this. It is an indispensable tool if you do any JSON related work.
The JSON GADDAG output, which is too big to be include in its pretty form, is given below. Graphically, it looks like this.
{{"C":{">":{"A":{"R":{"$":0,"E":0,"R":{"E":{"L":0}}}}}},"A":{"C":{">":{"R":{"$":0,"E":0,"R":{"E":{"L":0}}}}}},"R":{"A":{"C":{">":{"$":0,"E":0,"R":{"E":{"L":0}}}}},"R":{"A":{"C":{">":{"E":{"L":0}}}}}},"E":{"R":{"A":{"C":{">":0}},"R":{"A":{"C":{">":{"L":0}}}}}},"L":{"E":{"R":{"R":{"A":{"C":{">":0}}}}}}}
The Trie “class” looks like this. I modified John’s code to pack the functions I need into one class. John’s blog post contains detailed description of the Trie implementation.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | // The Trie "class" function Trie() { // Structure that actually holds the tree var trie = {}; // Adds all words in the array passed as parameter this.addAll = function (words) {...} // Adds one word to the Trie this.add = function (word) {...} // Returns the internal tree structure this.getTrie = function () {...} // Returns all words contained in the Trie this.getWords = function () {...} // Returns the JSON string representation of the tree (like the one I posted above) this.getJson = function () {...} } |
Limitation (and a possible improvement) of addAll() is that input array must contain words in ascending order. My Gaddag class inherits from the Trie class.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | function Gaddag() { var separator = ">"; // Adds a word to the GADDAG this.add = function (word) {...} // Finds all words containing a hook letter this.findWordsWithHook = function (hook) {...} // Find all words containing a hook letter which can be formed using the letters in the array passed as rack this.findWordsWithRackAndHook = function (rack, hook) {...} // Splits the input array at index and returns the character at that index and the array minus that character as an object this.spliceArray = function (arr, index) { } // Inherit from Trie Gaddag.prototype = new Trie(); |
Adding a word to a GADDAG is only slightly different from adding it to a Trie. However, words containing duplicate letters are added as many times to the tree. This may be required for some queries to work (ones that involve length of prefix as a parameter) but for other cases, just one entry might be sufficient. This is one possible improvement depending on the usage.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | this.add = function (word) { if (word.length === 0) return; // add every (reversed-prefix + ">" + suffix) combination for (var i = 1; i < word.length; i++) { var prefix, ch; prefix = word.substring(0, i); ch = prefix.split(''); ch.reverse(); // call base class's add() to add this string to the tree Gaddag.prototype.add(ch.join('') + separator + word.substring(i)); } // add the whole word as prefix ch = word.split(''); ch.reverse(); Gaddag.prototype.add(ch.join('') + separator + word.substring(i)); }; |
The findWordsWithHook(hook) function is shown below. Explanation is inline.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 | this.findWordsWithHook = function (hook) { var trie = Gaddag.prototype.getTrie(); // descend down the subtree containing the hook node var starterNode = trie[hook]; // output array var words = []; if (typeof starterNode === 'undefined') return; // recurse through this subtree to collect all words dig(hook, starterNode, 'left'); return words; /* word - word being currently formed cur - current node being traversed direction - direction of movement ('left' before > and 'right' after >) */ function dig(word, cur, direction) { // for every subtree from current node for (var node in cur) { // get all child nodes var val = cur[ node ], ch = (node === separator || node === "$" ? '' : node); // if we have reached the end of subtree, add complete word to output array if (val === 0) { words.push(word + ch); } else { // nodes after > form the suffix so change direction if (node === separator) direction = 'right'; // add current letter to word var part = (direction === 'left' ? ch + word : word + ch); // traverse the remaining nodes to form rest of word dig(part, val, direction); } // done with the previous subtree, reset direction to indicate we are in the prefix part of next subtree if (node === separator) direction = 'left'; } } } |
The findWordsWithRackAndHook(rack,hook) function is shown below. It is similar to the above function except that we have to loop through the rack and check if any of the child nodes appear in the rack before descending.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 | this.findWordsWithRackAndHook = function (rack, hook) { var trie = Gaddag.prototype.getTrie(); var words = []; /* To avoid recursing down duplicate characters more than once, sort the array and check whether we have already processed a letter before descending the subtree. */ rack.sort(); if (hook === '') { // each character in the rack acts as a hook with the following characters as the new rack while(rack.length > 1) { var h = rack.shift(); findWordsRecurse("", rack, h, trie, 'reverse'); } } else { findWordsRecurse("", rack, hook, trie, 'reverse'); } return words.unique(); function findWordsRecurse(word, rack, hook, cur, direction) { var hookNode = cur[ hook ]; if (typeof hookNode === 'undefined') return; var hookCh = (hook === separator || hook === "$" ? '' : hook); word = (direction === "reverse" ? hookCh + word : word + hookCh); for (var nodeKey in hookNode) { var nodeVal = hookNode[ nodeKey ]; var nodeCh = (nodeKey === separator || nodeKey === "$" ? '' : nodeKey); // if we have reached the end of this subtree, add the word (+ last character) to output array if (nodeVal === 0) { words.push(word + nodeCh); } else { // if this is the character separating the prefix, change direction and continue recursing if (nodeKey === separator) { findWordsRecurse(word, rack, separator, hookNode, 'forward'); } else { processRack(word, rack, nodeKey, hookNode, direction); } } } } // descend down the next subtree that is rooted at any letter in the rack (which is not a duplicate) function processRack(word, rack, nodeKey, hookNode, direction) { for (var i = 0; i < rack.length; i++) { if (nodeKey === rack[i]) { var duplicate = (i > 0 ? (rack[i] === rack[i - 1] ? true : false) : false); if (!duplicate) { var newRack = rack.slice(0); newRack.remove(i); findWordsRecurse(word, newRack, nodeKey, hookNode, direction); } } } } } |
These functions are not as straightforward as the underlying concept of GADDAG appears to be and it took me quite some time to get them working. I will be tweaking the code over the next few days and test it with full dictionaries. The code can be downloaded here. You can either run test-gaddag.js from the prompt using node.js or load test-gaddag.html in the browser.
Update (1/23/2012):
1. Bug fixes
2. Added check for start of game (no hook)
3. Added check to avoid processing duplicate letters in rack more than once
Update (3/27/2013):
James had reported a bug (see comment #10 below) long back. The tree seems to be constructed correctly and the problem was with the traversal code. Specifically lines 37,38 and 39 of findWordsWithRackAndHook.
Replace
37 38 39 | if (nodeVal === 0) { words.push(word + nodeCh); } else { |
with
37 38 39 40 41 42 43 44 | if (nodeVal === 0) { if(nodeCh != '' && rack.indexOf(nodeCh) === -1) { continue; } else { words.push(word + nodeCh); } } else { |
Before adding the last character to the word, we need to check if it is in the rack. If it is not present in the rack, we skip that character and move to the next one at the same level. I have not updated the downloaded files. Make sure you edit gaddag.js and include this fix.
Thanks for reporting the bug James!


