Spelling checker using a b-tree
12th October 2015
For several of my projects I wanted a simple spelling checker, that would check each word against a wordlist, and simply return t if the word was in the wordlist, or nil otherwise.
For efficiency I decided to represent the dictionary as a b-tree; here's a description of the Lisp program to implement the whole spellchecker application.
Adding a word to the dictionary
The dictionary is stored in *dict*:
(defparameter *dict* nil)
Here's the routine to add a word to the dictionary:
(defun %add-word (word tree) (let* ((char1 (char-downcase (elt word 0))) (len (length word))) (flet ((findchar (x) (and (listp x) (eq (car x) char1)))) (if (= len 1) (if (find char1 tree) tree `(,char1 ,@tree)) ;; ;; length word > 1 (let ((it (find-if #'findchar tree))) (if it `((,char1 ,@(%add-word (subseq word 1) (cdr it))) ,@(remove-if #'findchar tree)) `((,char1 ,@(%add-word (subseq word 1) nil)) ,@tree)))))))
This is called by add-word:
(defun add-word (word) (setq *dict* (%add-word word *dict*)))
For example, after executing:
(map nil #'add-word '("one" "two" "three" "four" "five" "six" "seven" "eight"))
we have:
CL-USER >*dict* ((#\e (#\i (#\g (#\h #\t)))) (#\s (#\e (#\v (#\e #\n))) (#\i #\x)) (#\f (#\i (#\v #\e)) (#\o (#\u #\r))) (#\t (#\h (#\r (#\e #\e))) (#\w #\o)) (#\o (#\n #\e))
Looking up a word
Here's the routine find-word to look up a word in the dictionary. It uses this helper function:
(defun %find-word (word tree) (if (zerop (length word)) t (let* ((char1 (char-downcase (elt word 0)))) (if (= (length word) 1) (find char1 tree) ;; ;; length word > 1 (let ((it (dolist (x tree) (when (and (listp x) (eq (car x) char1)) (return x))))) (when it (%find-word (subseq word 1) (cdr it))))))))
Here's the routine itself:
(defun find-word (word) (%find-word word *dict*))
CL-USER > (find-word "ten") NIL CL-USER > (find-word "three") #\e
Deleting a word from the dictionary
This is the routine to delete a word from the dictionary:
(defun %delete-word (word tree) (let* ((char1 (char-downcase (elt word 0)))) (flet ((findchar (x) (and (listp x) (eq (car x) char1)))) (if (= (length word) 1) (if (find char1 tree) (remove char1 tree) tree) ;; ;; length word > 1 (let ((it (find-if #'findchar tree))) (if it `((,char1 ,@(%delete-word (subseq word 1) (cdr it))) ,@(remove-if #'findchar tree)) `((,char1 ,@(%delete-word (subseq word 1) nil)) ,@tree)))))))
This is called with:
(defun delete-word (word) (cond ((%find-word word *dict*) (setq *dict* (%delete-word word *dict*)) t) (t nil)))
Applying a function to all words in the dictionary
Finally a routine to apply a function to each of the words in the dictionary:
(defun %map-words (tree path function) (cond ((atom tree) (funcall function (map 'string #'identity (reverse (cons tree path))))) (t (dolist (x (cdr tree)) (%map-words x (cons (car tree) path) function)))))
This is called with:
(defun map-words (function dictionary) (dolist (x dictionary) (%map-words x nil function)))
For example:
CL-USER > (map-words #'print *dict*) "eight" "seven" "six" "five" "four" "three" "two" "one" NIL
blog comments powered by Disqus