Spelling checker using a b-tree
12th October 2015
For several of my projects I wanted a simple spelling checker, that would check each word against a wordlist, and simply return t if the word was in the wordlist, or nil otherwise.
For efficiency I decided to represent the dictionary as a b-tree; here's a description of the Lisp program to implement the whole spellchecker application.
Adding a word to the dictionary
The dictionary is stored in *dict*:
(defparameter *dict* nil)
Here's the routine to add a word to the dictionary:
(defun %add-word (word tree)
(let* ((char1 (char-downcase (elt word 0)))
(len (length word)))
(flet ((findchar (x) (and (listp x) (eq (car x) char1))))
(if (= len 1)
(if (find char1 tree) tree `(,char1 ,@tree))
;;
;; length word > 1
(let ((it (find-if #'findchar tree)))
(if it
`((,char1 ,@(%add-word (subseq word 1) (cdr it)))
,@(remove-if #'findchar tree))
`((,char1 ,@(%add-word (subseq word 1) nil)) ,@tree)))))))
This is called by add-word:
(defun add-word (word) (setq *dict* (%add-word word *dict*)))
For example, after executing:
(map nil #'add-word '("one" "two" "three" "four" "five" "six" "seven" "eight"))
we have:
CL-USER >*dict* ((#\e (#\i (#\g (#\h #\t)))) (#\s (#\e (#\v (#\e #\n))) (#\i #\x)) (#\f (#\i (#\v #\e)) (#\o (#\u #\r))) (#\t (#\h (#\r (#\e #\e))) (#\w #\o)) (#\o (#\n #\e))
Looking up a word
Here's the routine find-word to look up a word in the dictionary. It uses this helper function:
(defun %find-word (word tree)
(if (zerop (length word)) t
(let* ((char1 (char-downcase (elt word 0))))
(if
(= (length word) 1)
(find char1 tree)
;;
;; length word > 1
(let ((it
(dolist (x tree)
(when (and (listp x) (eq (car x) char1)) (return x)))))
(when it (%find-word (subseq word 1) (cdr it))))))))
Here's the routine itself:
(defun find-word (word) (%find-word word *dict*))
CL-USER > (find-word "ten") NIL CL-USER > (find-word "three") #\e
Deleting a word from the dictionary
This is the routine to delete a word from the dictionary:
(defun %delete-word (word tree)
(let* ((char1 (char-downcase (elt word 0))))
(flet ((findchar (x) (and (listp x) (eq (car x) char1))))
(if
(= (length word) 1)
(if (find char1 tree) (remove char1 tree) tree)
;;
;; length word > 1
(let ((it (find-if #'findchar tree)))
(if it
`((,char1 ,@(%delete-word (subseq word 1) (cdr it)))
,@(remove-if #'findchar tree))
`((,char1 ,@(%delete-word (subseq word 1) nil)) ,@tree)))))))
This is called with:
(defun delete-word (word)
(cond
((%find-word word *dict*) (setq *dict* (%delete-word word *dict*)) t)
(t nil)))
Applying a function to all words in the dictionary
Finally a routine to apply a function to each of the words in the dictionary:
(defun %map-words (tree path function)
(cond
((atom tree)
(funcall function (map 'string #'identity (reverse (cons tree path)))))
(t (dolist (x (cdr tree))
(%map-words x (cons (car tree) path) function)))))
This is called with:
(defun map-words (function dictionary) (dolist (x dictionary) (%map-words x nil function)))
For example:
CL-USER > (map-words #'print *dict*) "eight" "seven" "six" "five" "four" "three" "two" "one" NIL
blog comments powered by Disqus
