Implementing character classes

7th December 2015

The obvious way of implementing a set of characters in Lisp is to use a string. To test whether a character is in the set you can use the find function:

(defun vowel-p (character)
  (find character "aeiou"))
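As a quick illustration of what this predicate returns: find yields the matching character itself (which counts as true) or NIL, and by default the comparison is case-sensitive. The sketch below repeats vowel-p and adds a hypothetical vowel-ci-p, which is not part of the original program and simply assumes that passing :test #'char-equal is an acceptable way to ignore case.

```lisp
;; vowel-p as defined above: FIND returns the matching character or NIL.
(defun vowel-p (character)
  (find character "aeiou"))

;; Hypothetical case-insensitive variant (an assumption, not from the post):
;; CHAR-EQUAL compares characters ignoring case.
(defun vowel-ci-p (character)
  (find character "aeiou" :test #'char-equal))

(vowel-p #\e)     ; returns #\e, which is non-NIL and so counts as true
(vowel-p #\E)     ; returns NIL, because FIND compares with EQL by default
(vowel-ci-p #\E)  ; returns #\e, the element found in the string
```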

I recently wanted to implement a more flexible way of specifying sets of characters, like the character classes in regular expression syntax. This would include the ability to specify a range of characters, such as "A-Z", meaning any character from A to Z inclusive, or "0-9", meaning any digit. Because the character classes would be specified by the user, I wanted the program to ignore illegal strings rather than signal an error.
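The inclusive range test behind a pattern such as "A-Z" can be sketched with char<=, which accepts any number of arguments and is true when they are in non-decreasing order. The helper name in-range-p is hypothetical, introduced only for illustration. The comparison follows character codes, so the lower-case result below assumes an encoding such as ASCII or Unicode, in which lower-case letters sort after upper-case ones.

```lisp
;; Hypothetical helper: true if CHARACTER lies between LOW and HIGH inclusive.
(defun in-range-p (character low high)
  (char<= low character high))

(in-range-p #\M #\A #\Z)  ; true
(in-range-p #\m #\A #\Z)  ; false, assuming ASCII/Unicode character codes
(in-range-p #\7 #\0 #\9)  ; true
(in-range-p #\Z #\A #\Z)  ; true: the end points are included
```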

Supporting character ranges

The following routine in-class-p tests whether character is in the character class specified by pattern, and returns nil or non-nil accordingly. It supports any number of character ranges included in the string of characters:

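(defun in-class-p (character pattern)
  (let ((end (length pattern))
        (start 0))
    (loop
     (let ((dash (position #\- pattern :start start)))
       (cond
        ;; No more dashes, or a trailing dash: test the rest as plain characters
        ((or (null dash) (= dash (1- end)))
         (return (find character pattern :start start)))
        ;; A dash with nothing before it: skip it
        ((= dash start) (setq start (+ 1 dash)))
        (t
         ;; Plain characters preceding the range
         (when (find character pattern :start start :end (1- dash)) (return t))
         ;; The range itself, inclusive at both ends
         (when (char<=
                (char pattern (1- dash))
                character
                (char pattern (1+ dash)))
           (return t))
         (setq start (+ 2 dash))))))))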

If the pattern doesn't contain any ranges the routine behaves like the simple find version:

CL-USER > (in-class-p #\c "abcde")
#\c
CL-USER > (in-class-p #\f "abcde")
NIL

Any number of ranges can be included anywhere in the pattern:

CL-USER > (in-class-p #\f "a-z")
T

Illegal combinations, such as a dash at the start or the end of the pattern, or two dashes in a row, are ignored:

CL-USER > (in-class-p #\f "-f-")
#\f


I'd welcome any suggestions for improvements.
