Implementing character classes
7th December 2015
The obvious way of implementing a set of characters in Lisp is to use a string. To test whether a character is in the set you can use the find function:
(defun vowel-p (character) (find character "aeiou"))
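Two details of find are worth keeping in mind here: it returns the matching character rather than T, which is fine wherever a generalized boolean is expected, and it compares with eql by default, so the test is case-sensitive. A small sketch of both points follows; the name vowel-ci-p is made up for this illustration:

```lisp
;; Same definition as above, repeated so the snippet runs on its own.
(defun vowel-p (character)
  (find character "aeiou"))

;; Hypothetical case-insensitive variant: CHAR-EQUAL ignores case.
(defun vowel-ci-p (character)
  (find character "aeiou" :test #'char-equal))

(vowel-p #\e)     ; => #\e  (non-nil, so true)
(vowel-p #\E)     ; => NIL  (FIND compares with EQL by default)
(vowel-ci-p #\E)  ; => #\e  (FIND returns the element found in the string)
```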
I recently wanted to implement a more flexible way of specifying sets of characters, like the character classes in regular expression syntax. This would include the ability to specify a range of characters, such as "A-Z", meaning any character from A to Z inclusive, or "0-9", meaning any digit. Because the character classes would be specified by the user, I wanted the program to tolerate illegal strings rather than signal an error.
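The inclusive range test itself is short in Common Lisp, because char<= accepts more than two arguments and checks that they are in non-decreasing order. A minimal sketch, where in-range-p is a hypothetical helper name used only for illustration:

```lisp
;; Hypothetical helper: true if CHARACTER lies between LOW and HIGH inclusive.
;; CHAR<= follows the implementation's character ordering; the standard
;; guarantees that A-Z, a-z and 0-9 are each in ascending order.
(defun in-range-p (character low high)
  (char<= low character high))

(in-range-p #\Q #\A #\Z)  ; => T
(in-range-p #\7 #\0 #\9)  ; => T
(in-range-p #\q #\A #\Z)  ; => NIL on ASCII/Unicode-based implementations
```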
Supporting character ranges
The following routine in-class-p tests whether character is in the character class specified by pattern, and returns nil or non-nil accordingly. It supports any number of character ranges included in the string of characters:
(defun in-class-p (character pattern)
  (let ((end (length pattern))
        (start 0))
    (loop
      (let ((dash (position #\- pattern :start start)))
        (cond
          ((or (null dash) (= dash (1- end)))
           (return (find character pattern :start start)))
          ((= dash start)
           (setq start (+ 1 dash)))
          (t
           (when (find character pattern :start start :end (1- dash))
             (return t))
           (when (char<= (char pattern (1- dash)) character (char pattern (1+ dash)))
             (return t))
           (setq start (+ 2 dash))))))))
If the pattern doesn't contain any ranges the routine behaves like the simple find version:
CL-USER > (in-class-p #\c "abcde")
#\c
CL-USER > (in-class-p #\f "abcde")
NIL
Any number of ranges can be included anywhere in the pattern:
CL-USER > (in-class-p #\f "a-z")
T
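Illegal combinations, such as a dash at the start or the end of the pattern, or two dashes in a row, do not cause an error. A dash at the start is skipped, and a dash at the end is treated as an ordinary character:
CL-USER > (in-class-p #\f "-f-")
#\f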
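To see several ranges and literal characters working together, and how stray dashes are handled, the checks below can be pasted into a fresh REPL. This is only a sketch: matching-chars is a hypothetical helper introduced for the illustration, and the results shown assume an ASCII or Unicode based implementation.

```lisp
;; IN-CLASS-P repeated from above so the snippet runs on its own.
(defun in-class-p (character pattern)
  (let ((end (length pattern))
        (start 0))
    (loop
      (let ((dash (position #\- pattern :start start)))
        (cond
          ((or (null dash) (= dash (1- end)))
           (return (find character pattern :start start)))
          ((= dash start)
           (setq start (+ 1 dash)))
          (t
           (when (find character pattern :start start :end (1- dash))
             (return t))
           (when (char<= (char pattern (1- dash)) character (char pattern (1+ dash)))
             (return t))
           (setq start (+ 2 dash))))))))

;; Hypothetical helper: keep only the characters of CANDIDATES
;; that PATTERN accepts.
(defun matching-chars (candidates pattern)
  (remove-if-not (lambda (ch) (in-class-p ch pattern)) candidates))

;; Three ranges followed by a literal underscore.
(matching-chars "aZ5_-!" "a-zA-Z0-9_")  ; => "aZ5_"

;; The leading dash is skipped; the trailing dash matches literally.
(matching-chars "f-" "-f-")             ; => "f-"
```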
Improvements
I'd welcome any suggestions for improvements.