Implementing character classes
7th December 2015
The obvious way of implementing a set of characters in Lisp is to use a string. To test whether a character is in the set you can use the find function:
(defun vowel-p (character) (find character "aeiou"))
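As a small illustration of how this behaves: find returns the matching element of the sequence rather than T, so vowel-p is a generalized boolean, and the default test is case-sensitive. The case-insensitive variant vowel-ci-p below is a hypothetical name added only for this sketch.

```lisp
(defun vowel-p (character)
  (find character "aeiou"))

;; Hypothetical variant: compare with CHAR-EQUAL to ignore case.
(defun vowel-ci-p (character)
  (find character "aeiou" :test #'char-equal))

(vowel-p #\e)      ; => #\e
(vowel-p #\x)      ; => NIL
(vowel-p #\E)      ; => NIL, the default test (EQL) is case-sensitive
(vowel-ci-p #\E)   ; => #\e, FIND returns the element found in the string
```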
I recently wanted to implement a more flexible way of specifying sets of characters, like the character classes in regular expression syntax. This would include the ability to specify a range of characters, such as "A-Z", meaning any character from A to Z inclusive, or "0-9", meaning any digit. Because the character classes would be specified by the user, I wanted the program to ignore illegal patterns rather than signal an error.
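A single inclusive range such as "A-Z" maps naturally onto char<=, which accepts more than two arguments. The sketch below uses a hypothetical helper, in-range-p, that is not part of the routine described in this post. It assumes an implementation whose character codes follow ASCII or Unicode for the mixed-case example.

```lisp
;; Hypothetical helper for illustration only.
(defun in-range-p (character low high)
  "Return true if CHARACTER lies between LOW and HIGH inclusive."
  (char<= low character high))

(in-range-p #\Q #\A #\Z)   ; true
(in-range-p #\Z #\A #\Z)   ; true, the range is inclusive
(in-range-p #\7 #\0 #\9)   ; true
(in-range-p #\q #\A #\Z)   ; false on ASCII/Unicode implementations
```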
Supporting character ranges
The following routine, in-class-p, tests whether character is in the character class specified by pattern. It returns non-nil (either T or the matching character) if so, and nil otherwise. The pattern may contain any number of character ranges mixed with ordinary characters:
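(defun in-class-p (character pattern)
  (let ((end (length pattern))
        (start 0))
    (loop
      (let ((dash (position #\- pattern :start start)))
        (cond
          ;; No dash left, or a trailing dash: match the rest literally.
          ((or (null dash) (= dash (1- end)))
           (return (find character pattern :start start)))
          ;; A dash with nothing before it: skip it.
          ((= dash start) (setq start (+ 1 dash)))
          (t
           ;; Literal characters before the start of the range.
           (when (find character pattern :start start :end (1- dash)) (return t))
           ;; The range itself, inclusive at both ends.
           (when (char<=
                  (char pattern (1- dash))
                  character
                  (char pattern (1+ dash)))
             (return t))
           ;; Continue after the end of the range.
           (setq start (+ 2 dash))))))))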
If the pattern doesn't contain any ranges the routine behaves like the simple find version:
CL-USER > (in-class-p #\c "abcde")
#\c
CL-USER > (in-class-p #\f "abcde")
NIL
Any number of ranges can be included anywhere in the pattern:
CL-USER > (in-class-p #\f "a-z")
T
Illegal combinations, such as a dash at the start or the end of the pattern, or two dashes in a row, are accepted without signalling an error:
CL-USER > (in-class-p #\f "-f-")
#\f
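To make the edge cases concrete, here are a few more calls with the results obtained by tracing the routine by hand. The definition is repeated so the snippet can be pasted into a fresh REPL. The results assume an implementation with ASCII or Unicode character codes. The tracing suggests that a leading dash is skipped, a trailing dash is matched literally, and a doubled dash also discards the character just before it.

```lisp
(defun in-class-p (character pattern)
  (let ((end (length pattern))
        (start 0))
    (loop
      (let ((dash (position #\- pattern :start start)))
        (cond
          ((or (null dash) (= dash (1- end)))
           (return (find character pattern :start start)))
          ((= dash start) (setq start (+ 1 dash)))
          (t
           (when (find character pattern :start start :end (1- dash))
             (return t))
           (when (char<= (char pattern (1- dash))
                         character
                         (char pattern (1+ dash)))
             (return t))
           (setq start (+ 2 dash))))))))

;; Several ranges mixed with a literal character.
(in-class-p #\5 "a-z0-9_")   ; => T
(in-class-p #\_ "a-z0-9_")   ; => #\_
(in-class-p #\Q "a-z0-9_")   ; => NIL

;; Stray dashes never signal an error.
(in-class-p #\- "-f")        ; => NIL, a leading dash is skipped
(in-class-p #\- "f-")        ; => #\-, a trailing dash matches literally
(in-class-p #\c "a--c")      ; => #\c
(in-class-p #\a "a--c")      ; => NIL, "a--" is read as an empty range
```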
Improvements
I'd welcome any suggestions for improvements.
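One possible direction, offered only as a sketch: parse the pattern once into a list of inclusive ranges, then test characters against that list, so a pattern that is used many times is not rescanned on every call. The names parse-class and class-member-p are hypothetical, and the handling of stray dashes is an assumption that differs from in-class-p: here "x-y" is a range only when neither x nor y is a dash, and every other dash is matched literally.

```lisp
;; Hypothetical helpers; not part of the routine in this post.
(defun parse-class (pattern)
  "Return a list of (LOW . HIGH) character pairs described by PATTERN."
  (let ((ranges '())
        (i 0)
        (end (length pattern)))
    (loop while (< i end)
          do (let ((low (char pattern i)))
               (if (and (< (+ i 2) end)
                        (char/= low #\-)
                        (char= (char pattern (+ i 1)) #\-)
                        (char/= (char pattern (+ i 2)) #\-))
                   ;; "x-y": record an inclusive range and skip past it.
                   (progn (push (cons low (char pattern (+ i 2))) ranges)
                          (incf i 3))
                   ;; Anything else is a single literal character.
                   (progn (push (cons low low) ranges)
                          (incf i)))))
    (nreverse ranges)))

(defun class-member-p (character ranges)
  "Return T if CHARACTER falls in any of RANGES, otherwise NIL."
  (if (some (lambda (range)
              (char<= (car range) character (cdr range)))
            ranges)
      t
      nil))

;; Parse once, then reuse the result for every character.
(let ((word (parse-class "a-zA-Z0-9_")))
  (remove-if-not (lambda (c) (class-member-p c word)) "foo-bar 42!"))
;; => "foobar42"
```

Unlike in-class-p, class-member-p always returns T or NIL rather than sometimes returning the matching character, which may be more convenient when the result is stored or compared.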
