Alexander Gromnitsky's Blog

Forgotten finite automata techniques

Latest update:

"I built the earliest demos [of Google Code Search in 2006] using Ken Thompson's Plan 9 grep, because I happened to have it lying around in library form. The plan had been to switch to a "real" regexp library, namely PCRE, probably behind a newly written, code reviewed parser, since PCRE's parser was a well-known source of security bugs.

"The only problem was my then-recent discovery that none of the popular regexp implementations - not Perl, not Python, not PCRE - used real automata. This was a surprise to me, and even to Rob Pike, the author of the Plan 9 regular expression library. (Ken was not yet at Google to be consulted.) I had learned about regular expressions and automata from the Dragon Book, from theory classes in college, and from reading Rob's and Ken's code. The idea that you wouldn't use the guaranteed linear time algorithm had never occurred to me. But it turned out that Rob's code in particular used an algorithm only a few people had ever known, and the others had forgotten about it years earlier."

(From Regular Expression Matching with a Trigram Index by Russ Cox.)


Tags: quote, ойті
Authors: ag