relkom/small-regex

For the Sandbox for FreeBSD project, a regex engine was required to match patterns against an input parameter. The library kokke/tiny-regex-c, found on GitHub, seemed suitable, but testing revealed major problems:

  • BRANCH works incorrectly (in practice, it does not work at all)
  • No grouping support
  • Matching is recursive and consumes the call stack, which is not suitable for the kernel

It was decided to develop a new version on top of kokke's library. It is available on GitHub: small-regex. At the moment, most of kokke's code has been removed. When matching, the library uses a loop instead of recursion, and it allocates the space for the state stack on the heap.
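
The idea of replacing recursion with a loop and a heap-allocated state stack can be illustrated with a minimal userland sketch. This is not the small-regex code: the function name match_prefix_iterative, the state_t type, and the tiny pattern language (literal characters, '.', and a single-character 'x*') are assumptions made only for illustration. The stack is sized once from the pattern and text lengths, in the spirit of a predicted stack size, and a bounds check guards every push.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical saved state: a position in the pattern and in the text. */
typedef struct {
    size_t pi;
    size_t ti;
} state_t;

/*
 * Checks whether pat matches a prefix of text without recursion.
 * Returns 1 on match, 0 on no match, -1 on allocation failure
 * or if the predicted stack size turns out to be too small.
 */
int match_prefix_iterative(const char *pat, const char *text)
{
    size_t plen = strlen(pat);
    size_t tlen = strlen(text);
    size_t cap = plen + tlen + 2;   /* predicted stack size (assumption) */
    size_t top = 0;
    int result = 0;
    state_t *stack = malloc(cap * sizeof(*stack));

    if (stack == NULL)
        return -1;

    stack[top++] = (state_t){ 0, 0 };

    while (top > 0) {
        state_t s = stack[--top];   /* pop the next state to try */

        if (s.pi == plen) {         /* whole pattern consumed */
            result = 1;
            break;
        }

        int hit = s.ti < tlen &&
            (pat[s.pi] == '.' || pat[s.pi] == text[s.ti]);
        int star = pat[s.pi + 1] == '*';

        if (top + 2 > cap) {        /* never write past the allocation */
            result = -1;
            break;
        }

        if (star) {
            /* Alternative 1: skip "x*" entirely. */
            stack[top++] = (state_t){ s.pi + 2, s.ti };
            /* Alternative 2: consume one character and stay on "x*". */
            if (hit)
                stack[top++] = (state_t){ s.pi, s.ti + 1 };
        } else if (hit) {
            stack[top++] = (state_t){ s.pi + 1, s.ti + 1 };
        }
        /* Otherwise this state is a dead end; the loop pops the next one. */
    }

    free(stack);
    return result;
}
```

For example, match_prefix_iterative("ab*c", "abbbc") returns 1 and match_prefix_iterative("ab*c", "abd") returns 0. The call stack depth stays constant regardless of the input; only the heap buffer grows with the pattern and text lengths. Kernel code would use the kernel allocator instead of the userland malloc shown here.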

The code is ready to work in the FreeBSD kernel space.

The structure of the compiled regex was modified:

typedef struct regex_objs_t
{
    uint8_t  type;          // CHAR, STAR, etc.
    uint32_t trueoffset;    //offset when eval is true
    uint32_t falseoffset;   //offset when eval is false
    union
    {
        uint8_t  ch;        //the character itself
        uint32_t ccl;       //an offset to characters in class
    };
} regex_objs_t;

/*
 * struct small_regex
 * An instance of the compiled regex
 */
typedef struct small_regex
{
    uint32_t objoffset;     //indicates the start of the regex objs
    uint32_t totalsize;     //total size of data[]
    uint32_t pstsize;       //predicted stack size
    uint8_t data[];         //data
} small_regex_t;

Here, struct small_regex is a container in which the instances of struct regex_objs_t are stored. An instance of struct small_regex is allocated on the heap.
The trueoffset and falseoffset fields are offsets to the next node, selected according to whether the evaluation of the current node is true or false.
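
How a matcher might follow the true and false offsets can be sketched with a simplified model. Everything below is an assumption for illustration and not the library's actual layout: node_t is a reduced stand-in for regex_objs_t, the offsets are plain array indices rather than offsets into data[], and the node types N_CHAR, N_MATCH and N_FAIL as well as the function walk are hypothetical. The hand-built graph encodes the pattern a(b|c).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node types for this sketch. */
enum { N_CHAR, N_MATCH, N_FAIL };

/* Reduced stand-in for regex_objs_t; offsets are array indices here. */
typedef struct {
    uint8_t  type;
    uint32_t trueoffset;    /* next node when the evaluation is true */
    uint32_t falseoffset;   /* next node when the evaluation is false */
    uint8_t  ch;            /* character to compare for N_CHAR */
} node_t;

/* Hand-built graph for the pattern a(b|c). */
static const node_t abc_graph[] = {
    /* 0 */ { N_CHAR,  1, 4, 'a' },  /* 'a' or fail */
    /* 1 */ { N_CHAR,  3, 2, 'b' },  /* 'b', else try the other branch */
    /* 2 */ { N_CHAR,  3, 4, 'c' },  /* 'c' or fail */
    /* 3 */ { N_MATCH, 0, 0, 0 },
    /* 4 */ { N_FAIL,  0, 0, 0 },
};

/* Walks the graph from node 0; returns 1 if a prefix of text matches. */
int walk(const node_t *nodes, const char *text)
{
    uint32_t cur = 0;
    size_t pos = 0;

    for (;;) {
        const node_t *n = &nodes[cur];

        if (n->type == N_MATCH)
            return 1;
        if (n->type == N_FAIL)
            return 0;

        if (text[pos] != '\0' && (uint8_t)text[pos] == n->ch) {
            pos++;                  /* consume the character */
            cur = n->trueoffset;
        } else {
            cur = n->falseoffset;   /* input position is unchanged */
        }
    }
}
```

With this graph, walk(abc_graph, "ac") first takes the true edge of node 0, then the false edge of node 1, and finally the true edge of node 2 to reach the match node. Storing both successors in each node lets the matcher advance with simple index updates instead of nested function calls.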

API

struct small_regex * regex_compile(const char* pattern);
int32_t regex_matchp(struct small_regex * pattern, const char* text);
int32_t regex_match(const char* pattern, const char* text);
