For the project: "Sandbox utility for FreeBSD" which is developed by the Relkom s.r.o up to this day there was an unfinished task to implement a simple regular expression matching mechanism which should work in the kernel space. It was needed in order to handle the rules with filter-key: regex. Doing some research on this problem, it was found out that the kernel does not have build in regular expression implementation. The only available option was to port the libC regex implementation which would definatly took a lot of time.

Searching the web it was found two suitable external regex libraries:

It was decided to work on kokke's implementation of the regex library because:

  • the code is quite simple which makes it easier to debug and audit it
  • there is no need to introduce any code from libC because the code depends only on stdio which makes realisation easyly portable

The modified version is available at (no longer available)

The tiny-regex-c repository was moved to the company RELKOM S.R.O github account

It is not a kernel module, but the code was ported to work in with FreeBSD kernel code base.

The following structures and code were modified (comparing with the original version):

  • The definition of the struct regex_t was moved to the header file and renamed to the struct re_regex with typedef re_regex_t
  • The characters for the match class is now stored in the instance of the struct re_regex. Basically, this approach consumes more memory, but it is easier to export the data this way.
  • The struct re_regex returned by the regex compiler is stored in heap labeled MC_RE. To deallocate the compiled regex, the free(, MC_RE) or KFREE() should be invoked.
  • Added assertion of the input arguments to avoid NULL pointer arguments
  • The preprocessor definitions adds or removes features or controls the runtime behaviour

The re_regex has some problems with matching the complex match classes, the problems are described on the github page, but later it will be fixed.

The API of the tiny-regex:

// typedef for the regex structure
typedef struct re_regex re_regex_t;

// Compile regex string pattern to a regex_t-array.
// The returned result is an array located in HEAP! use free();
// The size of the returned array is o_reg_cnt
// But, the array can be readed until the re_compiled[j].type == UNUSED
// The o_reg_cnt is returned for the further export
struct re_regex * re_compile(const char* pattern, uint32_t * o_reg_cnt);

//Find matches of the compiled pattern inside text.
//return 0 on success -1 on error or larger than 0 is ???
int32_t re_matchp(struct re_regex * pattern, const char* text);

// Find matches of the txt pattern inside text (will compile automatically first).
int32_t re_match(const char* pattern, const char* text);

//debug functions
void re_trace(re_regex_t * pattern, uint32_t count);
void re_print(re_regex_t * pattern, uint32_t count);

//---------TUNABLES-----
//for small stack machines use loop, for large the recursion can be used
#define RE_MATCH_RECURSIVE              0   //use recursive or loop compare
#define RE_REGEX_INSTANCE_REALLOCATE    1   //1 - reallocate on overuse, 0 - exit
#define RE_TRUNC_ARRAY_IF_POSSIBLE      1   //1 - truncate array to reduce size, 0 - do not
#define RE_CHAR_CLASS_LENGTH            20  //per struct regex_t instance
#define MIN_REGEXP_OBJECTS              30  //min number of regex symbols in expression.
#define RE_BUILDWITH_DEBUG              1   //compile with print and trace functions

//The RE_MATCH_RECURSIVE when set to 1 uses recursion instead of the cycle to match
//  make sure that the depth of the recursion calls is not controlled
//  on low stack (in kernel) use loop
//The RE_REGEX_INSTANCE_REALLOCATE when set to 1 allocates more memory when run out of it
//The RE_TRUNC_ARRAY_IF_POSSIBLE when set to 1 truncates array if unused nodes were left
//The RE_CHAR_CLASS_LENGTH defines the size of the storage for class data
//The MIN_REGEXP_OBJECTS defines the inital amount of the allocated regex nodes

The reason why the function returnes struct re_regex instead of its typedef re_regex_t is mostly personal preference, because I hate playing the game called guess what sort of typedefined variable type is it.

The example:

#include <stdio.h>
#include "re.h"

int main(void)
{
    //nodes count storage
    uint32_t o_reg_cnt = 0;

    //compile regex
    struct re_regex * re = re_compile("^/Users/[a-zA-Z0-9_]+/Library/Preferences$", &o_reg_cnt);

    //printing out to the stdout
    re_print(re, o_reg_cnt);

    //mathing, if returned 0 then matched, -1 not matched
    if (re_matchp(re, "/Users/alex/Library/Preferences") == 0)
    {
        //matched
    }
    else
    {
        //unmatched
    }

    free((void*) re);
}

The additonal files:

  • assert.h a variable assertion realization. It also prints message to the stdout. see example below
    //code form assert.h
    #define ASSERT_NULL_R(_VAR_, _RETURN_)\
    if (_VAR_ == NULL)\
    {\
    printf("ASSERT_NULL:[%s:%u] var: %s\r\n", __FUNCTION__, __LINE__, #_VAR_);\
    return _RETURN_;\
    }
    //return 1 if var1 is NULL (0)
    ASSERT_NULL_R(var1, 1)
  • e_enum.h a helper which helps to avoid extra work when working with enums. see example below
    
    #define FOREACH_FOO(DV)\
        DV(FOO_BAR1)\
        DV(FOO_BAR2)

enum FOO { FOREACH_FOO(GENERATE_ENUM) }

//----another example

define FOREACH_FOO(DV)\

    DV(FOO_BAR1, 1)\
    DV(FOO_BAR2, 3)
enum FOO
{
    FOREACH_FOO(GENERATE_ENUM_VAL)
}

const char* foos[] = 
{ 
    FOREACH_FOO(GENERATE_VAR_NAMES_VAL)
};

The test files located at directories ./tests and ./scripts were left. The testcases are also outputs execution time.

Next Post Previous Post