FreeBSD Capsicum article

I will start with a digression from the topic of the article.

Modern OSes provide different mechanisms to isolate userland applications from each other. This is important because the CPU provides only limited protection, which mostly guards against basic improper access. The recent vulnerabilities in CPUs from Intel and other vendors, across several architectures, clearly showed that security was not, and will never be, the first priority. Performance comes first, yet it keeps dropping because we have to add more and more security layers and countermeasures, i.e. isolation and other techniques, to make sure that everything is limited to its role on the system.

The following isolation mechanisms are available on these OSes:

  • Windows [Windows ACLs and SIDs]
  • Linux [chroot, SELinux, seccomp, AppArmor]
  • FreeBSD [jail, chroot, Capsicum, sandbox (developed by me)]
  • OpenBSD [chroot, pledge]
  • Mac OS X [seatbelts-sandbox, chroot]

Linux has more programs supporting access-control security policies and other mechanisms. OpenBSD is secure by design, i.e. through code auditing and removal of unnecessary code; since the OpenBSD 5.9 release it also has pledge(2), a mechanism for putting a process into a “restricted-service operating mode”. Windows itself is not really secure, no comment. Mac OS X has the seatbelt sandbox, which is path-based and is used to isolate applications on iOS. FreeBSD has jails, but this is actually OS-level virtualization: it does not necessarily isolate running programs from each other or apply any policies. The sandbox I am developing is based on the MAC Framework and is similar to the Mac OS X seatbelts, but my version is less elegant.

Finally, there is Capsicum, which FreeBSD has supported since version 10. Plenty of articles already describe how Capsicum works, so there is no need to describe it once again. This article demonstrates the practical usage of the Capsicum sandbox.

The main drawback of Capsicum, in my opinion, is that support for it has to be built into your program. Also, at the moment capability mode has one limitation regarding fexecve(): when a program that is already in capability mode calls fexecve() on a dynamically linked executable, the following message is displayed (path-based calls such as execv() fail even earlier, because they require a filesystem lookup):

ELF interpreter /libexec/ld-elf.so.1 not found, error 94

This happens because capability mode blocks normal access to the filesystem, so the kernel cannot look up the ELF interpreter (error 94 is ECAPMODE). The FreeBSD kernel sources say:

"While capability mode can't reach this point via direct path arguments to execve(), we also don't allow interpreters to be used in capability mode (for now). Catch indirect lookups and return a permissions error."

For now, a sandboxed program cannot execute other programs, so any such execution has to be done before entering capability mode.
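Since execution has to happen before cap_enter(), one workable arrangement is to start the helper program first and sandbox the parent afterwards. The sketch below is an assumption, not part of Capsicum: spawn_before_sandbox() is a hypothetical helper built only on fork(2) and execv(3).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start another program while the process is still
 * outside capability mode. Returns the child's pid, or -1 if fork() failed.
 * The caller may call cap_enter() as soon as this returns.
 */
pid_t
spawn_before_sandbox(const char *path, char *const argv[])
{
    pid_t pid = fork();

    if (pid < 0)
        return (-1);
    if (pid == 0) {
        /* Child: path lookups (including ld-elf.so.1) still work here. */
        execv(path, argv);
        _exit(127);             /* reached only if execv() failed */
    }
    return (pid);               /* parent */
}
```

A caller would run spawn_before_sandbox(), then cap_enter(), and later collect the child with waitpid().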

Practice?

To sandbox your program properly, you need to know which resources it has to access. For a large program, figuring this out can take some time. The available rights are defined in the sys/sys/capsicum.h header file, under the comment "Possible rights on capabilities".

First, a cap_rights_t structure, which holds the set of rights, has to be initialized. It is defined as follows:

struct cap_rights 
{
   uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];

};

typedef struct cap_rights cap_rights_t;

The structure is initialized with cap_rights_init(). The required rights are passed after the first argument, as a variable-length list. For instance:

//limit the rights of standard input (fd 0); this step is optional
cap_rights_t self_rights;

cap_rights_init(&self_rights, CAP_FCNTL, CAP_FSTAT, CAP_IOCTL, CAP_READ);

// restrict fd 0 to the rights above
if (cap_rights_limit(0, &self_rights) < 0)
{
    err(1, "cap_rights_limit() failed, could not restrict capabilities");
}

The same call restricts any other descriptor. For example, cmp(1) limits the two files it compares:

//usr.bin/cmp/cmp.c
cap_rights_init(&rights, CAP_FCNTL, CAP_FSTAT, CAP_MMAP_R);

// ENOSYS: the kernel has no Capsicum support, so carry on without limits
if (cap_rights_limit(fd1, &rights) < 0 && errno != ENOSYS)
    err(ERR_EXIT, "unable to limit rights for %s", file1);

if (cap_rights_limit(fd2, &rights) < 0 && errno != ENOSYS)
    err(ERR_EXIT, "unable to limit rights for %s", file2);

The permitted ioctl(2) commands and fcntl(2) operations are limited separately, with cap_ioctls_limit() and cap_fcntls_limit():

// allow selected ioctls
unsigned long cmds[] = { TIOCGETA, TIOCGWINSZ };
if (cap_ioctls_limit(0, cmds, nitems(cmds)) < 0)
{
    err(1, "cap_ioctls_limit() failed, could not restrict capabilities");
}

// allow selected fcntls
if (cap_fcntls_limit(0, CAP_FCNTL_GETFL) < 0)
{
   err(1, "cap_fcntls_limit() failed, could not restrict capabilities");
}

Always check the return value of the *_limit() functions to make sure the restriction actually took effect, just as you would for setgroups(2) or setgid(2).

And finally, the mysterious function that triggers the magic of isolation! Like any "magic", it works only as precisely as you cast it, lol.

printf("entering cap mode\n");

//entering cap mode
if (cap_enter() < 0)
{
    //ENOSYS means the kernel was built without CAPABILITY_MODE and the
    //program continues unsandboxed; any other error is fatal
    if (errno != ENOSYS)
    {
        err(1, "failed to enter security sandbox");
    }
}
else
{
    //double-check that the process really is in capability mode
    if (!cap_sandboxed())
    {
        errx(1, "we are not in sandbox");
    }

}

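After cap_enter() returns successfully, the program is sandboxed: it can no longer reach global namespaces such as the filesystem and has to work through the descriptors it already holds. Capability mode cannot be left, and it is inherited by child processes. cap_sandboxed() confirms that the program really is sandboxed.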

Example

In this example, we sandbox a program that works with sockets and the network.

#include <sys/types.h>
#include <sys/capsicum.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <err.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int bindserver(const char * ip, uint16_t portno, const char * testfailmsg, const char * testsuccessmsg, cap_rights_t * sock_right);

/*
 * Create a TCP socket, optionally limit its capability rights, and bind it
 * to ip:portno. Returns the socket on success, -1 if bind() failed.
 */
int bindserver(const char * ip, uint16_t portno, const char * testfailmsg, const char * testsuccessmsg, cap_rights_t * sock_right)
{
    struct sockaddr_in serv_addr;

    int sockfd = socket(AF_INET, SOCK_STREAM, 0);

    if (sockfd < 0)
    {
        err(1, "ERROR opening socket socket(AF_INET, SOCK_STREAM, 0)");
    }

    if (sock_right != NULL)
    {
        // restrict this descriptor to the prepared rights
        if (cap_rights_limit(sockfd, sock_right) < 0)
        {
            err(1, "cap_rights_limit() failed, could not restrict capabilities");
        }
    }

    memset(&serv_addr, 0, sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = inet_addr(ip);
    serv_addr.sin_port = htons(portno);

    // Now bind the host address using bind() call.
    if (bind(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
    {
        printf("ERROR on binding %s:%u %s\r\n", ip, (unsigned)portno, testfailmsg);
        close(sockfd);
        return -1;   // do not hand a closed descriptor back to the caller
    }

    printf("OK binding %s:%u %s\r\n", ip, (unsigned)portno, testsuccessmsg);
    return sockfd;
}

int main(void)
{
    int newsockfd;
    socklen_t clilen;
    char buffer[256];
    struct sockaddr_in cli_addr;
    ssize_t n;

    //capsicum rights for the listening sockets
    cap_rights_t socket_rights;

    //rights needed by a server socket
    cap_rights_init(&socket_rights, CAP_SOCK_SERVER);

    int sockfd1 = bindserver("0.0.0.0", 80, "TEST FAILED", "TEST SUCCESS", &socket_rights);
    int sockfd2 = bindserver("127.0.0.1", 443, "TEST FAILED", "TEST SUCCESS", &socket_rights);
    int sockfd3 = bindserver("0.0.0.0", 8080, "TEST FAILED", "TEST SUCCESS", &socket_rights);

    //we will listen on 0.0.0.0:8080
    if (sockfd3 < 0)
    {
       return 1;
    }

    printf("enering cap mode\n");
    //entering cap mode
    if (cap_enter() < 0)
    {
        //ENOSYS: kernel without CAPABILITY_MODE, continue unsandboxed;
        //any other error is fatal
        if (errno != ENOSYS)
        {
            err(1, "failed to enter security sandbox");
        }
    }
    else if (!cap_sandboxed())
    {
        errx(1, "we are not in sandbox");
    }

    //try to bind tcp:0.0.0.0:8081 after entering capability mode (expected to fail)
    int sockfd4 = bindserver("0.0.0.0", 8081, "TEST FAILED", "TEST SUCCESS", NULL);
    if (sockfd4 >= 0)
    {
        close(sockfd4);
    }

    if (listen(sockfd3, 5) < 0)
    {
        err(1, "ERROR on listen");
    }

    while (1)
    {
        printf("Waiting for conncection\r\n");
        /* Accept actual connection from the client */
        clilen = sizeof(cli_addr);
        newsockfd = accept(sockfd3, (struct sockaddr *)&cli_addr, &clilen);

        if (newsockfd < 0)
        {
            err(1, "ERROR on accept");
        }

        printf("Connected...\r\n");

        /* Read one message and send a fixed reply */
        memset(buffer, 0, sizeof(buffer));
        n = read(newsockfd, buffer, sizeof(buffer) - 1);

        if (n < 0)
        {
            err(1, "ERROR reading from socket");
        }

        printf("Here is the message: %s\n", buffer);

        // Write a response to the client
        if (write(newsockfd, "I got your message", 18) < 0)
        {
            perror("ERROR writing to socket");
        }

        close(newsockfd);
    }

    /* not reached */
    close(sockfd1);
    close(sockfd2);
    close(sockfd3);

    return 0;
}

What is going on in the listing above? socket_rights is initialized with CAP_SOCK_SERVER, which is a macro:

//expanding the CAP_SOCK_SERVER

#define CAP_SOCK_SERVER \
(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
CAP_SETSOCKOPT | CAP_SHUTDOWN)

Inside bindserver(), each socket descriptor is restricted to the prepared rights with this line of code:

cap_rights_limit(sockfd, sock_right)

The "TEST FAILED" and "TEST SUCCESS" arguments are just labels: if bind() fails, "TEST FAILED" is printed; on success, "TEST SUCCESS" is printed.

After entering capability mode, we try once more to bind a new socket (sockfd4, port 8081). This attempt fails because capability mode does not allow bind(2); the call returns ECAPMODE.

Results:

root@:~/capsicum_tests # ./main
OK binding 0.0.0.0:80 TEST SUCCESS
OK binding 127.0.0.1:443 TEST SUCCESS
OK binding 0.0.0.0:8080 TEST SUCCESS
enering cap mode
ERROR on binding 0.0.0.0:8081 TEST FAILED
Waiting for conncection
Connected...
Here is the message: hello 123

Waiting for conncection

I think that, for now, this is all I wanted to say about Capsicum.

Comparing with MAC sandbox

Unfortunately, the sandbox I am currently developing is still unstable, for the following reasons:

  • vnode-to-path conversion failures caused by deadlocks, namecache mismatches, etc.
  • it was decided to introduce some improvements to the schema

Anyway, this is how the sandbox scheme for the same program would look with the MAC sandbox:

(var version 1)

(var title "bindtest")

(deny default)

(limit limit-global 0)
(limit limit-fork 0)

(allow file-lookup)

(allow file-link
     (literal "/lib/libc.so.7") 
     (literal "/etc/libmap.conf") 
)

(allow file-read-data
     (literal "/var/run/ld-elf.so.hints") 
)

(allow file-read-metadata
     (literal "/dev/pts/1") 
     (literal "/lib/libc.so.7") 
     (literal "/var/run/ld-elf.so.hints") 
     (literal "/usr/local/etc")
     (literal "/usr/local") 
     (literal "/usr") 
     (literal "/etc/libmap.conf") 
     (literal "/etc")
)

(allow sysctl-read
     (sysctl "vm.overcommit")
)

(allow system-socket
     (dtp pf-inet sock-stream ipproto-hopopts)
)

(allow network-inbound
     (local tcp "192.168.122.102:8080")
;     (remote tcp "192.168.122.1:*")
)

(allow network-outbound
;     (remote tcp "192.168.122.1:*")
)

(allow network-bind
     (local tcp "0.0.0.0:8080")
)

(allow network-accept
)

(allow system-priv
     (privs-id priv_netinet_reuseport "root" "wheel" ); priv:504 uid:0 gid:0
     (privs-id priv_netinet_reservedport "root" "wheel" ); priv:490 uid:0 gid:0
     (privs-id priv_vfs_generation "root" "wheel" ); priv:326 uid:0 gid:0
)

; all|deny

As in the Capsicum version, this profile allows accepting connections and sending/receiving network data. It also allows creating a socket of one specific type, grants read access to a single sysctl OID out of the many available, and limits the privileges the application may use.

Thank you for reading!
