index

Article posted on: 2023-01-06 19:09
Last edited on: 2023-01-06 19:09
Written by: Sylvain Gauthier

My approach to rigorous resource management in C

No one asked but here is the way I write C code in a way that I think is pretty rigorous and thorough, and leads to good and clean software that will always have predictable outcomes even when the code gets large or when the weirdest errors happen.

Preliminary considerations

The structure of a function

In my experience, functions always boil down to three main steps:

The second step, the resource usage, is always dependent on the fact that the resources were correctly allocated, and should be skipped if that was not the case. So we can refine this simple method to the following pseudo code:

my_function() {
    allocate_resources();
    if (resources correctly allocated) {
        do_useful_stuff_with_resources();
    }
    free_up_resources();
}

Of course, the user of the function would like to know if the function succeeded. If the resource allocation step failed, it’s certain than the user won’t get the expected result, and should be notified so. There are two common mechanisms to return an error in C:

Now, consider that all functions in the stdlib and pretty much all other common C libraries have such a mechanism. Since allocating resources is generally done via such functions, we can immediately check on resource creation whether it failed or not.

Hence our pseudo code becomes:

/* function returning pointer */
ptr* my_function() {
    ptr* result = NULL;

    if (allocate_resource() fails) {
        handle_error();
    } else {
        do_stuff();
        result = something;
    }
    free_up_resources();
    return result;
}

/* function returning status */
int my_function() {
    int status;

    if (allocate_resource() fails) {
        handle_error();
        status = not_ok;
    } else {
        do_stuff();
        status = ok;
    }
    free_up_resources();
    return result;
}

This way, we leverage the error reporting mechanism of existing functions and we wrap it into a similar one for higher level functions.

You will notice that even when the resource allocation fails, we still go through to the resource clean up.

Proper resource management

Unfortunately, for too many programmers it seems to be obvious that once a malloc fails, it’s acceptable to stop caring about resource management and take the fastest way to crashing the program, in some extreme cases using a violent exit(-1) or similar. In my view, this is very bad practice in general. The caller of the function may actually very much like to handle a malloc failure or any other failure gracefully, it may need to store some critical information, remove some file lock, delete some UNIX domain socket file, etc.

So that means that no failure we detect in any function should ever get a special treatment. There should always be only one error mechanism, one of the two presented above. The function either fails or doesn’t but it should always return a status. And furthermore, whether it fails or not, it always need to clean up its resources.

Everything that is allocated by the function is either freed-up or handed back to the user, with a clear and documented way of freeing it up, whether it fails or not.

Let’s now consider the following C code:

char* get_some_data(const char* filename) {
    FILE* f;
    const char* prefix = "directory/";
    char *path, *result;

    if (!(path = malloc(strlen(prefix) + strlen(filename) + 1))) {
        return NULL;
    }

    strcpy(path, prefix);
    strcpy(path + strlen(prefix), filename);

    if (!(f = fopen(path))) {
        return NULL;
    } else if (!(result = find_data(f))) {
        return NULL;
    }
    free(path);
    fclose(f);
    return result;
}

This may look somewhat correct at first, we do check that resources were properly allocated before attempting to use them, once we are done we do free everything, if the function fails at any point it does return NULL instead of a valid pointer.

However, it is not rigorous resource management as conceptualized above. What if the file at directory/filename does not exist and fopen fails? The function returns without freeing path which was just allocated. The user is not aware this resource exists and thus will never be able to free it.

Similarly, what if the function find_data fails, for instance because the file is improperly formatted? Both path and f will not be freed correctly.

A first naive attempt to solve this would be the following:

char* get_some_data(const char* filename) {
    FILE* f;
    const char* prefix = "directory/";
    char *path, *result;

    if (!(path = malloc(strlen(prefix) + strlen(filename) + 1))) {
        return NULL;
    }

    strcpy(path, prefix);
    strcpy(path + strlen(prefix), filename);

    if (!(f = fopen(path))) {
        free(path);
        return NULL;
    } else if (!(result = find_data(f))) {
        fclose(f);
        free(path);
        return NULL;
    }
    free(path);
    fclose(f);
    return result;
}

Now, for every failed resource allocation, we free all previous resources that were allocated. However, this is not scalable and maintainable as every new resource requires to change a factor of n line of codes where n is the current number of resources allocated.

This illustrates another important point I want to make: try to always only have one return statement in each function. Of course, this is a rule of thumb and there are plenty of legitimate cases for having multiple returns in a function, however one can easily see the benefit of guaranteeing a single point of entry and exit in a function: it makes resource management a lot more, well, manageable.

Now let’s consider this variant:

char* get_some_data(const char* filename) {
    FILE* f;
    const char* prefix = "directory/";
    char *path, *result;

    if (!(path = malloc(strlen(prefix) + strlen(filename) + 1))) {
        result = NULL;
    } else {
        strcpy(path, prefix);
        strcpy(path + strlen(prefix), filename);

        if (!(f = fopen(path))) {
            result = NULL;
        } else {
            result = find_data(f);
        }
    }
    free(path);
    fclose(f);
    return result;
}

Now, because we have removed all other exit points of the function, we now for sure that no matter what happens, we will get to the resource freeing section at the end.

However, there is still a big problem. Can you see it?

If the first malloc fails, f will be let uninitialized and given to fclose. This way of structuring the code requires to initialize all resources to something that can be detected as “unallocated” in the resource freeing section.

But would initializing f to NULL be enough? No! As mentioned in the introduction, fclose doest not accept NULL as an input, so we need to add a check before calling fclose.

Finally, this version is something I could get by:

char* get_some_data(const char* filename) {
    FILE* f = NULL;
    const char* prefix = "directory/";
    char *path = NULL, *result = NULL;

    if (!(path = malloc(strlen(prefix) + strlen(filename) + 1))) {
        result = NULL;
    } else {
        strcpy(path, prefix);
        strcpy(path + strlen(prefix), filename);

        if (!(f = fopen(path))) {
            result = NULL;
        } else {
            result = find_data(f);
        }
    }
    free(path);
    f (f) fclose(f);
    return result;
}

While initializing path is not strictly necessary (if you think about it, in the final section it can only be NULL or a valid pointer), it’s a good habit to initialize it, in case we add more resources before it in the future.

Compacting code even more

One downside of this structure is that every time we need to do some stuff between two resource allocations, typically formatting an allocated file path before opening the corresponding file, we add one scope level. This might not be a problem for some people, but for others like me who like to wrap their code at the ol' reliable 80 chars, horizontal space is a scarce resource. A neat trick (but prone to very confusing code) is to use the comma-separated statements.

As indicated above, a series of statements separated by a comma , will evaluate to the last one. So we can do:

char* get_some_data(const char* filename) {
    FILE* f = NULL;
    const char* prefix = "directory/";
    char *path = NULL, *result = NULL;

    if (!(path = malloc(strlen(prefix) + strlen(filename) + 1))) {
        result = NULL;
    } else if (strcpy(path, prefix),
               strcpy(path + strlen(prefix), filename),
               !(f = fopen(path))) {
        result = NULL;
    } else {
        result = find_data(f);
    }
    free(path);
    f (f) fclose(f);
    return result;
}

Conclusion

I don’t believe there is a perfect, error-proof method to write C code, but I have plenty of experience with this one and sticking to those simple rules seem to have been consistently profitable to the quality and robustness of my code.

I should say that if you intend to get into formal verification using for instance frama-c, this code structure is a good fit, since it’s fairly easy to prove, in the last else scope, that all resources are valid.

If you want more examples using this structure, I do have a lot of code formatted this way in my repositories, for instance the SNDC parser here