Article posted on: 2023-01-06 19:09
Last edited on: 2023-01-06 19:09
Written by: Sylvain Gauthier
No one asked but here is the way I write C code in a way that I think is pretty rigorous and thorough, and leads to good and clean software that will always have predictable outcomes even when the code gets large or when the weirdest errors happen.
malloc
: returns NULL
on error.realloc
: returns NULL
on error.fopen
: returns NULL
on error.read
and write
set necessary if not critical information in errno
which must be checked.NULL
, some don’t:
free
accepts NULL
.fclose
doesn’t.close
doesn’t accept fd = -1
.(a = 3) == 3
is true.a(param) || b(param)
: here b(param)
will never be called if a(param)
is true.a(param) && b(param)
: here b(param)
will never be called if a(param)
is false.,
will evaluate to the last
expression:
(expr1(param1), expr2(param2), expr3(param3))
will execute all three
functions but will evaluate to the result of expr3(param3)
.In my experience, functions always boil down to three main steps:
The second step, the resource usage, is always dependent on the fact that the resources were correctly allocated, and should be skipped if that was not the case. So we can refine this simple method to the following pseudo code:
my_function() {
allocate_resources();
if (resources correctly allocated) {
do_useful_stuff_with_resources();
}
free_up_resources();
}
Of course, the user of the function would like to know if the function succeeded. If the resource allocation step failed, it’s certain than the user won’t get the expected result, and should be notified so. There are two common mechanisms to return an error in C:
NULL
in case
of error.int
indicating the error: commonly -1
is used
on error and 0
on success, but I also like using 1
on success and 0
on
error for the clarity of if (!do_something()) { /* handlle error */ }
.Now, consider that all functions in the stdlib
and pretty much all other
common C libraries have such a mechanism. Since allocating resources is
generally done via such functions, we can immediately check on resource creation
whether it failed or not.
Hence our pseudo code becomes:
/* function returning pointer */
ptr* my_function() {
ptr* result = NULL;
if (allocate_resource() fails) {
handle_error();
} else {
do_stuff();
result = something;
}
free_up_resources();
return result;
}
/* function returning status */
int my_function() {
int status;
if (allocate_resource() fails) {
handle_error();
status = not_ok;
} else {
do_stuff();
status = ok;
}
free_up_resources();
return result;
}
This way, we leverage the error reporting mechanism of existing functions and we wrap it into a similar one for higher level functions.
You will notice that even when the resource allocation fails, we still go through to the resource clean up.
Unfortunately, for too many programmers it seems to be obvious that once a
malloc
fails, it’s acceptable to stop caring about resource management and
take the fastest way to crashing the program, in some extreme cases using a
violent exit(-1)
or similar. In my view, this is very bad practice in general.
The caller of the function may actually very much like to handle a malloc
failure or any other failure gracefully, it may need to store some critical
information, remove some file lock, delete some UNIX domain socket file, etc.
So that means that no failure we detect in any function should ever get a special treatment. There should always be only one error mechanism, one of the two presented above. The function either fails or doesn’t but it should always return a status. And furthermore, whether it fails or not, it always need to clean up its resources.
Everything that is allocated by the function is either freed-up or handed back to the user, with a clear and documented way of freeing it up, whether it fails or not.
Let’s now consider the following C code:
char* get_some_data(const char* filename) {
FILE* f;
const char* prefix = "directory/";
char *path, *result;
if (!(path = malloc(strlen(prefix) + strlen(filename) + 1))) {
return NULL;
}
strcpy(path, prefix);
strcpy(path + strlen(prefix), filename);
if (!(f = fopen(path))) {
return NULL;
} else if (!(result = find_data(f))) {
return NULL;
}
free(path);
fclose(f);
return result;
}
This may look somewhat correct at first, we do check that resources were
properly allocated before attempting to use them, once we are done we do free
everything, if the function fails at any point it does return NULL
instead of
a valid pointer.
However, it is not rigorous resource management as conceptualized above. What
if the file at directory/filename
does not exist and fopen
fails? The
function returns without freeing path
which was just allocated. The user is
not aware this resource exists and thus will never be able to free it.
Similarly, what if the function find_data
fails, for instance because the file
is improperly formatted? Both path
and f
will not be freed correctly.
A first naive attempt to solve this would be the following:
char* get_some_data(const char* filename) {
FILE* f;
const char* prefix = "directory/";
char *path, *result;
if (!(path = malloc(strlen(prefix) + strlen(filename) + 1))) {
return NULL;
}
strcpy(path, prefix);
strcpy(path + strlen(prefix), filename);
if (!(f = fopen(path))) {
free(path);
return NULL;
} else if (!(result = find_data(f))) {
fclose(f);
free(path);
return NULL;
}
free(path);
fclose(f);
return result;
}
Now, for every failed resource allocation, we free all previous resources that
were allocated. However, this is not scalable and maintainable as every new
resource requires to change a factor of n
line of codes where n
is the
current number of resources allocated.
This illustrates another important point I want to make: try to always only have
one return
statement in each function. Of course, this is a rule of thumb and
there are plenty of legitimate cases for having multiple returns in a function,
however one can easily see the benefit of guaranteeing a single point of entry
and exit in a function: it makes resource management a lot more, well,
manageable.
Now let’s consider this variant:
char* get_some_data(const char* filename) {
FILE* f;
const char* prefix = "directory/";
char *path, *result;
if (!(path = malloc(strlen(prefix) + strlen(filename) + 1))) {
result = NULL;
} else {
strcpy(path, prefix);
strcpy(path + strlen(prefix), filename);
if (!(f = fopen(path))) {
result = NULL;
} else {
result = find_data(f);
}
}
free(path);
fclose(f);
return result;
}
Now, because we have removed all other exit points of the function, we now for sure that no matter what happens, we will get to the resource freeing section at the end.
However, there is still a big problem. Can you see it?
If the first malloc
fails, f
will be let uninitialized and given to
fclose
. This way of structuring the code requires to initialize all
resources to something that can be detected as “unallocated” in the resource
freeing section.
But would initializing f
to NULL
be enough? No! As mentioned in the
introduction, fclose
doest not accept NULL
as an input, so we need to
add a check before calling fclose
.
Finally, this version is something I could get by:
char* get_some_data(const char* filename) {
FILE* f = NULL;
const char* prefix = "directory/";
char *path = NULL, *result = NULL;
if (!(path = malloc(strlen(prefix) + strlen(filename) + 1))) {
result = NULL;
} else {
strcpy(path, prefix);
strcpy(path + strlen(prefix), filename);
if (!(f = fopen(path))) {
result = NULL;
} else {
result = find_data(f);
}
}
free(path);
f (f) fclose(f);
return result;
}
While initializing path
is not strictly necessary (if you think about it, in
the final section it can only be NULL
or a valid pointer), it’s a good habit
to initialize it, in case we add more resources before it in the future.
One downside of this structure is that every time we need to do some stuff between two resource allocations, typically formatting an allocated file path before opening the corresponding file, we add one scope level. This might not be a problem for some people, but for others like me who like to wrap their code at the ol' reliable 80 chars, horizontal space is a scarce resource. A neat trick (but prone to very confusing code) is to use the comma-separated statements.
As indicated above, a series of statements separated by a comma ,
will
evaluate to the last one. So we can do:
char* get_some_data(const char* filename) {
FILE* f = NULL;
const char* prefix = "directory/";
char *path = NULL, *result = NULL;
if (!(path = malloc(strlen(prefix) + strlen(filename) + 1))) {
result = NULL;
} else if (strcpy(path, prefix),
strcpy(path + strlen(prefix), filename),
!(f = fopen(path))) {
result = NULL;
} else {
result = find_data(f);
}
free(path);
f (f) fclose(f);
return result;
}
I don’t believe there is a perfect, error-proof method to write C code, but I have plenty of experience with this one and sticking to those simple rules seem to have been consistently profitable to the quality and robustness of my code.
I should say that if you intend to get into formal verification using for
instance frama-c, this code structure is a good fit,
since it’s fairly easy to prove, in the last else
scope, that all resources
are valid.
If you want more examples using this structure, I do have a lot of code formatted this way in my repositories, for instance the SNDC parser here