Solarian Programmer

My programming ramblings

C99/C11 dynamic array that mimics C++'s std::vector - API improvements

Posted on January 8, 2017 by Sol

In my last article I wrote a portable C dynamic array, auto-resizable and theoretically capable to store any C representable data type. While the original implementation works, it is a bit cumbersome to use in practice. A better approach is to create a dynamic array that looks like a normal C array, but it is capable to automatically adjust its size as needed. I took from Sean Barrett’s stb libraries the idea that you can store the size information of an array at the beginning of the array, while publicly exposing a pointer to where the array data resides in memory. The advantage of this approach is that from a user point of view a dynamic array can be used like a C array, except for the creation and destruction phases.

It goes without saying that this article is written for C programmers or people that use a C compiler. If you are using C++, you usually don’t need to implement a vector type, you already have one in the STL, use it.

It will be easier to understand how the general case works if we start with an explicit implementation for a dynamic array of integers. Conceptually, our code has three phases

 1 #include <stdio.h>
 2 #include <stdlib.h>
 3 
 4 int main() {
 5     // 1. Create and initialize the array
 6     // ...
 7 
 8     // 2. Fill the array with data and do something with this data
 9     // ...
10 
11     // 3. Destroy the array, freeing the memory
12     // ...
13 
14     return 0;
15 }

In phase 1, we will start by initializing the array:

1 // 1. Create and initialize the array
2 int *arr = NULL;
3 do {
4     size_t *raw = malloc(2 * sizeof(size_t));
5     raw[0] = 0;        // size - no. of elements currently in the array
6     raw[1] = 0;        // capacity - how many elements can the array currently store
7     arr = (void *) &raw[2];
8 } while(0);

The do while loop that runs only once, is currently used to hide the raw pointer initialization. The user can access directly only the arr pointer that, after the initialization phase, points to the first address, after the size and capacity information. Later, we will move the do while loop in a macro that will abstract the initialization phase.

Next, let’s put some data in our array:

 1 // 1. Create and initialize the array
 2 int *arr = NULL;
 3 do {
 4     size_t *raw = malloc(2 * sizeof(size_t));
 5     raw[0] = 0;        // size - no. of elements currently in the array
 6     raw[1] = 0;        // capacity - how many elements can the array currently store
 7     arr = (void *) &raw[2];
 8 } while(0);
 9 
10 int data = 10;
11 
12 // 2. Fill the array with data
13 do {
14     size_t *raw = ((size_t *) (arr) - 2);
15     raw[0] = raw[0] + 1;
16     // Increase the array capacity if necessary
17     if(raw[1] == 0) {
18         raw[1] = 1;
19         raw = realloc(raw, 2 * sizeof(size_t) + raw[1] * sizeof((data)));
20         (arr) = (void *) &raw[2];
21     }
22     if(raw[0] > raw[1]) {
23         raw[1] = 2 * raw[1];
24         raw = realloc(raw, 2 * sizeof(size_t) + raw[1] * sizeof((data)));
25         (arr) = (void *) &raw[2];
26     }
27     // Put the data in place
28     arr[raw[0] - 1] = (data);
29 } while(0)

Please note the pattern that we use to access the raw pointer that points to the beginning of our array. After that, we resize the array if necessary and copy the data to the array buffer. At this point, the user can use a simple for loop to iterate over the content of the array. In order to do this we could implement a macro functions that will get the size of the array:

1 #define ARRAY_SIZE(ARR) (*((size_t *) ARR - 2))

It is interesting to observe how the array is resized: after 1 insertion the array will have a capacity of 1, after 2 insertions the capacity will be 2, after 3 insertions the capacity will be 4 and so on. The array is not resized after every insertion but only when it is necessary.

And here is an example of how you can loop over the array and modify the data:

1 // Loop over the array and double every element:
2 for(size_t i = 0; i < ARRAY_SIZE(arr)); ++i) {
3     arr[i] *= 2;
4 }

In order to destroy the array and free the memory we could use something like this:

1 // Free array
2 do {
3     size_t *raw = ((size_t *) (arr) - 2);
4     free(raw);
5     arr = NULL;
6 } while(0);

Now, we can take most of the above code and create a few macros that could create, push any type of data and destroy the array. I will create a header file, array.h, that will contain the generalized helper macros:

 1 // array.h - Helper macros to create and manipulate a dynamic C array
 2 
 3 #pragma once
 4 #include <stdlib.h>
 5 
 6 #define ARRAY_CREATE(T, arr) \
 7     T *arr = NULL;\
 8     do {\
 9         size_t *raw = malloc(2 * sizeof(size_t));\
10         raw[0] = 0;\
11         raw[1] = 0;\
12         arr = (void *) &raw[2];\
13     } while(0)
14 
15 #define ARRAY_DESTROY(arr)    \
16     do {\
17         size_t *raw = ((size_t *) (arr) - 2);\
18         free(raw);\
19         arr = NULL;\
20     } while(0)
21 
22 #define ARRAY_SIZE(ARR) (*((size_t *) ARR - 2))
23 #define ARRAY_CAPACITY(ARR) (*((size_t *) ARR - 1))
24 
25 #define ARRAY_PUSH(arr, value)\
26     do {\
27         size_t *raw = ((size_t *) (arr) - 2);\
28         raw[0] = raw[0] + 1;\
29         if(raw[1] == 0) {\
30             raw[1] = 1;\
31             raw = realloc(raw, 2 * sizeof(size_t) + raw[1] * sizeof((value)));\
32             (arr) = (void *) &raw[2];\
33         }\
34         if(raw[0] > raw[1]) {\
35             raw[1] = 2 * raw[1];\
36             raw = realloc(raw, 2 * sizeof(size_t) + raw[1] * sizeof((value)));\
37             (arr) = (void *) &raw[2];\
38         }\
39         arr[raw[0] - 1] = (value);\
40     } while(0)

I will leave as an exercise for the reader the implementation of an ARRAY_RESIZE and ARRAY_SHRINK_TO_FIT macros. Hint, in order to implement ARRAY_SHRINK_TO_FIT use realloc to resize the memory buffer in order to have size/raw[0] equal to capacity/raw[1].

Next, let’s try a more challenging example that will use one array of integers and one array of a user defined data type. We’ll initialize the arrays, add 50 elements to each array and print the content of the arrays. At the end we will release the memory used by our two dynamic arrays:

 1 #include <stdio.h>
 2 #include "array.h"
 3 
 4 typedef struct {
 5     int x;
 6     int y;
 7 } Vector2i;
 8 
 9 int main() {
10     ARRAY_CREATE(int, arr);
11     ARRAY_CREATE(Vector2i, arr2);
12 
13     // Check resize and push data
14     for(int i = 0; i < 50; ++i) {
15         ARRAY_PUSH(arr, i*i - i);
16         ARRAY_PUSH(arr2, ((Vector2i){i*i, -i}));
17     }
18 
19     for(size_t i = 0; i < ARRAY_SIZE(arr); ++i) {
20         printf("%d ", arr[i]);
21     }
22     printf("\n\n");
23 
24     for(size_t i = 0; i < ARRAY_SIZE(arr2); ++i) {
25         printf("(%d, %d)", arr2[i].x, arr2[i].y);
26     }
27     printf("\n\n");
28 
29     // Free array
30     ARRAY_DESTROY(arr);
31     ARRAY_DESTROY(arr2);
32     return 0;
33 }

This is the result of running the above code on my machine:

1 0 0 2 6 12 20 30 42 56 72 90 110 132 156 182 210 240 272 306 342 380 420 462 506 552 600 650 702 756 812 
2 870 930 992 1056 1122 1190 1260 1332 1406 1482 1560 1640 1722 1806 1892 1980 2070 2162 2256 2352
3 
4 (0, 0)(1, -1)(4, -2)(9, -3)(16, -4)(25, -5)(36, -6)(49, -7)(64, -8)(81, -9)(100, -10)(121, -11)(144, -12)
5 (169, -13)(196, -14)(225, -15)(256, -16)(289, -17)(324, -18)(361, -19)(400, -20)(441, -21)(484, -22)(529, -23)
6 (576, -24)(625, -25)(676, -26)(729, -27)(784, -28)(841, -29)(900, -30)(961, -31)(1024, -32)(1089, -33)(1156, -34)
7 (1225, -35)(1296, -36)(1369, -37)(1444, -38)(1521, -39)(1600, -40)(1681, -41)(1764, -42)(1849, -43)(1936, -44)
8 (2025, -45)(2116, -46)(2209, -47)(2304, -48)(2401, -49)

Please note that all the above code was tested with Clang 3.8, GCC 6.3 and Visual Studio 2017. Normally, it should work with any recent C99/C11 compiler, with the notable exception of Visual Studio, for which parts of the C99 standard were added starting with the 2013 version. If you are using Visual Studio 2017 and up, you should be able to use the full C11 Clang toolchain from Visual Studio instead of the CL compiler.

If you want to learn more about C99/C11 I would recommend reading 21st Century C: C Tips from the New School by Ben Klemens:

or the classic C Bible, The C Programming Language by B.W. Kernighan, D.M. Ritchie:


comments powered by Disqus