C struct names with no verbosity

In the C language, a compound, heterogeneous type is defined according to the pattern

        struct S { ... };

where the user-selected identifier S is the name of the new type. However, to designate the type for whatever purpose, e.g. in declaring variables, struct/union fields, function parameters and results, etc., S alone does not work; the phrase struct S has to be used instead. For many C programmers, having to type two words to designate a single notion is a redundancy which they would rather avoid.

Notably, the problem does not occur in C++, where either struct S or S can be used to refer to the same type. Ideally, we would like to achieve the same, or something as close to it as possible, in C, with a minimum of coding and language distortion.

In the following note, after briefly discussing some ways of avoiding struct-related verbosity, I propose a simple, concise, and universally applicable solution that, surprisingly, seems to be missing from the C literature.



The way to enforce a single name as an equivalent of struct S is, of course, to introduce that name as an alias with a typedef. For example, along with struct S { ... }, we may define

        typedef struct S T;

In fact, there is no need for the new name T. The names given to structs (their tags), such as S here, belong to a different name space than ordinary identifiers, which include the names introduced with a typedef, such as T. Therefore no name clash occurs if we say

        typedef struct S S;

Now S designates the same type as struct S, and we avoid inventing a new name for the struct.
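
As a small illustration of the two names coexisting, here is a minimal sketch. The type Point, its fields, and the functions point_sum and demo_same_type are hypothetical names chosen only for this example.

```c
/* Hypothetical example type; the name Point and its fields are illustrative. */
struct Point { int x; int y; };
typedef struct Point Point;   /* the tag Point and the typedef name Point coexist */

/* A parameter declared with the long form... */
int point_sum(const struct Point *p)
{
    return p->x + p->y;
}

/* ...accepts an object declared with the short form, since both
   spellings designate one and the same type. */
int demo_same_type(void)
{
    Point a = { 3, 4 };        /* declared via the typedef name */
    struct Point *q = &a;      /* no cast needed */
    return point_sum(q);
}
```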

Even better, the typedef can be merged with the struct definition:

        typedef struct { ... } S;

so that the struct itself remains anonymous (a nameless type expression), and only through the typedef does that type receive a name.


•  •

So far, so good, but what if the struct has to contain, among other things, a pointer to the same struct type? Self-referencing is typical of structs representing nodes of linked data structures, such as lists and trees. Unfortunately,

        typedef struct { ... S* ptr; ... } S;    // incorrect

is incorrect because, within the struct body, S is unknown: it becomes known only once the typedef declaration is complete and the respective name binding takes effect. The definition can be mended at the cost of reverting to a named struct and using struct S rather than just S in the struct body. In essence, we are embedding a complete named struct definition within the typedef:

        typedef struct S { ... struct S* ptr; ... } S;
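
To make the pattern concrete, here is a minimal sketch using a binary tree node. The names Tree, value, left, right, and tree_count are hypothetical, picked only for this example.

```c
#include <stddef.h>

/* Hypothetical binary-tree node; all names here are illustrative. */
typedef struct Tree {
    int value;
    struct Tree *left;    /* the typedef name Tree is not yet visible here, */
    struct Tree *right;   /* so the tag form struct Tree must be used */
} Tree;

/* After the typedef, the short name can be used everywhere. */
int tree_count(const Tree *t)
{
    return t == NULL ? 0 : 1 + tree_count(t->left) + tree_count(t->right);
}
```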

Another possibility is to separate the typedef from the struct definition again, but this time placing the former before the latter. This is a bit unintuitive since, at the point of the typedef, struct S has not yet been defined. But it does work: the typedef merely declares struct S as an incomplete type, and pointers to an incomplete type are permitted. The advantage is that the alias S introduced by the typedef is already known when the struct definition is processed, so it can be used there:

        typedef struct S S;
        struct S { ... S* ptr; ... };
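
The two-line form can be sketched on a singly linked list as follows. The names Node, value, next, and list_sum are assumptions made for the sake of the example.

```c
#include <stddef.h>

typedef struct Node Node;   /* declares the (still incomplete) type struct Node */

struct Node {
    int value;
    Node *next;             /* the alias is already usable for pointers */
};

/* Sum the values of a NULL-terminated list. */
int list_sum(const Node *head)
{
    int total = 0;
    for (const Node *n = head; n != NULL; n = n->next)
        total += n->value;
    return total;
}
```

The same ordering also helps when two struct types point at each other: with both typedefs placed first, either body can use the other's short name.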

Now that we use S as a type name not only in the remainder of the program but also within the struct definition, only a relatively small amount of verbosity remains: writing struct S twice, and having to attach a typedef to each struct. Could we do better still?


•  •

What we have in the above two lines is a pair of coupled constructs, each one referring to the other, and it is desirable to unite them into a single one. Although the C language itself cannot help further, its preprocessor can. The following macro definition does the job:

        #define struct(n) typedef struct n n; struct n

Having equipped ourselves with it, we can define any struct type S as follows:

        struct(S) { ... S* ptr; ... };
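
Here is a minimal sketch of the macro at work on a linked list. The names Node, list_push, and list_length are hypothetical. I place the standard header before the macro definition, on the assumption that it is safest not to have a keyword-named macro in effect while standard headers are being included.

```c
#include <stddef.h>   /* standard headers first, before the macro is defined */

#define struct(n) typedef struct n n; struct n

/* Expands to: typedef struct Node Node; struct Node { ... }; */
struct(Node) {
    int value;
    Node *next;
};

/* Link a caller-provided node in front of the list; return the new head. */
Node *list_push(Node *head, Node *node, int value)
{
    node->value = value;
    node->next = head;
    return node;
}

/* Count the nodes of a NULL-terminated list. */
int list_length(const Node *head)
{
    int len = 0;
    while (head != NULL) {
        len++;
        head = head->next;
    }
    return len;
}
```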

Not only do we no longer need to type anything superfluous when defining self-referential data structures, but the typedefs also disappear from the text: there is only a single typedef, the one in the above macro, written once for all uses in all programs. Furthermore, we have deliberately used struct as the name of the macro, so that no new keyword need be remembered and the deviation from the customary syntax of a struct definition is minimal.

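It may seem that having a macro named struct would prohibit the use of struct as a keyword, but it doesn't. A function-like macro is expanded only where the next token after its name is an opening parenthesis, which never happens to the keyword struct in ordinary C code, and within its own replacement text a macro name is not expanded again. So the macro itself makes use of struct precisely as a keyword, and any other use of struct as a keyword also remains unaffected. This is important, because otherwise the above macro definition would be ruinous to the parts of the program, perhaps already written, which happen to define and use struct types in the traditional way.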
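
The coexistence of the two forms can be checked with a small sketch. The types Pair and Legacy and the function demo_mixed are hypothetical names used only here. As a precaution, the standard header is included before the macro is defined, since the C standard asks that no macro with a keyword's name be defined at the point where a standard header is included.

```c
#include <stddef.h>   /* included before the keyword-named macro exists */

#define struct(n) typedef struct n n; struct n

/* Macro form: struct is followed by '(' and so gets expanded. */
struct(Pair) { int first; int second; };

/* Traditional form: struct is followed by a tag, so it stays a plain keyword. */
struct Legacy { int id; struct Legacy *next; };

int demo_mixed(void)
{
    Pair p = { 1, 2 };
    struct Pair *same = &p;              /* the long name remains valid too */
    struct Legacy item = { 40, NULL };
    return same->first + same->second + item.id;
}
```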


•  •

As a final observation, some textbook authors and programmers prefer to typedef a pointer type rather than the struct which is being pointed at:

        typedef struct S* P;

and then use P in place of struct S* within the body of S

        struct S { ... P ptr; ... };

and elsewhere, while still referring to the original type as struct S. There seems to be no benefit in this approach, compared to the above one. There is even an additional disadvantage in having to deal with two different names related to the same type.


bbb, 2016