C struct names with no verbosity
In the C language, a compound, heterogeneous type is defined according to the pattern
struct S { ... };
where the user-selected identifier S is the name of the new type.
However, to designate the type for whatever purpose, e.g. in declaring variables, struct/union fields, function parameters and results, etc., S alone does not work; the phrase struct S has to be used instead.
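A minimal sketch of this rule, using a hypothetical struct point and a hypothetical helper manhattan (neither appears in the note itself):

```c
/* A tagged struct: in C, the tag "point" is not a type name by itself. */
struct point {
    int x;
    int y;
};

/* Every use of the type has to spell out "struct point". */
int manhattan(struct point p)
{
    int ax = p.x < 0 ? -p.x : p.x;
    int ay = p.y < 0 ? -p.y : p.y;
    return ax + ay;
}

/* point q;   would not compile in C: "point" alone names no type. */
```

The same source compiled as C++ would accept the commented-out declaration, which is exactly the convenience we are after.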
For many C programmers, having to type two words to designate a single notion is a redundancy they would rather avoid.
Notably, the problem does not occur in C++, where either struct S or S can be used to refer to the same type.
Ideally, we would like to achieve the same, or come as close to it as possible, in C, with minimal coding and minimal distortion of the language.
In the following note, after briefly discussing some ways of avoiding struct-related verbosity, I propose a simple, concise, and universally applicable solution that, surprisingly, seems to be missing from the C literature.
The way to enforce a single name as an equivalent of struct S is, of course, to introduce that name as an alias with a typedef.
For example, along with struct S { ... }, we may define
typedef struct S T;
In fact, there is no need for the new name T. The names given to structs (their tags), such as S here, belong to a different name space than the names introduced with a typedef, such as T, and therefore no name clash occurs if we say
typedef struct S S;
Now S designates the same type as struct S, and we avoid inventing a new name for the struct.
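The following sketch illustrates that the tag and the typedef name can coexist and be used interchangeably. The member value and the function sum_values are hypothetical, introduced only for illustration:

```c
struct S { int value; };   /* "S" as a struct tag */
typedef struct S S;        /* "S" as a typedef name, in a separate name space */

/* Both spellings denote the same type, so they can be mixed freely. */
int sum_values(struct S a, S b)
{
    return a.value + b.value;
}
```

A variable declared as S can be passed wherever struct S is expected, and vice versa, since the typedef creates an alias rather than a new type.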
Even better, the typedef can be merged with the struct definition:
typedef struct { ... } S;
so that the struct itself remains anonymous, a nameless type expression, and the type receives a name only through the typedef.
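As a small illustration of the merged form, here is a sketch with a hypothetical type complex_num and a hypothetical function complex_add; the struct has no tag at all:

```c
/* Anonymous struct: the only name of the type is the typedef name. */
typedef struct {
    double re;
    double im;
} complex_num;

/* Adds two values component by component. */
complex_num complex_add(complex_num a, complex_num b)
{
    complex_num sum;
    sum.re = a.re + b.re;
    sum.im = a.im + b.im;
    return sum;
}
```

Since there is no tag, the spelling struct complex_num is not available here; the alias is the only way to refer to the type.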
So far, so good, but what if the struct has to contain, among other things, a pointer to the same struct type? Self-referencing is typical of structs representing nodes of linked data structures, such as lists and trees. Unfortunately,
typedef struct { ... S* ptr; ... } S; // incorrect
is incorrect because, within the struct body, S is unknown: it only becomes known once the typedef is completed and the respective name binding takes effect.
The definition can be mended at the cost of reverting to a named struct and using struct S rather than just S in the struct body. In essence, we are embedding a complete named struct definition within the typedef:
typedef struct S { ... struct S* ptr; ... } S;
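A concrete instance of this pattern might look as follows. It is only a sketch: the names node, value, next and list_length are hypothetical and chosen for illustration.

```c
#include <stddef.h>

/* Named struct embedded in the typedef; inside the body only the
   spelling "struct node" is available, the alias is not yet known. */
typedef struct node {
    int value;
    struct node *next;
} node;

/* Counts the nodes of a NULL-terminated singly linked list. */
size_t list_length(const node *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

Outside the struct body, the short name node suffices, as in the parameter of list_length.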
Another possibility is to separate the typedef from the struct definition again, this time placing the former before the latter. This is a bit unintuitive since, within the typedef, the S in the struct S part is not yet defined; at that point it merely declares an incomplete type. But it does work, and the advantage is that the alias S introduced by the typedef is already known when the struct definition is processed, so it can be used there:
typedef struct S S;
struct S { ... S* ptr; ... };
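The typedef-first arrangement can be sketched with a binary tree node. The names tree, key, left, right and tree_size are hypothetical:

```c
#include <stddef.h>

/* Declares struct tree as an incomplete type and introduces its alias. */
typedef struct tree tree;

/* The alias is already usable inside the body, because pointers to an
   incomplete type are allowed. */
struct tree {
    int key;
    tree *left;
    tree *right;
};

/* Returns the number of nodes in the tree rooted at t. */
size_t tree_size(const tree *t)
{
    return t == NULL ? 0 : 1 + tree_size(t->left) + tree_size(t->right);
}
```

The same arrangement also covers mutually referring structs: with all the typedefs placed first, each struct body can mention any of the aliases.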
Now that we use S as a type name not only in the remainder of the program but also within the struct definition, only a small amount of verbosity remains: struct S is written twice, and a typedef has to be attached to each struct.
Could we do still better?
What we have in the above two lines is a pair of coupled constructs, each referring to the other, and it is desirable to unite them into a single one. Although the C language proper cannot help further, its preprocessor can. The following macro definition does the job:
#define struct(n) typedef struct n n; struct n
Having equipped ourselves with it, we can define any struct type S as follows:
struct(S) { ... S* ptr; ... };
Not only do we no longer need to type anything superfluous when defining self-referential data structures, but the typedefs also disappear from the text: there is only a single typedef, in the above macro, written once for all uses in all programs.
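To make the expansion concrete, here is a sketch that applies the macro to a list node. The names cell, value, next and cell_sum are hypothetical. The #include is placed before the #define on purpose: as far as I can tell, the C standard (the clause on standard headers, 7.1.2) does not allow a macro named like a keyword to be in effect when a standard header is included, even though common compilers tolerate it.

```c
#include <stddef.h>   /* included before the macro is defined */

/* The macro from the text: emits the typedef, then opens the definition. */
#define struct(n) typedef struct n n; struct n

/* Expands to: typedef struct cell cell; struct cell { ... }; */
struct(cell) {
    int value;
    cell *next;       /* the alias is in scope thanks to the typedef */
};

/* Sums the values of a NULL-terminated list. */
int cell_sum(const cell *head)
{
    int total = 0;
    for (; head != NULL; head = head->next)
        total += head->value;
    return total;
}
```

Running the file through the preprocessor alone (for instance with the -E option of gcc or clang) shows the typedef and the struct definition side by side.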
Furthermore, we have deliberately used struct as the name of the macro, so that no new keyword needs to be remembered and the deviation from the customary syntax of a struct definition is minimal.
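It may seem that having a macro named struct would prohibit the use of struct as a keyword, but it doesn't. The preprocessor expands a function-like macro only where its name is followed by an opening parenthesis, which never happens in an ordinary use of the keyword, and within the macro's own replacement text the name is not expanded again. In fact, the macro itself makes use of struct precisely as a keyword, and any other use of struct as a keyword also remains unaffected. This is important, because otherwise the above macro definition would be ruinous to the parts of the program, perhaps already written, which happen to define and use struct types in the traditional way.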
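The coexistence of both styles can be checked with a sketch like the one below. The types item and legacy and the functions same_size and legacy_code are hypothetical names used only for illustration.

```c
#include <stddef.h>

#define struct(n) typedef struct n n; struct n

/* New style: "struct" is followed by "(", so the macro is invoked. */
struct(item) {
    int id;
    item *next;
};

/* Traditional style: no "(" follows, so struct is left alone as a keyword. */
struct legacy {
    int code;
    struct legacy *next;
};

/* Both spellings remain valid for a type defined through the macro. */
int same_size(void)
{
    return sizeof(item) == sizeof(struct item);
}

/* Code written in the traditional way keeps compiling unchanged. */
int legacy_code(const struct legacy *p)
{
    return p->code;
}
```

Note that struct legacy gets no alias here, since it was not defined through the macro; only the two-word spelling refers to it.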
As a final observation, some textbook authors and programmers prefer to typedef a pointer type rather than the struct being pointed at:
typedef struct S* P;
and then use P in place of struct S* within the body of S
struct S { ... P ptr; ... };
and elsewhere, while still referring to the original type as struct S.
There seems to be no benefit in this approach, compared to the above one.
There is even an additional disadvantage in having to deal with two different names related to the same type.
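For comparison, the pointer-typedef style might look like this in practice. This is a sketch with hypothetical names elem, elem_ptr and first_value; it shows the two names that have to be kept in mind:

```c
#include <stddef.h>

/* Alias for the pointer type only. */
typedef struct elem *elem_ptr;

struct elem {
    int value;
    elem_ptr next;   /* the pointer alias is used inside the body */
};

/* The struct itself still has to be spelled "struct elem". */
int first_value(elem_ptr head)
{
    struct elem copy = *head;
    return copy.value;
}
```

Wherever an object rather than a pointer is needed, as with copy above, the two-word spelling returns.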
bbb, 2016