133

In the Linux kernel code I found the following, which I cannot understand.

 struct bts_action {
 u16 type;
 u16 size;
 u8 data[0];
 } __attribute__ ((packed));

The code is here: http://lxr.free-electrons.com/source/include/linux/ti_wilink_st.h

What's the need and purpose of an array of data with zero elements?

S.S. Anne
asked Feb 1, 2013 at 9:42
8
  • I'm not sure if there should be either a zero-length-arrays or struct-hack tag ... Commented Feb 1, 2013 at 11:42
  • @hippietrail, because often when someone asks what this struct is, they don't know that it is referred to as "flexible array member". If they did, they could have easily found their answer. Since they don't, they can't tag the question as such. That is why we don't have such a tag. Commented Feb 1, 2013 at 12:34
  • 12
    Vote to reopen. I agree that this was not a duplicate, because none of the other posts addresses the combination of a non-standard "struct hack" with zero length and the well-defined C99 feature flexible array member. I also think it is always of benefit for the C programming community to shed some light on any obscure code from the Linux kernel. Mainly since many people have the impression that the Linux kernel is some sort of state of the art C code, for reasons unknown. While in reality it is a terrible mess flooded with non-standard exploits that never should be regarded as some C canon. Commented Feb 1, 2013 at 13:01
  • 5
    Not a duplicate - isn't the first time I've seen someone close a question unnecessarily. Also I think this question adds to the SO Knowledge base. Commented Feb 1, 2013 at 13:15
  • 1
    Possible duplicate of What happens if I define a 0-size array in C/C++? Commented May 14, 2018 at 13:31

5 Answers

151

This is a way to have variable sizes of data, without having to call malloc (kmalloc in this case) twice. You would use it like this:

struct bts_action *var = kmalloc(sizeof(*var) + extra, GFP_KERNEL);

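This used to be non-standard and was considered a hack (as Aniket said). C99 standardized the idea as the flexible array member, which is declared without a size; the zero-length form in the question remains a GCC extension. The standard form is: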

struct bts_action {
 u16 type;
 u16 size;
 u8 data[];
} __attribute__ ((packed)); /* Note: the __attribute__ is irrelevant here */

Note that no size is given for the data field, and that such a member can only be the last member of the struct.
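To make the single-allocation idea concrete, here is a minimal user-space sketch. It assumes plain malloc instead of kmalloc, and the names struct packet and packet_new are hypothetical stand-ins, not anything from the kernel source:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* User-space stand-in for the kernel struct; "packet" is a made-up name. */
struct packet {
    uint16_t type;
    uint16_t size;
    uint8_t  data[];   /* C99 flexible array member: must be the last member */
};

/* Allocate the header and the payload in one block and copy the payload in.
 * Returns NULL if the allocation fails or if len does not fit in the
 * 16-bit size field. The caller releases the result with a single free(). */
struct packet *packet_new(uint16_t type, const void *payload, size_t len)
{
    if (len > UINT16_MAX)
        return NULL;

    struct packet *p = malloc(sizeof(*p) + len);
    if (p == NULL)
        return NULL;

    p->type = type;
    p->size = (uint16_t)len;
    if (len > 0)
        memcpy(p->data, payload, len);
    return p;
}
```

Because header and data live in the same allocation, one free(p) releases both, which is the point of avoiding the second malloc.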


In C99, this is covered in 6.7.2.1, paragraph 16:

As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.

Or in other words, if you have:

struct something
{
 /* other variables */
 char data[];
};
struct something *var = malloc(sizeof(*var) + extra);

You can access var->data with indices in [0, extra), that is, from 0 up to but not including extra. Note that sizeof(struct something) accounts only for the other members (plus any trailing padding), so data contributes a size of 0.
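As a rough illustration of both points, the sketch below fills every valid index of the trailing array. The count member and the helper something_new are assumptions added for the example; they only stand in for "other variables":

```c
#include <stddef.h>
#include <stdlib.h>

struct something {
    int  count;     /* stands in for the "other variables" */
    char data[];    /* flexible array member */
};

/* Allocate room for `extra` trailing bytes and write to every valid
 * index, i.e. 0, 1, ..., extra - 1. Index `extra` itself is out of bounds. */
struct something *something_new(size_t extra)
{
    struct something *var = malloc(sizeof(*var) + extra);
    if (var == NULL)
        return NULL;

    var->count = (int)extra;
    for (size_t i = 0; i < extra; i++)
        var->data[i] = (char)('a' + i % 26);
    return var;
}
```

Here offsetof(struct something, data) equals sizeof(int), and sizeof(struct something) is at least that large; the trailing bytes exist only because the malloc call asked for them.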


It may be interesting also to note how the standard actually gives examples of mallocing such a construct (6.7.2.1.17):

struct s { int n; double d[]; };
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));

Another interesting note by the standard in the same location is:

assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if p had been declared as:

struct { int n; double d[m]; } *p;

(there are circumstances in which this equivalence is broken; in particular, the offsets of member d might not be the same).

answered Feb 1, 2013 at 9:49

3 Comments

To be clear, the original code in the question is still not standard in C99 (nor C11), and would still be considered a hack. The C99 standardization must omit the array bound.
What's [0, extra)?
38

This is actually a hack: a zero-length array is a GCC extension to C90.

It's also called a struct hack.

So I could write:

struct bts_action *bts = malloc(sizeof(struct bts_action) + sizeof(char)*100);

For most purposes, this behaves like:

struct bts_action{
 u16 type;
 u16 size;
 u8 data[100];
};

And I can create any number of such struct objects.

Peter Mortensen
answered Feb 1, 2013 at 9:45

Comments

8

The idea is to allow for a variable-sized array at the end of the struct. Presumably, bts_action is some data packet with a fixed-size header (the type and size fields) and a variable-size data member. Declaring data as a 0-length array lets it be indexed just like any other array. You'd then allocate a bts_action struct with, say, a 1024-byte data size, like so:

size_t size = 1024;
struct bts_action* action = (struct bts_action*)malloc(sizeof(struct bts_action) + size);

See also: http://c2.com/cgi/wiki?StructHack

answered Feb 1, 2013 at 9:48

3 Comments

@Aniket: I'm not entirely sure from whence comes that idea.
in C++ yes, in C, not needed.
@sheu, it comes from the fact that your style of writing malloc makes you repeat yourself multiple times and if ever the type of action changes, you have to fix it multiple times. Compare the following two for yourself and you will know: struct some_thing *variable = (struct some_thing *)malloc(10 * sizeof(struct some_thing)); vs. struct some_thing *variable = malloc(10 * sizeof(*variable)); The second one is shorter, cleaner and clearly easier to change.
7

The code is not valid C (see this). The Linux kernel is, for obvious reasons, not in the slightest concerned with portability, so it uses plenty of non-standard code.

What they are doing is a non-standard GCC extension with array size 0. A standard-compliant program would have written u8 data[]; and it would have meant the very same thing. The authors of the Linux kernel apparently love to make things needlessly complicated and non-standard, if an option to do so reveals itself.

In older C standards, ending a struct with an empty array was known as "the struct hack". Others have already explained its purpose in other answers. The struct hack, in the C90 standard, was undefined behavior and could cause crashes, mainly since a C compiler is free to add any number of padding bytes at the end of the struct. Such padding bytes may collide with the data you tried to "hack" in at the end of the struct.

GCC early on made a non-standard extension to change this from undefined to well-defined behavior. The C99 standard then adopted this concept, and any modern C program can therefore use this feature without risk. It is known as a flexible array member in C99/C11.

answered Feb 1, 2013 at 13:27

9 Comments

I doubt that "the linux kernel is not concerned with portability". Perhaps you meant portability to other compilers? It's true that it is quite entwined with features of gcc.
Nevertheless, I think this particular piece of code is not mainstream code and was probably left as is because its author didn't pay much attention to it. The license says it's about some Texas Instruments drivers, so it's unlikely the core programmers of the kernel paid any attention to it. I'm pretty sure the kernel developers are constantly updating old code according to new standards or new optimizations. It's just too big to make sure everything is updated!
@Shahbaz With the "obvious" part, I meant portability to other operating systems, which naturally wouldn't make any sense. But they don't seem to give a damn about portability to other compilers either; they have used so many GCC extensions that Linux will not likely ever get ported to another compiler.
@Shahbaz As for the case of anything labelled Texas Instruments, TI themselves are notorious for producing the most useless, crappy, naive C code ever seen, in their app notes for various TI chips. If the code originates from TI, then all bets regarding the chance of interpreting something useful from it are off.
It's true that linux and gcc are inseparable. The Linux kernel is also quite hard to understand (mostly because an OS is complicated anyway). My point though, was that it's not nice to say "The authors of the Linux kernel apparently love to make things needlessly complicated and non-standard, if an option to do so reveals itself" due to a third-party-ish bad coding practice.
2

Another use of a zero-length array is as a named label inside a struct, to assist with compile-time checks of struct offsets.

Suppose you have some large struct definitions (spanning multiple cache lines) and you want to make sure they are aligned to a cache line boundary both at the beginning and in the middle, where they cross the boundary.

struct example_large_s
{
 u32 first; // align to CL
 u32 data;
 ....
 u64 *second; // align to second CL after the first one
 ....
};

In code you can declare them using GCC extensions like:

__attribute__((aligned(CACHE_LINE_BYTES)))

But you still want to make sure this is enforced at run time.

assert (offsetof (struct example_large_s, first) == 0);
assert (offsetof (struct example_large_s, second) == CACHE_LINE_BYTES);

This works for a single struct, but it is hard to cover many structs when each has differently named members to be aligned. You would most likely end up with code like the following, where you have to look up the member names in each struct:

assert (offsetof (one_struct, <name_of_first_member>) == 0);
assert (offsetof (one_struct, <name_of_second_member>) == CACHE_LINE_BYTES);
assert (offsetof (another_struct, <name_of_first_member>) == 0);
assert (offsetof (another_struct, <name_of_second_member>) == CACHE_LINE_BYTES);

Instead, you can declare a zero-length array in the struct that acts as a named label: it has a consistent name but does not consume any space.

#define CACHE_LINE_ALIGN_MARK(mark) u8 mark[0] __attribute__((aligned(CACHE_LINE_BYTES)))
struct example_large_s
{
 CACHE_LINE_ALIGN_MARK (cacheline0);
 u32 first; // align to CL
 u32 data;
 ....
 CACHE_LINE_ALIGN_MARK (cacheline1);
 u64 *second; // align to second CL after the first one
 ....
};

Then the runtime assertion code would be much easier to maintain:

assert (offsetof (one_struct, cacheline0) == 0);
assert (offsetof (one_struct, cacheline1) == CACHE_LINE_BYTES);
assert (offsetof (another_struct, cacheline0) == 0);
assert (offsetof (another_struct, cacheline1) == CACHE_LINE_BYTES);
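
Since offsetof yields a constant, the same checks can also be made at compile time. The following is only a sketch under assumptions: it relies on GCC or Clang (zero-length arrays and __attribute__ are extensions), on C11's _Static_assert, and on an assumed CACHE_LINE_BYTES of 64; the elided members from the struct above are simply left out.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64   /* assumed cache line size, adjust per target */

/* Zero-length, cache-line-aligned marker (GNU C extension, not standard C). */
#define CACHE_LINE_ALIGN_MARK(mark) \
    uint8_t mark[0] __attribute__((aligned(CACHE_LINE_BYTES)))

struct example_large_s {
    CACHE_LINE_ALIGN_MARK(cacheline0);
    uint32_t first;     /* starts the first cache line */
    uint32_t data;
    CACHE_LINE_ALIGN_MARK(cacheline1);
    uint64_t *second;   /* starts the second cache line */
};

/* The build fails if a marker drifts away from its expected offset. */
_Static_assert(offsetof(struct example_large_s, cacheline0) == 0,
               "cacheline0 must sit at the start of the struct");
_Static_assert(offsetof(struct example_large_s, cacheline1) == CACHE_LINE_BYTES,
               "cacheline1 must sit at the second cache line");
```

The aligned attribute on the marker is what pushes second to the next cache line boundary, so first and second land at offsets 0 and CACHE_LINE_BYTES without the assertions having to name them.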
answered Sep 15, 2016 at 17:53

1 Comment

Interesting idea. Just a note that 0-length arrays are not allowed by the standard, so this is a compiler-specific thing. Also, it might be a good idea to quote GCC's definition of the behavior of 0-length arrays in a struct definition, at the very least to show whether it could introduce padding before or after the declaration.
