What's the need of array with zero elements?

Question 1

In the Linux kernel code I found the following thing which I can not understand.

struct bts_action {
    u16 type;
    u16 size;
    u8 data[0];
} __attribute__ ((packed));

The code is here: http://lxr.free-electrons.com/source/include/linux/ti_wilink_st.h

What's the need and purpose of an array of data with zero elements?

Question 2

I'm not sure if there should be either a zero-length-arrays or struct-hack tag ...

Question 3

@hippietrail, because often when someone asks what this struct is, they don't know that it is referred to as "flexible array member". If they did, they could have easily found their answer. Since they don't, they can't tag the question as such. That is why we don't have such a tag.

Question 4

Vote to reopen. I agree that this was not a duplicate, because none of the other posts addresses the combination of a non-standard "struct hack" with zero length and the well-defined C99 feature flexible array member. I also think it is always of benefit for the C programming community to shed some light on any obscure code from the Linux kernel. Mainly since many people have the impression that the Linux kernel is some sort of state of the art C code, for reasons unknown. While in reality it is a terrible mess flooded with non-standard exploits that never should be regarded as some C canon.

Question 5

Not a duplicate - isn't the first time I've seen someone close a question unnecessarily. Also I think this question adds to the SO Knowledge base.

Question 6

Possible duplicate of What happens if I define a 0-size array in C/C++?

Question 7

This is a way to have variable sizes of data, without having to call malloc (kmalloc in this case) twice. You would use it like this:

struct bts_action *var = kmalloc(sizeof(*var) + extra, GFP_KERNEL);

This technique used to be non-standard and was considered a hack (as Aniket said), but C99 standardized it as the flexible array member. The standard form is now:

struct bts_action {
 u16 type;
 u16 size;
 u8 data[];
} __attribute__ ((packed)); /* Note: the __attribute__ is irrelevant here */

Note that you don't mention any size for the data field. Note also that this special variable can only come at the end of the struct.

In C99, this matter is explained in 6.7.2.1.16 (emphasis mine):

As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.

Or in other words, if you have:

struct something
{
    /* other variables */
    char data[];
};
struct something *var = malloc(sizeof(*var) + extra);

You can access var->data with indices in [0, extra). Note that sizeof(struct something) will only give the size accounting for the other variables, i.e. gives data a size of 0.
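As a minimal user-space sketch of the pattern above: the `len` field and the helper `something_new` are hypothetical names added for illustration, and plain `malloc` stands in for `kmalloc`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct something {
    size_t len;    /* "other variables": here, the payload length (hypothetical) */
    char data[];   /* C99 flexible array member, must be the last member */
};

/* Allocate the struct plus `extra` trailing bytes with a single malloc call
 * and copy `extra` bytes from src into data.
 * Returns NULL on size overflow or allocation failure. */
struct something *something_new(const char *src, size_t extra)
{
    assert(src != NULL || extra == 0);
    if (extra > SIZE_MAX - sizeof(struct something))
        return NULL;
    struct something *var = malloc(sizeof(*var) + extra);
    if (var == NULL)
        return NULL;
    var->len = extra;
    if (extra > 0)
        memcpy(var->data, src, extra);  /* valid indices are [0, extra) */
    return var;
}
```

A call such as something_new("abc", 4) yields one heap block holding both the header and the four payload bytes, and a single free() releases all of it.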

It may be interesting also to note how the standard actually gives examples of mallocing such a construct (6.7.2.1.17):

struct s { int n; double d[]; };
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));

Another interesting note by the standard in the same location is (emphasis mine):

assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if p had been declared as:
struct { int n; double d[m]; } *p;
(there are circumstances in which this equivalence is broken; in particular, the offsets of member d might not be the same).

Question 8

To be clear, the original code in the question is still not standard in C99 (nor C11), and would still be considered a hack. The C99 standardization must omit the array bound.

Question 9

What's [0, extra)?

Question 10

@JL2210, en.wikipedia.org/wiki/Interval_(mathematics)#Terminology

Question 11

This is actually a hack, specifically a GCC extension (to C90).

It's also called the struct hack.

So, for example, I could write:

struct bts_action *bts = malloc(sizeof(struct bts_action) + sizeof(char)*100);

It will be equivalent to saying:

struct bts_action{
 u16 type;
 u16 size;
 u8 data[100];
};

And I can create any number of such struct objects.

Question 12

The idea is to allow for a variable-sized array at the end of the struct. Presumably, bts_action is some data packet with a fixed-size header (the type and size fields), and variable-size data member. By declaring it as a 0-length array, it can be indexed just as any other array. You'd then allocate a bts_action struct, of say 1024-byte data size, like so:

size_t size = 1024;
struct bts_action* action = (struct bts_action*)malloc(sizeof(struct bts_action) + size);

See also: http://c2.com/cgi/wiki?StructHack
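To illustrate the "fixed-size header plus variable-size data" idea, here is a hedged sketch that copies one packet out of a raw byte buffer. The names `struct packet` and `packet_read` are hypothetical, the header is assumed to be stored in host byte order, and the standard `data[]` spelling is used instead of `data[0]`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet layout modelled on the question's struct. */
struct packet {
    uint16_t type;
    uint16_t size;    /* number of payload bytes in data */
    uint8_t  data[];  /* C99 flexible array member */
};

/* Copy one packet (header plus payload) out of a raw byte buffer.
 * Returns NULL if the buffer is too short or allocation fails. */
struct packet *packet_read(const uint8_t *buf, size_t buf_len)
{
    struct packet hdr;

    assert(buf != NULL);
    if (buf_len < sizeof(hdr))
        return NULL;
    memcpy(&hdr, buf, sizeof(hdr));        /* fixed-size header first */
    if (hdr.size > buf_len - sizeof(hdr))
        return NULL;                        /* payload is truncated */

    struct packet *p = malloc(sizeof(*p) + hdr.size);
    if (p == NULL)
        return NULL;
    memcpy(p, &hdr, sizeof(hdr));
    memcpy(p->data, buf + sizeof(hdr), hdr.size);
    return p;                               /* p->data[i] is valid for i < p->size */
}
```

After a successful call, the payload can be indexed like any other array, and the caller frees the whole packet with one free().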

Question 13

@Aniket: I'm not entirely sure from whence comes that idea.

Question 14

in C++ yes, in C, not needed.

Question 15

@sheu, it comes from the fact that your style of writing malloc makes you repeat yourself multiple times and if ever the type of action changes, you have to fix it multiple times. Compare the following two for yourself and you will know:

struct some_thing *variable = (struct some_thing *)malloc(10 * sizeof(struct some_thing));

vs.

struct some_thing *variable = malloc(10 * sizeof(*variable));

The second one is shorter, cleaner and clearly easier to change.

Question 16

The code is not valid C (see this). The Linux kernel is, for obvious reasons, not in the slightest concerned with portability, so it uses plenty of non-standard code.

What they are doing is a non-standard GCC extension with array size 0. A standard-compliant program would have written u8 data[]; and it would have meant the very same thing. The authors of the Linux kernel apparently love to make things needlessly complicated and non-standard, if an option to do so reveals itself.

In older C standards, ending a struct with an empty array was known as "the struct hack". Others have already explained its purpose in other answers. The struct hack, in the C90 standard, was undefined behavior and could cause crashes, mainly since a C compiler is free to add any number of padding bytes at the end of the struct. Such padding bytes may collide with the data you tried to "hack" in at the end of the struct.

GCC early on made a non-standard extension to change this from undefined to well-defined behavior. The C99 standard then adopted this concept, and any modern C program can therefore use this feature without risk. It is known as a flexible array member in C99/C11.
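The claim that the two spellings describe the same layout can be checked at compile time. This is only a sketch: it assumes a GCC- or Clang-compatible compiler (the zero-length array is a GNU extension), the struct names are hypothetical, and <stdint.h> types stand in for the kernel's u16 and u8.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* GNU C extension: zero-length array, not valid ISO C. */
struct bts_action_gnu {
    uint16_t type;
    uint16_t size;
    uint8_t  data[0];
};

/* ISO C99 flexible array member. */
struct bts_action_c99 {
    uint16_t type;
    uint16_t size;
    uint8_t  data[];
};

/* Both declarations are expected to have the same size and data offset. */
static_assert(sizeof(struct bts_action_gnu) == sizeof(struct bts_action_c99),
              "sizes differ");
static_assert(offsetof(struct bts_action_gnu, data) ==
              offsetof(struct bts_action_c99, data),
              "data offsets differ");

/* Run-time view of the same comparison: returns 1 when the layouts match. */
int layouts_match(void)
{
    return sizeof(struct bts_action_gnu) == sizeof(struct bts_action_c99)
        && offsetof(struct bts_action_gnu, data) ==
           offsetof(struct bts_action_c99, data);
}
```

One difference that remains: with GCC, sizeof applied to the zero-length member evaluates to 0, whereas a flexible array member has an incomplete type and cannot be passed to sizeof at all.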

Question 17

I doubt that "the linux kernel is not concerned with portability". Perhaps you meant portability to other compilers? It's true that it is quite entwined with features of gcc.

Question 18

Nevertheless, I think this particular piece of code is not mainstream code and was probably left as is because its author didn't pay much attention to it. The license says it's about some Texas Instruments drivers, so it's unlikely the core programmers of the kernel paid any attention to it. I'm pretty sure the kernel developers are constantly updating old code according to new standards or new optimizations. It's just too big to make sure everything is updated!

Question 19

@Shahbaz With the "obvious" part, I meant portability to other operating systems, which naturally wouldn't make any sense. But they don't seem to give a damn about portability to other compilers either; they have used so many GCC extensions that Linux will likely never get ported to another compiler.

Question 20

@Shahbaz As for the case of anything labelled Texas Instruments, TI themselves are notorious for producing the most useless, crappy, naive C code ever seen, in their app notes for various TI chips. If the code originates from TI, then all bets regarding the chance of interpreting something useful from it are off.

Question 21

It's true that linux and gcc are inseparable. The Linux kernel is also quite hard to understand (mostly because an OS is complicated anyway). My point though, was that it's not nice to say "The authors of the Linux kernel apparently love to make things needlessly complicated and non-standard, if an option to do so reveals itself" due to a third-party-ish bad coding practice.

Question 22

Another use of a zero-length array is as a named label inside a struct, to assist compile-time checks of struct offsets.

Suppose you have some large struct definitions (spanning multiple cache lines) and you want to make sure they are aligned to a cache line boundary both at the beginning and in the middle, where they cross the boundary.

struct example_large_s
{
 u32 first; // align to CL
 u32 data;
 ....
 u64 *second; // align to second CL after the first one
 ....
};

In code you can declare them using GCC extensions like:

__attribute__((aligned(CACHE_LINE_BYTES)))

But you still want to make sure this is enforced at run time.

ASSERT (offsetof (struct example_large_s, first) == 0);
ASSERT (offsetof (struct example_large_s, second) == CACHE_LINE_BYTES);

This works for a single struct, but it is hard to cover many structs, each with differently named members to align. You would most likely end up with code like the following, where you have to look up the names of the relevant members of each struct:

assert (offsetof (one_struct, <name_of_first_member>) == 0);
assert (offsetof (one_struct, <name_of_second_member>) == CACHE_LINE_BYTES);
assert (offsetof (another_struct, <name_of_first_member>) == 0);
assert (offsetof (another_struct, <name_of_second_member>) == CACHE_LINE_BYTES);

Instead, you can declare a zero-length array in the struct to act as a named label: it has a consistent name across structs but does not consume any space.

#define CACHE_LINE_ALIGN_MARK(mark) u8 mark[0] __attribute__((aligned(CACHE_LINE_BYTES)))
struct example_large_s
{
 CACHE_LINE_ALIGN_MARK (cacheline0);
 u32 first; // align to CL
 u32 data;
 ....
 CACHE_LINE_ALIGN_MARK (cacheline1);
 u64 *second; // align to second CL after the first one
 ....
};

Then the runtime assertion code would be much easier to maintain:

assert (offsetof (one_struct, cacheline0) == 0);
assert (offsetof (one_struct, cacheline1) == CACHE_LINE_BYTES);
assert (offsetof (another_struct, cacheline0) == 0);
assert (offsetof (another_struct, cacheline1) == CACHE_LINE_BYTES);
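Because offsetof yields a constant expression, the same checks can also be made at compile time. The following is a sketch under assumptions: a 64-byte cache line, a GCC- or Clang-compatible compiler (zero-length arrays and __attribute__((aligned)) are extensions), C11 static_assert, and <stdint.h> types in place of u8/u32/u64.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

#define CACHE_LINE_BYTES 64   /* assumed cache line size */

/* Zero-size, cache-line-aligned marker member (GNU extension). */
#define CACHE_LINE_ALIGN_MARK(mark) \
    uint8_t mark[0] __attribute__((aligned(CACHE_LINE_BYTES)))

struct example_large_s {
    CACHE_LINE_ALIGN_MARK(cacheline0);
    uint32_t first;           /* starts the first cache line */
    uint32_t data;
    CACHE_LINE_ALIGN_MARK(cacheline1);
    uint64_t *second;         /* starts the second cache line */
};

/* The marker names are the same in every struct, so these checks can be
 * copied from struct to struct without looking up member names. */
static_assert(offsetof(struct example_large_s, cacheline0) == 0,
              "cacheline0 must start the struct");
static_assert(offsetof(struct example_large_s, cacheline1) == CACHE_LINE_BYTES,
              "cacheline1 must start the second cache line");
```

If the members before cacheline1 ever grew past one cache line, the marker would move to the next boundary and the second assertion would fail at build time.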

Question 23

Interesting idea. Just a note that 0-length arrays are not allowed by the standard, so this is a compiler-specific thing. Also, it might be a good idea to quote gcc's definition of the behavior of 0-length arrays in a struct definition, at the very least to show whether it could introduce padding before or after the declaration.

Accepted answer by Shahbaz, 2013-02-01 09:49:40Z

