For a data structure with indices (e.g. an array list, a dynamic array, etc.), should the indices be of type size_t or int? Is there a clear reason to use one over the other?
fooGetByIndex(struct foo* foo, size_t index);
or
fooGetByIndex(struct foo* foo, int index);
Until someone suggested size_t to me, I had always defaulted to int without thinking much about it. Having experimented with both, I'm not quite sure which makes for a better API.
There are many discussions of size_t vs int on a more general level, and that's not what I'm asking. I'm interested in the more specific case of designing an API for a data structure that uses indices (i.e. is array-like) but abstracts away direct array access through an API.
Semantically, size_t is appropriate for indices of C arrays, which is the primary argument for it in this case. However, if the C array is hidden behind an API (which might not even use one internally), that argument weakens. Additionally, being able to return -1 as an error value is much easier when using int, whereas (size_t)-1 is arguably more error-prone and confusing for the user of the API, despite being well-defined and even used by the C standard library in its mbstowcs function.
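One way to soften the objection to (size_t)-1 is to give the sentinel a name, much as C++ does with std::string::npos. The following is a minimal sketch, not either of the linked APIs: struct foo, fooFind and FOO_NOT_FOUND are hypothetical names, and the container is assumed to store its element count as a size_t. Because every valid index is strictly less than such a count, (size_t)-1, which the conversion rules guarantee equals SIZE_MAX, can never collide with a real index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h> /* SIZE_MAX */

/* Hypothetical named sentinel. (size_t)-1 converts to SIZE_MAX, and since
 * every valid index is < count <= SIZE_MAX, it is never a valid index. */
#define FOO_NOT_FOUND ((size_t)-1)

/* Hypothetical array-like container, for illustration only. */
struct foo {
    int *items;
    size_t count;
};

/* Returns the index of the first element equal to value,
 * or FOO_NOT_FOUND if no element matches. */
size_t fooFind(const struct foo *foo, int value)
{
    assert(foo != NULL);
    for (size_t i = 0; i < foo->count; ++i) {
        if (foo->items[i] == value) {
            return i;
        }
    }
    return FOO_NOT_FOUND;
}
```

Callers then compare against the name, e.g. if (fooFind(&f, 42) == FOO_NOT_FOUND) { ... }, which reads about as clearly as a check for -1.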
If relevant, the two APIs I'm currently working on can be found on CodeReview here and here, though I'm looking for an answer that applies to API design of index-based data structures in general, not just those two examples.
Is using either size_t or int better API design in this case, or are both equally valid (i.e. the choice is subjective)?
2 Answers
I would clearly prefer size_t, since it is an unsigned integer type and indices are always >= 0. You immediately know how to use this parameter.
It is not good style to return special values such as -1 for error conditions. This requires extra checking code at every call site, and if you forget those checks in some places, it can cause hard-to-find bugs.
You should use an alternative way of handling errors, e.g.:
- Throw an exception (only possible in C++; C has no exceptions).
- Return the error condition via the return value and the index via an output parameter, e.g. if you request an index:
bool GetMyIndex (size_t &result);
Usage:
size_t returned_index = 0;
if (!GetMyIndex(returned_index)) {
    // handle the error
}
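In C, which has neither exceptions nor references, the out-parameter pattern from this answer is usually written with a pointer instead. A minimal sketch, assuming a hypothetical struct foo that stores int elements; the extra out parameter on fooGetByIndex is an illustrative addition, not part of the signature given in the question:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical array-like container, for illustration only. */
struct foo {
    int *items;
    size_t count;
};

/* C has no references, so the element is written through a pointer.
 * Returns false (and leaves *out untouched) if index is out of range. */
bool fooGetByIndex(const struct foo *foo, size_t index, int *out)
{
    assert(foo != NULL && out != NULL);
    if (index >= foo->count) {
        return false;
    }
    *out = foo->items[index];
    return true;
}
```

Because index is a size_t, a caller who accidentally passes -1 gets SIZE_MAX after conversion, so the single index >= foo->count check also rejects "negative" indices.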
Yeah, the good old C exceptions. (bool3max, May 29, 2019 at 23:01)
Each of the possible approaches has its own advantages.
Using only one type for all indices is nice, because it lets you pass a pointer to an index (or, in C++, a reference to an index) around without having to worry about what exactly you are indexing.
Having an unsigned index is nice if the index values cannot ever be negative.
Having a signed index is nice because you don't have to be paranoid with loops like for (i = count - 1; i >= 0; --i), which never terminates when i is unsigned, since i >= 0 is then always true. And you can use -1 to denote an invalid index.
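If the index type is unsigned, the reverse loop mentioned in this answer needs a different shape. A minimal sketch of one common idiom, using a hypothetical reverseCopy helper over int arrays, moves the decrement into the loop condition so that i is tested before it can wrap around:

```c
#include <assert.h>
#include <stddef.h>

/* Copies src[0..count) into dst in reverse order.
 * With an unsigned i, the naive loop
 *     for (size_t i = count - 1; i >= 0; --i)
 * never terminates: i >= 0 is always true, --i on 0 wraps to SIZE_MAX,
 * and count - 1 already wraps when count == 0.
 * Testing and decrementing in one step avoids all of these problems. */
void reverseCopy(int *dst, const int *src, size_t count)
{
    assert(count == 0 || (dst != NULL && src != NULL));
    size_t out = 0;
    for (size_t i = count; i-- > 0; ) {
        dst[out++] = src[i];
    }
}
```

The condition i-- > 0 compares the old value of i, so the body sees count - 1 down to 0 and the loop exits before the wrapped value is ever used; an empty range (count == 0) runs zero iterations.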
Not having artificial restrictions because of the index type is nice. It's rubbish to use an int index (typically 32 bits, so at most INT_MAX = 2147483647) on a 64 GB machine that could easily handle much bigger indices.
Not wasting space is nice. It's rubbish to have a 64-bit index that can only ever access one of two items.
With a 32-bit int, INT_MAX is going to be more than enough capacity in practice. I suppose the question only makes sense when int is 32-bit, because when it's smaller you have to use size_t. This hidden assumption may well hold the answer, as designing around int being a particular size isn't good C.

"...int is 32-bit because when it's smaller you have to use size_t." Then again, there's enough of a correlation between int width and resource availability that on any platform with a 16-bit int, 32767 of something is going to be plenty. Data structures that need the extra range provided by size_t could arguably be considered a special case.