This is the primary module for my APL interpreter. The general idea is motivated by my two Stack Overflow questions, but the details I fear are easy to get wrong and I suspect there are a few corners that my eyes simply don't observe critically since they have the authority of having lain in the file so long without change.
The data elements here are always int
s. "Everything is an int
" was a guiding mantra in designing the whole project. So, via a tag/value encoding (comp.lang.c code review thread newer code review), values of many types can be embedded in these arrays. They're not generic in the normal C programming sense of the word. But by operating over int
s, it achieves genericity by virtue of the way it is used by the other modules.
This code doesn't include garbage collection or any other lifetime tracking as I'm still figuring out how I want to do that. So these functions allocate their results and expect the caller to manage freeing appropriately.
The ppnarg.h file provides counting of arguments in variadic macros. This is used to provide an easy interface to these functions. In order to create a [2][3][4] array, one calls.
array a = array_new(2,3,4);
The values can be accessed with the elem
function which returns a pointer to the specified element, so that dereferencing the resulting pointer
is an l-value and can be the target of assignment.
*elem(a,1,1,1) = 'k';
putchar(*elem(a,1,1,1));
Various other function families like slice*
and transpose*
provide other superpowers by creating new views of existing data for which the *elem()
function will index differently. One of my favorites is cast
which lets you hijack an existing C array and wrap the dynamic header around it.
In order to generically iterate through the row-major order of any array requires the coordinated use of several of these functions, but I haven't packaged this into its own function. The different places where I have needed to do this haven't really factored nicely into a separate function.
// assuming `a` as defined above, will iterate through all 2x3x4 elements
int n = productdims(a->rank,a->dims);
int scratch[a->rank];
for (int i=0; i<n; i++){
vector_index(i, a->dims, a->rank, scratch);
*elema(a,scratch) = 3;
}
vector_index
returns its scratch
parameter so the two functions can be composed into a single line. But it makes for a really long, unreadable line. At each stage, the n-d coordinates are available in the scratch array.
And my most recent augmentation is function-type arrays, which do not store all their declared data, but generate it as a function of the index. Thus, a [2][3][4] array of the constant 42
can be created without actually storing 24 42
s in memory.
array b = array_new_function(3, (int[]){2,3,4},
(int[]){1,42}, 2, constant);
*elem()
for each index will return 42
. An array which simply consists of sequential integers 0
..n
can be created with the j_vector
function-type array. And a considerable degree of constant-folding can be done by modifying the elements of the ->weight
member array and the cons
member. Indeed the values of any polynomial can be enumerated by creation of suitable dimension and weight parameters.
array c = iota(10); // function-type array generating 0..9 for indices 0..9
c->cons = 1; // generate 1..10 for 0..9
c->weight[c->rank-1] = c->cons = 2; // generate 2..20 by 2s
Do any spots look fishy? Or are there improvements I could make to the overall design or style or anything else?
#ifndef AR_H_
#define AR_H_
#include "../ppnarg.h"
typedef struct ar {
int type;
int rank; // number of dimensions
int *dims; // size of each dimension
int cons; // constant term of the indexing formula
int *weight; // corresponding coefficient in the indexing formula
int *data; // address of first array element
int *(*func)(struct ar *,int); // data function (if function type)
} *array;
enum type {
normal,
indirect,
function
};
int productdims(int rank, int dims[]);
array array_new_dims(int rank, int dims[]);
array array_new_function(int rank, int dims[],
int *data, int datan, int *(*func)(array,int)); // type=function
int *constant(array a,int idx);
int *j_vector(array a,int idx);
void loaddimsv(int rank, int dims[], va_list ap);
array (array_new)(int rank, ...);
#define array_new(...) (array_new)(PP_NARG(__VA_ARGS__),__VA_ARGS__)
array cast_dims(int data[], int rank, int dims[]); // type=indirect
array (cast)(int data[], int rank, ...); // type=indirect
#define cast(data,...) (cast)(data,PP_NARG(__VA_ARGS__),__VA_ARGS__)
array clone(array a); // type=indirect
array copy(array a);
int *elema(array a, int ind[]);
int *elemv(array a, va_list ap);
int *elem(array a, ...);
int *vector_index(int ind, int dims[], int n, int vec[]);
int ravel_index(int vec[], int dims[], int n);
void transpose2(array a);
void transpose(array a, int shift);
void transposea(array a, int spec[]);
array slice(array a, int i); // type=indirect
array slicea(array a, int spec[]); // type=indirect
array slices(array a, int s[], int f[]); // type=indirect
array extend(array a, int extra); // type=indirect
array cat(array x, array y);
array iota(int n); // type=function
array scalar(int n);
array (vector)(int n, ...);
#define vector(...) (vector)(PP_NARG(__VA_ARGS__),__VA_ARGS__)
int issolid(array a);
array makesolid(array a);
#endif
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "../ppnarg.h"
#include "ar.h"
int productdims(int rank, int dims[]){
int i,z=1;
for (i=0; i<rank; i++)
z *= dims[i];
return z;
}
// create new array object
array array_new_dims(int rank, int dims[]){
int datasz;
int i;
int x;
array z;
datasz=productdims(rank,dims);
z=malloc(sizeof*z
+ (rank+rank+datasz)*sizeof(int));
z->type = normal;
z->rank = rank;
z->dims = (int*)(z+1);
z->cons = 0;
z->weight = z->dims + rank;
z->data = z->weight + rank;
memmove(z->dims,dims,rank*sizeof(int));
for(x=1, i=rank-1; i>=0; i--){
z->weight[i] = x;
x *= z->dims[i];
}
return z;
}
// as a convention, a->data[0]==1
// indicating 1 (additional) data item (after data[0])
// data[1] is the actual data
int *constant(array a,int idx){
return a->data+1;
}
int *j_vector(array a,int idx){
a->data[1] = idx;
return a->data+1;
}
// create special function-type array
array array_new_function(int rank, int dims[],
int *data, int datan, int *(*func)(array,int)){
int i,x;
array z;
z=malloc(sizeof*z
+ (rank+rank+datan)*sizeof(int));
z->type = function;
z->rank = rank;
z->dims = (int*)(z+1);
z->cons = 0;
z->weight = z->dims + rank;
z->data = z->weight + rank;
memmove(z->data, data, datan+sizeof(int));
z->func = func;
memmove(z->dims,dims,rank*sizeof(int));
for(x=1, i=rank-1; i>=0; i--){
z->weight[i] = x;
x *= z->dims[i];
}
return z;
}
void loaddimsv(int rank, int dims[], va_list ap){
int i;
for (i=0; i<rank; i++){
dims[i]=va_arg(ap,int);
}
}
// create array, taking dims from variable argument list
array (array_new)(int rank, ...){
va_list ap;
int dims[rank];
va_start(ap,rank);
loaddimsv(rank,dims,ap);
va_end(ap);
return array_new_dims(rank,dims);
}
// create array object accessing existing array data
array cast_dims(int data[], int rank, int dims[]){
int i,x;
array z=malloc(sizeof*z
+ (rank+rank)*sizeof(int));
z->type = indirect;
z->rank = rank;
z->dims = (int*)(z+1);
z->cons = 0;
z->weight = z->dims + rank;
z->data = data;
memmove(z->dims,dims,rank*sizeof(int));
for(x=1, i=rank-1; i>=0; i--){
z->weight[i] = x;
x *= z->dims[i];
}
return z;
}
// create array accessing existing data taking dims from varargs
array (cast)(int data[], int rank, ...){
va_list ap;
int dims[rank];
va_start(ap,rank);
loaddimsv(rank,dims,ap);
va_end(ap);
return cast_dims(data, rank, dims);
}
// create a duplicate descriptor sharing array data
array clone(array a){
array z=malloc(sizeof*z
+ (a->rank+a->rank)*sizeof(int));
z->type = indirect;
z->rank = a->rank;
z->dims = (int*)(z+1);
z->cons = 0;
z->weight = z->dims + z->rank;
z->data = a->data;
memmove(z->dims,a->dims,z->rank*sizeof(int));
memmove(z->weight,a->weight,z->rank*sizeof(int));
return z;
}
// convert a ravel index to an index vector
int *vector_index(int ind, int dims[], int n, int vec[]){
int i,t=ind, *z=vec;
for (i=0; i<n; i++){
z[n-1-i] = t % dims[n-1-i];
t /= dims[n-1-i];
}
return z;
}
// convert index vector to ravel index
int ravel_index(int vec[], int dims[], int n){
int i,z=*vec;
for (i=0; i<n-1; i++){
z *= dims[i];
z += vec[i+1];
}
return z;
}
// create a new array object with data copied from array a
array copy(array a){
int datasz = productdims(a->rank,a->dims);
array z=malloc(sizeof*z
+ (a->rank+a->rank+datasz)*sizeof(int));
int i;
int x;
int ind[a->rank];
z->type = normal;
z->rank = a->rank;
z->dims = (int*)(z+1);
z->cons = 0;
z->weight = z->dims + z->rank;
z->data = z->weight + z->rank;
memmove(z->dims,a->dims,z->rank*sizeof(int));
for (x=1, i=z->rank-1; i>=0; i--){
z->weight[i] = x;
x *= z->dims[i];
}
for (i=0; i<datasz; i++){
vector_index(i,z->dims,z->rank,ind);
z->data[i] = *elema(a,ind);
}
return z;
}
// nb. cannot run on the ravel with non-solid indirect array
int *elemr(array a, int idx){
if (a->type==function) return a->func(a,idx);
else return a->data+idx;
}
int *elema(array a, int ind[]){
int idx = 0;
int i;
for (i=0; i<a->rank; i++){
idx += ind[i] * a->weight[i];
}
idx += a->cons;
return elemr(a,idx);
}
int *elemv(array a, va_list ap){
int idx = 0;
int i;
for (i=0; i<a->rank; i++){
int ind;
ind = va_arg(ap, int);
idx += ind * a->weight[i];
}
idx += a->cons;
return elemr(a,idx);
}
int *elem(array a, ...){
va_list ap;
int *z;
va_start(ap,a);
z = elemv(a,ap);
va_end(ap);
return z;
}
// elem(a,i,j) -> elem(a,j,i)
void transpose2(array a){
int t;
t = a->dims[0],
a->dims[0] = a->dims[1],
a->dims[1] = t;
t = a->weight[0],
a->weight[0] = a->weight[1],
a->weight[1] = t;
}
// rotate indices by shift amount
void transpose(array a, int shift){
int i;
int t;
while(shift){
if (shift>0){
t=a->dims[0];
for(i=1; i<a->rank; i++)
a->dims[i-1]=a->dims[i];
a->dims[a->rank-1]=t;
t=a->weight[0];
for(i=1; i<a->rank; i++)
a->weight[i-1]=a->weight[i];
a->weight[a->rank-1]=t;
--shift;
} else {
t=a->dims[a->rank-1];
for (i=a->rank-2; i>=0; i--)
a->dims[i+1]=a->dims[i];
a->dims[0]=t;
t=a->weight[a->rank-1];
for (i=a->rank-2; i>=0; i--)
a->weight[i+1]=a->weight[i];
a->weight[0]=t;
++shift;
}
}
}
// select new order of indexing with array of dimension indices
void transposea(array a, int spec[]){
int dims[a->rank];
int weight[a->rank];
int i;
for (i=0; i<a->rank; i++){
dims[i] = a->dims[spec[i]];
weight[i] = a->weight[spec[i]];
}
memcpy(a->dims, dims, a->rank*sizeof(int));
memcpy(a->weight, weight, a->rank*sizeof(int));
}
// return new indirect array of one item of array
array slice(array a, int i){
int rank = a->rank-1;
array z=malloc(sizeof(struct ar)
+ (rank+rank)*sizeof(int));
z->rank = rank;
z->dims = (int *)(z+1);
z->weight = z->dims + z->rank;
memcpy(z->dims, a->dims+1, z->rank*sizeof(int));
memcpy(z->weight, a->weight+1, z->rank*sizeof(int));
z->cons = i*a->weight[0];
z->data = a->data;
return z;
}
// return new indirect array selecting a single item (if 0<=spec[i]<dims[i])
// or all items (if spec[i]==-1) from each dimension
array slicea(array a, int spec[]){
int i,j;
int rank;
for (i=0, rank=0; i<a->rank; i++)
rank += spec[i]==-1;
int dims[rank];
int weight[rank];
for (i=0,j=0; i<rank; i++,j++){
while (spec[j]!=-1) j++;
if (j>=a->rank) break;
dims[i] = a->dims[i];
weight[i] = a->weight[j];
}
array z = cast_dims(a->data, rank, dims);
memcpy(z->weight,weight,rank*sizeof(int));
for (j=0; j<a->rank; j++){
if (spec[j]!=-1)
z->cons += spec[j] * a->weight[j];
}
return z;
}
// select a contiguous range from s[i] to f[i] of each dimension
array slices(array a, int s[], int f[]){
int rank = 0;
int i;
for (i=0; i<a->rank; i++){
rank += s[i] != f[i];
}
int dims[rank];
int weight[rank];
int j=0;
for (i=0; i<rank; i++){
while (s[j]==f[j]) ++j;
dims[i] = 1 + (s[j]<f[j] ? f[j]-s[j] : s[j]-f[j] );
weight[i] = s[j]<f[j] ? a->weight[j] : -a->weight[j];
++j;
}
array z = cast_dims(a->data, rank, dims);
memcpy(z->weight, weight, rank*sizeof(int));
for (i=0; i<a->rank; i++){
z->cons += s[i] * a->weight[i];
}
return z;
}
// prepend extra unit axes
// extend(vector(...),1) -> 1xN row vector
array extend(array a, int extra){
int rank = a->rank + extra;
int dims[rank];
int i;
for (i=0; i<extra; i++)
dims[i] = 1;
memcpy(dims+extra, a->dims, a->rank*sizeof(int));
return cast_dims(a->data, rank, dims);
}
// yield ravelled concatenation of two arrays
array cat(array x, array y){
int xsz = productdims(x->rank,x->dims);
int ysz = productdims(y->rank,y->dims);
int datasz = xsz + ysz;
array z=array_new(datasz);
int scratch[x->rank+y->rank];
int i;
for (i=0; i<xsz; i++)
*elem(z,i) = *elema(x,vector_index(i,x->dims,x->rank,scratch));
for (i=0; i<ysz; i++)
*elem(z,xsz+i) = *elema(y,vector_index(i,y->dims,y->rank,scratch));
return z;
}
// generate a j-vector
// which yields iota values as a function of argument indices
array iota(int n){
#if 0
array z = array_new(n);
int i;
for (i=0; i<n; i++)
*elem(z,i) = i;
return z;
#endif
return array_new_function(1,&n,(int[]){1,0},2,j_vector);
}
// generate a 1 element vector, ie. a scalar array object
array scalar(int n){
array z = array_new(1);
*elem(z,0) = n;
return z;
}
// create a vector array object initialized with variable arguments
array (vector)(int n, ...){
va_list ap;
array z = array_new(n);
int i;
va_start(ap,n);
for (i=0; i<n; i++)
*elem(z,i) = va_arg(ap, int);
va_end(ap);
return z;
}
int issolid(array a){
int i,x;
for (i=a->rank-1,x=1; i>=0; i--){
if (a->weight[i] != x)
return 0;
x *= a->dims[i];
}
return 1;
}
array makesolid(array a){
if (a->type==function || !issolid(a))
return copy(a);
return a;
}
#ifdef TESTMODULE
#include <stdlib.h>
#include <string.h>
#include "minunit.h"
int tests_run = 0;
static char *test_basic(){
array a = array_new_dims(1, (int[]){4});
*elem(a,3) = 12;
test_case(*elem(a,3)!=12);
array b = array_new(4,5);
*elem(b,3,4) = 5;
test_case(*elem(b,3,4)!=5);
array c = iota(4);
test_case(*elem(c,3)!=3);
//array d = iota(64);
//array e = cast(d->data, 2,2,2,2,2,2); // no longer works with j_vector iota
//test_case(*elem(e, 1,1,1,1,1,1) != 63);
//array f = cast(d->data, 4,4,4); // no longer works with j_vector iota
//test_case(*elem(f, 3,3,3) != 63);
return 0;
}
static char *all_tests(){
mu_run_test(test_basic);
return 0;
}
int main(){
char *result=all_tests();
if (result != 0) {
printf("%s\n",result);
} else {
printf("ALL TESTS PASSED\n");
}
printf("Tests run: %d\n", tests_run);
return result != 0;
}
#endif //defined TESTMODULE
/* file: minunit.h
cf.http://www.jera.com/techinfo/jtns/jtn002.html */
#define mu_assert(message, test) do { if (!(test)) return message; } while (0)
#define mu_run_test(test) do { char *message = test(); tests_run++; \
if (message) return message; } while (0)
#define test_case(c) do { if(c)return #c; } while(0)
extern int tests_run;
Additional related modules have been posted for review here and in comp.lang.c.
1 Answer 1
Tangential request: Please pick useful (long if necessary) names for your source files. A very good start would be to take the "decoder ring" you posted to comp.lang.c and make the appropriate substitutions throughout.
ar ==> array
en ==> encoding
st ==> symbol_table
wd ==> scanner (or perhaps "lexer", if that's what you mean)
vb ==> verbs (or perhaps "operators", or perhaps "verbs_and_operators")
You're using __VA_ARGS__
, so this must be C99 (or better). You should take advantage of C99's "declaration at point of use" syntax: i.e., instead of the K&R-C-looking
int productdims(int rank, int dims[]){
int i,z=1;
for (i=0; i<rank; i++)
z *= dims[i];
return z;
}
you should write
int product_of_dims(int rank, const int *dims)
{
int z = 1;
for (int i=0; i < rank; ++i) {
z *= dims[i];
}
return z;
}
Notice that in addition to declaring i
within the for-loop, I also
const
-qualified things that can safely beconst
- added whitespace, for readability
- added curly braces around even one-line control structures, for consistency
- replaced
[]
with*
in parameter declarations;[]
is a misleading syntax, since what's being passed is not an array but merely a pointer.
I'll rewrite your next function, too, to show what it could be like:
array *new_array_with_dims(int rank, const int *dims)
{
const int datasz = product_of_dims(rank, dims);
struct array *z = malloc(sizeof *z + (2*rank+datasz)*sizeof(int));
z->type = normal;
z->rank = rank;
z->cons = 0;
z->dims = (int *)(z + 1);
memmove(z->dims, dims, rank * sizeof(int));
z->weight = z->dims + rank;
for (int i = 0, wgt = 1; i < rank; ++i) {
z->weight[rank-i-1] = wgt;
wgt *= z->dims[rank-i-1];
}
z->data = z->weight + rank;
return z;
}
From 23 lines (21 non-blank) to 21 (17 non-blank), and with a nice self-explanatory name (eliminating one additional comment line), and properly const
-qualified.
I split up the initializations so that we conceptually initialize z
first, then z->dims
, then z->weight
, and finally z->data
. This isn't so much of a "good style" versus "bad style" thing; it was just a personal organizational choice.
Notice also that I quietly eliminated the confusing typedef struct ar *array
in favor of typedef struct array array
. Strongly prefer never to create an opaque typedef for a type that isn't a value type. The reader, being human, will instinctively assume that if he has an array a
and creates a copy b = a
, now he has a copy. If it turns out that what he really has is a handle to a
's data (not a new array at all), that's going to be frustrating to him.
Whereas, every C programmer can be expected to know that if you make a copy of array *a
with b = a
, you get a copy of the pointer. That's much more understandable.
If you are going to use typedefs for struct types, it's good style to write typedef struct X { ... } X;
for every struct type in your program. That'll save the maintainer from having to look up for each type X
whether he needs to be writing struct X
or merely X
; the answer is always "merely X
". (The C++ language does this for you automatically.)
API design issue: You should avoid creating a pair of names array_new
(macro) and array_new
(function) that do different things. The Standard Library provides macro/function pairs such as printf
and isdigit
, but it always makes sure that the macro and the function have exactly the same semantics. You've made it so that array_new(2,2)
creates a 2x2 array but (array_new)(2,2)
segfaults. This is very bad design.
Call the helper function something like array_new_helper
or array_new_impl
, so that nobody will ever call it by accident (e.g., through a function pointer, or through a helper macro that wraps its arguments in parentheses for safety).
int t;
t = a->dims[0],
a->dims[0] = a->dims[1],
a->dims[1] = t;
Looks like you need a function void swap(int*, int*)
!
See also Implement generic swap macro in C. Writing three lines of code by hand every time you need to swap two values will bite you eventually, if you do it long enough.
I'm sure there's more to improve, but I'm out of time. This has probably given you a lot to go on, anyway.
-
\$\begingroup\$ Very useful stuff. I like the reorganized
array_new_with_dims
very much. For the swap example, I felt that it helped motivate the following function which uses the same idea in a more complicated way. But, that function is probably better implemented with 3memcpy
s and a temp array. ... I'm very much torn about the implicit pointer in the typedef. I feel it can greatly reduce the number of*
s I need to type throughout the project, and it's limited to the 4 major abstractions (array, symbol-table, verb, abstract-number), but I at least should explain this convention in the README. \$\endgroup\$luser droog– luser droog2016年03月07日 01:44:06 +00:00Commented Mar 7, 2016 at 1:44 -
\$\begingroup\$ If "reducing the number of
*
in the source" is your goal, may I suggest Java? ;) C (and C++) have a nice default convention that anything ending in*
is a pointer/handle/reference type and anything not ending in*
is a value type; breaking that convention is generally a bad idea. At least maybe change the names fromar / array
toarray / arrayptr
? But then you're not saving any characters; you're just writing outptr
where I would have said*
. And consider: iftypedef array *arrayptr;
, then what is the type ofconst arrayptr p
? If you guessedconst array *
, you're wrong... \$\endgroup\$Quuxplusone– Quuxplusone2016年03月07日 07:14:33 +00:00Commented Mar 7, 2016 at 7:14 -
\$\begingroup\$ I have to concede that it may cause issues with
const
. I'm still learning how to use it. But for the rest, I think my specific setup makes this usage appropriate (or at least less inappropriate) than C code in the wild. eg the tally function. The array is initialized witharray a = getptr(w);
and used with the->
operator. So the fact that it's a pointer isn't hidden, but de-emphasized. ... But I know everyone probably says their own code should be the exception... \$\endgroup\$luser droog– luser droog2016年03月07日 07:50:57 +00:00Commented Mar 7, 2016 at 7:50 -
\$\begingroup\$ latest commit reflects everything, I think, except the implicit pointer which I'm still hemming and hawing over... found/fixed a bug in
slicea
usingi
instead ofj
. ...Very pleased with these improvements. Thank you. \$\endgroup\$luser droog– luser droog2016年03月08日 12:20:41 +00:00Commented Mar 8, 2016 at 12:20