Scalability of running commands from user input

Question 1

Here is some code I have that has been extracted and shrunk down from a project of mine.

#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <string.h>
#define streq(x, y) (strcmp((x), (y)) == 0)
#define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))
typedef struct
{
 const char *cmd;
 void* (*fn)(void);
} __attribute__((__packed__)) Command;
void* getTime(void)
{
 return ((void*)((uintptr_t)time(NULL)));
}
void* getDay(void)
{
 time_t t = time(NULL);
 struct tm tm = *localtime(&t);
 return ((void*)((uintptr_t)(tm.tm_wday)));
}
static Command commands[] =
{
 {"time", getTime},
 {"day", getDay},
};
int main(int argc, char *argv[])
{
 for (int i = 0; i < ARRAY_SIZE(commands); ++i)
 {
 Command *p = commands+i;
 if (streq(argv[1], p->cmd)) printf("%d\n", (int)p->fn());
 }
}

When run with some given input, it runs the method associated with that input. Since this is a scaled down version of my project, it can only run two commands.

$ ./test-command day
4

Right now, my implementation should run with \$ O \left( n\right)\$ time complexity. This would be unacceptable with the hundreds of thousands of commands my project could scale to. I am looking for reviews on scalability and optimization, and how well my Command structure and function pointer template would scale as well.

Question 2

Sort your array and bin search it, no?

Question 3

@vnp Would a binary search be better than a hash search?

Question 4

It is much simpler. Hundreds of thousands of commands translate to 30 or so binary steps. If even that is unacceptable, maybe hash is a way to go. However, the hash function must be very very good, for any conflict resolution incurs significant cost.

Question 5

@vnp Sounds like you should be writing an answer ;)

Question 6

Sorting the command list prior to compile time is not that much of a burden. If there a propensity to fail to do so, a simple run-time check in the DEBUG version of code would detect it.
Hash is the way to go for high speed, but a simple binary search would be useful to begin with to iron out initial code development issues.
To help with string compare speeds have done this ((x[0] == y[0]) && (strcmp((x), (y)) == 0)), but this may be premature optimization.
Minor: Suggest replacing void* with some large unsigned integer like uintmax_t as the generic return type for the various functions void* (*fn)(void). Or better yet, a union of the various types.
```
typedef union {
 time_t t;
 void *p ;
 int i;
} Return_T;
```
Minor: localtime() could return NULL. Best to test function return value before dereference.
Minor: Suggest const. static const Command commands[] = ....
Minor: Best to make array indexes size_t: size_t i = 0; ... Command *p = commands+i;
Minor: Casting (uintptr_t)time(NULL) and then to int in printf("%d\n", (int)p->fn()); can easily lose significant data. See note #4.

bsearch() example: something like the following untested code

static int compar(const void *a, const void *b) {
 const char *key = (const char *) a;
 const Command *element = (const Command *) b;
 return strcmp(key, element->cmd);
}
void main(int argc, char *argv[]) {
 Command *match = bsearch(argv[1], commands, ARRAY_SIZE(commands),
 sizeof commands[0], compar);
 if (match != NULL) {
 printf("%d\n", (int) (match->fn()));
 }
}

Question 7

I could not get your bsearch() example to work for me.

Question 8

@syb0rg Sorry can not hep more right now, maybe tomorrow. Good luck. Try cplusplus.com/reference/cstdlib/bsearch

Question 9

An advice #3 is surely of a wrong kind. A cost of avoiding a function call is a conditional: it not taken, a function call follows; if taken, the pipeline is flushed anyway. Where's a win? Besides, strcmp is most likely inlined.

Question 10

@vnp I doubt you would agree but most processors in 2014 are not pipelined. Consider embedded processors quantities are over 1 Billion per year. These simpler devices do benefit from code as suggested. In any case, "may be premature optimization" already addresses the concern that such a tactic may indeed be of negative value. As with many optimizations, looking at the order of complexity (hash, bsearch,, etc.) of the algorithm typically trumps linear speed improvements.

Question 11

As requested.

An unstructured array may only be searched in linear order. To get better asymptotics you need a more search-friendly structure. A simplest solution would be to presort the Command array, and apply a binary search to find the command. A definite advantage here is a possibility to use standard library. You may even have the array sorted during build time.

For hundreds of thousands of commands the binary search would complete in about 20 lookups. If this is unacceptable, and your commands' names let you come up with a perfect hash function, hash is a way to go. Beware that a hash function must be really perfect - that is, almost no conflicts and almost no misses. Otherwise, the asymptotic constant would defeat all the advantages of O(1).

Update: assuming a unixish environment, the best way to sort at the build time is to have a commands.c file as

static Commands commands[] = {
#include "command-list.c"
};

with a command-list.txt text file

"time", getTime
"day", getDay

and a Makefile rule saying something like (may require some backslashes)

%.c: %.txt
 sort $< | sed -e 's/.*/{&},/' > $@

Question 12

What would be the best way to sort the array during build time? I would prefer not to do it manually.

Question 13

Terminology snipe: A perfect hash function means no conflicts at all. Technical snipe: The binary search is 20 random lookups. Even a bad hashing scheme can be made to do quite well here if you store all the command names, in hash order, in a single string.

Question 14

@tmyklebu You should form an answer out of that. It's something I would upvote.

Question 15

@syb0rg: chux already nailed it.

chux chux 36.1k2 gold badges43 silver badges96 bronze badges · Accepted Answer · 2014-06-16 23:57:16Z

Sorting the command list prior to compile time is not that much of a burden. If there a propensity to fail to do so, a simple run-time check in the DEBUG version of code would detect it.
Hash is the way to go for high speed, but a simple binary search would be useful to begin with to iron out initial code development issues.
To help with string compare speeds have done this ((x[0] == y[0]) && (strcmp((x), (y)) == 0)), but this may be premature optimization.
Minor: Suggest replacing void* with some large unsigned integer like uintmax_t as the generic return type for the various functions void* (*fn)(void). Or better yet, a union of the various types.
```
typedef union {
 time_t t;
 void *p ;
 int i;
} Return_T;
```
Minor: localtime() could return NULL. Best to test function return value before dereference.
Minor: Suggest const. static const Command commands[] = ....
Minor: Best to make array indexes size_t: size_t i = 0; ... Command *p = commands+i;
Minor: Casting (uintptr_t)time(NULL) and then to int in printf("%d\n", (int)p->fn()); can easily lose significant data. See note #4.

bsearch() example: something like the following untested code

static int compar(const void *a, const void *b) {
 const char *key = (const char *) a;
 const Command *element = (const Command *) b;
 return strcmp(key, element->cmd);
}
void main(int argc, char *argv[]) {
 Command *match = bsearch(argv[1], commands, ARRAY_SIZE(commands),
 sizeof commands[0], compar);
 if (match != NULL) {
 printf("%d\n", (int) (match->fn()));
 }
}

@syb0rg Sorry can not hep more right now, maybe tomorrow. Good luck. Try cplusplus.com/reference/cstdlib/bsearch
An advice #3 is surely of a wrong kind. A cost of avoiding a function call is a conditional: it not taken, a function call follows; if taken, the pipeline is flushed anyway. Where's a win? Besides, strcmp is most likely inlined.
@vnp I doubt you would agree but most processors in 2014 are not pipelined. Consider embedded processors quantities are over 1 Billion per year. These simpler devices do benefit from code as suggested. In any case, "may be premature optimization" already addresses the concern that such a tactic may indeed be of negative value. As with many optimizations, looking at the order of complexity (hash, bsearch,, etc.) of the algorithm typically trumps linear speed improvements.

Stack Exchange Network

Scalability of running commands from user input

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Scalability of running commands from user input

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions