
I work a lot with byte buffers and have to extract different parts. In this example it's 4 bytes, but it ranges from a single bit to 128 bits. Speed is the most important metric here. See the code for an MWE. I'd like to know if there is a better way.

#include <stddef.h>
#include <stdint.h>

static uint32_t get_data(uint8_t *buf, size_t off)
{
    return ((uint32_t)(buf[off + 0]) << 24) +
           ((uint32_t)(buf[off + 1]) << 16) +
           ((uint32_t)(buf[off + 2]) << 8) +
           ((uint32_t)(buf[off + 3]));
}
int main(void)
{
    uint8_t buf[128];

    /* get some example data */
    for (uint8_t i = 0; i < 128; ++i)
        buf[i] = i;

    /* we want the data from offset 10 as a uint32_t */
    uint32_t res = get_data(buf, 10);
    (void)res;
}
asked Jan 11, 2018 at 19:24
  • Do you mean bit or byte at the end? Commented Jan 11, 2018 at 19:55
  • I used both bit and byte in my question intentionally, although I usually work with bytes. But in rare cases I also need to know the value of individual bits, hence the range 1 to 128. Commented Jan 11, 2018 at 20:01

2 Answers


Since you want low-level operations, I'd suggest memmove:

#include <time.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t get_data(uint8_t *buf, size_t off)
{
    return ((uint32_t)(buf[off + 0]) << 24) +
           ((uint32_t)(buf[off + 1]) << 16) +
           ((uint32_t)(buf[off + 2]) << 8) +
           ((uint32_t)(buf[off + 3]));
}

int main(void)
{
    uint8_t buf[128];

    /* get some example data */
    for (uint8_t i = 0; i < 128; ++i)
        buf[i] = i;

    clock_t t = clock();
    uint32_t res;
    for (int i = 0; i < 10000; i++)
        memmove(&res, buf + 10, sizeof(uint32_t));
    t = clock() - t;
    printf("Time %f\n", (double)t / CLOCKS_PER_SEC);

    t = clock();
    for (int i = 0; i < 10000; i++)
        res = get_data(buf, 10);
    t = clock() - t;
    printf("Time %f\n", (double)t / CLOCKS_PER_SEC);
}

Because a single copy doesn't show any measurable difference, I ran each version 10,000 times; my results were:

Time 0.000049
Time 0.000090

The memmove version was almost twice as fast.

  • EDIT 1: As mentioned in the comments, memcpy is a viable alternative to memmove.
  • EDIT 2: The speed difference in this example cannot be observed with the -O flag, as the compiler then executes the loop only once.
answered Jan 11, 2018 at 19:51
  • 2
    \$\begingroup\$ Might want to compare with memcpy, as the buffers don't overlap. \$\endgroup\$ Commented Jan 11, 2018 at 19:56
  • 1
    \$\begingroup\$ Tried this with clang, from -O0 to -O3. I see absolutly no difference from manual bit shifting to memmove/memcpy. Anyway, as it is not slower, memcpy is much shorter to write and better readable. \$\endgroup\$ Commented Jan 11, 2018 at 20:20
  • 2
    \$\begingroup\$ I think it's because -O3 understands the silly for loop and executes it only one time so as I wrote, with one copy you can't tell the difference. With 1billion in the for loop it takes 4 seconds without -O and 0.00001 sec with so I guess that's it \$\endgroup\$ Commented Jan 11, 2018 at 20:24
  • 5
    \$\begingroup\$ There is a difference between the OP's code and memmove/memcpy: The result with the latter depends on the system's byte order, whereas OP's result doesn't. \$\endgroup\$ Commented Jan 11, 2018 at 21:05
  • 1
    \$\begingroup\$ memmove to a uint32_t simply gets optimized away to mov eax, DWORD PTR [rdi+rsi] on GCC 7.2 (-O3) \$\endgroup\$ Commented Jan 11, 2018 at 21:15

I'd like to know if there is a better way.

uint32_t res = get_data(buf, 10); is a good first step, as 1) it is functionally correct and 2) it is highly portable.

Any "better" solution should use this as the baseline in which to compare/profile.

The next performance step involves some assumptions. If the buffer holds the uint32_t in the expected byte order, then a simple memcpy() will work in lieu of get_data().

memcpy(&res, buf + 10, sizeof res);

Although this may look like a function call, a worthy compiler "understands" memcpy() and can replace it with efficient inline code. Let your good compiler do its job - or get a better compiler.

If code "knows" res andrefdo not overflowmemcpy()is faster, or as fast as memmove(). IAC, a good compiler replaces either of these with in-line code for such small sizeof ref copies. mox nix


Soapbox: Overall, the core issue with this kind of micro-efficiency work is that it is unlikely to be a good investment of coding expense/effort. Spend time writing good code without employing tricks. Real efficiency improvement comes from higher-level choices than this - which can vary from implementation to implementation. You may code something faster on one select platform yet slower on the next, as the big-O() is the same.

answered Jan 12, 2018 at 4:51
