I've made this simple function for reading N bits from an unsigned char*
(with a possible offset) and I'm willing to share it here, get a review, suggestions for improving it, etc...
typedef struct bits_t {
unsigned char *data;
unsigned int len;
} bits_t;
bits_t *read_bits(unsigned char *src, int bits_offset, int nbits){
unsigned int curr_bit, curr_byte, remaining_to_read, bit_position_in_byte;
curr_bit = curr_byte = 0;
remaining_to_read = nbits;
bits_t *bits = malloc(sizeof(bits_t));
bits->len = nbits;
bits->data = calloc(1, bits->len);
for(curr_bit = 1; curr_bit <= nbits; curr_bit++){
if(curr_bit >= bits_offset && remaining_to_read){
curr_byte = (curr_bit - 1) / 8;
bit_position_in_byte = (curr_bit - 1) - (curr_byte * 8);
bits->data[remaining_to_read - 1] = read_bit(src[curr_byte], bit_position_in_byte);
remaining_to_read--;
}
}
return bits;
}
Main:
unsigned char *x = "oo";
bits_t *bits = read_bits(x, 0, 16);
for(i = 0; i < bits->len; i++){
printf("%d", bits->data[i]);
}
Result: 0110111101101111
2 Answers 2
- Consider checking
src[currentByte]
for'0円'
and abort when it's true (apparently the passed in string wasn't long enough). Depends on your usage though. - Alternatively to the previous point you could add a
source_length
parameter to let the caller supply the information of how long the sequence really is. I would consider changing the interface slightly. Have some methods which manage the object for you:
bits_t *new_bits_t(int nbits) { bits_t *res = malloc(sizeof(bits_t)); res->data = calloc(nbits); res->len = nbits; return res; } // frees the memory allocated for the structure and sets the reference to NULL void delete_bits_t(bits_t **bits) { if (bits == NULL || *bits == NULL) return; free((*bits)->data); free(*bits); *bits = NULL; }
Then your
read_bits
function could fill in the structure passed in and also return error code in case the reading failed (i.e. source is too short):int read_bits(unsigned char *src, bits_t *bits, int bits_offset) { ... }
bit_position_in_byte = (curr_bit - 1) - (curr_byte * 8);
should be equivalent tobit_position_in_byte = (curr_bit - 1) % 8;
- Your loop is one based for some reason but inside the loop you have to subtract 1 from
current_bit
everywhere. You should just make your loop 0 based which would de-clutter the loop a bit. - I would find
bit_in_byte
just as descriptive a name asbit_position_in_byte
. You could get rid of the
remaining_to_read
by calculating the position:for (curr_bit = 0; curr_bit < nbits; ++curr_bit) { input_bit = curr_bit + bits_offset - 1; input_byte = input_bit / 8; bit_in_byte = input_bit % 8; bits->data[nbits - curr_bit - 1] = read_bit(src[input_byte], bit_in_byte); }
- The loop apparently fills
data
from the end which seems little bit unexpected to me (kind of string reversal).
Using _t
as a suffix is reserved by POSIX. Also the structure is
unnecessary. The len
field is essentially unused - the caller knows the
length already. So just allocate and return an array. Note that you need to
check that the allocation succeeded.
Parameter src
should be const
Why is everything unsigned except nbits
? I would make all apart from the chars signed -
making things unsigned adds nothing here and so is just 'noise'.
Variables should generally be defined one per line and initialised immediately - although it is better to define them at the point of first use.
It is normal to start indexing at 0 rather than 1. And your loop should start
at bits_offset
, rather than starting at 1 and indexing through.
Here is a revised version. Note the use of %
to get the bit offset.
#include <stdlib.h>
#include <stdio.h>
static unsigned char* read_bits(const unsigned char *src, int offset, int nbits)
{
unsigned char *s = malloc((size_t) nbits);
if (s) {
--nbits;
for (int bit = offset; nbits >= 0; ++bit, --nbits){
int byte = bit / 8;
int bit_position = bit % 8;
s[nbits] = (src[byte] >> bit_position) & 1;
}
}
return s;
}
-
\$\begingroup\$ Good, but I think you've misinterpreted the function's intent, which is to interpret arbitrary data as numeric bits. I think
unsigned char *bits_big_endian(const char *src, int bit_offset, unsigned int nbits)
would be a more appropriate signature. Data, being non-numeric, are most naturallychar[]
, notunsigned char[]
. @alexandernst wants the output as0
and1
, not'0'
and'1'
. A numeric bit would beunsigned char
, and a numeric bit string should not be terminated with'0円'
. \$\endgroup\$200_success– 200_success2013年11月18日 20:28:04 +00:00Commented Nov 18, 2013 at 20:28 -
\$\begingroup\$ @200_success thanks, you are quite right. I modified it to fit the spec. \$\endgroup\$William Morris– William Morris2013年11月18日 23:06:20 +00:00Commented Nov 18, 2013 at 23:06
-
\$\begingroup\$ Sorry for not replying yesterday, I was busy. I tried your code but it won't return anything actually. Have a look here: ideone.com/v1BJCV \$\endgroup\$alexandernst– alexandernst2013年11月19日 17:25:04 +00:00Commented Nov 19, 2013 at 17:25
-
\$\begingroup\$ Alex, the code you tried was the edited version above but with my original
main
(which assumed a string was returned). I removedmain
from the code above because your originalmain
works with this code. My original code, which output a string can be seen if you click on the "edited 21 hours ago" link below the code. \$\endgroup\$William Morris– William Morris2013年11月19日 20:29:51 +00:00Commented Nov 19, 2013 at 20:29
read_bit
implementation seems to be missing. Is it a macro or an actual function? \$\endgroup\$size_t
rather thanunsigned int
forlen
andnbits
. Maybe alsobits_offset
. \$\endgroup\$