Function for reading bits

Question 1

I've made this simple function for reading N bits from an unsigned char* (with a possible offset) and I'm willing to share it here, get a review, suggestions for improving it, etc...

typedef struct bits_t {
 unsigned char *data;
 unsigned int len;
} bits_t;
bits_t *read_bits(unsigned char *src, int bits_offset, int nbits){
 unsigned int curr_bit, curr_byte, remaining_to_read, bit_position_in_byte;
 curr_bit = curr_byte = 0;
 remaining_to_read = nbits;
 bits_t *bits = malloc(sizeof(bits_t));
 bits->len = nbits;
 bits->data = calloc(1, bits->len);
 for(curr_bit = 1; curr_bit <= nbits; curr_bit++){
 if(curr_bit >= bits_offset && remaining_to_read){
 curr_byte = (curr_bit - 1) / 8;
 bit_position_in_byte = (curr_bit - 1) - (curr_byte * 8);
 bits->data[remaining_to_read - 1] = read_bit(src[curr_byte], bit_position_in_byte);
 remaining_to_read--;
 }
 }
 return bits;
}

Main:

unsigned char *x = "oo";
bits_t *bits = read_bits(x, 0, 16);
for(i = 0; i < bits->len; i++){
 printf("%d", bits->data[i]);
}

Result: 0110111101101111

Question 2

1. Your function is never called. 2. Have a look at this other question (or other similar questions) and note that your code could be made much shorter and faster.

Question 3

1. I'm just showing how I'm calling my function. Obviously, I'm doing that inside main... 2. Err... Do you even know what my function is doing?

Question 4

Thanks. I think it would be more efficient if you determined the input chars to iterate over and then placed the appropriate values (mostly one input byte at a time) into your struct. This should reduce the index calculations

Question 5

Your read_bit implementation seems to be missing. Is it a macro or an actual function?

Question 6

Use size_t rather than unsigned int for len and nbits. Maybe also bits_offset.

Question 7

Consider checking src[currentByte] for '0円' and abort when it's true (apparently the passed in string wasn't long enough). Depends on your usage though.
Alternatively to the previous point you could add a source_length parameter to let the caller supply the information of how long the sequence really is.

I would consider changing the interface slightly. Have some methods which manage the object for you:

bits_t *new_bits_t(int nbits)
{
 bits_t *res = malloc(sizeof(bits_t));
 res->data = calloc(nbits);
 res->len = nbits;
 return res;
}
// frees the memory allocated for the structure and sets the reference to NULL
void delete_bits_t(bits_t **bits)
{
 if (bits == NULL || *bits == NULL) return;
 free((*bits)->data);
 free(*bits);
 *bits = NULL;
}

Then your read_bits function could fill in the structure passed in and also return error code in case the reading failed (i.e. source is too short):

int read_bits(unsigned char *src, bits_t *bits, int bits_offset)
{
 ...
}

bit_position_in_byte = (curr_bit - 1) - (curr_byte * 8); should be equivalent to bit_position_in_byte = (curr_bit - 1) % 8;
Your loop is one based for some reason but inside the loop you have to subtract 1 from current_bit everywhere. You should just make your loop 0 based which would de-clutter the loop a bit.
I would find bit_in_byte just as descriptive a name as bit_position_in_byte.

You could get rid of the remaining_to_read by calculating the position:

for (curr_bit = 0; curr_bit < nbits; ++curr_bit) {
 input_bit = curr_bit + bits_offset - 1;
 input_byte = input_bit / 8;
 bit_in_byte = input_bit % 8;
 bits->data[nbits - curr_bit - 1] = read_bit(src[input_byte], bit_in_byte);
}

The loop apparently fills data from the end which seems little bit unexpected to me (kind of string reversal).

Question 8

Using _t as a suffix is reserved by POSIX. Also the structure is unnecessary. The len field is essentially unused - the caller knows the length already. So just allocate and return an array. Note that you need to check that the allocation succeeded.

Parameter src should be const

Why is everything unsigned except nbits? I would make all apart from the chars signed - making things unsigned adds nothing here and so is just 'noise'.

Variables should generally be defined one per line and initialised immediately - although it is better to define them at the point of first use.

It is normal to start indexing at 0 rather than 1. And your loop should start at bits_offset, rather than starting at 1 and indexing through.

Here is a revised version. Note the use of % to get the bit offset.

#include <stdlib.h>
#include <stdio.h>
static unsigned char* read_bits(const unsigned char *src, int offset, int nbits)
{
 unsigned char *s = malloc((size_t) nbits);
 if (s) {
 --nbits;
 for (int bit = offset; nbits >= 0; ++bit, --nbits){
 int byte = bit / 8;
 int bit_position = bit % 8;
 s[nbits] = (src[byte] >> bit_position) & 1;
 }
 }
 return s;
}

Question 9

Good, but I think you've misinterpreted the function's intent, which is to interpret arbitrary data as numeric bits. I think unsigned char *bits_big_endian(const char *src, int bit_offset, unsigned int nbits) would be a more appropriate signature. Data, being non-numeric, are most naturally char[], not unsigned char[]. @alexandernst wants the output as 0 and 1, not '0' and '1'. A numeric bit would be unsigned char, and a numeric bit string should not be terminated with '0円'.

Question 10

@200_success thanks, you are quite right. I modified it to fit the spec.

Question 11

Sorry for not replying yesterday, I was busy. I tried your code but it won't return anything actually. Have a look here: ideone.com/v1BJCV

Question 12

Alex, the code you tried was the edited version above but with my original main (which assumed a string was returned). I removed main from the code above because your original main works with this code. My original code, which output a string can be seen if you click on the "edited 21 hours ago" link below the code.

ChrisWue ChrisWue 20.6k4 gold badges42 silver badges107 bronze badges · Answer 1 · 2013-11-18 08:41:31Z

Consider checking src[currentByte] for '0円' and abort when it's true (apparently the passed in string wasn't long enough). Depends on your usage though.
Alternatively to the previous point you could add a source_length parameter to let the caller supply the information of how long the sequence really is.

I would consider changing the interface slightly. Have some methods which manage the object for you:

bits_t *new_bits_t(int nbits)
{
 bits_t *res = malloc(sizeof(bits_t));
 res->data = calloc(nbits);
 res->len = nbits;
 return res;
}
// frees the memory allocated for the structure and sets the reference to NULL
void delete_bits_t(bits_t **bits)
{
 if (bits == NULL || *bits == NULL) return;
 free((*bits)->data);
 free(*bits);
 *bits = NULL;
}

Then your read_bits function could fill in the structure passed in and also return error code in case the reading failed (i.e. source is too short):

int read_bits(unsigned char *src, bits_t *bits, int bits_offset)
{
 ...
}

bit_position_in_byte = (curr_bit - 1) - (curr_byte * 8); should be equivalent to bit_position_in_byte = (curr_bit - 1) % 8;
Your loop is one based for some reason but inside the loop you have to subtract 1 from current_bit everywhere. You should just make your loop 0 based which would de-clutter the loop a bit.
I would find bit_in_byte just as descriptive a name as bit_position_in_byte.

You could get rid of the remaining_to_read by calculating the position:

for (curr_bit = 0; curr_bit < nbits; ++curr_bit) {
 input_bit = curr_bit + bits_offset - 1;
 input_byte = input_bit / 8;
 bit_in_byte = input_bit % 8;
 bits->data[nbits - curr_bit - 1] = read_bit(src[input_byte], bit_in_byte);
}

The loop apparently fills data from the end which seems little bit unexpected to me (kind of string reversal).

William Morris William Morris 9,38419 silver badges42 bronze badges · Answer 2 · 2013-11-18 19:49:04Z

Using _t as a suffix is reserved by POSIX. Also the structure is unnecessary. The len field is essentially unused - the caller knows the length already. So just allocate and return an array. Note that you need to check that the allocation succeeded.

Parameter src should be const

Why is everything unsigned except nbits? I would make all apart from the chars signed - making things unsigned adds nothing here and so is just 'noise'.

Variables should generally be defined one per line and initialised immediately - although it is better to define them at the point of first use.

It is normal to start indexing at 0 rather than 1. And your loop should start at bits_offset, rather than starting at 1 and indexing through.

Here is a revised version. Note the use of % to get the bit offset.

#include <stdlib.h>
#include <stdio.h>
static unsigned char* read_bits(const unsigned char *src, int offset, int nbits)
{
 unsigned char *s = malloc((size_t) nbits);
 if (s) {
 --nbits;
 for (int bit = offset; nbits >= 0; ++bit, --nbits){
 int byte = bit / 8;
 int bit_position = bit % 8;
 s[nbits] = (src[byte] >> bit_position) & 1;
 }
 }
 return s;
}

Good, but I think you've misinterpreted the function's intent, which is to interpret arbitrary data as numeric bits. I think unsigned char *bits_big_endian(const char *src, int bit_offset, unsigned int nbits) would be a more appropriate signature. Data, being non-numeric, are most naturally char[], not unsigned char[]. @alexandernst wants the output as 0 and 1, not '0' and '1'. A numeric bit would be unsigned char, and a numeric bit string should not be terminated with '0円'.
@200_success thanks, you are quite right. I modified it to fit the spec.
Sorry for not replying yesterday, I was busy. I tried your code but it won't return anything actually. Have a look here: ideone.com/v1BJCV
Alex, the code you tried was the edited version above but with my original main (which assumed a string was returned). I removed main from the code above because your original main works with this code. My original code, which output a string can be seen if you click on the "edited 21 hours ago" link below the code.

Stack Exchange Network

Function for reading bits

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Function for reading bits

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions