I was asked this problem in an interview, and this is the solution I came up with. I was told it's not the most efficient solution, but I can't think of any other solution. This is the problem.
Problem: Given a byte array and a fixed 3-bit pattern 0b101, we need to count the number of times this 3-bit pattern appears within the array. The pattern can span across byte boundaries, meaning it can be formed using bits from consecutive bytes.
int count_pattern_occurrences(const uint8_t* array, size_t size, uint8_t pattern = 0b101) {
    if (size == 0 || pattern > 0b111) {
        return 0; // Array is empty or invalid pattern
    }
    int count = 0;
    uint32_t value = 0; // To hold the accumulated bits
    int bit_count = 0;
    for (size_t i = 0; i < size; ++i) {
        value = (value << 8) | array[i];
        bit_count += 8;
        // Check within the current byte
        for (int bit_pos = bit_count - 8; bit_pos <= bit_count - 3; ++bit_pos) {
            if (((value >> (bit_count - 3 - bit_pos)) & 0b111) == pattern) {
                ++count;
            }
        }
        // Prepare for the next iteration, keeping only the last 2 bits
        value &= 0b11;
        bit_count = 2;
    }
    return count;
}
I aimed to minimize space usage by implementing this with O(1) space complexity. However, I'm not sure if there's a more efficient way than iterating through the byte arrays. Is it possible to optimize this further, or am I missing something?
8 Answers
The OP's code is clean and quite readable.
Kudos for its presentation (seriously!).
The function's name, purpose and variables are unmistakable, and newer compiler capabilities ("binary constants") have been used.
"Efficiency" seems to be the primary focus behind this question. Careful examination of the operations suggests there might be a "better way"...
"Table Lookup" of, in this case, 10-bit patterns (2^10 = 1024 elements) would be very fast. That technique could rapidly gobble through all the array's bytes of data. The downside to a LUT is the tedious and error-prone manual population of such a single-purpose table, OR writing code to generate the 1024 values. The latter approach entails solving the coding problem for 1024 values, then capturing its output for conversion to a compile-time table. The unknown "use case" would have to justify the work. (And it would be a solution for only one specific search pattern.) Additionally, this is tractable for 8+2 = 10 bits; it doesn't scale well once the target pattern pushes the index beyond 10 bits to 12, 18 or ?? bits...
Proving ground
This reviewer admits to difficulty "tracking" the values of too many variables, especially when differences with both constants and changing variable values are stirred into the mix. I prefer a simpler approach...
Below, I've taken the OP's code, lightly changed only a few places in order to use my older compiler, and added two things: 1) A simple "test harness" that deals with a tiny array of only 2 bytes and 2) a revision of the OP's code that still loops in search of the target pattern. (Written as C, but should be good-to-go as C++, I hope.)
Unfortunately, this has shown that there is something not-quite-right in the OP's code. The comparison of the two approaches reveals a discrepancy when the two array values combine to form 0b0000 0001 0100 0000 (320 = 256 + 64), and the discrepancy obviously continues for higher array values.
Here is the full code, presented "as is" for the OP to study and consider. In brief, it uses a shifting "window" to isolate contiguous regions of a 10-bit register (sliding left-to-right), counting matches of the target pattern. (I've not been sitting in an "interview situation" when writing this. NO(!) denigration of the OP or that code is to be inferred from this post, please!) I'd be happy to respond to any questions or comments about this post.
#include <stdio.h>
#include <stdint.h>

int count_pattern_occurrences(const uint8_t* array, size_t size, uint8_t pattern) {
    pattern = 0x5;
    if (size == 0 || pattern > 0x7) {
        return 0; // Array is empty or invalid pattern
    }
    int count = 0;
    uint32_t value = 0; // To hold the accumulated bits
    int bit_count = 0;
    for (size_t i = 0; i < size; ++i) {
        value = (value << 8) | array[i];
        bit_count += 8;
        // Check within the current byte
        for (int bit_pos = bit_count - 8; bit_pos <= bit_count - 3; ++bit_pos) {
            if (((value >> (bit_count - 3 - bit_pos)) & 0x7) == pattern) {
                ++count;
            }
        }
        // Prepare for the next iteration, keeping only the last 2 bits
        value &= 0x3;
        bit_count = 2;
    }
    return count;
}

int occurs_101( uint8_t arr[], size_t size ) {
    int cnt = 0;
    uint32_t bits = 0;
    for( size_t i = 0; i < size; ++i ) {
        bits = ((bits & 0x3) << 8) | arr[i];
        uint32_t pattern = 0x5 << 8;
        uint32_t msk = 0x7 << 8;
        do {
            pattern >>= 1;
            msk >>= 1;
            cnt += (bits & msk) == pattern;
        } while( !(msk & 1) );
    }
    return cnt;
}

int main( void ) {
    uint8_t arr[2] = { 0 };
    do {
        do {
            int v0 = count_pattern_occurrences( arr, 2, 0x5 );
            int v1 = occurs_101( arr, 2 );
            if( v0 != v1 ) {
                printf( "%5d: %d %d\n", ((int)arr[0]<<8)+arr[1], v0, v1 );
                getchar();
            }
        } while( ++arr[1] != 0 );
    } while( ++arr[0] != 0 );
    puts( "All done!" );
    return 0;
}
Beyond this, it becomes apparent that searching through large arrays might benefit from gathering up several bytes into a uint64_t or even a uint128_t, as has been suggested in another answer, then using some shifting and XOR operations to process many bits simultaneously, followed by some form of 'bit count' on the resulting value. Again, the "use case" would need to be defined to decide how much time and effort should be invested in the problem.
In conclusion, the OP's code is clean and, with some pencil work, could be checked for its functionality. The highest priority has to be the correctness of the code. (And I admit that I've not checked my own version against all 65536 possible combinations of two 8-bit inputs. This I leave for the OP, or you, the gentle reader, to perform.)
Worthwhile?
Revisiting code that's been left to simmer for a few hours is a good way to shape it into what might be considered "a keeper." Here's a slightly more compact version of the code above:
int occurs_101a( uint8_t arr[], size_t size ) {
    int cnt = 0;
    uint32_t bits = 0;
    for( size_t i = 0; i < size; ++i ) {
        bits = ((bits << 8) | arr[i] ) & 0x3FF;
        for( uint32_t wrk = bits; wrk >= 0x5; wrk >>= 1 )
            cnt += (wrk & 0x7) == 0x5;
    }
    return cnt;
}
It doesn't matter at which end of any 10-bit sample register the tabulation begins, so the previous "mask" does not need to be manipulated (shifted). The above also allows "early determination of contiguous high-order clear bits", which precludes some fruitless looping.
Worthwhile?
Overboard?
int occurs_101b( uint8_t arr[], size_t size ) {
    int cnt = 0;
    uint32_t b = 0, w; // bits register and working register
    for( size_t i = 0; i < size; ++i ) {
        b = ((b << 8) | arr[i]) & 0x3FF;
        w = ((((b >> 1) & ~b) >> 1) & b) & 0xFF; // 0b101
        for( ; w; cnt++ ) w &= w - 1; // Brian K's algo
    }
    return cnt;
}
The above has an inner loop to include Brian Kernighan's well known "bit count" algorithm. This version is now amenable, with some tweaking, to using wider registers to combine several array bytes for simultaneous 'consolidation' and counting of instances of the target pattern. Note, however, that the target bit pattern is now expressed in bitwise operations making it slightly more difficult to discern.
Always have some fun with whatever you do.
Interview (and career) tip
Here's another version to ponder:
int occurs_101c( uint8_t arr[], size_t size ) {
    int cnt = 0;
    uint32_t b = 0; // bits register
    for( size_t i = 0; i < size; ++i ) {
        b = ((b << 8) | arr[i] ) & 0x3FF;
        cnt += ((b>>0 & 7) == 5)  // 0000000101
             + ((b>>1 & 7) == 5)  // 0000001010
             + ((b>>2 & 7) == 5)  // 0000010100
             + ((b>>3 & 7) == 5)  // 0000101000
             + ((b>>4 & 7) == 5)  // 0001010000
             + ((b>>5 & 7) == 5)  // 0010100000
             + ((b>>6 & 7) == 5)  // 0101000000
             + ((b>>7 & 7) == 5); // 1010000000
    }
    return cnt;
}
The above contains quite a few operations, but is "branchless" (apart from the required outer loop). Further, it can be easily "reasoned about" by anyone with a modicum of programming experience. In other words, it is simple to write, simple to read, and its correctness can be verified visually. Although not efficient or fast, its results will always be correct (for the problem as stated.) It's a good starting point! The worst thing a coder can produce is code that works most of the time.
Most of the answers, here, refer to the speed of using a LUT. One of those provides a function, containing some intricate but working code, that populates a LUT suitable for its implementation of the solution. The simple and verifiable function above could be called 1024 times (2^10) with 1024 bit patterns by a caller that, likewise, populates an appropriate LUT (or outputs text suitable for copy/paste into a compile-time array.)
Consider the hypothetical that there is very limited time to write code to solve a problem. Now there are two priorities: Correctness (always), and a Deadline. If the interviewer got nothing better than the code above, the interviewee can highlight its simplicity and its correctness. Incorrect or incomplete solutions both score very low marks. If judged to be "inefficient. Do over.", the code above can be used to populate that efficient LUT with correct values. Or, as demonstrated by the offered "test harness", this simple version can be used to verify the results of a "more efficient" version that uses bit manipulation only.
Aside: Storing 1024 values that vary from 0 to 4, each in a byte, can be improved upon.
Exercise: cut that memory footprint in half by storing (and retrieving) 4-bit values in an array of 512 bytes (each being a pair of "nibbles"). How does this affect performance? Can it be written "branchless"?
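One possible shape for that exercise, as a hedged sketch (the array name and helpers are mine, not from the text): pack two 4-bit counts per byte and recover them with a shift and a mask, no branches needed.

```c
#include <stdint.h>

/* Sketch of the nibble-packing exercise: two 4-bit counts per byte.
   Counts for the 0b101 pattern never exceed 4, so 4 bits suffice.
   packed[], put_nibble and get_nibble are illustrative names. */
static uint8_t packed[512];

static void put_nibble(uint32_t idx, uint8_t val)   /* idx in 0..1023 */
{
    uint32_t byte = idx >> 1;
    uint32_t shift = (idx & 1) * 4;                 /* low or high nibble */
    packed[byte] = (uint8_t)((packed[byte] & (0xF0 >> shift)) | (val << shift));
}

static uint8_t get_nibble(uint32_t idx)             /* branchless retrieval */
{
    return (packed[idx >> 1] >> ((idx & 1) * 4)) & 0xF;
}
```

The retrieval costs one extra shift compared with a plain byte array; whether the halved footprint pays for itself is exactly the cache question the exercise is probing.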
I recommend not walking away from this problem just because it seems to be solved. It is by exploring alternatives that one adds to, or strengthens, one's intuition and storehouse of "on-hand" solutions.
- Jason Goemaat (Jul 31, 2024): Is there a standard way to represent an array of bytes as a sequence of bits? I think [0x01, 0x80] could represent bit 0 being set in the first byte and bit 6 being set in the second byte (becoming bit 14).
- Fe2O3 (Jul 31, 2024): @JasonGoemaat Hi there. By convention, 1 byte is (typically) 8 bits labelled (left-to-right) b7 to b0. This easily extends to 1 "word" (double bytes) labelled b15 to b0, and so on as hardware advances increase the widths of integer datatypes. Big vs Little Endian describes two CPU architectures, so one has to know how data is processed by the target implementation. This Q&A may serve to explain some considerations. At the end of the day, it's "caveat emptor". There are multiple "standards", so there is no real "standard"... Cheers! :-)
- Jason Goemaat (Jul 31, 2024): It would probably be good to state the way you were proposing to handle it and ask for clarification. I think they like to see your thought processes as much as anything.
- Fe2O3 (Jul 31, 2024): @JasonGoemaat I do not understand your comment. What is the "it" to be handled? What is proposed and where?
- Jason Goemaat (Jul 31, 2024): 'It' is the way to parse the array of bytes into a bit stream. "I plan to treat the bits of each byte from high to low so that bits 1 and 0 of byte 0 match up with bit 8 of byte 1 when checking for the pattern. This way the byte array would form a large big endian number when combined. I could also treat the bits of each byte from low to high so that bits 7 and 8 of byte 0 match up with bit 0 of byte 1. Which way would you prefer?"
I see basically two categories of solution that aren't bit-by-bit:
- A table-based approach.
- A word-oriented approach.
I will get to them, but using this as a hook:
I tried to write it in a generic way just in case a follow up question would come up. Is it a problem?
"Problem" is a bit much, but inherently you will tend to lose opportunities for simplification, or to put it another way, you're making the problem more difficult to solve.
For example, let's say you use a table-based approach, using a table with 1024 entries to process 10 bits: 8 bits belonging to the current byte, plus two extra bits: the last two bits of the previous byte. You would form the index into the table by concatenating those things. There is a somewhat annoying case at the start of the array (or the end, depending on how you arrange things), where we must avoid finding instances of the pattern that start before the array. If we know that the pattern is 101, then that automatically doesn't happen if the "left over bits" are initialized as zero. If the pattern can be 000 or other annoying values, it takes some subtle and possibly bug-prone logic to avoid finding pattern matches that would include one or two phantom bits that aren't part of the array.
A word-oriented approach grabs a bigger chunk of the array at once and uses a couple of bitwise operations to turn those bits from the array into a mask indicating where the pattern matches. In the simplest case, if we were looking for the single-bit pattern 1, the data itself would already be a mask of where the pattern matches (and there wouldn't be any complications from patterns that cross a word boundary); all that remains is to std::popcount the data word-by-word and sum the results. If we were looking for 0, we first invert the data, and use std::popcount(~word).
For longer patterns, some shifts and ANDs enter the picture. For example, to get the mask indicating where the pattern 101 matches, the "core" of the solution could be:
mask = data & (~data << 1) & (data << 2)
It's possible to support variable patterns, using XOR to do conditional inversion. Of course that adds some additional complexity.
You could shift right instead of left. Does that matter? Not really, not at this point.
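For illustration, the XOR-based conditional inversion might look like this (my sketch, not the answer's code): x ^ (b - 1) leaves x alone when b == 1 and complements it when b == 0, so each pattern bit selects either data or ~data.

```c
#include <stdint.h>

/* Sketch: match mask for a variable 3-bit pattern p (bit 2 of p is
   the leftmost pattern bit). Bit k of the result marks a window whose
   top bit is data bit k, as in the fixed-pattern formula above. */
static uint64_t match_mask(uint64_t data, unsigned p)
{
    uint64_t t2 = data ^ (((uint64_t)((p >> 2) & 1)) - 1); /* window top bit */
    uint64_t t1 = data ^ (((uint64_t)((p >> 1) & 1)) - 1); /* middle bit */
    uint64_t t0 = data ^ (((uint64_t)(p & 1)) - 1);        /* bottom bit */
    return t2 & (t1 << 1) & (t0 << 2);
}
```

With p == 0b101 the three XOR masks become 0, ~0, 0, and the expression reduces exactly to data & (~data << 1) & (data << 2).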
I've seemingly ignored the "pattern can span across boundaries" problem here, but for the pattern 101 there is an elegant (perhaps not the fastest, but nice and simple) solution: work 7 bytes at a time (if you're using uint64_t as your "word"), using 2 of the 8 spare bits that leaves in a uint64_t to hold the two extra bits from the adjacent word. That leaves 6 "padding" bits that are in the word but in which no matches must be counted, but here's a nice thing about the pattern 101: that doesn't matter; there will never be a match involving those 6 zeroes. There could have been if a pattern was allowed to end in a zero, though that is not hard to solve: just add an extra bitwise AND to get rid of the invalid matches.
Another option to handle patterns that cross across word boundaries is handling them explicitly, counting them separately from the rest.
If you have access to the kind of shift where you can supply an extra operand of bits to shift into the top/bottom, you can use that to work a full word at the time.
What about SIMD?
Most of the "raw processing performance" of a modern high-performance CPU (by which I roughly mean, basically anything that isn't a basic microcontroller; even some fancy microcontrollers have ARM NEON) is locked behind its SIMD/vector processing capabilities, so whether we can use that deserves some thought.
A word-oriented approach is probably more amenable to a SIMD implementation, either manual but especially for auto-vectorization (compilers are, so far, not very eager to turn a table-lookup into a variable-shuffle).
It's not that SIMD cannot do table lookups; depending on the ISA, it can: there is for example tbx/tbl in ARM NEON and SVE, and pshufb (SSSE3), vpermb (AVX512), and vpermt2b (AVX512) in x86/x64-land. But a small table means there's not a lot of progress made per lookup; for example, a lookup into a 16-entry (4-bit index) table only processes 2 positions if the pattern has a length of 3, and a lookup into a 64-entry (6-bit index) table (e.g. vpermb in AVX512) processes 4 positions. That's something, but it's not really reasonable to use a 1024-entry table.
The word-oriented approach mostly "just works" in SIMD. Working 7 bytes at a time is a bit awkward but possible (just shuffle the bytes); maybe it works out better to count the word-boundary-crossing patterns separately, or with AVX512 you could use the "concatenate and shift" instructions such as vpshldq.
- Fe2O3 (Aug 1, 2024): This is a good, in-depth introduction to "better" solutions, even getting into advanced techniques with some depth (that is over my head.) However, this page states that a good "answer" will mention at least one thing in "review" of the OP's code. Just tryin' to "play by the rules"... Cheers!
- user555045 (Aug 1, 2024): @Fe2O3 look man you're welcome to drop a downvote
- Fe2O3 (Aug 1, 2024): The pop-up text for DV says "This answer is not useful." I completely disagree with that! You've written, imho, a very, very good & useful response for any/all who read it. Just thought I'd bring up what is written in the Help Center guidelines.
There's an assumption about what it means for the bit pattern to "span bytes" - we should document which endianness we're working to. This is part of the requirements-gathering which is often at the heart of good interview questions.
The code is missing some type definitions. Although I can't be sure, my guess is that these types are probably intended to be the corresponding names from std:
#include <cstddef>
#include <cstdint>
using std::size_t;
using std::uint8_t;
using std::uint32_t;
Note that std::uint8_t and std::uint32_t are optionally-defined types - they don't exist on targets that don't have native types of exactly the specified size, so portable code does not use them.
The C-style (pointer+size) interface is limiting. I'd expect a C++ function to accept an input range:
auto count_pattern_occurrences(std::ranges::input_range auto bytes)
More crucially, think about the return type. Given that the input length is a std::size_t, we'll need at least that much range in the return type. And given that there can be as many as four matches per input element, we need to think about how we'll handle overflow in the result - perhaps by throwing, or maybe by saturating at maximum?
Although the problem says that the pattern to search for will be fixed (0b101), the code here is perhaps prematurely generalised. Knowledge of the pattern allows a number of simplifications that are not available to the more general code (e.g. we get to eliminate the bit_count variable - or we could replace the inner loop with masks and a popcnt instruction).
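A sketch of that last suggestion (mine, fixed to the 0b101 pattern; __builtin_popcount is a GCC/Clang builtin standing in for std::popcount or a popcnt instruction): with the pattern known, the inner loop over shift positions collapses into two shifts, two ANDs and a population count per byte.

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch: accumulate 2 carry bits plus the new byte into a 10-bit
   register, then count matches with a bitwise mask and a popcount
   instead of the inner shift loop. */
static int count_101_popcount(const uint8_t *a, size_t size)
{
    int count = 0;
    uint32_t bits = 0;                 /* 2 carry bits + current byte */
    for (size_t i = 0; i < size; ++i) {
        bits = ((bits & 0x3) << 8) | a[i];
        /* bit k set iff bits k, k-1, k-2 form 1, 0, 1; the 0x3FC mask
           keeps only windows lying fully inside the 10-bit sample */
        uint32_t m = bits & (~bits << 1) & (bits << 2) & 0x3FC;
        count += __builtin_popcount(m);
    }
    return count;
}
```

Each byte contributes its 6 internal windows plus the 2 windows shared with the previous byte's carried bits, so nothing is counted twice.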
value &= 0b11 is unnecessary, given that bits outside that mask get shifted out of range of our consideration anyway. And std::uint32_t is overkill for value given we only need 10 bits there - I suggest std::uint_fast16_t instead.
I would have liked to have seen at least the beginnings of some unit tests for the function, particularly given the non-obvious shift amount prior to masking. In fact, if I were doing this in an interview, I'd at least talk about how well it's suited to test-driven development.
- Ralf Kleberhoff (Jul 30, 2024): +1 for "endianness" and "requirements-gathering". IMHO, the problem text might be intentionally left ambiguous, inviting the candidate to detect and fill the gaps by asking the interviewer.
- Hagen von Eitzen (Aug 1, 2024): @RalfKleberhoff The right answer would begin: "I suppose we want to use a digital computer, probably based on binary technology, and perhaps we are lucky and a byte consists of 8 bits, ..." :)
- Toby Speight (Aug 1, 2024): @Hagen, even if the candidate just said something like, "this will only work on systems with 8-bit bytes," that would put them head and shoulders above the masses - it's a real opportunity to show that you know something about portability, and also that you can be pragmatic under time pressure!
- Ralf Kleberhoff (Aug 1, 2024): @HagenvonEitzen If you think this kind of question gives you the job...
- Toby Speight (Aug 3, 2024): @Ralf, I guess what you're saying is that the candidate's solution isn't the most important thing about the response to the interview question. This kind of question can be a major factor in getting the job, but it's the conversation that's more important than the code (e.g. "Yes, I agree this isn't the most efficient solution; however, it's clear and obvious, and if the benchmarking determines that it's critical, we could then spend the time to make a faster version.")
branch mispredictions
if ((value ...) == pattern) {
    ++count;
}
We have a data-dependent branch decision to make here, which essentially looks random to the predictor. Best it can do is predict that we usually skip the increment.
Use https://godbolt.org to learn whether your compiler was smart enough to rewrite this. If not, phrase it as count += (value ...) == pattern. That is, unconditionally add a boolean expression, to avoid mispredictions.
table based approach
int count_pattern_occurrences( ... , uint8_t pattern = 0b101) {
That pattern default is nice, but solving a more general problem can make it difficult to optimize a quite specific one.
not the most efficient solution
I'm going to assume that means "not the fastest" solution.
One can finagle "O(1) space complexity" quite a bit. Lookup tables may be relevant here.
Consider that loop.
Six of the first eight iterations examine strictly bits within a byte.
Allocate a 256-byte lookup table of zeros,
and assign a positive count to entries where the 0b101
pattern is embedded,
e.g. entries 5 and 13 are part of that set.
This solves part of the problem.
When scanning data in array,
- add lookup value to counter
- if data value ends with 10, then increment if next byte's LSB is set
- if data value's MSB is set, then increment if next byte's first two bits are 01
Notice that we can add a small value (as much as 3) to the counter and increment based on the next byte's masked value.
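Pulling those steps together (my sketch, not the answer's code; it follows the low-order-bit-first stream ordering the steps above imply, and the table and function names are mine):

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of the scan described above: a 256-entry table of in-byte
   0b101 counts, plus two explicit checks against the *next* byte for
   windows that span the seam. Because 0b101 is a palindrome, the
   in-byte window test reads the same in either bit order. */
static uint8_t in_byte[256];

static void build_in_byte(void)
{
    for (int v = 0; v < 256; ++v)
        for (int s = 0; s <= 5; ++s)            /* 6 windows per byte */
            in_byte[v] += ((v >> s) & 0x7) == 0x5;
}

static int count_101_scan(const uint8_t *a, size_t size)
{
    int count = 0;
    for (size_t i = 0; i < size; ++i) {
        count += in_byte[a[i]];                  /* whole-byte matches */
        if (i + 1 < size) {
            /* stream ends with "10" (bit6 = 1, bit7 = 0); the pattern
               completes if the next byte's LSB is set */
            count += (a[i] & 0xC0) == 0x40 && (a[i + 1] & 0x01) != 0;
            /* MSB set; completes if the next byte's first two stream
               bits are "01", i.e. bit0 = 0 and bit1 = 1 */
            count += (a[i] & 0x80) != 0 && (a[i + 1] & 0x03) == 0x02;
        }
    }
    return count;
}
```

Note that the two seam checks (but not the in-byte table) would change under the MSB-first ordering used by other answers on this page; the two conventions can legitimately give different totals for the same bytes.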
- Karl Knechtel (Aug 1, 2024): If == is indeed guaranteed to return 1 for equality and 0 for non-equality - and has a way to do this without branching - then why shouldn't the compiler be able to perform this optimization automatically?
- J_H (Aug 1, 2024): @KarlKnechtel, it's a boolean situation. Either the OP's (undisclosed) compiler version and flags produced branching code, or it didn't, and one could speculate until the cows come home. I recommend the use of godbolt.org to learn the details for the situation of interest.
Not an answer to the optimization question, but a remark that might be useful:
The problem text is quite vague. To a professional developer, some questions need clarification before an implementation can be done:
- The byte array is to be interpreted as a sequence of bits. How are the bits ordered? Starting with the highest or the lowest bit of each byte?
- Are overlapping pattern occurrences to be counted or skipped? Are there one or two matches of 101 in 00010101?
- Is code to be optimized for readability, for space efficiency, or for time efficiency? And when going for time efficiency, what is the expected usage pattern (how often called, what is a typical array size)?
If your interviewers are senior developers, they'll expect questions like the ones above, showing that you think (and don't hesitate to ask for clarification) before coding, making sure to implement the requirements that have to be met, and not some mis-interpretation.
IMHO, a good developer isn't necessarily the one with the fastest code, but the one who most reliably produces code that fulfills the various requirements.
- Toby Speight (Aug 1, 2024): This is a very good answer, even though it doesn't directly comment on the specific code. I hinted at requirements-gathering in my answer, but this really expands on that. I think all good interview questions have ambiguous specifications, precisely to give the candidate opportunity to demonstrate their technical communication skills and their reasoning process more than is the case with tightly-defined challenges.
I'd personally solve this with a lookup table of 256 elements. Each entry maps a byte value onto the number of '101' bit patterns which appear in that value.
To deal with the problem of bit patterns spanning bytes, the lookup table also holds information on whether a byte starts with a 'partial pattern', i.e. the start or end of the pattern '101'. We can use this to see if the previous byte ended with '1'/'10' while the next byte start with '01'/'1' (respectively).
This, in my opinion, leads to code which is fairly easy to reason about, while only taking a few instructions for each byte. You could improve on the speed by considering multiple bytes at a time / using SIMD, but this is the version I'd write in an interview (and talk about alternatives in the abstract).
#include <stdio.h>
#include <stdint.h>

/**
 * Holds a single entry of our lookup table
 */
typedef struct
{
    /// The total number of '101' bit patterns which appear in the byte.
    /// E.g. the byte '0101 1010' would have a count of '2'.
    /// We count overlapping patterns of '10101' as having a count of '2'.
    uint8_t count;

    /// The number of bits of a partial pattern which appear at the start of the byte.
    /// If the byte starts with '01', this takes the value '2'; if the byte starts with
    /// '1', this takes the value '1'.
    uint8_t startPartial;

    /// The number of bits of a partial pattern which appear at the end of a byte.
    /// If the byte ends with '10', this takes the value '2'; if the byte ends with
    /// '1', this takes the value '1'.
    uint8_t endPartial;
} LookupEntry_t;

/// A lookup from byte value -> information about that value
static LookupEntry_t s_lookup[256];

static void BuildLookup(void)
{
    for (int i = 0; i < 256; i++)
    {
        LookupEntry_t* pEntry = &s_lookup[i];

        // Partial pattern at the start of the byte
        if ((i & 0x80) > 0)
        {
            // Starts with '1'
            pEntry->startPartial = 1;
        }
        else if ((i & 0xC0) == 0x40)
        {
            // Starts with '01'
            pEntry->startPartial = 2;
        }

        // Partial pattern at the end of the byte
        if ((i & 0x01) > 0)
        {
            // Ends with '1'
            pEntry->endPartial = 1;
        }
        else if ((i & 0x03) == 0x02)
        {
            // Ends with '10'
            pEntry->endPartial = 2;
        }

        // Number of '101' patterns which appear in the byte
        for (int j = 0; j < 6; j++)
        {
            if (((i >> j) & 0x07) == 0x05)
            {
                pEntry->count++;
            }
        }
    }
}

int main(void)
{
    BuildLookup();

    // 0010 1010 1000 0101
    uint8_t input[] = { 0x2A, 0x85 };

    uint32_t count = 0;
    uint8_t prevEndPartial = 0;
    for (size_t i = 0; i < sizeof(input); i++)
    {
        const LookupEntry_t* pEntry = &s_lookup[input[i]];
        count += pEntry->count;

        // If the previous byte ends with 2 bits of a pattern, and we start with 1 bit;
        // or if the previous byte ends with 1 bit of a pattern and we start with 2 bits,
        // then they combine to form a '101' which is bridged between the two bytes.
        if ((prevEndPartial + pEntry->startPartial) == 3)
        {
            count++;
        }
        prevEndPartial = pEntry->endPartial;
    }

    printf("Total: %u\n", count);
    return 0;
}
Note that this is pure C, not C++.
Run online.
- Fe2O3 (Aug 5, 2024): Super trivial comment (because I'm bored): pEntry = &s_lookup[... demonstrates mixing camelCase with snake_case variable names. ("trivial", right?!) Code is very readable and works just fine! I'm just surprised that no-one has pointed this out, so far... Have a great day! Cheers! :-)
- canton7 (Aug 6, 2024): @Fe2O3 I'd never really thought about it in those terms... It's my company coding standard, and I've seen that convention used elsewhere as well. Similar to a _t suffix for typedefs even when using camel case, I guess.
- Fe2O3 (Aug 6, 2024): Devil's advocate here! If pEntry is a pointer, then sLookup would signal a struct, no? Cheers! :-)
- canton7 (Aug 7, 2024): @Fe2O3 It's denoting a static rather than a struct. Not dissimilar to the m_ convention for member variables (in C++ and similar, rather than in C, of course). I've also seen g_ for globals.
- Fe2O3 (Aug 7, 2024): Many codebases generally try to follow some sort of convention, but, AFAIK, there is no single accepted standard... Me? I try to use the likes of enum { eZero, eOne, eTwo }; That hasn't caught on yet, though... Cheers! :-)
I think a precomputed 10-bit lookup table is the way to go. This reduces the number of operations to a minimum and is thus likely much faster. It might introduce caching issues. As always, only benchmarking in the appropriate environment will tell how it compares in practice.
unsigned count = 0;
uint16_t x = 0;
for ( size_t i = 0; i < n; ++i ) {
    x = ( ( x & 0x3 ) << 8 ) | buf[ i ];
    count += count_lookup[ x ];
}
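The snippet above assumes count_lookup has already been filled in. One way to build it (my sketch): each entry counts the 0b101 windows among the 8 positions of the 10-bit index that use at least one bit of the new byte.

```c
#include <stdint.h>

/* Sketch of a generator for the 1024-entry table assumed above:
   index = (2 carry bits << 8) | byte; value = number of 0b101
   windows among the 8 positions ending in the new byte. */
static uint8_t count_lookup[1024];

static void build_count_lookup(void)
{
    for (uint32_t x = 0; x < 1024; ++x) {
        uint8_t c = 0;
        for (int s = 0; s <= 7; ++s)
            c += ((x >> s) & 0x7) == 0x5;
        count_lookup[x] = c;
    }
}
```

Because x starts at 0 in the scanning loop, the first byte's two "phantom" carry bits are zero and can never complete a 101, so no special-casing is needed at the start of the array.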
You mention that feedback reported it was "not the most efficient." That suggests that either you, or the interviewers, were focused on efficiency/performance.
I would suggest building a set of lookup tables. Several people have mentioned 10-bit tables, but I would suggest an array of 8-bit tables instead.
With this approach, you are concerned with two questions: how many times does the pattern occur in a single byte; and what bits were available for partial matches in the lead-up to this byte. There is no concern for following bits, since those will be addressed by considering the last bits of the current byte as "lead-up" bits in the next iteration.
So, at the start you have two impossible lead-up values. I think you can just use 00 for this, since that won't cause any mismatches. But in the "general" case you might consider a special table for this. A table with 00 lead-up bits followed by 256 possible byte values will tell you how many matches you get for the first byte of the bit array. It should also tell you what the two lead-up bits are for the next table. So you can either compute the bits or store the bits. (Endianness matters here: is the 0x01 bit the first or the last bit of the byte?)
Anyway, compute 4 tables (00, 01, 10, 11 lead-up bits) of 256 values, counting the number of times 0b101 occurs in the lead-up plus 8 bits of the index. Update the lead-up and accumulator for each byte. Decide what, if anything, to do with an array size that is not divisible by 8. (Treat them as all zero, IMO.)
That will get you to byte-oriented processing, with a small number of ops per byte. You may or may not get a speed improvement out of fetching word-sized values and then breaking them into byte-sized chunks. (Actually, this is C++ so I know you don't care about code size - make the tables 16 bits instead...)
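A sketch of the byte-oriented scheme described above (my arrangement, not the answer's code; MSB-first bit order assumed, so the lead-up bits are the previous byte's two lowest bits):

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of the "4 tables of 256 values" layout: tables[lead][byte]
   counts the 0b101 windows in the 10 bits formed by the 2 lead-up
   bits followed by the byte. */
static uint8_t tables[4][256];

static void build_tables(void)
{
    for (int lead = 0; lead < 4; ++lead)
        for (int v = 0; v < 256; ++v) {
            uint32_t bits = ((uint32_t)lead << 8) | (uint32_t)v;
            for (int s = 0; s <= 7; ++s)
                tables[lead][v] += ((bits >> s) & 0x7) == 0x5;
        }
}

static int count_101_tables(const uint8_t *a, size_t size)
{
    int count = 0;
    int lead = 0;                      /* 00 start: no phantom matches */
    for (size_t i = 0; i < size; ++i) {
        count += tables[lead][a[i]];
        lead = a[i] & 0x3;             /* last 2 bits feed the next byte */
    }
    return count;
}
```

This is, as the comment below points out, just another way to address 1024 entries; the lead-up transition is computed here (lead = byte & 3) rather than stored in the table.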
- Fe2O3 (Aug 3, 2024): "Several people have mentioned 10-bit tables," Misunderstanding... Existing answers state or imply using something like lut[ 1024 ], whose element size is either unspecified or only a single byte. (One exception to that.) To recommend "4 tables ... of 256 values" is merely another way to address 1024 elements. The rest of this answer is difficult to understand.
- ... 7 as "the number of days in a week" just in case a "week" ever needed to be redefined as something other than 7 days. KISS principles dictate "worry about core requirements, not some hypothetical."
- ... 0b1x1 (where x == "don't care"), or 0b1xx1, or any other of the innumerable variations. "Sufficient unto the day is the evil thereof".