I made a search method because I want to use binary data like PNG files. The point of this is to be able to find strings like IEND
in a PNG file. I think it works properly. Any improvements/fixes would be appreciated.
public static int findString(byte[] in, byte[] find) {
boolean done = false;
for(int i = 0; i < in.length; i++) {
if(in[i] == find[0]) {
for(int ii = 1; ii < find.length; i++) {
if(in[i+ii] != find[ii]) break;
else if(ii==find.length-1) done = true;
}
if(done) return i-1;
}
}
return -1;
}
4 Answers 4
If you want to look for a particular chunk of bytes in a PNG file, don't scan it naïvely byte by byte. Not only would it be slow, you could also accidentally match some data that happens to look like a marker.
Instead, you should take advantage of the chunk layout described in the PNG specification. Each chunk is tagged with a type and a length. If the tag is of a type that you are not interested in, seek ahead to the next chunk — the length field will tell you how many bytes to skip.
I recommend naming the parameters haystack
and needle
for clarity. (It's one of the few things PHP did well!)
You don't test your bounds properly. This test crashes with an ArrayIndexOutOfBoundsException
:
byte[] haystack = new byte[] { 1, 2, 3, 4, 5 };
byte[] needle = new byte[] { 4, 5, 6 };
findString(haystack, needle);
The conditional if(in[i] == find[0])
is superfluous. Just have the inner loop start with ii = 0
.
Avoid using variables like done
pointlessly. If you're done, and it's possible to return the result immediately, then do it.
I might have misunderstood something but both of these tests fail:
@Test public void test3() { byte[] in = { 1, 2, 3, 4, 5, 6 }; byte[] find = { 3, 4, 5 }; assertEquals(2, findString(in, find)); // returns -1 } @Test public void test5() { byte[] in = { 1, 2, 3, 4, 5, 6 }; byte[] find = { 4, 5, 6 }; assertEquals(3, findString(in, find)); // returns -1 }
I think you should increase
ii++
here instead ofi
:for(int ii = 1; ii < find.length; i++) {
So, try to use variables which are easer to tell apart from each other.
Anyway, don't reinvent the wheel, there is a library for that! Guava has a
Bytes
class with the following method:public static int indexOf(byte[] array, byte[] target)
Returns the start position of the first occurrence of the specified target within array, or -1 if there is no such occurrence.
It's open-source, so you can compare their code with yours.
See also: Effective Java, 2nd edition, Item 47: Know and use the libraries (The author mentions only the JDK's built-in libraries but I think the reasoning could be true for other libraries too.)
Here another implementation, probably more readable:
public static int search(byte[] input, byte[] searchedFor) {
//convert byte[] to Byte[]
Byte[] searchedForB = new Byte[searchedFor.length];
for(int x = 0; x<searchedFor.length; x++){
searchedForB[x] = searchedFor[x];
}
int idx = -1;
//search:
Deque<Byte> q = new ArrayDeque<Byte>(input.length);
for(int i=0; i<input.length; i++){
if(q.size() == searchedForB.length){
//here I can check
Byte[] cur = q.toArray(new Byte[]{});
if(Arrays.equals(cur, searchedForB)){
//found!
idx = i - searchedForB.length;
break;
} else {
//not found
q.pop();
q.addLast(input[i]);
}
} else {
q.addLast(input[i]);
}
}
return idx;
}