Fuzzy Hashing API
fuzzy.h File Reference

These functions allow a programmer to compute the fuzzy hashes (also called context-triggered piecewise hashes) of a buffer of text, the contents of a file on disk, and the contents of an open file handle. There is also a function to compute the similarity between any two fuzzy signatures.

#include <stdint.h>
#include <stdio.h>

Macros

#define  FUZZY_FLAG_ELIMSEQ   0x1u
  fuzzy_digest flag that eliminates sequences of more than three identical characters.
 
#define  FUZZY_FLAG_NOTRUNC   0x2u
  fuzzy_digest flag indicating not to truncate the second part to SPAMSUM_LENGTH/2 characters.
 
#define  SPAMSUM_LENGTH   64
 
#define  FUZZY_MAX_RESULT   (2 * SPAMSUM_LENGTH + 20)
 

Functions

struct fuzzy_state *  fuzzy_new (void)
  Construct a fuzzy_state object and return it.
 
struct fuzzy_state *  fuzzy_clone (const struct fuzzy_state *state)
  Create a copy of a fuzzy_state object and return it.
 
int  fuzzy_set_total_input_length (struct fuzzy_state *state, uint_least64_t total_fixed_length)
  Set fixed length of input.
 
int  fuzzy_update (struct fuzzy_state *state, const unsigned char *buffer, size_t buffer_size)
  Feed the data contained in the given buffer to the state.
 
int  fuzzy_digest (const struct fuzzy_state *state, char *result, unsigned int flags)
  Obtain the fuzzy hash from the state.
 
void  fuzzy_free (struct fuzzy_state *state)
  Dispose of a fuzzy state.
 
int  fuzzy_hash_buf (const unsigned char *buf, uint32_t buf_len, char *result)
  Compute the fuzzy hash of a buffer.
 
int  fuzzy_hash_file (FILE *handle, char *result)
  Compute the fuzzy hash of a file using an open handle.
 
int  fuzzy_hash_stream (FILE *handle, char *result)
  Compute the fuzzy hash of a stream using an open handle.
 
int  fuzzy_hash_filename (const char *filename, char *result)
  Compute the fuzzy hash of a file.
 
int  fuzzy_compare (const char *sig1, const char *sig2)
 

Macro Definition Documentation

FUZZY_MAX_RESULT

#define FUZZY_MAX_RESULT   (2 * SPAMSUM_LENGTH + 20)

The longest possible length of a fuzzy hash signature (without the filename).

SPAMSUM_LENGTH

#define SPAMSUM_LENGTH   64

Length of an individual fuzzy hash signature component.

Function Documentation

fuzzy_clone()

struct fuzzy_state* fuzzy_clone ( const struct fuzzy_state *  state )

Create a copy of a fuzzy_state object and return it.

The copy can be used with fuzzy_update and fuzzy_digest independently of the original. Like the original, it must be disposed of with fuzzy_free.

Parameters
state The fuzzy state
Returns
the cloned fuzzy_state or NULL on failure

fuzzy_compare()

int fuzzy_compare ( const char *  sig1,
const char *  sig2 
)

Computes the match score between two fuzzy hash signatures.

Returns
Returns a value from zero to 100 indicating the match score of the two signatures. A match score of zero indicates the signatures did not match. When an error occurs, such as if one of the inputs is NULL, returns -1.
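
The return values described above fall into three cases: an error (-1), no match (zero), and a match score from 1 to 100. The following is a minimal sketch of how a caller might sort a score into those cases. The helper classify_score, the enum match_class, and the caller-chosen threshold are hypothetical names introduced for illustration; they are not part of fuzzy.h.

```c
#include <stddef.h>

/* Hypothetical classification of a fuzzy_compare() return value. */
enum match_class { MATCH_ERROR, MATCH_NONE, MATCH_WEAK, MATCH_STRONG };

/*
 * score:     the value returned by fuzzy_compare()
 * threshold: an application-chosen cutoff in 1..100 (an assumption of
 *            this sketch; the header does not define any cutoff)
 */
enum match_class classify_score(int score, int threshold)
{
    if (score < 0 || score > 100)
        return MATCH_ERROR;   /* -1 signals an error, e.g. a NULL input */
    if (score == 0)
        return MATCH_NONE;    /* the signatures did not match */
    return score >= threshold ? MATCH_STRONG : MATCH_WEAK;
}
```

A caller would typically pass the result of fuzzy_compare(sig1, sig2) as score and handle MATCH_ERROR before looking at the other cases.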

fuzzy_digest()

int fuzzy_digest ( const struct fuzzy_state *  state,
char *  result,
unsigned int  flags 
)

Obtain the fuzzy hash from the state.

This operation does not change the state at all. It reports the hash for the concatenation of the data previously fed using fuzzy_update.

Parameters
state The fuzzy state
result Where the fuzzy hash is stored. This variable must be allocated to hold at least FUZZY_MAX_RESULT bytes.
flags A bitwise OR of FUZZY_FLAG_* macros. Pass zero when no flags are wanted.
Returns
zero on success, non-zero on error
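
FUZZY_FLAG_ELIMSEQ is described on this page as eliminating sequences of more than three identical characters. The sketch below illustrates that transformation on an ordinary string, so the effect of the flag is easier to picture. It is an illustration only, not the library's implementation, and the function name collapse_runs is hypothetical. Exactly which parts of the digest the library applies this to is not stated here.

```c
#include <stddef.h>

/*
 * Copy src to dst, keeping at most three consecutive identical
 * characters. dst must have room for strlen(src) + 1 bytes.
 * Returns the number of characters written (excluding the NUL).
 */
size_t collapse_runs(const char *src, char *dst)
{
    size_t out = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        /* Skip this character if the last three kept are all equal to it. */
        if (out >= 3 &&
            dst[out - 1] == src[i] &&
            dst[out - 2] == src[i] &&
            dst[out - 3] == src[i])
            continue;
        dst[out++] = src[i];
    }
    dst[out] = '\0';
    return out;
}
```

With this sketch, a run of five identical characters is shortened to three, while runs of three or fewer are left unchanged.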

fuzzy_free()

void fuzzy_free ( struct fuzzy_state *  state )

Dispose of a fuzzy state.

Parameters
state The fuzzy state to dispose of

fuzzy_hash_buf()

int fuzzy_hash_buf ( const unsigned char *  buf,
uint32_t  buf_len,
char *  result 
)

Compute the fuzzy hash of a buffer.

Computes the fuzzy hash of the first buf_len bytes of the buffer. It is the caller's responsibility to append the filename, if any, to result after computation.

Parameters
buf The data to be fuzzy hashed
buf_len The length of the data being hashed
result Where the fuzzy hash of buf is stored. This variable must be allocated to hold at least FUZZY_MAX_RESULT bytes.
Returns
Returns zero on success, non-zero on error.
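
Because the hashing functions leave the filename out of result, the caller has to attach it separately. The following is a minimal sketch of one way to do that. The helper name append_filename is hypothetical, and the hash,"filename" layout is an assumption modelled on the output of the ssdeep command-line tool; this header does not mandate any particular layout.

```c
#include <stdio.h>

/*
 * Write `hash,"filename"` into out.
 * hash is a NUL-terminated signature such as one stored in result by
 * fuzzy_hash_buf. Returns 0 on success, or -1 if an argument is NULL
 * or out_size is too small to hold the whole line.
 */
int append_filename(const char *hash, const char *filename,
                    char *out, size_t out_size)
{
    if (hash == NULL || filename == NULL || out == NULL)
        return -1;

    int n = snprintf(out, out_size, "%s,\"%s\"", hash, filename);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```

Sizing out as FUZZY_MAX_RESULT plus the filename length plus four bytes (comma, two quotes, and the terminating NUL) is enough for any signature that fits in FUZZY_MAX_RESULT bytes.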

fuzzy_hash_file()

int fuzzy_hash_file ( FILE *  handle,
char *  result 
)

Compute the fuzzy hash of a file using an open handle.

Computes the fuzzy hash of the contents of the open file, starting at the beginning of the file. When finished, the file pointer is returned to its original position. If an error occurs, the file pointer's value is undefined. It is the caller's responsibility to append the filename to the result after computation.

Parameters
handle Open handle to the file to be hashed
result Where the fuzzy hash of the file is stored. This variable must be allocated to hold at least FUZZY_MAX_RESULT bytes.
Returns
Returns zero on success, non-zero on error

fuzzy_hash_filename()

int fuzzy_hash_filename ( const char *  filename,
char *  result 
)

Compute the fuzzy hash of a file.

Opens, reads, and hashes the contents of the file 'filename'. The result must be allocated to hold FUZZY_MAX_RESULT characters. It is the caller's responsibility to append the filename to the result after computation.

Parameters
filename The file to be hashed
result Where the fuzzy hash of the file is stored. This variable must be allocated to hold at least FUZZY_MAX_RESULT bytes.
Returns
Returns zero on success, non-zero on error.

fuzzy_hash_stream()

int fuzzy_hash_stream ( FILE *  handle,
char *  result 
)

Compute the fuzzy hash of a stream using an open handle.

Computes the fuzzy hash of the contents of the open stream, starting at the current file position and continuing until EOF. Unlike fuzzy_hash_file, this function never seeks the stream. If an error occurs, the result as well as the file position are undefined. It is the caller's responsibility to append the filename to the result after computation.

Parameters
handle Open handle to the stream to be hashed
result Where the fuzzy hash of the file is stored. This variable must be allocated to hold at least FUZZY_MAX_RESULT bytes.
Returns
Returns zero on success, non-zero on error

fuzzy_new()

struct fuzzy_state* fuzzy_new ( void  )

Construct a fuzzy_state object and return it.

To use it, call fuzzy_update and fuzzy_digest on it. It must be disposed of with fuzzy_free.

Returns
the constructed fuzzy_state or NULL on failure

fuzzy_set_total_input_length()

int fuzzy_set_total_input_length ( struct fuzzy_state *  state,
uint_least64_t  total_fixed_length 
)

Set fixed length of input.

If the total size of the input is known in advance, setting it here speeds up the computation by restricting the range of block sizes that must be considered.

Parameters
state The fuzzy state
total_fixed_length Total length of the data to be digested
Returns
0 on success or -1 on failure
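
To give a feel for why a known input length narrows the block size range, here is a rough sketch of a spamsum-style rule: start from a minimum block size and double it until the block size times SPAMSUM_LENGTH covers the input. The minimum block size of 3, the doubling rule, and the function name estimate_block_size are assumptions made for illustration; this page does not document how the library actually selects block sizes.

```c
#include <stdint.h>

#define SPAMSUM_LENGTH 64   /* value documented on this page */
#define MIN_BLOCKSIZE  3u   /* assumption: not documented on this page */

/*
 * Return the smallest block size of the form MIN_BLOCKSIZE * 2^k such
 * that block_size * SPAMSUM_LENGTH >= total_length.
 */
uint64_t estimate_block_size(uint64_t total_length)
{
    /* ceil(total_length / SPAMSUM_LENGTH), written to avoid overflow */
    uint64_t needed = total_length / SPAMSUM_LENGTH
                    + (total_length % SPAMSUM_LENGTH != 0);
    uint64_t block_size = MIN_BLOCKSIZE;

    while (block_size < needed)
        block_size *= 2;
    return block_size;
}
```

Under these assumptions, an input of 1000 bytes yields a block size of 24, so a hasher that knows the length up front only needs to track block sizes near that value instead of every candidate.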

fuzzy_update()

int fuzzy_update ( struct fuzzy_state *  state,
const unsigned char *  buffer,
size_t  buffer_size 
)

Feed the data contained in the given buffer to the state.

When an error occurs, the state is undefined. In that case it must not be passed to any function besides fuzzy_free.

Parameters
state The fuzzy state
buffer The data to be hashed
buffer_size The length of the given buffer
Returns
zero on success, non-zero on error

Generated on Wed Sep 6 2017 10:15:46 for Fuzzy Hashing API by   doxygen 1.8.13