tools/fitdata.cpp File Reference

Gather statistics from dataset for MPLSH tuning.

#include <cstdlib>
#include <gsl/gsl_multifit.h>
#include <boost/program_options.hpp>
#include <boost/progress.hpp>
#include <lshkit.h>


Namespaces

namespace tr1

Functions

bool is_good_value (double v)
int main (int argc, char *argv[])

Detailed Description

Gather statistics from dataset for MPLSH tuning.

This program gathers statistical data from a small sample dataset for automatic MPLSH parameter tuning. It carries out the following steps:

  1. Sample N points from the dataset. Only those N points will be used for future computation.
  2. Sample P pairs of points from the sample and calculate the distance for each pair.
  3. Sample Q points from the sample as query points.
  4. Divide the sample into F folds.
  5. For i = 1 to F, take i folds and run K-NN search, so the query points will be searched against sample datasets of N/F, 2N/F, ..., N points.
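
The five steps above can be sketched with the C++ standard library alone. This is an illustration, not the code in fitdata.cpp: every name below (Point, sq_dist, sample_indices, sample_pair_distances, fold_prefix_sizes, knn_distances) is hypothetical, the metric is assumed to be squared L2, the K-NN search is plain brute force, and "take i folds" is assumed to mean the first i*N/F points of the already-shuffled sample.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

using Point = std::vector<float>;  // hypothetical point type for this sketch

// Squared Euclidean distance (assumed metric; the real tool may differ).
float sq_dist(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    float s = 0.0f;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const float diff = a[d] - b[d];
        s += diff * diff;
    }
    return s;
}

// Steps 1 and 3: draw `count` distinct indices from [0, total).
std::vector<std::size_t> sample_indices(std::size_t total, std::size_t count,
                                        std::mt19937& rng) {
    assert(count <= total);
    std::vector<std::size_t> idx(total);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::shuffle(idx.begin(), idx.end(), rng);
    idx.resize(count);
    return idx;
}

// Step 2: distances for `pairs` random pairs of distinct points.
std::vector<float> sample_pair_distances(const std::vector<Point>& sample,
                                         std::size_t pairs, std::mt19937& rng) {
    assert(sample.size() >= 2);
    std::uniform_int_distribution<std::size_t> pick(0, sample.size() - 1);
    std::vector<float> out;
    out.reserve(pairs);
    while (out.size() < pairs) {
        const std::size_t a = pick(rng);
        const std::size_t b = pick(rng);
        if (a == b) continue;  // skip degenerate pairs
        out.push_back(sq_dist(sample[a], sample[b]));
    }
    return out;
}

// Steps 4 and 5: sizes of the growing subsets, i folds for i = 1..F.
std::vector<std::size_t> fold_prefix_sizes(std::size_t n, std::size_t folds) {
    assert(folds > 0);
    std::vector<std::size_t> sizes;
    for (std::size_t i = 1; i <= folds; ++i) sizes.push_back(i * n / folds);
    return sizes;
}

// Step 5: brute-force K-NN of `query` against the first `limit` points.
// Returns up to k distances in ascending order.
std::vector<float> knn_distances(const std::vector<Point>& sample,
                                 std::size_t limit, const Point& query,
                                 std::size_t k) {
    assert(limit <= sample.size());
    std::vector<float> d;
    d.reserve(limit);
    for (std::size_t i = 0; i < limit; ++i) d.push_back(sq_dist(sample[i], query));
    const std::size_t keep = std::min(k, d.size());
    std::partial_sort(d.begin(), d.begin() + keep, d.end());
    d.resize(keep);
    return d;
}
```

A driver would call sample_indices once for the N points and once for the Q queries, then loop over fold_prefix_sizes(N, F) and call knn_distances for each query at each size. The sketch does not decide whether a query may match itself in the searched subset, since the description above does not say.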

The statistical data is printed to standard output after the progress display.

Allowed options:
 -h [ --help ] produce help message.
 -N [ -- ] arg (=0) number of points to use
 -P [ -- ] arg (=50000) number of pairs to sample
 -Q [ -- ] arg (=1000) number of queries to sample
 -K [ -- ] arg (=100) search for K nearest neighbors
 -F [ -- ] arg (=10) divide the sample into F folds
 -D [ --data ] arg data file
