Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

mauro-idsia/blip

Repository files navigation

blip

This is the "Bayesian network Learning Improved Project" (blip), an open-source Java package that offers a wide range of structure learning algorithms. It is developed my Mauro Scanagatta and it is distributed under the LGPL-3 by IDSIA.

It focuses on score-based learning, mainly the BIC and the BDeu score functions, and allows the user to learn BNs from datasets containing thousands of variables. It provides state-of-the-art algortihms for the following tasks: parent set identification ( BIC ), general structure optimization (WINASOBS-ENT), bounded treewidth structure optimization (KMAX) and structure learning on incomplete data sets (SEM-KMAX).

An R binding is also available: (https://github.com/mauro-idsia/r.blip).

References

This package implements the algorithms detailed in the following papers:

Usage

The process of learning a bounded-treewidth BN is explained by using the "child" network as example.

Dataset format

The format for the initial dataset has to be the same as the file "child-5000.dat", namely a space-separated file containing:

* First line: list of variables names, separated by space;
* Second line: list of variables cardinalities, separated by space;
* Following lines: list of values taken by the variables in each datapoint, separated by space.

Parent set identification

The first step is build the parent sets score cache. The state-of-the-art approach is to use BIC* (for the BIC score):

java -jar blip.jar scorer.is -d data/child-5000.dat -j data/child-5000.jkl -t 10 -b 0 

Main options:

  • -d VAL : Datafile input path (.dat format)
  • -j VAL : Parent set scores output file (.jkl format)
  • -t N : Maximum time limit, in seconds (default: 10)
  • -b N : Number of machine cores to use - if 0, all are used (default: 1)

General structure optimization

Given the parent sets score cache, now it is time to learn the structure. The state-of-the-art approach is to use WINASOBS (Windows operator applied to ASOBS) with ENT (entropy-based) ordering:

java -jar blip.jar solver.winasobs.adv -smp ent -d data/child-5000.dat -j data/child-5000.jkl -r data/child.wa.res -t 10 -b 0

Main options:

  • -smp VAL : Advanced sampler (possible values: std, mi, ent, r_mi, r_ent)
  • -d VAL : Datafile input path (.dat format)
  • -j N : Parent set scores input file (.jkl format)
  • -r VAL : Structure output file (.res format)
  • -t N : Maximum time limit, in seconds (default: 10)
  • -b N : Number of machine cores to use - if 0, all are used (default: 1)

Bounded-treewidth structure optimization

Given the parent sets score cache, it is possible to learn a structure under a bounded treewidth constraints. The state-of-the-art approach is to use k-max:

For perfoming with k-max:

java -jar blip.jar solver.kmax -w 4 -j data/child-5000.jkl -r data/child-5000.kmax.res -t 10 -b 0

Main options:

  • -w N : Maximum treewidth allowed
  • -j N : Parent set scores input file (.jkl format)
  • -r VAL : Structure output file (.res format)
  • -t N : Maximum time limit, in seconds (default: 10)
  • -b N : Number of machine cores to use - if 0, all are used (default: 1)

Structure learning from incomplete data sets

To learn a structure from data containing missing values the state-of-the-art approach is to use SEM-kMAX:

java -jar blip.jar imputation.sem -d data/child-5000-missing.dat -o data/child-5000-imputed.dat -r data/child.res -t 1 -tmp data/tmp -w 6 -b 0

Main options:

  • -d VAL : Datafile (with missing valus) input path (.dat format)
  • -o VAL : Datafile (with imputed values) output path (.dat format)
  • -r VAL : Structure output file (.res format)
  • -t N : Time regulation parameter (default: 1)
  • -tmp VAL : Temporary directory
  • -w N : Learning treewidth (default: 6)
  • -b N : Number of machine cores to use - if 0, all are used (default: 1)

Interpreting the result

The format of the ".res" file is as follows: each line indicates the parent set assigned to each variable and its score.

For example the line "4: -2797.39 (10,17,18)" indicates that to the variable with index 4 in the dataset are assgined as parents the variables with index (10,17,18). This parent set has score -2797.39 (by default the score function is the BIC).

Learn the parameters

Using the structure found it is possible to learn the parameters with:

java -jar blip.jar parle -d data/child-5000.dat -r data/child-5000.kmax.res -n data/child-5000.kmax.uai

Main options:

  • -d VAL : Datafile input path (.dat format)
  • -r VAL : Structure input file (.res format)
  • -n VAL : BN output file (.uai format)

The final output will be a full Bayesian network in UAI format.

About

Bayesian network Learning Improved Project

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

Languages

AltStyle によって変換されたページ (->オリジナル) /