I'm trying to use the pgmpy package for Python to learn the parameters of a Bayesian network. If I understand expectation maximization (EM) correctly, it should be able to deal with missing values. I am currently experimenting with a 3-variable BN in which the first 500 data points have a missing value for one variable. There are no latent variables. Although the pgmpy documentation suggests that EM should work with missing values, I get an error, and it only occurs when I call the function with data points that have missing values. Am I doing something wrong, or should I look into another package for EM with missing values?

# Imports
import numpy as np
import pandas as pd
from pgmpy.estimators import BicScore, ExpectationMaximization, HillClimbSearch
from pgmpy.models import BayesianNetwork
# Read data that does not contain any missing values
data = pd.read_csv("asia10K.csv")
data = pd.DataFrame(data, columns=["Smoker", "LungCancer", "X-ray"])
test_data = data[:2000]
new_data = data[2000:].copy()  # copy, so the assignment below does not write into a slice of data
# Learn structure of initial model from data
bic = BicScore(test_data)
hc = HillClimbSearch(test_data)
model = hc.estimate(scoring_method=bic)
# Create some missing values in the first 500 rows (no chained indexing)
new_data.iloc[:500, new_data.columns.get_loc("Smoker")] = np.nan
# Learn parameterization of BN
bn = BayesianNetwork(model)
bn.fit(new_data, estimator=ExpectationMaximization, complete_samples_only=False)

The error I get is an indexing error:

 File "main.py", line 100, in <module>
 bn.fit(new_data, estimator=ExpectationMaximization, complete_samples_only=False)
 File "C:\Python38\lib\site-packages\pgmpy\models\BayesianNetwork.py", line 585, in fit
 cpds_list = _estimator.get_parameters(n_jobs=n_jobs, **kwargs)
 File "C:\Python38\lib\site-packages\pgmpy\estimators\EM.py", line 213, in get_parameters
 weighted_data = self._compute_weights(latent_card)
 File "C:\Python38\lib\site-packages\pgmpy\estimators\EM.py", line 100, in _compute_weights
 weights = df.apply(lambda t: self._get_likelihood(dict(t)), axis=1)
 File "C:\Python38\lib\site-packages\pandas\core\frame.py", line 8833, in apply
 return op.apply().__finalize__(self, method="apply")
 File "C:\Python38\lib\site-packages\pandas\core\apply.py", line 727, in apply
 return self.apply_standard()
 File "C:\Python38\lib\site-packages\pandas\core\apply.py", line 851, in apply_standard
 results, res_index = self.apply_series_generator()
 File "C:\Python38\lib\site-packages\pandas\core\apply.py", line 867, in apply_series_generator
 results[i] = self.f(v)
 File "C:\Python38\lib\site-packages\pgmpy\estimators\EM.py", line 100, in <lambda>
 weights = df.apply(lambda t: self._get_likelihood(dict(t)), axis=1)
 File "C:\Python38\lib\site-packages\pgmpy\estimators\EM.py", line 76, in _get_likelihood
 likelihood *= cpd.get_value(
 File "C:\Python38\lib\site-packages\pgmpy\factors\discrete\DiscreteFactor.py", line 195, in get_value
 return self.values[tuple(index)]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
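
The last frame indexes a NumPy array with `tuple(index)`, and this exact message is what NumPy raises when an index is a float. The sketch below reproduces the message with plain NumPy only. It assumes, without having checked the pgmpy source, that the `Smoker` values reach that line as floats: a 0/1 column that contains NaN is stored as float64, so both the NaN entries and the remaining 1.0/0.0 entries would be invalid indices. The names `values`, `column`, and `try_index` are made up for this illustration.

```python
import numpy as np

# Toy stand-in for a CPD value table over two binary variables.
values = np.zeros((2, 2))

# A 0/1 column with one missing entry is stored as float64.
column = np.array([1, 0, np.nan])
print(column.dtype)


def try_index(table, index):
    """Return the IndexError message raised by table[index], or None on success."""
    try:
        table[tuple(index)]
    except IndexError as exc:
        return str(exc)
    return None


float_error = try_index(values, (column[0], 0))      # 1.0 stored as a NumPy float
nan_error = try_index(values, (column[2], 0))        # the missing value itself
int_error = try_index(values, (int(column[0]), 0))   # a plain integer index

print(float_error)
print(nan_error)
print(int_error)
```

If this is indeed the cause, the error comes from how the missing values are encoded in the DataFrame rather than from EM itself.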

Thanks!

asked Mar 18, 2022 at 13:30

1 Answer

Since there is still no answer to your specific question, let me propose a solution with another library, pyAgrum:

# Imports
import pandas as pd
import numpy as np
import pyAgrum as gum
# Read data that does not contain any missing values
data = pd.read_csv("asia10K.csv")
# Not exactly the same column names as in the question
data = pd.DataFrame(data, columns=["smoking", "lung_cancer", "positive_XraY"])
test_data = data[:2000]
new_data = data[2000:].copy()
# Learn structure of initial model from data
learner = gum.BNLearner(test_data)
learner.useScoreBIC()
learner.useGreedyHillClimbing()
model = learner.learnBN()
# Create some missing values, marked with "?" instead of NaN (no chained indexing)
new_data.iloc[:500, new_data.columns.get_loc("smoking")] = "?"
# Learn parameterization of BN with EM
bn = gum.BayesNet(model)
learner2 = gum.BNLearner(new_data, model)
learner2.useEM(1e-10)
learner2.fitParameters(bn)

In a notebook: EM in a notebook
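
If the data already contains NaN values rather than freshly created ones, they would first need to be turned into the "?" marker used above. A minimal pandas-only sketch on a made-up toy frame (the frame `toy` and the result `marked` are hypothetical, and the labels are assumed to be strings):

```python
import numpy as np
import pandas as pd

# Toy data with string labels and one missing entry.
toy = pd.DataFrame(
    {
        "smoking": ["1", "0", np.nan, "1"],
        "lung_cancer": ["0", "0", "1", "1"],
    }
)

# Keep every observed value and put "?" wherever a value is missing.
marked = toy.astype(object).where(toy.notna(), "?")

print(marked)
```

A frame prepared this way could then take the place of `new_data` in the code above. That last step is not exercised here, since it needs pyAgrum and the original CSV file.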

answered Mar 29, 2022 at 21:58