P(X = x) for continuous data is problematic #6

Open

mhoehle opened on Sep 18, 2024

DataScienceInteractivePython/Interactive_Model_Fitting.ipynb

Line 56 in adf0515

"P(X | \\hat{f}_{\\beta}) = \\prod_{\\alpha = 1}^{n} P(X_{\\alpha}|\\hat{f}_{\\beta}(X)), \\alpha = 1,\\ldots,n\n",

The notebook uses the P(X | ...) notation, which I would interpret as the conditional probability of the data. However, linear models are typically used for continuous response data, where P(X_i = x_i | ...) is zero. Instead, one would use densities, i.e. a lowercase p or f.
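To illustrate why the point probability is zero while the density is not, here is a minimal sketch using a standard normal distribution. The distribution, the evaluation point `x = 0.3` and the helper names `normal_pdf` and `normal_cdf` are assumptions chosen for illustration; none of them come from the notebook.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density f(x) of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Distribution function F(x) of N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

x = 0.3  # arbitrary evaluation point (illustrative assumption)
for eps in (1e-1, 1e-3, 1e-5):
    # P(x - eps < X <= x + eps) shrinks towards 0 as the interval collapses to a point
    prob = normal_cdf(x + eps) - normal_cdf(x - eps)
    # Dividing by the interval width recovers the density f(x) in the limit
    ratio = prob / (2.0 * eps)
    print(f"eps={eps:g}  P={prob:.3e}  P/(2*eps)={ratio:.6f}  f(x)={normal_pdf(x):.6f}")
```

The interval probability vanishes as `eps` goes to zero, whereas the ratio stabilises at the density value, which is the quantity a likelihood for continuous data is built from.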

Furthermore, since a product is used, this implies that the observations are independent of each other. Hence the statement a little further down:

OLS: - assumes that the errors have a mean of zero, constant variance and are independent of eachother (no correlation in error).

is incomplete, because the same was assumed for the ML approach.
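As a sketch of what a density-based version of the product could look like, assume (this is not taken from the notebook) a linear model $Y_i = x_i^\top \beta + \varepsilon_i$ with $\varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$. The symbols $Y_i$, $x_i$ and $\varepsilon_i$ are introduced here for illustration only.

$$
L(\beta, \sigma^2) = \prod_{i=1}^{n} f(y_i \mid x_i; \beta, \sigma^2)
= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - x_i^\top \beta)^2}{2\sigma^2} \right)
$$

$$
\log L(\beta, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - x_i^\top \beta)^2
$$

Under these assumptions the factorisation into a product relies on independence, and the common Gaussian density relies on zero-mean errors with constant variance. For fixed $\sigma^2$, maximising $\log L$ over $\beta$ is the same as minimising the residual sum of squares.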

Altogether, I find the post a little confusing. As far as I know, for a Gaussian response distribution with KNOWN $\sigma$ the OLS and ML estimates should be identical. I fail to completely understand what the exact data generating mechanism in the example is, due to the large amount of code, but for a simple normal sample $X_1,\ldots,X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$ explicit solutions should be available. As a suggestion: maybe write out the data generating mechanism more clearly in math notation.
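A minimal numerical sketch of both claims follows. The data generating mechanism used here (a simple linear regression with known `sigma`, plus a separate iid normal sample) is a hypothetical stand-in, since the notebook's actual mechanism is not clear from the issue; all parameter values and variable names are illustrative assumptions.

```python
import math
import random

random.seed(1)

# Hypothetical mechanism (an assumption, not the notebook's):
# y_i = b0 + b1 * x_i + e_i, with e_i iid N(0, sigma^2) and sigma known.
b0_true, b1_true, sigma = 1.0, 2.0, 0.5
n = 200
x = [random.uniform(0.0, 10.0) for _ in range(n)]
y = [b0_true + b1_true * xi + random.gauss(0.0, sigma) for xi in x]

# OLS in closed form for simple linear regression.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1_ols = sxy / sxx
b0_ols = y_bar - b1_ols * x_bar

def log_lik(b0, b1):
    """Gaussian log-likelihood (sum of log densities) with known sigma."""
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2.0 * math.pi * sigma**2) - rss / (2.0 * sigma**2)

# Score (gradient of the log-likelihood) evaluated at the OLS estimate.
# It vanishes up to rounding, so the OLS estimate is also the ML estimate.
resid = [yi - b0_ols - b1_ols * xi for xi, yi in zip(x, y)]
score_b0 = sum(resid) / sigma**2
score_b1 = sum(r * xi for r, xi in zip(resid, x)) / sigma**2
print(f"OLS estimate: b0={b0_ols:.4f}, b1={b1_ols:.4f}")
print(f"score at OLS estimate: ({score_b0:.2e}, {score_b1:.2e})")

# Simple iid case: Z_1, ..., Z_m ~ N(mu, s^2) has explicit ML estimates.
mu_true, s_true, m = 3.0, 2.0, 200
z = [random.gauss(mu_true, s_true) for _ in range(m)]
mu_hat = sum(z) / m                                 # sample mean
sigma2_hat = sum((zi - mu_hat) ** 2 for zi in z) / m  # divisor m, not m - 1
print(f"iid case: mu_hat={mu_hat:.4f}, sigma2_hat={sigma2_hat:.4f}")
```

The closed-form OLS solution sets the score of the Gaussian log-likelihood to zero, and in the iid case the ML estimates are the sample mean and the variance with divisor `m`.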

