Fix incorrect DROP_SIZE usage #296

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Merged

jserv merged 1 commit into sysprog21:master from Mike1117:fix/drop-size

Jul 1, 2025

Merged

Fix incorrect DROP_SIZE usage #296

jserv merged 1 commit into sysprog21:master from Mike1117:fix/drop-size

Jul 1, 2025

Conversation

Mike1117

Copy link

Contributor

@Mike1117 Mike1117 commented Jun 28, 2025

Previously, measure() only recorded timings for indices in the middle range [DROP_SIZE, N_MEASURES - DROP_SIZE), while update_statistics() assumed all entries were available from index 0.

This mismatch led to statistical analysis on unmeasured (zero-filled) samples, potentially skewing results or preventing detection thresholds from being reached.

Now:

measure() records all samples
update_statistics() discards DROP_SIZE samples on both ends
Sample accounting matches ENOUGH_MEASURE estimation

Change-Id: Ibb1515043da5f56d72fe34fd5c78e2283df9a993

@Mike1117 Mike1117 changed the title ~~(削除) Fix: correct DROP_SIZE usage (削除ここまで)~~ (追記) Fix incorrect DROP_SIZE usage (追記ここまで)

Jun 28, 2025

@Mike1117 Mike1117 force-pushed the fix/drop-size branch from aea35da to 75aeade Compare

June 28, 2025 08:25

jserv

jserv requested changes

Jun 28, 2025

View reviewed changes

Copy link

Contributor

@jserv jserv left a comment •

edited

Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Show more experiment results about "statistical analysis on unmeasured (zero-filled) samples, potentially skewing results or preventing detection thresholds from being reached.

@Mike1117

Copy link

Contributor Author

Mike1117 commented Jun 28, 2025

To validate the claim about "statistical analysis on unmeasured (zero-filled) samples", I added debug output in update_statistics() to print all exec_times[i] and corresponding classes[i].

Below are two outputs:

Before fix (measure() only measures middle range)

Note: exec_times[i] = 0 for unmeasured indices (head and tail), but update_statistics() still included them starting from i = 10.

i = 10 | class = 1 | exec_time = 0
i = 11 | class = 1 | exec_time = 0
i = 12 | class = 0 | exec_time = 0
...
i = 18 | class = 0 | exec_time = 0
i = 19 | class = 1 | exec_time = 0
i = 20 | class = 0 | exec_time = 1178
i = 21 | class = 1 | exec_time = 1140
i = 22 | class = 1 | exec_time = 1064
...
i = 129 | class = 1 | exec_time = 1064
i = 130 | class = 0 | exec_time = 0
...
i = 143 | class = 1 | exec_time = 0
i = 144 | class = 0 | exec_time = 0
i = 145 | class = 1 | exec_time = 0
i = 146 | class = 1 | exec_time = 0
i = 147 | class = 0 | exec_time = 0
i = 148 | class = 1 | exec_time = 0
i = 149 | class = 0 | exec_time = 0

This shows:

[10,19] and [130,149] contain zero values
These were being pushed into t_push(...) → skewing statistical results
Consider the t-value definition:
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}}$

Zero-filled execution times will

Reduce the sample means $\bar{X}_1$, $\bar{X}_2$
Inflate variances $s_1^2$, $s_2^2$

Both of them will suppress the t-value, especially when the sample size is still small, and will cause the test to overestimate the number of additional measurements required.
Furthermore, this may delay or entirely prevent the detection of actual timing leaks, resulting in false negatives.

After fix (measure() records all, update_statistics() uses DROP_SIZE range)

i = 0 | class = 0 | exec_time = 1102
i = 1 | class = 1 | exec_time = 1101
...
i = 149 | class = 1 | exec_time = 1063

Now all samples have proper timing values
update_statistics() starts from i = DROP_SIZE and end in i < N_MEASURES - DROP_SIZE.
This avoids head/tail measurements to eliminate measurement bias caused by warm-up effects
Results become consistent and threshold will be reached as expected

This supports the need for consistent range handling between measurement and statistical analysis.

@jserv

Copy link

Contributor

jserv commented Jun 29, 2025

This supports the need for consistent range handling between measurement and statistical analysis.

Revise git commit messages accordingly.

@Mike1117


 Fix incorrect DROP_SIZE usage

Previously, measure() only recorded timings for indices in the middle
range [DROP_SIZE, N_MEASURES - DROP_SIZE), while update_statistics()
assumed all entries were starting available from index 10. This
mismatch allowed zero-valued exec_times from unmeasured head and tail
indices to be included in the t-test, reducing sample means, inflating
variances, and suppressing the t-value, which may lead to
incorrect results or prevent detection thresholds from being reached.
After the fix, all samples are measured and
update_statistics() discards DROP_SIZE samples at both ends. This
ensures correct sample accounting, prevents overestimating the number
of measurements required, and avoids false negatives due to
uninitialized timing data.
Change-Id: Ibb1515043da5f56d72fe34fd5c78e2283df9a993

@Mike1117 Mike1117 force-pushed the fix/drop-size branch from 75aeade to 3312405 Compare

June 30, 2025 06:00

@Mike1117

Copy link

Contributor Author

Mike1117 commented Jun 30, 2025

The commit message has been revised as requested and force-pushed to update the PR.

@jserv jserv merged commit 0f12ec4 into sysprog21:master

Jul 1, 2025

1 of 2 checks passed

@jserv

Copy link

Contributor

jserv commented Jul 1, 2025

Thank @Mike1117 for contributing!

Labels

None yet

2 participants

@Mike1117 @jserv

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix incorrect DROP_SIZE usage #296

Fix incorrect DROP_SIZE usage #296

Uh oh!

Conversation

@Mike1117 Mike1117 commented Jun 28, 2025

Uh oh!

@jserv jserv left a comment •

edited

Loading

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Mike1117 commented Jun 28, 2025

Before fix (measure() only measures middle range)

After fix (measure() records all, update_statistics() uses DROP_SIZE range)

Uh oh!

jserv commented Jun 29, 2025

Uh oh!

Mike1117 commented Jun 30, 2025

Uh oh!

Uh oh!

jserv commented Jul 1, 2025

Uh oh!

Uh oh!

Fix incorrect DROP_SIZE usage #296

Fix incorrect DROP_SIZE usage #296

Uh oh!

Conversation

@Mike1117 Mike1117 commented Jun 28, 2025

Uh oh!

@jserv jserv left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Mike1117 commented Jun 28, 2025

Before fix (measure() only measures middle range)

After fix (measure() records all, update_statistics() uses DROP_SIZE range)

Uh oh!

jserv commented Jun 29, 2025

Uh oh!

Mike1117 commented Jun 30, 2025

Uh oh!

Uh oh!

jserv commented Jul 1, 2025

Uh oh!

Uh oh!

@jserv jserv left a comment •

edited

Loading