Move activation of dispatchers into event loop thread #190

Merged
milindl merged 9 commits into master from dev_fix_wrong_thread_issue on Dec 12, 2024

Conversation

@milindl (Contributor) commented Nov 27, 2024 (edited)

  1. Reduce test flakiness by adding some time for metadata to propagate, adding some sleeps, and disabling two tests that are flaky for librdkafka-related reasons. Change the group description test to account for the possibility of an EMPTY -> DEAD state transition.
  2. Move the event-loop-thread-specific work onto that thread (see the sketch below for the general pattern).
  3. See if this makes the tests run consistently on Semaphore.
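
For illustration, here is a minimal JavaScript-level sketch of the general pattern behind point 2, not this repository's actual dispatcher code; the Dispatcher class and the activate()/onActivated() names are hypothetical. The idea is that activation is queued onto the event loop instead of being performed directly in whatever thread or callback context requested it.

 /* Hypothetical sketch: defer the state change onto the event loop
  * rather than performing it in the caller's context. */
 class Dispatcher {
   constructor() {
     this.active = false;
     this.callbacks = [];
   }

   /* Queue activation; the setImmediate callback runs on the main
    * event loop, so all state mutation happens on that one thread. */
   activate() {
     setImmediate(() => {
       this.active = true;
       this.callbacks.forEach(cb => cb());
     });
   }

   onActivated(cb) {
     this.callbacks.push(cb);
   }
 }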

@milindl requested review from a team as code owners on November 27, 2024 at 09:40

🎉 All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

@milindl changed the title from "[wip] Move activation of dispatchers into event loop thread" to "Move activation of dispatchers into event loop thread" on Nov 28, 2024
@emasab (Contributor) left a comment

Thanks for the fix, Milind, and thanks @trevorr for finding it!
I'm requesting just a few changes to solve the flakiness in a different way.

- docker compose up -d && sleep 30
- export NODE_OPTIONS='--max-old-space-size=1536'
- npx jest --forceExit --no-colors --ci test/promisified/admin/delete_groups.spec.js test/promisified/consumer/pause.spec.js
- npx jest --forceExit --no-colors --ci test/promisified/
Contributor

If --forceExit is still needed, are there unhandled promises?

Contributor Author

No, sorry, I left it in by accident.


// Depending on the environment of the test run, the group might transition into
// the DEAD state, so allow for both possibilities.
expect(describeGroupsResult.groups[0].state === ConsumerGroupStates.EMPTY ||
       describeGroupsResult.groups[0].state === ConsumerGroupStates.DEAD).toBeTruthy();
Contributor

I think this won't be needed if some offsets are consumed and committed by the consumer. In that case, even if there's a coordinator change, the group would be loaded and be EMPTY instead of DEAD.

milindl reacted with thumbs up emoji
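
A rough sketch of that suggestion, assuming the KafkaJS-style promisified API these tests use (consumer.commitOffsets and admin.describeGroups are assumptions about that API; the topic, group, and offset values are placeholders):

 /* Sketch: consume and commit at least one offset before describing the
  * group, so a coordinator change reloads it as EMPTY rather than DEAD. */
 await consumer.connect();
 await consumer.subscribe({ topic: topicName });
 consumer.run({ eachMessage: async () => { /* consume something */ } });
 /* ...wait until at least one message has been processed... */
 await consumer.commitOffsets([{ topic: topicName, partition: 0, offset: '1' }]);
 await consumer.disconnect();

 /* With a committed offset the group is reloaded by the (possibly new)
  * coordinator and described as EMPTY instead of DEAD. */
 const result = await admin.describeGroups([groupId]);
 expect(result.groups[0].state).toEqual(ConsumerGroupStates.EMPTY);
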
* to be small and we get multiple partitions in the cache at once.
* This is to reduce flakiness. */
producer = createProducer({}, {
'linger.ms': 1,
Contributor

It's better ensured with:

Suggested change:
- 'linger.ms': 1,
+ 'batch.num.messages': 1,

milindl reacted with thumbs up emoji
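
For reference, a sketch of the producer config with that suggestion applied (batch.num.messages is a standard librdkafka producer property; createProducer is the test helper shown in the context above):

 producer = createProducer({}, {
   /* Never put more than one message in a batch, so fetches return many
    * small batches and several partitions end up in the message cache
    * at the same time. */
   'batch.num.messages': 1,
 });
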
Comment on lines 431 to 433
'fetch.message.max.bytes': 1,
'fetch.max.bytes': 1000,
'message.max.bytes': 1000,
Contributor

We can keep only this one to get a single batch:

Suggested change:
- 'fetch.message.max.bytes': 1,
- 'fetch.max.bytes': 1000,
- 'message.max.bytes': 1000,
+ 'fetch.message.max.bytes': 1,

milindl reacted with thumbs up emoji
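
And a sketch of the consumer side with only that property kept; fetch.message.max.bytes is a standard librdkafka property, while the createConsumer helper and its signature are assumed here for illustration. The broker still returns at least the first record batch even when it exceeds this limit, which is why a 1-byte value yields a single batch per fetch.

 consumer = createConsumer({ groupId }, {
   /* A 1-byte per-partition fetch limit: the broker still returns at least
    * one record batch, so each fetch effectively delivers a single batch
    * without also capping fetch.max.bytes or message.max.bytes. */
   'fetch.message.max.bytes': 1,
 });
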
Comment on lines 405 to 410
if (partitionsConsumedConcurrently >= 2) {
/* Given how librdkafka merges partition queues, it's very unlikely that
* we get *three* partitions in the cache at one time given how we've produced
* the messages. So just skip it, it'll be very flaky otherwise. */
return;
}
Contributor

We can verify it with partitionsConsumedConcurrently >= 2. If you set

 const messagesConsumed = [];
 const expectedMaxConcurrentWorkers = Math.min(partitionsConsumedConcurrentlyDiff, partitions);
 const maxConcurrentWorkersReached = new DeferredPromise();

and then

 eachMessage: async event => {
     inProgress++;
     messagesConsumed.push(event);
     inProgressMaxValue = Math.max(inProgress, inProgressMaxValue);
     if (inProgressMaxValue >= expectedMaxConcurrentWorkers) {
         maxConcurrentWorkersReached.resolve();
     } else if (messagesConsumed.length > 30) {
         await sleep(1000);
     }
     inProgress--;
 },

and finally

 await maxConcurrentWorkersReached;
 expect(inProgressMaxValue).toBe(expectedMaxConcurrentWorkers);

the sleep inside the invocation makes sure that worker invocations can overlap and reach that value. The first messages are skipped (no sleep for them) because of the exponential cache growth.

Contributor Author

I got good results making this change, but I changed messagesConsumed.length > 30 to messagesConsumed.length > 2048, as that makes sure we've completely maxed out the cache capacity by that point.

emasab reacted with thumbs up emoji
Contributor

DeferredPromise can be imported and exported from here for usage by the rest of the tests.

milindl reacted with thumbs up emoji
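
A minimal sketch of what such a shared DeferredPromise test utility could look like (the repository's actual implementation may differ): a thenable that exposes its resolve and reject, so a test can await it in one place and settle it from another, as in the eachMessage callback above.

 /* Sketch of a shared test utility; actual implementation may differ. */
 class DeferredPromise {
   constructor() {
     this.promise = new Promise((resolve, reject) => {
       this.resolve = resolve;
       this.reject = reject;
     });
   }

   /* Being thenable makes `await deferred` work directly. */
   then(onFulfilled, onRejected) {
     return this.promise.then(onFulfilled, onRejected);
   }
 }

 module.exports = { DeferredPromise };
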
if (debug) {
common['debug'] = debug;
} else { /* Turn off info logging unless specifically asked for, otherwise stdout gets very crowded. */
common['log_level'] = 1;
Contributor

Let's keep the logs with level <= 5 (NOTICE). Otherwise we could miss some error or warning. If there are expected errors in the tests, a custom logger could be added later to assert on their messages and avoid logging them.

milindl reacted with thumbs up emoji
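
A sketch of the config block with that suggestion applied; log_level is librdkafka's syslog-style level, where 5 corresponds to NOTICE:

 if (debug) {
   common['debug'] = debug;
 } else {
   /* Keep NOTICE and above (syslog level 5) so unexpected errors and
    * warnings from librdkafka remain visible in the test output. */
   common['log_level'] = 5;
 }
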
isSimpleConsumerGroup: false,
protocolType: 'consumer',
state: ConsumerGroupStates.EMPTY,
isSimpleConsumerGroup: expect.any(Boolean),
Contributor

If it works, this change could be reverted.

milindl reacted with thumbs up emoji
@emasab (Contributor) left a comment

LGTM!

@milindl merged commit 7ec8cda into master on Dec 12, 2024
2 checks passed
@milindl deleted the dev_fix_wrong_thread_issue branch on December 12, 2024 at 13:52
Reviewers

@emasab approved these changes