chore: add retry to SageMaker steps in integration tests #162

Merged
shivlaks merged 4 commits into main from shivlaks/add-retry
Sep 10, 2021

@shivlaks shivlaks commented Sep 10, 2021

Summary

Integration tests run as part of the PR build and on pushes to branches, and they
currently fail intermittently.

createModel and createEndpoint fail most frequently, primarily with:

  • Rate exceeded - ThrottlingException.

This change defines a default retry strategy that retries up to 5 times, waiting 5
seconds before the first retry and doubling the wait before each subsequent retry
(backoff rate of 2). The strategy is naive and may need some calibration, but it
should reduce the frequency of failures in the short term.

We can adjust the retry strategy as we go and expand to something more API-specific as
the need arises.
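
To get a feel for what the default strategy means in wall-clock terms, here is a rough sketch of the wait schedule. It assumes the usual Amazon States Language semantics (IntervalSeconds is the wait before the first retry, and each later wait is multiplied by BackoffRate); the helper name `retry_delays` is made up for illustration and is not part of this change.

```python
def retry_delays(interval_seconds, max_attempts, backoff_rate):
    """Wait (in seconds) before each retry, assuming the nth retry waits
    interval_seconds * backoff_rate ** n, for n = 0 .. max_attempts - 1."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


# Values taken from the default retry strategy in this PR.
delays = retry_delays(interval_seconds=5, max_attempts=5, backoff_rate=2)
print(delays)       # [5, 10, 20, 40, 80]
print(sum(delays))  # 155
```

So a step that keeps getting throttled waits roughly two and a half minutes in total (not counting the time each attempt itself takes) before the failure surfaces, which is useful to keep in mind when calibrating.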

Testing

  • Ran the integ tests a few times locally and confirmed that the retry appears in the ASL
    definition and that the executions complete successfully.

Rendered retry from the StateMachine definition on the SageMaker steps:

"Retry": [
  {
    "ErrorEquals": [
      "SageMaker.AmazonSageMakerException"
    ],
    "IntervalSeconds": 5,
    "MaxAttempts": 5,
    "BackoffRate": 2
  }
]

  • Also pushed some trivial commits to this PR to trigger simultaneous builds. Haven't
    seen any state machine failures yet 🤞
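
The manual check above (confirming the retry shows up in the ASL definition) could be automated along these lines. This is only a sketch: the function name, the sample definition, and the state names are hypothetical, and it only inspects top-level states rather than descending into Parallel or Map branches.

```python
# Retry block as rendered in the state machine definition in this PR.
EXPECTED_RETRY = {
    "ErrorEquals": ["SageMaker.AmazonSageMakerException"],
    "IntervalSeconds": 5,
    "MaxAttempts": 5,
    "BackoffRate": 2,
}


def sagemaker_states_missing_retry(definition, expected=EXPECTED_RETRY):
    """Return names of top-level SageMaker Task states whose Retry list
    does not contain `expected`. Nested branches are not inspected."""
    missing = []
    for name, state in definition.get("States", {}).items():
        is_sagemaker_task = (
            state.get("Type") == "Task"
            and ":sagemaker:" in state.get("Resource", "")
        )
        if is_sagemaker_task and expected not in state.get("Retry", []):
            missing.append(name)
    return missing


# Hypothetical definition for illustration only (not taken from the test suite).
SAMPLE_DEFINITION = {
    "StartAt": "Create Model",
    "States": {
        "Create Model": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sagemaker:createModel",
            "Retry": [EXPECTED_RETRY],
            "Next": "Create Endpoint",
        },
        "Create Endpoint": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sagemaker:createEndpoint",
            "Next": "Notify",
        },
        # Non-SageMaker states are ignored by the check.
        "Notify": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "End": True,
        },
    },
}

print(sagemaker_states_missing_retry(SAMPLE_DEFINITION))  # ['Create Endpoint']
```

In an integration test, the definition dict would come from the rendered state machine rather than a hand-written sample, and an empty result would be the passing condition.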

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


@ca-nguyen ca-nguyen left a comment

Awesome work!
This will allow us to push changes without worrying about triggering build failures!

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-sEHrOdk7acJc
  • Commit ID: c788e7b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@shivlaks shivlaks merged commit 74d0f07 into main Sep 10, 2021
@wong-a wong-a deleted the shivlaks/add-retry branch September 10, 2021 21:07
Reviewers

  • wong-a approved these changes
  • ca-nguyen approved these changes