
Multiple errors after upgrade #5671

Open
Labels: needs-priority, needs-triage
@josemrs

Description

After reviewing the multiple breaking changes, I thought I had everything under control; apparently not.

We run EKS 1.32 clusters using AWSManagedControlPlanes, with 1.32 AWSManagedMachinePools running custom AL2 AMIs.

The upgrade was planned in two stages, as recommended here: first to the latest v1beta1 release, then to the latest v1beta2 release.

So I ran:

./clusterctl-v1.10.6 upgrade plan
Checking new release availability...

Latest release available for the v1beta1 API Version of Cluster API (contract):

NAME                    NAMESPACE                           TYPE                     CURRENT VERSION   NEXT VERSION
bootstrap-kubeadm       capi-kubeadm-bootstrap-system       BootstrapProvider        v1.7.3            v1.10.6
control-plane-kubeadm   capi-kubeadm-control-plane-system   ControlPlaneProvider     v1.7.3            v1.10.6
cluster-api             capi-system                         CoreProvider             v1.7.3            v1.10.6
infrastructure-aws      capa-system                         InfrastructureProvider   v2.5.2            v2.9.1

You can now apply the upgrade by executing the following command:

clusterctl upgrade apply --contract v1beta1

I then ran the upgrade command for the intermediate step and all providers were upgraded. However, both CAPI and CAPA started constantly reporting reconciliation and connection errors.

Perhaps it is this, but I thought I had that under control because of this.

These are the logs. I tried to pick only the entries for one particular cluster; we have almost 30 clusters, all failing like this.

Logs from capa-controller-manager
I0919 10:58:42.605598 1 awsmanagedmachinepool_controller.go:202] "Reconciling AWSManagedMachinePool" controller="awsmanagedmachinepool" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSManagedMachinePool" AWSManagedMachinePool="prod/services-prod-pool-ap-southeast-2a" namespace="prod" name="services-prod-pool-ap-southeast-2a" reconcileID="82996b04-ef8f-4b26-b570-95f5010121cb" MachinePool="prod/services-prod-pool-ap-southeast-2a" cluster="prod/services.REDACTED"
I0919 10:58:42.605729 1 launchtemplate.go:81] "checking for existing launch template" controller="awsmanagedmachinepool" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSManagedMachinePool" AWSManagedMachinePool="prod/services-prod-pool-ap-southeast-2a" namespace="prod" name="services-prod-pool-ap-southeast-2a" reconcileID="82996b04-ef8f-4b26-b570-95f5010121cb" MachinePool="prod/services-prod-pool-ap-southeast-2a" cluster="prod/services.REDACTED"
[...]
I0919 10:58:45.429754 1 tags.go:128] "Reconciling ASG tags" controller="awsmanagedmachinepool" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSManagedMachinePool" AWSManagedMachinePool="prod/services-prod-pool-ap-southeast-2a" namespace="prod" name="services-prod-pool-ap-southeast-2a" reconcileID="82996b04-ef8f-4b26-b570-95f5010121cb" MachinePool="prod/services-prod-pool-ap-southeast-2a" cluster="prod/services.REDACTED" cluster-name="services_ap-southeast-2_prod_alienvault_cloud" nodegroup-name="services-prod-pool-ap-southeast-2a"
Logs from capi-controller-manager
E0919 11:01:39.644472 1 controller.go:347] "Reconciler error" err="Object prod/services.REDACTED is already owned by another MachinePool controller services-prod-pool-prometheus-ap-southeast-2" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="prod/services-prod-pool-ap-southeast-2b" namespace="prod" name="services-prod-pool-ap-southeast-2b" reconcileID="dd96348e-37dc-4d9d-90f8-33b72cca5aa1"
E0919 11:01:42.691574 1 controller.go:347] "Reconciler error" err="Object prod/services.REDACTED is already owned by another MachinePool controller services-prod-pool-prometheus-ap-southeast-2" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="prod/services-prod-pool-ap-southeast-2c" namespace="prod" name="services-prod-pool-ap-southeast-2c" reconcileID="4b104a11-3d94-401f-b227-c89eceb45e71"
E0919 11:01:44.009112 1 controller.go:347] "Reconciler error" err="Object prod/services.REDACTED is already owned by another MachinePool controller services-prod-pool-prometheus-ap-southeast-2" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="prod/services-prod-pool-ap-southeast-2c" namespace="prod" name="services-prod-pool-ap-southeast-2c" reconcileID="35240758-7625-420d-85cc-517b095fa4f4"
E0919 11:01:52.674593 1 controller.go:347] "Reconciler error" err="Object prod/services.REDACTED is already owned by another MachinePool controller services-prod-pool-prometheus-ap-southeast-2" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="prod/services-prod-pool-ap-southeast-2a" namespace="prod" name="services-prod-pool-ap-southeast-2a" reconcileID="5cd6d5a9-452a-474b-bcff-09ad0e98e6a1"
E0919 11:01:52.952752 1 controller.go:347] "Reconciler error" err="Object prod/services.REDACTED is already owned by another MachinePool controller services-prod-pool-prometheus-ap-southeast-2" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="prod/services-prod-pool-ap-southeast-2a" namespace="prod" name="services-prod-pool-ap-southeast-2a" reconcileID="36a5298a-d1d2-4e8c-a7e3-da275b13d90b"
Logs from capi-kubeadm-bootstrap-controller-manager
I0919 10:57:44.297447 1 cluster_accessor.go:320] "Disconnecting" controller="clustercache" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="prod/services.REDACTED" namespace="prod" name="services.REDACTED" reconcileID="de112319-22c9-4bc8-a248-da3869cb4f13"
I0919 10:57:44.297492 1 cluster_accessor.go:327] "Disconnected" controller="clustercache" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="prod/services.REDACTED" namespace="prod" name="services.REDACTED" reconcileID="de112319-22c9-4bc8-a248-da3869cb4f13"
I0919 10:57:44.298712 1 cluster_accessor.go:252] "Connecting" controller="clustercache" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="prod/services.REDACTED" namespace="prod" name="services.REDACTED" reconcileID="b212685a-8419-4acd-8ff3-7d893b41a2e3"
I0919 10:57:47.933214 1 cluster_accessor.go:274] "Connected" controller="clustercache" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="prod/services.REDACTED" namespace="prod" name="services.REDACTED" reconcileID="b212685a-8419-4acd-8ff3-7d893b41a2e3"
Logs from capi-kubeadm-control-plane-system
I0919 11:00:09.828007 1 cluster_accessor.go:320] "Disconnecting" controller="clustercache" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="prod/services.ap-southeast-2.prod.alienvault.cloud" namespace="prod" name="services.ap-southeast-2.prod.alienvault.cloud" reconcileID="f74b3271-9d4b-4b6a-95a7-7abe21839a7b"
I0919 11:00:09.828056 1 cluster_accessor.go:327] "Disconnected" controller="clustercache" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="prod/services.ap-southeast-2.prod.alienvault.cloud" namespace="prod" name="services.ap-southeast-2.prod.alienvault.cloud" reconcileID="f74b3271-9d4b-4b6a-95a7-7abe21839a7b"
I0919 11:00:09.829332 1 cluster_accessor.go:252] "Connecting" controller="clustercache" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="prod/services.ap-southeast-2.prod.alienvault.cloud" namespace="prod" name="services.ap-southeast-2.prod.alienvault.cloud" reconcileID="95222f01-14a5-4e4b-bec3-372e95d9b983"
I0919 11:00:13.479651 1 cluster_accessor.go:274] "Connected" controller="clustercache" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="prod/services.ap-southeast-2.prod.alienvault.cloud" namespace="prod" name="services.ap-southeast-2.prod.alienvault.cloud" reconcileID="95222f01-14a5-4e4b-bec3-372e95d9b983"
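
The capi-controller-manager errors above appear to match the message format of controller-runtime's `controllerutil.SetControllerReference`, which fails when an object already has a different controller owner. A plausible reading, not confirmed here: `prod/services.REDACTED` is the EKSConfig that all four MachinePools in the config below share as `spec.template.spec.bootstrap.configRef`, and `services-prod-pool-prometheus-ap-southeast-2` (sync-wave 30, so presumably reconciled first) already holds the controller reference. Under that assumption, the EKSConfig metadata would look roughly like this excerpt; it illustrates the expected shape and is not captured output, and the `uid` is a placeholder.

```yaml
# Illustrative only: expected shape of
#   kubectl get eksconfig services.REDACTED -n prod -o yaml
# assuming the prometheus MachinePool holds the controller reference.
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: EKSConfig
metadata:
  name: services.REDACTED
  namespace: prod
  ownerReferences:
    - apiVersion: cluster.x-k8s.io/v1beta1
      kind: MachinePool
      name: services-prod-pool-prometheus-ap-southeast-2
      uid: <placeholder-uid>      # placeholder, not a real value
      controller: true            # Kubernetes allows only one controller owner per object
      blockOwnerDeletion: true
```

Because only one owner reference may set `controller: true`, every other MachinePool pointing at the same EKSConfig would fail the same way on each reconcile, which is consistent with the repeated errors for the 2a, 2b and 2c pools.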

This is the config of this particular cluster:

ap-southeast-2 cluster YAML
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: services.REDACTED
  namespace: prod
  annotations:
    argocd.argoproj.io/sync-wave: "0"
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - 192.168.0.0/16
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v2beta2
    kind: AWSManagedControlPlane
    name: services.REDACTED
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSManagedCluster
    name: services.REDACTED
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSManagedCluster
metadata:
  name: services.REDACTED
  namespace: prod
  annotations:
    argocd.argoproj.io/sync-wave: "10"
spec: {}
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: AWSManagedControlPlane
metadata:
  name: services.REDACTED
  namespace: prod
  annotations:
    argocd.argoproj.io/sync-wave: "20"
spec:
  associateOIDCProvider: true
  eksClusterName: services_REDACTED_1
  region: ap-southeast-2
  version: v1.32.0
  network:
    vpc:
      id: vpc-XXXXXXXXXX
    subnets:
      - id: subnet-X
      - id: subnet-Y
      - id: subnet-Z
    securityGroupOverrides:
      node-eks-additional: sg-W
  endpointAccess:
    private: true
    public: false
  bastion:
    enabled: false
  oidcIdentityProviderConfig:
    identityProviderConfigName: Okta
    issuerUrl: https://.okta.com/oauth2/XXXXXXXXXXXX
    clientId: XXXXXXXXX
    usernameClaim: preferred_username
    groupsClaim: groups
    groupsPrefix: "okta:"
  logging:
    apiServer: false
    controllerManager: false
    audit: false
    authenticator: false
    scheduler: false
  iamAuthenticatorConfig:
    mapRoles:
      - username: "kubernetes-admin"
        rolearn: "arn:aws:iam::XXXXXXXXXXXX:role/saas-OktaAdmins"
        groups:
          - "system:masters"
  addons:
    - name: "kube-proxy"
      version: "v1.32.6-eksbuild.6"
      conflictResolution: "overwrite"
    - name: "vpc-cni"
      version: "v1.20.1-eksbuild.1"
      conflictResolution: "overwrite"
    - name: "aws-ebs-csi-driver"
      version: "v1.48.0-eksbuild.1"
      conflictResolution: "overwrite"
      serviceAccountRoleARN: "arn:aws:iam::XXXXXXXXXXXX:role/prod-AmazonEKS_EBS_CSI_DriverRole"
  vpcCni:
    env:
      - name: POD_SECURITY_GROUP_ENFORCING_MODE
        value: standard
      - name: ENABLE_POD_ENI
        value: "true"
      - name: ENABLE_PREFIX_DELEGATION
        value: "true"
  additionalTags:
    Owner: "EngOps"
    created_by: "https://bitbucket.org/redacted/capi-cluster"
    Environment: "prod"
  identityRef:
    kind: AWSClusterRoleIdentity
    name: prod
  roleAdditionalPolicies:
    - arn:aws:iam::aws:policy/AmazonEKSVPCResourceController
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: EKSConfig
metadata:
  name: services.REDACTED
  namespace: prod
spec:
  boostrapCommandOverride: "# Self-bootstrap embedded in AMI, doing nothing here for cluster"
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: services-prod-pool-prometheus-ap-southeast-2
  namespace: prod
  annotations:
    cluster.x-k8s.io/replicas-managed-by: "external-autoscaler"
    argocd.argoproj.io/sync-wave: "30"
spec:
  clusterName: services.REDACTED
  replicas: 2
  failureDomains:
    - ap-southeast-2a
    - ap-southeast-2b
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: EKSConfig
          name: services.REDACTED
          namespace: prod
        dataSecretName: services.REDACTED
      clusterName: services.REDACTED
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSManagedMachinePool
        name: services-prod-pool-prometheus-ap-southeast-2
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSManagedMachinePool
metadata:
  name: services-prod-pool-prometheus-ap-southeast-2
  namespace: prod
  annotations:
    argocd.argoproj.io/sync-wave: "30"
spec:
  eksNodegroupName: services-prod-pool-prometheus
  availabilityZones:
    - ap-southeast-2a
    - ap-southeast-2b
  scaling:
    minSize: 2
    maxSize: 4
  updateConfig:
    maxUnavailable: 1
  awsLaunchTemplate:
    instanceType: m5.large
    ami:
      id: ami-YYYYYY
  labels:
    usm.io/role: prometheus
  taints:
    - key: dedicated
      effect: no-schedule
      value: prometheus
  subnetIDs:
    - subnet-X
    - subnet-Y
  roleAdditionalPolicies:
    - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: services-prod-pool-ap-southeast-2a
  namespace: prod
  annotations:
    cluster.x-k8s.io/replicas-managed-by: "external-autoscaler"
    argocd.argoproj.io/sync-wave: "40"
spec:
  clusterName: services.REDACTED
  replicas: 2
  failureDomains:
    - ap-southeast-2a
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: EKSConfig
          name: services.REDACTED
          namespace: prod
        dataSecretName: services.REDACTED
      clusterName: services.REDACTED
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSManagedMachinePool
        name: services-prod-pool-ap-southeast-2a
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSManagedMachinePool
metadata:
  name: services-prod-pool-ap-southeast-2a
  namespace: prod
  annotations:
    argocd.argoproj.io/sync-wave: "40"
spec:
  eksNodegroupName: services-prod-pool-ap-southeast-2a
  availabilityZones:
    - ap-southeast-2a
  scaling:
    minSize: 2
    maxSize: 25
  updateConfig:
    maxUnavailablePercentage: 40
  subnetIDs:
    - subnet-X
  awsLaunchTemplate:
    instanceType: m5.xlarge
    ami:
      id: ami-YYYYYY
  roleAdditionalPolicies:
    - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: services-prod-pool-ap-southeast-2b
  namespace: prod
  annotations:
    cluster.x-k8s.io/replicas-managed-by: "external-autoscaler"
    argocd.argoproj.io/sync-wave: "41"
spec:
  clusterName: services.REDACTED
  replicas: 2
  failureDomains:
    - ap-southeast-2b
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: EKSConfig
          name: services.REDACTED
          namespace: prod
        dataSecretName: services.REDACTED
      clusterName: services.REDACTED
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSManagedMachinePool
        name: services-prod-pool-ap-southeast-2b
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSManagedMachinePool
metadata:
  name: services-prod-pool-ap-southeast-2b
  namespace: prod
  annotations:
    argocd.argoproj.io/sync-wave: "41"
spec:
  eksNodegroupName: services-prod-pool-ap-southeast-2b
  availabilityZones:
    - ap-southeast-2b
  scaling:
    minSize: 2
    maxSize: 25
  updateConfig:
    maxUnavailablePercentage: 40
  subnetIDs:
    - subnet-Y
  awsLaunchTemplate:
    instanceType: m5.xlarge
    ami:
      id: ami-YYYYYY
  roleAdditionalPolicies:
    - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: services-prod-pool-ap-southeast-2c
  namespace: prod
  annotations:
    cluster.x-k8s.io/replicas-managed-by: "external-autoscaler"
    argocd.argoproj.io/sync-wave: "42"
spec:
  clusterName: services.REDACTED
  replicas: 2
  failureDomains:
    - ap-southeast-2c
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: EKSConfig
          name: services.REDACTED
          namespace: prod
        dataSecretName: services.REDACTED
      clusterName: services.REDACTED
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSManagedMachinePool
        name: services-prod-pool-ap-southeast-2c
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSManagedMachinePool
metadata:
  name: services-prod-pool-ap-southeast-2c
  namespace: prod
  annotations:
    argocd.argoproj.io/sync-wave: "42"
spec:
  eksNodegroupName: services-prod-pool-ap-southeast-2c
  availabilityZones:
    - ap-southeast-2c
  scaling:
    minSize: 2
    maxSize: 25
  updateConfig:
    maxUnavailablePercentage: 40
  subnetIDs:
    - subnet-Z
  awsLaunchTemplate:
    instanceType: m5.xlarge
    ami:
      id: ami-YYYYYY
  roleAdditionalPolicies:
    - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
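
If the shared EKSConfig `services.REDACTED` is the object the MachinePool controller is fighting over, one way to avoid the conflict would be to give each MachinePool its own EKSConfig. The sketch below shows this for the `ap-southeast-2a` pool only and has not been tested against this setup; the new EKSConfig name is a hypothetical choice, and the other pools would follow the same pattern. `dataSecretName` is left out on the assumption that it gets filled in from the EKSConfig's status once the bootstrap config is ready; keep it if the self-bootstrapping AMI depends on the fixed secret name.

```yaml
# Sketch for one pool; the EKSConfig name below is hypothetical.
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: EKSConfig
metadata:
  name: services-prod-pool-ap-southeast-2a
  namespace: prod
spec:
  # Assumes `bootstrapCommandOverride` is the intended field name;
  # the manifest above spells it `boostrapCommandOverride`.
  bootstrapCommandOverride: "# Self-bootstrap embedded in AMI, doing nothing here for cluster"
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: services-prod-pool-ap-southeast-2a
  namespace: prod
  annotations:
    cluster.x-k8s.io/replicas-managed-by: "external-autoscaler"
    argocd.argoproj.io/sync-wave: "40"
spec:
  clusterName: services.REDACTED
  replicas: 2
  failureDomains:
    - ap-southeast-2a
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: EKSConfig
          # Dedicated EKSConfig instead of the shared services.REDACTED one
          name: services-prod-pool-ap-southeast-2a
          namespace: prod
        # dataSecretName omitted: assumed to be populated from the EKSConfig status
      clusterName: services.REDACTED
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSManagedMachinePool
        name: services-prod-pool-ap-southeast-2a
```

With one EKSConfig per MachinePool, each config has exactly one candidate controller owner, so the prometheus pool and the per-AZ pools no longer compete for the same object.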
