Updated on 2025-04-25 GMT+08:00

How Do I Locate the Fault When a Cluster Is Unavailable?

This section describes how to locate the fault when a cluster becomes unavailable.

Troubleshooting

The check items below are listed in descending order of likelihood.

Check these causes one by one until you find the cause of the fault.

If the fault persists, submit a service ticket so that customer service can help you locate the fault.

Check Item 1: Whether the Security Group Is Modified

  1. Log in to the management console and choose Service List > Networking > Virtual Private Cloud. In the navigation pane, choose Access Control > Security Groups and find the security group of the master node in the cluster.

    The security group name is in the format Cluster name-cce-control-ID.

  2. Click the security group name. On the details page that is displayed, check that the security group rules of the master node are correct.

    For details, see How Can I Configure a Security Group Rule for a Cluster?
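If the account has many security groups, the naming format above can be used to narrow the list. The following is a minimal sketch, not an official tool: the function name and the sample names are hypothetical, and it assumes the ID is any non-empty suffix after "-cce-control-", because the exact ID format is not specified here.

```python
def find_control_security_groups(cluster_name, security_group_names):
    """Return the names that look like '<cluster name>-cce-control-<ID>'."""
    prefix = f"{cluster_name}-cce-control-"
    # Assumption: the ID is any non-empty suffix after the prefix.
    return [
        name
        for name in security_group_names
        if name.startswith(prefix) and len(name) > len(prefix)
    ]


# Hypothetical security group names, for illustration only.
names = [
    "demo-cce-control-abc12",
    "demo-cce-node-abc12",
    "other-cce-control-xyz99",
]
matches = find_control_security_groups("demo", names)
print(matches)
```

The sketch only filters names that you have already copied from the console. It does not query the cloud platform.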

Check Item 2: Whether the Cluster Is Overloaded

Symptom

The resource usage on the master nodes in the cluster reaches 100%.

Possible cause

When a large number of resources are created in a cluster at the same time, the API server becomes overloaded. This in turn overloads the master nodes and can cause out-of-memory (OOM) errors.

Solution

Increase the cluster management scale. A larger cluster management scale means higher capacity and improved performance of the master nodes. For details, see Changing Cluster Scale.

If a cluster is overloaded, you can submit a service ticket for technical support.

Check Item 3: Whether the KMS Key Used for Secret Encryption Is Valid

Symptom

If a cluster is unavailable, check the cluster events to locate the fault.

If KMS key status abnormal is displayed in the events, check whether the key used by the cluster is in the Disabled or Pending deletion state.

Solution

  1. Log in to the DEW console.
  2. In the custom key list, find the KMS key used by the cluster.

    • For a key in the Pending deletion state, click Cancel Deletion in the Operation column. If the key is still in the Disabled state after the deletion is canceled, enable the key.
    • For a key in the Disabled state, click Enable in the Operation column.

  3. Verify that the key is enabled, then wait for the cluster to recover automatically. Recovery takes about 5 to 10 minutes.
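The decision logic in the steps above can be summarized as a small lookup. This is an illustrative sketch only: the function name is hypothetical, the state strings are taken from this section ("Enabled" is an assumed label for a healthy key), and the code calls no DEW or KMS API.

```python
def recovery_steps(key_state):
    """Map a KMS key state to the console actions described in this section."""
    state = key_state.strip().lower()
    if state == "pending deletion":
        # After the deletion is canceled, the key may still be disabled.
        return ["Cancel Deletion", "Enable (only if the key is still Disabled)"]
    if state == "disabled":
        return ["Enable"]
    if state == "enabled":
        # Assumed label for a healthy key: nothing to do except wait.
        return []
    raise ValueError(f"Unrecognized key state: {key_state!r}")


for state in ("Pending deletion", "Disabled", "Enabled"):
    print(state, "->", recovery_steps(state))
```

An empty list means that no console action is needed and the only remaining step is to wait for the cluster to recover.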
