Machine Pools

A Machine Pool groups Machines (Kubernetes workers) that share common configuration, such as the Kubelet version or network profiles, and that are updated together (an "update domain").

A Machine Pool always belongs to a Cluster.

Shared Responsibility Model

The user is responsible for setting the network and disk configuration and for triggering upgrades of the Machine Pool to newer Kubernetes minor versions.
The meltcloud platform is responsible for upgrading the Machine Pool (i.e. the Kubelet) to the latest Kubernetes patch version. To achieve this, the platform periodically creates new revisions containing the new patch version, which the user can then apply (in the future, they can also be applied automatically in a configurable maintenance window).

Revisions

Most changes to a Machine Pool (e.g. Kubernetes version upgrades or changes to the network profile) require a reboot of the assigned Machines before the new settings take effect.

To allow for a controlled rollout that doesn't disrupt the workloads, Revisions were introduced:

- For each change to a Machine Pool that requires a reboot, a new Revision containing the new settings is created.
- The Revision can either be rolled out to the pool immediately (by saving the changes with Save Machine Pool & Apply) or applied manually later (by saving the changes with Save Machine Pool and then applying the Revision in the Revisions tab).
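
The two save paths can be pictured with a small in-memory model. This is only an illustrative sketch, not the meltcloud API: the `MachinePool` and `Revision` classes, their fields, and the setting keys are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Revision:
    number: int
    settings: dict


@dataclass
class MachinePool:
    """Hypothetical in-memory model of a pool; not the meltcloud API."""
    settings: dict
    revisions: list = field(default_factory=list)
    applied: Optional[int] = None  # number of the revision selected for rollout

    def save(self, new_settings: dict, apply: bool = False) -> Revision:
        """Record a change as a new revision; optionally apply it right away."""
        self.settings = {**self.settings, **new_settings}
        rev = Revision(number=len(self.revisions) + 1, settings=dict(self.settings))
        self.revisions.append(rev)
        if apply:  # mirrors "Save Machine Pool & Apply"
            self.apply(rev.number)
        return rev

    def apply(self, number: int) -> None:
        """Select an existing revision for rollout (the manual path)."""
        if not any(r.number == number for r in self.revisions):
            raise ValueError(f"unknown revision {number}")
        self.applied = number
```

In this model, `pool.save({...})` corresponds to Save Machine Pool (the revision exists but is not applied), while `pool.save({...}, apply=True)` corresponds to Save Machine Pool & Apply.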

Revision Rollout

When a revision is being applied, the following happens:

First, sanity checks verify that the Cluster, the Machine Pools and the Machines are in a healthy state.
Then, following a rolling-upgrade strategy, one Machine after another...
- receives a new Machine Revision that references the Machine Pool Revision with the new config
- is cordoned and drained on the Kubernetes API
- is gracefully stopped, boots into the new revision and joins the cluster again
- is uncordoned on the Kubernetes API
- receives workload again.
After all Machines in the pool are on the new revision, the rollout is finished.
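
The sequence above can be sketched as a simple loop. This is a simulation under stated assumptions, not the platform's implementation: the `Machine` class, the `healthy` callback, and the log messages are hypothetical, and the real cordon, drain, and reboot steps are replaced by flag changes.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    revision: int
    schedulable: bool = True


def rolling_rollout(machines, target_revision, healthy=lambda: True):
    """Move machines to target_revision one at a time; return a log of steps."""
    if not healthy():
        # Stand-in for the sanity checks on Cluster, Machine Pools and Machines.
        raise RuntimeError("sanity checks failed; rollout not started")
    log = []
    for m in machines:
        if m.revision == target_revision:
            continue  # already on the new revision
        log.append(f"{m.name}: assigned revision {target_revision}")
        m.schedulable = False  # cordon and drain
        log.append(f"{m.name}: cordoned and drained")
        m.revision = target_revision  # graceful stop, boot, rejoin
        log.append(f"{m.name}: rebooted and rejoined")
        m.schedulable = True  # uncordon, workload can return
        log.append(f"{m.name}: uncordoned")
    return log
```

Because each Machine is made schedulable again before the next one is touched, at most one Machine is unavailable at any point in this sketch.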

INFO

For the rolling upgrade to proceed without disruption, make sure the pool has enough spare capacity to tolerate one node being unavailable. Also, as with any Kubernetes setup, make sure that critical applications that cannot tolerate downtime have a properly configured Pod Disruption Budget.
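
As a rough way to reason about spare capacity, the following sketch checks whether the total requested resources would still fit if the largest node were unavailable. It is a deliberate simplification and an assumption of this example: it works on a single scalar resource (for example CPU cores) and ignores how individual pods pack onto nodes, so a `True` result is necessary but not sufficient.

```python
def tolerates_one_node_down(node_capacity, total_requested):
    """Return True if total_requested fits on the pool without its largest node.

    node_capacity: list of per-node allocatable amounts (one resource, one unit).
    total_requested: sum of workload requests in the same unit.
    """
    if len(node_capacity) < 2:
        return False  # a single node cannot be drained without downtime
    remaining = sum(node_capacity) - max(node_capacity)
    return total_requested <= remaining
```

For example, three nodes with 4 cores each leave 8 cores while one node reboots, so a workload requesting 8 cores passes the check and one requesting 9 does not.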

The rollout operation is fully transparent and can be observed in the operations log. If the operation fails or times out, it will transition to the failed state and can be retried afterward.
