After years of looking at our platform, studying Google’s Site Reliability Engineering practices and experimenting with their tools, we decided that Kubernetes, which grew out of Google’s experience with its internal Borg cluster manager, was the best fit for the next iteration of our Lighthouse platform.

What is Kubernetes?

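Kubernetes is an open-source platform for container orchestration, that is, for automating the deployment, scaling and management of containerized applications. It draws on Google’s internal Borg system, which Google used for many years to run its own workloads. Borg itself remains internal; Kubernetes is a separate project that Google released publicly in 2014.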

Container orchestration is a loaded term, but although the technology is complex, the idea behind it is not hard to understand.

Let’s begin with the building blocks:

A container packages only the software and dependencies your application needs to run. This means your software runs in its own environment, without the need to download or install dependencies on the machine that hosts it. This is similar to a virtual machine, except that a container shares the host’s operating system kernel instead of carrying a full operating system of its own, and is therefore lightweight and simple.

The bottom line: if it works on my computer, it works on your computer, because the container brings everything it needs with it.

Where do we run our containers?

Containers need a runtime environment to run in. If you are developing on a single machine, the application will typically run in something like Docker. If, however, you want to run many containers across multiple nodes in a cluster, then Kubernetes is the preferred option.

Kubernetes is the most flexible option because all of the configuration, not only the containers but also the environment, the application settings and the network settings, can be defined as code.
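To make "configuration as code" concrete, here is a minimal sketch in Python that builds a Kubernetes Deployment manifest as plain data and prints it as JSON. The application name, image address, replica count and port are hypothetical values chosen for illustration; they are not taken from our actual platform.

```python
import json


def build_deployment(name: str, image: str, replicas: int, port: int) -> dict:
    """Return a Kubernetes Deployment manifest as a plain dictionary."""
    # The same labels tie the Deployment, its selector and its pods together.
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # How many copies of the pod Kubernetes should keep running.
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


# Hypothetical values, for illustration only.
manifest = build_deployment(
    name="lighthouse-web",
    image="registry.example.com/lighthouse-web:1.0.0",
    replicas=3,
    port=8080,
)

# Serialize the manifest; sorted keys keep diffs in version control stable.
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
print(manifest_json)
```

Saved to a file such as `deployment.json` (a hypothetical name), this output could be committed to version control and applied with `kubectl apply -f deployment.json`, since kubectl accepts manifests in JSON as well as YAML. In practice most teams write these manifests directly in YAML; the point here is only that the whole description of what runs, and how many copies of it, is ordinary text.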

Why is this a good thing?

By defining our configuration as code, we can keep it in Git as a single source of truth. Every change is versioned and can be reviewed, which allows us to deploy changes more easily, more safely and more quickly.

This gives us many benefits:

Scalability – we can easily scale up, and our load can be distributed among many nodes, which improves performance

Availability – this also relates to reliability: if a pod (a group of one or more containers, the smallest unit Kubernetes deploys) or a container stops functioning, or a node dies, other containers remain to pick up the load

Simplicity – it fits in perfectly with our extreme programming and continuous delivery working model, which focuses on small iterative changes

Control – having everything in code makes it easy to control how and where our software runs

Observability – we gain more insight into how and where our software runs, and because we are in the business of providing insights, the information we gather is passed on to our product and enriches the information we provide

Technology – because we have better access and everything is simplified, it will be easier for us to integrate with new disruptive technologies such as machine learning, big data, AI and whatever comes next.

This list is not comprehensive, but it illustrates in a small way the power that adopting this technology will make available as we grow together towards a more reliable future.