Health Check for Kubernetes | CommunityBridge Program

Jayesh
6 min read · Nov 5, 2020

In this article, I talk about the CommunityBridge program and what I built as a Mentee for CoreDNS. This is the first article in a series where I’ll detail the thoughts and reasoning behind my choices while building my project.

The summer of 2020 was full of challenges. With the pandemic raging and everybody confined to their homes, staying productive and putting in long hours became really tricky for me. I was fortunate to be accepted as a CommunityBridge Mentee by my mentors, Yong Tang and Paul Greenberg. The project I worked on helped me learn and make better use of my time in an increasingly tense situation around the world. I’ll first explain what the CommunityBridge program is, and then the motivation behind the project and my solution to the problem.

CommunityBridge Program

It is a mentorship program funded by the Linux Foundation, Intel Corp. and others that promotes open source collaboration between developers and industry experts on important projects.

From the CommunityBridge website

It is similar to Google Summer of Code (GSoC) in that you submit proposals for the problem statements put up by different project maintainers, interact with the mentors and discuss potential solutions. You also receive stipends for most projects. However, it is limited to projects that are linked to the Linux ecosystem, such as CNCF-graduated projects.

The mentorship program has many benefits for the students taking part:

  • Helps mentees learn and enhance their technical skills, and inspires them to become long-term active contributors.
  • Teaches aspiring developers the open source culture and collaboration norms, and guides them to participate in an open source community more effectively by using collaboration tools and infrastructure.
  • Lets mentees learn from top developers in open source.
  • Offers free training and certifications on courses from the Linux Foundation.
  • Provides job opportunity referrals with employers upon graduation.

How to Apply

The CommunityBridge program runs throughout the second half of the year.
The schedule for the year 2020 was as follows:

  • First Program: July 1 — September 30, 2020
  • Second Program: August 1 — October 31, 2020
  • Third Program: September 1 — November 30, 2020

The CommunityBridge Mentorship website lists all the projects that are accepting applications. It is a good idea to keep checking the page for new projects to work on.

I would recommend exploring all the projects to find out what suits your interests, and then looking for resources to learn the concepts that’ll enable you to start working on them.
The projects require some knowledge of Kubernetes, Linux and Cloud Native technologies. However, don’t fret if you have never worked with them; you can learn on the fly or start picking them up right now. You can reach out to me if you need any direction or help with resources!

Health Check for CoreDNS Pods

I applied for a project under CoreDNS that involved creating a health check. I had initially prepared the proposal for the GSoC listing but later re-applied through CommunityBridge after a different CoreDNS project was selected for GSoC. Below, I list the details of the project; in subsequent articles, I’ll talk more about each part of the architecture.

Objective

CoreDNS is the cluster DNS server for Kubernetes and is critical to the overall health of a Kubernetes cluster. It is important to monitor the health of CoreDNS itself and to restart or repair any CoreDNS pods that are not behaving correctly.

CoreDNS does expose a health check of its own, which Kubernetes consumes through a livenessProbe, but that check has two limitations:

  • The health check is not UDP (DNS) based. There have been cases where the health port is accessible (over TCP) but the DNS service itself isn’t (over UDP). Because of this protocol difference, CoreDNS can be unhealthy from the cluster’s standpoint without the control plane noticing.
  • The existing health check is also run locally (the kubelet on the same node executes the livenessProbe), so the situation could be different for pods that reach CoreDNS remotely.
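
To make the protocol gap concrete, here is a minimal sketch of a probe that asks a DNS server a question strictly over UDP. It uses only Go’s standard library (net.Resolver with a custom Dial), which is a stand-in and not necessarily how the project performs its queries. The function name probeUDP, the server address and the queried name are placeholders chosen for illustration.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// probeUDP resolves name against the DNS server at serverAddr ("host:port"),
// forcing every query over UDP. probeUDP is a hypothetical helper name.
func probeUDP(serverAddr, name string, timeout time.Duration) ([]string, error) {
	resolver := &net.Resolver{
		PreferGo: true, // use Go's built-in resolver so the Dial hook below is honoured
		Dial: func(ctx context.Context, _, _ string) (net.Conn, error) {
			// Ignore the network and address the resolver suggests and
			// always talk UDP to the server under test.
			var d net.Dialer
			return d.DialContext(ctx, "udp", serverAddr)
		},
	}
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return resolver.LookupHost(ctx, name)
}

func main() {
	// Assumption: 10.96.0.10:53 is only a placeholder for the cluster DNS
	// service address; substitute the real service or pod IP.
	addrs, err := probeUDP("10.96.0.10:53", "kubernetes.default.svc.cluster.local.", 2*time.Second)
	if err != nil {
		fmt.Println("DNS over UDP failed:", err)
		return
	}
	fmt.Println("DNS over UDP answered:", addrs)
}
```

A probe like this can fail even while a TCP health port on the same pod still answers, which is exactly the case a TCP-based liveness check cannot see.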

Motivation and Scope

The project aims to get around the limitations of the Kubernetes health check and build an application that:

  • Checks CoreDNS health externally through UDP (DNS), from a remote Golang application.
  • Remedies CoreDNS pods if the response from the cluster and pod IPs is unsatisfactory. This can involve restarting pods, increasing memory limits, changing network configuration and more, and is done by interacting with the Kubernetes API from the Golang application.
  • Is flexible enough to allow changes in configuration and to target different pods. The idea was to have some sort of an “Engine” as a template that can be modified to suit the user’s needs.
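
As an illustration of the “restart” remedy, the sketch below builds the raw Kubernetes REST request that deletes a single pod; when the pod is owned by a Deployment, its controller creates a replacement, which amounts to a restart. This is a simplified assumption on my part: it uses plain net/http rather than a Kubernetes client library, and the helper name podDeleteRequest, the token and the pod name are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"path"
)

// podDeleteRequest builds (but does not send) the API call that deletes one
// pod. podDeleteRequest is a hypothetical helper, not part of any library.
func podDeleteRequest(apiServer, token, namespace, pod string) (*http.Request, error) {
	u, err := url.Parse(apiServer)
	if err != nil {
		return nil, err
	}
	// Core API path for a single pod: /api/v1/namespaces/{namespace}/pods/{name}
	u.Path = path.Join("/", u.Path, "api/v1/namespaces", namespace, "pods", pod)

	req, err := http.NewRequest(http.MethodDelete, u.String(), nil)
	if err != nil {
		return nil, err
	}
	// Bearer-token authentication, e.g. with a service account token.
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	// Placeholder values: the token and pod name are made up for this example.
	req, err := podDeleteRequest("https://kubernetes.default.svc", "<token>", "kube-system", "coredns-example")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Actually sending the request would also need an http.Client that trusts the cluster’s CA certificate and a service account that is allowed to delete pods.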

Architecture

To accomplish the goals listed earlier, I came up with an architecture for the application, inspired by existing products and technologies. The binary would have to be designed to access all network namespaces and run commands in them. As such, I would have to deploy it on every node, much like a CNI plugin. This could be achieved through a DaemonSet.

(I read about CNI and watched CNCF’s videos online to understand its principles. I will detail my research in a later article.)

The binary as a part of a cluster.

Inside a node, there exist different pods, each bound to its own network namespace. The binary is deployed on the host network and is thus in the root network namespace.

Pods with their network namespaces

Organization

For the solution to be comprehensible, and for other open source developers to be able to extend the app, following a standard project structure is of prime importance. I learnt about this repository for Golang projects from one of my mentors, Paul Greenberg. It was really valuable in helping me split my logic into different libraries and separate the executable app from those libraries.

The two main directories under root

The pkg directory has different packages that each perform some specific task.

  • engine : It uses all other packages to provide a template to start the application quickly with configurable parameters.
  • health : It has functions that help with connecting to the api-server, looking up pods and services, performing DNS queries on them and fixing failed deployments.
  • netns : It has functions that allow the application to enter any network namespace and perform DNS queries from there. It also allows modifying network configuration to rectify any failures.
  • exporter: It registers important metrics with a Prometheus client and has functions to collect and export them to a Prometheus channel.
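
To give a feel for what an exporter ends up publishing, here is a small standard-library sketch that counts failed probes per target and renders them in the Prometheus text exposition format. It deliberately does not use the Prometheus client library that the exporter package relies on, and the metric name dns_probe_failures_total and the type failureCounter are hypothetical names picked for this example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// failureCounter counts failed DNS probes per target (hypothetical type).
type failureCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newFailureCounter() *failureCounter {
	return &failureCounter{counts: make(map[string]int)}
}

// inc records one failed probe against target.
func (c *failureCounter) inc(target string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[target]++
}

// render returns the counter in the Prometheus text exposition format,
// with targets sorted so the output is stable between calls.
func (c *failureCounter) render() string {
	c.mu.Lock()
	defer c.mu.Unlock()

	targets := make([]string, 0, len(c.counts))
	for t := range c.counts {
		targets = append(targets, t)
	}
	sort.Strings(targets)

	var b strings.Builder
	b.WriteString("# HELP dns_probe_failures_total Failed DNS probes per target.\n")
	b.WriteString("# TYPE dns_probe_failures_total counter\n")
	for _, t := range targets {
		fmt.Fprintf(&b, "dns_probe_failures_total{target=%q} %d\n", t, c.counts[t])
	}
	return b.String()
}

func main() {
	c := newFailureCounter()
	c.inc("10.244.0.5:53") // made-up pod address
	c.inc("10.244.0.5:53")
	fmt.Print(c.render())
}
```

With a real client library the registration and rendering are handled for you; the sketch only shows the shape of the data a scrape would see.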

The cmd directory has the applications themselves.

  • The coredns-healthcheck directory has a Golang program that uses the engine package to work as a full-fledged health check binary. Its implementation specifies which pods and services the check is performed on.
  • The dnsq directory contains an implementation of the dig tool by Miek Gieben, using the dns package. I have made use of it inside my application to perform DNS queries on the ports for pods and services.

Conclusion

That is all for the scope of this article.

In subsequent articles, I’ll talk about how I implemented the different packages and what I learnt and where I learnt it from.

I hope I could get you excited about open-source contribution through this article. Please feel free to reach out to me if you want to talk about technology or just about anything else 💖

Repository: GitHub

Socials:
LinkedIn
Twitter
Instagram
