A crash course on Service Mesh

Written on March 27, 2023
Estimated reading time : 8 mins

Like Kubernetes, the topic of service meshes can get quite complex once you get into the details of implementation. Based on my 3 years of experience with service meshes, I am attempting to simplify the journey for anyone looking to make sense of this vast topic. This post is related to my Kubernetes series of posts.

Introduction

  • At a high level, a service mesh manages communication between applications.
  • Specifically, a service mesh is a tool for adding observability, security, and reliability features to applications by inserting these features at the platform layer rather than the application layer (where libraries like Twitter’s Finagle, Netflix’s Hystrix, and Google’s Stubby were used).
  • The term was introduced in 2016 by William Morgan, CEO of Buoyant, which eventually led to the creation of the Linkerd 1.0 service mesh. You can read his follow-up blog post here.

Do service meshes only work with Kubernetes?

  • Service meshes are usually associated with Kubernetes. The Kubernetes network architecture and layered approach are well suited for service meshes.
  • Some service meshes, like Linkerd 2.x, work only with Kubernetes. Others, like Istio, Consul and Cilium, can also work with applications deployed on virtual machines, though the setup will likely involve some additional effort.

Features

  • Observability
    • Organizations can get observability support (e.g., metrics, logs, and traces) as well as dependency or service graphs for each of their services (microservice or not), as they adopt a service mesh.
  • Security
    • A service mesh can help in setting up a zero trust security model.
    • Authentication, authorization, and encryption of traffic between services (mTLS) can be handled by a service mesh.
    • Most service meshes provide a certificate authority (CA) to manage keys and certificates for securing service-to-service communication.
  • Reliability
    • Resiliency features typically include circuit-breaking, latency-aware load balancing, eventually consistent service discovery, retries, timeouts, and deadlines.
    • Service meshes also safeguard service reliability by enforcing a timeout on long-running requests. They can also keep services from getting overloaded by using techniques like circuit breaking.
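
To make the circuit-breaking idea above concrete, here is a minimal sketch in Python. It is not how any particular mesh implements it: in a mesh this logic lives in the proxy rather than in application code, and the class name, threshold, and cool-down values below are all assumptions chosen for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the service."""


class CircuitBreaker:
    """Hypothetical, simplified breaker: opens after N consecutive failures."""

    def __init__(self, failure_threshold=3, reset_after_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self.clock = clock
        self.consecutive_failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_seconds:
                raise CircuitOpenError("circuit open, request rejected")
            # Cool-down elapsed: let one trial request through (half-open).
            # A single further failure re-opens the circuit.
            self.opened_at = None
            self.consecutive_failures = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.consecutive_failures = 0
        return result


# Usage with a fake clock so the example does not need to sleep.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_after_seconds=10.0, clock=lambda: now[0])


def flaky():
    raise RuntimeError("upstream unavailable")


for _ in range(2):
    try:
        breaker.call(flaky)
    except RuntimeError:
        pass  # two consecutive failures trip the breaker

try:
    breaker.call(lambda: "ok")
    rejected = False
except CircuitOpenError:
    rejected = True  # rejected immediately, the service is not called

now[0] = 11.0  # cool-down elapsed
recovered = breaker.call(lambda: "ok")
```

The point of the pattern is that, while the circuit is open, callers fail fast instead of piling more requests onto a service that is already struggling.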

Service mesh cons

  • Some service meshes can be quite resource heavy (e.g. Istiod uses 1 vCPU and 1.5 GB of memory).
  • Traffic goes through additional network hops.
  • Operational complexity can rise significantly for service meshes with a steep learning curve.
  • As Kubernetes and CNI plugins mature, they already provide a number of these features, so you may depend on a service mesh less than before.

Service mesh vs Event mesh

Event mesh and service mesh complement each other in the enterprise by providing two different but effective communication options.

  • Event mesh connects not only microservices but also legacy applications, cloud-native services, devices, and data sources/sinks. These can operate both in cloud and non-cloud environments.
  • While event mesh is asynchronous, service mesh supports more traditional synchronous request-reply messaging.
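
The difference between the two communication styles can be sketched in a few lines of Python. Everything here is an in-memory stand-in: `get_price` and `EventBroker` are hypothetical names used only for illustration, not part of any service mesh or event mesh product.

```python
import queue


def get_price(item):
    """Stand-in for a synchronous service call: the caller blocks for the reply."""
    prices = {"book": 12, "pen": 2}
    return prices[item]


class EventBroker:
    """Hypothetical in-memory broker standing in for an event mesh."""

    def __init__(self):
        self.subscribers = {}  # topic -> list of subscriber queues

    def subscribe(self, topic):
        inbox = queue.Queue()
        self.subscribers.setdefault(topic, []).append(inbox)
        return inbox

    def publish(self, topic, event):
        # Fire and forget: the publisher does not wait for any consumer.
        for inbox in self.subscribers.get(topic, []):
            inbox.put(event)


# Request-reply: the result is available as soon as the call returns.
reply = get_price("book")

# Event-driven: consumers pick events up later, independently of the publisher.
broker = EventBroker()
billing_inbox = broker.subscribe("order.created")
audit_inbox = broker.subscribe("order.created")
broker.publish("order.created", {"item": "book", "price": reply})
billing_event = billing_inbox.get_nowait()
audit_event = audit_inbox.get_nowait()
```

In the request-reply case the caller is coupled to one service being available right now; in the event case the publisher does not know or care how many consumers exist.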

Service mesh architecture types

Sidecar proxy

"service-mesh-sidecar"

  • The most popular pattern for implementing a service mesh is the sidecar pattern.
  • It involves deploying a network proxy for every service instance which handles all communication between the services. This is part of the service mesh data plane which is controlled by the mesh control plane.
  • Many service meshes, like Istio, Consul and Cilium, use Envoy as their proxy.
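
As a rough illustration of the sidecar idea, the sketch below wraps an application handler in a proxy object that sees every request. It is an in-process simulation only: a real sidecar is a separate network proxy (such as Envoy) running next to the application, and `SidecarProxy`, `orders_service`, and the caller names are hypothetical.

```python
import time
from collections import Counter


def orders_service(request):
    """Application code: knows nothing about metrics or access policies."""
    if request.get("path") == "/orders":
        return {"status": 200, "body": ["order-1", "order-2"]}
    return {"status": 404, "body": None}


class SidecarProxy:
    """Hypothetical stand-in for a sidecar: all traffic to the app goes through it."""

    def __init__(self, app, allowed_callers):
        self.app = app
        # In a real mesh, policies like this are pushed by the control plane.
        self.allowed_callers = set(allowed_callers)
        self.status_counts = Counter()
        self.latencies = []

    def handle(self, caller, request):
        started = time.perf_counter()
        if caller not in self.allowed_callers:
            response = {"status": 403, "body": None}  # policy enforced outside the app
        else:
            response = self.app(request)
        # Observability is collected without touching the application code.
        self.latencies.append(time.perf_counter() - started)
        self.status_counts[response["status"]] += 1
        return response


proxy = SidecarProxy(orders_service, allowed_callers={"frontend"})
ok = proxy.handle("frontend", {"path": "/orders"})
denied = proxy.handle("unknown-service", {"path": "/orders"})
```

The application function stays unchanged while authorization and metrics are added around it, which is the main appeal of putting these features at the platform layer.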

Host based proxy

  • A host-based proxy approach uses a shared agent running on each node/VM of a cluster as the proxy.
  • It is intended to be a leaner alternative to the sidecar approach, as Envoy-based sidecars add resource overhead to every pod, on top of what the control plane needs (e.g. Istiod uses 1 vCPU and 1.5 GB of memory).
  • Some service meshes combine a host-based proxy with a separately deployed layer 7 proxy for better security and division of responsibilities, e.g. Istio's new Ambient mesh (at the time of writing, this mode is not yet ready for production).
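
A quick back-of-the-envelope calculation shows why the host-based approach is considered leaner. The cluster size and per-proxy memory figures below are made-up placeholders for illustration, not measurements of any real mesh.

```python
def proxy_counts(pods_per_node, nodes):
    """Return (sidecar proxies, host-based proxies) for a uniform cluster."""
    sidecar_proxies = pods_per_node * nodes  # one proxy per service instance
    host_proxies = nodes                     # one shared proxy per node
    return sidecar_proxies, host_proxies


def total_memory_mb(proxy_count, memory_per_proxy_mb):
    """Total proxy memory, given an assumed per-proxy footprint."""
    return proxy_count * memory_per_proxy_mb


# Hypothetical cluster: 10 nodes with 30 pods each.
sidecars, host_proxies = proxy_counts(pods_per_node=30, nodes=10)

# Hypothetical footprints: a shared node proxy is assumed to be larger
# than a single sidecar because it serves every pod on the node.
sidecar_memory = total_memory_mb(sidecars, memory_per_proxy_mb=50)
host_memory = total_memory_mb(host_proxies, memory_per_proxy_mb=200)
```

With these assumed numbers the sidecar model runs 300 proxies against 10 for the host-based model. The trade-off is that a shared proxy becomes a shared failure and security domain for every pod on its node.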

Istio’s Ambient mesh, which uses both a host-based proxy (ztunnel) and a layer 7 proxy (waypoint): "istio-ambient-mesh"

eBPF based

Isovalent’s Cilium service mesh architecture: "ebpf-service-mesh"

  • Extended Berkeley Packet Filter (eBPF) is a feature of the Linux kernel that allows applications to do certain types of work in the kernel itself. eBPF can be used to replace iptables rules, and accelerate the data plane by shortening the data path.
  • Efforts are under way to use eBPF to improve the performance of sidecar-free service meshes. Cilium currently uses eBPF together with a node-based proxy (Envoy). Istio is also experimenting in this area with Merbridge.
  • The Linux kernel has decades of features and safeguards in it. It looks difficult to move all proxy features of a service mesh (especially layer 7 features) into the kernel with eBPF, but many organisations are looking into using eBPF for optimisation. We need to wait and watch to see how the winds blow here.

Comparison of different service meshes

The table below compares four prominent service meshes. For others, you can look at https://servicemesh.es/ (very detailed, though somewhat outdated and covering fewer meshes) or https://layer5.io/service-mesh-landscape.

| Factor | Istio | Linkerd2 | Consul Connect | Cilium |
|---|---|---|---|---|
| First stable release | 31 Jul 2018 | 18 Sep 2018 | 16 Oct 2017 | 24 Apr 2018 |
| Repository | Istio | Linkerd 2.0 | Consul | Cilium |
| Language | Go (control plane), C++ (data plane, i.e. Envoy) | Go (control plane), Rust (data plane) | Go | Go |
| Supporting organizations | Lyft, Google, IBM, Microsoft | Cloud Native Computing Foundation (CNCF) | HashiCorp | Isovalent |
| Workloads | Kubernetes + VMs | Kubernetes only | Kubernetes + VMs | Kubernetes + VMs |
| Architecture: Single point of failure | No - uses sidecar per pod | No | No | Partial - node proxy makes the services on the affected node vulnerable |
| Architecture: Proxy | Sidecar proxy (customised Envoy) | Sidecar proxy (Linkerd2-proxy) | Sidecar proxy (Envoy) | eBPF + node proxy (Envoy) |
| Architecture: Security | Sidecar approach ensures high security. | Sidecar approach ensures high security. | Sidecar approach ensures high security. | Host proxy approach isn't considered as secure. |
| mTLS | Yes | Yes | Yes | Yes |
| Security: Certificate Management | Yes | Yes | Yes (with Vault integration) | Yes |
| Communication Protocols | TCP, HTTP/1.x, HTTP/2, gRPC | TCP, HTTP/1.x, HTTP/2, gRPC | TCP, HTTP/1.x, HTTP/2, gRPC | TCP, HTTP/1.x, HTTP/2, gRPC |
| Traffic Management | Blue/Green Deployments, Circuit Breaking, Fault Injection, Rate Limiting | Blue/Green Deployments, Fault Injection | Blue/Green Deployments, Circuit Breaking, Fault Injection, Rate Limiting | Blue/Green Deployments, Circuit Breaking, Fault Injection, Rate Limiting |
| Multicluster Support | Yes | Yes | Yes | Yes |
| Ingress | Istio gateway or Nginx ingress controller | Any | Envoy and Ambassador | Any |
| Operations Complexity | High | Low | Medium | Medium (debugging with eBPF can be hard) |
| Learning curve | High | Medium | High (plenty of moving parts) | Medium |
| Resources footprint | High | Medium | High | Low |
| Support | Largest community support | Large community support + Enterprise support | Solid Enterprise support | Community support + Enterprise support |
