What Is a Service Mesh?

April 7, 2022

In This Article

Service Mesh Defined
What Issues Does a Service Mesh Solve?
What Capabilities Does a Service Mesh Add?
Service Mesh Architecture

Service Mesh Defined

A service mesh is an infrastructure layer that manages communication in a distributed services architecture. As microservices have become the most successful pattern in deploying distributed services, service meshes are most commonly used in microservices architectures.

What Issues Does a Service Mesh Solve?

As application development has evolved, applications have become more complex. Monolithic applications are now broken down into smaller microservices that communicate over the network using APIs. This causes an explosion of network traffic and increases the complexity and overall attack surface of the architecture.

Application developers focus on business logic, such as purchasing a product or generating an invoice. But to make the application more resilient, secure, and easier to manage, developers also need to add network functionality to their services: time-out logic for resilience, encryption to secure communication, and metrics export to simplify management. Networking, however, may not be a core skill of developers, and adding this functionality distracts them from their primary task, which is to add value to their applications. In addition, network logic must be implemented separately in each service, and services may be written in different languages. This makes the implementations nonuniform, and developers must add the same functions again each time a new service is introduced.

A service mesh solves these issues by removing the application's network functionality and putting it into a separate process. This frees up developers to focus on adding value to their applications and ensures that network functionality is uniform across services. This also allows network features to develop faster as the development of network functions becomes decoupled from the application itself.
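To make the duplication concrete, here is a minimal sketch of the kind of time-out and retry logic that each service would otherwise carry in its own code. The function name `call_with_retries` and its parameters are hypothetical, chosen only for illustration; a mesh proxy would apply equivalent behavior outside the application.

```python
import time


def call_with_retries(operation, attempts=3, backoff_s=0.1, sleep=time.sleep):
    """Run `operation`, retrying on time-outs and connection errors.

    Hypothetical helper: without a mesh, each service would need its own
    version of this, in whatever language that service is written in.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            # Wait longer after each failure, but not after the final one.
            if attempt < attempts - 1:
                sleep(backoff_s * (2 ** attempt))
    raise last_error
```

With a service mesh, the application simply issues the request, and retry behavior like this is configured once for the proxies instead of being rewritten per service.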

What Capabilities Does a Service Mesh Add?

A service mesh allows you to transparently add observability, traffic management, and security to your services without having to modify your code.

Security

Traditionally, with a monolithic application, security has focused on the perimeter of the application. Firewalls allow only trusted connectivity into the application, and traffic from the user is encrypted to protect the confidentiality and integrity of the connection. Communication inside the network, however, is assumed to be trusted and is sent without encryption or firewall rules.

We can no longer make this assumption with microservices, as each service can exist in a different geographic location. An example of this would be a Kubernetes cluster that is deployed in different availability domains in Oracle Cloud Infrastructure (OCI). With a service mesh, service-to-service communication can be secured using the following capabilities:

Authentication of services: A service mesh can provide service-to-service identification. At the beginning of workload-to-workload communication, the two parties exchange credentials that carry their identity information. This allows the services to identify each other and confirm whether they are authorized to interact. This is implemented with mutual TLS (mTLS), a form of mutual authentication in which the two parties in a connection authenticate each other using the Transport Layer Security (TLS) protocol, with a public key infrastructure (PKI) generating the keys and certificates.
Encryption of interservice communication: Service meshes can encrypt requests and responses and then decrypt them. Typically, traffic is encrypted using mutual TLS, with the PKI generating the keys and certificates.
Enforcement of security-related policies: A service mesh can enforce secure communication between services and allow only validated requests, whether they originate outside or inside the application.
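As a rough illustration of what "mutual" means in mutual TLS, the sketch below builds TLS contexts with Python's standard `ssl` module. The function names and the optional file parameters are assumptions made for this example; in a real mesh, the proxies hold the certificates and perform the handshake, so the application never runs code like this.

```python
import ssl


def build_server_context(cert_file=None, key_file=None, ca_file=None):
    """TLS context for the receiving side that also demands a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Requiring a certificate from the caller is what makes the TLS mutual:
    # the handshake fails unless the caller presents a trusted certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)   # this service's identity
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # CA that signs peer identities
    return ctx


def build_client_context(cert_file=None, key_file=None, ca_file=None):
    """TLS context for the calling side; it verifies the server and can present its own certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # verifies the server by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)   # identity presented to the server
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

The file paths are left optional only so the sketch can be built without real certificates; an actual deployment would always load a certificate, a key, and a trusted CA on both sides.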

Traffic Management

A service mesh provides several capabilities for controlling how traffic flows between services.

Traffic shifting: Traffic can be split between two or more versions of a service, with a different share of the load sent to each version, which allows for A/B testing and canary deployment.
Load balancing: A service mesh can provide layer 7 load balancing using different conditions and criteria, such as locality.
Time-outs and retries: A service mesh can customize HTTP time-out and retry parameters for each service.
Circuit breaking: With a service mesh, unhealthy instances can be isolated and brought back gradually as needed.
Fault injection: A service mesh can add faults to an environment to test failover and resiliency.
Mirroring: A service mesh can send a copy of live traffic to a mirrored service, out of band of the critical request path. This can be used for debugging and troubleshooting the application.
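Two of these capabilities, traffic shifting and circuit breaking, can be sketched in a few lines. This is a simplified illustration and not how any particular mesh implements them; the names `pick_version` and `CircuitBreaker`, and the thresholds used, are hypothetical.

```python
import random
import time


def pick_version(weights, rng=random):
    """Pick a service version according to its share of traffic.

    `weights` maps a version name to its weight, for example
    {"v1": 90, "v2": 10} for a canary that receives about 10% of requests.
    """
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


class CircuitBreaker:
    """Stop sending traffic to an instance after repeated failures."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (healthy)

    def allow_request(self):
        if self.opened_at is None:
            return True
        # After the cool-down period, let trial requests through again.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()
```

A proxy would call `pick_version` once per request to choose a destination, and consult `allow_request` before forwarding to a given instance. Changing the weights gradually, for example from 90/10 to 50/50, is what turns this into a canary rollout.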

Observability

A service mesh is uniquely positioned to provide telemetry information as all interservice communication must pass through it. This allows the service mesh to capture telemetry data such as source, destination, protocol, URL, duration, status code, latency, logging, and other detailed statistics. This information can be exported to tools to provide monitoring, tracing, and logging of applications to understand traffic flow and debug problems as they occur.

Monitoring: A service mesh can export metrics to tools such as Prometheus and Kiali to monitor the performance and health of your services.
Tracing: Distributed tracing provides a way to monitor individual requests end to end as they flow through each service. The data can be exported to back ends such as Jaeger and Zipkin to monitor the flow of the request.
Logging: A service mesh can generate access logs for service traffic in a configurable set of formats, providing operators with complete control of the how, what, when, and where of logging. Logging information can be exported to tools such as Splunk and Datadog.
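Because the proxy sits on the request path, it can record telemetry without any change to the application. The sketch below shows the idea with a hypothetical `proxy_call` wrapper that times a request and emits a JSON access log entry; the field names are illustrative and do not follow any specific mesh's log format.

```python
import json
import time
from dataclasses import asdict, dataclass


@dataclass
class AccessLogEntry:
    source: str
    destination: str
    protocol: str
    url: str
    status_code: int
    duration_ms: float


def proxy_call(source, destination, url, handler, clock=time.perf_counter):
    """Forward a request to `handler` and return (status, JSON log line).

    `handler` stands in for the destination service and returns an HTTP
    status code. The application never sees the measurement code.
    """
    start = clock()
    status = handler(url)
    duration_ms = (clock() - start) * 1000.0
    entry = AccessLogEntry(source, destination, "HTTP", url, status, duration_ms)
    return status, json.dumps(asdict(entry))
```

Log lines in this shape could then be shipped to a logging back end, while the same measurements feed metrics and traces.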

Service Mesh Architecture

The most common service mesh implementations run on the Kubernetes container orchestration platform and use a sidecar model. This architecture encapsulates the code that implements the network functionality in a “sidecar” proxy that operates at layer 4 and layer 7, and then relies on traffic to and from services being redirected to this sidecar proxy. It is called a sidecar because a proxy is attached to each application, much like a sidecar attached to a motorbike.

In Kubernetes, the application container sits alongside the proxy sidecar container in the same pod. Since they are in the same pod, they share the same network namespace and IP address, which allows the containers to communicate via localhost.

All network traffic from the application container to other services is unencrypted until it reaches the local proxy container. It’s the responsibility of the proxy container to perform service discovery, encrypt the traffic, and send it to the destination service. It also applies the configured network policy to the network traffic.

The traffic is then received by the proxy container at the receiving service. Authentication is performed by the proxy containers on both sides, and traffic is decrypted by the receiving proxy container and sent to the receiving application container. The proxy containers also perform any retry logic and export metrics and logs.
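The request path described above can be modeled as a toy simulation. Everything here is a stand-in: the `SidecarProxy` class is hypothetical, the registry dictionary replaces real service discovery, the caller allow-list replaces certificate-based authentication, and `seal`/`unseal` use base64 only as a placeholder for TLS (base64 is not encryption).

```python
import base64
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


def seal(payload: str) -> bytes:
    # Placeholder for TLS encryption. base64 is NOT encryption.
    return base64.b64encode(payload.encode())


def unseal(blob: bytes) -> str:
    return base64.b64decode(blob).decode()


@dataclass
class SidecarProxy:
    service: str
    app: Callable[[str], str]  # stand-in for the local application container
    allowed_callers: Set[str] = field(default_factory=set)

    def outbound(self, registry: Dict[str, "SidecarProxy"], destination: str, request: str) -> str:
        peer = registry[destination]   # service discovery (toy lookup)
        blob = seal(request)           # plaintext from the app is protected here
        reply = peer.inbound(self.service, blob)
        return unseal(reply)

    def inbound(self, caller: str, blob: bytes) -> bytes:
        # Stand-in for authenticating the caller and applying policy.
        if caller not in self.allowed_callers:
            raise PermissionError(f"{caller} may not call {self.service}")
        response = self.app(unseal(blob))  # hand plaintext to the local app
        return seal(response)
```

In this model, the application functions only ever see plaintext strings, which mirrors how the application containers stay unaware of the encryption and policy checks done by their proxies.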

Figure 1. Service mesh architecture

The service mesh is divided into two main components: the data plane and the control plane.

The data plane is composed of the collection of sidecar proxies deployed in the environment and is responsible for the security, network functions, and observability of the application. Data planes also collect and report telemetry data on all mesh traffic. The Envoy proxy is the most used proxy for a service mesh data plane.
The control plane manages and configures the entire collection of proxies to route traffic. It supplies the proxies with the configuration for forwarding, health checking, load balancing, authentication, and authorization, and it aggregates telemetry data. The control plane also handles certificate management and provides each proxy with its certificate. The following are examples of service meshes with control plane implementations:
- Istio
- Linkerd
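The split between the two planes can be sketched as follows: the control plane holds the desired configuration and pushes it to every registered proxy, while the proxies in the data plane only apply what they are given. The class names and the shape of the routing configuration are assumptions made for illustration and do not reflect the configuration protocol of Istio, Linkerd, or Envoy.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataPlaneProxy:
    service: str
    routes: Dict[str, Dict[str, int]] = field(default_factory=dict)
    config_version: int = 0

    def apply(self, version, routes):
        # Keep a private copy so later control plane edits arrive only via push.
        self.routes = {dest: dict(weights) for dest, weights in routes.items()}
        self.config_version = version


@dataclass
class ControlPlane:
    proxies: List[DataPlaneProxy] = field(default_factory=list)
    routes: Dict[str, Dict[str, int]] = field(default_factory=dict)
    version: int = 0

    def register(self, proxy):
        """Add a proxy and bring it up to date with the current configuration."""
        self.proxies.append(proxy)
        proxy.apply(self.version, self.routes)

    def set_route(self, destination, weights):
        """Change routing for one destination and push it to every proxy."""
        self.routes[destination] = dict(weights)
        self.version += 1
        for proxy in self.proxies:
            proxy.apply(self.version, self.routes)
```

An operator changes routing in one place, for example `set_route("billing", {"v1": 90, "v2": 10})`, and every proxy receives the same rule, which is how the mesh keeps network behavior uniform across services.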