In a world of highly abstracted, typically virtualised, often ephemeral and always dynamic cloud computing resources, continuous observability is essential. However, the cloud was not created with observability of internal systems in mind; it was initially sold as a route to IT agility through resource flexibility and cost manageability.
Now that cloud is here and adoption is growing, we need to stand back and assess our observability capability. In addition, as cloud-native implementations now span public, private, hybrid and multi-cloud (multiple vendor) instances, we can start to think about poly cloud, where different parts of an application's workloads and data services are separated out across various Cloud Service Providers (CSPs).
With roots in control theory, observability in the modern cloud era manifests itself in many forms, so what key drivers are shaping the way we stick our head in the clouds to get a better view?
APM is everywhere
Many ask what the difference is between cloud observability and APM (Application Performance Monitoring). We used to ‘just’ have virtual machines, which meant that blocks or instances of compute could be exposed to observability with comparative ease.
We now live in the world of nested virtualisation, Software-Defined Infrastructure (SDI) and cloud services. Our application workloads are often surrounded by layers of software (also “applications”): operating systems, proxies, orchestration software, container engines, virtual machines, external services and more.
As APM has become almost synonymous with observability, we now see it extend to every tier and structure throughout the IT stack. We need APM for applications, obviously, but we also need infrastructure APM (iAPM, if you will) and it needs to be capable of being directed at any of the stars in the virtualised galaxy we now exist in.
We may be at the point where there is no longer any need to differentiate between APM and non-application monitoring. The industry already has tools that allow software of all kinds running in the cloud to be monitored and observed in much the same way.
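That convergence can be made concrete with a single monitoring primitive applied at both tiers. The sketch below is plain Python with invented names (`observe`, `LATENCIES`, the span labels) rather than any real APM agent's API; it wraps an application function and an infrastructure-level call with the same latency-recording decorator:

```python
import time
from collections import defaultdict

# Hypothetical in-process metric store; a real APM agent would export these.
LATENCIES = defaultdict(list)

def observe(name):
    """Record wall-clock latency for any callable, whether it wraps
    application logic or an infrastructure operation."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                LATENCIES[name].append(time.perf_counter() - start)
        return inner
    return wrap

@observe("checkout")    # application-level span
def checkout(order):
    return sum(order.values())

@observe("disk_read")   # infrastructure-level span, same primitive
def disk_read(path):
    time.sleep(0.01)    # stand-in for a slow I/O operation
    return b""
```

The same decorator serves both layers, which is the point: once everything is “an application”, one observability mechanism can cover the stack.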
A federated centralised orchestrated view
In a world of multiple cloud instances from many different CSPs, we need an orchestrated, federated level of observability: a centralised view with the ability to filter and aggregate across multiple clouds and multiple clusters, if we want to stay in control.
Federating observability data to a centralised place is a common technique and process these days. It is one of the most effective ways to look for cloud overloads, bad provisioning and ‘zombie’ cloud wastage, where instances are left running idle. When we bring all of these signals together, we can drive more efficient cloud resources to service our Content Delivery Networks (for example) and work at a smarter level all round.
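As a sketch of that federation step, the snippet below merges per-cloud instance snapshots into one tagged view and flags idle ‘zombie’ candidates. The provider names, metric shapes and the 5% CPU threshold are all illustrative assumptions, not any vendor's API:

```python
def federate(*snapshots):
    """Merge per-cloud metric snapshots into one list, tagging each
    instance with the provider it came from."""
    merged = []
    for provider, instances in snapshots:
        for inst in instances:
            merged.append({**inst, "provider": provider})
    return merged

def zombies(instances, cpu_threshold=0.05):
    """Instances whose average CPU sits below the threshold are
    candidates for reclamation."""
    return [i["id"] for i in instances if i["avg_cpu"] < cpu_threshold]

# Hypothetical snapshots already pulled from two CSPs' metric APIs.
aws = ("aws", [{"id": "i-1", "avg_cpu": 0.62}, {"id": "i-2", "avg_cpu": 0.01}])
gcp = ("gcp", [{"id": "vm-9", "avg_cpu": 0.03}])

fleet = federate(aws, gcp)   # one centralised view across both clouds
idle = zombies(fleet)        # candidates for shutdown
```

The centralised list is what makes the filter possible; neither provider's console alone would show both idle instances side by side.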
Connected correlation inside the firehose
The amount of data we are consuming and producing right now enables us to get many more signals to track our observability requirements. If we think about the fact that the Internet of Things (IoT) is exponentially increasing our data points, we are drinking from a firehose in terms of data flow… and that can make observability far more difficult.
To address this challenge, we need to think about connected correlation. When we seek to analyse system metrics, logs and traces, we need to be able to jump between those procedures and tasks quickly to work dynamically at different parts of the IT stack coalface. Because there is so much out there to observe, connected correlation helps provide vital links between the data sources that are actually mission-critical to the IT function’s operation.
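A minimal sketch of connected correlation, assuming each signal carries a shared trace ID that lets an operator pivot from an error log straight to its slow span (the signal shapes here are simplified illustrations, not a real telemetry schema):

```python
# Simplified signal stores; in practice these live in separate backends.
logs = [
    {"trace_id": "t1", "level": "ERROR", "msg": "timeout calling db"},
    {"trace_id": "t2", "level": "INFO",  "msg": "ok"},
]
traces = [
    {"trace_id": "t1", "span": "db.query", "duration_ms": 5100},
    {"trace_id": "t2", "span": "db.query", "duration_ms": 12},
]

def correlate(trace_id):
    """Gather every signal that shares a trace ID into a single view."""
    return {
        "logs":   [l for l in logs if l["trace_id"] == trace_id],
        "traces": [t for t in traces if t["trace_id"] == trace_id],
    }

view = correlate("t1")   # jump from the error log to the 5.1s span
```

The shared ID is the “vital link”: without it, connecting a log line in one firehose to a span in another is guesswork.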
Our observability goals see us continually looking for optimisations that will increase performance efficiency. This means we will need to look for, track and analyse different observability signals. One of the best ways to do this is profiling. This technique tells us which part of the application is using how much compute resource (CPU time, memory, disk or network I/O), rather than leaving us to guess from the total resource usage of the process.
Continuous profiling enables us to look at applications and see past performance characteristics during interesting cases; it is especially useful when an application is about to run out of memory and perhaps crash the whole node. If we can capture application profiles every 60 seconds (or even more frequently), then we can see where a function in the application source code might need optimisation or augmentation. We can do this retrospectively even for compiled (as opposed to interpreted) applications, because the binary embeds debug symbols that let us map back to a specific function call.
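As an in-process illustration of the idea, the sketch below uses Python's standard `cProfile` and `pstats` modules to attribute CPU time to individual functions in one snapshot; a continuous profiler would repeat this on a fixed interval and ship each profile to a backend (the function names are invented for the example):

```python
import cProfile
import io
import pstats

def hot():                       # deliberately busy function
    return sum(i * i for i in range(200_000))

def cold():                      # near-zero cost, for contrast
    return 1 + 1

# One profiling snapshot over a window of work.
profiler = cProfile.Profile()
profiler.enable()
hot()
cold()
profiler.disable()

# Render per-function timings, sorted by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
report = out.getvalue()          # names resolved via the symbol table
```

The report answers “which function is costing us?” directly, which is exactly the question total process CPU usage cannot.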
A hive of activity with eBPF
Lastly then to eBPF, or extended Berkeley Packet Filter to use its full name. This is a mechanism that allows us to execute additional code in the Linux kernel. When we can look at specific functions inside the kernel using this ‘special spy agency’ technique, then we can gain new controls over observability. As an additional benefit, we can also note that eBPF does not require app-level instrumentation to start capturing metrics.
Even though it was initially designed for security, eBPF can now be used more proactively to expose application metrics. We used to think of a service mesh as the way to put proxies around an application, but for observability purposes the mesh can be replaced with eBPF, which has much lower overhead and more capabilities.
A ‘canary deployment’ might still require a service mesh, though: there remain non-observability use cases for service meshes, such as canary deployments (where traffic is tightly controlled) and authorisation (via mutual TLS). eBPF makes no attempt to adjust traffic at that level; its current use cases are security and observability only.
If we can consider some (ideally all) of these factors and functionalities in our quest to achieve observability in modern IT stacks, then we just might be able to pop our head above the clouds and see what’s coming next.