DevOps / Sys Admin Q & A #10 : Trouble Shooting


Load, throughput, and response time

What does performance mean to us?

To troubleshoot performance issues of our system, we need to understand the relationship between load, throughput, and response time.

Load is where it all starts. It is the demand for work to be done (for example, 500 queries per second). Without it, we have no work and, therefore, no response time. Load is also what causes performance to degrade: at some point, the demand for work exceeds what the application can deliver, and that is when bottlenecks occur.
Throughput (for example, 1,000 requests per second) is the rate at which the demanded work is actually executed. The relationship between load and throughput is predictable. As load increases, throughput also increases until some resource becomes saturated, after which throughput plateaus. A throughput plateau is an indicator that our application no longer scales.
Response time (msec) is the side effect of throughput. While throughput increases proportionately to load, response time increases only negligibly. Once the system reaches the throughput plateau, queuing occurs and response time increases exponentially, producing the telltale "hockey stick" curve.
Source: reference 1
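
The relationship described above can be sketched numerically. The snippet below is only an illustrative model, not something taken from reference 1: it assumes a single saturating resource with a hypothetical capacity of 1,000 requests per second, and it borrows the textbook M/M/1 queueing formula (response time = service time / (1 - utilization)) to produce the hockey-stick shape. The function name simulate and all the numbers are assumptions made for the example.

```python
def simulate(load_rps, capacity_rps=1000.0):
    """Return (throughput_rps, response_time_ms) for one offered load.

    Assumption: one resource that saturates at capacity_rps, with
    response time following the M/M/1 formula S / (1 - utilization).
    """
    service_time_ms = 1000.0 / capacity_rps      # time to serve one request
    throughput = min(load_rps, capacity_rps)     # throughput plateaus at capacity
    utilization = load_rps / capacity_rps
    if utilization >= 1.0:
        response_ms = float("inf")               # queue grows without bound
    else:
        response_ms = service_time_ms / (1.0 - utilization)
    return throughput, response_ms


# Offered load rises steadily; throughput follows it, then flattens,
# while response time stays low and then shoots up near saturation.
for load in (100, 500, 900, 990, 1200):
    tput, resp = simulate(load)
    print(f"load={load:5d} rps  throughput={tput:7.1f} rps  response={resp:8.1f} ms")
```

In this model the response time barely moves between 100 and 500 requests per second, but it climbs steeply as the load approaches the assumed capacity, which is the queuing effect the text refers to.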

The relationship between load and throughput becomes increasingly important in complex multi-tier applications. When a throughput plateau occurs, it may be visible across multiple tiers simultaneously, as shown in the picture below:

In situations like this, it is tempting to blame the downstream tiers, because poor performance downstream usually bubbles upstream. That may be a fair assessment for response time; however, it does not always apply to throughput.

By comparing throughput to load at each tier, we can identify what the root cause is. Here's one possible scenario corresponding to the throughput above, where the load scales linearly at each tier. In our case, since the load actually did increase continually at the database tier, we can safely identify that the bottleneck is indeed on the downstream database tier.

We may have another scenario where the app server load increases linearly, but does not propagate to the database tier as shown in the picture below:

So, the bottleneck is not in the database tier but in the application server, which is not passing the load down to the database server. Remember that load is the demand for work: at some point the database is simply not being asked for anything additional. Without additional load, we won't have additional throughput.
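
The two scenarios above can be turned into a simple check. The following is a minimal sketch under stated assumptions: the tier names, the sample numbers, the 5% tolerance, and the function find_bottleneck are all hypothetical, and the rule it applies (pick the most downstream tier whose load is still growing past its throughput) is just one way to encode the reasoning in the text.

```python
def find_bottleneck(tiers, tolerance=0.05):
    """Return the name of the most downstream tier whose latest throughput
    falls short of its latest load, or None if every tier keeps up.

    tiers: list of (name, load_samples, throughput_samples),
           ordered from upstream to downstream.
    """
    bottleneck = None
    for name, load, throughput in tiers:
        # A tier is "behind" when it is asked for more work than it completes.
        if throughput[-1] < load[-1] * (1.0 - tolerance):
            bottleneck = name  # overwritten each time, so the last match wins
    return bottleneck


rising = [100, 200, 300, 400]    # load that keeps increasing
plateau = [100, 200, 250, 250]   # throughput (or load) that flattens out

# Scenario 1: load keeps rising at every tier, including the database.
scenario_db = [("web", rising, plateau), ("app", rising, plateau), ("db", rising, plateau)]

# Scenario 2: load rises at the app server but is not passed on to the database.
scenario_app = [("web", rising, plateau), ("app", rising, plateau), ("db", plateau, plateau)]

print(find_bottleneck(scenario_db))   # db
print(find_bottleneck(scenario_app))  # app
```

In the second scenario the database completes everything it is asked to do, so the check stops at the application server, which matches the conclusion drawn above.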

Leaks - CPU, I/O (file handles or database connections), Memory

Note: this section is based on ref #1.

"A leak occurs whenever an application uses a resource and then doesn't give it back when it's done. Possible resources include memory, file handles, database connections, and many other things. Even resources like CPU and I/O can leak if the calling code encounters an unexpected condition that causes a loop it can't break out of, and then processing accumulates over time as more instances of that code stack up."
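
To make the definition concrete, here is a small sketch of a connection leak. It is not from ref #1: the Pool class, the two handlers, and the request pattern are all hypothetical and stand in for a real database connection pool. The leaky handler fails to return its connection when an unexpected condition occurs, while the safe one releases it in a finally block.

```python
class Pool:
    """Toy stand-in for a connection pool (hypothetical, for illustration)."""

    def __init__(self, size):
        self.size = size
        self.in_use = 0

    def acquire(self):
        if self.in_use >= self.size:
            raise RuntimeError("pool exhausted")
        self.in_use += 1

    def release(self):
        self.in_use -= 1


def leaky_handler(pool, fail):
    pool.acquire()
    if fail:
        # Unexpected condition: we leave without giving the connection back.
        raise ValueError("unexpected condition")
    pool.release()


def safe_handler(pool, fail):
    pool.acquire()
    try:
        if fail:
            raise ValueError("unexpected condition")
    finally:
        pool.release()  # runs on both the normal and the error path


def run(handler, pool, requests):
    """Process each request and return how many connections remain checked out."""
    for fail in requests:
        try:
            handler(pool, fail)
        except ValueError:
            pass  # the request failed, but the service keeps running
    return pool.in_use


requests = [False, True, True, False, True]
print(run(leaky_handler, Pool(10), requests))  # 3 connections never returned
print(run(safe_handler, Pool(10), requests))   # 0
```

Each failed request in the leaky version strands one connection, so with enough failures the pool is exhausted and acquire starts raising, even though no single request ever held more than one connection.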
