How We Reduced Our Application Runtime By A Factor Of 60

“Complications arose, ensued, were overcome.” - Captain Jack Sparrow 

Disclaimer

All functions and events in this blog post, even those based on real modules or microservices, are entirely fictional. All code snippets are simplified stand-ins for the real thing...

The Problem

The service described below analyzes large amounts of user data.

  1. Each client can have between 1 and n packages, and each package contains between 1 and n resources.
  2. The service database saves each resource id but not the full resource details (for confidentiality). To get the details, an external API call is made.
  3. For each resource, the external API returns the details for the service to process.
  4. After processing, the service sends insights about each client's resources.

Restrictions:

  1. The service must process all the clients before iterating through the client list again.
  2. Packages cannot be processed in parallel, but resources inside a package can.
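
To make the structure concrete, here is a minimal sketch of the data model. The class and accessor names are assumptions for illustration, not the real service's:

public class Client {
    private List<Package> packages;    // 1 to n packages per client
    public List<Package> getPackageList() { return packages; }
}

public class Package {
    private List<Resource> resources;  // 1 to n resources per package
    public List<Resource> getResources() { return resources; }
}

public class Resource {
    private String id;                 // only the id is stored locally;
                                       // details come from the external API
    public String getId() { return id; }
}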

Starting Simple

The most straightforward way is to iterate over the clients serially.
The problem with this solution is the first restriction: a full iteration must finish before a fresh one starts. That means the more clients you have, the longer the service takes to get back to each client.
To iterate over lists I like to use Java Streams.

List<ClientInsights> insights =
    clientList.stream()
        // exec processes a client's packages and returns that client's insights
        .map(client -> service.exec(client.getPackageList()))
        .collect(Collectors.toList());

With 500 clients (and O(n²) resources), the service takes a whopping 60 hours to complete. It's bad, but it is something. From here it can only be improved.

Parallel Resources Processing

There is a restriction on running packages in parallel but there is no such restriction on resources. So the first thing we can do is run API calls for each resource in parallel. In Java, this can be done with CompletableFutures and executors.
For example, the code below will fetch resource details asynchronously with a thread pool of 5 threads.

    Executor executor = Executors.newFixedThreadPool(5);

    public List<Resource> get() {
      // run the external API call on a pool thread instead of the caller's thread
      CompletableFuture<List<Resource>> resourcesResponse =
          CompletableFuture.supplyAsync(() -> externalAPI.getResourcesDetails(), executor);
      // join() blocks until the async call completes and returns its result
      return resourcesResponse.join();
    }

Using Java's multithreading capabilities improved our service runtime by 50%; it now takes 30 hours to process the client list. A nice improvement nonetheless, but still unacceptable in a dynamic business environment.

Parallel Clients

Taking our parallelism capabilities to the next level, we can try to process multiple clients in parallel. So now we run multiple clients at once, where each client's packages run serially and the resources inside each package run in parallel, as sketched below.
This improvement reduces our runtime by another 40%; it now takes 18 hours to process the client list.
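
A minimal sketch of this setup, reusing the same service.exec from before (which walks a client's packages serially); the pool size of 10 is a hypothetical choice:

ExecutorService clientPool = Executors.newFixedThreadPool(10);

List<CompletableFuture<Void>> futures = clientList.stream()
    .map(client -> CompletableFuture.runAsync(
        // each client runs on its own thread; inside exec,
        // packages stay serial and resources stay parallel
        () -> service.exec(client.getPackageList()),
        clientPool))
    .collect(Collectors.toList());

// a full iteration must finish before a fresh one starts
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();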

Can We Do Better?

At this point, we had parallelized everything we could.
Most optimization projects have several improvement tracks. Not all of them work, but eventually there's an 80/20 effect, i.e. one of the tracks brings 80% of the improvement. Refusing to accept an 18-hour runtime, the team decided to investigate a few specific tracks:

  1. API response caching (maybe there are calls that can be avoided?)
  2. Workload distribution via Kafka and workers.
  3. Investigate response time from the external API.
  4. Investigate DB utilization (maybe we are not utilizing our resources fully).

API Response Caching

This one looked promising at the start, but eventually the team did not see any significant improvement from caching common calls. The bottleneck had to be somewhere else. Check out my previous post about caching in Spring applications to better understand how to use a cache in your application.
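
As a sketch of what the experiment looked like, Spring's @Cacheable can wrap the external API call so repeated calls for the same resource id are served from memory; the class, method, and cache names below are hypothetical:

@Service
public class ExternalApiClient {

    // identical resourceId calls are answered from the "resourceDetails"
    // cache instead of hitting the external API again
    // (requires @EnableCaching on a configuration class)
    @Cacheable("resourceDetails")
    public Resource getResourceDetails(String resourceId) {
        return externalAPI.getResourceDetails(resourceId);
    }
}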

Workload Distribution

Migrating the service to Kafka with producer/consumer workers would require a major design change to our service. We would investigate this area to understand its feasibility, but implement this solution only if all other tracks went up in smoke.

Investigate External API Response Time

It's much easier to blame others than to look inside for improvement. So why not? Let's try to prove the external API is just slow and be done with it.
The team didn't find any problem with the external API; this track was a dead end.

Investigating DB Utilization

We can find out about query runtime and cost using Amazon's Performance Insights tool (my project used AWS, but Azure and GCP have similar tools). Inside the AWS 'Console Home':

1. Select RDS.

2. Select Performance Insights.

3. Select your database.

4. Scroll down to the "Top SQL" section.

It is clear that one query here is much heavier than the others. We can investigate its performance using PostgreSQL's 'explain' tool (the PostgreSQL documentation covers it in depth). There are many uses for a tool like 'explain', so I will describe the case I usually use:

  1. Based on the query we found in AWS Performance Insights, go back to your IDE and find the actual query. Most of the time this is straightforward by looking at the table aliases.
  2. Once you've found the query, use psql's 'explain' tool to get more details. The query below is an illustrative stand-in for the one I ran 'explain' on (table and column names are hypothetical):
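
EXPLAIN SELECT *
FROM runs
WHERE run_id = 1000
  AND NOT is_deleted;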

3. Usually with problematic queries you'll see a very big number in the "Total Cost" section, and "Seq Scan" nodes are the immediate suspects for performance issues. For example, a plan node along these lines (the table name and row estimates are illustrative):
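
Seq Scan on runs  (cost=0.00..2600000.00 rows=1500 width=64)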

2.6M is a huge cost.

4. Scrolling down the explain output, we can find the filter each node uses. For example, for the section above (again, an illustrative stand-in):
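
Filter: ((NOT is_deleted) AND (run_id = 1000))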

we can see that there are two filters: one on is_deleted (a boolean used as a soft-delete indicator) and one on run_id = 1000, which means Postgres has to scan every row and discard those whose run_id is not 1000.
run_id is a great candidate for indexing.

Indexes are special lookup tables that the database search engine can use to speed up data retrieval. Simply put, an index is a pointer to data in a table. An index in a database is very similar to an index in the back of a book.

5. To create an index, let's create a new migration file:

SET lock_timeout = '20s';

CREATE INDEX CONCURRENTLY IF NOT EXISTS index_name
ON table_name (column_name);
If your service keeps writing to the table while the migration runs, consider the 'CONCURRENTLY' option: it builds the index without taking a lock that blocks writes, so a live, multithreaded service can keep working during the index build.

6. After introducing the index, an improvement should be visible in Amazon Performance Insights. Rerunning 'explain' on the same query should also show an index scan in place of the sequential scan, along these lines (illustrative):
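
Index Scan using index_name on runs  (cost=0.43..8.45 rows=1 width=64)
  Index Cond: (run_id = 1000)
  Filter: (NOT is_deleted)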

After analyzing all the problematic queries, we added ~10 indexes to our database. This led to a huge improvement in our service, which now takes 1 hour end to end.
60 hours down to 1.

Summary

Optimizing your software in a real-world production environment is usually very hard, since the systems involved tend to be highly complex. It is tough to follow every interaction between components and find the places where the bottlenecks hide, and in some scenarios, even once you find a bottleneck, it is harder still to measure its performance accurately. Quality measurements will push you to use monitoring tools like Grafana or Splunk and to support them with logs and metrics in your server-side code.
This blog post demonstrated a real-life scenario in which a team investigated and improved its real-world application. If your team has performance issues and you are working on a Spring application backed by a database, you can surely find some hints and clues for optimizing and improving your application in the tips shared in this post.