Java performance tuning is something of a black art. Good information about it is hard to find, and there are several reasons for this.
First of all, performance tuning depends a lot on the system you are building and the hardware your system runs on. Over time your system, the Java virtual machine and the hardware executing your system all evolve. As they evolve, so do the Java performance techniques that apply to them.
Second, we Java developers have been fed a lot of untrue stories about the Java compiler and the Java Virtual Machine. It is often said that the Java compiler or VM can do a better job of optimizing your code than you can.
However, even now with Java 8, this is not correct. Algorithms, data formats, data structures, memory usage patterns, IO usage patterns etc. matter! There are many situations where you can optimize your code better than the Java compiler and JVM, because you know more about what your system is trying to do, its data structures, data usage patterns etc. than Java does.
We have heard this false narrative before: "The EJB container knows better when to load and store your entity objects". Only - it turned out to be false. The same is the case with the Java compiler and JVM narrative. That is why I have finally decided to start writing about Java performance tuning.
Please note: This Java performance tutorial is a multi page tutorial. This page is only the introduction!
Aspects Impacting Performance
There are a few recurring aspects of any system that impact its performance. These aspects are:
- Memory Management
- Algorithms
- Data Structures
- Concurrency
- Network Communication
- Scalability
Memory management, data structures and algorithms are typically linked closely together. A certain algorithm may require a certain data structure. A certain data structure may impact memory management.
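As a minimal sketch of how these three are linked, consider a membership check on an array. Binary search is the faster algorithm, but it requires the data to be kept sorted, and keeping a sorted copy costs extra memory. The class and method names below are hypothetical and only serve the illustration.

```java
import java.util.Arrays;

// Hypothetical illustration: the choice of algorithm dictates the data structure.
final class SortedLookup {

    // Binary search: O(log n), but only correct if the array is sorted.
    static boolean containsSorted(int[] sortedValues, int key) {
        return Arrays.binarySearch(sortedValues, key) >= 0;
    }

    // Linear scan: O(n), works on an array in any order.
    static boolean containsLinear(int[] values, int key) {
        for (int value : values) {
            if (value == key) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] values = {42, 7, 19, 3, 88};

        // The sorted copy is the memory price paid for the faster algorithm.
        int[] sorted = values.clone();
        Arrays.sort(sorted);

        System.out.println(containsLinear(values, 19));
        System.out.println(containsSorted(sorted, 19));
    }
}
```

Note that an int[] stores its values contiguously, whereas a List<Integer> stores references to boxed objects, so the choice of data structure also affects how much work the memory system and garbage collector have to do.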
Concurrency means how well the system can distribute its load over multiple threads and CPUs. Concurrency may also be linked to data structures, but not always. It depends on your system's concurrency model.
Network communication may impact your system's performance too. Some network protocols are faster than others, and you may sometimes be able to create a faster, custom network protocol for your own specific use case. Your system's communication patterns also impact performance - not just how messages are transported back and forth, but also how often, and whether communication is synchronous or asynchronous.
The scalability of a system means how well it performs when you scale up or scale out the hardware. Scaling up (vertical scaling) means buying a bigger computer with more memory, more CPUs, faster disks, a faster NIC etc. Scaling out (horizontal scaling) means distributing the system across multiple machines.
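The difference between a synchronous and an asynchronous communication pattern can be sketched roughly as follows. The fetch() method is a hypothetical stand-in for a remote call (it just sleeps), and the class name is made up for this example. The sketch only shows how two requests can overlap instead of being issued one after the other; it does not use non-blocking IO.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration of synchronous vs. asynchronous request patterns.
final class SyncVsAsync {

    // Assumption: stands in for a remote call with roughly 50 ms of latency.
    static String fetch(String name) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "reply:" + name;
    }

    // Synchronous: the second request is not sent until the first has returned.
    static String fetchBothSync() {
        return fetch("a") + "," + fetch("b");
    }

    // Asynchronous: both requests are in flight at the same time.
    static String fetchBothAsync(ExecutorService pool) {
        CompletableFuture<String> a = CompletableFuture.supplyAsync(() -> fetch("a"), pool);
        CompletableFuture<String> b = CompletableFuture.supplyAsync(() -> fetch("b"), pool);
        return a.thenCombine(b, (x, y) -> x + "," + y).join();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(fetchBothSync());
            System.out.println(fetchBothAsync(pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

Both methods return the same result. The asynchronous version waits for roughly one round trip instead of two, at the cost of occupying two pool threads while the calls are outstanding.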
Reusable Java Performance Principles
Even though a lot of performance optimization is specific to each individual application, there are still Java performance principles, techniques and patterns which can be reused across many different types of applications and situations. This Java performance tutorial will focus mostly on such reusable principles.
From time to time I may also dive into a specific system / use case to exemplify how this system was optimized. Such example cases can be quite enlightening, though many companies tend to hold their cards close to the chest when it comes to something that can be considered a competitive edge (and performance is).
Core Java Performance Principles
The core principles of Java performance tuning on which the tips in this tutorial are based are:
- Memory is faster than disk - much faster - and memory is cheap.
- All storage (memory / disk) works fastest when read from or written to sequentially. Arbitrary access is slower.
- Object allocation and garbage collection are slow.
- Data formats and data structures make a big difference in speed.
- Asynchronous IO scales better than synchronous IO.
- Single-threaded performance is a prerequisite for multithreaded performance.
- Shared memory (or disk) concurrency is bad because it usually leads to lots of contention when the system gets busy.
As you will probably notice, many of the performance tips in this tutorial are based on these same principles.
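To make one of these principles concrete, here is a minimal sketch comparing sequential and arbitrary access to the same in-memory array. All names are hypothetical, and the timing in main() is a crude one-shot measurement, so treat the printed numbers as a rough indication on your own hardware rather than as a benchmark result.

```java
import java.util.Random;

// Hypothetical illustration: same data, same work, different access order.
final class AccessPatterns {

    // Walks the array front to back, which is friendly to CPU caches and prefetching.
    static long sumSequential(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Visits the same elements in the order given by the index array.
    static long sumInOrder(int[] data, int[] order) {
        long sum = 0;
        for (int i = 0; i < order.length; i++) {
            sum += data[order[i]];
        }
        return sum;
    }

    // Builds a random permutation of 0..n-1 (Fisher-Yates shuffle).
    static int[] shuffledIndexes(int n, long seed) {
        int[] indexes = new int[n];
        for (int i = 0; i < n; i++) {
            indexes[i] = i;
        }
        Random random = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = random.nextInt(i + 1);
            int tmp = indexes[i];
            indexes[i] = indexes[j];
            indexes[j] = tmp;
        }
        return indexes;
    }

    public static void main(String[] args) {
        int n = 1 << 22; // about 4 million ints, chosen to exceed typical CPU caches
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = i % 100;
        }
        int[] order = shuffledIndexes(n, 42L);

        long t0 = System.nanoTime();
        long sequentialSum = sumSequential(data);
        long t1 = System.nanoTime();
        long arbitrarySum = sumInOrder(data, order);
        long t2 = System.nanoTime();

        System.out.println("sequential: " + (t1 - t0) / 1_000_000 + " ms, sum " + sequentialSum);
        System.out.println("arbitrary:  " + (t2 - t1) / 1_000_000 + " ms, sum " + arbitrarySum);
    }
}
```

Both loops add up exactly the same values, so any difference in running time comes from the access pattern alone.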
The principle about single-threaded performance might come as a surprise to some. Parallel computing and parallel programming are all the rage these days (2015), so you might have been told that you should be thinking about breaking down your problem into smaller problems which can be solved in parallel. Unfortunately there are not that many problems that can easily be parallelized.
Additionally, if your server works on many tasks at the same time (e.g. incoming HTTP requests), the other CPUs in your server may already be busy working on their own tasks. Parallelizing each task gains you nothing then, as the CPUs are already busy. In fact, it may hurt performance (unless you have way more CPUs than you are using on average).
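A minimal sketch of this situation, assuming a server that handles independent requests: each request is processed by plain single-threaded code, and parallelism comes only from running many requests at once on a pool sized to the number of CPUs. The class name and the handleRequest() workload are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: parallelism across requests, not within a request.
final class RequestLevelParallelism {

    // Assumption: stands in for the work of one request. It is plain
    // single-threaded code, so making it faster speeds up every request.
    static long handleRequest(int requestId) {
        long sum = 0;
        for (int i = 1; i <= 1_000_000; i++) {
            sum += (long) i * requestId;
        }
        return sum;
    }

    // Runs one task per request on a pool with one thread per available CPU.
    static List<Long> handleAll(int requestCount) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int id = 1; id <= requestCount; id++) {
                final int requestId = id;
                Callable<Long> task = () -> handleRequest(requestId);
                futures.add(pool.submit(task));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(handleAll(8));
    }
}
```

Once there are at least as many requests in flight as there are CPUs, splitting handleRequest() itself across threads would have no idle CPU to run on and would only add coordination overhead.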
Java Performance Credits
The principles and techniques presented in this Java performance tutorial are not all mine. Far from it, actually. These tutorials present work by Java performance masterminds who have learned and polished these techniques in real-life high performance systems. Here are some of the Java performance masterminds whose work has inspired or contributed to this Java performance tutorial (the order is random):
- Aleksey Shipilëv
- Martin Thompson
- Azul Systems
- Peter Lawrey
- Rick Hightower
- The Psy-Lob-Saw Blog
- The High Scalability Blog
- ... more coming ...
My own experience comes from a mix of Java performance experiments, as well as the design and development of VStack.co - a fully hosted application backend which I have cofounded with WorpCloud Ltd. The faster our system is, the more clients we can serve, and the bigger and more demanding those clients can be.
The Java Performance Toolkit and Benchmarks
The code and benchmarks presented in this Java performance tutorial will be made available on GitHub some time in the future. The code and benchmarks will live in separate GitHub repositories.
The Java performance toolkit is primarily intended to show implementations of the ideas presented in this tutorial. The implementations may not always be full featured - ready for use in a real application - but may sometimes just serve as proof of concept of some idea. However, feel free to use the toolkit in your apps if you want.
The Java performance benchmarks are intended to be runnable on your own hardware, so you can see how a given technique, implementation etc. runs on your specific hardware.
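To give an idea of the shape such a benchmark could take, here is a deliberately small sketch. It is not the benchmark code of this tutorial; the class and method names are hypothetical. Hand-rolled timing like this is fragile (JIT warm-up, dead code elimination and garbage collection pauses can all distort it), so a dedicated harness such as JMH is usually the safer choice for real measurements.

```java
import java.util.function.LongSupplier;

// Hypothetical illustration of a tiny hand-rolled timing loop.
final class TinyBenchmark {

    // Results are written to a volatile field so the JIT cannot discard the work.
    static volatile long sink;

    // Runs the task warmupRuns times unmeasured, then returns the average
    // wall-clock time in nanoseconds over measuredRuns runs.
    static long averageNanos(LongSupplier task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            sink = task.getAsLong();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            sink = task.getAsLong();
        }
        return (System.nanoTime() - start) / measuredRuns;
    }

    public static void main(String[] args) {
        // Assumption: an arbitrary CPU-bound workload to have something to time.
        LongSupplier task = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            return sum;
        };
        System.out.println("average ns per run: " + averageNanos(task, 20, 100));
    }
}
```

The numbers it prints are only meaningful relative to each other on the same machine and JVM, which is exactly why the benchmarks are meant to be run on your own hardware.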
It is easy to get something wrong - in a performance idea, in its implementation, or in the benchmark measuring it. If you have any feedback or suggestions for the ideas, implementations or benchmarks presented here, I would very much appreciate it if you sent them to me. Techniques, JVMs and hardware evolve all the time. So should this Java performance tutorial, its implementations and its benchmarks.