Any program these days that carries a significant computational load should use the available processor cores. In this post we’ll show an example in Java where we schedule Callables on all available processor cores. In Java, a Callable is a task that returns a result and may throw an exception. It is very similar to a Thread, but gives a much easier way to return the result of an asynchronous computation. I typically don’t see many reasons to use Threads over Callables, since even if you don’t want or need to return a result you can still return some sort of status code. This blurb is directly from the Javadoc: “The Callable interface is similar to Runnable, in that both are designed for classes whose instances are potentially executed by another thread. A Runnable, however, does not return a result and cannot throw a checked exception.”
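As a minimal sketch, here is what a Callable might look like. The task itself (summing a range of numbers) is an illustrative assumption, not from the original post:

```java
import java.util.concurrent.Callable;

// A hypothetical Callable that sums a range of numbers.
// Unlike a Runnable, call() returns a value and may throw a checked exception.
class RangeSum implements Callable<Long> {
    private final long from;
    private final long to;

    RangeSum(long from, long to) {
        this.from = from;
        this.to = to;
    }

    @Override
    public Long call() throws Exception {
        long sum = 0;
        for (long i = from; i < to; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        // Sums 0 + 1 + 2 + 3 + 4
        System.out.println("Sum 0..4 = " + new RangeSum(0, 5).call()); // prints "Sum 0..4 = 10"
    }
}
```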
To obtain a Future in Java we submit the Callable to an Executor Service. A Future represents the result of an asynchronous computation: once you hold a reference to a Future, the work is happening in the background, and calling its get method blocks until it is done. Divvying up tasks across multiple cores is pretty straightforward if you use the so-called “Boss-Worker” model: break a computation down into different sub-computations, have a worker handle each one, then collect and aggregate the results at the end.
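A minimal sketch of submit-then-get, assuming a toy task and a small fixed pool (both illustrative, not from the post):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            // The task runs on a pool thread as soon as a worker is free.
            Callable<Integer> task = () -> 6 * 7;
            Future<Integer> future = executor.submit(task);
            // get() blocks until the computation finishes (or rethrows its exception).
            System.out.println("Result: " + future.get()); // prints "Result: 42"
        } finally {
            executor.shutdown();
        }
    }
}
```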
I think of it in this way:
Define a unit of computation. This will be implemented in your class that implements the Callable interface.
Create an Executor Service where the Callables can be submitted and Futures are returned.
Wait until all the Futures have completed and store the results from each of them.
Combine all the results.
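The four steps above can be sketched end to end. The workload here (summing 1..n split into per-worker chunks) and the chunking scheme are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BossWorker {

    static long parallelSum(long n, int workers) throws Exception {
        // Step 1: define a unit of computation — a Callable that sums one chunk.
        List<Callable<Long>> tasks = new ArrayList<>();
        long chunk = n / workers;
        for (int i = 0; i < workers; i++) {
            final long from = i * chunk + 1;
            final long to = (i == workers - 1) ? n : (i + 1) * chunk;
            tasks.add(() -> {
                long sum = 0;
                for (long v = from; v <= to; v++) sum += v;
                return sum;
            });
        }

        // Step 2: create an Executor Service and submit all the Callables.
        ExecutorService executor = Executors.newFixedThreadPool(workers);
        try {
            // Step 3: invokeAll blocks until every Future has completed.
            List<Future<Long>> futures = executor.invokeAll(tasks);

            // Step 4: combine the partial results.
            long total = 0;
            for (Future<Long> f : futures) total += f.get();
            return total;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Available processor cores is " + cores);
        System.out.println("Sum = " + parallelSum(1_000_000L, cores));
    }
}
```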
This 4-step method unlocks the full potential of the processor and completes much faster than if we ran the whole computational load on a single core and waited for the result. At the end of this post we can look at some real results.
The devil is in the details. For the Executor Service there are a number of thread pool types that can be used; here we’ll use the modern Work-Stealing Pool. Read along, and with the comments it’s pretty easy to follow.
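Since the original program isn’t reproduced here, the following is a hedged sketch of the same idea: time an identical batch of CPU-bound tasks on a work-stealing pool versus a single thread. The busy-loop workload (crunch) is an illustrative assumption, so the timings you see will differ from the post’s:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class WorkStealingDemo {

    // A deliberately CPU-bound unit of work (illustrative stand-in).
    static double crunch(int seed) {
        double x = seed;
        for (int i = 0; i < 5_000_000; i++) x = Math.sqrt(x + i);
        return x;
    }

    // Runs all tasks on the given executor and measures the wall-clock time.
    static Duration time(ExecutorService executor, List<Callable<Double>> tasks)
            throws Exception {
        Instant start = Instant.now();
        for (Future<Double> f : executor.invokeAll(tasks)) f.get();
        return Duration.between(start, Instant.now());
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Available processor cores is " + cores);

        List<Callable<Double>> tasks = IntStream.range(0, cores)
                .mapToObj(i -> (Callable<Double>) () -> crunch(i))
                .collect(Collectors.toList());

        // Work-stealing pool: sized to the available cores by default.
        ExecutorService multi = Executors.newWorkStealingPool();
        System.out.println("Time Taken multi-core: " + time(multi, tasks));
        multi.shutdown();

        // The same batch of tasks on a single-threaded executor for comparison.
        ExecutorService single = Executors.newSingleThreadExecutor();
        System.out.println("Time Taken single core: " + time(single, tasks));
        single.shutdown();
    }
}
```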
The output of this program is as follows:
Available processor cores is 8
Time Taken multi-core: PT8.551061S
Time Taken single core: PT15.523523S
The multicore code is around twice as fast as the single-core version. Why not 8x, since we have 8 cores? That’s a tougher question. To answer it we would need to go back and analyze what hits the disk, the total memory used, as well as the set-up and tear-down time for the executor.
Here is a video where I go through all the code and give a more detailed explanation.