Software development, technology, and other (possibly) interesting stuff...


Parallel Processing and Multi-Core Utilization with Java

January 23, 2011 Programming . 19 Comments


  1. zryan January 28, 2011 @ 10:59 am

    Thank you for your text and program; they are very useful for everyone.

  2. Tweets that mention Parallel Processing and Multi-Core Utilization with Java | EMBARCADEROS -- January 31, 2011 @ 1:22 am

    [...] This post was mentioned on Twitter by Mario Gleichmann and Florian Bahr, Eric V. Eric V said: Processing and Multicore with Java 6 – [...]

  3. Stephan Schmidt January 31, 2011 @ 6:16 am

    Nice article, with one small typo:

    ExecutorService eservice = Executors.newFixedThreadPool(50);

    The pool size in the listing should not be 50 but the number of processors.
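
    A minimal sketch of this correction, assuming the pool size is taken from Runtime.availableProcessors(). The class name PoolSizingSketch, the method runTasks, and the trivial task body are hypothetical placeholders and not the article's listing; only the eservice line mirrors the one quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; not the article's original listing.
final class PoolSizingSketch {

    // Submits taskCount trivial tasks to a pool with one thread per
    // available processor and returns the sum of the task ids.
    static int runTasks(int taskCount) {
        int nrOfProcessors = Runtime.getRuntime().availableProcessors();
        ExecutorService eservice = Executors.newFixedThreadPool(nrOfProcessors);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                final int id = i;
                Callable<Integer> task = () -> id; // stand-in for real work
                futures.add(eservice.submit(task));
            }
            int sum = 0;
            for (Future<Integer> future : futures) {
                sum += future.get(); // blocks until that task has finished
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            eservice.shutdown();
        }
    }
}
```

    Calling PoolSizingSketch.runTasks(50) still runs 50 tasks, but never with more worker threads than the machine reports processors.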


  4. ed January 31, 2011 @ 8:24 am


    Thanks for pointing that out; I have corrected it now.

  5. Thierry February 1, 2011 @ 2:12 am

    Hi, interesting article.

    Which profiler did you use?

    There’s something I’ve been wondering about in your 50-thread test. I always thought that:
    1. Runnable means that the thread ‘can’ run, whether or not it actually gets a CPU/core share at the moment, so threads that can run but are waiting for a CPU/core share are still displayed as ‘Runnable’ (as opposed to ‘running’? Sorry, I’m not a native English speaker).
    2. Blocked represents threads that are waiting for a synchronized resource already taken by another thread.

    So maybe in your 50-thread run there is some common monitor that is held by threads waiting for CPU, and that causes all the red ‘blocked’ states in the profiler view.

    If that is true, then maybe there’s something more to investigate in the test to optimize further…

    By the way, there are some little red blocks in the 8-thread test too, so maybe a little contention already occurs at that stage and only becomes critical once the thread count is raised above core capacity.

    Let me know if I’m wrong!

    Thanks for your article!

    (Javadoc for Thread.State:

    RUNNABLE: Thread state for a runnable thread. A thread in the runnable state is executing in the Java virtual machine but it may be waiting for other resources from the operating system such as processor.

    BLOCKED: Thread state for a thread blocked waiting for a monitor lock. A thread in the blocked state is waiting for a monitor lock to enter a synchronized block/method or reenter a synchronized block/method after calling Object.wait.)
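
    The BLOCKED state quoted above can be observed directly with Thread.getState(). The following is a small sketch, not taken from the article: the class ThreadStateSketch and its method are hypothetical names, and the example only shows a thread waiting for a monitor that another thread holds.

```java
// Hypothetical demo class illustrating Thread.State.BLOCKED.
final class ThreadStateSketch {

    // Starts a thread that needs a monitor the caller already holds and
    // returns the state observed for it while the monitor is still held.
    static Thread.State observeContendedThread() {
        final Object monitor = new Object();
        Thread contender = new Thread(() -> {
            synchronized (monitor) {
                // Entered only after the caller releases the monitor.
            }
        });
        try {
            Thread.State observed;
            synchronized (monitor) {
                contender.start();
                // The contender cannot get past the monitor we hold, so it
                // ends up BLOCKED and stays there until we release it.
                while (contender.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(1); // sleeping does not release the monitor
                }
                observed = contender.getState();
            }
            contender.join(); // monitor released, the contender can finish
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing", e);
        }
    }
}
```

    A thread that is merely waiting for a core would report RUNNABLE here, which matches the distinction drawn in points 1 and 2.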

  6. ed February 1, 2011 @ 10:08 am


    Thanks for your observation and comments. You are right: the color green means that the thread is runnable and eligible to receive CPU time from the scheduler. It does not mean that the thread is in fact consuming CPU time, only that it is ready to run.
    I used JProfiler.
    Threads are blocking, as you pointed out, and I’ll go back and test more. The contention is around the concatenation of the string str using +, which uses StringBuilder.append and StringBuilder.toString internally. Since I wanted to add load for the CPU, I didn’t optimize this part, and at this point I am not sure why it is blocking, as the string str is local to the task. I did some tests switching to a local explicit StringBuilder to do the concatenation, and that eliminates the problem.

    I will look into this and post an update with my findings later.
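
    For reference, the two concatenation variants discussed in this reply might look roughly like the sketch below. The class and method names are hypothetical and the loop count is a parameter rather than the article's actual value. The sketch only contrasts the two forms; it does not explain the blocking seen in the profiler.

```java
// Hypothetical demo class contrasting the two concatenation styles.
final class ConcatSketch {

    // Variant with +=: every pass allocates a new String and copies the
    // previous contents into it.
    static String concatWithPlus(int n) {
        String str = "";
        for (int i = 0; i < n; i++) {
            str += "t";
        }
        return str;
    }

    // Variant with one explicit StringBuilder that is local to the task
    // and reused for every append.
    static String concatWithBuilder(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('t');
        }
        return sb.toString();
    }
}
```

    Both methods return the same string, so the second can replace the first inside a task without changing its result.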


  7. barnow February 17, 2011 @ 10:31 am

    Thanks for the article!
    I want to point out that when you execute work in parallel, you eventually want to collect the results to produce a final result, and if the collection is done in one place, you need some kind of synchronization. In your first parallel implementation, the main thread collected the results. In the second, the internal queue in ExecutorCompletionService was the synchronization point. But in the example with callbacks you didn’t provide one: the callback methods can execute at the same time, so the cnt counter is not guaranteed to equal 50 (the lost update problem; ++ isn’t atomic). In this case cnt is the only value shared by the callback invocations, so using an AtomicInteger is sufficient; if some other data structure were shared, another kind of synchronization would have to be considered.
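
    A minimal sketch of the AtomicInteger suggestion, assuming a callback that does nothing but count completed tasks. The class CallbackCounterSketch and the method names taskDone and runTasks are hypothetical; only the counter name cnt comes from the comment above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical caller whose callback is invoked from worker threads.
final class CallbackCounterSketch {

    // Shared by all callback invocations, so a plain int with ++ could
    // lose updates; incrementAndGet() is atomic.
    private final AtomicInteger cnt = new AtomicInteger();

    // Callback run by a worker thread when its task is done.
    void taskDone() {
        cnt.incrementAndGet();
    }

    // Runs taskCount tasks that each invoke the callback once and returns
    // the final counter value.
    static int runTasks(int taskCount) {
        CallbackCounterSketch caller = new CallbackCounterSketch();
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (int i = 0; i < taskCount; i++) {
            pool.execute(caller::taskDone);
        }
        pool.shutdown(); // no new tasks; already submitted ones still run
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return caller.cnt.get();
    }
}
```

    With the atomic counter, runTasks(50) returns 50 regardless of how the callbacks interleave.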

  8. Benjamin February 1, 2012 @ 7:07 pm

    I’d be really curious to see your tests with a full load on each task. The wait makes sense for the case where you have some network access that you might be waiting on, but in the case where you want to load up your CPU as much as you can, there might not be a “wait” step.

    The case I’m playing with: Take a huge file, and put every word in it into a bloom filter. Even using your great final example, I couldn’t get it to run any faster in parallel than in a single thread. I don’t think the disk read is the bounding factor…

  9. net4 February 22, 2012 @ 7:46 am

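    // C# equivalent; the inner loop variable is renamed so it does not clash with the lambda parameter i
    Parallel.For(1, NUM_F_TASKS, (i) =>
    {
        string str = "";
        for (int j = 0; j < 20000; j++) str += "t";
    });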

  10. freddy August 21, 2012 @ 9:33 pm

    What tool was used for the graphic showing the CPU time consumed per thread?

  11. ed August 23, 2012 @ 9:34 am


    It is JProfiler

  12. TD March 2, 2013 @ 12:00 am

    Great article. Thanks!

  13. Mohamed Farouk May 20, 2013 @ 7:11 pm

    Great article, it makes very good sense. But what is not clear is how the result gets back to the caller if the caller is an interface implementation rather than the test class (Test).

    Example provided: Test –> Task

    In real life: Test –> Interface/Implementation –> Task

    The result should come back to the interface implementation method, but with the callable it goes to the callback method.
    Thread request –> Interface/Implementation M1 –> Task call execution –> callback method C1 of the interface. How can M1 get the result from C1?

    I would appreciate it if you could help with an example for
    In real life: Test –> Interface/Implementation –> Task
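
    One possible way to get the result back into M1, sketched under the assumption that M1 is allowed to block until the task finishes: M1 submits a Callable and returns the value from Future.get(), so no separate callback C1 is needed. All names here (TextService, TextServiceImpl, TextServiceDemo) are hypothetical, and the task body is a stand-in for real work.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical interface standing in for "Interface/Implementation".
interface TextService {
    String process(String input);
}

final class TextServiceImpl implements TextService {
    private final ExecutorService pool;

    TextServiceImpl(ExecutorService pool) {
        this.pool = pool;
    }

    // "M1": hands the work to a task and returns the task's result.
    @Override
    public String process(String input) {
        // Stand-in task: reverses the input string.
        Callable<String> task = () -> new StringBuilder(input).reverse().toString();
        Future<String> future = pool.submit(task);
        try {
            return future.get(); // waits for the task, then yields its value
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        }
    }
}

// "Test": only talks to the interface and never sees the task.
final class TextServiceDemo {
    static String callThroughInterface(String input) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            TextService service = new TextServiceImpl(pool);
            return service.process(input);
        } finally {
            pool.shutdown();
        }
    }
}
```

    If M1 must not block, the Future itself could be returned to the caller instead, which moves the wait to wherever the result is actually needed.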

  14. Ladislav Jech August 10, 2013 @ 3:57 am

    I was also working on big file processing.
    Try reading the file at once in a single thread or, if it is really big, in buffers of some size (MB to GB). Read into byte[] rather than String, filter using byte[] rather than String, and don’t use Strings at all. Whether the file is read at once or in chunks, split the read buffer into smaller parts and pass them to parallel processing threads; you should definitely see an improvement, if I understand your description correctly.
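
    A rough sketch of this approach, assuming the buffer has already been read by a single thread and that "processing" means counting whitespace-separated words. The class ChunkedBytesSketch, its methods, and the word-counting task are hypothetical illustrations and not the commenter's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo: split one byte[] buffer into chunks and count words
// in parallel without creating any String.
final class ChunkedBytesSketch {

    private static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\n' || b == '\r' || b == '\t';
    }

    // Counts word starts in data[from, to). A word start is a non-whitespace
    // byte at index 0 or directly after a whitespace byte. Looking at
    // data[i - 1] keeps the count correct across chunk boundaries.
    private static int countWordStarts(byte[] data, int from, int to) {
        int count = 0;
        for (int i = from; i < to; i++) {
            boolean afterGap = (i == 0) || isWhitespace(data[i - 1]);
            if (afterGap && !isWhitespace(data[i])) {
                count++;
            }
        }
        return count;
    }

    static int countWords(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Future<Integer>> parts = new ArrayList<>();
            int from = 0;
            while (from < data.length) {
                final int start = from;
                final int end = (data.length - from <= chunkSize)
                        ? data.length : from + chunkSize;
                Callable<Integer> task = () -> countWordStarts(data, start, end);
                parts.add(pool.submit(task));
                from = end;
            }
            int total = 0;
            for (Future<Integer> part : parts) {
                total += part.get(); // wait for each chunk's partial count
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("chunk task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

    The worker threads only read the shared array, so no locking is needed; the single synchronization point is the loop that sums the partial counts.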

  15. Ladislav Jech August 10, 2013 @ 4:08 am

    I just implemented big file processing using a single reader thread and multiple threads to process the data, with a custom ThreadResult class as feedback from each processing thread. I put the new tasks as Futures into a List and, once all are fired, check them in a loop, testing Future.isDone() first and then calling get(). On the other hand, I like the code with CompletionService, specifically the following line:

    taskResult = cservice.take().get();

    take() does an “automatic wait for the next completed task”. I didn’t notice that it waits while I was studying the concurrency framework, so I did the same thing a different way, but I will replace it with CompletionService.take().get(), which looks cleaner to me… Thank you for this article!
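
    A small sketch of the CompletionService pattern described here, assuming trivial tasks that each return the square of their id. The class and method names are hypothetical; the names eservice, cservice and taskResult follow the lines quoted in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo of collecting results as tasks complete.
final class CompletionOrderSketch {

    // Submits taskCount tasks and returns their results in the order in
    // which the tasks finished, which may differ from submission order.
    static List<Integer> collectSquares(int taskCount) {
        ExecutorService eservice = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        CompletionService<Integer> cservice =
                new ExecutorCompletionService<>(eservice);
        try {
            for (int i = 0; i < taskCount; i++) {
                final int id = i;
                Callable<Integer> task = () -> id * id; // stand-in for real work
                cservice.submit(task);
            }
            List<Integer> results = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                // take() blocks until some task has completed, so no
                // isDone() polling loop is needed.
                Integer taskResult = cservice.take().get();
                results.add(taskResult);
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            eservice.shutdown();
        }
    }
}
```

    Calling take() exactly once per submitted task is what guarantees that every result is collected and that the loop terminates.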

  16. Hui October 3, 2013 @ 7:34 pm

    Bravo, nicely done! Thank you

  17. Xiao March 28, 2014 @ 9:23 pm

    Great article. What kind of tool did you use to get image 9?

  18. ed March 29, 2014 @ 10:11 pm

    The tool is JProfiler

  19. April 19, 2014 @ 11:22 am

    You’ve made some good points there. I checked on the web for more
    information about the issue and found most individuals will go
    along with your views on this site.


DISCLAIMER: The opinions expressed in this weblog are entirely my own and do not reflect the policies of my employer, my wife, or anybody else from the planet of origin. This weblog is non-profit and not intended for commercial use. Content can only be used as reference, with NO WARRANTY.