Friday, March 13, 2009

ByteBuffer is Fast Enough

The Short Story

  • ByteBuffer.get/put is fast enough
  • Concentrate data into chunks of 256+ bytes

The Long Story

Following up on my post about how ByteBuffer.get/put appeared to be about 3 times slower than simple array access, I did some testing.

ByteBuffer is Slow? Not so Fast!

ByteBuffer.put(byte[] b) is actually faster than using the array operator (e.g., b[0], b[1], etc.) for array sizes larger than 256 bytes. If you are moving data around with a ByteBuffer in chunks of less than 256 bytes, arrays may be faster. ByteBuffer can be much faster as the size of the array approaches and exceeds 4kB.

ByteBuffer becomes as fast as regular arrays at a buffer size of about 256 bytes

The Java array operator appears to have a bandwidth of about 300MB/sec, where one operation means reading 1 byte from an array location and then writing a value back to that location. Put another way, the turn-around time for Java arrays is less than 5 nsec.

Performing the same operations with ByteBuffer.get/put at 1 byte per operation, the bandwidth could be as low as 7MB/sec, giving a turn-around time of roughly 150 nsec. At first glance this is a huge difference: it suggests that ByteBuffer is 30 times slower than array access!

The bandwidth of ByteBuffer increases rapidly with the chunk size used per operation. When you hit about 256 bytes, array access and ByteBuffer are equivalent. ByteBuffer's performance continues to increase until you hit about 4KB in size, at which point you are looking at over 1GB/sec.
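To reproduce the effect, a micro-benchmark along these lines can be used. This is a sketch rather than the exact code I ran; the class and method names are made up for illustration, and the absolute numbers will vary wildly by machine:

```java
import java.nio.ByteBuffer;

public class ChunkBench {
    // Copy all of `src` into `bb` using bulk put() in chunks of `chunk` bytes,
    // returning the elapsed time in nanoseconds.
    static long bulkPut(ByteBuffer bb, byte[] src, int chunk) {
        long start = System.nanoTime();
        bb.clear();
        for (int i = 0; i + chunk <= src.length; i += chunk) {
            bb.put(src, i, chunk); // one bulk copy per chunk
        }
        return System.nanoTime() - start;
    }

    // Copy the same bytes one at a time with the absolute put(int, byte).
    static long singlePut(ByteBuffer bb, byte[] src) {
        long start = System.nanoTime();
        for (int i = 0; i < src.length; i++) {
            bb.put(i, src[i]);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        byte[] src = new byte[1 << 20]; // 1 MB of test data
        ByteBuffer bb = ByteBuffer.allocateDirect(src.length);
        for (int chunk : new int[] {1, 64, 256, 4096}) {
            System.out.printf("chunk %5d: %d ns%n", chunk, bulkPut(bb, src, chunk));
        }
        System.out.printf("byte-at-a-time: %d ns%n", singlePut(bb, src));
    }
}
```

On the machines I tried, the per-chunk overhead dominates for small chunks and washes out somewhere around the 256-byte mark.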

Size Matters

The moral of the story is: try to concentrate your data. If you can put all the commonly accessed stuff into a block of 256+ bytes, access will be efficient. If you are in the situation where your data is small and scattered, things may get dicey.

Time Will Tell

Another important consideration is time. Ask yourself "do I really need a 5nsec response time?" The CPU cache is around 1 nsec and main memory is around 10 nsec. If you really need to turn one or two operations around that fast, you are pushing the performance boundaries of the platform.

If the answer is "yes, it really needs to be that fast," then it may be better to do this sort of thing entirely in JNI. That way Java's array bounds checking can be avoided, and at these time scales, bounds checks could be a measurable factor.

If the answer is "no, 1 usec is easily fast enough," then ByteBuffer is probably the way to go. It will be much simpler and when you are trying to debug something, you can be much more confident that the problem is not in how shared memory is being read from or written to.

Your Mileage Will Vary

If you try the tests I performed on your own system, you will get different results. The nature of the tests makes them very sensitive to things like cache size, front-side bus speed, etc. The values I mention here like array access times of 300MB/sec are approximate.

Test Code

For those who are interested, you can find the test code here. The source is included in the executable JAR.

Thursday, March 12, 2009

JRE and Shared Memory Overhead

Shared memory access in Java appears to be about 3 times slower than it could be.

In another post, I mentioned that ByteBuffer and its related classes provide a way for Java developers to access shared memory without having to resort to JNI. The problems come in when you try to perform operations on that segment.

One approach is to get a reference to an array of bytes that represents the data in the segment and then use the regular array syntax to access the data. The ByteBuffer.array() method appears to be the way to do this, but unfortunately it does not work.

Here is an example of what I mean:

MappedByteBuffer mbb;
// code to initialize mbb omitted
byte[] ba = mbb.array(); // throws UnsupportedOperationException

After looking around a bit, I came to the conclusion that this was the intention of the original developers --- if you want to mess with the data in the segment, then you are supposed to use ByteBuffer.get/put.
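Whether array() is available at all can be checked up front with ByteBuffer.hasArray(). A heap buffer from ByteBuffer.allocate() is backed by an accessible byte[], while direct (and mapped) buffers are not, which is why the call above throws:

```java
import java.nio.ByteBuffer;

public class BackingArrayCheck {
    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(16);         // heap buffer: backed by a byte[]
        ByteBuffer direct = ByteBuffer.allocateDirect(16); // direct buffer: no accessible array

        System.out.println(heap.hasArray());   // true  -> heap.array() is safe to call
        System.out.println(direct.hasArray()); // false -> direct.array() would throw
    }
}
```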

This would be fine if get/put were about the same cost as using a straight byte array, but they appear to be 2 to 3 times slower. Here is a simple program that highlights the issue I'm running into. The basic difference is that one version uses:

b1 = bb.get(0);
bb.put(0, b2);

And the other uses plain indexing on a byte array ba:

b1 = ba[0];
ba[0] = b2;

The program performs these operations millions of times and then prints out the time (in milliseconds) they took to run. An example output:

Using get/put: 11578
Using array access: 3234
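The program itself is behind the link above; its timing harness presumably looks something like the following. This is a hedged reconstruction, not the original source, and the iteration count is a guess:

```java
import java.nio.ByteBuffer;

public class GetPutVsArray {
    static final int ITERATIONS = 10000000;

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocateDirect(1);
        byte[] ba = new byte[1];
        byte b1, b2 = 42;

        // Time one read + one write per iteration via get/put.
        long start = System.currentTimeMillis();
        for (int i = 0; i < ITERATIONS; i++) {
            b1 = bb.get(0);
            bb.put(0, b2);
        }
        System.out.println("Using get/put: " + (System.currentTimeMillis() - start));

        // Same read + write per iteration via plain array indexing.
        start = System.currentTimeMillis();
        for (int i = 0; i < ITERATIONS; i++) {
            b1 = ba[0];
            ba[0] = b2;
        }
        System.out.println("Using array access: " + (System.currentTimeMillis() - start));
    }
}
```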

One thing this shows is that I really need to upgrade my system.

The basic point is that using get/put is a lot slower than using simple arrays. A program that reads and writes a lot of data in shared memory would be a lot faster if it could simply use an array rather than get/put.

Is there a way around this?

Wednesday, March 11, 2009

Shared Memory Using MappedByteBuffer

Java developers can access shared memory using the NIO class MappedByteBuffer. Here is an example:
RandomAccessFile rac = new RandomAccessFile("<some file name>", "rw");
FileChannel channel = rac.getChannel();
MappedByteBuffer buf =, 0, 1024);
This will create a shared memory segment with the name of the file passed to RandomAccessFile. The segment will start out with the contents of that file, if it exists. If the file does not exist, then it will be created.

The segment can be changed or read using the get/put methods defined by the ByteBuffer class. A different Java process that uses the same set of calls will get the same shared memory: if they change their instance you will see the changes and vice versa. What's more, this also applies to non-Java processes that use the same file name and memory mapped system calls.
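The round trip can be sketched in a single process by creating two independent mappings of the same file: a write through the first mapping is visible through the second. A minimal sketch, using a temp file in place of an agreed-upon path:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class SharedMemoryDemo {

    // Map `file`, write one byte at offset 0, close, then map the file
    // again and read the byte back through the second, independent mapping.
    static byte roundTrip(File file) throws Exception {
        // "Writer" side.
        RandomAccessFile writer = new RandomAccessFile(file, "rw");
        MappedByteBuffer out =
            writer.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 1024);
        out.put(0, (byte) 42);
        out.force(); // flush the change to the file
        writer.close();

        // "Reader" side: a second mapping of the same file.
        RandomAccessFile reader = new RandomAccessFile(file, "rw");
        MappedByteBuffer in =
            reader.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 1024);
        byte value = in.get(0);
        reader.close();
        return value;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("shm-demo", ".dat");
        f.deleteOnExit();
        System.out.println(roundTrip(f)); // prints 42
    }
}
```

In the two-process case, each process would simply open and map the same file name; the operating system's shared page cache does the rest.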

I have tried this on Windows and Linux, and I have also taken a look at the implementation code. It appears that both platforms are using memory mapped files.

As an aside, "memory mapped files" are an approach to using shared memory that appear to have originated with Unix. Unix tries to make most things look like files, so representing shared memory that way is pretty consistent with the Unix philosophy.

For those who are interested, here is a more complete example.


This site is about software development in general, with an emphasis on Java. It contains insights, discoveries, etc. that I have encountered while performing development on various projects.

"LTSLLC" is short for "Long Term Software, LLC," the consulting company that I created. I find Blogger to be more convenient than Wordpress; otherwise I would host this blog off the main ltsllc site.