The Short Story
- ByteBuffer.get/put is fast enough
- Concentrate data into chunks of 256+ bytes
The Long Story
Following up on my posting about how ByteBuffer.get/put appeared to be about 3 times slower than simple array access, I did some testing. ByteBuffer is Slow? Not so Fast!
ByteBuffer.put(byte[] b) is actually faster than using the array operator (e.g., b[0], b[1], etc.) for array sizes larger than 256 bytes. If you are moving data around with a ByteBuffer in chunks of less than 256 bytes, arrays may be faster. ByteBuffer can be much faster as the size of the array approaches and exceeds 4KB.
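To make the comparison concrete, here is a minimal sketch of the three access styles being discussed. The class and method names are made up for illustration, and the direct buffer is an assumption; this is not the original test code.

```java
import java.nio.ByteBuffer;

// Hypothetical helper class; names are illustrative, not from the original tests.
final class CopyStyles {
    // Copies src into dst one element at a time with the array operator.
    static void copyWithArrayOperator(byte[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
    }

    // Copies src into the buffer one byte per call, using the absolute put.
    static void copyWithSinglePuts(byte[] src, ByteBuffer dst) {
        for (int i = 0; i < src.length; i++) {
            dst.put(i, src[i]);
        }
    }

    // Copies src into the buffer with a single bulk put, starting at position 0.
    static void copyWithBulkPut(byte[] src, ByteBuffer dst) {
        dst.clear();  // reset position to 0 and limit to capacity
        dst.put(src); // one bulk transfer of the whole array
    }

    public static void main(String[] args) {
        byte[] src = new byte[4096];
        byte[] plain = new byte[src.length];
        // Assumption: a direct buffer stands in for whatever buffer you actually use.
        ByteBuffer buffer = ByteBuffer.allocateDirect(src.length);

        copyWithArrayOperator(src, plain);
        copyWithSinglePuts(src, buffer);
        copyWithBulkPut(src, buffer);
        System.out.println("copied " + src.length + " bytes three ways");
    }
}
```

The bulk form is the one that pays off above roughly 256 bytes; the single-byte form is the slow case.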
The Java array operator appears to have a bandwidth of about 300MB/sec. That figure comes from timing how long it takes to read 1 byte from an array location and then write a value back to that location. Put another way, the turn-around time for Java arrays is less than 5 nsec.
Performing the same operations with ByteBuffer.get/put at 1 byte per operation, the bandwidth could be as low as 7MB/sec, giving a turn-around time of just under 150 nsec. At first glance this is a huge difference: it suggests that ByteBuffer is more than 30 times slower than array access!
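The turn-around times follow directly from the bandwidth figures when each operation moves exactly 1 byte. A small sketch of the arithmetic, assuming 1MB means 10^6 bytes (the class and method names are mine):

```java
// Hypothetical helper; converts a measured bandwidth into time per 1-byte operation.
final class TurnAround {
    // Assumes 1 MB = 10^6 bytes and exactly one byte moved per operation.
    static double nanosPerByte(double megabytesPerSecond) {
        double bytesPerSecond = megabytesPerSecond * 1_000_000.0;
        return 1_000_000_000.0 / bytesPerSecond;
    }

    public static void main(String[] args) {
        // 300MB/sec works out to about 3.3 nsec per byte (under the 5 nsec quoted).
        System.out.println("array:      " + nanosPerByte(300.0) + " nsec");
        // 7MB/sec works out to about 143 nsec per byte (under the 150 nsec quoted).
        System.out.println("ByteBuffer: " + nanosPerByte(7.0) + " nsec");
    }
}
```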
The bandwidth of ByteBuffer rises steeply with the size of each transfer. At about 256 bytes, array access and ByteBuffer are equivalent. ByteBuffer's performance continues to increase until you hit about 4KB, at which point you are looking at over 1GB/sec.
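If you want to see this curve on your own machine, a rough timing loop along the following lines should do. This is a sketch rather than the original test code: the class name, the direct buffer, the warm-up pass and the byte counts are all assumptions, and a naive loop like this is easily skewed by the JIT.

```java
import java.nio.ByteBuffer;

// Hypothetical micro-benchmark sketch; not the test code referred to in this post.
final class ChunkBandwidth {
    // Moves roughly totalBytes through a direct buffer in chunks of chunkSize
    // using bulk put, and returns the observed rate in MB/sec (1 MB = 10^6 bytes).
    static double bulkPutMBPerSec(int chunkSize, long totalBytes) {
        byte[] chunk = new byte[chunkSize];
        ByteBuffer buffer = ByteBuffer.allocateDirect(chunkSize);
        long rounds = Math.max(1, totalBytes / chunkSize);

        long start = System.nanoTime();
        for (long i = 0; i < rounds; i++) {
            buffer.clear();    // rewind to position 0
            buffer.put(chunk); // bulk transfer of one chunk
        }
        long elapsed = Math.max(1, System.nanoTime() - start);

        // bytes per nanosecond times 1000 gives MB per second
        return (rounds * (double) chunkSize) / elapsed * 1000.0;
    }

    public static void main(String[] args) {
        for (int size = 1; size <= 8192; size *= 2) {
            bulkPutMBPerSec(size, 1L << 22); // warm-up pass so the JIT can compile the loop
            System.out.printf("%5d bytes: %.1f MB/sec%n", size, bulkPutMBPerSec(size, 1L << 26));
        }
    }
}
```

Treat the numbers it prints as indicative only; a proper benchmark harness would give steadier results.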
Size Matters
The moral of the story is: try to concentrate your data. If you can put all the commonly accessed stuff into a block of 256+ bytes, access will be efficient. If you are in the situation where your data is small and scattered, things may get dicey.
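One way to act on this, sketched under my own assumptions: keep the hot fields together in a fixed-size block, pull the whole block out with one bulk get, work on the plain array, and push it back with one bulk put. The 256-byte block size and all names below are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical block-oriented accessor; layout and names are illustrative only.
final class BlockAccess {
    static final int BLOCK_SIZE = 256; // assumed block size, chosen to match the break-even point

    // Reads one whole block out of the shared buffer with a single bulk get.
    static byte[] readBlock(ByteBuffer shared, int blockIndex) {
        byte[] block = new byte[BLOCK_SIZE];
        ByteBuffer view = shared.duplicate(); // own position and limit, same content
        view.position(blockIndex * BLOCK_SIZE);
        view.get(block);                      // one bulk transfer
        return block;
    }

    // Writes one whole block back with a single bulk put.
    static void writeBlock(ByteBuffer shared, int blockIndex, byte[] block) {
        ByteBuffer view = shared.duplicate();
        view.position(blockIndex * BLOCK_SIZE);
        view.put(block, 0, BLOCK_SIZE);       // one bulk transfer
    }

    public static void main(String[] args) {
        ByteBuffer shared = ByteBuffer.allocateDirect(4 * BLOCK_SIZE);
        byte[] block = readBlock(shared, 2);
        block[0] = 42;                        // cheap array access on the local copy
        writeBlock(shared, 2, block);
        System.out.println("first byte of block 2: " + readBlock(shared, 2)[0]);
    }
}
```

The trade-off is that you are working on a copy, so this only suits data where reading and writing whole blocks makes sense.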
Time Will Tell
Another important consideration is time. Ask yourself, "do I really need a 5 nsec response time?" A CPU cache access takes around 1 nsec and a main memory access takes tens of nsec. If you really need to turn one or two operations around that fast, you are pushing the performance boundaries of the platform.
If the answer is "yes, it really needs to be that fast," then it may be better to do this sort of thing entirely in JNI. That way Java's array bounds checking can be avoided. At the time scales we are talking about, that check could make a difference.
If the answer is "no, 1 usec is easily fast enough," then ByteBuffer is probably the way to go. It will be much simpler, and when you are trying to debug something you can be much more confident that the problem is not in how shared memory is being read from or written to.
Your Mileage Will Vary
If you run the tests I performed on your own system, you will get different results. The nature of the tests makes them very sensitive to things like cache size, front-side bus speed, etc. The values I mention here, like the array bandwidth of 300MB/sec, are approximate.
Test Code
For those who are interested, you can find the test code here. The source is included in the executable JAR.