RxRingBuffer fixes and improvements #2333
@akarnokd is that ops/s? Why is it slower in some cases too? Edit: if it really is ops/s, it seems to be slower in most of the cases.
Yes, ops per second. These are the most noticeably slower ones: I haven't dug into it, but my guess is that it's the atomic increment-and-get in peek() and poll(). Some queue users first call peek() and, if it returns something, then do a full poll(). That is two increments per value instead of none in 1.x. I've been thinking about removing the phaser from peek(), since peek() doesn't change any queue state. (Alternatively, the two might be merged into a Edit: correction: the only place peek() is used is in the zip tick, so it is one atomic increment per poll instead of zero.
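To make the increment arithmetic concrete: if every guarded call costs one atomic increment, a consumer that calls peek() and then poll() pays two increments per value, while leaving peek() unguarded (the option floated above, since peek() mutates nothing) brings it back to one. The class below is a hypothetical stand-in, not RxRingBuffer code: an ArrayDeque replaces the SPSC queue and a plain AtomicLong replaces the phaser, so it only counts atomic operations and is not itself thread-safe. The names GuardedReaderSketch and drain are illustrative.

```java
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch (not RxJava code): a consumer-side wrapper whose guard
 * costs one atomic increment per guarded call, used to count the per-value
 * cost of a peek()-then-poll() pattern with and without guarding peek().
 */
final class GuardedReaderSketch {
    private final ArrayDeque<Object> queue = new ArrayDeque<>(); // stand-in for the SPSC queue
    private final AtomicLong guardEntries = new AtomicLong();     // stand-in for the phaser's counter
    private final boolean guardPeek;

    GuardedReaderSketch(boolean guardPeek) {
        this.guardPeek = guardPeek;
    }

    void offer(Object o) {
        queue.offer(o);
    }

    Object peek() {
        if (guardPeek) {
            guardEntries.getAndIncrement(); // atomic RMW even though peek() changes no state
        }
        return queue.peek();
    }

    Object poll() {
        guardEntries.getAndIncrement(); // poll() mutates the queue, so it stays guarded
        return queue.poll();
    }

    long atomicIncrements() {
        return guardEntries.get();
    }

    /** Hypothetical zip-tick-like loop: peek first, poll only when a value is present. */
    static long drain(GuardedReaderSketch r, int values) {
        for (int i = 0; i < values; i++) {
            r.offer(i);
        }
        while (r.peek() != null) {
            r.poll();
        }
        return r.atomicIncrements();
    }
}
```

With peek() guarded, draining three values costs seven increments (two per value plus the final empty peek); without the guard it costs three, one per poll.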
@akarnokd you can click "Restart build" in Travis CI.
@zsxwing Thanks, I had never logged into Travis, so I didn't see the button.
@akarnokd Is this the one you think is ready for merge? If so, I'll run my tests with Flight Recorder to address my concerns about memory and GC behavior.
Yes, it is.
Here is the outcome of my perf testing on this, using Flight Recorder against the first minute of the "Iteration" stage of this JMH run:
[Screenshots: 1.x Baseline Overview; 1.x Baseline Allocations]
CPU usage appears to be higher on this PR (average 19 versus 17). I'm not sure whether other changes in 1.x could affect this, as this PR is 17 days old. As per previous comments, three perf tests take a noticeable performance hit.
I'd like to keep these comments in here.
I suggest that we merge the JCTools SpscArrayQueue fixes via a separate PR, since we want those regardless of what else we do, and so that comparisons across approaches are like-for-like. I'm not yet ready to accept the performance hit this PR introduces.
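For context on the SpscArrayQueue fixes: according to the PR description, the updated queue can be filled to its full capacity. The sketch below shows one common way an array-backed single-producer/single-consumer queue achieves this, by treating a null slot as free, in the spirit of the FastFlow algorithm that JCTools' SpscArrayQueue builds on. It is a simplified illustration, not the JCTools or RxJava code: the class name SpscArrayQueueSketch is hypothetical, it omits JCTools' look-ahead and cache-line padding, and it assumes exactly one producer thread, one consumer thread, and a power-of-two capacity.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

/**
 * Hypothetical SPSC array queue sketch: a slot is free iff it holds null,
 * so all `capacity` slots are usable (no "one empty slot" sentinel).
 */
final class SpscArrayQueueSketch<E> {
    private final AtomicReferenceArray<E> buffer;
    private final int mask;
    private long producerIndex; // read and written only by the producer thread
    private long consumerIndex; // read and written only by the consumer thread

    SpscArrayQueueSketch(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        buffer = new AtomicReferenceArray<>(capacity);
        mask = capacity - 1;
    }

    /** Producer thread only. Returns false when the queue is full. */
    boolean offer(E e) {
        if (e == null) {
            throw new NullPointerException("null elements are not supported");
        }
        int offset = (int) producerIndex & mask;
        if (buffer.get(offset) != null) {
            return false; // consumer hasn't freed this slot yet: all slots are in use
        }
        buffer.lazySet(offset, e); // ordered store publishes the element to the consumer
        producerIndex++;
        return true;
    }

    /** Consumer thread only. Returns null when the queue is empty. */
    E poll() {
        int offset = (int) consumerIndex & mask;
        E e = buffer.get(offset);
        if (e == null) {
            return null;
        }
        buffer.lazySet(offset, null); // hand the slot back to the producer
        consumerIndex++;
        return e;
    }

    /** Consumer thread only; does not change queue state. */
    E peek() {
        return buffer.get((int) consumerIndex & mask);
    }
}
```

Because the producer only checks whether its next slot is still occupied, it never reads the consumer's index, and "full" means all capacity slots hold elements rather than capacity - 1.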
Here is a comparison of #2189 and this PR against 1.x. I rebased both PRs onto the current 1.x branch to be as accurate as possible:




This PR contains fixes and improvements to RxRingBuffer and its single-producer single-consumer queue:

- SWSRPhaser, a variant of Gil Tene's WriterReaderPhaser that uses cheaper atomic operations because of the single-reader, single-writer use case. Note that before Java 8, Unsafe doesn't support an atomic addAndGetLong operation. The simplified phaser costs only a single atomic increment per use.
- SpscArrayQueue updated to match JCTools' current version: the queue can now be filled to its full capacity.
- RxRingBuffer now uses two phasers: one for the offer side and one for the poll/peek side. The benefits: reduced interference between readers and writers, and the simplified phaser can be used because each side is now single-threaded (a shared phaser implies up to two threads at once).

Benchmark results:
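A minimal sketch of how a single-writer/single-reader phaser could get away with one atomic increment per use, modeled on the WriterReaderPhaser design (a start epoch plus even/odd end epochs, with the sign of the start epoch marking the phase). This is not the SWSRPhaser from this PR; the names SwsrPhaserSketch, writerEnter, writerExit and flipPhase are hypothetical. Assumption: exactly one thread acts as writer and at most one thread calls flipPhase at a time. Because the writer is the only thread that updates the end epochs, its exit can be an ordered store (lazySet) instead of an atomic increment; only entering needs getAndIncrement, since it races with the reader's getAndSet on the start epoch.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical single-writer/single-reader phaser sketch.
 * Even phase counts up from 0, odd phase counts up from Long.MIN_VALUE.
 */
final class SwsrPhaserSketch {
    private final AtomicLong startEpoch = new AtomicLong(0);
    private final AtomicLong evenEndEpoch = new AtomicLong(0);
    private final AtomicLong oddEndEpoch = new AtomicLong(Long.MIN_VALUE);

    /** Writer: the only atomic read-modify-write per critical section. */
    long writerEnter() {
        return startEpoch.getAndIncrement();
    }

    /** Writer: a single writer owns the end counter, so an ordered store suffices. */
    void writerExit(long valueAtEnter) {
        AtomicLong end = valueAtEnter < 0 ? oddEndEpoch : evenEndEpoch;
        end.lazySet(valueAtEnter + 1);
    }

    /** Reader: switches phase, then waits until the writer has left the old phase. */
    void flipPhase() {
        boolean nextPhaseIsEven = startEpoch.get() < 0;
        long initial = nextPhaseIsEven ? 0L : Long.MIN_VALUE;
        // Reset the next phase's end counter before publishing the new start value.
        (nextPhaseIsEven ? evenEndEpoch : oddEndEpoch).lazySet(initial);
        long startAtFlip = startEpoch.getAndSet(initial);
        AtomicLong oldEnd = nextPhaseIsEven ? oddEndEpoch : evenEndEpoch;
        while (oldEnd.get() != startAtFlip) {
            Thread.yield(); // writer is still inside a critical section of the old phase
        }
    }
}
```

In a two-phaser arrangement like the one described above, one instance could guard offer() and another poll()/peek(), with the thread that releases the buffer calling flipPhase() on both before recycling the queue, so each phaser only ever sees one writer. How the PR actually wires its phasers into release is not shown here.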