Skip to content

[fix][broker] ConcurrentLongHashMap throw ArrayIndexOutOfBoundsException#25644

Open
void-ptr974 wants to merge 3 commits intoapache:masterfrom
void-ptr974:fix_clhm_aoob_new
Open

[fix][broker] ConcurrentLongHashMap throw ArrayIndexOutOfBoundsException#25644
void-ptr974 wants to merge 3 commits intoapache:masterfrom
void-ptr974:fix_clhm_aoob_new

Conversation

@void-ptr974
Copy link
Copy Markdown
Contributor

@void-ptr974 void-ptr974 commented May 1, 2026

Fixes #25643

Bug

ConcurrentLongHashMap.Section publishes keys, values, capacity as three separate volatile fields. During rehash a reader can observe a torn snapshot — e.g. new keys (length 8)
with old values (length 4) — then compute bucket = signSafeMod(hash, values.length) ∈ [0,8) and index keys[bucket] against length 4, throwing ArrayIndexOutOfBoundsException at
Section.get. The exception fires before the optimistic-read validate(stamp) check, so the existing fallback can't recover.

Reproduce in ≈ 2 minutes

# Revert just the production file to upstream master, keeping this PR's tests + benchmark:
git checkout origin/master -- pulsar-common/src/main/java/org/apache/pulsar/common/util/collections/ConcurrentLongHashMap.java

./gradlew :microbench:shadowJar

java -jar microbench/build/libs/microbench-*-benchmarks.jar \
    "ConcurrentLongHashMapBenchmark.concurrentExpandShrink" \
    -wi 1 -i 1 -w 3s -r 3s -f 1 -p implementation=clhm

ArrayIndexOutOfBoundsException fires within the first warmup iteration. 10/10 reproductions across 10 fresh JVMs on the unpatched code, 0/10 after this PR.

Fix

Bundle keys, values, capacity into one immutable inner class published via a single volatile reference:

private static final class Table<V> {
    final long[] keys;
    final V[] values;
    final int capacity;
}
private volatile Table<V> table;

Final-field semantics plus one volatile-acquire-release pair give the reader a consistent triple from one read. The partial-publish window is impossible by construction.

Tests added

pulsar-common/src/test/.../ConcurrentLongHashMapTest:

  • testNoLostGetAfterPublish — writer puts k then publishes k via a volatile counter; every reader observing the counter must see a non-null value for k.
  • testCorrectnessAgainstConcurrentHashMap — 8 threads × 50K random ops mirrored to ConcurrentHashMap; per-call return values and final state must match.
  • testConcurrentMultiWriterExpandShrink — 8 writers × 8 readers on a single section with autoShrink; no torn (k,v) pair is allowed to surface.
  • testForEachDuringWritesforEach during concurrent puts/removes must not throw or expose internal sentinels.
  • testConcurrentExpandAndShrinkAndGet — strengthened existing test.
./gradlew :pulsar-common:test --tests ConcurrentLongHashMapTest

20/20 pass in 2.3s.

A JMH benchmark (ConcurrentLongHashMapBenchmark in the microbench module) is included so this race can be regression-tested on every JDK / hardware change.

Does this pull request potentially affect one of the following parts:

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Bundle keys / values / capacity into one immutable inner class and publish via a single volatile reference
Bundle keys / values / capacity into one immutable inner class and publish via a single volatile reference
Copy link
Copy Markdown
Contributor

@merlimat merlimat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great find!

I just left a couple of comments

Bundle keys / values / capacity into one immutable inner class and publish via a single volatile reference
@void-ptr974
Copy link
Copy Markdown
Contributor Author

Thanks @merlimat — both addressed in the latest commit. PTAL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Master Code ConcurrentLongHashMap throw ArrayIndexOutOfBoundsException

2 participants