feat: add parallel batch insertion via addPoints #366
Closed
andreinknv wants to merge 2 commits into
addPoint, searchKnn (both index classes) and the L2Space / InnerProductSpace distance methods only accepted plain JS Arrays, so a caller holding embeddings in a Float32Array had to copy them into an Array element by element first. Accept a Float32Array directly anywhere a vector is taken. A new extractVector helper copies straight from the typed array's backing buffer, skipping the per-element N-API number conversion the JS Array path requires. The accepted types and the error messages / .d.ts signatures are updated to match; tests cover both input forms. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Building an index one addPoint call at a time inserts every point on the JS thread, serially. hnswlib's HierarchicalNSW addPoint is thread-safe for concurrent calls, so a batch can be inserted in parallel. addPoints(points, labels, options?) extracts the points and labels on the JS thread, then inserts them across a pool of worker threads (options.numThreads, default = CPU core count) inside an AsyncWorker, returning a Promise. The parallelFor helper is modeled on hnswlib's own ParallelFor; the first exception from any worker surfaces as a Promise rejection. Points may be number[] or Float32Array. HierarchicalNSW only — hnswlib's BruteforceSearch::addPoint serializes on an internal lock, so parallel batching there yields nothing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Building an index with `addPoint` inserts every point on the JS thread, one at a time. hnswlib's `HierarchicalNSW::addPoint` is thread-safe for concurrent calls (it is the basis of hnswlib's own Python `add_items(num_threads=…)`), so a batch can be inserted in parallel.

This PR adds `HierarchicalNSW.addPoints(points, labels, options?): Promise<void>`.
Details
- Points and labels are extracted on the JS thread, then inserted across a worker-thread pool inside a `Napi::AsyncWorker` (the JS event loop stays free), returning a `Promise`.
- `options.numThreads`: default is the CPU core count; validated to 0–1024.
- `options.replaceDeleted`: default false.
- `parallelFor` is modeled on hnswlib's own `ParallelFor`; the first exception thrown by any worker surfaces as a `Promise` rejection.
- Points may be `number[]` or `Float32Array`.
- `HierarchicalNSW` only: hnswlib's `BruteforceSearch::addPoint` serializes on an internal lock, so parallel batching there yields nothing.
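From the caller's side, the call shape described above might be used roughly as follows. This is a minimal sketch, not code from the PR: the `Vector`, `AddPointsOptions` and `BatchInsertable` types are declared locally from the description, `points` is assumed to be an array of vectors, and `findBatchProblem` and `insertBatch` are hypothetical helper names. The pre-check only mirrors the documented constraints on the caller's side; it makes no claim about what the addon itself validates.

```typescript
// Shapes assumed from this PR's description; declared locally so the sketch
// does not depend on the package's published typings.
type Vector = number[] | Float32Array;

interface AddPointsOptions {
  numThreads?: number;      // described as validated to 0-1024
  replaceDeleted?: boolean; // described as defaulting to false
}

interface BatchInsertable {
  addPoints(points: Vector[], labels: number[], options?: AddPointsOptions): Promise<void>;
}

// Hypothetical caller-side pre-check (not part of the addon): returns a
// description of the first problem found, or null if the batch looks valid.
function findBatchProblem(
  points: Vector[],
  labels: number[],
  dim: number,
  options: AddPointsOptions = {},
): string | null {
  if (points.length !== labels.length) {
    return `got ${points.length} points but ${labels.length} labels`;
  }
  const badIndex = points.findIndex((p) => p.length !== dim);
  if (badIndex !== -1) {
    return `point ${badIndex} does not have ${dim} components`;
  }
  const n = options.numThreads;
  if (n !== undefined && (!Number.isInteger(n) || n < 0 || n > 1024)) {
    return `numThreads must be an integer in 0..1024, got ${n}`;
  }
  return null;
}

// Runs the pre-check, then hands the whole batch to addPoints. A failure
// inside the native worker would surface as a rejection of this Promise.
async function insertBatch(
  index: BatchInsertable,
  points: Vector[],
  labels: number[],
  dim: number,
  options?: AddPointsOptions,
): Promise<void> {
  const problem = findBatchProblem(points, labels, dim, options);
  if (problem !== null) throw new RangeError(problem);
  await index.addPoints(points, labels, options);
}
```

Because `insertBatch` awaits the returned `Promise`, a caller can wrap it in `try`/`catch` to handle both a failed pre-check and a rejection coming from the worker pool.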
Verification
- `npm test`: 117/117 passing (9 new tests covering parallel insert, `Float32Array` input, the `numThreads` option, and the validation / rejection paths); native addon builds clean.
- `binding.gyp` adds `-pthread` on Linux for `std::thread`.
Notes
- This branch is stacked on #365 (feat: accept Float32Array as data-point input): it includes that change and reuses its `extractVector` helper. Merge #365 first; this PR's diff then reduces to just the batch-insert change.
- The batch worker holds `index_` by address, the same pattern the existing `readIndex`/`writeIndex` workers already use. Calling `initIndex` on the same instance while a batch is in flight is unsupported (a limitation shared with those methods).
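Since calling `initIndex` during an in-flight batch is unsupported, a caller who cannot rule that out by construction could guard against it in application code. The sketch below is one hypothetical way to do so and is not part of the PR: `IndexLike` is a simplified structural type (the real `initIndex` may take more parameters), and `GuardedIndex` is an invented wrapper name.

```typescript
// Simplified structural type for the two methods involved; the signatures
// are assumptions made for this sketch.
interface IndexLike {
  initIndex(maxElements: number): void;
  addPoints(points: Array<number[] | Float32Array>, labels: number[]): Promise<void>;
}

// Hypothetical wrapper: counts batches in flight and refuses initIndex
// until every one of them has settled.
class GuardedIndex {
  private inFlight = 0;
  private readonly index: IndexLike;

  constructor(index: IndexLike) {
    this.index = index;
  }

  addPoints(points: Array<number[] | Float32Array>, labels: number[]): Promise<void> {
    this.inFlight += 1;
    const settle = (): void => {
      this.inFlight -= 1;
    };
    let pending: Promise<void>;
    try {
      pending = this.index.addPoints(points, labels);
    } catch (err) {
      // A synchronous throw means no batch was started.
      settle();
      throw err;
    }
    return pending.then(
      () => {
        settle();
      },
      (err: unknown) => {
        settle();
        throw err;
      },
    );
  }

  initIndex(maxElements: number): void {
    if (this.inFlight > 0) {
      throw new Error("initIndex called while an addPoints batch is still running");
    }
    this.index.initIndex(maxElements);
  }
}
```

The counter is incremented synchronously before the native call starts, so an `initIndex` issued in the same tick as `addPoints` is already refused.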
🤖 Generated with Claude Code