Fix #1273: bug: Thread leak in searcher._retrieve_paths causes container thread exhaustion by Memtensor-AI · Pull Request #1969 · MemTensor/MemOS

Memtensor-AI · 2026-06-29T15:45:16Z

Description

Fixed unbounded thread accumulation in the tree-text search pipeline (issue #1273) by replacing four per-request with ContextThreadPoolExecutor(...) blocks in src/memos/memories/textual/tree_text_memory/retrieve/searcher.py with two shared, class-level executors created once in Searcher.__init__: _search_executor (max_workers=10, prefix search) for the outer A–F retrieval paths and _search_subtask_executor (max_workers=10, prefix search-sub) for the nested sub-paths in _retrieve_from_long_term_and_user, _retrieve_from_tool_memory, and _deduplicate_rawfile_results. The two-pool split eliminates nested-submission deadlock; total worker threads are now bounded at 20 per Searcher regardless of request volume or downstream latency.

Every Future.result(...) and the as_completed(...) iterator in the four affected methods now carry timeout=SEARCH_TASK_TIMEOUT_SECONDS (30 s default). On concurrent.futures.TimeoutError the call site logs a warning identifying the path and skips that contribution, so a single stuck Neo4j/embedding/HTTP call can no longer block the caller indefinitely while threads accumulate.

Tests: added 12 regression tests in tests/memories/textual/test_tree_searcher_thread_pool.py covering executor singleton identity, pool sizing/prefixes, no-per-request-construction guarantee, bounded thread count under 16 sequential searches, timeout propagation to Future.result, hanging-subtask recovery, and per-method use of the correct pool. Real pytest output: 12 new + 4 pre-existing searcher tests pass; the wider tests/memories/textual suite is green (70 passed); all other observed failures across the repo are pre-existing on dev-20260624-v2.0.22 (confirmed via git stash). Ruff lint and format both clean.

Related Issue (Required): Fixes #1273

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Refactor (does not change functionality, e.g. code style improvements, linting)
Documentation update

How Has This Been Tested?

Automated tests are pending.

Unit Test
Test Script Or Test Steps (please provide)
Pipeline Automated API Test (please provide)

Checklist

I have performed a self-review of my own code
I have commented my code in hard-to-understand areas
I have added tests that prove my fix is effective or that my feature works
I have created related documentation issue/PR in MemOS-Docs (if applicable)
I have linked the issue to this PR (if applicable)
I have mentioned the person who will review this PR

@MatthewZhuang, @CarltonXiang, @syzsunshine219, @World-controller please review this PR.

Reviewer Checklist

closes bug: Thread leak in searcher._retrieve_paths causes container thread exhaustion #1273
Made sure Checks passed
Tests have been provided

Replace per-request ContextThreadPoolExecutor instantiation across the tree-text search pipeline with two shared class-level executors plus a 30 s per-future timeout. The previous code opened a fresh pool inside `with` blocks at four call sites (`_retrieve_paths`, `_retrieve_from_long_term_and_user`, `_retrieve_from_tool_memory`, `_deduplicate_rawfile_results`); whenever a worker hung on a slow Neo4j / embedding / HTTP call, `shutdown(wait=True)` never returned and the worker threads accumulated unboundedly across requests, eventually exhausting the container's pthread limit (reporter observed 8744 threads; /search returned HTTP 200 with empty results, /chat HTTP 503, even docker exec failing). Searcher now owns `_search_executor` (max_workers=10, prefix `search`) for outer A-F retrieval paths and `_search_subtask_executor` (max_workers=10, prefix `search-sub`) for the nested sub-paths and the rawfile-dedup. The two-pool split prevents nested-submission deadlock; total threads are bounded at 20 per Searcher regardless of request volume or downstream latency. Every `Future.result(...)` and the `as_completed(...)` iterator now carry `timeout=SEARCH_TASK_TIMEOUT_SECONDS` (30 s default); on TimeoutError the call site logs a warning and skips that contribution so the caller can still return. Closes #1273

Memtensor-AI assigned CarltonXiang, MatthewZhuang, syzsunshine219 and World-controller Jun 29, 2026

Memtensor-AI requested review from CarltonXiang, MatthewZhuang, World-controller and syzsunshine219 June 29, 2026 15:45

Memtensor-AI added ai-generated bug Something isn't working | 功能异常 labels Jun 29, 2026

Memtensor-AI mentioned this pull request Jun 29, 2026

bug: Thread leak in searcher._retrieve_paths causes container thread exhaustion #1273

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix #1273: bug: Thread leak in searcher._retrieve_paths causes container thread exhaustion#1969

Fix #1273: bug: Thread leak in searcher._retrieve_paths causes container thread exhaustion#1969
Memtensor-AI wants to merge 1 commit into
dev-20260624-v2.0.22from
bugfix/autodev-1273

Memtensor-AI commented Jun 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Uh oh!

Conversation

Memtensor-AI commented Jun 29, 2026

Description

Type of change

How Has This Been Tested?

Checklist

Reviewer Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants