Adds three new cross-decompiler APIs, wires them through the client, and implements them for angr and Ghidra:
- list_strings(filter=regex): list (addr, text) tuples, regex-filterable
- get_callers(target): Function, address, or name -> callers
- disassemble(addr): text disassembly of a function
Also fixes a latent bug in the angr xrefs_to where self.main_instance.kb was used in headless mode (main_instance == self).
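As a rough illustration of the `filter=regex` semantics of `list_strings`, the sketch below filters `(addr, text)` tuples with Python's `re` module. It assumes unanchored `re.search` matching; `filter_strings` is a hypothetical stand-in, not the libbs implementation.

```python
import re
from typing import Iterable, List, Optional, Tuple


def filter_strings(
    strings: Iterable[Tuple[int, str]], pattern: Optional[str] = None
) -> List[Tuple[int, str]]:
    """Return the (addr, text) tuples whose text matches `pattern`.

    Hypothetical helper: uses unanchored re.search semantics, so callers
    add ^/$ themselves when they want an anchored match.
    """
    if pattern is None:
        return list(strings)
    rx = re.compile(pattern)
    return [(addr, text) for addr, text in strings if rx.search(text)]


# Example input: (address, decoded string) pairs as a backend might report them.
sample = [(0x4010, "usage: %s <file>"), (0x4030, "/bin/sh"), (0x4040, "error: %d")]
usage_and_errors = filter_strings(sample, r"^(usage|error)")
```

Using `re.search` rather than `re.match` keeps a bare pattern like `sh` useful for substring hunting, while `^` still gives prefix matching when needed.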
Introduces an LLM-friendly command-line entry point, `decompiler`, backed by
DecompilerServer + DecompilerClient:
- New commands: load, list, stop, decompile, disassemble, xref_to, xref_from,
rename (func | var), list_strings, get_callers.
- First `load` of a binary spawns a headless server in the background; later
CLI calls auto-connect via the shared registry.
- Multiple servers can run concurrently; each one is keyed by a short server
ID, and commands disambiguate with --id, --binary, or --backend.
- Backend selection via --backend {angr,ghidra,binja,ida}.
- `libbs --server` grows a --server-id flag so subprocesses can be named.
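To make the `--id` / `--binary` / `--backend` disambiguation concrete, here is a minimal sketch of how a client might narrow the registered servers down to exactly one target. The `ServerRecord` fields and the `select_server` helper are hypothetical illustrations, not libbs code, and the sketch assumes exact path equality for `--binary`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServerRecord:
    # Hypothetical record shape for illustration; not the actual libbs registry schema.
    server_id: str
    binary_path: str
    backend: str


def select_server(
    records: List[ServerRecord],
    server_id: Optional[str] = None,
    binary: Optional[str] = None,
    backend: Optional[str] = None,
) -> ServerRecord:
    """Apply every filter the user passed; exactly one server must remain."""
    matches = [
        r
        for r in records
        if (server_id is None or r.server_id == server_id)
        and (binary is None or r.binary_path == binary)  # exact path match for simplicity
        and (backend is None or r.backend == backend)
    ]
    if not matches:
        raise LookupError("no running server matches the given filters")
    if len(matches) > 1:
        ids = ", ".join(r.server_id for r in matches)
        raise LookupError(f"ambiguous target; pass --id (candidates: {ids})")
    return matches[0]
```

Failing loudly on ambiguity, rather than picking the first match, keeps an agent from silently analyzing the wrong binary when several servers are running.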
Implementation:
- libbs/api/server_registry.py: per-server JSON records under the platform
state dir, with stale-record pruning (PID/socket liveness check).
- DecompilerServer: accepts server_id, writes/unregisters a registry entry,
exposes server_id + binary_path in server_info.
- DecompilerClient.discover_from_registry: filter by id/binary/hash/backend.
- Tests cover load/list/stop, multi-instance, decompile/disassemble by name
and address, xref_to/xref_from, rename func/var, list_strings (+ regex),
get_callers, and direct tests of the new core APIs.
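One way per-server JSON records with stale-record pruning could work is sketched below. It assumes each record stores a `pid` and an optional `socket_path` (hypothetical field names), and that liveness is probed with `os.kill(pid, 0)`, which is POSIX-only; the real `server_registry.py` may differ.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, List


def _pid_alive(pid: int) -> bool:
    """POSIX-only: signal 0 checks that a process exists without signalling it."""
    if pid <= 0:
        return False  # 0 / negative PIDs address process groups, not one server
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def register(registry_dir: Path, server_id: str, record: Dict[str, Any]) -> Path:
    """Write one JSON file per server, named after its server ID."""
    registry_dir.mkdir(parents=True, exist_ok=True)
    path = registry_dir / f"{server_id}.json"
    path.write_text(json.dumps({"server_id": server_id, **record}))
    return path


def unregister(registry_dir: Path, server_id: str) -> None:
    (registry_dir / f"{server_id}.json").unlink(missing_ok=True)


def live_records(registry_dir: Path) -> List[Dict[str, Any]]:
    """Return records for live servers and delete stale or corrupt ones."""
    live = []
    for path in sorted(registry_dir.glob("*.json")):
        try:
            rec = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            rec = None
        if not isinstance(rec, dict):
            path.unlink(missing_ok=True)  # unreadable or corrupt record
            continue
        pid = rec.get("pid")
        sock = rec.get("socket_path")  # hypothetical field; None means "don't check"
        alive = isinstance(pid, int) and _pid_alive(pid) and (sock is None or os.path.exists(sock))
        if alive:
            live.append(rec)
        else:
            path.unlink(missing_ok=True)  # prune: process or socket is gone
    return live
```

One file per server avoids lock contention on a shared index: each server only ever writes or deletes its own record, and any reader can prune dead entries it finds.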
Adds `libbs/skills/decompiler/SKILL.md` so that after `pip install libbs` an LLM-facing Agent Skill is available describing the full `decompiler` workflow: load, list, stop, decompile, disassemble, xref_to/xref_from, rename, list_strings, get_callers, plus multi-instance targeting. New `decompiler install-skill [--dest DIR] [--force]` copies the skill into `~/.claude/skills/` (or any path) so Claude Code and similar agents can pick it up. Tests verify the skill is present, installs cleanly, errors on re-install, and respects --force.
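The `install-skill` behavior described above (copy into `~/.claude/skills/` by default, refuse to overwrite unless `--force`) can be sketched with `shutil.copytree`. `install_skill` is a hypothetical helper, and locating the packaged `libbs/skills/decompiler` directory is left to the caller.

```python
import shutil
from pathlib import Path
from typing import Optional


def install_skill(src: Path, dest: Optional[Path] = None, force: bool = False) -> Path:
    """Copy the skill directory `src` into `dest` (default: ~/.claude/skills/).

    Hypothetical helper: refuses to overwrite an existing install unless
    force=True, mirroring the --force behavior described above.
    """
    dest = dest if dest is not None else Path.home() / ".claude" / "skills"
    target = dest / src.name
    if target.exists():
        if not force:
            raise FileExistsError(f"{target} already exists; use --force to overwrite")
        shutil.rmtree(target)  # replace the old copy wholesale
    dest.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, target)
    return target
```

Removing the old directory before copying (rather than merging) ensures files deleted from a newer skill version do not linger in the installed copy.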
Changes driven by CLI_FEEDBACK.md:
P0
- Add `list_functions [--filter REGEX]` subcommand (this was the biggest gap: `decompile main` was the only entry point for stripped binaries).
- `list_strings` now has `--min-length`, `--rescan` / `--no-rescan`, section labeling for ELF files, and an automatic raw-bytes fallback scan when the backend detector returns fewer than 32 entries (angr in particular is thin).
P1
- `xref_to` now returns *every* reference (code AND data) with a `kind` field, distinct from `get_callers`, which returns call sites only. Add `--decompile` flag so Ghidra can pull in globals. Function names are enriched from the light cache when backends return (addr, 0) stubs.
- JSON output is address-consistent: every `*addr` int field now has a `*_hex` sibling string so either form can be copied verbatim.
- Unify every non-success exit code to 1 (rename var previously exited 2).
- Distinct error messages for "no function starts at addr" vs "decompile engine failed" vs "target not found".
- Add `--raw` to `decompile` and `disassemble` to print the body without JSON wrapping (avoids unreadable `\n` escapes at a terminal).
P3
- `install-skill --json` emits real JSON instead of a Python-repr dict.
- `decompiler list` now prints the registry directory, with `--show-registry` for just-the-path output.
- `load --replace` stops any existing server for the binary+backend and starts a fresh one (vs `--force`, which spawns alongside).
Docs / Skill
- SKILL.md and docs/decompiler_cli.md updated with `xref_to` vs `get_callers` guidance, the `list_strings` fidelity disclaimer, the new address-format / `addr_hex` rules, and a "first moves on a new binary" section pointing at `list_functions` first.
Tests
- +12 new CLI tests covering every feedback item (list_functions, --raw, not-a-function-start error, rename missing exits 1, --show-registry, --replace tears down old server, --rescan picks up more, --min-length, install-skill text/JSON formats, addr_hex annotations).
- Existing tests updated for the `list --json` shape change.
- Full suite: 70 passing (57→70), 2 preexisting env failures unchanged.
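The `*_hex` sibling rule can be pictured as a recursive pass over the JSON payload just before printing. This is a sketch assuming the rule is "any int value under a key ending in `addr`"; `add_hex_siblings` is a hypothetical name, not the libbs implementation.

```python
from typing import Any


def add_hex_siblings(obj: Any) -> Any:
    """Recursively copy `obj`, adding "<key>_hex" next to every int under a key ending in "addr"."""
    if isinstance(obj, dict):
        out = {}
        for key, value in obj.items():
            out[key] = add_hex_siblings(value)
            is_addr_key = isinstance(key, str) and key.endswith("addr")
            # bool is a subclass of int; a flag field should not get a hex twin
            if is_addr_key and isinstance(value, int) and not isinstance(value, bool):
                out[f"{key}_hex"] = hex(value)
        return out
    if isinstance(obj, (list, tuple)):
        return [add_hex_siblings(item) for item in obj]
    return obj


payload = {"addr": 0x401000, "callers": [{"func_addr": 0x401100, "name": "main"}]}
annotated = add_hex_siblings(payload)
```

Keeping the int alongside the hex string means scripts can do arithmetic on `addr` while a human (or an LLM) can paste `addr_hex` straight back into a command.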
ef475f6 to 7a15d45
- IDA: cache server_info metadata at init so the client handshake never re-enters idalib from a worker thread (root cause of "Function can be called from the main thread only"). Use IDA's -o<path> flag to redirect .id*/.i64 sidecars into the project dir. Fill in list_strings, disassemble, xrefs_from, and xrefs_to_addr (with 2-hop data-indirection chasing for PIE string references).
- Ghidra: supplement listing.getDefinedData() with a StringSearcher pass so byte arrays typed as uchar[N] (e.g. base64 alphabets) still surface in list_strings. Pass TaskMonitor.DUMMY; passing null causes an NPE inside the searcher.
- Wire format: JSON instead of TOML (the toml package mangles backslash-x escapes in decompilation text).
- xref_from: direct per-function callee query on each backend.
- project_dir: default to platformdirs user_cache so backends stop cluttering the binary's directory; --project-dir "" restores the legacy behavior.
- list_strings rescan removed; docs now point at strings(1) / rabin2 / readelf for exhaustive scans.
- Tests: parametrized CLI suite (angr, ghidra, ida subclasses) plus a Ghidra-specific regression for the base64 alphabet at rpc.out:.data:0x5020.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
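Both the raw-bytes fallback scan from the earlier feedback commit and the exhaustive `strings(1)` approach the docs now recommend boil down to finding runs of printable ASCII at least `min_length` bytes long. A rough sketch, assuming ASCII-only strings (no UTF-16 or other wide encodings) and treating tab as printable, loosely like GNU `strings` defaults; `scan_printable` is a hypothetical helper.

```python
import re
from typing import List, Tuple


def scan_printable(data: bytes, base_addr: int = 0, min_length: int = 4) -> List[Tuple[int, str]]:
    """Return (address, text) for every run of >= min_length printable ASCII bytes.

    Sketch only: ASCII (0x20-0x7e) plus tab, no UTF-16/wide-string support.
    """
    if min_length < 1:
        raise ValueError("min_length must be >= 1")
    run = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_length)
    return [(base_addr + m.start(), m.group().decode("ascii")) for m in run.finditer(data)]
```

Because the regex is greedy, each maximal printable run is reported once, starting at its first byte, which is the address a reader would pass to `xref_to`.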
- Base interface gains read_memory(addr, size) -> Optional[bytes]:
None means the backend can't reach the region; short reads are
valid and returned as-is.
- angr: project.loader.memory.load; IDA: ida_bytes.get_bytes via
@execute_write so idalib's main-thread rule holds; Ghidra:
Memory.getBytes with a jpype JByte array (signed -> unsigned);
Binja: BinaryView.read.
- CLI: `decompiler read_memory <addr> <size>` with
--format {hexdump,hex,raw} (default: hexdump) and --json
(base64-encoded bytes + hex).
- Tests: parametrized CLI coverage (angr/ghidra/ida subclasses)
and direct library coverage on angr. SKILL.md updated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
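The output side of `read_memory` can be sketched as follows: a hexdump-style formatter, a `--json` payload carrying base64 and hex encodings, and the signed-to-unsigned conversion needed for Java (jpype) byte arrays. The JSON field names and helper names are assumptions for illustration, not the actual CLI schema.

```python
import base64
import json
from typing import Iterable


def jbytes_to_bytes(signed: Iterable[int]) -> bytes:
    """Java bytes are signed (-128..127); mask each to 0..255 for Python bytes."""
    return bytes(b & 0xFF for b in signed)


def hexdump(data: bytes, base_addr: int = 0, width: int = 16) -> str:
    """Classic address / hex / ASCII layout, one line per `width` bytes."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # pad the hex column so the ASCII column lines up on short final rows
        lines.append(f"{base_addr + off:08x}  {hex_part:<{width * 3 - 1}}  |{text}|")
    return "\n".join(lines)


def to_json(addr: int, data: bytes) -> str:
    """JSON form carrying the bytes both as base64 and as a hex string (field names assumed)."""
    return json.dumps({
        "addr": addr,
        "size": len(data),  # may be smaller than requested: short reads are valid
        "base64": base64.b64encode(data).decode("ascii"),
        "hex": data.hex(),
    })
```

Reporting the actual `size` matters because the interface allows short reads; a caller comparing it with the requested size can tell a truncated region from a full one.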
Introduces a generic `decompiler` CLI that gives LLMs access to four decompilers (angr, Ghidra, Binary Ninja, and IDA) and enables multi-binary analysis.
Usage Preview
You can use `--id` to run multiple instances at the same time. The backend is a server that hosts the libbs API.