A high-performance Rust crate for multi-pattern string matching, validation, filtering, and replacement.
- Find all sensitive words: find_all
- Validate whether text contains sensitive words: validate
- Remove sensitive words: filter
- Replace sensitive words with a character: replace
- Multi-algorithm engine: Aho-Corasick, Wu-Manber, Regex
- Noise removal via configurable regex
- Variant detection (pinyin and visually similar characters)
- Parallel search with rayon
- LRU cache for hot queries
- Batch processing: find_all_batch
- Layered matching: find_all_layered
- Streaming processing: find_all_streaming
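To make the noise-removal, find, and replace features above concrete, here is a minimal std-only sketch. It is not how sensitive-rs is implemented (the crate uses Aho-Corasick, Wu-Manber, or regex engines); the helpers `strip_noise`, `naive_find_all`, and `naive_replace` are hypothetical names used only for illustration, and the "noise" rule here (drop every non-alphanumeric character) is an assumption standing in for a configurable regex.

```rust
/// Hypothetical helper: drop every non-alphanumeric character.
/// Stands in for a configurable noise-removal regex.
fn strip_noise(text: &str) -> String {
    text.chars().filter(|c| c.is_alphanumeric()).collect()
}

/// Naive multi-pattern search: returns each word that occurs in `text`.
/// A real engine would scan the text once instead of once per word.
fn naive_find_all<'a>(text: &str, words: &[&'a str]) -> Vec<&'a str> {
    words.iter().copied().filter(|w| text.contains(*w)).collect()
}

/// Naive replacement: every character of each matched word becomes `mask`.
fn naive_replace(text: &str, words: &[&str], mask: char) -> String {
    let mut out = text.to_string();
    for w in words {
        let masked = mask.to_string().repeat(w.chars().count());
        out = out.replace(*w, &masked);
    }
    out
}

fn main() {
    let words = ["rust", "filter"];

    // Punctuation inserted between letters hides the words from a plain search.
    let noisy = "r-u-s-t is a f.i.l.t.e.r";
    let cleaned = strip_noise(noisy);
    println!("{:?}", naive_find_all(&cleaned, &words)); // ["rust", "filter"]

    println!("{}", naive_replace("hello rust", &words, '*')); // hello ****
}
```

The sketch searches the text once per word, which is the cost a multi-pattern algorithm such as Aho-Corasick avoids by matching all words in a single pass.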
Add to your Cargo.toml:

    [dependencies]
    sensitive-rs = "0.8.0"

Basic usage:

    use sensitive_rs::Filter;

    fn main() {
        // Build a filter and register the words to detect.
        let mut filter = Filter::new();
        filter.add_words(&["rust", "filter", "敏感词"]);

        let text = "hello rust, this is a filter demo 包含敏感词";

        // Collect every sensitive word found in the text.
        let found = filter.find_all(text);
        println!("Found: {:?}", found);

        // Mask each sensitive word with '*'.
        let cleaned = filter.replace(text, '*');
        println!("Cleaned: {}", cleaned);
    }

Batch processing:

    let texts = vec!["text1", "text2"];
    let results = filter.find_all_batch(&texts);

Layered matching:

    let layered = filter.find_all_layered("some long text");

Streaming large files:

    use std::fs::File;
    use std::io::BufReader;

    // `?` requires an enclosing function that returns a compatible Result.
    let reader = BufReader::new(File::open("large.txt")?);
    let stream_results = filter.find_all_streaming(reader)?;

Install with the cli feature:

    [dependencies]
    sensitive-rs = { version = "0.8.0", features = ["cli"] }

Or install directly:

    cargo install sensitive-rs --features cli

Both the sensitive and sensitive-rs commands are available after installation.

    # Find sensitive words
    sensitive check "含有赌博和色情内容"

    # Validate (exit 1 if sensitive words found)
    sensitive validate "clean text"

    # Replace sensitive words
    sensitive replace '*' "含有赌博内容"

    # Remove sensitive words
    sensitive filter "含有赌博内容"

    # Read from file
    sensitive check --file input.txt

    # Pipe from stdin
    echo "text" | sensitive check

Options:

- --dict <path>: custom dictionary file
- --dict-all: use the extended dictionary (27k words)
- --algorithm <algo>: force a specific algorithm (aho-corasick, wumanber, or regex)
- --variant: enable pinyin and shape variant detection
- --noise-pattern <regex>: custom noise removal regex
- --json: JSON output format
- --color: force colored output
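Because validate signals its result through the exit status, it can be driven from another Rust program. The following is a rough sketch using only `std::process::Command`. It assumes the `sensitive` binary is on `PATH` and that exit code 0 means no sensitive words were found; only "exit 1 if sensitive words found" is stated above. `validate_command` and `interpret_exit` are hypothetical helper names, not part of the crate.

```rust
use std::process::Command;

/// Builds (but does not run) `sensitive validate <text>`.
fn validate_command(text: &str) -> Command {
    let mut cmd = Command::new("sensitive");
    cmd.arg("validate").arg(text);
    cmd
}

/// Maps an exit code to "is the text clean?".
/// Assumption: 0 means clean. Exit 1 means sensitive words were found.
fn interpret_exit(code: Option<i32>) -> Result<bool, String> {
    match code {
        Some(0) => Ok(true),
        Some(1) => Ok(false),
        other => Err(format!("unexpected exit status: {:?}", other)),
    }
}

fn main() {
    match validate_command("clean text").status() {
        Ok(status) => println!("{:?}", interpret_exit(status.code())),
        // Reached when the binary is not installed or cannot be started.
        Err(e) => eprintln!("could not run `sensitive`: {}", e),
    }
}
```

Passing the text as a separate argument avoids shell quoting issues, since no shell is involved.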
For detailed documentation, please refer to Documentation.
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 or MIT license, shall be dual licensed as above, without any additional terms or conditions.