houseme/sensitive-rs

Sensitive-rs

A high-performance Rust crate for multi-pattern string matching, validation, filtering, and replacement.

Features

  • Find all sensitive words: find_all
  • Check whether text contains sensitive words: validate
  • Remove sensitive words: filter
  • Replace sensitive words with a character: replace
  • Multi-algorithm engine: Aho-Corasick, Wu-Manber, Regex
  • Noise removal via configurable regex
  • Variant detection (pinyin, visually similar characters)
  • Parallel search with rayon
  • LRU cache for hot queries
  • Batch processing: find_all_batch
  • Layered matching: find_all_layered
  • Streaming processing: find_all_streaming
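
To make the four core operations concrete, here is a minimal standard-library sketch of what finding, checking, removing, and masking words could look like. It is not the crate's implementation (which, per the list above, uses engines such as Aho-Corasick and Wu-Manber); the `toy_*` names are hypothetical, matching is plain substring search, and masking one character per matched character is an assumption.

```rust
/// Every dictionary word that occurs in `text`, in dictionary order.
fn toy_find_all<'a>(words: &[&'a str], text: &str) -> Vec<&'a str> {
    words
        .iter()
        .copied()
        .filter(|w| !w.is_empty() && text.contains(*w))
        .collect()
}

/// True if at least one dictionary word occurs in `text`.
fn toy_contains_any(words: &[&str], text: &str) -> bool {
    !toy_find_all(words, text).is_empty()
}

/// Remove every occurrence of every dictionary word.
fn toy_filter(words: &[&str], text: &str) -> String {
    let mut out = text.to_string();
    for &w in words {
        if !w.is_empty() {
            out = out.replace(w, "");
        }
    }
    out
}

/// Mask every occurrence of every dictionary word,
/// using one `mask` character per character of the word (an assumption).
fn toy_replace(words: &[&str], text: &str, mask: char) -> String {
    let mut out = text.to_string();
    for &w in words {
        if !w.is_empty() {
            let masked: String = std::iter::repeat(mask).take(w.chars().count()).collect();
            out = out.replace(w, &masked);
        }
    }
    out
}
```

This naive version rescans the text once per word, which is exactly the cost a multi-pattern engine avoids by matching all words in a single pass.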

Installation

Add to your Cargo.toml:

[dependencies]
sensitive-rs = "0.8.0"

Quick Start

use sensitive_rs::Filter;

fn main() {
    let mut filter = Filter::new();
    filter.add_words(&["rust", "filter", "敏感词"]);

    let text = "hello rust, this is a filter demo 包含敏感词";
    let found = filter.find_all(text);
    println!("Found: {:?}", found);

    let cleaned = filter.replace(text, '*');
    println!("Cleaned: {}", cleaned);
}

Advanced Usage

Batch processing:

let texts = vec!["text1", "text2"];
let results = filter.find_all_batch(&texts);

Layered matching:

let layered = filter.find_all_layered("some long text");

Streaming large files:

use std::fs::File;
use std::io::BufReader;

let reader = BufReader::new(File::open("large.txt")?);
let stream_results = filter.find_all_streaming(reader)?;
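
Note that the `?` operator in the snippet above only compiles inside a function that returns a `Result`. As a rough illustration of the streaming idea, the following standard-library sketch scans any buffered reader line by line without loading the whole input into memory. It does not use or reproduce `find_all_streaming`; the `scan_lines` name, the substring matching, and the `(line number, word)` result shape are all assumptions made for the example.

```rust
use std::io::{self, BufRead};

/// Scan `reader` line by line and return `(1-based line number, word)`
/// for every dictionary word found on a line.
fn scan_lines<R: BufRead>(reader: R, words: &[&str]) -> io::Result<Vec<(usize, String)>> {
    let mut hits = Vec::new();
    for (idx, line) in reader.lines().enumerate() {
        // Propagate I/O and UTF-8 decoding errors to the caller.
        let line = line?;
        for &w in words {
            if !w.is_empty() && line.contains(w) {
                hits.push((idx + 1, w.to_string()));
            }
        }
    }
    Ok(hits)
}
```

Passing `BufReader::new(File::open("large.txt")?)` as `reader` would process a file this way. One limitation of the sketch is that a word split across a line break is not detected.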

CLI Usage

Install with the cli feature:

[dependencies]
sensitive-rs = { version = "0.8.0", features = ["cli"] }

Or install directly:

cargo install sensitive-rs --features cli

Both sensitive and sensitive-rs commands are available after installation.

Commands

# Find sensitive words
sensitive check "含有赌博和色情内容"

# Validate (exit 1 if sensitive words found)
sensitive validate "clean text"

# Replace sensitive words
sensitive replace '*' "含有赌博内容"

# Remove sensitive words
sensitive filter "含有赌博内容"

# Read from file
sensitive check --file input.txt

# Pipe from stdin
echo "text" | sensitive check

Options

  • --dict <path> — custom dictionary file
  • --dict-all — use extended dictionary (27k words)
  • --algorithm <algo> — force algorithm: aho-corasick, wumanber, regex
  • --variant — enable pinyin and shape variant detection
  • --noise-pattern <regex> — custom noise removal regex
  • --json — JSON output format
  • --color — force colored output
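
The idea behind noise removal (the `--noise-pattern` option) is to strip separator characters before matching, so that an obfuscated spelling such as "r-u-s-t" is still caught. The sketch below shows that idea with the standard library only: a character predicate stands in for the configurable regex, and the function names are hypothetical rather than part of the crate.

```rust
/// Drop every character for which `is_noise` returns true.
fn strip_noise(text: &str, is_noise: impl Fn(char) -> bool) -> String {
    text.chars().filter(|&c| !is_noise(c)).collect()
}

/// Check for `word` after removing ASCII punctuation and whitespace.
/// The choice of noise characters here is an assumption for the example.
fn contains_after_denoise(text: &str, word: &str) -> bool {
    let cleaned = strip_noise(text, |c| c.is_ascii_punctuation() || c.is_whitespace());
    !word.is_empty() && cleaned.contains(word)
}
```

Removing whitespace also joins neighbouring words, which can create matches that span word boundaries, so an aggressive noise pattern trades fewer misses for more false positives.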

Documentation

For detailed documentation, see the crate's page on docs.rs.

License

Licensed under either of

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT license (LICENSE-MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 or MIT license, shall be dual licensed as above, without any additional terms or conditions.
