Strip Unicode Cf characters in PrintableString#4593
Strip Unicode Cf characters in PrintableString#4593tnull wants to merge 2 commits intolightningdevkit:mainfrom
Cf characters in PrintableString#4593Conversation
`PrintableString` is the sanitiser LDK uses to render untrusted strings
(node aliases, BOLT-12 invoice / offer text, `UntrustedString`, LSPS
messages, `lightning-invoice` descriptions) to logs and UI. It only
replaced `char::is_control` matches (Unicode general category `Cc`)
with U+FFFD, leaving the entire `Cf` (Format) category untouched.
That is the exact category covering the bidirectional override /
isolate codepoints (U+202A..U+202E, U+2066..U+2069) and zero-width
characters (U+200B..U+200D, U+FEFF) behind the "Trojan Source" attack
family (CVE-2021-42574): a peer can set its alias / invoice description
/ offer fields to e.g. `safe\u{202E}cipsxe.exe`, which previously
passed through verbatim while a human reader sees `safeexe.cips` —
defeating the threat model `PrintableString` exists to defend against.
Replace `Cf` codepoints alongside `Cc` ones. The `Cf` ranges are
inlined as a `matches!` table sourced from Unicode 16.0 to keep the
change `no_std`-friendly with no new dependencies.
Co-Authored-By: HAL 9000
|
👋 Thanks for assigning @valentinewallace as a reviewer! |
|
No issues found. The previously flagged off-by-one in the Egyptian Hieroglyph |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4593 +/- ##
==========================================
- Coverage 86.84% 86.18% -0.66%
==========================================
Files 161 156 -5
Lines 109260 108660 -600
Branches 109260 108660 -600
==========================================
- Hits 94882 93646 -1236
- Misses 11797 12405 +608
- Partials 2581 2609 +28
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
valentinewallace
left a comment
There was a problem hiding this comment.
LGTM after squash and CI fix
| } | ||
| } | ||
|
|
||
| // Codepoints in Unicode general category `Cf` (Format), per Unicode 16.0. These are not matched |
There was a problem hiding this comment.
Per some claude exploring of this patch, this should also be valid for Unicode 17.0. Any reason to not specify the latest here?
PrintableStringis the sanitiser LDK uses to render untrusted strings (node aliases, BOLT-12 invoice / offer text,UntrustedString, LSPS messages,lightning-invoicedescriptions) to logs and UI. It only replacedchar::is_controlmatches (Unicode general categoryCc) with U+FFFD, leaving the entireCf(Format) category untouched.That is the exact category covering the bidirectional override / isolate codepoints (U+202A..U+202E, U+2066..U+2069) and zero-width characters (U+200B..U+200D, U+FEFF) behind the "Trojan Source" attack family (CVE-2021-42574): a peer can set its alias / invoice description / offer fields to e.g.
safe\u{202E}cipsxe.exe, which previously passed through verbatim while a human reader seessafeexe.cips— defeating the threat modelPrintableStringexists to defend against.Replace
Cfcodepoints alongsideCcones. TheCfranges are inlined as amatches!table sourced from Unicode 16.0 to keep the changeno_std-friendly with no new dependencies.Co-Authored-By: HAL 9000