Add niche to ColdString to make Option<ColdString> the same size as ColdString #8

Open
luksan wants to merge 2 commits into tomtomwombat:main from luksan:non_null_niche

Conversation

@luksan

luksan commented Apr 27, 2026

This changes the type of the internal "encoded" field from *const u8 to NonNull<u8> in order to create a niche, so that Option<ColdString> has the same size as ColdString. To handle storing WIDTH NUL bytes, the NULL pointer is mapped to a static [0u8; WIDTH] array.
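The size effect described above can be checked with a minimal sketch. `RawEncoded` and `NonNullEncoded` are hypothetical stand-ins for the old and new field types, not the crate's actual `ColdString` definition:

```rust
use core::mem::size_of;
use core::ptr::NonNull;

/// Hypothetical stand-in for the old layout: a raw pointer may be null,
/// so it offers no spare bit pattern (niche) for `Option` to use.
#[allow(dead_code)]
#[repr(transparent)]
struct RawEncoded(*const u8);

/// Hypothetical stand-in for the new layout: `NonNull` forbids the
/// all-zero bit pattern, which `Option` can reuse to encode `None`.
#[allow(dead_code)]
#[repr(transparent)]
struct NonNullEncoded(NonNull<u8>);

/// Returns `(size_of::<T>(), size_of::<Option<T>>())` for both stand-ins.
fn option_sizes() -> ((usize, usize), (usize, usize)) {
    (
        (size_of::<RawEncoded>(), size_of::<Option<RawEncoded>>()),
        (size_of::<NonNullEncoded>(), size_of::<Option<NonNullEncoded>>()),
    )
}
```

With the raw pointer, `Option` needs extra space for its discriminant; with `NonNull`, the two sizes should match.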

@tomtomwombat
Owner

Can this handle "\0\0\0\0\0\0\0\0"?

@tomtomwombat
Owner

I see, it represents "\0\0\0\0\0\0\0\0" as Self::PTR_TAG. Unfortunately, is_inline returns false for "\0\0\0\0\0\0\0\0". Is there a particular reason you picked Self::PTR_TAG to represent 8-NUL? Why not a new representation that's invalid for every other string, like usize::MAX? The first byte of usize::MAX matches the inline tag 11111xxx, but its bytes are invalid UTF-8, so no real inline string can collide with it and it can be treated as 8-NUL. There might be a better choice...
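A small sketch of the two properties claimed for usize::MAX. The constant `INLINE_TAG_BYTE` and both helper names are assumptions for illustration, based only on the 11111xxx pattern mentioned above; the crate's real constants may differ:

```rust
/// Inline tag pattern `11111xxx` as described in the comment above
/// (assumed bit layout, not taken from the crate).
const INLINE_TAG_BYTE: u8 = 0b1111_1000;

/// True when the top five bits of `byte` are all set.
fn byte_has_inline_tag(byte: u8) -> bool {
    byte & INLINE_TAG_BYTE == INLINE_TAG_BYTE
}

/// True when the native-endian bytes of `word` form valid UTF-8.
fn word_is_valid_utf8(word: usize) -> bool {
    core::str::from_utf8(&word.to_ne_bytes()).is_ok()
}
```

Every byte of usize::MAX is 0xFF, which carries the tag bits but never appears in valid UTF-8, whereas the all-zero word is valid UTF-8 (a run of NUL characters).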

@luksan
Author

luksan commented May 30, 2026

Since Rust doesn't support custom niches in types, NonNull is the only reasonable internal type. That means we can't store an 8-byte all-NUL str inline, so that is the value that has to be special-cased.
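To illustrate why the all-NUL string is the one value that collides with the niche, here is a rough sketch. `encode_inline` is a hypothetical helper that simply reinterprets the inline bytes as a pointer-sized word; the real encoding also involves tags and lengths:

```rust
use core::ptr::NonNull;

/// Number of bytes in a pointer-sized word (8 on 64-bit targets).
const WIDTH: usize = core::mem::size_of::<usize>();

/// Hypothetical helper: reinterpret WIDTH inline bytes as a `NonNull`.
/// Returns `None` exactly when the bytes are all zero, because that
/// bit pattern is the null pointer that `NonNull` cannot hold.
fn encode_inline(bytes: [u8; WIDTH]) -> Option<NonNull<u8>> {
    let addr = usize::from_ne_bytes(bytes);
    NonNull::new(addr as *mut u8)
}
```

Any other byte content produces a non-zero word, so only the WIDTH-NUL string needs a special case.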

@tomtomwombat
Owner

tomtomwombat commented May 30, 2026

I'm with you there. But is Self::PTR_TAG the only reasonable 8-NUL representation?

I'm suggesting:

fn new_eight_nul() -> Self {
    // SAFETY: usize::MAX is non-zero
    unsafe { Self::from_inline_buf(usize::MAX.to_ne_bytes()) }
}

fn inline_len(&self) -> usize {
    let addr = self.addr();
    match addr & Self::INLINE_TAG {
        Self::INLINE_TAG if addr != usize::MAX => (addr & Self::LEN_MASK).rotate_right(Self::ROT),
        _ => WIDTH,
    }
}

so that new_eight_nul().is_inline() == true, and we don't have to handle the 8-NUL case in the heap-related functions (heap_ptr, decode_heap, etc.).

Would that work?

@luksan
Author

luksan commented May 30, 2026

And having an invalid UTF-8 string as the representation for 8-NUL means that we can't return a &str to it, so some kind of tag and indirection is needed. I don't see any disadvantage in treating it as a pointer and then doing null pointer checks, compared to having a completely different code path for the niches.
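The indirection mentioned above can be sketched roughly as follows. `EIGHT_NUL` and `decode_word` are hypothetical names, the sentinel is assumed to be usize::MAX as proposed earlier in the thread, and the sketch ignores the real tag and length encoding by treating all WIDTH bytes as string content:

```rust
/// Number of bytes in a pointer-sized word (8 on 64-bit targets).
const WIDTH: usize = core::mem::size_of::<usize>();

/// Static backing storage for the WIDTH-NUL string (assumed name).
static EIGHT_NUL: [u8; WIDTH] = [0u8; WIDTH];

/// Hypothetical decode step: the sentinel bytes are not valid UTF-8,
/// so they cannot be borrowed as a `&str` directly and are redirected
/// to the static array instead. Other words are borrowed in place.
fn decode_word(word: &[u8; WIDTH]) -> Option<&str> {
    if *word == usize::MAX.to_ne_bytes() {
        // Sentinel: hand out a view of the static NUL bytes.
        return core::str::from_utf8(&EIGHT_NUL).ok();
    }
    core::str::from_utf8(word).ok()
}
```

Whichever sentinel is chosen (the null pointer or usize::MAX), the returned &str has to point at memory that actually holds the NUL bytes, which is the role of the static array here.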
