Redesign runtime Msg #1004
…e-message-representation
Msg into a high-performance tagged union and update runtime/funcs/compiler to use it
# Conflicts: # internal/compiler/utils.go
Heads-up: I split out JSON-format correctness fixes into #1030 (safe spacing outside strings plus union data JSON quoting and escaping) so this can merge on main without waiting for the message-representation redesign.

After rebasing #1004, please port or cherry-pick equivalent changes from #1030 to preserve the same formatting guarantees in the redesigned message implementation.
Housekeeping link for the runtime-perf track:
Based on my thorough analysis of this pull request, here is my review.
Review Summary: Redesign runtime
Handled: removed
Confirmed already in place earlier in this PR track; local validation also included benchmark execution (
Verified current behavior matches
…e-message-representation # Conflicts: # go.sum
PR Review: Redesign runtime
|
| Item | Status | Notes |
|---|---|---|
| JSON serialization safety | ✅ ADDRESSED | addJSONSpaces removed in commits f9cee287 + f08d9ef3; now uses direct JSON writers with context-aware marshalNestedJSON |
| E2E benchmark | ✅ VERIFIED | benchmarks/bench_test.go has BenchmarkRuntimeE2E; infrastructure exists |
| HTTP body handling | ✅ CORRECT | Uses NewBytesMsg(body) matching std/http.Response.body bytes type |
| Negative integer storage | ✅ CORRECT | Two's-complement bit pattern in uint64 round-trips correctly |
Design Highlights
1. Tagged-Union Layout (message.go:38-45)
```go
type Msg struct {
	val  any     // composites: []Msg, map[string]Msg, StructMsg, UnionMsg
	str  string  // string payloads
	bits uint64  // bool(0/1), int64(bit pattern), float64(IEEE-754 bits)
	kind MsgKind // tag for dispatch
}
```

- Clean separation: primitives inline in bits, composites boxed in val
- Zero-value Msg{} is the MsgKindInvalid sentinel
2. Fail-Fast Semantics
Accessors panic on kind mismatch (e.g., calling Int() on a String message). This is appropriate for internal invariant violations per the AGENTS.md policy.
3. JSON Serialization (message.go:582-693)
The current implementation builds byte slices directly with marshalMapWithSpaces, marshalStructFieldsWithSpaces, and marshalNestedJSON, with no unsafe post-processing of serialized strings.
4. Union Representation (message.go:422-427)
```go
type UnionMsg struct {
	tag     string
	data    Msg
	hasData bool
}
```

String tags are kept for readability/debugging; indexed tags would require compiler/runtime coordination (noted as future optimization).
Test Coverage
The tests in message_test.go cover:
- Struct equality ignoring field order
- Union matching (with/without data)
- Dict/struct JSON marshaling with special characters
- Negative integer round-trip (TestIntMsgNegativeRoundTrip)
- Invalid message panics (TestInvalidMsgPanics)
- Escaped string preservation in nested structures
Benchmarks show zero allocations for hot paths:
- BenchmarkMsgListIter: 0 allocs/op
- BenchmarkMsgDictLookup: 0 allocs/op
- BenchmarkMsgEqualList: 0 allocs/op
- BenchmarkMsgStructGet: 0 allocs/op
Minor Observations
- benchmarks/message_passing/ directory doesn't exist in the current branch, though it is referenced in PR comments as commit 3a7d736b. If this benchmark is required for comparison, ensure it's either:
  - already merged to main (the E2E benchmark infrastructure is there), or
  - in a separate PR that should precede this one.
- HTTP body is bytes: the change in http.go:71 correctly uses NewBytesMsg(body), preserving the bytes type for transport.
- Compiler alignment: backend and templates properly use runtime.Msg{}, NewUnionMsgNoData, and the new accessor methods (.IsStruct(), .Struct(), etc.).
Conclusion
This PR is ready to merge. The core design is solid, previous blocking concerns have been addressed, and the implementation maintains correctness while improving performance. The tagged-union approach successfully eliminates interface dispatch overhead without resorting to unsafe or reflect.
Finalization update for #1004 (head
What was done
Validation
Benchmark proof (agreed protocol)
Baseline:
Commands used:
Results:
Given merge resolution, green CI, addressed review threads, and the benchmark evidence above, this PR is in a ready-to-merge state.


Motivation
Msg API so hot-path ops (List(), Dict(), struct/union access) avoid unnecessary boxing.
Description
- Msg interface and concrete *Msg types with a single Msg tagged-union struct and MsgKind, plus constructor helpers like NewBoolMsg, NewIntMsg, NewFloatMsg, NewStringMsg, NewListMsg, NewDictMsg, NewStructMsg, NewUnionMsg and NewUnionMsgNoData (see internal/runtime/message.go).
- val any for composite values, and centralized Equal, MarshalJSON, String, and Match logic to operate on the new Msg (see internal/runtime/message.go).
- Msg{} as the sentinel for “no data”, including program receive/send paths, interceptors, match/switch/fan_in handlers and others (see changes in internal/runtime/program.go, internal/runtime/interceptors.go, and internal/runtime/funcs/*).
- runtime.Msg{} for empty configs, use NewUnionMsgNoData for tag-only unions, and to call .IsStruct()/.Struct() or .IsUnion()/.Union() instead of previous type assertions (see internal/compiler/backend/golang/backend.go, internal/compiler/backend/golang/tpl.go, and generated exports updates).
- internal/runtime/message_test.go and internal/runtime/message_bench_test.go), and documented the change in AGENTS.md.

Testing
- make build (invokes builds/tests); the runtime and most packages compiled and tests ran, but the CI-local example examples/http_get failed due to external network restrictions (test reports: "Domain forbidden").
- go test ./...; the majority of unit and e2e packages passed, but examples/http_get failed for the same network restriction, causing an overall test failure.
- golangci-lint run ./...; this did not complete because the linter binary was built with an older Go version than the repository target (Go 1.24 vs 1.25).
- BenchmarkMsgListIter, BenchmarkMsgDictLookup, BenchmarkMsgEqualList, BenchmarkMsgStructGet) that exercise iteration, lookup, equality, and struct access and are included in the tree for performance evaluation.

Codex Task
===
Below is a detailed explanation of how the current tagged‑union Msg design works (as implemented in the latest commit), with emphasis on the bits/any tricks, negative number handling, and why reflect/unsafe were not used. I’ll also explicitly tie this back to the ideas in the “Proposal: Optimize Message representation via native Go types #996”. All references are to the current repo state you asked about.
1.1. Tagged‑union layout
The new Msg is a single struct that uses:
a tag (MsgKind) to know what kind of value is stored,
a fixed bits uint64 field for fast inline storage of numeric/boolean values,
a str string field for string payloads, and
a val any field for composite payloads ([]Msg, map[string]Msg, StructMsg, UnionMsg).
This is all defined in internal/runtime/message.go.
The key shape is:
```go
type Msg struct {
	kind MsgKind
	bits uint64
	str  string
	val  any
}
```
Because kind tells you which field is “live,” no interface dispatch is needed to read a value: the accessor (Int(), Float(), List(), etc.) just checks the tag and reads the expected field. The tag check is in mustKind, which panics on invalid access, keeping correctness intact while staying cheap in the hot path.
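A minimal sketch of that tag check, assuming MsgKind is a uint8 and modelling only two kinds. MsgKindString, the panic message, and the exact mustKind signature are assumptions; NewIntMsg, NewStringMsg, Int() and Str() are the names used in this PR.

```go
package main

import "fmt"

// MsgKind is the tag; its zero value doubles as the "invalid" sentinel.
type MsgKind uint8

const (
	MsgKindInvalid MsgKind = iota
	MsgKindInt
	MsgKindString // assumed name
)

// Msg mirrors the shape above; only int and string kinds are modelled.
type Msg struct {
	kind MsgKind
	bits uint64
	str  string
	val  any
}

// mustKind panics when the tag does not match the accessor being called.
// The message format is an assumption, not the PR's.
func (msg Msg) mustKind(want MsgKind) {
	if msg.kind != want {
		panic(fmt.Sprintf("runtime.Msg: want kind %d, got %d", want, msg.kind))
	}
}

func NewIntMsg(n int64) Msg     { return Msg{kind: MsgKindInt, bits: uint64(n)} }
func NewStringMsg(s string) Msg { return Msg{kind: MsgKindString, str: s} }

func (msg Msg) Int() int64 {
	msg.mustKind(MsgKindInt)
	return int64(msg.bits) // one compare, then a direct field read
}

func (msg Msg) Str() string {
	msg.mustKind(MsgKindString)
	return msg.str
}

func main() {
	fmt.Println(NewIntMsg(42).Int(), NewStringMsg("hi").Str())

	defer func() { fmt.Println("recovered:", recover()) }()
	NewStringMsg("oops").Int() // kind mismatch: panics, caught by the deferred recover
}
```

There is no type switch and no interface method call on the happy path: each accessor costs one byte comparison before reading its field.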
1.2. The bits trick
The bits uint64 field stores small primitive payloads compactly:
bool is stored as bits = 1 for true, 0 for false.
int64 is stored by casting to uint64 (so it preserves the bit pattern).
float64 is stored using math.Float64bits / Float64frombits (the exact IEEE‑754 bit pattern is round‑tripped).
Constructors show this explicitly:
NewBoolMsg sets bits = 1 for true.
NewIntMsg sets bits = uint64(n).
NewFloatMsg sets bits = math.Float64bits(n).
Accessors reverse this: Bool() checks bits==1, Int() casts bits back to int64, Float() uses math.Float64frombits.
1.3. The any trick
The val any field is used only for composite values:
List() returns msg.val.([]Msg)
Dict() returns msg.val.(map[string]Msg)
Struct() returns msg.val.(StructMsg)
Union() returns msg.val.(UnionMsg)
This keeps composite values “boxed” in a single Msg value rather than through a hierarchy of interface values. The constructors NewListMsg, NewDictMsg, NewStructMsg, NewUnionMsg store the composite payload in val and set the appropriate MsgKind.
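A reduced sketch of the any slot, assuming only int, list and dict kinds exist. The constructor and accessor names follow the PR; the kind constants are assumptions, and Int() skips the tag check here for brevity.

```go
package main

import "fmt"

type MsgKind uint8

const (
	MsgKindInvalid MsgKind = iota
	MsgKindInt
	MsgKindList
	MsgKindDict
)

type Msg struct {
	kind MsgKind
	bits uint64
	str  string
	val  any // only composites live here
}

func NewIntMsg(n int64) Msg { return Msg{kind: MsgKindInt, bits: uint64(n)} }

// The composite constructors box the payload once, into val.
func NewListMsg(items []Msg) Msg            { return Msg{kind: MsgKindList, val: items} }
func NewDictMsg(entries map[string]Msg) Msg { return Msg{kind: MsgKindDict, val: entries} }

func (msg Msg) Int() int64 { return int64(msg.bits) } // tag check omitted in this sketch

// List and Dict are the only places that unbox val. Asserting to a concrete
// type compares the stored type word, and panics if it does not match.
func (msg Msg) List() []Msg          { return msg.val.([]Msg) }
func (msg Msg) Dict() map[string]Msg { return msg.val.(map[string]Msg) }

func main() {
	list := NewListMsg([]Msg{NewIntMsg(1), NewIntMsg(2), NewIntMsg(3)})
	var sum int64
	for _, item := range list.List() { // elements are plain Msg structs, not interfaces
		sum += item.Int()
	}
	fmt.Println(sum) // 6

	dict := NewDictMsg(map[string]Msg{"answer": NewIntMsg(42)})
	fmt.Println(dict.Dict()["answer"].Int()) // 42
}
```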
1.4. Centralized equality / JSON / match semantics
Equality and JSON formatting are now centralized in Msg, so the runtime doesn’t bounce through type-specific methods and interface dispatch:
Msg.Equal compares tags and then compares the correct backing storage (bits, str, slices/maps, or struct/union types).
Msg.MarshalJSON delegates to the correct representation (primitives, list, dict, struct, union).
Match handles union match logic using tag + optional data check. The logic is equivalent to the previous behavior but operates on the new union value directly (using the hasData flag).
This centralization also makes it easier to reason about correctness: only one place to adjust equality or JSON formatting.
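A sketch of such a centralized equality switch, covering only bool, int, string and list kinds. Floats are deliberately left out because whether Equal compares them by bits or by value is not stated here (the two disagree for NaN and for +0/-0); treating two invalid messages as equal is also an assumption.

```go
package main

import "fmt"

type MsgKind uint8

const (
	MsgKindInvalid MsgKind = iota
	MsgKindBool
	MsgKindInt
	MsgKindString
	MsgKindList
)

type Msg struct {
	kind MsgKind
	bits uint64
	str  string
	val  any
}

func NewBoolMsg(b bool) Msg {
	if b {
		return Msg{kind: MsgKindBool, bits: 1}
	}
	return Msg{kind: MsgKindBool}
}
func NewIntMsg(n int64) Msg      { return Msg{kind: MsgKindInt, bits: uint64(n)} }
func NewStringMsg(s string) Msg  { return Msg{kind: MsgKindString, str: s} }
func NewListMsg(items []Msg) Msg { return Msg{kind: MsgKindList, val: items} }

// Equal checks the tag first, then compares only the storage that tag makes live.
func (msg Msg) Equal(other Msg) bool {
	if msg.kind != other.kind {
		return false
	}
	switch msg.kind {
	case MsgKindInvalid:
		return true // assumption: two empty messages are equal
	case MsgKindBool, MsgKindInt:
		return msg.bits == other.bits
	case MsgKindString:
		return msg.str == other.str
	case MsgKindList:
		a, b := msg.val.([]Msg), other.val.([]Msg)
		if len(a) != len(b) {
			return false
		}
		for i := range a {
			if !a[i].Equal(b[i]) { // recursion handles nested lists
				return false
			}
		}
		return true
	default:
		panic(fmt.Sprintf("Equal: kind %d not covered by this sketch", msg.kind))
	}
}

func main() {
	x := NewListMsg([]Msg{NewIntMsg(1), NewStringMsg("a"), NewBoolMsg(true)})
	y := NewListMsg([]Msg{NewIntMsg(1), NewStringMsg("a"), NewBoolMsg(true)})
	fmt.Println(x.Equal(y))                            // true
	fmt.Println(NewIntMsg(1).Equal(NewStringMsg("1"))) // false: tags differ
}
```

Adding a kind means adding one case to this switch, which is the single-place-to-change property described above.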
Negative integers are stored through the uint64 cast:
```go
func NewIntMsg(n int64) Msg {
	return Msg{kind: MsgKindInt, bits: uint64(n)}
}

func (msg Msg) Int() int64 {
	return int64(msg.bits)
}
```
Go’s integer casts preserve the two’s‑complement bit pattern, so a negative int64 value becomes a uint64 with the same raw bits. When it’s cast back to int64, the original negative number is recovered. This is safe and lossless for all int64 values. The behavior is in NewIntMsg and Int() respectively.
3.1. No reflect
Using reflect would:
reintroduce dynamic dispatch and allocations (especially for composite iteration and equality),
complicate the hot path with type switches on reflect.Kind, and
make correctness harder to guarantee (e.g., reflect.DeepEqual is slower and has edge cases).
The current design uses explicit tags and typed accessors, which is faster and more predictable. All runtime dispatch is already O(1) through a switch on MsgKind. That’s directly visible in Msg.Equal, Msg.MarshalJSON, and Msg.String.
3.2. No unsafe
unsafe could theoretically pack more data or avoid bounds checks, but it risks undefined behavior across Go versions and architectures. Given Neva’s emphasis on correctness and portability (multi‑arch builds in make build), staying in safe Go is consistent with the runtime’s long‑term stability. The new approach already avoids interface allocations without unsafe tricks.
3.3. “runtime” package tricks
Direct runtime package usage (e.g., runtime.memmove or internal layout assumptions) would introduce:
implementation‑detail coupling to Go’s runtime,
stricter version constraints (breaking with Go upgrades),
unclear portability for different architectures.
Given the performance gains already captured by eliminating interface dispatch in message handling, it’s not necessary at this stage.
3.4. Alternatives that were effectively chosen
The tagged‑union approach is essentially the best low‑level alternative to reflect/unsafe in Go:
explicit tagging,
fixed layout for primitives,
minimal boxing for composites.
This is already implemented in Msg and its accessors. It’s a classic approach in high‑performance Go code when you want predictable hot‑path behavior.
The issue proposes native composite storage (e.g., []int64, map[string]string, etc.) to avoid []Msg and map[string]Msg overhead. The new tagged‑union design does not yet store native composite slices or maps, but it lays the groundwork for that direction by:
Centralizing composite decoding in Msg.List() / Msg.Dict(), so we can later extend them to detect “native list/map” encodings without changing call sites. The accessors are already centralized and are the sole place to retrieve composites.
Making Msg a flat struct, so future composite variants can be stored in val any with new internal types (e.g., []int64 or map[string]string) and recognized in List() / Dict() or in new helper methods.
Centralizing equality and JSON so that future composite variants can be handled without refactoring all runtime funcs or compiler code. The switch on MsgKind is the single dispatch point to extend.
In other words: the new design doesn’t yet implement the “native composite storage” proposed in #996, but it makes it feasible without another large refactor. The hot‑path methods (List, Dict, Equal, MarshalJSON, Match) are centralized and ready for extension.
What the proposal would still require
To fully implement #996’s idea, you’d need:
additional internal composite variants (e.g., listInt64, listString types),
expanded List()/Dict() to either return []Msg/map[string]Msg or provide “fast path” iterators for native composites,
or introduce new accessors (e.g., ListInt64()), which would be a more invasive API change.
The current design doesn’t do this yet, but it is compatible with the incremental approach suggested in the proposal (e.g., start with []bool|int|string and map[string]bool|int|string and treat them as special composite variants stored in val). The single tagged‑union struct makes that evolution straightforward.
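To make that extension path concrete, here is a sketch of one possible native variant. Everything specific in it is hypothetical: listInt64 and ListInt64() are the example names from the list above, not existing code, and keeping the MsgKindList tag while widening to []Msg inside List() is only one of the options that list describes.

```go
package main

import "fmt"

type MsgKind uint8

const (
	MsgKindInvalid MsgKind = iota
	MsgKindInt
	MsgKindList
)

type Msg struct {
	kind MsgKind
	bits uint64
	val  any
}

func NewIntMsg(n int64) Msg      { return Msg{kind: MsgKindInt, bits: uint64(n)} }
func NewListMsg(items []Msg) Msg { return Msg{kind: MsgKindList, val: items} }
func (msg Msg) Int() int64       { return int64(msg.bits) }

// listInt64 is a hypothetical internal variant: a homogeneous int list stored
// natively, still tagged MsgKindList so existing call sites do not change.
type listInt64 []int64

func newInt64ListMsg(xs []int64) Msg { return Msg{kind: MsgKindList, val: listInt64(xs)} }

// List keeps its signature: native variants are widened to []Msg on demand,
// which allocates, so hot paths would prefer the fast-path accessor below.
func (msg Msg) List() []Msg {
	switch v := msg.val.(type) {
	case []Msg:
		return v
	case listInt64:
		out := make([]Msg, len(v))
		for i, n := range v {
			out[i] = NewIntMsg(n)
		}
		return out
	default:
		panic("List: not a list")
	}
}

// ListInt64 is the hypothetical fast-path accessor; ok is false when the list
// is not stored natively.
func (msg Msg) ListInt64() ([]int64, bool) {
	v, ok := msg.val.(listInt64)
	return v, ok
}

func main() {
	native := newInt64ListMsg([]int64{1, 2, 3})
	if xs, ok := native.ListInt64(); ok {
		fmt.Println(xs) // [1 2 3]
	}
	fmt.Println(len(native.List()), native.List()[2].Int()) // 3 3
}
```

The trade-off is visible in List(): existing callers keep working, but each call on a native list builds a fresh []Msg, so the gain only materializes where callers switch to the native accessor.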
5.1. MsgKind and sentinel values
MsgKindInvalid (tag = 0) is the sentinel “empty” message. Msg.IsValid() simply checks that kind != MsgKindInvalid. This is used in generated code to detect empty results without relying on nil (since Msg is no longer an interface).
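A minimal sketch of the sentinel with a single real kind. lookup is a hypothetical stand-in for generated code that may have nothing to return; the point is that a missing value now comes back as the zero Msg{} instead of a nil interface.

```go
package main

import "fmt"

type MsgKind uint8

const (
	MsgKindInvalid MsgKind = iota // zero value: "no message"
	MsgKindInt
)

type Msg struct {
	kind MsgKind
	bits uint64
}

func NewIntMsg(n int64) Msg { return Msg{kind: MsgKindInt, bits: uint64(n)} }

func (msg Msg) IsValid() bool { return msg.kind != MsgKindInvalid }

// lookup is hypothetical: a missing key yields the zero Msg, which
// IsValid reports as empty, so callers never compare against nil.
func lookup(m map[string]Msg, key string) Msg {
	return m[key]
}

func main() {
	cfg := map[string]Msg{"port": NewIntMsg(8080)}
	fmt.Println(lookup(cfg, "port").IsValid())    // true
	fmt.Println(lookup(cfg, "timeout").IsValid()) // false
	fmt.Println(Msg{}.IsValid())                  // false
}
```

Validity is decided by the tag, not the payload, so NewIntMsg(0) is still a valid message even though all of its bits are zero.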
5.2. Union “no data” semantics
Unions now track whether they contain data using the hasData bool in UnionMsg. The constructors NewUnionMsg and NewUnionMsgNoData set hasData appropriately. Match logic uses hasData to preserve the pattern‑matching behavior (tag only vs. tag+payload).
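A sketch of the matching rule as I read it: a tag-only pattern matches any union with that tag, and a pattern with data also requires equal payloads. The rule itself, the helpers union, tagOnly and matchUnion, and the string-only Msg are assumptions for illustration; the PR's NewUnionMsg and NewUnionMsgNoData presumably wrap the union in a Msg rather than returning a bare UnionMsg.

```go
package main

import "fmt"

// Msg is reduced to a comparable string payload for this sketch.
type Msg struct{ s string }

type UnionMsg struct {
	tag     string
	data    Msg
	hasData bool
}

// union and tagOnly are hypothetical constructors; only union sets hasData.
func union(tag string, data Msg) UnionMsg { return UnionMsg{tag: tag, data: data, hasData: true} }
func tagOnly(tag string) UnionMsg         { return UnionMsg{tag: tag} }

// matchUnion applies the assumed rule: tags must agree, and the payload is
// compared only when the pattern itself carries one.
func matchUnion(value, pattern UnionMsg) bool {
	if value.tag != pattern.tag {
		return false
	}
	if !pattern.hasData {
		return true // tag-only pattern
	}
	return value.hasData && value.data == pattern.data
}

func main() {
	some := union("Some", Msg{"42"})
	fmt.Println(matchUnion(some, tagOnly("Some")))            // true
	fmt.Println(matchUnion(some, union("Some", Msg{"7"})))    // false: payload differs
	fmt.Println(matchUnion(tagOnly("None"), tagOnly("Some"))) // false: tag differs
}
```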
5.3. Compiler and generated code changes
The compiler backend now emits:
runtime.Msg{} as the “empty config” sentinel,
runtime.NewUnionMsgNoData for tag-only unions,
typed conversions for list/dict/struct/union fields in getMsgFromGo and getGoFromMsg.
These changes ensure that runtime and generated code align with the new Msg struct representation.
Eliminated interface dispatch in hot paths like List(), Dict(), Equal(), MarshalJSON().
Reduced allocations: primitives are stored inline in a single struct; composite containers do not require extra wrapper structs per element.
Tighter cache behavior: Msg is a compact struct with small fixed‑size fields and a single any for composites.
These benefits are particularly important in runtime funcs like list and dict operations (e.g., list iteration, dict lookup). The new benchmarks were specifically added to measure this (BenchmarkMsgListIter, BenchmarkMsgDictLookup, etc.).
===
Below is a detailed, source‑anchored explanation of why bits uint64 is used for booleans, what alternatives exist, and why strings remain string rather than []byte in the current design. I’m not modifying any code per your instruction.
1.1. The actual layout goal: fast, fixed‑shape Msg
The current Msg struct is designed to be a fixed‑shape tagged union with minimal branching, so all payloads (bool/int/float) fit in the same inline slot (bits uint64). This eliminates per‑type allocations and avoids interface dispatch in hot paths. The struct uses one tag plus fixed payload fields: bits uint64, str string, and val any.
1.2. Why bits is always 64‑bit
Using a single 64‑bit slot for all numerics and bool yields:
No extra switching or per‑type storage fields inside Msg (we don’t need bool boolVal, int64 intVal, float64 floatVal, etc.).
Predictable layout across platforms and versions.
Fast accessors: Bool(), Int(), and Float() just reinterpret bits based on the MsgKind.
This behavior is encoded in constructors and accessors:
NewBoolMsg sets bits = 1 if true, 0 otherwise; Bool() reads bits == 1; NewIntMsg casts to uint64, Int() casts back; NewFloatMsg uses Float64bits/Float64frombits.
1.3. “But bool only needs 1 byte”
That is true at the value level, but struct layout and access cost matter more here than absolute byte size:
The struct already contains str string (two machine words: pointer and length) and val any (an interface header, also two machine words). The incremental memory difference between a 1‑byte bool and an 8‑byte uint64 is negligible in the overall Msg footprint.
If we used a smaller bool field and a separate field for int/float, we’d either add more branching or grow the struct with multiple typed fields, which can increase size and reduce cache locality.
A single uint64 keeps the code path uniform and avoids extra branching in hot paths like Equal, String, or MarshalJSON which all use the bits slot directly based on MsgKind.
1.4. Could we bit‑pack booleans more tightly?
Technically yes, but the cost would be:
introducing bit masking/shift ops and/or
extra fields or special‑case code paths
That would add complexity and risk branch mispredicts in the hot path. The current design optimizes for predictability and minimal dispatch, not maximum byte‑level compression.
2.1. Strings already have efficient representation in Go
A string in Go is a two‑word header (pointer + length) that refers to immutable bytes. That immutability allows:
safe sharing without copies,
efficient hashing and equality, and
direct JSON encoding without conversions.
The Msg type uses the str string field for string values, and MarshalJSON directly calls json.Marshal(msg.Str()) — no extra conversions needed.
2.2. Using []byte would increase complexity
If strings were stored as []byte, you’d need to:
clone or freeze slices to preserve immutability (to avoid aliasing bugs),
convert []byte to string for JSON output (encoding/json encodes a []byte as base64, not as a JSON text string) and for most string APIs, which in general copies the bytes unless you use unsafe,
and ensure that code calling Str() doesn’t accidentally mutate internal buffers.
Since many runtime funcs and interceptors treat strings as immutable values (e.g., formatting in interceptors, string concatenations, comparisons), string is the safer, zero‑copy type for “message” semantics. The debug interceptor, for example, formats string messages directly using msg.Str() with quoting; that would require additional conversions if []byte were used internally.
2.3. Would []byte help performance?
Usually, not in this context. The runtime message layer is not doing large byte‑level operations on strings; it primarily passes them through. The more common operations in the runtime are:
equality (Msg.Equal),
JSON serialization,
matching and routing.
These are better served by string. If we introduced []byte, we’d likely pay more conversion costs than we’d save.
| Option | Benefits | Downsides |
|---|---|---|
| Single uint64 bits for bool/int/float | Fast, uniform, minimal branching | “Wastes” bytes for bool, but overall Msg size is dominated by other fields |
| Separate bool field | Saves bytes for bool | Adds fields/branches, expands struct or complicates accessors |
| Bit‑packing multiple booleans | Best memory density | Adds complexity and potentially hurts hot‑path CPU performance |
| string storage | Immutable, zero‑copy, cheap equality | None significant for this use case |
| []byte storage | Mutable in place | Requires conversions or unsafe; higher complexity/risk |
Yes, booleans are stored in a 64‑bit slot even though they need fewer bits, but the design optimizes for uniform access and minimal branching, and the incremental memory is negligible compared to the string and any fields already present in Msg. The choice avoids multiple payload fields or bit‑packing complexity and keeps Equal/MarshalJSON/String fast and predictable. Similarly, strings are stored as string because it is immutable, efficient for equality/JSON, and avoids conversions; moving to []byte would likely increase allocations or require unsafe without meaningful performance wins in the runtime’s current workloads.