Feat/generate ubproject and schema validation #394

Open
arnoox wants to merge 266 commits into eclipse-score:main from arnoox:feat/generate-sn6-schema-validation

Conversation

@arnoox
Contributor

@arnoox arnoox commented Feb 6, 2026

📌 Description

  • Add sn_schemas.py which translates metamodel.yaml into a schemas.json file consumable by sphinx-needs. This enables schema-based validation at parse time: required fields, regex patterns on option values, mandatory link presence (minItems: 1), and mandatory link target type checks (validate.network).
  • Because ubCode (the VS Code extension for sphinx-needs) evaluates these schemas during editing, the metamodel violations covered by the schemas now surface as IDE diagnostics directly in the editor. Errors are caught early, with lightweight and fast feedback, without waiting for a full Sphinx build.
  • Add unit tests (test_sn_schemas.py) and integration tests (test_sn_schemas_integration.py) that validate generated schemas against the real S-CORE metamodel using jsonschema_rs, the same engine sphinx-needs uses at runtime.
  • Add developer documentation (README.md) covering the two validation layers (schema vs. post-build Python checks) with a detailed coverage comparison table.

🚨 Impact Analysis

  • This change does not violate any tool requirements and is covered by existing tool requirements
  • This change does not violate any design decisions
  • Otherwise I have created a ticket for new tool qualification

✅ Checklist

  • Added/updated documentation for new or changed features
  • Added/updated tests to cover the changes
  • Followed project coding standards and guidelines

MaximilianSoerenPollak and others added 30 commits May 12, 2025 15:11
* Fixed source_code_linker finding external needs
* Added explaining comments

Added simple loop search logic to try all available prefixes per id.
Include inc files in doc build
Signed-off-by: Maximilian Sören Pollak <maximilian.pollak@expleogroup.com>
- Added target for building Github archive
- Take the input files and put them in the archive as-is. No renaming.
- Use empty lists for now, as we are still not sure how to use or adapt the source code linker in multirepo

Addresses: eclipse-score#16

---------

Signed-off-by: Dan Calavrezo <dan.calavrezo.ext@qorix.ai>
updated version for release

Addresses: eclipse-score#16

Signed-off-by: Dan Calavrezo <dan.calavrezo.ext@qorix.ai>
* Fixed wrong check activation
* Fixed test & rst files

Test and rst files needed fixing to comply with new check rules
Check found false positives due to 'process' being moved and losing
its prefix.
Cleaned up unused imports and small fix inside the README

Also-by: Aymen Soussi aymen.soussi@expleogroup.com
- Add multiple 'uni-direcional' docs build examples
- Bugfix 'docs_needs' build target in Module imports
- Patch some spelling mistakes
- Delete 'docs'. It has been moved to examples/simple already.

--- 

Addresses a bug that caused the 'docs_needs' import via a Module (like in the linking-release example) to not be executed correctly, as it would be missing the dependencies and missing the 'sphinx_build binary'
Signed-off-by: Alexander Lanin <Alexander.Lanin@etas.com>
* Add bugfix for new module json_encoding quirks
Remove defined external 'id_prefixes' from to be checked links.
Added another example to have an rst file inside a folder as well.
Added some explanation to descriptions of needs
* Increase version for release
trustable framework needs,
eclipse-score/process_description#27

add security tag to the document need

Resolves: eclipse-score/score#947
Signed-off-by: Nicolae Dicu <nicolae.dicu.ext@qorix.ai>
Signed-off-by: Andrey Babanin <andrey.babanin@bmw.de>
@arnoox arnoox changed the title from "Feat/generate sn6 schema validation" to "Feat/generate ubproject and schema validation" Apr 13, 2026
Arnaud Riess and others added 9 commits April 13, 2026 15:13
When merging global_base_opts into optional_options in _parse_need_type(),
fields that are already declared as mandatory_options for that type were
being duplicated. This caused:
- Schema generation to emit a conflicting pattern ('^$|^.*$' instead of
  '^.*$') when the same field appeared in both optional and mandatory.
- test_optional_options_not_required: 'version' for doc_tool was listed
  in optional_options despite being mandatory, causing a test failure.

Fixes:
- yaml_parser.py: filter global_base_opts before merging into optional_options
- sn_schemas.py: defensive guard to skip optional fields already in mandatory
- TestCompSchema._make_valid(): add 'belongs_to' mandatory link (comp
  requires belongs_to: feat)
- TestFeatSchema: feat.includes is an optional link, not mandatory;
  replace failing link tests with tests for mandatory options (security,
  safety) that are actually required by the schema
The needextend directive was adding ':+satisfies: tool_req__docs_metamodel'
to all needs with 'metamodel.yaml' in their source_code_link. However,
the need 'tool_req__docs_metamodel' is not defined anywhere, causing 39
'[needs.link_outgoing]' Sphinx warnings which are fatal under -W.
@arnoox arnoox marked this pull request as ready for review April 27, 2026 07:38
Arnaud Riess and others added 2 commits April 27, 2026 07:42
…alidity period consistency

Co-authored-by: Copilot <copilot@github.com>
…emporary files

Co-authored-by: Copilot <copilot@github.com>
Contributor

@ubmarco ubmarco left a comment


I tested this and it works well functionally. Here are some findings, a mix of AI-generated and my own.

Found 2 issues:

  1. The error message in _classify_links says only the first regex will be used when multiple regex patterns are supplied for the same link field, but the implementation unconditionally overwrites on each iteration so the last regex actually wins. The companion test even asserts the last value is kept (# Last regex overwrites previous ones), confirming the message is wrong.

    for link_value in link_values:
        if link_value.startswith("^"):
            if field in regexes:
                LOGGER.error(
                    f"Multiple regex patterns for {label} link field "
                    f"'{field}' in need type '{type_name}'. "
                    "Only the first one will be used in the schema."
                )
            regexes[field] = link_value
        else:
            if field not in targets:

    Test that confirms the actual (last-wins) behavior:

    def test_multiple_regex_for_same_field_logs_error(self) -> None:
        links = {"field": "^regex1$, ^regex2$"}
        with patch("src.extensions.score_metamodel.sn_schemas.LOGGER") as mock_logger:
            regexes, _ = _classify_links(links, "my_type", mandatory=True)
        mock_logger.error.assert_called_once()
        # Last regex overwrites previous ones
        assert regexes == {"field": "^regex2$"}

    Fix: either change the message to say "the last one will be used", or continue after logging so the first one is actually retained (and update the test accordingly).
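    A minimal sketch of the second option (first regex wins). It assumes link values are comma-separated, as in the test above; `classify_links_first_wins` is a hypothetical, simplified stand-in and not the PR's `_classify_links`:

```python
import logging

LOGGER = logging.getLogger(__name__)


def classify_links_first_wins(
    links: dict[str, str], type_name: str
) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Split link values into regexes and plain targets, keeping the first regex."""
    regexes: dict[str, str] = {}
    targets: dict[str, list[str]] = {}
    for field, raw_value in links.items():
        # Assumption: one string per field, with comma-separated values.
        for link_value in (part.strip() for part in raw_value.split(",")):
            if not link_value:
                continue
            if link_value.startswith("^"):
                if field in regexes:
                    LOGGER.error(
                        "Multiple regex patterns for link field '%s' in need "
                        "type '%s'. Only the first one will be used in the schema.",
                        field,
                        type_name,
                    )
                    continue  # keep the first regex, as the message promises
                regexes[field] = link_value
            else:
                targets.setdefault(field, []).append(link_value)
    return regexes, targets


regexes, targets = classify_links_first_wins(
    {"field": "^regex1$, ^regex2$, feat"}, "my_type"
)
```

    With this variant the companion test would have to assert `{"field": "^regex1$"}` instead of the last value.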

  2. needs_metamodel_generated.toml should not be written into confdir. _generate_needs_types_toml, _generate_needs_fields_toml and _generate_needs_links_toml build TOML strings purely in memory, but the setup then writes the concatenation to Path(app.confdir) / "needs_metamodel_generated.toml" solely so the path can be appended to needscfg_merge_toml_files. needs_config_writer then merges the same content into ubproject.toml, so the on-disk file contains nothing the final ubproject.toml doesn't already have — and it lands inside the user's source tree (docs/), forcing a .gitignore entry. Running bazel run //:docs_check produces both files side-by-side with overlapping contents, which is confusing and serves no consumer.

    # Generate TOML fragments for types, fields, and links from the metamodel.
    # needs_config_writer cannot serialise these structures itself, so we combine
    # them into a single file and register it for merging.
    # Write to a deterministic path so it's always overwritten (no temp file leak).
    metamodel_toml = (
        _generate_needs_types_toml(app)
        + _generate_needs_fields_toml(app)
        + _generate_needs_links_toml(app)
    )
    metamodel_path = Path(app.confdir) / "needs_metamodel_generated.toml"
    metamodel_path.write_text(metamodel_toml, encoding="utf-8")
    # Merge the generated metamodel TOML (types, fields, links) into the final ubproject.toml.
    app.config.needscfg_merge_toml_files.append(str(metamodel_path))

    .gitignore entry added to hide the generated file (also covers the redundancy):

    https://github.com/eclipse-score/docs-as-code/blob/5f1263b1fccd9acde0d608e29e672f448e53979e/.gitignore

    README explicitly labels the file as an "Intermediate merge input for needs_config_writer":

    https://github.com/eclipse-score/docs-as-code/blob/5f1263b1fccd9acde0d608e29e672f448e53979e/src/extensions/score_sync_toml/README.md#L130-L137

    Fix options (in order of preference):

    • Write the intermediate to app.outdir (or a tempfile.mkdtemp() directory cleaned up via a build-finished handler) so it never appears under confdir. The .gitignore entry can then be removed.
    • Better still: extend needs_config_writer to accept an in-memory TOML fragment (string or dict) so the intermediate file is not needed at all. The current "deterministic path so no temp file leak" rationale (__init__.py:157) is solving the wrong problem — the leak only exists because confdir was chosen as the target.
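
    The first option could look roughly like the sketch below. `write_intermediate_toml` is a hypothetical helper, and the temporary directory only stands in for `Path(app.outdir)`; whether `needs_config_writer` accepts a merge file outside confdir is an assumption that would need checking:

```python
import tempfile
from pathlib import Path


def write_intermediate_toml(outdir: Path, content: str) -> Path:
    """Write the generated fragment below ``outdir`` and return its path."""
    outdir.mkdir(parents=True, exist_ok=True)
    path = outdir / "needs_metamodel_generated.toml"
    # Deterministic name: each build overwrites the previous file.
    path.write_text(content, encoding="utf-8")
    return path


# Stand-in for ``Path(app.outdir)``; a real build would pass the Sphinx outdir.
with tempfile.TemporaryDirectory() as tmp:
    written = write_intermediate_toml(Path(tmp) / "_build", "[needs]\n")
    written_name = written.name
    written_text = written.read_text(encoding="utf-8")
```

    The returned path would then be appended to `needscfg_merge_toml_files` exactly as today, only pointing outside the source tree.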

Lower-confidence findings (below the 80-point bar)

The 5 parallel review agents surfaced these additional candidates. Each was scored 0–100 by an independent agent; only items at ≥80 are reported as confirmed issues above. The rest are listed here for completeness — they are either latent (no current trigger), intentional, or pre-existing.

2. Optional-field regex alternation ^$|{pattern} (score: 60)

In get_pattern_schema, optional fields wrap the metamodel pattern as f"^$|{pattern}". JSON Schema's pattern keyword is defined as a substring search (ECMA-262 semantics, not full match), whereas the post-build Python checker uses re.fullmatch. For patterns that are not fully ^…$ anchored (e.g. ^https://github.com/.* used in source_code_link / testlink fields in metamodel.yaml), the alternation ^$|^https://github.com/.* may behave differently in jsonschema-rs (used by sphinx-needs / ubCode) than in the Python checker, allowing values one engine accepts and the other rejects.

"""Return a JSON schema that validates a string against a regex pattern.
For optional fields, allows either an empty string OR a string matching
the pattern, matching the Python checker's behavior where empty strings
are treated as "absent" and not validated.
"""
if is_optional:
    # Allow empty strings for optional fields (Python checker treats "" as absent)
    # Use regex alternation to match either empty string or the original pattern
    return {
        "type": "string",
        "pattern": f"^$|{pattern}",
    }
return {
    "type": "string",
    "pattern": pattern,
}
def get_array_pattern_schema(pattern: str, is_optional: bool = False) -> dict[str, Any]:
"""Return a JSON schema that validates an array where each item matches a regex.

Why scored 60: The current metamodel mostly uses fully anchored patterns, so this is a real but latent inconsistency rather than a bug hitting today. Worth a TODO and a test for an unanchored pattern.
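
The divergence can be reproduced with Python's `re` module, using `re.search` as an approximation of JSON Schema's substring semantics and `re.fullmatch` for the post-build checker. The unanchored pattern `[a-z]+` and the helper `optional_pattern` are illustrative assumptions; this is not how jsonschema-rs evaluates patterns internally:

```python
import re


def optional_pattern(pattern: str) -> str:
    """Wrap a pattern so search and full match agree; "" stays allowed."""
    # Group the original pattern and anchor it on both sides.
    return f"^$|^(?:{pattern})$"


unanchored = "[a-z]+"  # hypothetical metamodel pattern without anchors
naive = f"^$|{unanchored}"
value = "ABC def"

# Substring search finds "def" and accepts; full match rejects.
naive_search = bool(re.search(naive, value))
naive_full = bool(re.fullmatch(naive, value))

# With explicit grouping and anchors both modes reject the value.
wrapped = optional_pattern(unanchored)
wrapped_search = bool(re.search(wrapped, value))
wrapped_full = bool(re.fullmatch(wrapped, value))
```

A regression test along these lines would pin the behavior for an unanchored pattern before one appears in the metamodel.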

3. Missing TOML escaping in _generate_needs_types_toml (score: 60)

_generate_needs_fields_toml and _generate_needs_links_toml apply .replace("\\", "\\\\").replace('"', '\\"') before interpolating values into TOML strings. _generate_needs_types_toml does not — directive, title, and prefix are written raw. If any need-type title or prefix ever contains a backslash or double quote, the generated TOML will be syntactically invalid and downstream parsing will fail.

https://github.com/eclipse-score/docs-as-code/blob/5f1263b1fccd9acde0d608e29e672f448e53979e/src/extensions/score_sync_toml/__init__.py

Why scored 60: Real divergence from the sibling helpers, but no current metamodel value contains the problem characters, so the bug is latent. Easy to fix by mirroring the escape calls used in the field/link helpers.
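
A sketch of the mirrored escaping, under stated assumptions: `toml_escape` and `render_need_type` are hypothetical helpers, and the `[[needs.types]]` layout is assumed rather than copied from the PR. Like the sibling helpers, it handles only backslashes and double quotes, not control characters:

```python
def toml_escape(value: object) -> str:
    """Escape a value for use inside a TOML basic (double-quoted) string."""
    # Backslashes first, so the quotes escaped next are not escaped twice.
    return str(value).replace("\\", "\\\\").replace('"', '\\"')


def render_need_type(directive: str, title: str, prefix: str) -> str:
    """Render one need-type entry with every value escaped."""
    return (
        "[[needs.types]]\n"
        f'directive = "{toml_escape(directive)}"\n'
        f'title = "{toml_escape(title)}"\n'
        f'prefix = "{toml_escape(prefix)}"\n'
    )


entry = render_need_type("feat", 'Feature "core"', "feat__")
```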

4. Default ID pattern broadened to allow uppercase (score: 65)

The auto-generated fallback ID pattern in yaml_parser.py widened from ^{prefix}[0-9a-z_]+$ (lowercase only) to ^{prefix}[0-9a-zA-Z_]+$ (mixed case). This affects every need type that does not declare an explicit id: pattern.

# Ensure ID regex is set
if "id" not in t["mandatory_options"]:
    prefix = t["prefix"]
    t["mandatory_options"]["id"] = f"^{prefix}[0-9a-zA-Z_]+$"
if "color" in yaml_data:
    t["color"] = yaml_data["color"]
if "style" in yaml_data:
    t["style"] = yaml_data["style"]

Why scored 65: The change appears to have ridden in on an earlier sphinx-needs 6.3.0 upgrade commit on the same branch (not introduced in this PR's primary diff hunks), and there is no commit message or comment explaining the policy shift. Confirm intentional; if so, document; if not, restore lowercase.
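
To make the effect concrete, the two fallback patterns can be compared directly. The prefix `feat__` and the candidate ID are made up for illustration:

```python
import re

prefix = "feat__"  # hypothetical prefix, for illustration only
lowercase_only = re.compile(f"^{prefix}[0-9a-z_]+$")
mixed_case = re.compile(f"^{prefix}[0-9a-zA-Z_]+$")

candidate = "feat__MyFeature"
# The old pattern rejects uppercase letters; the new one accepts them.
accepted_before = bool(lowercase_only.fullmatch(candidate))
accepted_after = bool(mixed_case.fullmatch(candidate))
```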

5. needscfg_exclude_vars may duplicate needs_config_writer defaults (score: 25)

In score_sync_toml/__init__.py, the setup code does list(getattr(app.config, "needscfg_exclude_vars", None) or []) + ["needs_from_toml", "needs_from_toml_table", "needs_schema_definitions_from_json"]. If needs_config_writer (loaded earlier in the extension order) already registers those three entries as the default value, the result will contain duplicates.

https://github.com/eclipse-score/docs-as-code/blob/5f1263b1fccd9acde0d608e29e672f448e53979e/src/extensions/score_sync_toml/__init__.py

Why scored 25: Could not confirm needs_config_writer actually registers these defaults; even if duplicates appear, exclusion-list duplication is typically benign. Worth a quick check during review.
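
If duplicates do turn out to appear, an order-preserving merge would avoid them. `merge_unique` is a hypothetical helper, not part of the PR:

```python
from typing import Optional


def merge_unique(existing: Optional[list[str]], extra: list[str]) -> list[str]:
    """Concatenate two lists, dropping duplicates while preserving order."""
    # dict keys keep insertion order, so the first occurrence wins.
    return list(dict.fromkeys([*(existing or []), *extra]))


merged = merge_unique(
    ["needs_from_toml"],  # pretend default registered by another extension
    [
        "needs_from_toml",
        "needs_from_toml_table",
        "needs_schema_definitions_from_json",
    ],
)
```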

6. Optional-link validate.network intentionally omitted (score: 5 — false positive)

sn_schemas.py deliberately skips emitting validate.network for optional links, with an in-source explanation: the post-build Python checker reports optional-link target-type violations at treat_as_info=True, while sphinx-needs schema validators only support a single violation severity per schema rule. Including optional links in the schema would escalate info-level findings to hard errors. A possible solution is to write optional links into dedicated rules, so they can be assigned a dedicated severity. The README's coverage table documents this trade-off.

Why scored 5: Documented intentional design decision with rationale and tests. Not an issue.


Follow-up suggestions (not blocking this PR)

These are improvements to track in subsequent PRs — they are not regressions in #394, but the surrounding code makes them natural next steps.

F1. Replace empty-string sentinels with nullable fields

_generate_needs_fields_toml always writes default = "" for any field that has no metamodel-supplied default. Throughout the codebase this empty string is then re-interpreted as "field not given" in later checks. Modelling absence with nullable fields (null / Optional) is the cleaner alternative — distinguishing a deliberately empty value from an unset one — and avoids the readability and type-safety problems that come with overloaded empty strings.

def _generate_needs_fields_toml(app: Sphinx) -> str:
    """Serialize ``app.config.needs_fields`` as ``[needs.fields.*]`` TOML entries.

    ``needs_config_writer`` cannot serialize the nested dicts that make up
    ``needs_fields`` (e.g. ``{"schema": {"type": "string"}, "default": ""}``),
    so custom fields like ``safety``, ``security``, and ``reqtype`` are missing
    from ``ubproject.toml``. Without them, ubCode does not know these are valid
    options and will report them as unknown fields.

    Returns only the ``default`` value for each field, which is sufficient for
    ubCode to recognise the field as valid.

    Must be called *after* ``score_metamodel.setup()`` has run (i.e. after
    ``app.config.needs_fields`` has been extended with the metamodel fields).
    """
    lines: list[str] = []
    for field_name, field_config in sorted(app.config.needs_fields.items()):
        default = field_config.get("default", "")
        # TOML-escape the default value (handle quotes)
        escaped = str(default).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"[needs.fields.{field_name}]\n")
        lines.append(f'default = "{escaped}"\n')
        lines.append("\n")
    return "".join(lines)
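
One way this could look on the TOML side: TOML has no null value, so an unset default would be modelled by omitting the key. The sketch below uses a hypothetical `render_field` helper and assumes that ubCode and sphinx-needs accept a field table without `default`, which would need to be verified:

```python
from typing import Optional


def render_field(field_name: str, default: Optional[str]) -> str:
    """Render one ``[needs.fields.*]`` table; omit ``default`` when unset."""
    lines = [f"[needs.fields.{field_name}]\n"]
    if default is not None:
        # An explicit value, including a deliberately empty string, is kept.
        escaped = default.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'default = "{escaped}"\n')
    lines.append("\n")
    return "".join(lines)


unset = render_field("safety", None)
empty = render_field("safety", "")
```

The two outputs differ, so later checks could distinguish "not given" from "given as empty" without overloading `""`.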

F2. Promote shared enum values to field-level schemas

Where a field carries the same enum across every need type that uses it, the per-type duplication can be hoisted into a single [needs.fields.<name>] schema entry, e.g.

[needs.fields.status]
schema.enum = ["draft", "review", "approved", "done"]

This would shrink the generated schemas.json, make ubCode autocompletion show enum values for the field everywhere it appears, and keep the metamodel DRY. _generate_needs_fields_toml currently emits only default — extending it with schema.enum (and possibly schema.type) when the same constraint holds across every type that uses the field is the natural follow-up.

def _generate_needs_fields_toml(app: Sphinx) -> str:
    """Serialize ``app.config.needs_fields`` as ``[needs.fields.*]`` TOML entries.

    ``needs_config_writer`` cannot serialize the nested dicts that make up
    ``needs_fields`` (e.g. ``{"schema": {"type": "string"}, "default": ""}``),
    so custom fields like ``safety``, ``security``, and ``reqtype`` are missing
    from ``ubproject.toml``. Without them, ubCode does not know these are valid
    options and will report them as unknown fields.

    Returns only the ``default`` value for each field, which is sufficient for
    ubCode to recognise the field as valid.

    Must be called *after* ``score_metamodel.setup()`` has run (i.e. after
    ``app.config.needs_fields`` has been extended with the metamodel fields).
    """
    lines: list[str] = []
    for field_name, field_config in sorted(app.config.needs_fields.items()):
        default = field_config.get("default", "")
        # TOML-escape the default value (handle quotes)
        escaped = str(default).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"[needs.fields.{field_name}]\n")
        lines.append(f'default = "{escaped}"\n')
        lines.append("\n")
    return "".join(lines)
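
The hoisting step itself could be sketched as follows. `shared_enums` and `render_enum` are hypothetical helpers, the input shape (need type, then field, then enum values) is an assumption about how the per-type constraints might be collected, and the `safety` values are made up:

```python
def shared_enums(
    per_type_enums: dict[str, dict[str, list[str]]],
) -> dict[str, list[str]]:
    """Return fields whose enum is identical in every need type that uses them."""
    seen: dict[str, list[str]] = {}
    conflicting: set[str] = set()
    for fields in per_type_enums.values():
        for field, values in fields.items():
            if field in seen and seen[field] != values:
                conflicting.add(field)  # differs between types: keep per-type
            seen.setdefault(field, values)
    return {f: v for f, v in seen.items() if f not in conflicting}


def render_enum(field: str, values: list[str]) -> str:
    """Render a field-level enum as a ``[needs.fields.<name>]`` entry."""
    quoted = ", ".join(f'"{v}"' for v in values)
    return f"[needs.fields.{field}]\nschema.enum = [{quoted}]\n"


status = ["draft", "review", "approved", "done"]
hoisted = shared_enums(
    {
        "feat": {"status": status, "safety": ["QM", "ASIL_B"]},
        "comp": {"status": status, "safety": ["QM"]},
    }
)
rendered = render_enum("status", hoisted["status"])
```

Only `status` is hoisted here, because `safety` carries different values in the two types.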

F3. Emit $schema header in generated ubproject.toml

The generated ubproject.toml should start with

"$schema" = "https://ubcode.useblocks.com/ubproject.schema.json"

so that editors with TOML schema support give schema-aware completion and validation while editing the file. See ubCode configuration docs. This would likely live as a top-of-file fragment in shared.toml or as a needs_config_writer option, depending on how the merge tool orders keys.

https://github.com/eclipse-score/docs-as-code/blob/5f1263b1fccd9acde0d608e29e672f448e53979e/src/extensions/score_sync_toml/shared.toml

F4. Convert needs_schema_debug_path to a relative path

The generated ubproject.toml currently contains an absolute, machine-specific path:

schema_debug_path = "/home/marco/git/eclipse-score/docs-as-code/docs/schema_debug"

needs_config_writer already supports rewriting absolute paths via needscfg_relative_path_fields, and its own docstring explicitly names needs_schema_debug_path as the example field for this option:

"List of config path patterns to convert to relative paths. Can be strings (e.g., 'needs_schema_debug_path') ..."
needs_config_writer/main.py:94

The fix is to add "needs_schema_debug_path" to the existing needscfg_relative_path_fields.extend([...]) call. If sphinx-needs cannot consume a relative value for this field, that is an upstream bug to file against sphinx-needs / needs-config-writer; otherwise this is a one-line change.

# Relative paths to confdir for Bazel provided absolute paths.
app.config.needscfg_relative_path_fields.extend(
    [
        "needs_external_needs[*].json_path",
        {
            "field": "needs_flow_configs.score_config",
            "prefix": "!include ",
        },
    ]
)
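
The proposed change, shown against a stand-in object so the snippet runs on its own. The `SimpleNamespace` objects only imitate the Sphinx `app`; in the extension the existing call would simply gain one list entry:

```python
from types import SimpleNamespace

# Stand-in for the Sphinx ``app``; the real object comes from Sphinx itself.
app = SimpleNamespace(
    config=SimpleNamespace(needscfg_relative_path_fields=[])
)

# Relative paths to confdir for Bazel provided absolute paths.
app.config.needscfg_relative_path_fields.extend(
    [
        "needs_external_needs[*].json_path",
        {
            "field": "needs_flow_configs.score_config",
            "prefix": "!include ",
        },
        # Proposed addition: rewrite the debug path as a relative path.
        "needs_schema_debug_path",
    ]
)
fields = app.config.needscfg_relative_path_fields
```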


Labels

None yet

Projects

Status: Backlog
