
[spark] Support merge schema in MERGE INTO #7789

Open
Zouxxyy wants to merge 2 commits into apache:master from Zouxxyy:dev/merge-update


@Zouxxyy (Contributor) commented May 8, 2026

Purpose

Add schema evolution support for MERGE INTO and fix nested-field alignment.

  • With spark.paimon.write.merge-schema=true, UPDATE * / INSERT * evolves the target schema to add new source columns. For those columns, star clauses pull values from the source by name, while explicit clauses fill NULL.
  • A FROM_STAR TreeNodeTag records whether a clause was originally written as *, so an explicit clause that lists every column is not mistaken for *.
  • AssignmentAlignmentHelper now reorders nested struct / array / map fields by name.
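The star-versus-explicit rule above can be sketched as follows. This is a minimal illustration over a hypothetical `Clause` model, not Paimon's or Spark's actual classes. It assumes that, once `*` is expanded into one assignment per target column, an explicit clause listing every column carries the same assignments, so a separate `fromStar` flag (playing the role the PR gives the FROM_STAR tag) is the only way to tell them apart.

```scala
object StarIntentSketch {
  // Hypothetical post-analysis view of a MERGE action. Once `*` has been expanded into
  // one assignment per target column, `fromStar` is the only record of the original intent.
  final case class Clause(assignments: Map[String, String], fromStar: Boolean)

  // Value written to a column that exists in the source but is new to the target.
  def valueForNewColumn(clause: Clause, column: String): String =
    if (clause.fromStar) s"source.$column" // star clause: pull from the source by name
    else "NULL"                            // explicit clause: the column was never listed, so fill NULL

  def main(args: Array[String]): Unit = {
    val target = Seq("id", "name")
    val expanded = target.map(c => c -> s"source.$c").toMap
    val star = Clause(expanded, fromStar = true)         // written as UPDATE *
    val fullyListed = Clause(expanded, fromStar = false) // written as UPDATE SET id = ..., name = ...
    // Identical assignments after expansion...
    assert(star.assignments == fullyListed.assignments)
    // ...but only the star clause pulls the new `age` column from the source.
    println(valueForNewColumn(star, "age"))        // source.age
    println(valueForNewColumn(fullyListed, "age")) // NULL
  }
}
```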

Scope

  • UPDATE * / INSERT * → evolve
  • Explicit clauses → no evolve
  • Mixed → evolve, star pulls source, explicit fills NULL
  • New fields in nested struct / array types
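The scope table can be read as a small decision rule. The sketch below encodes it under stated assumptions: `Star`, `Explicit`, `shouldEvolve` and `evolvedColumns` are hypothetical names, it handles top-level columns only, and it assumes new source columns are appended after the existing target columns in source order.

```scala
object EvolveScopeSketch {
  // Hypothetical, simplified clause kinds; not Paimon's or Spark's classes.
  sealed trait Action
  case object Star extends Action                                // UPDATE * / INSERT *
  final case class Explicit(columns: Seq[String]) extends Action // UPDATE SET ... / INSERT (...) VALUES (...)

  // Evolve only when merge-schema is enabled and at least one clause was written with `*`.
  // A mixed MERGE (star + explicit clauses) therefore evolves as well.
  def shouldEvolve(mergeSchema: Boolean, actions: Seq[Action]): Boolean =
    mergeSchema && actions.contains(Star)

  // Assumption: existing target columns keep their order and source-only columns are
  // appended in source order. Top level only; names compared case-sensitively for brevity.
  def evolvedColumns(target: Seq[String], source: Seq[String]): Seq[String] =
    target ++ source.filterNot(target.contains)
}
```

For example, `evolvedColumns(Seq("id", "name"), Seq("name", "id", "age"))` yields `Seq("id", "name", "age")`, and a MERGE with only `Explicit` clauses leaves the schema unchanged.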

Tests

13 new cases in MergeIntoTableTestBase, plus WHEN NOT MATCHED BY SOURCE coverage in MergeIntoNotMatchedBySourceTest.

@Zouxxyy marked this pull request as draft May 8, 2026 15:04
@Zouxxyy force-pushed the dev/merge-update branch from 688a4bd to aeb72e2
@Zouxxyy marked this pull request as ready for review May 9, 2026 00:40
Review comment on reorderStructByName:

/** Reorder source struct fields to match target field order by name, recursing into nested types. */
private def reorderStructByName(
A contributor commented:

reorderStructByName crashes when the target struct has fields that are absent from the source.

Should we support this?

The same issue applies to MapType value reordering in reorderFieldsByName.
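One possible behavior for the case raised here is to fill NULL for target sub-fields missing from the source instead of failing, and to apply the same rule inside array elements and map values. The sketch below is a minimal illustration of that idea over a hypothetical, simplified type and value model (`Leaf`, `Struct`, `Arr`, `MapT`, `align` are illustrative names, not Spark's `StructType` / `ArrayType` / `MapType` or the PR's code). It assumes map keys never need reordering.

```scala
object ReorderSketch {
  // Hypothetical, simplified stand-ins for Spark's StructType / ArrayType / MapType.
  sealed trait DType
  case object Leaf extends DType // any atomic type
  final case class Struct(fields: Seq[(String, DType)]) extends DType
  final case class Arr(element: DType) extends DType
  final case class MapT(key: DType, value: DType) extends DType

  // A struct value is a Vector of field values in its own type's field order;
  // arrays are Vectors and maps are Maps. Converts a value of type `from` to the layout of `to`.
  def align(value: Any, from: DType, to: DType): Any = (value, from, to) match {
    case (null, _, _) => null
    case (row: Vector[Any] @unchecked, Struct(src), Struct(tgt)) =>
      val srcIndex = src.map(_._1).zipWithIndex.toMap
      tgt.toVector.map { case (name, tgtType) =>
        srcIndex.get(name) match {
          case Some(i) => align(row(i), src(i)._2, tgtType) // present: reorder by name, recurse
          case None    => null                               // absent from source: fill NULL, don't fail
        }
      }
    case (xs: Vector[Any] @unchecked, Arr(s), Arr(t)) =>
      xs.map(align(_, s, t))
    case (m: Map[Any, Any] @unchecked, MapT(_, s), MapT(_, t)) =>
      m.map { case (k, v) => k -> align(v, s, t) } // keys left untouched; values aligned like array elements
    case (v, _, _) => v
  }
}
```

With source `Struct(b, a)` and target `Struct(a, b, c)`, the value `Vector(2, 1)` becomes `Vector(1, 2, null)`; the same rewrite is applied to every array element and every map value.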

Review comment on alignColumns:

 * reorder and fill nulls for missing sub-fields.
 */
- private def alignColumns(
+ def alignColumns(
A contributor commented:

SchemaHelper.scala now handles ArrayType alignment via transform but does not handle MapType, whereas AssignmentAlignmentHelper.reorderFieldsByName does handle MapType. Because of this inconsistency, the DataFrame write path won't align map values, while the MERGE path will.

Is this a problem?
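One way to avoid this kind of drift would be for both paths to share a single recursive check that treats MapType values exactly like ArrayType elements. The sketch below shows such a check over a hypothetical, simplified type model; `needsAlignment` and the `DType` cases are illustrative names, not functions in SchemaHelper.scala or AssignmentAlignmentHelper.

```scala
object SharedAlignmentSketch {
  // Hypothetical, simplified stand-ins for Spark's StructType / ArrayType / MapType.
  sealed trait DType
  case object Leaf extends DType
  final case class Struct(fields: Seq[(String, DType)]) extends DType
  final case class Arr(element: DType) extends DType
  final case class MapT(key: DType, value: DType) extends DType

  // True when a value of type `from` must be rewritten (reordered or null-filled) to match `to`.
  def needsAlignment(from: DType, to: DType): Boolean = (from, to) match {
    case (Struct(src), Struct(tgt)) =>
      src.map(_._1) != tgt.map(_._1) || // different field order or different field set
        src.zip(tgt).exists { case ((_, s), (_, t)) => needsAlignment(s, t) }
    case (Arr(s), Arr(t))         => needsAlignment(s, t)
    case (MapT(_, s), MapT(_, t)) => needsAlignment(s, t) // map values recurse like array elements
    case _                        => false
  }
}
```

Routing both the DataFrame write path and the MERGE path through one such helper would make a missing MapType case show up in both places at once rather than in only one.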


2 participants