
Add ZCU102 (UltraScale+ A53 EL3) bare-metal wolfIP port with GEM3 #121

Open
dgarske wants to merge 6 commits into wolfSSL:master from dgarske:port_amd_fpga

Member

@dgarske dgarske commented May 18, 2026

Summary

First Cortex-A / aarch64 port. Verified end-to-end on hardware: DHCP, ping, bidirectional UDP echo on port 7.

Features

  • New src/port/zcu102/ port: GCC bare-metal, single Cortex-A53 at EL3, no Xilinx Standalone BSP, no xparameters.h.
  • Clean-room Cadence GEM driver for PS-GEM3 (1 GbE RGMII) with TI DP83867IR PHY.
  • Polled RX + polled TX, cache-coherent BD recycle, dummy BDs for priority queues Q1-Q3.
  • MMU at EL3: DDR Normal WB, Normal-NC DMA carve-out, peripherals Device-nGnRnE, OCM Normal-WB executable.
  • GIC-400 init (GICv2), PS-UART0 polled console, MDIO, ARP/ICMP/DHCP/UDP via wolfIP core.
  • UDP-only build profile (MAX_TCPSOCKETS=2 only for the timer-heap minimum; app opens no TCP sockets).
  • xsdb JTAG loader (jtag/boot.sh + jtag/boot.tcl) - OCM-only iteration, no SD swap.
  • bootgen/ template + flash_sd.sh for SD boot via stock FSBL.
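A minimal sketch of the dummy descriptors for the unused priority queues, assuming the 8-byte (32-bit addressing) BD format and the flag combination this description gives for them (TX USED|WRAP|LAST, RX WRAP|OWN_SW). The bit positions are the usual Cadence GEM ones; the struct, macro, and function names are hypothetical and not taken from gem.c.

```c
#include <stdint.h>

/* 8-byte Cadence GEM buffer descriptor (32-bit addressing mode). */
struct gem_bd {
    uint32_t addr;  /* word 0: buffer address; RX flags live in bits 1:0 */
    uint32_t ctrl;  /* word 1: status / control */
};
_Static_assert(sizeof(struct gem_bd) == 8, "BD must stay 8 bytes");

/* Flag positions (hypothetical macro names). */
#define RX_BD_OWN_SW (1u << 0)   /* word 0: software owns, MAC must not use */
#define RX_BD_WRAP   (1u << 1)   /* word 0: last descriptor in the ring */
#define TX_BD_LAST   (1u << 15)  /* word 1: last buffer of the frame */
#define TX_BD_WRAP   (1u << 30)  /* word 1: last descriptor in the ring */
#define TX_BD_USED   (1u << 31)  /* word 1: already consumed, nothing to send */

#define GEM_DUMMY_QUEUES 3       /* priority queues 1..3 */

struct gem_dummy_bds {
    struct gem_bd tx[GEM_DUMMY_QUEUES];
    struct gem_bd rx[GEM_DUMMY_QUEUES];
};

/* One-entry rings per queue: the MAC sees "nothing to do" and wraps. */
void gem_init_dummy_bds(struct gem_dummy_bds *d)
{
    for (int q = 0; q < GEM_DUMMY_QUEUES; q++) {
        d->tx[q].addr = 0;
        d->tx[q].ctrl = TX_BD_USED | TX_BD_WRAP | TX_BD_LAST;
        d->rx[q].addr = RX_BD_WRAP | RX_BD_OWN_SW;
        d->rx[q].ctrl = 0;
    }
}
```

On hardware each queue's descriptor base register would then be pointed at its one-entry ring, and the descriptors cleaned to memory before the MAC is enabled.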

Notable fixes captured during bring-up

  • DMACR[30] must be clear with 8-byte BDs (setting it switches GEM to 16-byte BD format with addr_hi; MAC then writes frames to bogus high addresses, counted but never delivered). 64-bit AXI bus width comes from NWCFG[21] alone.
  • cache_clean after every BD recycle in gem_isr and eth_poll, and cache_inval before reading TXBUF_USED in eth_send (MAC writes USED-back to DDR, not coherent with CPU D-cache; without the inval the TX spin loop times out and wedges sustained UDP TX).
  • Q1-Q3 dummy BDs (TX USED|WRAP|LAST, RX WRAP|OWN_SW) to keep MAC from walking uninitialised priority queues.
  • DP83867 RX_CTRL strap quirk (clear CFG4 bit 7) + 100ms wait after AN_COMPLETE.
  • newlib aarch64 memset/memcpy wrapped via -Wl,--wrap to avoid dc zva hang on this A53 setup (even with SCTLR_EL3.DZE=1).
  • L2_PERIPH entry 511 (0xFFE00000-0xFFFFFFFF) mapped Normal-WB executable so code runs from OCM after MMU enable.
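The memset/memcpy wrap can be sketched as below. This is an illustration only, assuming the port links with -Wl,--wrap=memset -Wl,--wrap=memcpy; the actual bodies in main.c may differ. With --wrap, the linker resolves undefined references to memset to __wrap_memset, so a plain byte loop replaces the newlib routine that issues dc zva.

```c
#include <stddef.h>

/* Byte-wise memset that never issues "dc zva".
 * The volatile pointer keeps the compiler from recognising the loop
 * as a memset idiom and emitting a call to memset, which the linker
 * would wrap back into this function (infinite recursion). */
void *__wrap_memset(void *dst, int c, size_t n)
{
    volatile unsigned char *p = dst;

    while (n--)
        *p++ = (unsigned char)c;
    return dst;
}

/* Byte-wise memcpy with the same protection against idiom recognition. */
void *__wrap_memcpy(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}
```

Byte loops are slow compared with the newlib versions, which is usually acceptable for a bring-up demo; a word-wise loop would be the obvious follow-up if throughput matters.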

Known limitations

  • A53 IRQ exception not delivered (GIC latches the SPI/SGI and GICC_IAR ack works when polled). Worked around by driving gem_isr() from eth_poll() in the main loop. Real root cause is open.
  • MAX_TCPSOCKETS=2 is the minimum the current wolfIP core allows (MAX_TIMERS = MAX_TCPSOCKETS * 3); upstream follow-up should decouple.
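For context on the first limitation, a polled acknowledge on GICv2 looks roughly like the sketch below: read GICC_IAR, skip the spurious ID 1023, run the handler, then write the same value back to GICC_EOIR. The function and handler names are hypothetical, the register block is passed in as a pointer so the logic can be exercised off-target, and this is not the code from gic.c or eth_poll().

```c
#include <stdint.h>

/* GICv2 CPU interface register offsets, expressed as word indexes. */
#define GICC_IAR        (0x0Cu / 4)  /* Interrupt Acknowledge Register */
#define GICC_EOIR       (0x10u / 4)  /* End Of Interrupt Register */
#define GIC_SPURIOUS_ID 1023u        /* "nothing pending" */

typedef void (*irq_handler_t)(void);

/* Hypothetical stand-in for gem_isr(): just counts invocations. */
volatile unsigned demo_isr_count;
void demo_isr(void) { demo_isr_count++; }

/* Poll the CPU interface once.
 * Returns the acknowledged interrupt ID, or -1 if nothing was pending. */
int gic_poll_once(volatile uint32_t *gicc, uint32_t want_id,
                  irq_handler_t handler)
{
    uint32_t iar = gicc[GICC_IAR];   /* reading IAR acknowledges the IRQ */
    uint32_t id = iar & 0x3FFu;      /* interrupt ID is bits [9:0] */

    if (id == GIC_SPURIOUS_ID)
        return -1;
    if (id == want_id && handler)
        handler();
    gicc[GICC_EOIR] = iar;           /* signal completion with the IAR value */
    return (int)id;
}
```

A main loop would call this with the GEM3 interrupt ID (SPI 63 maps to ID 95, taking the SPI number from the gem.c header comment as an assumption) until it returns -1.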

Test

  • make -C src/port/zcu102 CROSS_COMPILE=aarch64-none-elf-
  • JTAG-load app.elf (or SD-boot via BOOT.BIN), watch UART0 @ 115200.
  • Expect DHCP lease, then ping <ip> and nc -u <ip> 7.

@dgarske dgarske self-assigned this May 18, 2026
Copilot AI review requested due to automatic review settings May 18, 2026 18:52
@dgarske dgarske requested a review from danielinux May 18, 2026 18:54
Contributor

Copilot AI left a comment

Pull request overview

Adds a new bare-metal AArch64 (Cortex-A53 EL3) wolfIP port targeting the Xilinx ZCU102 board, including a clean-room Cadence GEM3 + DP83867 PHY driver, minimal EL3 MMU/GIC/UART bring-up, and supporting JTAG/bootgen/SD tooling.

Changes:

  • Introduces src/port/zcu102/ (startup vectors, MMU setup, GICv2 driver, polled UART, GEM3 Ethernet + DP83867 PHY, UDP echo + DHCP demo, build/link scripts).
  • Adds ZCU102 JTAG loader scripts (generic tools/scripts/zcu102/ and port-specific src/port/zcu102/jtag/).
  • Adds BOOT.BIN generation templates and an SD flashing helper.

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 16 comments.

| File | Description |
| --- | --- |
| tools/scripts/zcu102/README.md | Documents the generic ZCU102 xsdb loader pattern and constraints. |
| tools/scripts/zcu102/jtag_load.tcl | Generic xsdb JTAG loader for AArch64 EL3 apps (OCM load + RVBAR loop + entry jump). |
| src/port/zcu102/.gitignore | Ignores local build artifacts for the ZCU102 port. |
| src/port/zcu102/board.h | Board-specific base addresses/IRQs/clock/reset regs and default MAC. |
| src/port/zcu102/config.h | UDP-focused wolfIP configuration for the ZCU102 port. |
| src/port/zcu102/flash_sd.sh | Helper to copy BOOT.BIN to an SD boot partition with safety checks. |
| src/port/zcu102/gem.h | Public GEM3 + MDIO API surface for the port. |
| src/port/zcu102/gem.c | Clean-room GEM3 driver (BD rings, MDIO, polled RX/TX integration with wolfIP). |
| src/port/zcu102/gic.h | Minimal GICv2 interface and IRQ dispatch hooks. |
| src/port/zcu102/gic.c | GIC-400 bring-up and dispatch implementation (plus polled dispatch helper). |
| src/port/zcu102/jtag/boot.sh | Port-local wrapper to build a flat binary and invoke xsdb boot sequence. |
| src/port/zcu102/jtag/boot.tcl | Port-local xsdb sequence to init PS, load OCM, and run the app. |
| src/port/zcu102/jtag/boot_iter.sh | Developer iteration helper (power-cycle + hw_server restart + boot). |
| src/port/zcu102/main.c | Demo app (wolfIP init, DHCP, UDP echo) + wrapped memset/memcpy + exception reporting. |
| src/port/zcu102/mmu.h | Declares EL3 MMU enable entrypoint. |
| src/port/zcu102/mmu.c | Static EL3 page tables and MMU enable sequence (TCR/MAIR/TTBR setup). |
| src/port/zcu102/phy_dp83867.h | DP83867 PHY init/link-status API. |
| src/port/zcu102/phy_dp83867.c | DP83867 configuration (strap fix, delays, AN/link polling) via MDIO. |
| src/port/zcu102/README.md | Port-level documentation (features, build/boot workflow, expected output). |
| src/port/zcu102/startup.S | EL3 vectors + startup (BSS clear, MMU enable, IRQ trampoline, exception trampolines). |
| src/port/zcu102/target.ld | AArch64 linker script for OCM-based layout and special sections. |
| src/port/zcu102/timer.h | Generic timer-based delay utilities. |
| src/port/zcu102/uart.h | UART API for the port. |
| src/port/zcu102/uart.c | Polled Cadence UART0 driver and small print helpers. |
| src/port/zcu102/bootgen/boot.bif | BOOT.BIN template for bootgen. |
| src/port/zcu102/bootgen/build_bootbin.sh | Script to render the BIF template and run bootgen. |
| src/port/zcu102/Makefile | Port-local build (app.elf + BOOT.BIN) and core compilation strategy. |

Comment thread src/port/zcu102/startup.S Outdated
Comment on lines +160 to +172
el3_irq_trampoline:
    sub sp, sp, #(16 * 16)
    stp x0, x1, [sp, #(0 * 16)]
    stp x2, x3, [sp, #(1 * 16)]
    stp x4, x5, [sp, #(2 * 16)]
    stp x6, x7, [sp, #(3 * 16)]
    stp x8, x9, [sp, #(4 * 16)]
    stp x10, x11, [sp, #(5 * 16)]
    stp x12, x13, [sp, #(6 * 16)]
    stp x14, x15, [sp, #(7 * 16)]
    stp x16, x17, [sp, #(8 * 16)]
    stp x18, x29, [sp, #(9 * 16)]
    str x30, [sp, #(10 * 16)]
Comment thread src/port/zcu102/mmu.c Outdated
Comment on lines +138 to +147
    /* L2_PERIPH: 3..4 GB range. All Device-nGnRnE except the last
     * 2 MB block which contains OCM (0xFFFC0000..0xFFFFFFFF) and
     * must be Normal+executable so we can fetch our code from OCM. */
    for (i = 0; i < 511; i++) {
        addr = 3ULL * L1_BLOCK_SIZE + (uint64_t)i * L2_BLOCK_SIZE;
        L2_PERIPH[i] = BLOCK_DEVICE(addr);
    }
    L2_PERIPH[511] = BLOCK_NORMAL(3ULL * L1_BLOCK_SIZE
                                  + 511ULL * L2_BLOCK_SIZE);
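The index arithmetic in this hunk can be checked off-target with a small sketch. It assumes a 4 KB translation granule (1 GB L1 entries, 2 MB L2 blocks), which matches the sizes in the comment; the helper names and the OCM_BASE/OCM_SIZE macros are hypothetical, with the OCM values taken from the comment above and from target.ld.

```c
#include <stdint.h>

#define L1_BLOCK_SIZE (1ULL << 30)      /* 1 GB per L1 entry */
#define L2_BLOCK_SIZE (1ULL << 21)      /* 2 MB per L2 block */
#define OCM_BASE      0xFFFC0000ULL     /* 256 KB OCM window */
#define OCM_SIZE      (256ULL * 1024)

/* Index of the L2 entry, within the 3..4 GB table, that covers addr. */
unsigned l2_periph_index(uint64_t addr)
{
    return (unsigned)((addr - 3ULL * L1_BLOCK_SIZE) / L2_BLOCK_SIZE);
}

/* Base address of the 2 MB block described by entry idx. */
uint64_t l2_periph_block_base(unsigned idx)
{
    return 3ULL * L1_BLOCK_SIZE + (uint64_t)idx * L2_BLOCK_SIZE;
}
```

Both ends of OCM land in entry 511, whose block starts at 0xFFE00000. The same block therefore also covers 0xFFE00000..0xFFFBFFFF, so anything mapped in that range inherits the Normal attribute as well.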

Comment thread src/port/zcu102/gic.c Outdated
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1335, USA
*
* GIC-400 (ARM GICv2) minimal driver for Cortex-A53 EL3 on ZynqMP.
* Configures all SPIs as Group 1, level-triggered, targeted at CPU0,
Comment thread src/port/zcu102/gem.c
Comment on lines +21 to +27
* Cadence GEM driver for ZynqMP GEM3 (on-board RJ45 on ZCU102).
*
* - 32-bit DMA addressing (DDR low bank only, < 4 GB).
* - Polled TX (matches existing wolfIP port pattern, simplest cert).
* - IRQ-driven RX (GIC SPI 63 - see board.h).
* - BDs and frame buffers in .dma_buffers, MMU-marked Device-nGnRnE.
*
Comment thread src/port/zcu102/target.ld Outdated
Comment on lines +3 to +16
* Memory map:
* DDR low : 2 GB @ 0x00000000 (FSBL hands control with DDR initialized)
* OCM : 256 KB @ 0xFFFC0000 (not used by this app)
*
* App layout in DDR:
* 0x00000000 - 0x000FFFE0 vectors, .text, .rodata, .data, .bss
* (linker just packs them in order; stack at top)
* 0x00100000 _stack_top (1 MB)
* 0x00200000 - 0x003FFFFF .dma_buffers (2 MB, 2 MB-aligned, mapped
* Device-nGnRnE by the MMU table in mmu.c)
* 0x00400000+ free for future use (e.g. heap)
*
* The 2 MB alignment of .dma_buffers is required because the MMU page
* tables flip its attribute at L2 block (2 MB) granularity.
Comment thread src/port/zcu102/main.c Outdated
Comment thread src/port/zcu102/board.h Outdated
Comment thread src/port/zcu102/main.c Outdated
Comment thread src/port/zcu102/README.md Outdated
Comment thread src/port/zcu102/Makefile Outdated