C++23 · GPU-First · Data-Oriented

KAPI · Next-Generation Game Engine

Born from 25 years of industry experience. A modern runtime shipping today on Windows and macOS, designed from day one for the full console matrix — GPU-first where it matters, data-oriented across CPU and GPU boundaries, and explicit about ownership and fallback behavior.

9 Live Systems
14 Partial · In Progress
3 RHI Backends Shipping

What's New in Kapi

Recent renderer milestones — the modern temporal stack, vendor upscalers, hardware ray tracing, and tri-backend parity (D3D12, Vulkan, Metal 4) all landed in production code.

New

TLRC Global Illumination

Temporal Light Radiance Cache — first modern GI path live on D3D12 & Vulkan. Screen-probe resolve, temporal reprojection, adaptive ray allocation, and a world-cache feedback loop.

Modern GI
New 🌍

Metal 4 on Apple Silicon

Production macOS backend on the Metal 4 API. MetalFX upscaling, hardware ray tracing, residency sets, and TBDR-aware pass ordering — full feature parity with D3D12 and Vulkan.

Apple · Metal 4 · MetalFX
New 🔥

DLSS Suite

NVIDIA NGX integration covering Super Resolution, Ray Reconstruction, and Frame Generation. Auto/manual denoiser routing through the temporal guide buffers.

NVIDIA · SR + RR + FG
New

FSR3 Frame Generation

AMD FidelityFX Super Resolution 3 frame interpolation wired across both D3D12 and Vulkan paths, sharing the engine's motion-vector and depth contracts.

AMD · FSR3-FG
New

RTXDI Path Tracing

NVIDIA RTXDI ReSTIR for direct illumination and full path tracing. Temporal/spatial resampling with boiling filter and NRD denoising for production-grade results.

ReSTIR PT
New

DDGI Probe Volumes

Dynamic Diffuse Global Illumination with probe grids and screen-probe support. Selectable alongside TLRC; both paths share the temporal foundation.

Dynamic GI
New 🌈

HDR Scene-to-Present

Complete HDR ownership from lighting through post-processing. HDR10 output on D3D12 & Vulkan with correct PQ (SMPTE ST 2084) inverse-EOTF encoding at present.

HDR10
New

NRD Denoising Wrapper

NVIDIA Real-Time Denoisers (NRD) integration covering REBLUR/RELAX for ray-traced shadows, reflections, and GI, with a custom temporal denoiser as a feature-flagged fallback.

NRD · Custom
New

Bindless Heap (65k+)

Unified bindless slot tracking across D3D12 and Vulkan. Materials, textures, structured buffers, samplers and acceleration structures all routed through a single index space.

D3D12 + Vulkan
New

Visibility Buffer + Mesh Shaders

Production VisBuffer pipeline with 12-bit drawcall, 8-bit triangle, 12-bit meshlet packed into a 32-bit payload. 64-vertex/124-triangle meshlets driven by GPU culling.

GPU-Driven
New

Motion Vectors & TAA

Graph-owned motion vector texture with camera/static reprojection. Halton(2,3) jittered TAA with neighborhood clamping, plus camera view history for any temporal consumer.

Temporal Foundation
New

GTAO + CMAA2

Ground Truth Ambient Occlusion and Conservative Morphological AA, both production-validated and selectable through the post-process path enum on either backend.

AO + AA
New

Tri-Backend Renderer

Full feature parity across D3D12 (Agility SDK 1.719+, SM 6.9), Vulkan 1.4, and Metal 4. Side-by-side viewport comparison mode lets you A/B any two backends in the editor.

D3D12 · Vulkan · Metal 4

Key Highlights

Built from the ground up with no legacy constraints. Every system designed for modern hardware, explicit APIs, and production-scale workloads.

GPU-First Architecture

Work moves to the GPU when the throughput gain is real. GPU-driven culling, mesh shaders, skinning dispatch, scene BVH, and visibility-buffer material evaluation all run on the device today.

Data-Oriented ECS

Archetype-based entity component system with SoA layout in 16KB cache-friendly chunks. SIMD-vectorized iteration, zero-allocation queries, and deterministic job scheduling.

VisBuffer + Mesh Shaders

32-bit packed visibility payload (12+8+12 bits) feeds compute material evaluation. 64-vertex/124-triangle meshlets driven by GPU frustum/occlusion/backface culling.
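The 12+8+12 packing can be sketched as follows. This is a minimal illustration, not the engine's code: the type and function names are hypothetical, and the bit ordering (draw call in the high bits, meshlet in the low bits) is an assumption, since only the field widths are stated here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: [31:20] draw call | [19:12] triangle | [11:0] meshlet.
// Only the widths (12 + 8 + 12) come from the text; the ordering is assumed.
struct VisibilityId {
    std::uint32_t drawCall;  // 0..4095
    std::uint32_t triangle;  // 0..255, so a 124-triangle meshlet fits
    std::uint32_t meshlet;   // 0..4095
};

constexpr std::uint32_t PackVisibility(VisibilityId id) {
    // Each field must fit its bit budget or it would bleed into its neighbor.
    assert(id.drawCall < (1u << 12) && id.triangle < (1u << 8) && id.meshlet < (1u << 12));
    return (id.drawCall << 20) | (id.triangle << 12) | id.meshlet;
}

constexpr VisibilityId UnpackVisibility(std::uint32_t payload) {
    return { payload >> 20, (payload >> 12) & 0xFFu, payload & 0xFFFu };
}
```

A material-evaluation pass would read the 32-bit value per pixel, unpack it, and use the three indices to fetch the triangle's vertices and material.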

No Semantic Drift

Same data contract, different execution path. Lower-tier fallbacks reduce quality but never require different gameplay code. One codebase, all platforms.

Job-System-First

Fiber-agnostic work-stealing scheduler with DAG dependencies. Lock-free hot paths via atomics. One worker thread per hardware core; cache-affinity pinning is designed and on the roadmap.

Explicit API Mindset

Render-graph DAG with automatic barrier coalescing and transient resource aliasing (30-50% VRAM savings). Live D3D12 + Vulkan + Metal 4; NVN, AGC, GDK on the roadmap.

Offline-First Assets

Deterministic content-addressed cooking with a derived data cache (DDC). Budget validation at cook time. Runtime consumes prebuilt streaming-friendly formats exclusively.

Layered Hot Reload

Module / DLL reload via DynamicModuleLibrary today; shader reload, engine hot-restart, and Live++ C++ patching designed for sub-second iteration.

Architectural Pillars

Six invariants that hold across all capability profiles and all subsystems.

01

Job-System-First Execution

Systems schedule jobs and declare dependencies. OS threads are never the authoring model. Synchronization on hot paths is minimized via lock-free atomics.

02

Data-Oriented Layout

Game state structured for locality, batching, and vectorization. ECS with contiguous storage, SIMD-friendly data layout, and predictable iteration patterns.

03

Explicit API Rendering

Render-graph style dependency control with deliberate pass dependencies, resource transitions, and feature fallbacks. No hidden state machines.

04

GPU-First, Not Dogmatic

Throughput-critical work moves to GPU. Gameplay-critical, deterministic, or rollback-sensitive work may remain CPU-owned when latency demands it.

05

Ownership Contracts

Every simulation product declares CPU-owned, GPU-owned, or mirrored. Six sync phases with explicit visibility rules prevent readback stalls.

06

Budget Enforcement

CPU, GPU, memory, I/O, and thermal envelopes per subsystem. Cook-time validation. Runtime budget broker with degradation ladders and hysteresis.

// C++23 · No exceptions · No RTTI · No GC
// Engine-owned types: Arena, Pool, Frame allocators · mimalloc v3 backing

auto query = world.Query<Transform, Velocity, const Physics>();
for (auto [transform, velocity, physics] : query) {
  transform.position += velocity.linear * FixedDT();
  // SIMD-vectorized SoA iteration over 16KB archetype chunks
}

Capability Profiles

Four profiles drive every subsystem decision — rendering, simulation, streaming, audio, budgets. Same contract, scaled execution.

P0

Compatibility

Low-end / Fallback
  • Forward rendering path
  • CPU BVH frustum culling
  • 10K entity budget
  • 60 FPS target (16.7ms)
  • Battery-saver 30 FPS mode
  • FXAA, basic bloom
  • Zone/portal audio
P1

Sustained Mobile

Phones & Tablets
  • Thermal governance
  • Unified memory aware
  • 50K entity budget
  • 60 FPS target (16.7ms)
  • Battery-saver 30 FPS mode
  • TBDR optimization
  • 4 Gerstner waves
P2

Fixed High-BW

PS5 · Xbox Series X
  • Visibility buffer path
  • GPU-driven culling
  • 250K entity budget
  • Filmic 30 / Quality 60 / Performance 120 FPS
  • Virtual textures
  • GPU particles & audio
P3

Scalable Discrete

High-End PC
  • Mesh shaders + RT
  • Meshlet cluster culling
  • 500K entity budget
  • 120+ FPS target (sub-8ms)
  • RT GI, RT reflections
  • Full GPU simulation

Engine Features

40+ engine systems: some shipping today, others fully specified and on the implementation roadmap.

Live: production code shipping
Partial: core implemented, advanced features in progress
Design: full specification, implementation on roadmap

Render Hardware Interface

Platform-native graphics abstraction with explicit APIs and three-tier feature progression.
GPU-Driven · Core v1 · Live
  • Live backends: D3D12 (Agility SDK 1.719+, SM 6.9), Vulkan 1.4, and Metal 4 — all with full feature parity
  • Roadmap backends: NVN (Switch 2), AGC (PS5), GDK / D3D12X (Xbox Series)
  • Render graph DAG with automatic barrier coalescing and memory aliasing
  • Bindless heap (65k+ slots) covering textures, buffers, samplers and acceleration structures
  • 30-50% VRAM savings via transient resource aliasing across passes

Visibility Buffer

  • Default opaque path on P2/P3 — 8 bytes/pixel vs 64-128 for G-buffer
  • Compute shader material evaluation from triangle ID + barycentrics
  • Forward fallback on P0/P1 preserves same surface-evaluation contract

GPU-Driven Submission

  • Indirect draw and compute-driven culling paths
  • Meshlet-compatible content with mesh shader submission (Tier 3)
  • GPU-driven meshlet/cluster culling with parent fallback
  • Vertex pulling: buffer indices, no IA vertex layouts

Async Compute

  • Queue overlap on P2/P3 for parallel GPU workloads
  • TBDR subpass optimization for tile-based GPUs (Metal, mobile)
  • Virtual texture page request generation during resolve

Lighting & Global Illumination

From baked static to real-time path-traced GI — TLRC, DDGI, and ReSTIR PT all selectable at runtime.
GPU · Ray Tracing · Live
  • TLRC (Temporal Light Radiance Cache) — first modern GI path live on D3D12 + Vulkan
  • DDGI dynamic probe grids with screen-probe support, selectable alongside TLRC
  • RTXDI ReSTIR direct illumination, plus reference and ReSTIR path-tracing modes
  • Up to 256 punctual lights (directional, point, spot) with shadow-flag controls
  • Cascaded shadow maps and ray-traced shadows; both use the same temporal denoiser

IBL & Probes

  • Prefiltered envmap (9 mips), diffuse irradiance (cubemap or SH-L2)
  • BRDF LUT and Charlie LUT (sheen/hair) precomputed at init
  • Parallax-corrected reflection probes with runtime capture priority

Shadow Systems

  • Raster shadow maps with cascade authoring and PCF/PCSS filtering
  • Hardware RT shadows with REBLUR/RELAX or custom temporal denoising
  • SSR fallback shares the RT reflection feature contract

Modern GI Paths

  • TLRC: world-cache feedback + adaptive ray budgets, scales gracefully on lower tiers
  • DDGI: probe-grid irradiance with classifier-driven relocation
  • Reference + ReSTIR path tracing modes for offline-quality previews

Material System

Unified PBR shader pipeline with glTF-extension surface model and Slang shaders.
Slang · Core v1 · Partial
  • Standard metallic-roughness PBR (GGX distribution, Smith visibility, Fresnel) with energy conservation
  • Single EvaluateSurface() compiles to compute, fragment, and depth-only paths
  • Bindless texture access with per-instance parameter overrides via GPU scene buffer
  • glTF reference fields for clear coat, sheen, anisotropy, SSS, transmission already in the upload struct
  • Cloth/sheen runtime ready (Charlie LUT precomputed at init)

Roadmap Layers

  • Skin SSS, car paint flake, hair Marschner, glass refraction — design specs ready, runtime evaluators next
  • Decal projection with deferred material blending — designed
  • Material variants at cook time — design contract drafted

Post-Processing

Composable post-FX chain with TAA, GTAO, CMAA2, tonemap operators and selectable post paths.
GPU · Core v1 · Partial
  • TAA with Halton(2,3) 16-sample jitter, neighborhood clamping, history rejection
  • GTAO (Ground Truth Ambient Occlusion) with configurable radius/quality
  • CMAA2 (Conservative Morphological AA) as a lightweight post-AA option
  • Tonemap operators: Linear, Reinhard, ACES Narkowicz, ACES Hill, Khronos PBR Neutral
  • HDR10 output with PQ (inverse EOTF) encoding at present
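As an illustration of the Halton(2,3) jitter named in the list above, here is a minimal sketch of the radical-inverse sequence and a 16-sample sub-pixel offset. The function names and the centering to [-0.5, 0.5) are assumptions for the example, not the engine's API.

```cpp
#include <array>
#include <cassert>

// Radical-inverse Halton value in [0, 1) for a sample index and prime base.
constexpr float Halton(unsigned index, unsigned base) {
    assert(base >= 2);
    float result = 0.0f;
    float fraction = 1.0f / static_cast<float>(base);
    while (index > 0) {
        result += static_cast<float>(index % base) * fraction;
        index /= base;
        fraction /= static_cast<float>(base);
    }
    return result;
}

// Sub-pixel jitter in [-0.5, 0.5) per axis, cycling a 16-sample pattern.
constexpr std::array<float, 2> TaaJitter(unsigned frameIndex) {
    const unsigned sample = (frameIndex % 16u) + 1u;  // skip index 0, which maps to (0, 0)
    return { Halton(sample, 2) - 0.5f, Halton(sample, 3) - 0.5f };
}
```

The offset is typically scaled by the pixel size in clip space and added to the projection matrix, with the same value exposed to the resolve pass so history can be unjittered.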

Post Paths

  • Selectable: CopyOnly · GTAO · CMAA2 · TAA · GTAO+CMAA2 · GTAO+TAA
  • Color grading LUT slot; per-profile path selection

Roadmap

  • Bloom, auto-exposure histogram, motion blur, DOF, film grain — designed, not yet implemented
🔥

Upscaling & Frame Generation

Vendor super-resolution and frame generation from NVIDIA and AMD, wired across the D3D12 and Vulkan backends.
DLSS · FSR3 · RT Reconstruction · Live
  • DLSS Super Resolution (SR) with quality presets and DLAA mode
  • DLSS Ray Reconstruction (RR) replacing per-feature denoisers when selected
  • DLSS Frame Generation (FG) with optical-flow-driven interpolation
  • FSR3 Frame Generation across D3D12 and Vulkan
  • XeSS surface area in place; production wiring next

Integration Detail

  • Shared motion-vector and depth contracts across all upscalers
  • Auto/manual denoiser routing: NRD · DLSS-RR · custom temporal · backend-default
  • Camera view history feeds reprojection-aware features uniformly
  • Per-feature capability state: Available · PreviewOnly · ProductionValidated

Real-Time Ray Tracing

Hardware RT with software fallbacks — reflections, shadows, GI, and full path tracing.
DXR + KHR-RT · RTXDI · Live
  • RT Tier 1.0/1.1/1.2 detection; inline ray queries and pipeline mode both supported
  • RT Reflections with temporal stabilization (disocclusion/roughness/depth rejection)
  • RT Shadows with REBLUR/RELAX denoising and raster fallback
  • RTXDI ReSTIR direct illumination + temporal/spatial resampling, boiling filter
  • Reference path tracing and ReSTIR PT modes for ground-truth comparison

Acceleration Structures

  • Segmented BLAS/TLAS with flattened global index buffer
  • Primitive-base stored in TLAS instance ID to side-step the 24-bit limit
  • Refit-on-update for static/near-static geometry; rebuild only on topology change

Denoising Routes

  • NRD wrapper (REBLUR / RELAX) for shadows, reflections and GI signals
  • Custom temporal denoiser as feature-flagged fallback
  • DLSS Ray Reconstruction takes the denoise path when selected

Terrain

Streaming large-scale heightmap terrain with clipmap LOD and vegetation.
GPU LOD · Design
  • 256m tiles with 16-bit heightmap (about 4.6mm precision over 300m range)
  • Clipmap LOD: 8 rings on P3 (32km draw), 4 rings on P0 (2km draw)
  • Splatmap multi-layer blending (up to 8 layers on P2/P3)
  • Visibility buffer integration with vertex pulling from heightmap texture

Technical Detail

  • GPU-driven LOD selection with per-patch screen-space error evaluation
  • Tri-planar projection on steep surfaces to prevent stretching
  • Index buffer stitching for seamless LOD transitions (P2/P3)
  • Terrain tiles as streaming assets with predictive prefetch
  • Virtual texture participation with page request generation
🌊

Water System

Wave simulation with buoyancy physics and interaction effects.
GPU Compute · Design
  • Gerstner waves (SIMD-evaluated) + FFT ocean spectrum on P3
  • GPU displacement maps (1024x1024 on P3) via compute shader
  • Reflection scaling: env cubemap → SSR → planar → RT
  • Buoyancy physics with per-body sample-point force/torque

🌊 Water Detail

  • Screen-space refraction with Beer-Lambert depth absorption
  • Jacobian-driven foam at wave crests, depth-threshold shoreline foam
  • Flow maps for surface velocity, normal animation, foam accumulation
  • Subsurface scattering glow on crests proportional to sun angle
  • Ocean, lake, river body types with distinct wave behavior

Environment & Sky

Physically-based atmosphere rendering with volumetric clouds and weather.
GPU · Design
  • Atmospheric scattering (Rayleigh + Mie) with precomputed LUTs
  • Volumetric clouds with ray-marched density fields
  • Time-of-day system driving sun position, sky color, fog parameters
  • Weather system with rain, snow, fog transitions

Atmosphere

  • Height fog with density falloff and directional inscattering
  • Volumetric fog with temporal reprojection for stable results
  • Dynamic sky dome with star field and moon phases

Entity Component System

Archetype-based storage with cache-friendly SoA layout and deterministic scheduling.
SIMD · Core v1 · Partial
  • 16KB archetype chunks (L1 cache fit, 64-byte alignment)
  • Lazy query caching — O(1) iteration between structural commits
  • Deferred structural changes batched at Phase 2
  • Component access declarations drive automatic job parallelism

Advanced Features

  • POD-only components — value types, no pointers across boundaries
  • Tag components: zero storage, affect archetype identity
  • Singleton components for global state (time, physics config)
  • Change detection: per-chunk dirty tracking with O(1) skip
  • Entity budgets: P3=500K, P2=250K, P1=50K, P0=10K

Job System

Lock-free work scheduler with DAG dependencies, priority levels, and deterministic mode.
Core v1 · Live
  • Atomic bounded queue per worker with hardware-concurrency thread pool
  • Five priority levels (Critical · High · Normal · Low · Idle)
  • Atomic dependency counters; jobs declare prerequisites and signal completion
  • Deterministic mode: single-threaded reproducible execution via KAPI_DETERMINISTIC_JOBS
  • Tracy profiler integration via shared trace hooks
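The atomic dependency counters described above can be sketched as follows. This is a simplified model under stated assumptions: `Job`, `JobGraph`, and their methods are hypothetical names, and the drain loop runs on a single thread to keep the example short, where a real scheduler would hand ready jobs to worker queues.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

struct Job {
    std::function<void()> work;
    std::atomic<int> pending{0};          // prerequisites that have not finished yet
    std::vector<std::size_t> dependents;  // jobs to signal when this one completes
};

class JobGraph {
public:
    std::size_t Add(std::function<void()> work) {
        auto job = std::make_unique<Job>();
        job->work = std::move(work);
        jobs_.push_back(std::move(job));
        return jobs_.size() - 1;
    }

    // Declares that `job` may only start after `prerequisite` has finished.
    void DependsOn(std::size_t job, std::size_t prerequisite) {
        assert(job < jobs_.size() && prerequisite < jobs_.size() && job != prerequisite);
        jobs_[job]->pending.fetch_add(1, std::memory_order_relaxed);
        jobs_[prerequisite]->dependents.push_back(job);
    }

    // Runs every job once, in dependency order. Counters are consumed, so call it once.
    void Run() {
        std::vector<std::size_t> ready;
        for (std::size_t i = 0; i < jobs_.size(); ++i)
            if (jobs_[i]->pending.load(std::memory_order_acquire) == 0) ready.push_back(i);
        while (!ready.empty()) {
            const std::size_t current = ready.back();
            ready.pop_back();
            jobs_[current]->work();
            for (std::size_t next : jobs_[current]->dependents)
                // Whoever decrements the counter to zero releases the dependent job.
                if (jobs_[next]->pending.fetch_sub(1, std::memory_order_acq_rel) == 1)
                    ready.push_back(next);
        }
    }

private:
    std::vector<std::unique_ptr<Job>> jobs_;
};
```

Because release is decided by the decrement that reaches zero, the same counter logic works unchanged when several workers finish prerequisites concurrently.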

Roadmap

  • Console fiber backends (PS5 SCE, Switch, Xbox) — designed, not yet implemented
  • Cache-affinity pinning, configurable stack tiers — designed

Memory System

Engine-owned allocators in hot paths with frame arenas and fixed-capacity pools.
Core v1 · Partial
  • Frame linear allocator: bump-pointer per worker, bulk reset at frame end
  • Lock-free fixed-capacity packet pools with intrusive free lists
  • 16-byte alignment for GPU upload boundaries; per-thread arenas to avoid contention
  • std types used pragmatically outside hot paths
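A bump-pointer frame allocator of the kind listed above can be sketched in a few lines. `FrameArena` is a hypothetical name, the heap-backed storage is a stand-in for whatever backing the engine uses, and the 16-byte default alignment is taken from the list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

class FrameArena {
public:
    explicit FrameArena(std::size_t capacity)
        : storage_(std::make_unique<std::byte[]>(capacity)), capacity_(capacity) {}

    // Returns nullptr when the frame budget is exhausted. Alignment must be a power of two.
    void* Allocate(std::size_t size, std::size_t alignment = 16) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const auto base = reinterpret_cast<std::uintptr_t>(storage_.get());
        const std::uintptr_t aligned =
            (base + offset_ + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
        const std::size_t start = static_cast<std::size_t>(aligned - base);
        if (size > capacity_ || start > capacity_ - size) return nullptr;
        offset_ = start + size;  // bump past this allocation
        return storage_.get() + start;
    }

    void Reset() { offset_ = 0; }  // bulk reset at frame end; nothing is freed individually
    std::size_t Used() const { return offset_; }

private:
    std::unique_ptr<std::byte[]> storage_;
    std::size_t capacity_;
    std::size_t offset_ = 0;
};
```

Giving each worker its own arena, as the list describes, removes the need for any synchronization inside `Allocate`.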

GPU & Platform

  • D3D12 / Vulkan native memory paths; persistent-mapped upload buffers
  • Discrete vs unified memory topology detection for staging strategy

Roadmap

  • Layered budget-broker design with TLSF for real-time audio — specified, partial
  • Cook-time per-level budget validation — designed
  • Debug poison patterns, guard pages — designed

Game Loop & Frame Orchestration

Pipelined frame model with fixed timestep, accumulator interpolation, and CPU/GPU overlap.
Core v1 · Partial
  • Six phases: Input → CPU Sim → Structural Changes → Events → Render Submit → Present
  • Fixed timestep (1/60s) with accumulator model decoupling sim from render
  • Spiral-of-death prevention: clamp to 5 steps max
  • Max-flight-frames: CPU bounded relative to GPU (2-3 frames ahead)
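The fixed-timestep accumulator with the 5-step clamp can be sketched like this. `FixedStepClock` and its members are hypothetical names, and discarding the backlog after the clamp is one possible policy, assumed here for illustration.

```cpp
#include <cassert>

struct FixedStepClock {
    static constexpr double kStep = 1.0 / 60.0;    // fixed simulation step from the list above
    static constexpr int kMaxStepsPerFrame = 5;    // spiral-of-death clamp from the list above

    double accumulator = 0.0;

    // Adds the measured frame time and returns how many fixed steps to simulate.
    int Advance(double frameSeconds) {
        assert(frameSeconds >= 0.0);
        accumulator += frameSeconds;
        int steps = 0;
        while (accumulator >= kStep && steps < kMaxStepsPerFrame) {
            accumulator -= kStep;
            ++steps;
        }
        // Assumed policy: drop any backlog left after the clamp so one long
        // frame cannot force ever more simulation work on later frames.
        if (accumulator >= kStep) accumulator = 0.0;
        return steps;
    }

    // Blend factor in [0, 1) for interpolating render state between sim states.
    double Alpha() const { return accumulator / kStep; }
};
```

The render side would interpolate between the previous and current simulation states using `Alpha()`, which is what decouples simulation rate from display rate.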

Timing

  • VSync modes: Off (tearing), On (locked), Adaptive (dynamic)
  • Independent frame rate cap with sleep + spin-wait
  • Time dilation for pause/slow-mo (gameplay time only)
  • Deterministic simulation enables replay and rollback

Scene Graph & Spatial

ECS-based transforms with BVH acceleration and GPU-driven culling.
GPU Cull · Core v1 · Partial
  • Parent-child hierarchy via ECS components with lifecycle hooks
  • Binary BVH with incremental update and SAH build
  • GPU-resident BVH on P2/P3 for frustum + Hi-Z occlusion culling
  • Streaming chunks: axis-aligned regions with predictive load/unload

Culling Pipeline

  • Change detection: O(1) chunk skip for stable transforms
  • Spatial hash/grid for proximity queries (AI, audio, triggers)
  • LOD selection via screen-space metric with distance bias
  • Overlap regions for cross-chunk entity visibility

Hardware Abstraction Layer

Platform abstraction for window management, input, and OS services.
Core v1 · Partial
  • Windows platform layer with HWND, raw mouse input, exclusive fullscreen
  • macOS platform layer with Metal 4 surface bring-up
  • Window management, fullscreen, DPI, HDR, monitor enumeration
  • Structured logging with crash ring buffer context

Roadmap

  • Console SDKs (PS5, Xbox GDK, Switch 2) — designed
  • Heterogeneous core topology awareness (P/E cores) — designed
  • Console fiber/coroutine support — designed
  • Large page support, console async I/O — designed

Physics & Collision

Dual-authority: CPU primary (low-latency gameplay) + GPU secondary (high-count debris, ragdolls).
GPU Physics · SIMD · Design
  • CPU primary physics: rigid body, raycasts, overlap — zero-frame latency
  • GPU secondary: position-based dynamics for mass debris simulation
  • BVH broadphase (primary) + spatial hash (GPU secondary)
  • SIMD constraint batching: AVX2 = 8/batch, NEON = 4/batch

Constraints & Bodies

  • 6 joint types: Fixed, Hinge, Ball-Socket, Slider, Distance, ConeTwist
  • Breakable constraints with impulse accumulation thresholds
  • Ragdolls: P3=32 active, P2=16, P1=4 with priority eviction
  • GPU XPBD solver for cloth and rope on P2/P3; CPU fallback P0/P1

Character Controller

  • Engine-owned CharacterMovement component
  • Capsule sweep, step-up, ground detection, slope limits
  • Movement modes: walking, falling, swimming, climbing

Animation & Skinning

Skeleton runtime + GPU skinning dispatch; blend trees and state machines on the roadmap.
GPU Skinning · Partial
  • State machines with condition-driven transitions between layers (roadmap)
  • Hierarchical blend trees: sample, blend, additive, parameter-driven (roadmap)
  • GPU skinning via compute shader in Phase 3
  • SIMD batch evaluation of sampled bone curves

Advanced

  • Attachment sockets for weapons, accessories, FX
  • Skeleton LOD: distant characters skip leaf bones
  • Keyframe compression with time-windowed event callbacks
  • IK and procedural post-processing applied post-blend
💥

Destruction

Pre-authored fracture with constraint-driven breaking and budget-managed debris lifecycle.
GPU Debris · Design
  • Offline-authored fracture (DCC pre-split chunks, no runtime Voronoi)
  • Constraint graph with impulse accumulation → breakable thresholds
  • Union-find connectivity for near-O(1) amortized debris group discovery
  • GPU debris simulation with spatial hash broadphase on P2/P3
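A minimal union-find sketch for grouping fracture chunks by their surviving constraints follows. The type and function names are hypothetical, and rebuilding the groups from the intact constraints is an assumption made for the example, since plain union-find does not support removing links.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

class ChunkConnectivity {
public:
    explicit ChunkConnectivity(std::size_t chunkCount) : parent_(chunkCount), size_(chunkCount, 1) {
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});  // every chunk starts alone
    }

    std::size_t Find(std::size_t chunk) {
        assert(chunk < parent_.size());
        while (parent_[chunk] != chunk) {
            parent_[chunk] = parent_[parent_[chunk]];  // path halving keeps trees shallow
            chunk = parent_[chunk];
        }
        return chunk;
    }

    void Connect(std::size_t a, std::size_t b) {
        a = Find(a);
        b = Find(b);
        if (a == b) return;
        if (size_[a] < size_[b]) std::swap(a, b);  // union by size
        parent_[b] = a;
        size_[a] += size_[b];
    }

    bool SameGroup(std::size_t a, std::size_t b) { return Find(a) == Find(b); }

private:
    std::vector<std::size_t> parent_;
    std::vector<std::size_t> size_;
};

struct Constraint {
    std::size_t a;
    std::size_t b;
    bool broken;
};

// Rebuilds debris groups from the constraints that are still intact.
inline ChunkConnectivity BuildGroups(std::size_t chunkCount, const std::vector<Constraint>& constraints) {
    ChunkConnectivity groups(chunkCount);
    for (const Constraint& c : constraints)
        if (!c.broken) groups.Connect(c.a, c.b);
    return groups;
}
```

Chunks that no longer share a group with an anchored chunk would be the ones released as debris.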

Lifecycle

  • Spawned → Active → Settling → Expired → Destroyed
  • Per-chunk LOD meshes (full detail → simplified → particle burst)
  • Persistence: bitmask of broken constraints saved per destructible
  • Budget: force-expire oldest/distant debris when ceiling exceeded

Particle Systems

GPU-first simulation with indirect draw, prefix-sum compaction, and deterministic warm-up.
GPU Compute · Design
  • Full GPU pipeline: emit → update → compact → sort → render
  • SoA double-buffered layout, no CPU readback for dispatch
  • Sub-emitters: event-driven spawning (death, collision, age)
  • Lit particles with probe sampling and shadow cascade receiving

Technical

  • Deterministic per-particle RNG: hash(emitter_id, frame, spawn_index)
  • Warm-up: burst simulation at fixed timestep before first visible frame
  • Emission proxy: summary weighted average as transient point light
  • Screen-space depth collision or physics broadphase (budget-limited)
  • GPU memory: 256MB (P3), 128MB (P2)
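The hash(emitter_id, frame, spawn_index) idea can be illustrated with a stateless integer hash. The mixer below is the well-known MurmurHash3 32-bit finalizer, chosen here as a stand-in; the actual hash the engine specifies is not stated, and the function names and the added constant are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// MurmurHash3 32-bit finalizer: an invertible mix with good avalanche behavior.
constexpr std::uint32_t Mix(std::uint32_t h) {
    h ^= h >> 16;
    h *= 0x85EBCA6Bu;
    h ^= h >> 13;
    h *= 0xC2B2AE35u;
    h ^= h >> 16;
    return h;
}

// Same inputs always give the same seed, on CPU or GPU, with no stored RNG state.
constexpr std::uint32_t ParticleSeed(std::uint32_t emitterId, std::uint32_t frame, std::uint32_t spawnIndex) {
    std::uint32_t h = Mix(emitterId + 0x9E3779B9u);  // offset so an all-zero input does not hash to zero
    h = Mix(h ^ frame);
    h = Mix(h ^ spawnIndex);
    return h;
}

// Maps the top 24 bits to a float in [0, 1).
constexpr float ToUnitFloat(std::uint32_t bits) {
    assert((bits >> 8) < (1u << 24));
    return static_cast<float>(bits >> 8) * (1.0f / 16777216.0f);
}
```

Because the seed depends only on its three inputs, a replay or a warm-up burst reproduces the same particles without saving any generator state.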
💄

Hair Simulation & Rendering

Strand-based hair design with PBD GPU simulation and Marschner BSDF.
GPU Compute · Design
  • PBD-based GPU simulation for 30K-50K rendered strands
  • Compute software rasterizer with visibility buffers
  • Marschner BSDF with dual scattering and deep opacity shadows
  • Proven: 8 characters on-screen at 60 FPS on current-gen consoles

Fallback Chain

  • Shell/fin rendering fallback on P1/P0
  • LOD: strand count reduction with distance
  • Wind and collision forces applied in GPU compute pass
🍂

Cloth & Foliage Simulation

GPU XPBD cloth solver with dual-authority vertex pinning and foliage interaction.
GPU XPBD · Design
  • Low-res proxy mesh simulation with pin_weight blending to render mesh
  • Self-collision reserved for hero garments on higher profiles
  • Foliage displacement from player/physics interaction
  • GPU solver on P2/P3, CPU fallback on P1, static on P0
🎧

Spatial Audio

GPU ray-casting propagation with zone/portal fallback and middleware-agnostic backend.
GPU Rays · Design
  • GPU audio rays: 256 rays/source, 32 sources on P3
  • AcousticsSummary readback: occlusion, reflection delay, reverb
  • CPU zone/portal fallback on P0/P1 — same gameplay contract
  • Voice management: 128 rendered voices on P3 with priority stealing

Audio Pipeline

  • Material acoustic properties cooked alongside visual properties
  • Audio LOD: distance-based update frequency reduction
  • Adaptive music system with crossfade transitions
  • Wwise/FMOD integration via standard interface; built-in mixer fallback
🎬

Cinematics & Sequencer

Data-driven timeline with camera, animation, audio, and event tracks.
Design
  • Editor-authored timeline-based cutscenes with smooth blending
  • Track types: property, animation, camera, event, audio, video, visibility, script
  • Skippable with guaranteed deterministic playback
  • Integrated camera paths with keyframed DOF, FOV, roll

Video Playback

Hardware-decoded video with streaming and in-world render-to-texture.
Design
  • Platform-native hardware decoders (HEVC/H.264)
  • Fullscreen overlay and in-world render-to-texture modes
  • Audio/video sync with 20ms lip-sync tolerance
  • Subtitle rendering with localization integration

Editor & Tools

Real-time in-engine editor built on the same runtime as the game.
Core v1 · Partial
  • Editor-as-game architecture: same executable, same render pipeline
  • Scene viewport with transform gizmos and play-in-editor
  • Property inspector with auto-generated type editors and batch editing
  • Undo/redo with 64MB memory budget

Editor Features

  • Dear ImGui-based docking UI with profiler overlay
  • Asset browser with thumbnail generation and re-cook triggers
  • Debug rendering: immediate-mode API, per-category, <10M vertex budget
  • [EditorOnly] components stripped from cooked builds

Asset Pipeline

Deterministic offline cooking with content-addressed caching and parallel builds.
Core v1 · Live
  • Deterministic per-platform cooking for bit-identical outputs
  • Content-addressed DDC with incremental builds via topological DAG
  • Budget validation at cook time — rejects over-budget assets
  • Standalone executable, runs on CI without GPU or display

Pipeline Detail

  • LOD generation, compression, shader compilation offline
  • Per-platform: texture compression, shader targets, audio codecs, endianness
  • Watch mode for hot-reload integration with editor
  • Structured CLI error reporting

Live Reload

Layered hot-reload stack — module loading live; deeper layers on the roadmap.
Dev-Only · Partial
  • Module / DLL reload via DynamicModuleLibrary — live
  • Shader hot-reload — designed
  • Engine hot-restart with state preservation — designed
  • Live++ C++ function patching — design only, not yet integrated

Contract

  • Zero reload infrastructure in shipping builds
  • Three module modes: built-in · dynamic (dev) · static (shipping)

Build Toolchain

CMake build with VS 2026 / C++23, dual-config presets, and CTest matrix.
Core v1 · Live
  • CMake with explicit module targets and dependency validation
  • 4 build configs: Debug, Development, Test, Shipping
  • Per-profile shader/asset compilation in CI pipeline
  • Single-source-of-truth toolchain manifest for all platforms

Scripting API

AngelScript with ECS binding, coroutines, and cycle-free memory contract.
Design
  • AngelScript VM with reference counting (no GC at runtime)
  • Copy-in/copy-out ECS binding — no dangling references
  • Coroutines: yield() for multi-frame behaviors
  • CI enforces zero GC candidate count at build time

Design

  • Scripts as orchestration; bulk ops exposed as native C++ functions
  • Hot-reload via live reload stack
  • Per-profile VM memory: P3=16MB, P0=4MB
  • Single-lane execution; designed for future multi-context sharding
🚶

AI & Navigation

Runtime navmesh, hierarchical pathfinding, hybrid BT+utility AI, and crowd avoidance.
GPU Crowd · Design
  • Runtime hierarchical navigation graphs with dynamic link updates
  • GPU-assisted crowd simulation with obstacle avoidance
  • Hybrid behavior tree + utility AI framework
  • Seamless LOD transitions for distant and close-range agents
🎮

Input System

Action/axis binding with device abstraction and context-sensitive binding sets.
Core v1 · Partial
  • Devices: keyboard, mouse, gamepad, touch (10 points), gyro/accel
  • Actions (boolean) + Axes (float) with many-to-many binding
  • Context-sensitive binding sets (Gameplay, Vehicle, Menu, Dialogue)
  • Haptics: rumble, adaptive triggers (DualSense), trigger vibration, light bar

UI System

Dual-mode: retained-mode game UI and immediate-mode debug overlay with SDF fonts.
GPU UI · Design
  • Retained-mode widgets with anchor + flex layout and style cascade
  • SDF font rendering, UTF-8, complex script shaping, CJK
  • Data binding decouples UI from gameplay
  • Aggressive batching: <20 draw calls per UI screen

GPU UI (P2/P3)

  • UIPrepare stage for geometry expansion and composition effects
  • Resolution-independent with DPI awareness
  • Console safe area compliance
  • Debug overlay (profiler, memory, console) stripped from shipping
🌐

Networking

Platform-agnostic transport with replication, prediction/rollback, and console backends.
Design
  • Transport: GameNetworkingSockets (PC), GDK (Xbox), PSN (PS5), NEX (Switch)
  • Client-server, listen server, P2P, dedicated server topologies
  • Reliable-Ordered, Reliable-Unordered, Unreliable channels
  • P2P host migration via independent peer consensus
🎥

Camera System

Priority stacking, trauma shake, cinematic rails, and photo mode.
Design
  • Multiple cameras with priority stacking and blend transitions
  • Controllers: Free-Look, Third-Person (spring-arm), First-Person, Cinematic
  • Trauma-based screen shake with Perlin noise and asymmetric profiles
  • Photo mode: freeze sim, detach camera, super-sample up to 4x
💾

Save System

Component-based serialization with forward-compatible versioning and async I/O.
Design
  • Schema migration chains for forward compatibility
  • LZ4 compression with integrity checksums
  • Async pipeline preserves frame pacing during saves
  • Platform-specific storage backends
🌍

Localization

Multi-language support with HarfBuzz shaping, RTL, and CJK glyph handling.
Design
  • HarfBuzz text shaping for Arabic, Thai, CJK scripts
  • CLDR pluralization rules
  • RTL paragraph layout with bidirectional text support
  • Per-language font atlases with lazy-loaded glyph pages
📈

Telemetry & Profiling

Always-on profiling hooks with Tracy, PIX, and RenderDoc integration.
Core v1 · Live
  • Per-job, per-system, per-phase CPU/GPU timing
  • Ring buffer architecture for zero-allocation markers
  • Budget broker feed for real-time performance adaptation
  • ETW, Instruments, Superluminal, Tracy integration

Plugin & Module System

Module loading with three modes; C ABI mod boundary on the roadmap.
Core v1 · Partial
  • Three modes: built-in, dynamic (dev), static (shipping)
  • Declarative dependencies and registration contracts
  • Mod support via sandboxed AngelScript execution
  • C ABI boundary for binary stability
📥

I/O & Streaming

Async-first with DirectStorage, GPU-direct decompression, and predictive prefetch.
GPU-Direct · Design
  • DirectStorage (PC/Xbox) / PS5 I/O complex for GPU-direct reads
  • VFS with pak overlay priority: base → patch → DLC → loc → mods
  • Predictive prefetch based on camera velocity and frustum
  • Residency management with priority-based eviction

Event System

Lock-free MPSC ring queue primitive; full event-system layering on the roadmap.
Core v1 · Partial
  • Shared ring buffers eliminate per-subscriber copying
  • Frame-buffered visibility respects sync phases
  • UI event bubbling with parent-chain propagation
  • GPU event summary readback for simulation feedback

Resource System

Stable asset handles, dependency tracking, hot-reload, and thread-safe access.
Core v1 · Partial
  • Stable handles with dependency invalidation + hot-reload propagation
  • Pin/unpin semantics for residency control
  • Thread-safe read access contract
  • Runtime face of the content system between pipeline and gameplay

Configuration

Layered TOML config with runtime CVars and per-profile overrides.
Core v1 · Live
  • Four-layer hierarchy: engine → project → user → CLI overrides
  • Dynamic CVars with change callbacks for runtime quality switching
  • Per-profile rendering and memory budget configuration
  • TOML-based, version-controlled, human-readable
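The four-layer override order can be sketched as a lookup that walks layers from highest to lowest priority. `LayeredConfig`, `ConfigLayer`, and the string-only values are simplifications assumed for the example; TOML parsing and CVar callbacks are left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Lowest to highest priority, matching engine → project → user → CLI.
enum class ConfigLayer : std::size_t { Engine = 0, Project, User, CommandLine, Count };

class LayeredConfig {
public:
    void Set(ConfigLayer layer, std::string key, std::string value) {
        assert(layer != ConfigLayer::Count);
        layers_[static_cast<std::size_t>(layer)][std::move(key)] = std::move(value);
    }

    // Returns the value from the highest-priority layer that defines the key.
    std::optional<std::string> Get(const std::string& key) const {
        for (std::size_t i = layers_.size(); i-- > 0;) {
            const auto found = layers_[i].find(key);
            if (found != layers_[i].end()) return found->second;
        }
        return std::nullopt;
    }

private:
    std::array<std::map<std::string, std::string>, static_cast<std::size_t>(ConfigLayer::Count)> layers_;
};
```

Keeping the layers separate, instead of merging them at load time, means a user or CLI override can be removed later and the lower-layer value reappears.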

Performance Budgeting

Explicit frame budgets with runtime broker-driven degradation ladders.
Design
  • Per-subsystem CPU, GPU, memory, I/O, thermal envelopes
  • Degradation ladders with hysteresis to prevent oscillation
  • Cook-time validation against profile envelopes
  • Cross-system LOD policy with shared importance metric
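A degradation ladder with hysteresis can be sketched as a quality level that steps down above one threshold and only steps back up below a lower one. The class name and the 105% / 85% thresholds are assumptions for illustration; the engine's real broker and its tuning values are not specified here.

```cpp
#include <cassert>

// Level 0 is full quality; higher levels are progressively cheaper.
class DegradationLadder {
public:
    DegradationLadder(int maxLevel, double budgetMs, double degradeAbove = 1.05, double restoreBelow = 0.85)
        : maxLevel_(maxLevel), budgetMs_(budgetMs), degradeAbove_(degradeAbove), restoreBelow_(restoreBelow) {
        // The gap between the two thresholds is the dead band that prevents oscillation.
        assert(maxLevel >= 0 && budgetMs > 0.0 && restoreBelow < degradeAbove);
    }

    // Feeds one measured cost and returns the level to use next frame.
    int Update(double costMs) {
        if (costMs > budgetMs_ * degradeAbove_ && level_ < maxLevel_) {
            ++level_;  // over budget: step down one rung
        } else if (costMs < budgetMs_ * restoreBelow_ && level_ > 0) {
            --level_;  // comfortably under budget: restore one rung
        }
        return level_;
    }

    int Level() const { return level_; }

private:
    int maxLevel_;
    double budgetMs_;
    double degradeAbove_;
    double restoreBelow_;
    int level_ = 0;
};
```

A cost that hovers near the budget lands inside the dead band and leaves the level unchanged, which is what keeps quality from flickering between rungs.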

Simulation Ownership

Ownership contracts preventing GPU readback stalls in gameplay-critical paths.
Design
  • CPU-owned, GPU-owned, and mirrored data product declarations
  • Six sync phases with explicit visibility rules
  • Summary components expose GPU results without bulk readback
  • No subsystem invents its own ownership vocabulary

Target Platforms

Shipping today on Windows (DirectX 12 & Vulkan 1.4) and macOS (Metal 4). Designed from day one for the full console matrix.

Shipping
💻
Windows PC
DirectX 12 · Vulkan 1.4
🌍
macOS
Metal 4 · MetalFX · RT
Roadmap
Switch 2 (NVN) · PS5 (AGC) · Xbox Series (GDK / D3D12X)

Code Philosophy

C++23 Restricted Subset

No exceptions, no RTTI. Engine-owned types with explicit allocators in hot paths; std used pragmatically where it pulls its weight. constexpr/consteval over template metaprogramming.

No-GC Runtime

No tracing garbage collection. Exclusively manual lifetime management or deterministic reference counting, with cycle-free invariants validated by telemetry.

Max Depth 1 Inheritance

Interface + implementation only. No virtual dispatch on hot paths. No dynamic_cast. Composition over inheritance throughout.

Template Discipline

No SFINAE, CRTP frameworks, expression templates, or policy-template design. Narrow typed templates for containers and math only.

Explicit Allocators

No global new/delete. Arena, pool, frame, stack, and TLSF allocators. Every allocation has a known lifetime and budget owner.

Multi-Language Tooling

Zig and Rust permitted for offline tools (asset processors, build utilities). AngelScript for gameplay scripting with GC-dormant contract.

A Feel-Good Engine

Kapi is a sane-defaults engine that aims to do a lot for you — transparently and accessibly. Smart defaults get you running fast; every decision is inspectable, overridable, and documented. No hidden magic, no opaque pipelines — just an engine that respects your time and gets out of the way.

Future upgrades will expand the creative toolkit with visual scripting, material and game-AI node graphs, and visual storytelling tools — making Kapi a complete creative platform for teams of every size.

Visual Scripting
Material Nodes
Game-AI Nodes
Visual Storytelling