C++23 · GPU-First · In Development · Late 2027

KAPI · Next-Generation Game Engine

Born from years of industry experience. A modern runtime in active development on Windows, macOS, Xbox Series X|S, and Linux, designed from day one for the full console matrix. GPU-first where it matters, data-oriented in spirit, and explicit about ownership and fallback behavior. Targeting a late 2027 release.

9 Live Systems
13 Partial · In Progress
4 RHI Backends Live

Build The Way You Couldn't Before

Most engine pages are a wall of features. Here's what you actually do differently with Kapi: the workflows it enables, the iteration loops it shortens, and the things it lets you ship.

01

Compare multiple render backends side by side.

Mount additional viewports on different RHIs and inspect them together: final frame, render-graph passes, shader output, motion vectors, depth, debug surfaces. Catch cross-vendor regressions before they ship.

Editor
02

Switch GI techniques live. No bake.

TLRC, DDGI, RTXDI ReSTIR, full path tracing: all swappable on the same scene at runtime. Look-dev moves at thinking speed.

Look-Dev
03

Hot-reload your gameplay code while the world is running.

Recompile a game module, swap in the DLL, world state preserved. Shipping builds collapse to a static link with zero overhead.

Iteration
04

Iterate materials without losing your flow.

glTF and OpenUSD on import, Slang shaders compiled once for every backend. Whatever your DCC tool exports is what the engine consumes; cold-start is bounded by what changed since the last run. A PBR surface model built on glTF extensions (clear coat, sheen, anisotropy, SSS, transmission) is wired in from day one.

Artist Workflow
05

Imports run at hardware speed.

Texture compression, mesh optimisation, meshlet generation, and shader variants dispatch across every CPU core, with multi-GPU acceleration where it pays. Content-addressed caching skips what didn't change; iteration loops in seconds, not coffee breaks.

Fast Imports
06

Run the engine without a GUI. Run the entire pipeline too.

Every runtime and pipeline operation has a CLI path: cook, render, capture, replay, test. Software mode anywhere; multi-GPU acceleration when present.

Headless / CLI
07

Scale with the hardware. Every core, every GPU.

Parallel-first end to end. Job system scales to extreme core counts (XCR), NUMA-group aware, multi-GPU pipeline. Add hardware, the engine uses it.

Parallel / Scalable
08

Cook content on a CI box that doesn't have a GPU.

Standalone deterministic asset cooker with content-addressed cache and incremental DAG builds. Build farms work without graphics drivers; dev machines pull cached blobs.

Pipeline / CI
09

Profile every render-graph pass with one keystroke.

Tracy hooks ship in the engine: per-pass GPU markers, per-job CPU markers, per-system phase timing. PIX and RenderDoc captures work too, with real names.

Profiling
10

Toggle path-traced reference any time, on any GPU vendor.

NVIDIA, AMD, Intel, Apple: all four vendors run the bindless heap, the visibility buffer, and the path tracer. One code path, one feature contract.

Cross-Vendor
11

Author once. Run on every tier without content drift.

Same authoring data drives all four profiles (P0–P3). Lower tier means lighter lighting and fewer rays, never different gameplay or different content.

Scalability
12

Read the engine. Understand a frame.

Small, layered, explicit. No proprietary scripting layer mediating every system, no reflection compiler regenerating headers. When a frame takes 18 ms instead of 16, you can find out why.

Transparency

Different Choices, Different Days

A subset of the friction that shows up in big-engine workflows, and how Kapi is designed differently. Not a flame; just the design choices that change your day.

Shader compilation hitches mid-frame. The first time you walk into a new region, the GPU stalls while the driver compiles permutations.
All permutations cooked offline. The runtime touches only variants the cook step blessed. Nothing compiles at play.
Lighting bake required to see the final look. Move a light, click "Build Lighting," go for coffee.
Real-time GI is the default. TLRC, DDGI, ReSTIR PT all selectable at runtime. Bake is optional and additive.
Five-minute editor cold-start, proprietary import pipeline. Custom asset formats, conversion steps that lose precision, full re-import every time the DCC tool updates.
Open formats and incremental cooking. glTF, OpenUSD, and Slang straight through. The DDC cooks saved assets in the background while you work. Cold-start is bounded by what actually changed.
Re-importing a texture pack takes a coffee break. Single-thread compression, sequential mesh processing, full-asset re-cook on every change. Iteration loops measured in minutes.
Imports parallelise across every core and GPU. Texture compression, mesh optimisation, meshlet generation, and shader variants all dispatch in parallel. Content-addressed caching skips what didn't change.
Engine restart for every code change. Some engines offer partial live patching, but it tends to be fragile; the rest of the time you're rebooting the editor and reloading the level.
Module DLL hot-reload preserves world state. Three module modes (built-in, dynamic for dev, static for shipping) built into the loader.
Reflection compiler regenerating headers. Engine-specific binding macros and a codegen step that runs before every build.
C++23 directly. No codegen step. Standard tooling works exactly as you'd expect; no engine-specific build phase between you and the binary.
Hard to compare two render backends. Two editor builds, two PIX captures, three monitors, a spreadsheet.
Side-by-side viewport comparison in one editor window. Pick any two backends, scrub the camera, compare pixels.
Profile-tier mismatch causes content drift. "It looks fine on the test rig but not on Steam Deck", usually because the lower tier silently uses different content paths.
Same authoring drives all four profiles. P0–P3 differ by execution and budget, never by content. No semantic drift between tiers.
Editor required to do anything. Cooks, validations, smoke tests, even quick render captures want a windowing system and a graphics driver.
Every operation runs from the CLI. Editor is one shell over the same primitives. Cook, render, validate, replay from any terminal. Software mode anywhere; multiple GPUs when available.
Single-thread bottlenecks in cook and build. Past 8–16 cores, more hardware stops helping. The build agent waits.
Parallel-first across runtime and tooling. Extreme core counts (XCR), NUMA-group aware, multi-GPU pipeline. Throw bigger hardware at it; the engine uses it.
Engine-specific config dialect. A handful of similarly-named ini files plus per-platform overrides, every section a small mystery.
TOML throughout, layered. Engine → project → user → CLI overrides. Diff-friendly, version-controllable, plain text.
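
To make the Engine → project → user → CLI layering concrete, here is a minimal C++ sketch of the override rule only. It assumes each TOML file has already been parsed and flattened to `section.key` strings; `Layer`, `resolve_layers`, `lookup`, and the example keys are hypothetical names for illustration, not Kapi's actual config API.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <string>

// One parsed config layer, flattened to "section.key" -> value.
// (TOML parsing itself is out of scope for this sketch.)
using Layer = std::map<std::string, std::string>;

// Merge layers from lowest to highest priority: a later layer overrides
// any key that an earlier layer already set.
inline Layer resolve_layers(std::initializer_list<Layer> low_to_high) {
  Layer merged;
  for (const Layer& layer : low_to_high) {
    for (const auto& [key, value] : layer) {
      merged[key] = value;  // the later layer wins
    }
  }
  return merged;
}

// Look up a key that at least one layer is expected to define.
inline const std::string& lookup(const Layer& merged, const std::string& key) {
  const auto it = merged.find(key);
  assert(it != merged.end() && "key missing from every layer");
  return it->second;
}
```

Calling `resolve_layers({engine, project, user, cli})` yields one flat view in which a CLI flag beats a user setting, which beats the project default, which beats the engine default.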

What's New in Kapi

Recent renderer milestones landed in the engine codebase: full path tracing across every GPU vendor, the modern temporal stack, vendor upscalers, and multi-backend parity. The engine is targeting a late 2027 release.

New

TLRC Global Illumination

Temporal Light Radiance Cache. A modern lightweight GI option live on D3D12 & Vulkan, with screen-probe resolve, temporal reprojection, adaptive ray allocation, and a world-cache feedback loop.

Modern GI
New 🌍

Metal 4 on Apple Silicon

macOS backend on the Metal 4 API with full path tracing, MetalFX upscaling, residency sets, and TBDR-aware pass ordering.

Apple · Metal 4 · MetalFX
New 🔥

DLSS Super Resolution & Ray Reconstruction

NVIDIA NGX integration: DLSS Super Resolution and Ray Reconstruction wired into the render graph. Frame Generation capability is detected and reported; presentation routing is the next phase.

NVIDIA · SR + RR
New

FSR3 Frame Generation

AMD FidelityFX Super Resolution 3 frame interpolation wired across both D3D12 and Vulkan paths, sharing the engine's motion-vector and depth contracts.

AMD · FSR3-FG
New

4-Vendor Path Tracing

RTXDI ReSTIR direct illumination and full path tracing running across all four GPU vendors (NVIDIA, AMD, Intel, and Apple) via DXR, Vulkan KHR-RT, and Metal 4 ray tracing.

NVIDIA · AMD · Intel · Apple
New

DDGI Probe Volumes

Dynamic Diffuse Global Illumination using ray-traced irradiance probes laid out on a regular grid, classifier-driven probe relocation, and 8-direction visibility. Selectable alongside TLRC.

Probe-Based GI
New 🌈

HDR Scene-to-Present

Complete HDR ownership from lighting through post-processing. HDR10 output on D3D12 & Vulkan with correct EOTF/PQ encoding at present.

HDR10
New

NRD Denoising Wrapper

NVIDIA Real-time Denoiser integration covering REBLUR/RELAX for raytraced shadows, reflections, and GI, with a custom temporal denoiser as a feature-flagged fallback.

NRD · Custom
New

Bindless Heap (65k+)

Unified bindless slot tracking across D3D12 and Vulkan. Materials, textures, structured buffers, samplers and acceleration structures all routed through a single index space.

D3D12 + Vulkan
New

Visibility Buffer + Mesh Shaders

Production VisBuffer pipeline with 12-bit drawcall, 8-bit triangle, 12-bit meshlet packed into a 32-bit payload. 64-vertex/124-triangle meshlets driven by GPU culling.

GPU-Driven
New

Motion Vectors & TAA

Graph-owned motion vector texture with camera/static reprojection. In-house TAA with neighborhood clamping, plus camera view history available to any temporal consumer.

Temporal Foundation
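
For readers new to neighborhood clamping, the sketch below shows the core idea on the CPU: the reprojected history color is clamped to the per-channel min/max of the current frame's 3x3 neighborhood before blending, which limits ghosting. The `Rgb` type, function names, and the 0.1 blend weight are assumptions for illustration; the engine's actual pass is a GPU shader and may differ.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

struct Rgb { float r, g, b; };

// Clamp the reprojected history color to the min/max box of the current
// frame's 3x3 neighborhood (index 4 is the center pixel).
inline Rgb clamp_to_neighborhood(const std::array<Rgb, 9>& neighborhood, Rgb history) {
  Rgb lo = neighborhood[0];
  Rgb hi = neighborhood[0];
  for (const Rgb& c : neighborhood) {
    lo = {std::min(lo.r, c.r), std::min(lo.g, c.g), std::min(lo.b, c.b)};
    hi = {std::max(hi.r, c.r), std::max(hi.g, c.g), std::max(hi.b, c.b)};
  }
  return {std::clamp(history.r, lo.r, hi.r),
          std::clamp(history.g, lo.g, hi.g),
          std::clamp(history.b, lo.b, hi.b)};
}

// Blend the clamped history with the current center sample.
// current_weight = 0.1 is an assumed value, not an engine constant.
inline Rgb resolve_taa(const std::array<Rgb, 9>& neighborhood, Rgb history,
                       float current_weight = 0.1f) {
  assert(current_weight >= 0.0f && current_weight <= 1.0f);
  const Rgb cur = neighborhood[4];
  const Rgb h = clamp_to_neighborhood(neighborhood, history);
  return {h.r + (cur.r - h.r) * current_weight,
          h.g + (cur.g - h.g) * current_weight,
          h.b + (cur.b - h.b) * current_weight};
}
```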
New

GTAO + CMAA2

Ground Truth Ambient Occlusion and Conservative Morphological AA, both production-validated and selectable through the post-process path enum on either backend.

AO + AA
New

Multi-Backend Renderer

Feature parity across the live RHIs (DirectX 12, Vulkan, Metal 4, GDK / D3D12X for Xbox Series X|S, Vulkan on Linux). Side-by-side viewport comparison lets you A/B any pair in the editor. Remaining console SDKs on the roadmap.

Multi-RHI

Under the Hood

Six points that shape how the engine is built. Expand for the full set of architectural invariants and code-philosophy rules.

GPU-First Architecture

Work moves to the GPU when throughput gain is real. GPU-driven culling, skinning dispatch, scene BVH upload, and visibility-buffer material evaluation run on the device today.

VisBuffer + Mesh Shaders

32-bit packed visibility payload (12+8+12 bits) feeds compute material evaluation. 64-vertex/124-triangle meshlets driven by GPU frustum/occlusion/backface culling.
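
As a rough illustration of the 12+8+12 layout, the following C++ sketch packs and unpacks such a payload. The field widths come from the text above; the bit order (draw call in the high bits) and all names are assumptions, and the real layout lives in shader code.

```cpp
#include <cassert>
#include <cstdint>

// Field widths from the text; the bit ORDER below is an assumption.
constexpr std::uint32_t kDrawBits = 12;
constexpr std::uint32_t kTriangleBits = 8;
constexpr std::uint32_t kMeshletBits = 12;
static_assert(kDrawBits + kTriangleBits + kMeshletBits == 32,
              "payload must fill exactly 32 bits");

struct VisibilityId {
  std::uint32_t draw;      // 0..4095
  std::uint32_t triangle;  // 0..255, enough for 124-triangle meshlets
  std::uint32_t meshlet;   // 0..4095
};

constexpr std::uint32_t mask(std::uint32_t bits) { return (1u << bits) - 1u; }

// Pack the three indices into one 32-bit visibility-buffer texel.
inline std::uint32_t pack_visibility(VisibilityId id) {
  assert(id.draw <= mask(kDrawBits));
  assert(id.triangle <= mask(kTriangleBits));
  assert(id.meshlet <= mask(kMeshletBits));
  return (id.draw << (kTriangleBits + kMeshletBits)) |
         (id.triangle << kMeshletBits) | id.meshlet;
}

// Recover the indices during material evaluation.
constexpr VisibilityId unpack_visibility(std::uint32_t payload) {
  return VisibilityId{payload >> (kTriangleBits + kMeshletBits),
                      (payload >> kMeshletBits) & mask(kTriangleBits),
                      payload & mask(kMeshletBits)};
}
```

The 8-bit triangle field is what makes the 124-triangle meshlet limit fit comfortably: any triangle index inside a meshlet is below 256.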

Multi-Core Job Scheduling

Job system scales to all available CPU cores; NUMA-aware worker pinning via env flag. Lock-free atomic queues, DAG dependencies, deterministic mode for repro.

Explicit API Mindset

Render-graph DAG with automatic barrier coalescing and transient resource aliasing for VRAM savings. Live on DirectX 12, Vulkan, Metal 4, Xbox GDK (D3D12X), and Linux (Vulkan); remaining consoles on the roadmap.

Data-Oriented Design

POD-by-default data, contiguous storage, deterministic job scheduling. The full archetype ECS is on the roadmap; current systems use scene-graph storage with cache-friendly access.

No Semantic Drift

Same data contract, different execution path. Lower-tier fallbacks reduce quality but never require different gameplay code. One codebase, all platforms.

More highlights

Offline-First Assets

Deterministic content-addressed cooking with DDC. Budget validation at cook time. Runtime consumes prebuilt streaming-friendly formats exclusively.

Layered Hot Reload

Module / DLL reload via DynamicModuleLibrary today; shader reload, engine hot-restart, and Live++ C++ patching designed for sub-second iteration.

Architectural pillars
01

Job-System-First Execution

Systems schedule jobs and declare dependencies. No direct OS threads as the authoring model. Synchronization on hot paths is minimized via lock-free atomics.

02

Data-Oriented Layout

Game state structured for locality, batching, and vectorization. ECS with contiguous storage, SIMD-friendly data layout, and predictable iteration patterns.

03

Explicit API Rendering

Render-graph style dependency control with deliberate pass dependencies, resource transitions, and feature fallbacks. No hidden state machines.

04

GPU-First, Not Dogmatic

Throughput-critical work moves to GPU. Gameplay-critical, deterministic, or rollback-sensitive work may remain CPU-owned when latency demands it.

05

Ownership Contracts

Every simulation product declares CPU-owned, GPU-owned, or mirrored. Sync phases with explicit visibility rules prevent readback stalls.

06

Budget Enforcement

CPU, GPU, memory, I/O, and thermal envelopes per subsystem. Cook-time validation. Runtime budget broker with degradation ladders and hysteresis.
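
One way to picture a degradation ladder with hysteresis is the sketch below: quality steps down as soon as a frame exceeds the budget, but only steps back up after a run of comfortably cheap frames, so the level does not flicker around the threshold. The thresholds, the frame count, and the `DegradationLadder` name are illustrative assumptions, not the engine's broker.

```cpp
#include <cassert>

// A single-subsystem degradation ladder with hysteresis.
// All numeric defaults are assumed values for illustration.
struct DegradationLadder {
  int level = 0;                    // 0 = full quality
  int max_level = 3;                // lowest rung of the ladder
  double degrade_above_ms = 16.6;   // over budget: step down immediately
  double restore_below_ms = 13.0;   // comfortably under budget
  int restore_after = 30;           // calm frames required before stepping up
  int calm_frames = 0;

  // Feed one measured frame cost; returns the level to use next frame.
  int update(double cost_ms) {
    assert(restore_below_ms < degrade_above_ms);
    if (cost_ms > degrade_above_ms) {
      calm_frames = 0;
      if (level < max_level) ++level;
    } else if (cost_ms < restore_below_ms) {
      if (++calm_frames >= restore_after && level > 0) {
        --level;
        calm_frames = 0;
      }
    } else {
      calm_frames = 0;  // inside the dead band: hold the current level
    }
    return level;
  }
};
```

The gap between the two thresholds is the hysteresis band; widening it trades responsiveness for stability.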

Code philosophy

C++23 Restricted Subset

No exceptions, no RTTI. Engine-owned types with explicit allocators in hot paths; std used pragmatically where it pulls its weight. constexpr/consteval over template metaprogramming.

No-GC Runtime

No tracing garbage collection. Memory is managed manually or through deterministic reference counting, with cycle-free invariants validated by telemetry.

Max Depth 1 Inheritance

Interface + implementation only. No virtual dispatch on hot paths. No dynamic_cast. Composition over inheritance throughout.

Template Discipline

No SFINAE, CRTP frameworks, expression templates, or policy-template design. Narrow typed templates for containers and math only.

Explicit Allocators

No global new/delete. Arena, pool, frame, stack, and TLSF allocators. Every allocation has a known lifetime and budget owner.

Multi-Language Tooling

Zig and Rust permitted for offline tools (asset processors, build utilities). AngelScript for gameplay scripting with GC-dormant contract.

Capability Profiles

Four profiles drive every subsystem decision. Same content, same contract, scaled execution.

| | P0 · Compatibility | P1 · Sustained Mobile | P2 · Fixed High-BW | P3 · Scalable Discrete |
| --- | --- | --- | --- | --- |
| Target | Low-end / Fallback | Phones & Tablets | PS5 · Xbox Series X | High-End PC |
| Render path | Forward | Forward, TBDR-optimised | Visibility buffer | VisBuffer + mesh shaders + RT |
| Culling | CPU BVH frustum | CPU BVH frustum | GPU-driven | Meshlet cluster + Hi-Z |
| GI / lighting | Baked + SSGI fallback | Baked + selective probes | RT shadows, screen-probe GI | Full path tracing · RT GI · RT reflections |
| Post / AA | FXAA, basic bloom | TAA, GTAO | TAA + virtual textures | DLSS-RR, FSR3, full post chain |
| Simulation | CPU only | CPU + light GPU | GPU particles & audio | Full GPU simulation |

Notable: Thermal governance, unified memory · Tier-1/2/3 RT detection · Multi-GPU pipeline acceleration

Engine Subsystems

For when you want to drill into a specific subsystem. 40+ engine systems organised into seven categories. Click any category to browse, or open the full reference grid below.

Live: production code shipping · Partial: core implemented, advanced features in progress · Design: full specification, implementation on roadmap

Render Hardware Interface

Multi-backend graphics abstraction with render graph, bindless heaps, and visibility-buffer pipeline.
GPU-Driven · Core v1 · Live
  • Live backends: DirectX 12, Vulkan, Metal 4, GDK / D3D12X (Xbox Series X|S), Vulkan on Linux — with feature parity
  • Render graph DAG with automatic barrier coalescing and transient aliasing
  • Bindless heap (65k+ slots) for textures, buffers, samplers, acceleration structures
  • Visibility buffer driving compute material evaluation; meshlet culling on capable hardware
  • Indirect draw and GPU-driven culling paths

Roadmap

  • Remaining console backends: NVN (Switch 2), AGC (PS5)
  • Forward fallback path for lower-tier hardware

Lighting & Global Illumination

Baked, screen-probe, and full path-traced GI selectable at runtime: TLRC, DDGI, ReSTIR PT.
GPU · Path Tracing · Live
  • TLRC (Temporal Light Radiance Cache): a modern lightweight GI option
  • DDGI dynamic probe grids and RTXDI ReSTIR for direct + path tracing
  • Up to 256 punctual lights with shadow-flag controls
  • Raster cascaded shadow maps and hardware RT shadows (NRD denoising)
  • IBL: prefiltered envmap, diffuse irradiance (cubemap or SH-L2), BRDF + Charlie LUTs

Roadmap

  • Parallax-corrected reflection probes
  • Reference offline path tracing for ground-truth comparison

Material System

Unified PBR shader pipeline with glTF-extension surface model and Slang shaders.
Slang · Core v1 · Partial
  • Standard metallic-roughness PBR (GGX/Smith-Fresnel) with energy conservation
  • Single EvaluateSurface() compiles to compute, fragment, and depth-only paths
  • Bindless texture access with per-instance parameter overrides via GPU scene buffer
  • glTF reference fields for clear coat, sheen, anisotropy, SSS, transmission already in the upload struct
  • Cloth/sheen runtime ready (Charlie LUT precomputed at init)

Roadmap Layers

  • Skin SSS, car paint flake, hair Marschner, glass refraction (design specs ready, runtime evaluators next)
  • Decal projection with deferred material blending (designed)
  • Material variants at cook time (design contract drafted)

Post-Processing

Composable post-FX chain with TAA, GTAO, CMAA2, tonemap operators and selectable post paths.
GPU · Core v1 · Partial
  • In-house TAA pass with neighborhood clamping and history rejection
  • GTAO (Ground Truth Ambient Occlusion) with configurable radius/quality
  • CMAA2 (Conservative Morphological AA) as a lightweight post-AA option
  • Tonemap operators: Linear, Reinhard, ACES Narkowicz, ACES Hill, Khronos PBR Neutral
  • HDR10 output with PQ/EOTF encoding at present
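
For reference, the PQ curve used by HDR10 is defined in SMPTE ST 2084. The sketch below encodes absolute luminance in nits to a normalized PQ signal using the published constants; the function name and the choice to clamp at 10,000 nits are assumptions, and the engine's present-time shader may be structured differently.

```cpp
#include <cassert>
#include <cmath>

// SMPTE ST 2084 (PQ) constants.
constexpr double kM1 = 2610.0 / 16384.0;
constexpr double kM2 = 2523.0 / 4096.0 * 128.0;
constexpr double kC1 = 3424.0 / 4096.0;
constexpr double kC2 = 2413.0 / 4096.0 * 32.0;
constexpr double kC3 = 2392.0 / 4096.0 * 32.0;

// Encode absolute luminance (cd/m^2) to a PQ signal in [0, 1].
// Input above the 10,000-nit PQ peak is clamped (an assumed policy).
inline double pq_encode(double nits) {
  assert(nits >= 0.0);
  const double y = std::fmin(nits / 10000.0, 1.0);
  const double yp = std::pow(y, kM1);
  return std::pow((kC1 + kC2 * yp) / (1.0 + kC3 * yp), kM2);
}
```

With these constants, 100 nits lands at roughly half of the signal range, which is why SDR-level content occupies the lower half of a PQ-encoded image.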

Post Paths

  • Selectable: CopyOnly · GTAO · CMAA2 · TAA · GTAO+CMAA2 · GTAO+TAA
  • Color grading LUT slot; per-profile path selection

Roadmap

  • Bloom, auto-exposure histogram, motion blur, DOF, film grain (designed, not yet implemented)
🔥

Upscaling & Frame Generation

Vendor super-resolution wired into the render graph with shared motion-vector contracts.
DLSS · FSR3 · Partial
  • DLSS Super Resolution (SR) with DLAA mode, integrated
  • DLSS Ray Reconstruction (RR), integrated
  • FSR3 Frame Generation surface across DirectX 12 and Vulkan
  • DLSS Frame Generation: capability detected, presentation routing pending
  • Shared motion-vector and depth contracts; auto/manual denoiser routing

Roadmap

  • XeSS integration (enum exposed, wrapper pending)
  • DLSS-FG presentation cadence wiring

Real-Time Path Tracing

Full path tracing as a first-class lighting path, with hardware RT for reflections, shadows, and GI.
Path Tracing · RTXDI ReSTIR · Live
  • RTXDI ReSTIR direct illumination + full path tracing modes selectable at runtime
  • Reference path tracing for ground-truth look-dev; production path-traced GI on capable hardware
  • RT Reflections with temporal stabilization; RT Shadows with NRD denoising
  • Inline ray queries and pipeline mode both supported across DirectX 12, Vulkan, Metal 4
  • Segmented BLAS/TLAS; refit-on-update for static/near-static geometry
  • Denoising: NRD (REBLUR/RELAX) · custom temporal · DLSS Ray Reconstruction

Terrain

Streaming large-scale heightmap terrain with clipmap LOD and vegetation.
GPU LOD · Design
  • 256m tiles with 16-bit heightmap (about 4.6mm steps over a 300m range)
  • Clipmap LOD: 8 rings on P3 (32km draw), 4 rings on P0 (2km draw)
  • Splatmap multi-layer blending (up to 8 layers on P2/P3)
  • Visibility buffer integration with vertex pulling from heightmap texture
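
The quoted draw distances are consistent with a clipmap in which the innermost ring spans one 256m tile and every further ring doubles the extent (256m × 2^7 = 32,768m for 8 rings, 256m × 2^3 = 2,048m for 4 rings). That doubling rule is inferred from the numbers rather than stated by the design, so treat the sketch below as one plausible reading; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: ring 0 covers one tile and each further ring doubles the extent.
constexpr std::uint64_t clipmap_draw_distance_m(std::uint32_t tile_m, std::uint32_t rings) {
  return rings == 0 ? 0 : static_cast<std::uint64_t>(tile_m) << (rings - 1);
}

// Index of the finest ring that still covers a given distance from the viewer.
// Returns `rings` when the distance is beyond the draw distance.
inline std::uint32_t ring_for_distance(std::uint32_t tile_m, std::uint32_t rings,
                                       double distance_m) {
  assert(rings > 0 && distance_m >= 0.0);
  for (std::uint32_t ring = 0; ring < rings; ++ring) {
    const double extent = static_cast<double>(clipmap_draw_distance_m(tile_m, ring + 1));
    if (distance_m <= extent) return ring;
  }
  return rings;
}
```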

Technical Detail

  • GPU-driven LOD selection with per-patch screen-space error evaluation
  • Tri-planar projection on steep surfaces to prevent stretching
  • Index buffer stitching for seamless LOD transitions (P2/P3)
  • Terrain tiles as streaming assets with predictive prefetch
  • Virtual texture participation with page request generation
🌊

Water System

Wave simulation with buoyancy physics and interaction effects.
GPU Compute · Design
  • Gerstner waves (SIMD-evaluated) + FFT ocean spectrum on P3
  • GPU displacement maps (1024x1024 on P3) via compute shader
  • Reflection scaling: env cubemap → SSR → planar → RT
  • Buoyancy physics with per-body sample-point force/torque

Water Detail

  • Screen-space refraction with Beer-Lambert depth absorption
  • Jacobian-driven foam at wave crests, depth-threshold shoreline foam
  • Flow maps for surface velocity, normal animation, foam accumulation
  • Subsurface scattering glow on crests proportional to sun angle
  • Ocean, lake, river body types with distinct wave behavior

Environment & Sky

Dynamic atmospheric rendering with sky shading and time-of-day support.
GPU · Partial
  • Atmospheric scattering (Rayleigh + Mie) with precomputed LUTs
  • Volumetric clouds with ray-marched density fields
  • Time-of-day system driving sun position, sky color, fog parameters
  • Weather system with rain, snow, fog transitions

Atmosphere

  • Height fog with density falloff and directional inscattering
  • Volumetric fog with temporal reprojection for stable results
  • Dynamic sky dome with star field and moon phases

Entity Component System

Scene-graph storage today; archetype ECS designed and on the implementation roadmap.
Core v1 · Design
  • Current world runtime uses scene-graph storage with cache-friendly access patterns
  • POD-by-default component philosophy in design contract
  • Component declarations driving automatic job parallelism (specified)

Roadmap

  • Archetype-based storage with cache-friendly SoA chunks
  • Lazy query caching with deferred structural commits
  • Tag and singleton components
  • Per-chunk change detection
  • Entity budgets per capability profile

Job System

Multi-core work scheduler with priority levels, dependency counters, and deterministic mode.
Core v1 · Live
  • Scales to all available CPU cores via std::thread workers
  • Five priority levels (Critical · High · Normal · Low · Idle)
  • Atomic counters for job dependencies and completion signaling
  • NUMA-aware worker pinning available as opt-in (KAPI_JOBS_NUMA_PIN env flag)
  • Deterministic mode for repro via KAPI_DETERMINISTIC_JOBS
  • Tracy profiler integration via shared trace hooks
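
The atomic dependency counters mentioned above can be sketched in a few lines: each job counts its unfinished prerequisites, and whichever caller drops that count to zero runs the dependent. This is a simplified, scheduler-free illustration with hypothetical names (`Job`, `add_dependency`, `run`); the real system adds worker threads, queues, and priorities.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

struct Job {
  explicit Job(std::function<void()> fn) : work(std::move(fn)) {}
  std::function<void()> work;
  std::atomic<int> pending{0};   // unfinished prerequisites
  std::vector<Job*> dependents;  // jobs waiting on this one
};

// Declare that `dependent` must not start before `prerequisite` finishes.
inline void add_dependency(Job& prerequisite, Job& dependent) {
  prerequisite.dependents.push_back(&dependent);
  dependent.pending.fetch_add(1, std::memory_order_relaxed);
}

// Run a ready job, then release its dependents. The acq_rel decrement means
// the caller that reaches zero observes the writes of every prerequisite,
// so this function may be called from different worker threads.
inline void run(Job& job) {
  assert(job.pending.load(std::memory_order_acquire) == 0 &&
         "job started before its prerequisites finished");
  job.work();
  for (Job* next : job.dependents) {
    if (next->pending.fetch_sub(1, std::memory_order_acq_rel) == 1) {
      run(*next);  // last prerequisite done: the dependent is now ready
    }
  }
}
```

In a real scheduler the ready dependent would be pushed to a work queue instead of being run inline; the counter logic stays the same.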

Roadmap

  • Console fiber backends (PS5 SCE, Switch, Xbox) · designed, not yet implemented
  • Cache-affinity pinning, configurable stack tiers · designed

Memory System

Engine-owned allocators in hot paths with frame arenas and lock-free pools.
Core v1 · Partial
  • Frame linear allocator: shared bump-pointer with bulk reset at frame end
  • Lock-free fixed-capacity packet pools with intrusive free lists and ABA-safe generation tags
  • 16-byte alignment for GPU upload boundaries
  • std types used pragmatically outside hot paths
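
A shared bump-pointer frame allocator of the kind listed above can be sketched as follows. The class name, the `std::vector` backing store, and the return-`nullptr`-on-overflow policy are assumptions made to keep the example self-contained; an engine arena would own raw memory and report to its budget owner.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class FrameArena {
 public:
  explicit FrameArena(std::size_t capacity) : storage_(capacity) {}

  // Thread-safe bump allocation; 16-byte default alignment matches the
  // GPU upload boundary mentioned in the list above.
  void* allocate(std::size_t size, std::size_t align = 16) {
    assert(align != 0 && (align & (align - 1)) == 0 && "alignment must be a power of two");
    const auto base = reinterpret_cast<std::uintptr_t>(storage_.data());
    std::size_t old_offset = offset_.load(std::memory_order_relaxed);
    for (;;) {
      const std::uintptr_t aligned =
          (base + old_offset + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
      const std::size_t start = static_cast<std::size_t>(aligned - base);
      if (start > storage_.size() || size > storage_.size() - start) return nullptr;
      // On failure old_offset is refreshed and the loop retries.
      if (offset_.compare_exchange_weak(old_offset, start + size,
                                        std::memory_order_relaxed)) {
        return storage_.data() + start;
      }
    }
  }

  // Bulk reset at frame end; callers must ensure no allocation is in flight.
  void reset() { offset_.store(0, std::memory_order_relaxed); }
  std::size_t used() const { return offset_.load(std::memory_order_relaxed); }

 private:
  std::vector<std::byte> storage_;
  std::atomic<std::size_t> offset_{0};
};
```

Nothing is freed individually: every pointer handed out during a frame becomes invalid at `reset()`, which is what gives each allocation a known lifetime.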

GPU & Platform

  • D3D12 / Vulkan native memory paths; persistent-mapped upload buffers

Roadmap

  • Per-thread arenas to reduce contention
  • Layered budget broker with TLSF for real-time audio
  • Cook-time per-level budget validation
  • Debug poison patterns, guard pages

Game Loop & Frame Orchestration

Frame loop with CPU/GPU overlap; full phased pipeline on the implementation roadmap.
Core v1 · Design
  • App host tick loop with module lifecycle
  • Render-graph-driven submission with multi-frame in-flight bounding

Roadmap

  • Six phases: Input → CPU Sim → Structural Changes → Events → Render Submit → Present
  • Fixed timestep (1/60s) accumulator model decoupling sim from render
  • VSync modes: Off (tearing), On (locked), Adaptive (dynamic)
  • Time dilation for pause/slow-mo
  • Deterministic simulation enabling replay and rollback
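
The fixed-timestep accumulator on the roadmap is a well-known pattern; a minimal version looks like the sketch below. The struct name, the cap of 8 catch-up steps, and the policy of dropping backlog when the cap is hit are assumptions, not the engine's specification.

```cpp
#include <cassert>

struct FixedStepClock {
  double step_s = 1.0 / 60.0;    // fixed simulation step from the roadmap
  int max_steps_per_frame = 8;   // assumed clamp against runaway catch-up
  double accumulator_s = 0.0;

  // Feed the measured render-frame time; returns how many sim ticks to run.
  int advance(double frame_dt_s) {
    assert(frame_dt_s >= 0.0);
    accumulator_s += frame_dt_s;
    int steps = 0;
    while (accumulator_s >= step_s && steps < max_steps_per_frame) {
      accumulator_s -= step_s;
      ++steps;
    }
    // Only true when the cap was hit: drop the backlog instead of
    // letting it grow from frame to frame.
    if (accumulator_s >= step_s) accumulator_s = 0.0;
    return steps;
  }

  // Fraction of a step left over, for interpolating render state.
  double alpha() const { return accumulator_s / step_s; }
};
```

Because the simulation only ever advances in whole `step_s` increments, the same input sequence produces the same tick sequence regardless of render frame rate, which is the property replay and rollback depend on.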

Scene Graph & Spatial

Hierarchical scene graph with GPU-resident BVH for culling; advanced spatial queries on the roadmap.
GPU Cull · Core v1 · Partial
  • Hierarchical scene graph with parent-child transforms
  • GPU scene BVH upload for frustum + GPU-driven culling
  • Binary BVH node layout with AABB bounds

Roadmap

  • SAH build with incremental update
  • Hi-Z occlusion culling pass
  • Streaming chunks with predictive load/unload
  • Spatial hash/grid for proximity queries
  • LOD selection via screen-space metric

Hardware Abstraction Layer

Platform abstraction for window management, input, and OS services.
Core v1 · Partial
  • Windows platform layer with HWND, raw mouse input, exclusive fullscreen
  • macOS platform layer with Metal 4 surface bring-up
  • Xbox Series X|S platform layer (GDK lifecycle, suspend/resume, user/save services)
  • Linux platform layer (X11 / Wayland surface bring-up, Vulkan WSI)
  • Window management, fullscreen, DPI, HDR, monitor enumeration
  • Structured logging with crash ring buffer context

Roadmap

  • Remaining console SDKs (PS5, Switch 2) · designed
  • Heterogeneous core topology awareness (P/E cores) · designed
  • Console fiber/coroutine support · designed
  • Large page support, console async I/O · designed

Physics & Collision

Dual-authority: CPU primary (low-latency gameplay) + GPU secondary (high-count debris, ragdolls).
GPU Physics · SIMD · Design
  • CPU primary physics: rigid body, raycasts, overlap (zero-frame latency)
  • GPU secondary: position-based dynamics for mass debris simulation
  • BVH broadphase (primary) + spatial hash (GPU secondary)
  • SIMD constraint batching: AVX2 = 8/batch, NEON = 4/batch

Constraints & Bodies

  • 6 joint types: Fixed, Hinge, Ball-Socket, Slider, Distance, ConeTwist
  • Breakable constraints with impulse accumulation thresholds
  • Ragdolls: P3=32 active, P2=16, P1=4 with priority eviction
  • GPU XPBD solver for cloth and rope on P2/P3; CPU fallback P0/P1

Character Controller

  • Engine-owned CharacterMovement component
  • Capsule sweep, step-up, ground detection, slope limits
  • Movement modes: walking, falling, swimming, climbing

Animation & Skinning

Skeleton runtime + GPU skinning dispatch; blend trees and state machines on the roadmap.
GPU Skinning · Partial
  • State machines with condition-driven transitions between layers
  • Hierarchical blend trees: sample, blend, additive, parameter-driven
  • GPU skinning via compute shader in Phase 3
  • SIMD batch evaluation of sampled bone curves

Advanced

  • Attachment sockets for weapons, accessories, FX
  • Skeleton LOD: distant characters skip leaf bones
  • Keyframe compression with time-windowed event callbacks
  • IK and procedural post-processing applied post-blend
💥

Destruction

Pre-authored fracture with constraint-driven breaking and budget-managed debris lifecycle.
GPU Debris · Design
  • Offline-authored fracture (DCC pre-split chunks, no runtime Voronoi)
  • Constraint graph with impulse accumulation → breakable thresholds
  • Union-find connectivity for near-constant-time (amortized) debris group discovery
  • GPU debris simulation with spatial hash broadphase on P2/P3
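
A minimal union-find over fracture chunks might look like the sketch below, using path halving and union by size. Since union-find cannot split a set, this sketch simply rebuilds connectivity from the intact constraints after a break; that rebuild strategy and all names (`ChunkGroups`, `Constraint`, `rebuild_groups`) are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

class ChunkGroups {
 public:
  explicit ChunkGroups(std::size_t chunk_count)
      : parent_(chunk_count), size_(chunk_count, 1) {
    std::iota(parent_.begin(), parent_.end(), std::size_t{0});
  }

  // Representative of the chunk's group, with path halving.
  std::size_t find(std::size_t chunk) {
    assert(chunk < parent_.size());
    while (parent_[chunk] != chunk) {
      parent_[chunk] = parent_[parent_[chunk]];
      chunk = parent_[chunk];
    }
    return chunk;
  }

  // Merge two groups, attaching the smaller under the larger.
  void connect(std::size_t a, std::size_t b) {
    a = find(a);
    b = find(b);
    if (a == b) return;
    if (size_[a] < size_[b]) std::swap(a, b);
    parent_[b] = a;
    size_[a] += size_[b];
  }

  bool same_group(std::size_t a, std::size_t b) { return find(a) == find(b); }

 private:
  std::vector<std::size_t> parent_;
  std::vector<std::size_t> size_;
};

struct Constraint {
  std::size_t a;
  std::size_t b;
  bool broken;
};

// Rebuild connectivity from the constraints that are still intact.
inline ChunkGroups rebuild_groups(std::size_t chunk_count,
                                  const std::vector<Constraint>& constraints) {
  ChunkGroups groups(chunk_count);
  for (const Constraint& c : constraints) {
    if (!c.broken) groups.connect(c.a, c.b);
  }
  return groups;
}
```

After a rebuild, any group that no longer contains an anchored chunk can be handed to the debris simulation as one rigid cluster.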

Lifecycle

  • Spawned → Active → Settling → Expired → Destroyed
  • Per-chunk LOD meshes (full detail → simplified → particle burst)
  • Persistence: bitmask of broken constraints saved per destructible
  • Budget: force-expire oldest/distant debris when ceiling exceeded

Particle Systems

GPU-first simulation with indirect draw, prefix-sum compaction, and deterministic warm-up.
GPU Compute · Design
  • Full GPU pipeline: emit → update → compact → sort → render
  • SoA double-buffered layout, no CPU readback for dispatch
  • Sub-emitters: event-driven spawning (death, collision, age)
  • Lit particles with probe sampling and shadow cascade receiving

Technical

  • Deterministic per-particle RNG: hash(emitter_id, frame, spawn_index)
  • Warm-up: burst simulation at fixed timestep before first visible frame
  • Emission proxy: summary weighted average as transient point light
  • Screen-space depth collision or physics broadphase (budget-limited)
  • GPU memory: 256MB (P3), 128MB (P2)
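
The `hash(emitter_id, frame, spawn_index)` idea can be illustrated with any good integer mixer. The sketch below uses the 32-bit finalizer from MurmurHash3 as a stand-in; the actual hash, the way the three inputs are combined, and the `stream` parameter for drawing several values per particle are all assumptions.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit finalizer from MurmurHash3 (fmix32), used here as a generic mixer.
constexpr std::uint32_t mix32(std::uint32_t h) {
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}

// Stateless seed: the same (emitter, frame, spawn index) always yields the
// same value, so no RNG state has to be stored or read back.
constexpr std::uint32_t particle_seed(std::uint32_t emitter_id, std::uint32_t frame,
                                      std::uint32_t spawn_index) {
  return mix32(emitter_id ^ mix32(frame ^ mix32(spawn_index)));
}

// Uniform float in [0, 1) for a given attribute stream (0 = lifetime, 1 = speed, ...).
inline float particle_random(std::uint32_t seed, std::uint32_t stream) {
  const std::uint32_t bits = mix32(seed + 0x9e3779b9u * (stream + 1u));
  const float value = static_cast<float>(bits >> 8) * (1.0f / 16777216.0f);
  assert(value >= 0.0f && value < 1.0f);
  return value;
}
```

Because the seed is a pure function of its inputs, a replayed or warmed-up emitter spawns identical particles without any per-particle RNG state.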
💄

Hair Simulation & Rendering

Strand-based hair design with PBD GPU simulation and Marschner BSDF.
GPU Compute · Design
  • PBD-based GPU simulation for 30K-50K rendered strands
  • Compute software rasterizer with visibility buffers
  • Marschner BSDF with dual scattering and deep opacity shadows
  • Proven: 8 characters on-screen at 60 FPS on current-gen consoles

Fallback Chain

  • Shell/fin rendering fallback on P1/P0
  • LOD: strand count reduction with distance
  • Wind and collision forces applied in GPU compute pass
🍂

Cloth & Foliage Simulation

GPU XPBD cloth solver with dual-authority vertex pinning and foliage interaction.
GPU XPBD · Design
  • Low-res proxy mesh simulation with pin_weight blending to render mesh
  • Self-collision reserved for hero garments on higher profiles
  • Foliage displacement from player/physics interaction
  • GPU solver on P2/P3, CPU fallback on P0/P1, static on P0
🎧

Spatial Audio

GPU ray-casting propagation with zone/portal fallback and middleware-agnostic backend.
GPU Rays · Design
  • GPU audio rays: 256 rays/source, 32 sources on P3
  • AcousticsSummary readback: occlusion, reflection delay, reverb
  • CPU zone/portal fallback on P0/P1 (same gameplay contract)
  • Voice management: 128 rendered voices on P3 with priority stealing

Audio Pipeline

  • Material acoustic properties cooked alongside visual properties
  • Audio LOD: distance-based update frequency reduction
  • Adaptive music system with crossfade transitions
  • Wwise/FMOD integration via standard interface; built-in mixer fallback
🎬

Cinematics & Sequencer

Data-driven timeline with camera, animation, audio, and event tracks.
Design
  • Editor-authored timeline-based cutscenes with smooth blending
  • Track types: property, animation, camera, event, audio, video, visibility, script
  • Skippable with guaranteed deterministic playback
  • Integrated camera paths with keyframed DOF, FOV, roll

Video Playback

Hardware-decoded video with streaming and in-world render-to-texture.
Design
  • Platform-native hardware decoders (HEVC/H.264)
  • Fullscreen overlay and in-world render-to-texture modes
  • Audio/video sync with 20ms lip-sync tolerance
  • Subtitle rendering with localization integration

Editor & Tools

Real-time in-engine editor built on the same runtime as the game.
Core v1 · Partial
  • Editor-as-game architecture: same executable, same render pipeline
  • Scene viewport with transform gizmos and play-in-editor
  • Property inspector with auto-generated type editors and batch editing
  • Undo/redo with 64MB memory budget

Editor Features

  • Dear ImGui-based docking UI with profiler overlay
  • Asset browser with thumbnail generation and re-cook triggers
  • Debug rendering: immediate-mode API, per-category, <10M vertex budget
  • [EditorOnly] components stripped from cooked builds

Asset Pipeline

Deterministic offline cooking with content-addressed caching and parallel builds.
Core v1 · Live
  • Deterministic per-platform cooking for bit-identical outputs
  • Content-addressed DDC with incremental builds via topological DAG
  • Budget validation at cook time (rejects over-budget assets)
  • Standalone executable, runs on CI without GPU or display
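
Content-addressed caching boils down to keying each cooked blob by a hash of everything that can change its output. The sketch below is a simplified in-memory illustration: FNV-1a stands in for a production-grade content hash, and `CookCache`, its key fields, and the callback signature are hypothetical rather than the cooker's real interface.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>
#include <unordered_map>

// FNV-1a 64-bit; a real cooker would likely use a stronger, wider hash.
constexpr std::uint64_t fnv1a(std::string_view bytes,
                              std::uint64_t h = 14695981039346656037ull) {
  for (unsigned char c : bytes) {
    h ^= c;
    h *= 1099511628211ull;
  }
  return h;
}

class CookCache {
 public:
  using Cooker = std::function<std::string(std::string_view source)>;

  // The key covers source bytes, cooker version, and target platform.
  // Fields are chained without length prefixes, which is fine for a sketch.
  static std::uint64_t key(std::string_view source, std::string_view cooker_version,
                           std::string_view platform) {
    return fnv1a(platform, fnv1a(cooker_version, fnv1a(source)));
  }

  // Return the cooked blob, invoking the cooker only on a cache miss.
  const std::string& cook(std::string_view source, std::string_view cooker_version,
                          std::string_view platform, const Cooker& cooker) {
    assert(cooker && "a cooker callback is required");
    const std::uint64_t k = key(source, cooker_version, platform);
    auto it = blobs_.find(k);
    if (it == blobs_.end()) {
      ++misses_;
      it = blobs_.emplace(k, cooker(source)).first;
    }
    return it->second;
  }

  int misses() const { return misses_; }

 private:
  std::unordered_map<std::uint64_t, std::string> blobs_;
  int misses_ = 0;
};
```

Re-cooking an unchanged asset is a lookup; changing the source, the cooker version, or the platform changes the key and forces a fresh cook, which is what lets dev machines pull blobs that a build farm already produced.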

Pipeline Detail

  • LOD generation, compression, shader compilation offline
  • Per-platform: texture compression, shader targets, audio codecs, endianness
  • Watch mode for hot-reload integration with editor
  • Structured CLI error reporting

Live Reload

Layered hot-reload stack: module loading live; deeper layers on the roadmap.
Dev-Only · Partial
  • Module / DLL reload via DynamicModuleLibrary · live
  • Shader hot-reload · designed
  • Engine hot-restart with state preservation · designed
  • Live++ C++ function patching · design only, not yet integrated

Contract

  • Zero reload infrastructure in shipping builds
  • Three module modes: built-in · dynamic (dev) · static (shipping)

Build Toolchain

CMake build with VS 2026 / C++23, dual-config presets, and CTest matrix.
Core v1 · Live
  • CMake with explicit module targets and dependency validation
  • 4 build configs: Debug, Development, Test, Shipping
  • Per-profile shader/asset compilation in CI pipeline
  • Single-source-of-truth toolchain manifest for all platforms

Scripting API

AngelScript with ECS binding, coroutines, and cycle-free memory contract.
Design
  • AngelScript VM with reference counting (no GC at runtime)
  • Copy-in/copy-out ECS binding (no dangling references)
  • Coroutines: yield() for multi-frame behaviors
  • CI enforces zero GC candidate count at build time

Design

  • Scripts as orchestration; bulk ops exposed as native C++ functions
  • Hot-reload via live reload stack
  • Per-profile VM memory: P3 = 16 MB, P0 = 4 MB
  • Single-lane execution; designed for future multi-context sharding
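
The copy-in/copy-out binding can be pictured with a small sketch. The `Transform` component, `TransformStore`, and `script_move_up` are hypothetical names invented for this example, written in C++ rather than AngelScript; the point is only that the script side ever holds a value copy, never a pointer into component storage.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

struct Transform { float x = 0, y = 0, z = 0; };  // hypothetical component
using Entity = std::uint32_t;

// Hypothetical component storage exposed to the script layer.
class TransformStore {
public:
    void add(Entity e, Transform t) {
        assert(data_.count(e) == 0 && "component already present");
        data_[e] = t;
    }
    void remove(Entity e) { data_.erase(e); }

    // Copy-out: the caller receives a snapshot, not a reference.
    std::optional<Transform> read(Entity e) const {
        auto it = data_.find(e);
        if (it == data_.end()) return std::nullopt;
        return it->second;
    }

    // Copy-in: the write fails cleanly if the entity is gone.
    bool write(Entity e, const Transform& t) {
        auto it = data_.find(e);
        if (it == data_.end()) return false;
        it->second = t;
        return true;
    }

private:
    std::unordered_map<Entity, Transform> data_;
};

// What a bound script call might do: copy out, mutate the copy, copy back in.
inline bool script_move_up(TransformStore& store, Entity e, float dy) {
    std::optional<Transform> t = store.read(e);
    if (!t) return false;
    t->y += dy;
    return store.write(e, *t);
}
```

If the entity is destroyed between the read and the write, the script sees a failed write instead of touching freed memory.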
🚶

AI & Navigation

Runtime navmesh, hierarchical pathfinding, hybrid BT+utility AI, and crowd avoidance.
GPU Crowd · Design
  • Runtime hierarchical navigation graphs with dynamic link updates
  • GPU-assisted crowd simulation with obstacle avoidance
  • Hybrid behavior tree + utility AI framework
  • Seamless LOD transitions for distant and close-range agents
🎮

Input System

Action/axis binding with device abstraction and context-sensitive binding sets.
Core v1 · Partial
  • Devices: keyboard, mouse, gamepad, touch (10 points), gyro/accel
  • Actions (boolean) + Axes (float) with many-to-many binding
  • Context-sensitive binding sets (Gameplay, Vehicle, Menu, Dialogue)
  • Haptics: rumble, adaptive triggers (DualSense), trigger vibration, light bar
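
A rough sketch of many-to-many bindings with context-sensitive sets follows. The `Button` enum, `BindingSet`, and `InputMapper` are illustrative names, not the engine's API, and only boolean actions are modelled (axes are left out for brevity).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

enum class Button { Space, KeyE, GamepadA };  // hypothetical device inputs

// One binding set: a button can fire several actions, and several buttons
// can fire the same action.
class BindingSet {
public:
    void bind(Button b, const std::string& action) { actions_by_button_[b].insert(action); }

    std::set<std::string> resolve(const std::set<Button>& pressed) const {
        std::set<std::string> fired;
        for (Button b : pressed) {
            auto it = actions_by_button_.find(b);
            if (it != actions_by_button_.end())
                fired.insert(it->second.begin(), it->second.end());
        }
        return fired;
    }

private:
    std::map<Button, std::set<std::string>> actions_by_button_;
};

// Holds named binding sets (Gameplay, Menu, ...) and resolves input against
// whichever one is active.
class InputMapper {
public:
    BindingSet& context(const std::string& name) { return contexts_[name]; }

    void activate(const std::string& name) {
        assert(contexts_.count(name) == 1 && "unknown binding set");
        active_ = name;
    }

    std::set<std::string> actions(const std::set<Button>& pressed) const {
        auto it = contexts_.find(active_);
        if (it == contexts_.end()) return {};
        return it->second.resolve(pressed);
    }

private:
    std::map<std::string, BindingSet> contexts_;
    std::string active_;
};
```

Switching from Gameplay to Menu changes what the same physical button means without gameplay code ever inspecting devices.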

UI System

Dual-mode: retained-mode game UI and immediate-mode debug overlay with SDF fonts.
GPU UI · Design
  • Retained-mode widgets with anchor + flex layout and style cascade
  • SDF font rendering, UTF-8, complex script shaping, CJK
  • Data binding decouples UI from gameplay
  • Aggressive batching: <20 draw calls per UI screen

GPU UI (P2/P3)

  • UIPrepare stage for geometry expansion and composition effects
  • Resolution-independent with DPI awareness
  • Console safe area compliance
  • Debug overlay (profiler, memory, console) stripped from shipping
🌐

Networking

Platform-agnostic transport with replication, prediction/rollback, and console backends.
Design
  • Transport: GameNetworkingSockets (PC), GDK (Xbox), PSN (PS5), NEX (Switch)
  • Client-server, listen server, P2P, dedicated server topologies
  • Reliable-Ordered, Reliable-Unordered, Unreliable channels
  • P2P host migration via independent peer consensus
🎥

Camera System

Priority stacking, trauma shake, cinematic rails, and photo mode.
Design
  • Multiple cameras with priority stacking and blend transitions
  • Controllers: Free-Look, Third-Person (spring-arm), First-Person, Cinematic
  • Trauma-based screen shake with Perlin noise and asymmetric profiles
  • Photo mode: freeze sim, detach camera, super-sample up to 4x
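
Trauma-based shake is usually built from three pieces: a trauma value in [0, 1] that decays over time, a non-linear mapping from trauma to amplitude, and smooth noise sampled over time. The sketch below assumes a squared mapping and substitutes a simple hash-based value noise for Perlin noise; the struct, its fields, and the constants are all illustrative, and the asymmetric profiles mentioned above are not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Pseudo-random value in [0, 1] for an integer lattice point.
inline float lattice(std::int32_t i, std::uint32_t seed) {
    std::uint32_t h = static_cast<std::uint32_t>(i) * 374761393u + seed * 668265263u;
    h = (h ^ (h >> 13)) * 1274126177u;
    h ^= h >> 16;
    return static_cast<float>(h & 0xFFFFFFu) / 16777215.0f;
}

// Smooth 1D value noise in [-1, 1]; a stand-in for Perlin noise.
inline float smooth_noise(float t, std::uint32_t seed) {
    float cell = std::floor(t);
    float f = t - cell;
    float s = f * f * (3.0f - 2.0f * f);  // smoothstep blend between lattice points
    auto i = static_cast<std::int32_t>(cell);
    float a = lattice(i, seed);
    float b = lattice(i + 1, seed);
    return (a + (b - a) * s) * 2.0f - 1.0f;
}

struct CameraShake {
    float trauma = 0.0f;            // 0 = calm, 1 = maximum shake
    float decay_per_second = 1.0f;  // linear decay rate (assumed)
    float max_offset = 0.5f;        // world units at full amplitude (assumed)
    float frequency = 15.0f;        // noise samples per second (assumed)

    void add_trauma(float amount) {
        assert(amount >= 0.0f);
        trauma = std::min(1.0f, trauma + amount);
    }
    void tick(float dt) { trauma = std::max(0.0f, trauma - decay_per_second * dt); }

    // Squaring makes small hits subtle and large hits dramatic.
    float amplitude() const { return trauma * trauma; }

    // Different seeds per axis keep the axes decorrelated.
    float offset_x(float time) const { return max_offset * amplitude() * smooth_noise(time * frequency, 1u); }
    float offset_y(float time) const { return max_offset * amplitude() * smooth_noise(time * frequency, 2u); }
};
```

Gameplay code only ever calls `add_trauma`; the camera samples the offsets each frame and the shake fades on its own.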
💾

Save System

Component-based serialization with forward-compatible versioning and async I/O.
Design
  • Schema migration chains for forward compatibility
  • LZ4 compression with integrity checksums
  • Async pipeline preserves frame pacing during saves
  • Platform-specific storage backends
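
A schema migration chain can be sketched as one upgrade step per version, applied in order until the save reaches the version the running build expects. The `Fields` map stands in for a serialized component, and `SaveBlob` and `MigrationChain` are hypothetical names, not Kapi types.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

using Fields = std::map<std::string, std::string>;  // stand-in for a serialized component

struct SaveBlob {
    int version;
    Fields fields;
};

class MigrationChain {
public:
    // Registers the step that upgrades data from `from` to `from + 1`.
    void add(int from, std::function<void(Fields&)> step) {
        assert(step && "migration step must be callable");
        steps_[from] = std::move(step);
    }

    // Applies steps one version at a time. Returns false if a step is missing
    // or the blob is already newer than the target.
    bool upgrade(SaveBlob& blob, int target) const {
        while (blob.version < target) {
            auto it = steps_.find(blob.version);
            if (it == steps_.end()) return false;
            it->second(blob.fields);
            ++blob.version;
        }
        return blob.version == target;
    }

private:
    std::map<int, std::function<void(Fields&)>> steps_;
};
```

Because each step only knows about its neighbouring version, adding version N + 1 never requires touching the older steps.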
🌍

Localization

Multi-language support with HarfBuzz shaping, RTL, and CJK glyph handling.
Design
  • HarfBuzz text shaping for Arabic, Thai, CJK scripts
  • CLDR pluralization rules
  • RTL paragraph layout with bidirectional text support
  • Per-language font atlases with lazy-loaded glyph pages
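
As an illustration of what CLDR pluralization involves, the sketch below hard-codes the integer plural rules for English and Russian and picks a message form by category. It is a simplification: real CLDR rules also cover fractional values and many more locales, and the function and type names here are invented for the example.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Plural { One, Few, Many, Other };

// English, integers only: "one" for exactly 1, otherwise "other".
inline Plural plural_en(unsigned long long n) {
    return n == 1 ? Plural::One : Plural::Other;
}

// Russian, integers only: the category depends on the last one and two digits.
inline Plural plural_ru(unsigned long long n) {
    unsigned long long m10 = n % 10;
    unsigned long long m100 = n % 100;
    if (m10 == 1 && m100 != 11) return Plural::One;
    if (m10 >= 2 && m10 <= 4 && !(m100 >= 12 && m100 <= 14)) return Plural::Few;
    return Plural::Many;
}

using Forms = std::map<Plural, std::string>;

// Picks the message form for a category, falling back to "other".
inline const std::string& pick(const Forms& forms, Plural category) {
    auto it = forms.find(category);
    if (it == forms.end()) it = forms.find(Plural::Other);
    assert(it != forms.end() && "every message needs an 'other' form");
    return it->second;
}
```

A string table would then store one entry per category per language, and gameplay code would pass only the count.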
📈

Telemetry & Profiling

Opt-in profiling hooks with Tracy and per-pass GPU markers; PIX/RenderDoc integration for capture.
Core v1 · Live
  • Tracy profiler integration via shared trace hooks (opt-in via KAPI_TRACY_ENABLED)
  • Per-pass GPU timing through render graph queries
  • PIX and RenderDoc capture-friendly markers
  • Budget broker feed for performance adaptation (designed)

Plugin & Module System

Module loading with three modes and version-checked dependencies.
Core v1 · Live
  • Three modes: built-in, dynamic (dev), static (shipping)
  • DynamicModuleLibrary with native handle and version checks
  • Declarative dependencies and registration contracts

Roadmap

  • C ABI mod boundary for binary stability
  • Mod support layer (sandboxed scripting)
📥

I/O & Streaming

Async-first with DirectStorage, GPU-direct decompression, and predictive prefetch.
GPU-Direct · Design
  • DirectStorage (PC/Xbox) / PS5 I/O complex for GPU-direct reads
  • VFS with pak overlay priority: base → patch → DLC → loc → mods
  • Predictive prefetch based on camera velocity and frustum
  • Residency management with priority-based eviction
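
The pak overlay order can be sketched as a stack of layers searched from the top down. This assumes the arrows in base → patch → DLC → loc → mods point toward higher priority, so a later layer overrides an earlier one; the `Layer` enum and `OverlayVfs` class are illustrative, and in-memory strings stand in for pak contents.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Ascending priority (assumed): later layers override earlier ones.
enum class Layer : std::size_t { Base = 0, Patch, Dlc, Loc, Mods, Count };

class OverlayVfs {
public:
    void mount(Layer layer, const std::string& path, const std::string& contents) {
        assert(layer != Layer::Count && "Count is a sentinel, not a layer");
        layers_[static_cast<std::size_t>(layer)][path] = contents;
    }

    // Searches from the highest-priority layer down; the first hit wins.
    std::optional<std::string> read(const std::string& path) const {
        for (std::size_t i = layers_.size(); i-- > 0;) {
            auto it = layers_[i].find(path);
            if (it != layers_[i].end()) return it->second;
        }
        return std::nullopt;
    }

private:
    std::array<std::map<std::string, std::string>,
               static_cast<std::size_t>(Layer::Count)> layers_;
};
```

Unmounting a patch or mod layer under this model simply exposes whatever the layers beneath it provide.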

Event System

Lock-free MPSC ring queue primitive; full event-system layering on the roadmap.
Core v1 · Partial
  • Shared ring buffers eliminate per-subscriber copying
  • Frame-buffered visibility respects sync phases
  • UI event bubbling with parent-chain propagation
  • GPU event summary readback for simulation feedback
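
For readers unfamiliar with MPSC ring queues, here is a minimal sketch of one well-known design: the bounded sequence-number queue described by Dmitry Vyukov, restricted to a single consumer. It is not Kapi's implementation; the class name and interface are assumptions, and `T` must be default-constructible. Producers claim a slot with a compare-and-swap on the tail and publish it by bumping the slot's sequence number; the consumer reads only slots whose sequence says they are published. No mutex is involved, although a producer stalled between claiming and publishing can delay the consumer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>

template <typename T>
class MpscRing {
public:
    // Capacity must be a power of two and at least 2.
    explicit MpscRing(std::size_t capacity)
        : mask_(capacity - 1), cells_(new Cell[capacity]) {
        assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);
        for (std::size_t i = 0; i < capacity; ++i)
            cells_[i].seq.store(i, std::memory_order_relaxed);
    }

    // Safe to call from any number of producer threads.
    bool try_push(const T& value) {
        std::size_t pos = tail_.load(std::memory_order_relaxed);
        for (;;) {
            Cell& cell = cells_[pos & mask_];
            std::size_t seq = cell.seq.load(std::memory_order_acquire);
            auto diff = static_cast<std::intptr_t>(seq) - static_cast<std::intptr_t>(pos);
            if (diff == 0) {
                // Slot is free for this position: try to claim it.
                if (tail_.compare_exchange_weak(pos, pos + 1, std::memory_order_relaxed)) {
                    cell.value = value;
                    cell.seq.store(pos + 1, std::memory_order_release);  // publish
                    return true;
                }
            } else if (diff < 0) {
                return false;  // queue is full
            } else {
                pos = tail_.load(std::memory_order_relaxed);  // lost a race, retry
            }
        }
    }

    // Must only be called from the single consumer thread.
    std::optional<T> try_pop() {
        Cell& cell = cells_[head_ & mask_];
        std::size_t seq = cell.seq.load(std::memory_order_acquire);
        if (seq != head_ + 1) return std::nullopt;  // empty, or not yet published
        T out = std::move(cell.value);
        // Mark the slot free for the producer that will wrap around to it.
        cell.seq.store(head_ + mask_ + 1, std::memory_order_release);
        ++head_;
        return out;
    }

private:
    struct Cell {
        std::atomic<std::size_t> seq{0};
        T value{};
    };
    const std::size_t mask_;
    std::unique_ptr<Cell[]> cells_;
    std::atomic<std::size_t> tail_{0};  // shared by producers
    std::size_t head_ = 0;              // touched by the consumer only
};
```

Frame-buffered visibility could sit on top of a primitive like this by having the consumer drain the queue only at a fixed point in the frame.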

Resource System

Stable asset handles, dependency tracking, hot-reload, and thread-safe access.
Core v1 · Partial
  • Stable handles with dependency invalidation + hot-reload propagation
  • Pin/unpin semantics for residency control
  • Thread-safe read access contract
  • Runtime face of the content system between pipeline and gameplay
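
Stable handles are commonly implemented as an index plus a generation counter, which is what this sketch assumes; the page does not say how Kapi does it. `Handle` and `ResourceTable` are hypothetical names, a string stands in for the resource payload, and dependency tracking, pinning, and thread safety are left out. A hot-reload swaps the payload behind an unchanged handle, while a destroyed slot bumps its generation so stale handles stop resolving.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct Handle {
    std::uint32_t index = 0;
    std::uint32_t generation = 0;
};

class ResourceTable {
public:
    Handle create(std::string payload) {
        std::uint32_t index;
        if (!free_.empty()) {
            index = free_.back();  // reuse a freed slot
            free_.pop_back();
        } else {
            index = static_cast<std::uint32_t>(slots_.size());
            slots_.emplace_back();
        }
        Slot& slot = slots_[index];
        assert(!slot.live && "free slots are never live");
        slot.payload = std::move(payload);
        slot.live = true;
        return Handle{index, slot.generation};
    }

    bool valid(Handle h) const {
        return h.index < slots_.size() && slots_[h.index].live &&
               slots_[h.index].generation == h.generation;
    }

    // Returns nullptr for stale handles. The pointer is only valid until the
    // next create(), which may grow the slot vector.
    const std::string* get(Handle h) const {
        return valid(h) ? &slots_[h.index].payload : nullptr;
    }

    // Hot-reload: the payload changes, the handle does not.
    bool reload(Handle h, std::string payload) {
        if (!valid(h)) return false;
        slots_[h.index].payload = std::move(payload);
        return true;
    }

    // Bumping the generation invalidates every outstanding copy of the handle.
    bool destroy(Handle h) {
        if (!valid(h)) return false;
        Slot& slot = slots_[h.index];
        slot.live = false;
        slot.payload.clear();
        ++slot.generation;
        free_.push_back(h.index);
        return true;
    }

private:
    struct Slot {
        std::string payload;
        std::uint32_t generation = 0;
        bool live = false;
    };
    std::vector<Slot> slots_;
    std::vector<std::uint32_t> free_;
};
```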

Configuration

Layered TOML config with runtime CVars and per-profile overrides.
Core v1 · Live
  • Four-layer hierarchy: engine → project → user → CLI overrides
  • Dynamic CVars with change callbacks for runtime quality switching
  • Per-profile rendering and memory budget configuration
  • TOML-based, version-controlled, human-readable
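
The four-layer lookup and CVar change callbacks might be sketched as below. TOML parsing is omitted, values are plain strings, and `ConfigLayer` and `LayeredConfig` are illustrative names rather than engine types. The highest layer that defines a key wins, and callbacks fire only when the effective value actually changes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Ascending priority: engine → project → user → CLI overrides.
enum class ConfigLayer : std::size_t { Engine = 0, Project, User, Cli };

class LayeredConfig {
public:
    using Callback = std::function<void(const std::string& key, const std::string& value)>;

    void on_change(Callback cb) { callbacks_.push_back(std::move(cb)); }

    // Effective value: the highest layer that defines the key wins.
    std::optional<std::string> get(const std::string& key) const {
        for (std::size_t i = layers_.size(); i-- > 0;) {
            auto it = layers_[i].find(key);
            if (it != layers_[i].end()) return it->second;
        }
        return std::nullopt;
    }

    // Writes into one layer and notifies listeners only if the effective
    // value changed (a lower-layer write can be masked by a higher layer).
    void set(ConfigLayer layer, const std::string& key, const std::string& value) {
        std::optional<std::string> before = get(key);
        layers_[static_cast<std::size_t>(layer)][key] = value;
        std::optional<std::string> after = get(key);
        assert(after.has_value());
        if (after != before)
            for (const auto& cb : callbacks_) cb(key, *after);
    }

private:
    std::array<std::map<std::string, std::string>, 4> layers_;
    std::vector<Callback> callbacks_;
};
```

A renderer could subscribe once and switch quality settings at runtime whenever a CVar such as a shadow quality key changes.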

Performance Budgeting

Explicit frame budgets with runtime broker-driven degradation ladders.
Design
  • Per-subsystem CPU, GPU, memory, I/O, thermal envelopes
  • Degradation ladders with hysteresis to prevent oscillation
  • Cook-time validation against profile envelopes
  • Cross-system LOD policy with shared importance metric
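
Hysteresis on a degradation ladder means using two thresholds instead of one: step down in quality when the frame time rises above the upper bound, and step back up only once it falls below a lower bound. The sketch below is a single-metric illustration with made-up names and thresholds; a real broker would likely smooth the signal and weigh several envelopes.

```cpp
#include <cassert>

class DegradationLadder {
public:
    DegradationLadder(int max_level, float degrade_above_ms, float recover_below_ms)
        : max_level_(max_level),
          degrade_above_ms_(degrade_above_ms),
          recover_below_ms_(recover_below_ms) {
        assert(max_level >= 0);
        assert(recover_below_ms < degrade_above_ms && "the band between thresholds is the hysteresis");
    }

    // Level 0 is full quality; higher levels are progressively cheaper.
    // Frame times inside the band leave the level unchanged.
    int update(float frame_ms) {
        if (frame_ms > degrade_above_ms_ && level_ < max_level_) {
            ++level_;
        } else if (frame_ms < recover_below_ms_ && level_ > 0) {
            --level_;
        }
        return level_;
    }

    int level() const { return level_; }

private:
    int max_level_;
    float degrade_above_ms_;
    float recover_below_ms_;
    int level_ = 0;
};
```

With thresholds of 17 ms and 14 ms, a frame time hovering around 16 ms keeps whatever level is current instead of flipping back and forth every frame.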

Simulation Ownership

Ownership contracts preventing GPU readback stalls in gameplay-critical paths.
Design
  • CPU-owned, GPU-owned, and mirrored data product declarations
  • Six sync phases with explicit visibility rules
  • Summary components expose GPU results without bulk readback
  • No subsystem invents its own ownership vocabulary

Tools & Reference

Companion tools that let you drill through Kapi's execution layers, render passes, and runtime steps without launching the engine.

Target Platforms

Live on Windows (DirectX 12 & Vulkan), macOS (Metal 4), Xbox Series X|S (GDK · D3D12X), and Linux (Vulkan). Remaining console matrix in development. Engine targets late 2027 release.

Live
💻
Windows PC
DirectX 12 · Vulkan
🌍
macOS
Metal 4 · MetalFX · RT
🎮
Xbox Series X|S
GDK · D3D12X
🐧
Linux
Vulkan
In Development