I’m not a developer or mathematician — I’m a systems administrator. I had an intuition about how meaning might organise itself through phase relationships and wave mechanics rather than vector distances. I collaborated with AI (Claude) to formalise and test those ideas, and the result aligned with published research by Listopad (2025) — Wave-Based Semantic Memory with Resonance-Based Retrieval (arXiv:2509.09691).
The core idea: semantic relationships encoded as harmonic waveforms on circular embeddings, with retrieval through constructive interference rather than cosine similarity.
Repo: GitHub - atech-hub/Wave-Coherence-as-a-Computational-Primitive: Harmonic coherence as a universal relationship detection operator
Sharing it here in case anyone working in embedding spaces or retrieval finds it interesting or wants to take it further.
Hi, I just upload a version of the tests in Phyton. I’m sure this will make easier for the comunity.
Regards
Update: we’ve added Test 21 which shows cosine similarity returning 0.0 where harmonic sweep finds 1.0 on controlled data. The open question is whether real model embeddings contain this harmonic structure. If anyone has run the sweep against actual embeddings, I’d genuinely like to know the result — positive or negative.
Update: We ran the test ourselves — results are positive.
We used all-MiniLM-L6-v2 (384 dimensions) and applied spectral coherence analysis to real embeddings across 44 words in 6 relationship groups.
Cosine similarity cannot distinguish antonyms from synonyms. Look at these scores:
┌────────────────────┬───────────┬────────┐
│ Pair │ Type │ Cosine │
├────────────────────┼───────────┼────────┤
│ big / large │ synonym │ 0.81 │
├────────────────────┼───────────┼────────┤
│ fast / slow │ antonym │ 0.75 │
├────────────────────┼───────────┼────────┤
│ big / small │ antonym │ 0.68 │
├────────────────────┼───────────┼────────┤
│ happy / joyful │ synonym │ 0.68 │
├────────────────────┼───────────┼────────┤
│ happy / sad │ antonym │ 0.37 │
├────────────────────┼───────────┼────────┤
│ banana / democracy │ unrelated │ 0.19 │
└────────────────────┴───────────┴────────┘
“big/small” (opposites) scores the same as “happy/joyful” (same meaning). “fast/slow” (opposites) scores higher than some synonyms. The model knows these words are related but cosine similarity cannot express how — it has one number for everything.
Spectral variance can. When we decompose the embeddings via FFT and measure coherence per frequency band (rather than summing everything into one dot product), the variance across bands is:
- Synonyms: 0.0031 (flat — uniformly coherent across all bands)
- Antonyms: 0.0094 (3x higher — coherent in some bands, anti-coherent in others)
- Unrelated: 0.0215 (7x higher — incoherent noise)
Synonyms are coherent everywhere. Antonyms are coherent in some frequency bands but opposed in others — that opposition is the information cosine similarity sums away.
Different relationship types have distinct spectral profiles. Hierarchical pairs (animal→dog, vehicle→car), functional pairs (doctor→hospital, chef→kitchen), and analogical pairs (king→queen, father→mother) each produce a different shape of coherence across frequency bands. These are relationship-type fingerprints that a single cosine score destroys by summing.
Nobody designed this model for harmonic structure — it learned band-specific coherence patterns through gradient descent on sentence similarity. Our harmonic analysis framework provides the tool to detect what’s there but invisible to the standard comparison measure.
The analysis script is available at: python/embedding_analysis.py in the GitHub - atech-hub/Wave-Coherence-as-a-Computational-Primitive: Harmonic coherence as a universal relationship detection operator . Run it yourself with pip install sentence-transformers && python embedding_analysis.py.
24 tests, 5 corrective findings, all passing. Tagged as Release Real Embedding Validation — Cosine Similarity Blindness Confirmed in Production Models · atech-hub/Wave-Coherence-as-a-Computational-Primitive · GitHub.
Update 2: We built a transformer without tokens — harmonic embeddings match trained baseline when completely frozen
Following our previous update where we showed cosine similarity is blind to harmonic structure in all-MiniLM-L6-v2 embeddings, we asked the next question: if the structure is already there in trained models, what happens when you provide it from the start?
The experiment
Three identical character-level transformers (4 layers, 128 dim, 4 heads) trained on Shakespeare. No tokenizer. No BPE. Raw characters mapped to phase angles on the unit circle, embedded via harmonic expansion: [cos(theta), sin(theta), cos(2theta), sin(2theta), …]
┌────────────────────────────────────────────────────────────────┬──────────┬─────────────┐
│ Mode │ Val Loss │ vs Baseline │
├────────────────────────────────────────────────────────────────┼──────────┼─────────────┤
│ Baseline — random Gaussian init, trainable (industry standard) │ 1.5570 │ — │
├────────────────────────────────────────────────────────────────┼──────────┼─────────────┤
│ Harmonic — phase-encoded init, trainable │ 1.5223 │ -2.2% │
├────────────────────────────────────────────────────────────────┼──────────┼─────────────┤
│ Frozen — phase-encoded, NOT trainable │ 1.5567 │ -0.02% │
└────────────────────────────────────────────────────────────────┴──────────┴─────────────┘
Why this matters
Harmonic beats random at every single checkpoint. From step 0 through step 5,000, the structured initialization leads and the gap never closes. The model starts closer to the answer because the geometric relationships between characters are built in, not discovered through gradient descent.
The frozen result is the headline. Zero gradient updates to the embedding layer. 40,768 fewer trainable parameters. The model generates coherent Shakespearean dialogue using nothing but fixed cos(n * theta) vectors. The geometry alone carries the signal.
This means the standard approach — initialize with random noise, burn GPU cycles for the model to discover structure — is doing unnecessary work. The structure the model converges toward (as we showed in our previous post with spectral analysis of all-MiniLM-L6-v2) can be provided for free by construction.
And no tokens were needed. 65 characters. 65 phase angles. 128-dimensional harmonic vectors. No vocabulary table, no tokenizer training, no subword merges. The question of “how many tokens” becomes “how many characters” — and the answer is: however many are in the text.
The three-test chain
┌───────────────────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Test │ What it proved │
├───────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Test 21 (synthetic) │ Cosine similarity is blind to harmonic structure — per-channel sweep recovers what dot product destroys │
├───────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Test 24 (real embeddings) │ Real model vectors contain this structure — 3x spectral variance difference between synonyms and antonyms │
├───────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Test 25 (this result) │ Providing the structure from the start beats learning it from random noise — and works even when frozen │
└───────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Try it yourself
pip install torch --index-url https://download.pytorch.org/whl/cu128
cd python
python harmonic_transformer.py
Requires a CUDA GPU. Shakespeare dataset downloads automatically (~1MB). Trains in ~10 minutes on a consumer GPU.
All code is open source: GitHub - atech-hub/Wave-Coherence-as-a-Computational-Primitive: Harmonic coherence as a universal relationship detection operator , tagged as Release Harmonic Transformer — A Model That Doesn't Need To Learn Its Embeddings · atech-hub/Wave-Coherence-as-a-Computational-Primitive · GitHub .
25 tests, 5 corrective findings, all passing.
I wonder what harmonic decomposition would show on orbital frequency data.
Update — March 2026
The framework has reached its current ceiling at our available compute scale (RTX 4070 Ti, character-level Shakespeare):
-
Phase C result: 98.1% of MLP performance at 44% of parameters (Kerr-ODE + maestro bottleneck + progressive curriculum)
-
34 experimental phases, 64 defensive engine patterns, 7 corrective findings
-
Key findings since last update: wave-native FFN layers converge toward MLP with depth, a learned global coordination bottleneck (maestro) provides consistent improvement at all depths, and the ODE structure provides implicit regularisation — stable where MLP overfits at high parameter counts
-
Honest boundary: scaling beyond this point requires compute resources we don’t have. The architecture properties are measured and the blueprint is complete for anyone with a cluster to test at LLM scale.
Repository: https://github.com/atech-hub/Wave-Coherence-as-a-Computational-Primitive DOI: 10.5281/zenodo.18820365
Kerr Engine — Pure Rust training engine for Kerr-ODE transformers (3x faster than PyTorch at 128-dim, full GPU via WGSL)
Released the training engine for the Wave Coherence project as a standalone repo under Apache 2.0.
What it is: a specialised Rust engine for training and running Kerr-ODE transformers — the architecture that replaces dense MLP layers with physics-inspired wave propagation (98.1% of MLP at 44% of parameters).
Key numbers:
-
3x faster than PyTorch+CUDA at 128-dim, on CPU alone, GPU off
-
Full GPU backward pass at 768-dim via 13 WGSL compute shaders (NVIDIA, AMD, Intel, Apple — no CUDA dependency)
-
Hand-derived analytical gradients verified against PyTorch autograd (max diff 7.63e-6)
-
6,500 lines of Rust, 4 dependencies,
cargo build --releaseand you’re done
The WGSL backward pass shaders (attention backward, batched outer product, batched linear backward) appear to be the first open-source implementations of ML training backward passes in WGSL.
Repo: GitHub - atech-hub/kerr-engine · GitHub Parent project: GitHub - atech-hub/Wave-Coherence-as-a-Computational-Primitive: Harmonic coherence as a universal relationship detection operator · GitHub
Note: The Kerr-ODE is a novel architecture — it doesn’t work with LM Studio, Ollama, or llama.cpp today. The engine trains and runs inference natively. Ecosystem connector patterns are documented and published as prior art (Pattern 68) for anyone who wants to build bridges.
Built with AI collaboration (Claude Desktop + Claude Code). Stated openly.
Kerr-ODE: Full Stack Now Live — Train, Serve, Chat
Three updates since the last post:
1. Inference Server Released kerr-server — 640 lines of Rust, OpenAI-compatible API. Load a checkpoint, serve on localhost, connect any chat UI. SSE streaming, bearer token auth, temperature/top-k/top-p sampling.
Tested with LM Studio 0.4.6 via the openai-compat-endpoint plugin. Selected “gpt-4.1” from the model dropdown (the server doesn’t care what name the client sends — it serves whatever checkpoint is loaded). Typed “hello.” A 354K parameter Shakespeare-trained Kerr-ODE model responded with character-level Shakespeare fragments through the GPT-4.1 label. Not exactly GPT-4.1’s output, but the pipeline works end to end.
The full pipeline: train a model with the engine → serve it with kerr-server → chat through LM Studio or any OpenAI-compatible client. No custom clients needed.
2. Engine Performance: 13s → 1.72s at 768-dim Three new WGSL compute shaders (matvec_batch, layer_norm_batch, kerr_step_batch) batch all forward pass operations across positions in single dispatches. Forward pass went from 8 seconds to 500ms — 16x speedup. Total iteration from 13s to 1.72s. 17 shaders total, ~7,200 lines of Rust. GPU at 38%, 49°C, 1.2GB VRAM training a 12M parameter model on an RTX 4070 Ti.
3. BPE Tokenizer Support The engine and server now accept any HuggingFace tokenizer.json via --bpe flag. Qwen, Llama, GPT-2 — borrow their vocabulary, train your own Kerr-ODE model from scratch. No more character-level only.
The stack:
-
Wave Coherence — research framework, 68 defensive patents (MIT)
-
Kerr Engine — training, 17 WGSL shaders, 3x faster than PyTorch at 128-dim (Apache 2.0)
-
Kerr Server — inference, OpenAI-compatible API (Apache 2.0)
All open source. All documented. Contributor targets listed in both READMEs.
Wave Memory: Persistent Experience for Kerr-ODE Models
New investigation results and a fourth repo.
The concept: Model weights are the education — frozen, never change. A separate 1.5KB file stores accumulated harmonic band states — the experience. Each conversation shifts the Kerr-ODE’s starting position on the unit circle. Same model, different trajectory, different output. The model reads and writes memory in the same coordinate system it thinks in — no translation, no vector database, no RAG retrieval.
The mechanism: During inference, the ODE final states feed an exponential moving average. Bands consistently active across tokens accumulate. Bands that spike once and fade contribute nothing. At conversation end, the accumulator merges into the persistent file. Next conversation starts from a different position because of what came before.
5 experiments, 4 passes, 1 honest null:
| Experiment | Result |
|---|---|
| Injection sensitivity | Random noise at α=0.05 improves perplexity by 8.8% (stochastic resonance) |
| Accumulation stability | Converges over 20 conversations, growth rate 280%→9.5% |
| Topic separation (char-level) | NULL — captures corpus texture, not topic. Bounded by model capacity |
| Topic separation (word-level) | PARTIAL POSITIVE — love→"fair",“give thee” vs war→"dishonour",“death” from same prompt |
| Memory reset | Bit-identical to baseline after deletion |
| Anomaly detection | Spike caught immediately before affecting output |
The stochastic resonance finding is the unexpected one. Injecting random noise into ODE initial conditions makes the model generate better, not worse. The nonlinear Kerr dynamics (self-phase modulation, cross-phase coupling) use the perturbation constructively. Standard transformers degrade with any perturbation — they have no mechanism to exploit noise. The Kerr-ODE does.
The topic separation scaled with tokenisation: character-level saw only texture (0.987 correlation between love and war memories). Word-level saw tone — same top bands but 2x energy difference, reordered peaks, and measurably different generation. BPE with a larger model is predicted to enable full semantic separation.
The safety model: Delete the file → model returns to trained baseline (verified bit-identical). Inspect the file → harmonic census shows exactly what accumulated. The model weights never change during inference. Memory is experience, not education. Neither corrupts the other.
The repos:
-
kerr-memory — wave memory library, 920 lines Rust, zero dependencies (Apache 2.0)
-
kerr-server — serves models with
--memoryflag (Apache 2.0) -
kerr-engine — trains models that produce the checkpoints (Apache 2.0)
-
Wave Coherence — research framework, 70 defensive patents (MIT)
Full pipeline works: train → serve → chat → accumulate → save → inspect. All Rust, all open source.
**Wave-Engine: New Architecture, Honest Numbers
**
The kerr-engine and kerr-server are now parked. Two new repos replace them:
-
wave-engine — training (Apache 2.0)
-
wave-server — inference, OpenAI-compatible API with KV-cache (Apache 2.0)
The kerr repos stay public as historical reference. Their READMEs point to the new repos. Everything validated in kerr-engine carries forward — maestro dim=16, curriculum training, stochastic resonance, all of it.
What changed and why
The kerr-engine proved the core concept (98.1% of MLP at 44% params). But three architectural limits showed up during scaling:
-
Sequential blocks — attention had to finish before FFN could start. Wave-engine uses GPT-J parallel blocks: attention and FFN read the same normalised input, run simultaneously, outputs sum into the residual stream.
-
Trained attention — standard dot-product attention consumed parameters and compute. Wave-engine uses frozen harmonic coherence attention — phase-based scoring from a mathematical structure, zero attention parameters trained. The attention pattern is determined entirely by harmonic embedding geometry.
-
RK4-16 ODE — 16 sequential integration steps per layer, not parallelisable. Wave-engine uses a perturbative ODE inspired by techniques from telecom fiber optics DSP. Single-pass analytical computation. 14x fewer arithmetic operations, MSE 0.000005 vs RK4-16. Trains better because the true gradient flows through the perturbative computation, not an identity backward approximation.
The numbers
Training tiers from a single Rust binary, single cargo run command:
| Tier | Flag | Loss @ 199 | Speed | Params | Hardware |
|---|---|---|---|---|---|
| CPU | (none) | 2.52 | 520ms/iter | 2.63M | Any computer |
| wgpu GPU | --gpu |
2.52 | 520ms/iter | 2.63M | Any GPU (Vulkan/Metal/DX12) |
| Candle CUDA | --candle |
2.81 | 213ms/iter | 657K | NVIDIA only |
Measured March 22 2026: 4 layers, seq=64, batch=4, 200 iters, Shakespeare, no curriculum, RTX 4070 Ti.
CPU and wgpu produce identical loss at every single iteration — same init, same math. Candle is 2.4x faster with block-diagonal output projection (6 groups of 128×128) and perturbative ODE — 4x fewer FFN parameters, faster convergence, slightly higher loss from cosine LR warmup. VRAM rock-solid at 1329MB.
All three tiers produce compatible checkpoints. A model trained on CPU can be served from GPU, or vice versa. The wgpu tier runs on AMD, Intel, Apple Silicon — no CUDA required.
One honest null
We built a hybrid converter — take Qwen 2.5 0.5B, keep the attention layers, replace the MLP with our ODE layers, distill. The idea: maybe trained MLP weights contain hidden wave structure we can tap into.
Ran SVD and DFT analysis on all 72 weight matrices across 24 layers. The answer: no. Full effective rank 896/896 everywhere. Flat frequency spectrum (33/33/33% low/mid/high). No near-identity layers. No structured frequency content to exploit.
The “translate existing model to waves” path doesn’t exist. Wave-engine models need to be trained from scratch. The efficiency gains come from learning a different representation, not compressing an existing one. Finding archived. Hybrid experiment parked.
What’s next
Training a wave-engine model on diverse real English with BPE tokenization and education curriculum (grammar → children’s literature → general English → domain-specific). The architecture is validated. The infrastructure is built. The next question is whether it generates coherent text at 24 layers on real data.
80 defensive patterns now published in the research repo under MIT. Everything that makes the engine work is documented and open.
The stack
| Repo | What | License |
|---|---|---|
| Wave Coherence | Research framework, 80 defensive patterns | MIT |
| wave-engine | Training, 3 tiers, perturbative ODE, block-diagonal | Apache 2.0 |
| wave-server | Inference, OpenAI-compatible API, KV-cache, wave memory | Apache 2.0 |
| kerr-memory | Wave memory library, works with both engines | Apache 2.0 |
| kerr-engine | Original prototype (parked, historical) | Apache 2.0 |
| kerr-server | Original server (parked, historical) | Apache 2.0 |
No Python. No pip. No CUDA toolkit required. One cargo build --release, one cargo run.
I understand. It is because the system has no “North Star”. Look into the Life-First Decision Invariant (LFDI) protocol and try that and let me know if it then works better. I have had 3am text here telling me, “it is working!!!”, from an MIT mathematician who was involved with scaleAI and openAI. ![]()