Add frontend stress test results and sample files to model card
README.md
@@ -92,6 +92,33 @@ The raw frankenmerge had code formatting issues (garbled code blocks, missing br
Three programming tests still fail on the healed version: one function-naming issue, one missing JavaScript parenthesis, and one pytest-generation prompt where the model does not produce a code block. These are residual formatting artifacts from the merge.

## Frontend Code Generation: Stress Test Results

We put the healed model through a frontend stress test: six increasingly complex HTML/CSS/JS generation tasks, each requiring thousands of tokens of structurally valid code. Each output was scored against a task-specific checklist:

| Test | What We Asked For | Checks Passed | Output Size |
|---|---|---|---|
| Weather Dashboard | Responsive dashboard, CSS vars, dark mode toggle, 5-day forecast grid | **9/9** | 14.5K chars |
| E-Commerce Product Page | Image gallery, color swatches, quantity selector, tabbed content, sticky mobile bar | **12/12** | 16.7K chars |
| Animated SaaS Landing | Moving gradient, typing animation, IntersectionObserver scroll reveals, auto-rotating testimonial carousel, 3 pricing tiers | **13/13** | 24.1K chars |
| Analytics Dashboard | SVG bar chart with tooltips, SVG donut chart, sortable data table, collapsible sidebar, dark theme | **13/13** | 22.3K chars |
| Multi-Step Registration | 3-step form wizard, real-time validation, password strength meter, state dropdown, animated transitions, success modal | **12/12** | 23.3K chars |
| Snake Game | Canvas game loop, arrow key controls, collision detection, localStorage high score, increasing difficulty | **11/12** | 11.2K chars |

**70/71 total checks passed (98.6%)**

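The total can be re-derived from the per-test rows. A quick tally, with the numbers copied from the table (`RESULTS` and `tally` are illustrative names), gives 70 passed out of 71 checks:

```python
# Per-test results copied from the stress-test table: (test, checks passed, checks total).
RESULTS = [
    ("Weather Dashboard", 9, 9),
    ("E-Commerce Product Page", 12, 12),
    ("Animated SaaS Landing", 13, 13),
    ("Analytics Dashboard", 13, 13),
    ("Multi-Step Registration", 12, 12),
    ("Snake Game", 11, 12),
]


def tally(results):
    """Sum passed and total checks across tests and return the pass rate in percent."""
    passed = sum(p for _, p, _ in results)
    total = sum(t for _, _, t in results)
    return passed, total, 100 * passed / total


passed, total, rate = tally(RESULTS)
print(f"{passed}/{total} checks passed ({rate:.1f}%)")  # 70/71 checks passed (98.6%)
```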
All six outputs had:
- **Balanced CSS braces** (zero imbalance across all 6 files)
- **Balanced JS parentheses** (zero imbalance across all 6 files)
- **No garbled or hallucinated text**
- **Working JavaScript**: dark mode toggles, IntersectionObserver animations, SVG chart rendering, form validation, canvas game loops

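The README does not say how brace and parenthesis balance was measured. A minimal sketch, assuming the check simply counts openers minus closers inside `<style>` and `<script>` elements (the helpers `block_contents`, `imbalance` and `structural_report` are hypothetical). A count of zero means the totals match; it does not verify nesting order, skip brackets inside strings or comments, or look at inline `style="..."` attributes.

```python
import re
from typing import Dict


def block_contents(html: str, tag: str) -> str:
    """Concatenate the bodies of every <tag>...</tag> element (e.g. style, script)."""
    pattern = rf"<{tag}\b[^>]*>(.*?)</{tag}\s*>"
    return "\n".join(re.findall(pattern, html, flags=re.DOTALL | re.IGNORECASE))


def imbalance(text: str, open_ch: str, close_ch: str) -> int:
    """Openers minus closers; 0 means the counts match (nesting order is not checked)."""
    return text.count(open_ch) - text.count(close_ch)


def structural_report(html: str) -> Dict[str, int]:
    """Brace imbalance in embedded CSS and parenthesis imbalance in embedded JS."""
    css = block_contents(html, "style")
    js = block_contents(html, "script")
    return {
        "css_braces": imbalance(css, "{", "}"),
        "js_parens": imbalance(js, "(", ")"),
    }
```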
The only miss: the Snake game had a minor closing tag typo (`html>` instead of `</html>`) at the very end.

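A strict tag-matching pass would catch this kind of typo, because `html>` is parsed as plain text and the opening `<html>` is left unclosed. A sketch using the standard-library `html.parser.HTMLParser` (the `TagBalanceChecker` class and `unclosed_tags` helper are hypothetical, not part of the original test harness). It is stricter than the HTML spec: elements whose end tags are optional, such as `<p>` or `<li>`, are reported if left open.

```python
from html.parser import HTMLParser
from typing import List

# Void elements never take a closing tag, so they are not tracked.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Tracks open elements with a stack; whatever remains at the end is unclosed."""

    def __init__(self):
        super().__init__()
        self.stack: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closed tags like <br/> neither open nor close anything

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()


def unclosed_tags(doc: str) -> List[str]:
    """Return the tags still open after parsing the whole document."""
    checker = TagBalanceChecker()
    checker.feed(doc)
    checker.close()
    return checker.stack
```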
This is remarkable for a frankenmerge of two 9B models with only 1,000 steps of QLoRA healing. The model produces **complete, working frontend applications**, not just syntactically valid HTML: interactive pages with modern CSS (Grid, Flexbox, custom properties, keyframe animations) and non-trivial JavaScript (IntersectionObserver, requestAnimationFrame game loops, real-time form validation, SVG chart generation).

All six sample HTML files are included in the `samples/` directory of this repo; download them and open them in a browser to see for yourself.

## Architecture

| Property | Value |