Looking for feedback: a custom semantic representation, "bryła", for small models

Hi! I'm Krzysiek, self-taught. I've been working on this alone,
after night shifts, since December 2025.

I built something I call "bryły" (Polish for "solids"): instead
of feeding the model raw text, a parser decomposes each sentence
into objects carrying 20 metadata tokens: emotional color,
urgency, strength of intent, information source, importance,
topic continuation, and relations between elements.
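To make the format concrete, here is a minimal sketch of what one such object might look like. The field names and the serialization scheme are assumptions for illustration; the post does not give the exact schema:

```python
from dataclasses import dataclass, field

@dataclass
class Bryla:
    """One parsed sentence plus its metadata (hypothetical field names)."""
    text: str
    affect: str           # emotional color, e.g. "neutral", "frustrated"
    urgency: int          # 0..3
    intent_strength: int  # 0..3
    source: str           # e.g. "firsthand", "hearsay"
    is_core: bool         # importance / salience
    topic_continues: bool
    relations: list = field(default_factory=list)  # links to other elements

    def to_tokens(self) -> str:
        """Serialize metadata as special tokens prepended to the text."""
        return (f"<affect:{self.affect}> <urg:{self.urgency}> "
                f"<int:{self.intent_strength}> <src:{self.source}> "
                f"<core:{int(self.is_core)}> <cont:{int(self.topic_continues)}> "
                f"{self.text}")

b = Bryla("Please fix this today.", "frustrated", 3, 2, "firsthand", True, False)
print(b.to_tokens())
# <affect:frustrated> <urg:3> <int:2> <src:firsthand> <core:1> <cont:0> Please fix this today.
```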

Results (same 53M-parameter Transformer, same 1.92MB corpus,
same hyperparameters):

  • v7 (raw text): val_ppl = 31.06
  • v8 (+ affect/is_core): val_ppl = 24.12
  • v9 (+ pragmatics): val_ppl = 24.02

A 22% improvement with no change to the model architecture,
only a richer input. Everything runs on an RTX 2060 12GB.

Hypothesis: "a bryła carries ready-made meaning, so the model
has less to recompute from scratch."

I'm looking for:

  • Feedback: does this make sense? What should I improve?
  • People facing similar problems on weak hardware
  • Ideas for applications I'm not seeing
  • Possibly collaboration

I'm not an expert, and I don't pretend to be. I'm learning and
experimenting. If you're working on something similar, reach out!


This seems like a promising approach so far:


My view

Yes — your idea makes sense.

Not as “I solved meaning,” but as:

“I gave a small model a better, more structured input, so it had less hidden work to do.”

That is a real research direction. Recent work like L2T and Structural Guidance for Transformer Language Models argues that models can learn better when structure is made more explicit instead of leaving everything inside raw next-token prediction. (arxiv.org)

What bryła seems to be

To me, bryła looks like a mix of three things:

  • semantic representation — who did what, what relates to what
  • discourse representation — what continues the topic, what is central
  • control information — urgency, emotion, strength of intent, source

That mix is interesting because standard meaning-representation work like AMR and UMR focuses more on events, arguments, coreference, time, and modality than on things like urgency or emotional color. So bryła is not just “another AMR clone.” It seems broader and more practical. (aclanthology.org)

What your result probably means

Your result probably means:

“For this model and this dataset, bryła makes the learning problem easier.”

That is already a strong result.

I would be careful with a stronger claim like “the model understands better,” because when the representation changes, plain token-level perplexity can become hard to compare directly. The Paloma benchmark paper explicitly warns that perplexity is tied to tokenization. (arxiv.org)

So I would say:

  • your result is real
  • your result is promising
  • your result is not yet final proof of better semantic understanding

What looks strongest in your numbers

This part stands out:

  • v7 → v8: big gain
  • v8 → v9: small gain

That usually means the first added signals did most of the work.

So my first guess would be:

  • affect
  • core / salience
  • maybe basic discourse structure

are doing more than the later pragmatic additions.

That is good news. It suggests you may already have the important part, and the next step is probably simplifying, not adding more tags.

Why it could work

I think bryła may help in four simple ways.

1. Less ambiguity

Different sentences with similar meaning may become more similar after parsing. That makes learning easier. This is one of the classic reasons people use meaning representations like AMR. (aclanthology.org)

2. Hidden information becomes visible

Small models often struggle to infer things like source, salience, continuity, or intent from a tiny corpus. bryła exposes those signals directly. That is very close to the logic behind structured pretraining and control-style conditioning. (arxiv.org)

3. Shorter path to useful patterns

Instead of forcing the model to discover everything from surface text, you hand it some of the structure up front. That is exactly the kind of shortcut that can help small models more than big ones. (aclanthology.org)

4. Better controllability

Some of your fields are not only “meaning.” They are also useful control variables. Recent work on continuous control signals is relevant here. (arxiv.org)

What I would improve next

These would be my top priorities.

1. Find out which tags matter most

Do ablations:

  • remove affect
  • remove is_core
  • remove urgency
  • remove source
  • remove topic continuation
  • remove relations

Right now the biggest unanswered question is:

Which part of bryła is doing the real work?
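The ablation sweep above can be scripted as a simple loop. The harness below assumes a hypothetical `train_and_eval(dropped=...)` function that retrains with the listed fields removed and returns validation perplexity; only the bookkeeping is meant literally:

```python
# Hypothetical harness: train_and_eval(dropped=[...]) -> val_ppl.
FIELDS = ["affect", "is_core", "urgency", "source",
          "topic_continues", "relations"]

def run_ablations(train_and_eval):
    """Drop one field at a time and rank fields by how much
    validation perplexity worsens without them."""
    baseline = train_and_eval(dropped=[])
    results = {}
    for f in FIELDS:
        ppl = train_and_eval(dropped=[f])
        # Positive delta = perplexity got worse without this field,
        # i.e. the field was doing real work.
        results[f] = ppl - baseline
    return dict(sorted(results.items(), key=lambda kv: -kv[1]))
```

Ranking by the perplexity delta directly answers "which tag matters most" with a single number per field.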

2. Test raw text + bryła together

Do not test only:

  • raw text
  • bryła only

Also test:

  • raw + bryła

My guess is that this may become your best setup. Recent structured-pretraining work points more toward hybrid setups than total replacement of raw text. (arxiv.org)
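Serialization-wise, the hybrid condition can be as simple as prepending the metadata tokens to the untouched raw sentence, so the three experimental conditions differ only in what the data collator emits. The tag format here is an assumption, not the post's actual scheme:

```python
def serialize(sentence: str, meta: dict, mode: str) -> str:
    """Build the model input for one of three experimental conditions."""
    tags = " ".join(f"<{k}:{v}>" for k, v in sorted(meta.items()))
    if mode == "raw":
        return sentence
    if mode == "bryla":
        return tags
    if mode == "raw+bryla":
        return f"{tags} {sentence}"  # structure first, surface form second
    raise ValueError(f"unknown mode: {mode}")

meta = {"affect": "neutral", "core": 1}
print(serialize("The pump failed twice.", meta, "raw+bryla"))
# <affect:neutral> <core:1> The pump failed twice.
```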

3. Add one or two real tasks

Not only validation perplexity.

Try tasks where your metadata should help:

  • complaint vs request
  • urgency classification
  • source-aware summarization
  • semantic retrieval
  • dialogue-state tracking

That will make your claim much stronger.

4. Test noise

Corrupt some bryła tags on purpose.

If performance crashes instantly, the system may be brittle.
If it degrades slowly, that is much better evidence.
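A corruption sweep can be scripted in a few lines. The per-field value sets and the evaluation hook are placeholders; the idea is to evaluate a frozen model at several corruption rates and look at the shape of the degradation curve, not a single point:

```python
import random

def corrupt_tags(examples, rate, field_values, rng=None):
    """Randomly replace a fraction `rate` of tag values with wrong ones."""
    rng = rng or random.Random(0)
    out = []
    for meta in examples:
        noisy = dict(meta)
        for f, vals in field_values.items():
            if rng.random() < rate:
                wrong = [v for v in vals if v != meta[f]]
                noisy[f] = rng.choice(wrong)
        out.append(noisy)
    return out

# Sweep idea: evaluate the same frozen model at 0%, 10%, 30%, 50%
# corrupted tags and plot perplexity vs. corruption rate.
```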

5. Try continuous values for some fields

Fields like:

  • urgency
  • importance
  • strength of intent
  • emotional intensity

may work better as numbers or continuous embeddings than as discrete labels. That is a real open question, and recent control-signal work suggests it is worth testing. (arxiv.org)
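One cheap way to test this is to encode a scalar field with sinusoidal features and add the result to the token embedding, rather than minting one discrete token per level. The key property is that nearby values get nearby embeddings, which discrete labels do not. A dependency-free sketch (the encoding choice is an assumption):

```python
import math

def continuous_field_embedding(value: float, dim: int) -> list:
    """Encode a scalar field (urgency, importance, ...) as a dense
    sinusoidal vector, so similar intensities land close together."""
    return [math.sin(value / (10000 ** (2 * (i // 2) / dim))) if i % 2 == 0
            else math.cos(value / (10000 ** (2 * (i // 2) / dim)))
            for i in range(dim)]

def add_to_token_embedding(tok_emb: list, field_emb: list) -> list:
    """Fuse the field embedding into the token embedding by addition."""
    return [t + f for t, f in zip(tok_emb, field_emb)]
```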

What I would be careful about

1. Perplexity comparisons

If tokenization or serialization changed, the comparison may be less clean than it looks. That does not make the gain fake, but it changes how strong the conclusion is. (arxiv.org)
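One way to keep the comparison honest is to normalize by something tokenizer-independent: report bits per character of the original raw text instead of per-token perplexity. A tiny helper, assuming you can sum the model's token-level negative log-likelihoods (in nats) over the evaluation set:

```python
import math

def bits_per_char(total_nll_nats: float, num_raw_chars: int) -> float:
    """Convert a summed NLL (in nats, over the model's own tokens) into
    bits per character of the ORIGINAL raw text, so models with different
    tokenizations/serializations become directly comparable."""
    return total_nll_nats / (num_raw_chars * math.log(2))
```

Both the v7 and v8/v9 runs would then be scored against the same character count of the underlying raw corpus, regardless of how the bryła serialization inflates the token stream.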

2. Shortcut labels

Some fields might act like hints or labels rather than general semantics.

3. Parser errors

If the parser is noisy, the whole system becomes noisy. The AMR/LLM literature shows that parser-generated structures can create cascading errors. (aclanthology.org)

4. Overclaiming

I would avoid saying:

  • “I built a universal semantic representation”
  • “I proved symbolic structure is better than raw text”

I would say:

  • “I built a semantic-pragmatic representation that seems to improve learning efficiency for small models.”

That is a strong claim and much easier to defend.

Where people working on this gather

For people working on related ideas with limited resources, I would watch:

  • BabyLM — best match for small-model, limited-data thinking (babylm.github.io)
  • LoResLM — closer to low-resource language-model work (aclanthology.org)
  • DMR — best fit if you want feedback on the representation itself (aclanthology.org)

Best use cases

I think bryła may be especially good for:

  • support / triage
  • source-aware summarization
  • semantic search
  • dialogue memory / topic continuity
  • low-resource or narrow-domain assistants

Why these? Because your tags are about importance, source, continuity, and intent — and those matter a lot in these settings. The AMR applications literature also shows that structured meaning is often most useful in targeted downstream tasks rather than everywhere equally. (aclanthology.org)

Bottom line

My simple verdict:

  • The idea makes sense
  • The result is strong enough to be taken seriously
  • The safest claim is “better structured supervision for small models,” not “solved meaning”
  • Your next step should be ablations, hybrid input, and task-based evaluation

That is the path from “interesting experiment” to “credible result.”

Thank you for this incredible analysis. This is exactly
the kind of feedback I was hoping for.

Your priorities match my intuition — I suspected that
affect and is_core do most of the work, and I’m planning
ablation experiments to confirm this.

The hybrid approach (raw + bryła) is something I haven't
tried yet — that's now high on my list.

I’ll look into BabyLM and the papers you referenced.
This gives me a clear roadmap.

Really appreciate you taking the time.
