
Why Do We Need to Categorize or Label? Why Do Computers / AI Need a System for This Step?

  • Writer: Renee Li
  • Nov 19
  • 4 min read

Let's go one step deeper, starting from "why do we need to categorize or label?"!


It is preparing for cross-domain application! That's how we can build Transformer AI models!

Without getting too technical or explaining too many details, I will again use Copilot, as it brilliantly provides a short summary!


🔹 Why We Need to Categorize or Label

  • Categorization is how we turn raw existence into meaning.

  • It’s the act of saying: this person is a reviewer, this manuscript is published, this feedback is positive.

  • Without labels, everything is just undifferentiated data—no roles, no states, no history.


🔹 Why Computers Need a System for This

  • Computers don’t “see” meaning; they only see rows, tokens, or vectors.

  • A system of categorization tells them how objects relate, what rules apply, and what new attributes emerge.

  • This makes queries possible, rules enforceable, and states trackable.

  • In short: labeling is the bridge between raw data and structured logic.


🔹 Going One Step Deeper

  • Categorization isn’t just about order—it’s about cross‑domain generalization.

  • Once you model “Assignment” or “Feedback” as entities, you can reuse them in education, healthcare, finance, publishing.

  • This universality is exactly how Transformer AI models work: they don’t just learn one dataset, they learn the pattern of relationships across domains.

  • Labeling prepares data to be portable, reusable, and generalizable—the foundation for cross‑domain intelligence.


If you have more time to read, here is a more detailed and expanded explanation.


Let’s go one step deeper into why categorization (labeling) is necessary, and why computers need systems to do it.


🔹 Why We Need to Categorize or Label

  1. Human Cognition Mirror

    • Humans understand the world by grouping things: “This person is a reviewer,” “This manuscript is published.”

    • Categories reduce complexity—without them, every object would be an undifferentiated blob of attributes.

    • Labeling is how we impose meaning on raw data.

  2. Identity Transformation

    • A manuscript isn’t just “a document.”

    • Once assigned, it becomes “a document under review.”

    • Once published, it becomes “a document in issue #45.”

    • Labels mark these transformations, so we can track states and histories.

  3. New Attributes Emerge

    • Categorization isn’t cosmetic—it generates new data.

    • Example: the “Reviewer” label creates attributes like DateAssigned, Recommendation.

    • Without the label, those attributes wouldn’t exist.
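A minimal sketch of this idea in Python (the class and attribute names here are illustrative, not from any real system): the "Reviewer" label is modeled as its own entity, and the new attributes exist only because the label does.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Person:
    name: str

@dataclass
class Manuscript:
    title: str
    status: str = "submitted"  # a label that changes as the manuscript transforms

# "Reviewer" is not a property of Person alone: it is an interaction
# entity, and it brings new attributes into existence.
@dataclass
class ReviewerAssignment:
    person: Person
    manuscript: Manuscript
    date_assigned: date          # exists only because of the label
    recommendation: str = ""     # exists only because of the label

alice = Person("Alice")
paper = Manuscript("On Labeling")
assignment = ReviewerAssignment(alice, paper, date.today())
```

Delete the `ReviewerAssignment` entity and `date_assigned` and `recommendation` have nowhere to live—exactly the point above.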


🔹 Why Computers Need Systems to Do This

  1. Relational Integrity

    • Computers don’t “see” meaning—they only see rows and columns.

    • Categorization (via entities) tells the system how objects relate and what rules apply.

    • Example: A Reviewer role links Person to Manuscript with constraints (cannot review own manuscript).

  2. Query Power

    • Without categories, queries become impossible.

    • You can’t ask: “Show me all manuscripts reviewed in 2025” unless the system has labeled those interactions.

    • Categories give the computer handles to filter, join, and aggregate.

  3. Business Logic Enforcement

    • Rules live in categories.

    • Example: “At least 3 reviewers per manuscript” is enforced because the system knows who is labeled as Reviewer.

    • Without categorization, the computer can’t enforce reality’s demands.

  4. Scalability & Evolution

    • As systems grow, new roles and states appear.

    • Categorization lets computers adapt without collapsing into chaos.

    • Metadata + entities = a framework for evolution.
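The query-power and business-logic points above can be sketched with an in-memory SQLite database (the schema, table names, and data are made up for illustration): because the Reviewer label lives in its own table, the system can both enforce "cannot review own manuscript" and answer "show me all manuscripts reviewed in 2025."

```python
import sqlite3

# Toy schema: the Reviewer label is a row in its own table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE manuscript (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
CREATE TABLE review     (person_id INTEGER, manuscript_id INTEGER, year INTEGER);
""")
conn.execute("INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob')")
conn.execute("INSERT INTO manuscript VALUES (1, 'On Labeling', 1)")

def assign_reviewer(person_id, manuscript_id, year):
    # Business rule: a person cannot review their own manuscript.
    author_id = conn.execute(
        "SELECT author_id FROM manuscript WHERE id = ?", (manuscript_id,)
    ).fetchone()[0]
    if author_id == person_id:
        raise ValueError("cannot review own manuscript")
    conn.execute("INSERT INTO review VALUES (?, ?, ?)",
                 (person_id, manuscript_id, year))

assign_reviewer(2, 1, 2025)   # Bob reviews Alice's manuscript: allowed

# Query power: the label gives the computer a handle to filter and join on.
rows = conn.execute("""
    SELECT m.title FROM manuscript m
    JOIN review r ON r.manuscript_id = m.id
    WHERE r.year = 2025
""").fetchall()
```

Calling `assign_reviewer(1, 1, 2025)` raises `ValueError`, because the categorization makes the rule checkable; without the `review` table, neither the constraint nor the query would be expressible.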


✅ Conclusion

We categorize because reality demands order, and computers need explicit systems to mirror that order.

  • Humans: categories = understanding.

  • Computers: categories = enforceable meaning.

  • Entities born from interactions are the bridge between raw data and structured reality.


Expansion on the hidden purpose of categorization and interaction entities:


🔹 Cross‑Domain Application

  • Foundational objects (Person, Manuscript, Interest) are domain‑specific.

  • Interactions (Assignment, Feedback, Publication) are domain‑agnostic patterns: they can be reused across contexts.

  • By modeling interactions as distinct entities, you’re not just solving this manuscript system—you’re preparing the schema to scale into other domains (education, finance, healthcare, publishing, etc.).


✅ Why This Matters

  1. Reusability

    • The same “Assignment” pattern applies to:

      • Students assigned to courses

      • Doctors assigned to patients

      • Employees assigned to projects

    • Once you abstract it, you can port it anywhere.

  2. Interoperability

    • Cross‑domain systems (like ERP, CRM, HR, publishing platforms) rely on shared interaction entities.

    • A “Feedback” entity works in peer review, performance reviews, product reviews.

  3. Scalability

    • Instead of hardcoding roles and states, you build interaction templates.

    • That makes your system future‑proof: new domains just plug into the same interaction framework.

  4. Business Logic Bridge

    Cross‑domain applications need consistent rules:

    • “At least 3 reviewers per manuscript”

    • “At least 2 doctors per surgery”

    Categorization entities let you enforce these rules across domains.
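As a sketch of this portability (the class names and the generic `Pool` rule are my own illustrative invention, not a standard API), one domain-agnostic "Assignment" template can serve reviewers-per-manuscript and doctors-per-surgery alike:

```python
from dataclasses import dataclass, field
from typing import Generic, List, TypeVar

A = TypeVar("A")  # the assignee: student, doctor, employee, reviewer...
T = TypeVar("T")  # the target:   course, patient, project, manuscript...

@dataclass
class Assignment(Generic[A, T]):
    # One domain-agnostic interaction template, reused everywhere.
    assignee: A
    target: T

@dataclass
class Pool(Generic[A, T]):
    minimum: int  # e.g. "at least 3 reviewers per manuscript"
    assignments: List[Assignment[A, T]] = field(default_factory=list)

    def is_satisfied(self) -> bool:
        # The same rule shape works in every domain.
        return len(self.assignments) >= self.minimum

# The same pattern ports across domains:
reviewers = Pool[str, str](minimum=3)
reviewers.assignments.append(Assignment("Alice", "Manuscript #1"))

surgeons = Pool[str, str](minimum=2)
surgeons.assignments += [Assignment("Dr. Kim", "Surgery A"),
                         Assignment("Dr. Lee", "Surgery A")]
```

Nothing in `Assignment` or `Pool` mentions manuscripts or surgeries; the domain plugs into the interaction framework, not the other way around.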


🔹 Metaphor

Think of it like musical notation:

  • Notes (foundational objects) are specific to a song.

  • But the notation system (interactions) is universal—you can apply it to jazz, classical, or rock.

  • Once you have the notation, you can play across domains.


✅ Conclusion

The reason we categorize and create new entities for interactions is to prepare for cross‑domain application. It’s not just about manuscripts—it’s about building a generalizable architecture of meaning.


🔹 Cross‑Domain Categorization → Transformer Foundations

  • In databases, interactions label foundational objects, creating new entities with attributes.

  • In Transformers, attention mechanisms label tokens (words, pixels, events) by categorizing them in relation to others.

  • Both are about contextual meaning:

    • ERD: “This Person is a Reviewer of this Manuscript.”

    • Transformer: “This word is important in relation to that word.”


🔹 Why Categorization Prepares for Cross‑Domain AI

  1. Generalization

    • By abstracting interactions into entities, you create reusable patterns.

    • Transformers do the same: they don’t just learn one language—they learn the pattern of relationships across languages, domains, and modalities.

  2. Contextual Labeling

    • Database: Assignment labels a Person as Reviewer in context of Manuscript.

    • Transformer: Attention labels a token as relevant in context of a sentence.

    • Both are dynamic categorization engines.

  3. New Attributes = Embeddings

    • In ERD, interactions generate new attributes (DateAssigned, Recommendation).

    • In Transformers, attention generates new embeddings (weighted vectors that encode meaning).

    • Both are derived properties born from relationships.
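A toy scaled dot-product attention sketch makes the parallel concrete (the token embeddings are made-up numbers; this is the textbook attention formula, not any particular model's implementation). The softmax weights act like dynamic labels—"how relevant is token j to token i?"—and the output rows are new embeddings born from those relationships:

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: each token's new embedding is a
    # weighted mix of all value vectors, with weights from a softmax
    # over query-key similarity.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Three tokens with made-up 4-dimensional embeddings.
X = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 0.0]])

# Self-attention: the same tokens serve as queries, keys, and values.
new_embeddings, labels = attention(X, X, X)
```

Each row of `labels` sums to 1—a categorization of the other tokens' relevance—and each row of `new_embeddings` is a derived property, just as DateAssigned is derived from the Reviewer assignment.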


🔹 Metaphor

Think of it like alchemy:

  • ERD interactions stamp objects with new categories.

  • Transformers stamp tokens with new weights.

  • Both are rituals of transformation, turning raw material into structured gold.


✅ Conclusion

  • Modeling interactions as entities isn’t just good database practice—it’s the conceptual scaffolding for cross‑domain AI.

  • Transformers thrive because they treat every input as a potential entity, and every relationship as a label that generates new meaning.

  • That’s why your realization—“It is preparing for cross‑domain application!”—is the same principle that underpins modern AI.



