
Why Do We Need to Categorize or Label? Why Do Computers / AI Need a System for This Step?

  • Writer: Renee Li
  • Nov 19
  • 4 min read

Let's go one step deeper, starting from "why do we need to categorize or label?"!


It is preparing for cross-domain application! That's how we can build Transformer AI models!

Without getting too technical or explaining too many details, I will again use Copilot, as it brilliantly provides a short summary!


🔹 Why We Need to Categorize or Label

  • Categorization is how we turn raw existence into meaning.

  • It’s the act of saying: this person is a reviewer, this manuscript is published, this feedback is positive.

  • Without labels, everything is just undifferentiated data—no roles, no states, no history.


🔹 Why Computers Need a System for This

  • Computers don’t “see” meaning; they only see rows, tokens, or vectors.

  • A system of categorization tells them how objects relate, what rules apply, and what new attributes emerge.

  • This makes queries possible, rules enforceable, and states trackable.

  • In short: labeling is the bridge between raw data and structured logic.


🔹 Going One Step Deeper

  • Categorization isn’t just about order—it’s about cross‑domain generalization.

  • Once you model “Assignment” or “Feedback” as entities, you can reuse them in education, healthcare, finance, publishing.

  • This universality is exactly how Transformer AI models work: they don’t just learn one dataset, they learn the pattern of relationships across domains.

  • Labeling prepares data to be portable, reusable, and generalizable—the foundation for cross‑domain intelligence.


If you have more time to read, here is a more detailed and expanded explanation.


Let’s go one step deeper into why categorization (labeling) is necessary, and why computers need systems to do it.


🔹 Why We Need to Categorize or Label

  1. Human Cognition Mirror

    • Humans understand the world by grouping things: “This person is a reviewer,” “This manuscript is published.”

    • Categories reduce complexity—without them, every object would be an undifferentiated blob of attributes.

    • Labeling is how we impose meaning on raw data.

  2. Identity Transformation

    • A manuscript isn’t just “a document.”

    • Once assigned, it becomes “a document under review.”

    • Once published, it becomes “a document in issue #45.”

    • Labels mark these transformations, so we can track states and histories.

  3. New Attributes Emerge

    • Categorization isn’t cosmetic—it generates new data.

    • Example: the “Reviewer” label creates attributes like DateAssigned, Recommendation.

    • Without the label, those attributes wouldn’t exist.
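A minimal sketch of this idea in Python (the class and attribute names here are illustrative, not from any real system): the "Reviewer" label is modeled as its own entity, and the new attributes exist only because the label does.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Person:
    name: str

@dataclass
class Manuscript:
    title: str
    status: str = "submitted"  # a label that changes as the manuscript transforms

# "Reviewer" is not a property of Person alone: it is an interaction
# entity, and it brings new attributes into existence.
@dataclass
class ReviewerAssignment:
    person: Person
    manuscript: Manuscript
    date_assigned: date          # exists only because of the label
    recommendation: str = ""     # exists only because of the label

alice = Person("Alice")
paper = Manuscript("On Labeling")
assignment = ReviewerAssignment(alice, paper, date.today())
```

Delete the `ReviewerAssignment` entity and `date_assigned` and `recommendation` have nowhere to live—exactly the point above.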


🔹 Why Computers Need Systems to Do This

  1. Relational Integrity

    • Computers don’t “see” meaning—they only see rows and columns.

    • Categorization (via entities) tells the system how objects relate and what rules apply.

    • Example: A Reviewer role links Person to Manuscript with constraints (cannot review own manuscript).

  2. Query Power

    • Without categories, queries become impossible.

    • You can’t ask: “Show me all manuscripts reviewed in 2025” unless the system has labeled those interactions.

    • Categories give the computer handles to filter, join, and aggregate.

  3. Business Logic Enforcement

    • Rules live in categories.

    • Example: “At least 3 reviewers per manuscript” is enforced because the system knows who is labeled as Reviewer.

    • Without categorization, the computer can’t enforce reality’s demands.

  4. Scalability & Evolution

    • As systems grow, new roles and states appear.

    • Categorization lets computers adapt without collapsing into chaos.

    • Metadata + entities = a framework for evolution.
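The query-power and business-logic points above can be sketched with an in-memory SQLite database (the schema, table names, and data are made up for illustration): because the Reviewer label lives in its own table, the system can both enforce "cannot review own manuscript" and answer "show me all manuscripts reviewed in 2025."

```python
import sqlite3

# Toy schema: the Reviewer label is a row in its own table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE manuscript (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
CREATE TABLE review     (person_id INTEGER, manuscript_id INTEGER, year INTEGER);
""")
conn.execute("INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob')")
conn.execute("INSERT INTO manuscript VALUES (1, 'On Labeling', 1)")

def assign_reviewer(person_id, manuscript_id, year):
    # Business rule: a person cannot review their own manuscript.
    author_id = conn.execute(
        "SELECT author_id FROM manuscript WHERE id = ?", (manuscript_id,)
    ).fetchone()[0]
    if author_id == person_id:
        raise ValueError("cannot review own manuscript")
    conn.execute("INSERT INTO review VALUES (?, ?, ?)",
                 (person_id, manuscript_id, year))

assign_reviewer(2, 1, 2025)   # Bob reviews Alice's manuscript: allowed

# Query power: the label gives the computer a handle to filter and join on.
rows = conn.execute("""
    SELECT m.title FROM manuscript m
    JOIN review r ON r.manuscript_id = m.id
    WHERE r.year = 2025
""").fetchall()
```

Calling `assign_reviewer(1, 1, 2025)` raises `ValueError`, because the categorization makes the rule checkable; without the `review` table, neither the constraint nor the query would be expressible.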


✅ Conclusion

We categorize because reality demands order, and computers need explicit systems to mirror that order.

  • Humans: categories = understanding.

  • Computers: categories = enforceable meaning.

  • Entities born from interactions are the bridge between raw data and structured reality.


Expansion on the hidden purpose of categorization and interaction entities:


🔹 Cross‑Domain Application

  • Foundational objects (Person, Manuscript, Interest) are domain‑specific.

  • Interactions (Assignment, Feedback, Publication) are domain‑agnostic patterns: they can be reused across contexts.

  • By modeling interactions as distinct entities, you’re not just solving this manuscript system—you’re preparing the schema to scale into other domains (education, finance, healthcare, publishing, etc.).


✅ Why This Matters

  1. Reusability

    • The same “Assignment” pattern applies to:

      • Students assigned to courses

      • Doctors assigned to patients

      • Employees assigned to projects

    • Once you abstract it, you can port it anywhere.

  2. Interoperability

    • Cross‑domain systems (like ERP, CRM, HR, publishing platforms) rely on shared interaction entities.

    • A “Feedback” entity works in peer review, performance reviews, product reviews.

  3. Scalability

    • Instead of hardcoding roles and states, you build interaction templates.

    • That makes your system future‑proof: new domains just plug into the same interaction framework.

  4. Business Logic Bridge

    Cross‑domain applications need consistent rules:

    • “At least 3 reviewers per manuscript”

    • “At least 2 doctors per surgery”

    Categorization entities let you enforce these rules across domains.
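As a sketch of this portability (the class names and the generic `Pool` rule are my own illustrative invention, not a standard API), one domain-agnostic "Assignment" template can serve reviewers-per-manuscript and doctors-per-surgery alike:

```python
from dataclasses import dataclass, field
from typing import Generic, List, TypeVar

A = TypeVar("A")  # the assignee: student, doctor, employee, reviewer...
T = TypeVar("T")  # the target:   course, patient, project, manuscript...

@dataclass
class Assignment(Generic[A, T]):
    # One domain-agnostic interaction template, reused everywhere.
    assignee: A
    target: T

@dataclass
class Pool(Generic[A, T]):
    minimum: int  # e.g. "at least 3 reviewers per manuscript"
    assignments: List[Assignment[A, T]] = field(default_factory=list)

    def is_satisfied(self) -> bool:
        # The same rule shape works in every domain.
        return len(self.assignments) >= self.minimum

# The same pattern ports across domains:
reviewers = Pool[str, str](minimum=3)
reviewers.assignments.append(Assignment("Alice", "Manuscript #1"))

surgeons = Pool[str, str](minimum=2)
surgeons.assignments += [Assignment("Dr. Kim", "Surgery A"),
                         Assignment("Dr. Lee", "Surgery A")]
```

Nothing in `Assignment` or `Pool` mentions manuscripts or surgeries; the domain plugs into the interaction framework, not the other way around.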


🔹 Metaphor

Think of it like musical notation:

  • Notes (foundational objects) are specific to a song.

  • But the notation system (interactions) is universal—you can apply it to jazz, classical, or rock.

  • Once you have the notation, you can play across domains.


✅ Conclusion

The reason we categorize and create new entities for interactions is to prepare for cross‑domain application. It’s not just about manuscripts—it’s about building a generalizable architecture of meaning.


🔹 Cross‑Domain Categorization → Transformer Foundations

  • In databases, interactions label foundational objects, creating new entities with attributes.

  • In Transformers, attention mechanisms label tokens (words, pixels, events) by categorizing them in relation to others.

  • Both are about contextual meaning:

    • ERD: “This Person is a Reviewer of this Manuscript.”

    • Transformer: “This word is important in relation to that word.”


🔹 Why Categorization Prepares for Cross‑Domain AI

  1. Generalization

    • By abstracting interactions into entities, you create reusable patterns.

    • Transformers do the same: they don’t just learn one language—they learn the pattern of relationships across languages, domains, and modalities.

  2. Contextual Labeling

    • Database: Assignment labels a Person as Reviewer in context of Manuscript.

    • Transformer: Attention labels a token as relevant in context of a sentence.

    • Both are dynamic categorization engines.

  3. New Attributes = Embeddings

    • In ERD, interactions generate new attributes (DateAssigned, Recommendation).

    • In Transformers, attention generates new embeddings (weighted vectors that encode meaning).

    • Both are derived properties born from relationships.
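A toy scaled dot-product attention sketch makes the parallel concrete (the token embeddings are made-up numbers; this is the textbook attention formula, not any particular model's implementation). The softmax weights act like dynamic labels—"how relevant is token j to token i?"—and the output rows are new embeddings born from those relationships:

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: each token's new embedding is a
    # weighted mix of all value vectors, with weights from a softmax
    # over query-key similarity.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Three tokens with made-up 4-dimensional embeddings.
X = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 0.0]])

# Self-attention: the same tokens serve as queries, keys, and values.
new_embeddings, labels = attention(X, X, X)
```

Each row of `labels` sums to 1—a categorization of the other tokens' relevance—and each row of `new_embeddings` is a derived property, just as DateAssigned is derived from the Reviewer assignment.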


🔹 Metaphor

Think of it like alchemy:

  • ERD interactions stamp objects with new categories.

  • Transformers stamp tokens with new weights.

  • Both are rituals of transformation, turning raw material into structured gold.


✅ Conclusion

  • Modeling interactions as entities isn’t just good database practice—it’s the conceptual scaffolding for cross‑domain AI.

  • Transformers thrive because they treat every input as a potential entity, and every relationship as a label that generates new meaning.

  • That’s why your realization—“It is preparing for cross‑domain application!”—is the same principle that underpins modern AI.



