The full "trigger word" or keyword lists used by Big Tech aren't publicly published – that secrecy is the point. But leaks, academic studies, content moderation policy documents, and transparency reports reveal quite a bit.

Here's a clear breakdown 👇

🧱 1. The reality: Lists exist, but they're proprietary

Each major platform (Meta, X/Twitter, YouTube, TikTok, LinkedIn, etc.) maintains internal keyword or phrase lists used by:

  • AI classifiers (for initial automated detection)
  • Heuristic / rules engines (for hard filters)
  • Safety pipelines (to identify hate speech, illegal content, misinformation, adult content, etc.)

These lists are enormous, multi-language, and constantly evolving.
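As a toy illustration of the "rules engine" layer described above, a hard keyword filter can be as simple as a category-to-pattern map scanned with regular expressions. Everything here (category names, patterns) is invented for the sketch and bears no relation to any platform's actual list, which would be far larger, multilingual, and weighted rather than binary:

```python
import re

# Hypothetical category -> pattern map (illustrative only; real platform
# lists are vastly larger, multilingual, and constantly updated).
BLOCKLIST = {
    "spam": [r"\bfree money\b", r"\bact now\b"],
    "scam": [r"\bguaranteed returns\b", r"\bwire transfer fee\b"],
}

def scan(text: str) -> list[str]:
    """Return the categories whose patterns match the text."""
    text = text.lower()
    return [
        category
        for category, patterns in BLOCKLIST.items()
        if any(re.search(p, text) for p in patterns)
    ]

print(scan("Act now for free money!"))  # -> ['spam']
print(scan("Lovely weather today."))    # -> []
```

A real pipeline would feed matches like these into downstream classifiers rather than blocking outright, which is why exact-term evasion alone rarely works.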

They are never fully published because doing so would:

  • Enable evasion and "gaming the system"
  • Create PR controversies over politically or culturally sensitive inclusions
  • Expose moderation bias or state-requested influence

So, officially, the companies only describe categories of content moderated, not the actual trigger terms.

๐Ÿข 2. Partial transparency sources (what is known)

The following are legitimate published or leaked clues:

Meta (Facebook, Instagram):

  • "Dangerous Individuals and Organizations List" (leaked in 2021 by The Intercept): included thousands of groups or individuals tied to hate, terrorism, militias, etc. This list functions like a keyword/entity trigger.
  • Community Standards page details banned categories (violence, hate speech, etc.), though not the exact words.
  • Internal moderation manuals (Project Daisy, leaked to Motherboard) show patterns, not terms: e.g., slur + target class = violation.

Twitter/X (before 2023):

  • Transparency reports referenced "Hateful Conduct" and "Misinformation" classifiers that use internal term banks.
  • Independent researchers like the Stanford Internet Observatory have identified sample keyword clusters via the API (before rate limits changed).

YouTube / Google:

  • Has published research on "classifier fairness" and word embeddings used in moderation (e.g., Google Research's Perspective API); some training lexicons are public, such as the Jigsaw/Perspective toxicity datasets on Kaggle and GitHub.
  • "Hate and harassment" categories combine slur lists with context-detection models (e.g., BERT fine-tuned on the Jigsaw Toxic Comment dataset).

TikTok:

  • Internal moderation rules leaked in 2020 (The Guardian, The Intercept): the "Do not recommend" list included political phrases, LGBTQ terms (later revised), Tiananmen references, and other "sensitive" content.
  • Keyword-level suppression lists were country-specific.

Microsoft / LinkedIn:

  • Uses blocklists for "profanity, hate terms, and unsafe domains." The internal terms are not public; however, researchers found that flagging correlates strongly with the Hatebase lexicon and Google's open "badwords" list.

Governments:

  • In some regions (China, Russia, Turkey, EU DSA enforcement desks), governments provide official "blocklists" or terms to monitor.
  • The Global Internet Forum to Counter Terrorism (GIFCT) maintains shared hash databases of terrorist content and associated keywords, which member companies use collaboratively.

๐Ÿงฉ 3. Open-source and academic mirrors

Academics and civil society groups maintain approximate keyword sets used to model Big Tech moderation:

| Dataset | Focus | Publisher |
|---|---|---|
| Hatebase Lexicon | Global multilingual hate speech | Hatebase NGO |
| Google's Perspective API | Toxicity / harassment modeling | Jigsaw (Alphabet) |
| CM Hate Speech Words file | Offensive word list | Open-source NLP |
| Facebook Dangerous Orgs list | Terror/extremism | The Intercept (leak) |
| Disinfo Keyword Set – EUvsDisinfo | Disinformation patterns | European External Action Service |
| GDELT / Media Cloud term monitoring sets | Event/keyword tracking | GDELT Project |

These aren't the same internal lists the platforms use, but they give you a reasonable map of the categories that attract algorithmic scrutiny.


โš–๏ธ 4. Whatโ€™s โ€œflaggyโ€ in current systems

Across platforms, the types of words or phrases that are highly weighted in classifiers include:

  • Explicit sexual content (esp. minors, body parts, sexual offers)
  • Violent or self-harm instructions
  • Slurs or ethnic / religious insults
  • Political crisis nouns ("election rigged," "bioweapon," "crisis actor," etc.)
  • Medical claims (especially COVID, vaccines, reproductive health)
  • Conspiracy signifiers ("plandemic," "new world order," etc.)
  • Weaponized language (call-to-violence verbs, "go attack," etc.)

Again: it's context + keyword + account history that decide whether a post gets flagged.
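That interplay can be pictured as a weighted score over several signals. Every weight, threshold, and signal name below is invented purely for illustration; real systems use learned models, not hand-set coefficients:

```python
# Toy "context + keyword + history" scoring. All weights and the 0.6
# threshold are made up for this sketch.
def flag_score(keyword_hits: int, context_risk: float, history_strikes: int) -> float:
    """Combine the three signals into a single score, capped at 1.0."""
    return min(1.0, 0.2 * keyword_hits + 0.5 * context_risk + 0.1 * history_strikes)

def is_flagged(keyword_hits: int, context_risk: float,
               history_strikes: int, threshold: float = 0.6) -> bool:
    return flag_score(keyword_hits, context_risk, history_strikes) >= threshold

# Same keyword count, very different outcomes depending on context/history:
print(is_flagged(1, 0.2, 0))  # benign context, clean account -> False
print(is_flagged(1, 0.8, 2))  # risky context, prior strikes  -> True
```

The point of the sketch: the identical keyword can pass or trip the threshold depending entirely on the other two signals.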


🧭 5. How to map "trigger profiles" safely

For activists or alternative-media projects (like your Digital Freedom Awareness Week), a responsible approach is:

  1. Use context-sensitive testing: small posts, slightly altered phrasing; monitor engagement and label behavior.
  2. Compare safe vs. flagged variants to map edge boundaries.
  3. Archive key phrases and timestamps privately (don't publish full flag-word lists).
  4. Cross-post through federated or open platforms (Mastodon, Odysee, Matrix) to gauge normalized reach.
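Step 3 (private archiving) can be as simple as an append-only local CSV. The field names and file path here are arbitrary choices for the sketch, not a prescribed schema:

```python
import csv
import datetime

# Hypothetical private log: each row records a phrase variant and what
# was observed (impressions, visible labels), with a UTC timestamp.
FIELDS = ["timestamp", "platform", "variant", "impressions", "label_seen"]

def log_test(path: str, platform: str, variant: str,
             impressions: int, label_seen: bool) -> None:
    """Append one observation; write the header only for a fresh file."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if f.tell() == 0:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "platform": platform,
            "variant": variant,
            "impressions": impressions,
            "label_seen": label_seen,
        })

log_test("ab_log.csv", "platform_x", "phrasing A", 412, False)
log_test("ab_log.csv", "platform_x", "phrasing B", 37, True)
```

Keeping this local (and out of published material) preserves the comparison data without handing evasion hints to anyone.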

Reference sheet:

Open-source and partially leaked โ€œtriggerโ€ keyword resources by category

Note: Platforms do not publish their internal trigger lists. The sources below are open datasets, research corpora, and notable leaks that approximate what gets flagged. Use these for awareness and testing, not as definitive platform lists.

Hate speech and harassment

  • Hatebase Lexicon (multilingual)
      • What it is: Crowdsourced global lexicon of hate terms with context metadata (language, region, target).
      • Use: Build awareness of slur categories and regional variants.
      • Link: Hatebase
  • Jigsaw/Google Perspective API
      • What it is: Models for toxicity, insults, identity attack, threats, and sexual explicitness; example word lists appear in papers/repos.
      • Use: Test phrasing through the API; inspect open example lists from the research community.
      • Link: Perspective API
  • Toxic Comment Classification (Kaggle/Jigsaw)
      • What it is: Public dataset used to train toxicity classifiers with labeled text.
      • Use: See labeled examples of what models deem toxic.
      • Link: Kaggle competition
  • Hate speech and offensive language lexicons
      • What it is: Various open-source lists used in NLP.
      • Links:
          • HateXplain dataset
          • LDNOOBW "bad words" list
          • Offensive/abusive lexicons collection
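For the Perspective API specifically, the public `comments:analyze` request format looks like the following. `YOUR_API_KEY` is a placeholder, and this sketch only builds and prints the payload rather than sending it:

```python
import json

# Perspective API analyze endpoint (per Google's public documentation).
# YOUR_API_KEY is a placeholder; no network request is made here.
API_URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
           "comments:analyze?key=YOUR_API_KEY")

payload = {
    "comment": {"text": "You are a wonderful person."},
    "languages": ["en"],
    "requestedAttributes": {"TOXICITY": {}, "IDENTITY_ATTACK": {}},
}

print(json.dumps(payload, indent=2))
```

With a valid key, POSTing this JSON to `API_URL` returns per-attribute scores in the 0–1 range under `attributeScores`, which is how researchers probe which phrasings the model rates as toxic.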

Extremism, terrorism, "dangerous organizations"

  • Meta "Dangerous Individuals and Organizations" list (leaked)
      • What it is: 2021 leak of Facebook's internal list used for enforcement categories (terror, hate, militarized social movements).
      • Caveat: Historical snapshot; likely changed since.
      • Link: The Intercept report and document
  • GIFCT (Global Internet Forum to Counter Terrorism)
      • What it is: Shared industry hash database of terrorist/violent-extremist content; some taxonomies are public.
      • Use: Understand shared enforcement frameworks (not keyword lists).
      • Link: GIFCT

Misinformation/disinformation (political/medical)

  • EUvsDisinfo
      • What it is: Repository of disinfo cases and narratives tracked by the EU's East StratCom Task Force.
      • Use: Identify recurring narrative keywords and frames.
      • Link: EUvsDisinfo
  • Poynter/IFCN fact-check indexes
      • What it is: Aggregated fact-checks across partners (Snopes, PolitiFact, AFP, etc.).
      • Use: See terms and claims that often get labeled or reduced in reach.
      • Link: IFCN at Poynter
  • COVID/medical misinfo datasets (research)
      • What it is: CoAID, ReCOVery, and similar corpora containing labeled claims and common triggers.
      • Links:
          • CoAID
          • ReCOVery

Violence, self-harm, adult/sexual content

Spam, fraud, coordinated inauthentic behavior

  • Spam/SEO keyword lists (open NLP)
      • What it is: Heuristics for marketing spam, phishing terms, crypto-scam phrases.
      • Links:
          • Email spam keyword lists (various community lists; not authoritative)
  • Phishing/brand abuse signals
      • What it is: IOC feeds and term patterns from security orgs.
      • Links:
          • APWG (reports)
          • PhishTank

Region- or law-driven enforcement references

  • EU Digital Services Act (DSA) transparency
      • What it is: Platforms publish high-level moderation stats, risk assessments, and systemic risk categories.
      • Use: Understand categories prioritized in EU enforcement periods (elections, public health).
      • Link: EU DSA portal
  • Germany's NetzDG, India's IT Rules, etc.
      • What it is: Local legal categories guiding removal or access restriction.
      • Link: Country-specific regulator portals; platform transparency centers.

Practical "risk profile" summaries by category

  • Highly sensitive across platforms:
      • CSAM/minors (absolute hard block; zero tolerance)
      • Explicit calls to violence; detailed weapon/procurement instructions
      • Sexual services and explicit content terms (esp. involving minors)
      • Slurs and dehumanizing statements targeting protected classes
  • Elevated scrutiny:
      • Elections, public health, conflict zones, crisis narratives
      • Conspiracy signifiers tied to real-world harm narratives
      • Self-harm encouragement or instructions
  • Heuristic triggers (context-dependent):
      • Repetitive slogans/hashtags during "policy events"
      • Link shorteners + low-reputation domains
      • Copy-paste duplication across many accounts
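The last heuristic above, copy-paste duplication, is easy to approximate with off-the-shelf string similarity. The 0.9 threshold and sample posts are arbitrary values for this sketch; platforms use far more scalable fingerprinting, but the principle is the same:

```python
from difflib import SequenceMatcher

# Toy duplication detector: near-identical text repeated across accounts
# is a classic spam/coordination signal. Threshold chosen arbitrarily.
def near_duplicates(posts: list[str], threshold: float = 0.9) -> int:
    """Count pairs of posts whose similarity ratio meets the threshold."""
    pairs = 0
    for i in range(len(posts)):
        for j in range(i + 1, len(posts)):
            if SequenceMatcher(None, posts[i], posts[j]).ratio() >= threshold:
                pairs += 1
    return pairs

posts = [
    "Join the rally at noon! #freedom",
    "Join the rally at noon! #freedom",
    "I baked sourdough bread today.",
]
print(near_duplicates(posts))  # -> 1 (the two identical rally posts)
```

This is why the testing advice below suggests varying synonyms and cadence: verbatim repetition is itself a signal, independent of any keyword.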

How to safely test and learn your "edge"

  • Run phrasing A/B tests on low-reach posts; watch for labels, reduced distribution, or interstitials.
  • Prefer questions and sourced claims; include primary-source links and archives (e.g., archive.today).
  • Avoid hard-block terms even when quoting; paraphrase and contextualize.
  • Don't mass copy-paste; vary synonyms and cadence to avoid spam heuristics.
  • Cross-post to federated or open platforms (Mastodon, Matrix, Odysee) to gauge "baseline" engagement unaffected by centralized throttling.