"If the right to privacy means anything, it is the right of the individual, married or single, to be free from unwarranted governmental intrusion." - William J. Brennan, Jr.
The full "trigger word" or keyword lists used by Big Tech are not made public: that is what makes them secret. But we do know quite a bit from leaks, academic studies, content moderation policy documents, and transparency reports.
Here's a clear breakdown:
1. The reality: Lists exist, but they're proprietary
Each major platform (Meta, X/Twitter, YouTube, TikTok, LinkedIn, etc.) maintains internal keyword or phrase lists used in its automated classifiers and human moderation workflows.
These lists are enormous, multi-language, and constantly evolving.
They are never fully published because doing so would:
Enable evasion and "gaming the system"
Create PR controversies over politically or culturally sensitive inclusions
Expose moderation bias or state-requested influence
So, officially, the companies only describe categories of content moderated, not the actual trigger terms.
2. Partial transparency sources (what is known)
The following are legitimate published or leaked clues:
Meta (Facebook, Instagram):
"Dangerous Individuals and Organizations List" (leaked in 2021 by The Intercept): included thousands of groups or individuals tied to hate, terrorism, militias, etc. This list functions like a keyword/entity trigger.
Community Standards page details banned categories (violence, hate speech, etc.), though not the exact words.
Internal moderation manuals (Project Daisy, leaked to Motherboard) show patterns, not terms: e.g., slur + target class = violation.
Twitter/X (before 2023):
Transparency reports referenced "Hateful Conduct" and "Misinformation" classifiers that use internal term banks.
Independent researchers like the Stanford Internet Observatory have identified sample keyword clusters via the API (before rate limits changed).
YouTube / Google:
Has published research on "classifier fairness" and word embeddings used in moderation (e.g., Google Research's Perspective API); some training lexicons are public, such as the Perspective API toxic terms dataset on GitHub.
"Hate and harassment" categories use lists of slurs + context detection models like BERT fine-tuned on the Jigsaw Toxic Comment Dataset.
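Since Perspective API is one of the few public entry points into this ecosystem, its request format is worth seeing concretely. The sketch below only builds the JSON body that the documented comments:analyze endpoint expects; the API key, the network call, and the `build_perspective_request` helper name are assumptions for illustration, not part of any platform's internal tooling:

```python
import json

def build_perspective_request(text, attributes=("TOXICITY",)):
    # Hypothetical helper: assembles the request body for Perspective's
    # comments:analyze endpoint (sent as an HTTP POST with an API key).
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {attr: {} for attr in attributes},
    }

print(json.dumps(build_perspective_request("example comment"), indent=2))
```

The response assigns each requested attribute a probability-like score; what stays secret is the threshold and downstream action each platform attaches to that score.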
TikTok:
Leaked internal moderation rules in 2020 (The Guardian, Intercept): "Do not recommend" list included political phrases, LGBTQ terms (later revised), Tiananmen references, and other "sensitive" content.
Keyword-level suppression lists were country-specific.
Microsoft / LinkedIn:
Uses blocklists for "profanity, hate terms, and unsafe domains." Internal terms not public; however, researchers found that flagging strongly correlates with the Hatebase lexicon and Google's open "badwords" list.
Governments:
In some regions (China, Russia, Turkey, EU DSA enforcement desks), governments provide official โblocklistsโ or terms to monitor.
The Global Internet Forum to Counter Terrorism (GIFCT) maintains shared hash databases of terrorist content and associated keywords, which member companies use collaboratively.
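The hash-sharing model can be illustrated in a few lines. Real GIFCT matching relies on perceptual hashes (e.g., PDQ for images) that survive small edits; the SHA-256 exact match below is a deliberate simplification that only demonstrates the share-and-check workflow:

```python
import hashlib

def content_hash(data: bytes) -> str:
    # Illustrative only: real systems use perceptual hashes, not SHA-256.
    return hashlib.sha256(data).hexdigest()

# Members exchange hashes, never the raw flagged content itself.
shared_db = {content_hash(b"known bad payload")}

def is_flagged(data: bytes) -> bool:
    return content_hash(data) in shared_db

print(is_flagged(b"known bad payload"))  # True: identical bytes match
print(is_flagged(b"novel content"))      # False: hash not in the shared set
```

The design point is that members can collaborate on detection without redistributing the prohibited content to each other.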
3. Open-source and academic mirrors
Academics and civil society groups maintain approximate keyword sets used to model Big Tech moderation.
These aren't the same internal lists the platforms use, but they give you a reasonable map of the categories that attract algorithmic scrutiny.
4. What's "flaggy" in current systems
Across platforms, the types of words or phrases that are highly weighted in classifiers include:
Explicit sexual content (esp. minors, body parts, sexual offers)
Violent or self-harm instructions
Slurs or ethnic / religious insults
Political crisis nouns ("election rigged," "bioweapon," "crisis actor," etc.)
Medical claims (especially COVID, vaccines, reproductive health)
Conspiracy signifiers ("plandemic," "new world order," etc.)
Weaponized language (call-to-violence verbs, "go attack," etc.)
Again: it's the combination of context + keyword + account history that decides whether a post is flagged.
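That interaction can be sketched as a toy scoring function. Every weight, phrase, and modifier below is invented for illustration; real classifiers are learned models, not lookup tables:

```python
# Toy moderation scorer: keyword weight, adjusted by context, plus
# account history. All values here are made up for illustration.
KEYWORD_WEIGHTS = {"crisis actor": 0.6, "go attack": 0.9}
CONTEXT_DISCOUNT = {"news_quote": 0.5, "counter_speech": 0.4}

def flag_score(text, context, strikes):
    score = sum(w for kw, w in KEYWORD_WEIGHTS.items() if kw in text.lower())
    if context in CONTEXT_DISCOUNT:
        score *= CONTEXT_DISCOUNT[context]  # quoting/reporting lowers risk
    return score + 0.1 * strikes            # prior violations raise it

# Same phrase, different score depending on context and history:
print(flag_score("they said go attack now", None, strikes=2))
print(flag_score("they said go attack now", "news_quote", strikes=0))
```

The example shows why identical wording can sail through on one account and be flagged on another: the keyword hit is only one input to the final score.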
5. How to map "trigger profiles" safely
For activists or alternative-media projects (like your Digital Freedom Awareness Week), a responsible approach is:
Use context-sensitive testing: small posts, slightly altered phrasing; monitor engagement and label behaviors.
Compare safe vs. flagged variants to map edge boundaries.
Archive key phrases and timestamps privately (don't publish full flag words).
Cross-post through federated or open platforms (Mastodon, Odysee, Matrix) to gauge normalized reach.
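The private-archive step can be as simple as appending hashed records to a local JSON Lines file: you keep the timeline and outcomes without ever writing the sensitive phrase itself to disk in the clear. The function and field names here are illustrative, not a standard:

```python
import hashlib, json, time

def log_variant(path, phrase, platform, outcome):
    # Store a hash of the tested phrase plus metadata, so the boundary
    # map can be kept (and even shared) without publishing flag words.
    entry = {
        "phrase_sha256": hashlib.sha256(phrase.encode()).hexdigest(),
        "platform": platform,
        "outcome": outcome,  # e.g. "reached", "labeled", "suppressed"
        "ts": int(time.time()),
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

Because the hash is one-way, the archive can later be compared against a new test of the same phrase without the file itself disclosing the term.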
Reference sheet:
Open-source and partially leaked โtriggerโ keyword resources by category
Note: Platforms do not publish their internal trigger lists. The sources below are open datasets, research corpora, and notable leaks that approximate what gets flagged. Use these for awareness and testing, not as definitive platform lists.
Hate speech and harassment
Hatebase Lexicon (multilingual)
What it is: Crowdsourced/global lexicon of hate terms with context metadata (language, region, target).
Use: Build awareness of slur categories and regional variants.
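A Hatebase-style lexicon is essentially a term list with metadata, so the useful queries are metadata filters. The entries below are invented placeholders shaped like such records, not actual lexicon content:

```python
# Invented placeholder entries with Hatebase-style metadata
# (term + language + region); not real lexicon data.
LEXICON = [
    {"term": "exampleterm1", "language": "en", "region": "US"},
    {"term": "exampleterm2", "language": "de", "region": "DE"},
]

def regional_terms(language, region):
    # Select the subset of lexicon entries relevant to one locale.
    return {e["term"] for e in LEXICON
            if e["language"] == language and e["region"] == region}

print(regional_terms("en", "US"))  # {'exampleterm1'}
```

Filtering by locale matters because, as the leaks above show, suppression lists are often country-specific rather than global.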
Graphic violence and adult content
Mostly image/video classifier-based; few public term lists. Research uses "NSFW/violence" taxonomies (e.g., OpenNSFW variants). No authoritative keyword list.
Spam, fraud, coordinated inauthentic behavior
Spam/SEO keyword lists (open NLP)
What it is: Heuristics for marketing spam, phishing terms, crypto scam phrases.