"If the right to privacy means anything, it is the right of the individual, married or single, to be free from unwarranted governmental intrusion." - William J. Brennan, Jr.
The full "trigger word" or keyword lists used by Big Tech are not made public: that is what makes them secret. But we do know quite a bit from leaks, academic studies, content moderation policy documents, and transparency reports.
Here's a clear breakdown:
1. The reality: Lists exist, but they're proprietary
Each major platform (Meta, X/Twitter, YouTube, TikTok, LinkedIn, etc.) maintains internal keyword or phrase lists used in its automated classifiers and human moderation workflows.
These lists are enormous, multi-language, and constantly evolving.
They are never fully published because doing so would:
Enable evasion and "gaming the system"
Create PR controversies over politically or culturally sensitive inclusions
Expose moderation bias or state-requested influence
So, officially, the companies only describe categories of content moderated, not the actual trigger terms.
2. Partial transparency sources (what is known)
The following are legitimate published or leaked clues:
Meta (Facebook, Instagram):
"Dangerous Individuals and Organizations List" (leaked in 2021 by The Intercept): included thousands of groups or individuals tied to hate, terrorism, militias, etc. This list functions like a keyword/entity trigger.
Community Standards page details banned categories (violence, hate speech, etc.), though not the exact words.
Internal moderation manuals (Project Daisy, leaked to Motherboard) show patterns, not terms: e.g., slur + target class = violation.
Twitter/X (before 2023):
Transparency reports referenced "Hateful Conduct" and "Misinformation" classifiers that use internal term banks.
Independent researchers like the Stanford Internet Observatory have identified sample keyword clusters via the API (before rate limits changed).
YouTube / Google:
Has published research on "classifier fairness" and word embeddings used in moderation (e.g., Google Research's Perspective API); some training lexicons are public, such as the Perspective API toxic terms dataset on GitHub.
"Hate and harassment" categories use lists of slurs + context detection models like BERT fine-tuned on the Jigsaw Toxic Comment Dataset.
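Since Perspective API is one of the few public entry points into this ecosystem, its request format is worth seeing concretely. The sketch below only builds the JSON body that the documented comments:analyze endpoint expects; the API key, the network call, and the `build_perspective_request` helper name are assumptions for illustration, not part of any platform's internal tooling:

```python
import json

def build_perspective_request(text, attributes=("TOXICITY",)):
    # Hypothetical helper: assembles the request body for Perspective's
    # comments:analyze endpoint (sent as an HTTP POST with an API key).
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {attr: {} for attr in attributes},
    }

print(json.dumps(build_perspective_request("example comment"), indent=2))
```

The response assigns each requested attribute a probability-like score; what stays secret is the threshold and downstream action each platform attaches to that score.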
TikTok:
Leaked internal moderation rules in 2020 (The Guardian, Intercept): "Do not recommend" list included political phrases, LGBTQ terms (later revised), Tiananmen references, and other "sensitive" content.
Keyword-level suppression lists were country-specific.
Microsoft / LinkedIn:
Uses blocklists for "profanity, hate terms, and unsafe domains." Internal terms not public; however, researchers found that flagging strongly correlates with the Hatebase lexicon and Google's open "badwords" list.
Governments:
In some regions (China, Russia, Turkey, EU DSA enforcement desks), governments provide official โblocklistsโ or terms to monitor.
The Global Internet Forum to Counter Terrorism (GIFCT) maintains shared hash databases of terrorist content and associated keywords, which member companies use collaboratively.
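The hash-sharing model can be illustrated in a few lines. Real GIFCT matching relies on perceptual hashes (e.g., PDQ for images) that survive small edits; the SHA-256 exact match below is a deliberate simplification that only demonstrates the share-and-check workflow:

```python
import hashlib

def content_hash(data: bytes) -> str:
    # Illustrative only: real systems use perceptual hashes, not SHA-256.
    return hashlib.sha256(data).hexdigest()

# Members exchange hashes, never the raw flagged content itself.
shared_db = {content_hash(b"known bad payload")}

def is_flagged(data: bytes) -> bool:
    return content_hash(data) in shared_db

print(is_flagged(b"known bad payload"))  # True: identical bytes match
print(is_flagged(b"novel content"))      # False: hash not in the shared set
```

The design point is that members can collaborate on detection without redistributing the prohibited content to each other.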
3. Open-source and academic mirrors
Academics and civil society groups maintain approximate keyword sets used to model Big Tech moderation.
These aren't the same internal lists the platforms use, but they give you a reasonable map of the categories that attract algorithmic scrutiny.
4. What's "flaggy" in current systems
Across platforms, the types of words or phrases that are highly weighted in classifiers include:
Explicit sexual content (esp. minors, body parts, sexual offers)
Violent or self-harm instructions
Slurs or ethnic / religious insults
Political crisis nouns ("election rigged," "bioweapon," "crisis actor," etc.)
Medical claims (especially COVID, vaccines, reproductive health)
Conspiracy signifiers ("plandemic," "new world order," etc.)
Weaponized language (call-to-violence verbs, "go attack," etc.)
Again: it's the combination of context + keyword + account history that decides whether a post is flagged.
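That interaction can be sketched as a toy scoring function. Every weight, phrase, and modifier below is invented for illustration; real classifiers are learned models, not lookup tables:

```python
# Toy moderation scorer: keyword weight, adjusted by context, plus
# account history. All values here are made up for illustration.
KEYWORD_WEIGHTS = {"crisis actor": 0.6, "go attack": 0.9}
CONTEXT_DISCOUNT = {"news_quote": 0.5, "counter_speech": 0.4}

def flag_score(text, context, strikes):
    score = sum(w for kw, w in KEYWORD_WEIGHTS.items() if kw in text.lower())
    if context in CONTEXT_DISCOUNT:
        score *= CONTEXT_DISCOUNT[context]  # quoting/reporting lowers risk
    return score + 0.1 * strikes            # prior violations raise it

# Same phrase, different score depending on context and history:
print(flag_score("they said go attack now", None, strikes=2))
print(flag_score("they said go attack now", "news_quote", strikes=0))
```

The example shows why identical wording can sail through on one account and be flagged on another: the keyword hit is only one input to the final score.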
5. How to map "trigger profiles" safely
For activists or alternative-media projects (like your Digital Freedom Awareness Week), a responsible approach is:
Use context-sensitive testing: small posts, slightly altered phrasing; monitor engagement and label behaviors.
Compare safe vs. flagged variants to map edge boundaries.
Archive key phrases and timestamps privately (don't publish full flag words).
Cross-post through federated or open platforms (Mastodon, Odysee, Matrix) to gauge normalized reach.
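The private-archive step can be as simple as appending hashed records to a local JSON Lines file: you keep the timeline and outcomes without ever writing the sensitive phrase itself to disk in the clear. The function and field names here are illustrative, not a standard:

```python
import hashlib, json, time

def log_variant(path, phrase, platform, outcome):
    # Store a hash of the tested phrase plus metadata, so the boundary
    # map can be kept (and even shared) without publishing flag words.
    entry = {
        "phrase_sha256": hashlib.sha256(phrase.encode()).hexdigest(),
        "platform": platform,
        "outcome": outcome,  # e.g. "reached", "labeled", "suppressed"
        "ts": int(time.time()),
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

Because the hash is one-way, the archive can later be compared against a new test of the same phrase without the file itself disclosing the term.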
Reference sheet:
Open-source and partially leaked โtriggerโ keyword resources by category
Note: Platforms do not publish their internal trigger lists. The sources below are open datasets, research corpora, and notable leaks that approximate what gets flagged. Use these for awareness and testing, not as definitive platform lists.
Hate speech and harassment
Hatebase Lexicon (multilingual)
What it is: Crowdsourced/global lexicon of hate terms with context metadata (language, region, target).
Use: Build awareness of slur categories and regional variants.
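A Hatebase-style lexicon is essentially a term list with metadata, so the useful queries are metadata filters. The entries below are invented placeholders shaped like such records, not actual lexicon content:

```python
# Invented placeholder entries with Hatebase-style metadata
# (term + language + region); not real lexicon data.
LEXICON = [
    {"term": "exampleterm1", "language": "en", "region": "US"},
    {"term": "exampleterm2", "language": "de", "region": "DE"},
]

def regional_terms(language, region):
    # Select the subset of lexicon entries relevant to one locale.
    return {e["term"] for e in LEXICON
            if e["language"] == language and e["region"] == region}

print(regional_terms("en", "US"))  # {'exampleterm1'}
```

Filtering by locale matters because, as the leaks above show, suppression lists are often country-specific rather than global.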
Graphic violence and adult content
Mostly image/video classifier-based; few public term lists. Research uses "NSFW/violence" taxonomies (e.g., OpenNSFW variants). No authoritative keyword list.
Spam, fraud, coordinated inauthentic behavior
Spam/SEO keyword lists (open NLP)
What it is: Heuristics for marketing spam, phishing terms, crypto scam phrases.