Methodology

How OSFeed produces intelligence

OSFeed is not a black box. Every decision in the pipeline — from source selection to event merging — is logged, explainable, and auditable. This page documents exactly how raw Telegram messages become structured geopolitical events.

Design Principles

Full Transparency

Every event links back to its original sources. Every channel has a published reliability profile. Every merge decision is logged. No black boxes.

Multi-Perspective by Design

When sources contradict each other on facts — not just framing — we flag the event as contested and present all perspectives. We don't pick sides.

Source Provenance

Each event carries a complete audit trail: which channel reported first, when corroboration arrived, and exactly what each source contributed.

The 7-Layer Pipeline

Each message passes through seven processing layers before reaching the analyst. Here is exactly what happens at each stage.

Layer 1

Real-Time Collection

OSFeed monitors 44+ public Telegram channels through a Telegram user account, using the Telethon client library. Unlike the Bot API, this provides full access to channel content: media, documents, edit history, and reactions. Messages are captured as soon as they are published.

Channels grouped by topic: Ukraine-Russia, Middle East, Sahel & West Africa, USA & Global Power Shifts
10–20 curated channels per topic, selected for coverage breadth and source independence
Automatic rate-limit handling and persistent session management
Media capture (images, videos, documents) with size controls

Layer 2

Contextual Translation

Every message is translated into canonical English through a geopolitically aware pipeline. This is not Google Translate: the system injects a domain-specific glossary covering military terminology, regional acronyms, and context-sensitive vocabulary.

Geopolitical glossary: IDF, FSB, GRU, PMC Wagner, JNIM — plus regional military jargon
Context-aware translations: 'обстрел' → 'shelling' (not 'shot'), 'منطقة عازلة' → 'buffer zone' (not 'isolated zone')
Multi-stage language detection: character-set heuristics, lingua library, Telegram metadata fallback
Display translations cached per-language (7-day TTL) for 13 supported languages
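
One way to picture glossary injection is as a prompt-building step that pins the translations of any glossary terms found in the message. The sketch below is illustrative only: the function name, the prompt wording, and the substring matching are assumptions, and only the two term pairs come from the examples above.

```python
# Hypothetical glossary; the two pairs are the examples quoted in this section.
GLOSSARY = {
    "обстрел": "shelling",
    "منطقة عازلة": "buffer zone",
}


def build_translation_prompt(text: str, glossary: dict) -> str:
    """Build an LLM prompt that pins translations for glossary terms in `text`.

    Matching is a plain substring check (an assumption), so inflected forms
    that contain the glossary term are also caught.
    """
    hits = {src: dst for src, dst in glossary.items() if src in text}
    lines = ["Translate the following message into English."]
    if hits:
        lines.append("Use these fixed translations for domain terms:")
        lines.extend(f"- {src} => {dst}" for src, dst in hits.items())
    lines.append("Message:")
    lines.append(text)
    return "\n".join(lines)


prompt = build_translation_prompt("Ночной обстрел города", GLOSSARY)
```

Only terms that actually occur in the message are injected, which keeps the prompt short when the glossary grows.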

Layer 3

Event Detection

Not every Telegram message is an event. Most are commentary, reposts, or opinions. OSFeed classifies each message using a multi-signal heuristic scoring system, with LLM fallback for borderline cases.

Six weighted criteria: action verbs, specific locations/entities, temporal keywords, named entity density, absence of hedging, non-trivial content
Adaptive threshold: shorter text requires lower scores to prevent noise
Borderline cases (score 0.5–0.7) escalated to LLM for contextual classification
Only messages classified as event-bearing proceed to deduplication
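
The scoring and routing described above can be sketched as a weighted sum followed by a three-way decision. The per-criterion weights below are placeholders, not OSFeed's actual values. The treatment of scores above 0.7 as events and below 0.5 as non-events is also an assumption, and the adaptive length threshold is left out.

```python
# Placeholder weights for the six criteria (assumed values, summing to 1.0).
WEIGHTS = {
    "action_verbs": 0.25,
    "specific_entities": 0.20,
    "temporal_keywords": 0.15,
    "entity_density": 0.15,
    "no_hedging": 0.15,
    "non_trivial": 0.10,
}


def event_score(signals: dict) -> float:
    """Weighted sum of per-criterion signals, each expected in [0, 1]."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)


def route(score: float, low: float = 0.5, high: float = 0.7) -> str:
    """Route a message by score; the 0.5-0.7 band goes to the LLM."""
    if score > high:
        return "event"
    if score >= low:
        return "llm_review"
    return "not_event"


decision = route(event_score({"action_verbs": 1.0, "specific_entities": 1.0,
                              "temporal_keywords": 1.0}))
```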

Layer 4

Semantic Deduplication

Each event-bearing message is embedded into a 1536-dimensional vector space using OpenAI's text-embedding-3-small model. The embedding is compared against existing event centroids to determine whether this message reports a known event or something new.

Embeddings are always computed from the English translation — ensuring cross-language consistency
Strong match (similarity > 0.82): message is linked to the existing event as an additional source
Grey zone (0.65–0.82): LLM reviews both events' entity fingerprints and content to decide
No match (< 0.65): message creates a new event with LLM-generated title and summary
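
The three similarity bands can be illustrated with a small routing function. This is a minimal sketch assuming centroids are held in an in-memory dictionary and compared by cosine similarity. The function names are hypothetical, and a production system would more likely query a vector index.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_decision(embedding, centroids, strong=0.82, grey=0.65):
    """Return (decision, event_id) based on the best-matching centroid."""
    best_id, best_sim = None, -1.0
    for event_id, centroid in centroids.items():
        sim = cosine_similarity(embedding, centroid)
        if sim > best_sim:
            best_id, best_sim = event_id, sim
    if best_sim > strong:
        return "link", best_id          # strong match: add as another source
    if best_sim >= grey:
        return "llm_review", best_id    # grey zone: defer to the LLM
    return "new_event", None            # no match: create a new event


# Toy 2-dimensional vectors stand in for the 1536-dimensional embeddings.
centroids = {"e1": [1.0, 0.0], "e2": [0.0, 1.0]}
result = match_decision([1.0, 0.05], centroids)
```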

Layer 5

Multi-Signal Merge Scoring

Events that may refer to the same real-world development are evaluated using a 7-signal scoring system. This prevents both false merges (combining unrelated events) and fragmentation (the same event split across multiple entries).

Centroid similarity (25% weight): cosine similarity of averaged source embeddings
Title/summary similarity (20%): cosine similarity of event summaries
Temporal proximity (15%): time-decay scoring over a 72-hour window
Geographic similarity (12%): semantic embedding of location entities
Entity overlap (10%): Jaccard overlap of extracted persons, organizations, locations, weapons
Actor similarity (8%): semantic embedding of persons and organizations
Category bonus (5%): same event type (military, diplomatic, humanitarian) receives a boost
Adaptive weight redistribution when specific signal embeddings are unavailable
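
A composite score with weight redistribution might look like the following sketch. The weights are the percentages listed above; note that they sum to 95%, so the sketch normalizes by the total weight of whichever signals are available. That normalization and the linear time decay are assumptions, since the exact redistribution rule and decay curve are not specified here.

```python
# Weights as listed above (they sum to 0.95, hence the normalization below).
WEIGHTS = {
    "centroid": 0.25,
    "title_summary": 0.20,
    "temporal": 0.15,
    "geographic": 0.12,
    "entity_overlap": 0.10,
    "actor": 0.08,
    "category": 0.05,
}


def temporal_proximity(hours_apart: float, window_hours: float = 72.0) -> float:
    """Linear decay from 1.0 at zero hours to 0.0 at the window edge (assumed)."""
    return max(0.0, 1.0 - abs(hours_apart) / window_hours)


def merge_score(signals: dict) -> float:
    """Weighted average over available signals; None marks a missing signal.

    Dividing by the sum of the available weights redistributes the weight of
    missing signals proportionally across the remaining ones.
    """
    available = {k: v for k, v in signals.items()
                 if k in WEIGHTS and v is not None}
    total_weight = sum(WEIGHTS[k] for k in available)
    if total_weight == 0:
        return 0.0
    return sum(WEIGHTS[k] * v for k, v in available.items()) / total_weight


score = merge_score({
    "centroid": 0.9,
    "title_summary": 0.8,
    "temporal": temporal_proximity(6),
    "geographic": None,  # e.g. no location embedding for one of the events
    "entity_overlap": 0.5,
    "actor": 0.7,
    "category": 1.0,
})
```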

Layer 6

Entity Extraction & Canonicalization

Every event is analyzed to extract structured entities: persons, locations, organizations, weapons, and event type. An alias resolution system maps variant names to canonical forms.

LLM-extracted entity fingerprint: persons, locations, organizations, weapons, event_type
Alias resolution: 'ISIS' → 'Islamic State', 'Zelensky' → 'Volodymyr Zelenskyy'
Entity fingerprints drive merge scoring (entity_overlap signal) and search
Entities cached and refreshed every 5 minutes for consistency
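
Alias resolution and the entity_overlap signal can be sketched together, since canonical names are what make overlap meaningful. The alias table below holds only the two examples given above. The case-insensitive lookup and the function names are assumptions.

```python
# Alias table keyed by lowercase variant; entries are the examples above.
ALIASES = {
    "isis": "Islamic State",
    "zelensky": "Volodymyr Zelenskyy",
}


def canonicalize(name: str) -> str:
    """Map a variant name to its canonical form, or return it unchanged."""
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def jaccard_overlap(entities_a, entities_b) -> float:
    """Jaccard overlap of two entity lists after canonicalization."""
    set_a = {canonicalize(e) for e in entities_a}
    set_b = {canonicalize(e) for e in entities_b}
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


overlap = jaccard_overlap(["Zelensky", "Kyiv"],
                          ["Volodymyr Zelenskyy", "Kharkiv"])
```

Without canonicalization the two lists in the usage line would share no entities; with it, one of three distinct entities matches.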

Layer 7

Contradiction Detection

When sources disagree on facts — not just framing or perspective — OSFeed flags the event as contested. The summary presents all factual claims attributed to their sources, without editorializing.

Factual contradiction: Ukrainian channels report 31 of 47 drones intercepted; Russian channels claim all targets hit
Perspective difference (not flagged): one source calls it a 'military operation,' another calls it an 'attack'
Contested events carry a contradiction note explaining the specific disagreement
Multi-perspective summaries preserve each source's factual claims without synthesis
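
A contested event could be represented as attributed claims plus a contradiction note, as in the sketch below. The class and field names are hypothetical and the detection step itself is not modeled. The sketch only shows how claims can be kept side by side without synthesis.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Claim:
    source: str  # who made the claim
    text: str    # the factual claim as reported


@dataclass
class Event:
    title: str
    claims: List[Claim] = field(default_factory=list)
    contradiction_note: Optional[str] = None  # set only for contested events

    @property
    def contested(self) -> bool:
        return self.contradiction_note is not None


def render_summary(event: Event) -> str:
    """List each claim with its source; prepend the note if contested."""
    lines = [f"{claim.source}: {claim.text}" for claim in event.claims]
    if event.contested:
        lines.insert(0, f"CONTESTED: {event.contradiction_note}")
    return "\n".join(lines)


event = Event(
    title="Drone attack",
    claims=[
        Claim("Ukrainian channels", "31 of 47 drones intercepted"),
        Claim("Russian channels", "all targets hit"),
    ],
    contradiction_note="Sources disagree on how many drones were intercepted",
)
summary = render_summary(event)
```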

Summary Integrity Rules

Event summaries follow strict editorial constraints enforced at the LLM prompt level. These rules ensure summaries report facts, not interpretations.

✕ No analysis or commentary ('This indicates...', 'This suggests...')
✕ No meta-observations ('Multiple sources confirm...', 'This represents...')
✕ No hedging beyond what sources state ('possibly', 'seemingly')
✕ No explaining relationships between sources
✕ No concluding sentences or editorial framing
✓ Geographic context preserved: city + country minimum
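
The rules are stated as prompt-level constraints, but a simple post-generation check is one way to spot violations. The sketch below is an assumption rather than a documented part of the pipeline. It flags the example phrases quoted in the rules, and a hedging match would still need comparing against the sources, since hedges the sources themselves state are allowed.

```python
import re

# Example phrases quoted in the rules above.
BANNED_PHRASES = [
    "this indicates",
    "this suggests",
    "multiple sources confirm",
    "this represents",
    "possibly",
    "seemingly",
]

_PATTERNS = [
    (phrase, re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE))
    for phrase in BANNED_PHRASES
]


def lint_summary(summary: str) -> list:
    """Return the banned phrases found in `summary`, in list order."""
    return [phrase for phrase, pattern in _PATTERNS if pattern.search(summary)]


flags = lint_summary("This suggests an escalation, possibly within days.")
```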

Continuous Improvement

The pipeline improves over time through structured feedback loops.

Merge Feedback

Analysts can manually merge or split events. Every correction is logged in a feedback table used to retrain merge scoring weights.

Channel Profiling

Source reliability profiles are refreshed periodically as channels evolve. Profiles are LLM-generated from recent message samples and validated against historical accuracy.

Score Logging

Every merge scoring decision — the 7 individual signal scores and the composite result — is logged for audit and analysis.
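
A logged decision could be serialized as one JSON record per event pair, as sketched below. The field names and the record layout are hypothetical. The only details taken from this page are that the seven signal scores and the composite result are stored together.

```python
import json
from datetime import datetime, timezone

SIGNAL_NAMES = (
    "centroid", "title_summary", "temporal", "geographic",
    "entity_overlap", "actor", "category",
)


def score_log_record(event_a: str, event_b: str,
                     signals: dict, composite: float) -> str:
    """Serialize one merge decision; a signal may be None if unavailable."""
    missing = [name for name in SIGNAL_NAMES if name not in signals]
    if missing:
        raise ValueError(f"missing signals: {missing}")
    record = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "event_a": event_a,
        "event_b": event_b,
        "signals": {name: signals[name] for name in SIGNAL_NAMES},
        "composite": composite,
    }
    return json.dumps(record, sort_keys=True)


line = score_log_record(
    "evt-1", "evt-2",
    {"centroid": 0.9, "title_summary": 0.8, "temporal": 0.95,
     "geographic": None, "entity_overlap": 0.5, "actor": 0.7,
     "category": 1.0},
    composite=0.81,
)
```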

Coverage Gaps

When events are detected significantly later than expected (e.g., the Burkina Faso defense pact arriving 4 hours after Russian state channels), new channels are added to close coverage gaps.

Source Reliability Scoring

How we profile and rate Telegram channels across 5 dimensions.

Case Studies

Real examples of OSFeed detecting and structuring geopolitical events.
