Our Methodology

How we collect, classify, and analyze news from multiple perspectives

Data Collection

Newsar aggregates news from RSS feeds across the political spectrum. We carefully select sources to ensure balanced coverage from left-leaning, centrist, and right-leaning outlets.

Feed Selection Criteria

  • Established news organizations with consistent publishing schedules
  • Diverse political perspectives to avoid echo chambers
  • Regional variety to capture different geographic viewpoints
  • Multiple languages to provide international coverage

Classification System

Every article is automatically classified using a hybrid approach that combines rule-based methods with AI-powered analysis.

Political Bias Scale

We use a continuous scale from -1.0 (far left) to +1.0 (far right), with 0 representing center/neutral coverage:

  • Far Left: -1.0 to -0.6
  • Center-Left: -0.6 to -0.2
  • Center: -0.2 to +0.2
  • Center-Right: +0.2 to +0.6
  • Far Right: +0.6 to +1.0
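
The bands above can be expressed as a small lookup. This is a minimal Python sketch rather than Newsar's actual code: the function name `bias_category` is hypothetical, and because neighboring ranges share an endpoint, the sketch assumes a score exactly on a boundary belongs to the band closer to center.

```python
def bias_category(score: float) -> str:
    """Map a bias score in [-1.0, +1.0] to one of the five bands."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"bias score out of range: {score}")
    # Assumption: a score exactly on a shared boundary (e.g. -0.6) is
    # assigned to the band closer to center.
    if score < -0.6:
        return "Far Left"
    if score < -0.2:
        return "Center-Left"
    if score <= 0.2:
        return "Center"
    if score <= 0.6:
        return "Center-Right"
    return "Far Right"
```

For example, `bias_category(-0.35)` falls in the Center-Left band, while a value outside the scale raises a `ValueError` instead of being silently bucketed.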

Classification Methods

1. Rule-Based Classification

For sources with established bias ratings (e.g., from Media Bias/Fact Check), we apply known bias scores directly. This provides fast, consistent classification for major outlets.
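
A rule-based lookup of this kind might look like the sketch below. The table name, function name, domains, and scores are all placeholders invented for illustration; they are not real Media Bias/Fact Check ratings, and matching on the URL's host name is an assumption about how sources are identified.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical ratings table; domains and scores are placeholders only.
KNOWN_SOURCE_BIAS = {
    "left-example.test": -0.5,
    "center-example.test": 0.0,
    "right-example.test": 0.5,
}

def rule_based_bias(article_url: str) -> Optional[float]:
    """Return the stored bias score for the article's source, or None."""
    host = (urlparse(article_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return KNOWN_SOURCE_BIAS.get(host)
```

A `None` result signals that the source has no stored rating, which is the case the AI-powered analysis is meant to cover.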

2. AI-Powered Analysis

For sources without an established rating, or to verify a rule-based score, we use Ollama running the llama3.2:3b model to analyze:

  • Language Detection: Automatic identification of article language
  • Political Bias: Content-based bias analysis using LLM reasoning
  • Geographic POV: Identifying regional perspective in coverage
  • Entities: Extracting people, organizations, locations, and events
  • Sentiment: Overall tone and emotional content analysis
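
As a rough illustration of the analysis listed above, the sketch below sends one article to a local Ollama server through its `/api/generate` endpoint and validates the JSON reply. The prompt wording, the key names, and the helper functions are assumptions made for this example; the actual prompts Newsar uses are not published here.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address
# Hypothetical key names for the model's reply.
FIELDS = ("language", "bias", "geographic_pov", "entities", "sentiment", "confidence")

def build_prompt(title: str, body: str) -> str:
    """Ask for a single JSON object; the wording is illustrative only."""
    return (
        "Analyze the news article below and reply with one JSON object with "
        f"exactly these keys: {', '.join(FIELDS)}. "
        "bias and sentiment are numbers from -1.0 to 1.0, confidence is a "
        "number from 0.0 to 1.0, entities is a list of strings.\n\n"
        f"Title: {title}\n\n{body}"
    )

def _clamp(value, low, high):
    return max(low, min(high, float(value)))

def parse_analysis(raw: str) -> dict:
    """Fill in missing keys and clamp numeric fields to their scales."""
    data = json.loads(raw)
    return {
        "language": str(data.get("language", "unknown")),
        "bias": _clamp(data.get("bias", 0.0), -1.0, 1.0),
        "geographic_pov": str(data.get("geographic_pov", "unknown")),
        "entities": [str(e) for e in data.get("entities", [])],
        "sentiment": _clamp(data.get("sentiment", 0.0), -1.0, 1.0),
        "confidence": _clamp(data.get("confidence", 0.0), 0.0, 1.0),
    }

def analyze_article(title: str, body: str, model: str = "llama3.2:3b") -> dict:
    """Send one article to the local Ollama server and parse its reply."""
    payload = json.dumps({
        "model": model,
        "prompt": build_prompt(title, body),
        "stream": False,   # return one complete response
        "format": "json",  # ask Ollama to constrain output to JSON
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        reply = json.load(response)
    return parse_analysis(reply["response"])
```

Clamping in `parse_analysis` is a defensive choice: a small model can return values outside the requested range, and the rest of the pipeline expects scores on the documented scales.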

Story Clustering

Articles covering the same news event are automatically grouped into "stories" using semantic similarity analysis.

How It Works

  1. Embedding Generation: Each article is converted into a 768-dimensional vector using nomic-embed-text
  2. Similarity Search: We use pgvector's cosine similarity to find related articles
  3. Clustering: DBSCAN/HDBSCAN algorithms group similar articles together
  4. Story Creation: Each cluster becomes a "story" with articles from different perspectives
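
The clustering step can be illustrated with a small, dependency-free DBSCAN over cosine distance. This is a sketch under stated assumptions: it uses 3-dimensional toy vectors in place of the 768-dimensional embeddings, compares every pair in memory instead of querying pgvector (whose `<=>` operator computes cosine distance), and the `eps` and `min_samples` values are illustrative, not Newsar's settings.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def dbscan(vectors, eps=0.3, min_samples=2):
    """Label each vector with a cluster id; -1 marks noise (no story)."""
    n = len(vectors)
    # Each point's neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reclaimed from noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
    return labels

# Toy embeddings: two pairs of near-duplicates and one unrelated article.
articles = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.9, 0.1], [0, 0, 1]]
story_ids = dbscan(articles)
```

Each non-negative label corresponds to one story, and articles labeled -1 stay unclustered until related coverage arrives.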

Coverage Diversity Score

Stories receive a diversity score (0-1) based on:

  • Number of unique sources covering the story
  • Distribution across political bias categories
  • Geographic diversity of sources
  • Time span of coverage
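
One way the four factors could be combined is sketched below. The exact formula is not specified on this page, so everything here is an assumption made for illustration: the equal weighting, the caps (`source_cap`, `region_cap`, `span_cap_hours`), and the use of normalized Shannon entropy to measure how evenly a story is spread across the five bias bands.

```python
import math
from collections import Counter

BIAS_CATEGORIES = 5  # the five bands of the bias scale

def normalized_entropy(labels, n_categories):
    """Shannon entropy of the label counts, scaled to [0, 1]."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0 or n_categories < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return min(max(entropy / math.log(n_categories), 0.0), 1.0)

def diversity_score(sources, bias_labels, regions, span_hours,
                    source_cap=10, region_cap=5, span_cap_hours=72.0):
    """Equal-weight average of four components, each in [0, 1]."""
    parts = [
        min(len(set(sources)) / source_cap, 1.0),             # unique sources
        normalized_entropy(bias_labels, BIAS_CATEGORIES),     # bias spread
        min(len(set(regions)) / region_cap, 1.0),             # geographic spread
        min(max(span_hours, 0.0) / span_cap_hours, 1.0),      # time span
    ]
    return sum(parts) / len(parts)
```

Under these assumptions, a story covered by a single outlet scores close to 0, while one covered evenly across all five bands by many sources in several regions over a few days approaches 1.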

Content Analysis

Beyond classification, we perform additional analysis to help you understand each article:

Keyword Extraction

AI-powered identification of the most important terms and concepts in each article, with relevance scoring.

Summary Generation

Concise, neutral summaries created by AI to help you quickly understand the article's main points.

Sentiment Analysis

Overall tone assessment on a scale from very negative (-1.0) to very positive (+1.0), helping you understand the emotional framing of the coverage.

Quality & Transparency

Confidence Scores

Every classification includes a confidence score (0-1) indicating how certain the AI is about its assessment. Lower confidence suggests the article may have mixed signals or be difficult to classify.

Method Indicators

Each article shows whether it was classified using:

  • Rule-based: Known source with established rating
  • Ollama: AI-analyzed content for unknown sources
  • Hybrid: Combination of both methods
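
How the three indicators might be assigned is sketched below. The blending rule is an assumption for illustration only: this page does not say how the two methods are combined, so the confidence-weighted average, the `rule_weight` default, the agreement-based confidence for hybrid results, and the fixed confidence of 1.0 for rule-based results are all hypothetical.

```python
from typing import NamedTuple, Optional

class Classification(NamedTuple):
    bias: float
    confidence: float
    method: str  # "rule-based", "ollama", or "hybrid"

def classify(rule_bias: Optional[float],
             ai_bias: Optional[float],
             ai_confidence: float = 0.0,
             rule_weight: float = 0.7) -> Optional[Classification]:
    """Pick or blend the available scores; rule_weight must be > 0."""
    if rule_bias is None and ai_bias is None:
        return None  # nothing to classify with yet
    if ai_bias is None:
        # Assumption: an established source rating is treated as certain.
        return Classification(rule_bias, 1.0, "rule-based")
    if rule_bias is None:
        return Classification(ai_bias, ai_confidence, "ollama")
    # Both available: weight the AI score by its own confidence.
    ai_weight = (1.0 - rule_weight) * ai_confidence
    bias = (rule_weight * rule_bias + ai_weight * ai_bias) / (rule_weight + ai_weight)
    # Assumption: confidence reflects how closely the two methods agree.
    agreement = 1.0 - abs(rule_bias - ai_bias) / 2.0
    return Classification(bias, agreement, "hybrid")
```

With this rule, a low-confidence AI reading barely moves an established source rating, and a large disagreement between the two methods shows up as a lower confidence score.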

Continuous Improvement

Our classification system is constantly being refined. We:

  • Monitor classification accuracy
  • Update source ratings as outlets evolve
  • Improve AI prompts based on performance
  • Add new sources to maintain balanced coverage

Privacy & Local Processing

All AI processing happens locally using Ollama; no data is sent to external services. Your reading habits and preferences remain completely private.

Limitations & Biases

While we strive for accuracy and fairness, it's important to acknowledge limitations:

  • AI models can make mistakes or misinterpret nuanced content
  • The left-right political spectrum is simplistic and doesn't capture all viewpoints
  • Source selection inherently involves editorial decisions
  • Breaking news may not be immediately classified or clustered

We encourage you to think critically and to use Newsar as one of many tools for staying informed.
