Our Methodology

How we collect, classify, and analyze news from multiple perspectives

Processing Pipeline

Every article passes through a six-stage pipeline before it reaches you. Each stage runs automatically and can process thousands of articles per day.

  1. Collection — RSS feeds are polled and new articles are deduplicated via SHA-256 content hashing
  2. Extraction — Full article text is retrieved using Mozilla Readability, stripping ads and boilerplate
  3. Classification — Language, political bias, entities, and sentiment are determined
  4. Embedding — Articles are converted to 768-dimensional vectors for semantic search
  5. Analysis — Keywords, summaries (short, medium, long), and sentiment scores are generated
  6. Clustering — Similar articles are grouped into multi-perspective "stories"
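The deduplication in stage 1 can be pictured as hashing normalized article text and skipping any hash already seen. A minimal TypeScript sketch (the normalization rules and the in-memory set are illustrative assumptions; the production pipeline presumably checks hashes against its database instead):

```typescript
import { createHash } from "node:crypto";

// Illustrative normalization: collapse whitespace and case so trivially
// reformatted copies of the same article hash identically.
function normalize(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}

// SHA-256 content hash of the normalized article text.
function contentHash(text: string): string {
  return createHash("sha256").update(normalize(text), "utf8").digest("hex");
}

// Keep only the first article seen for each content hash.
function dedupe(articles: string[]): string[] {
  const seen = new Set<string>();
  const unique: string[] = [];
  for (const article of articles) {
    const hash = contentHash(article);
    if (!seen.has(hash)) {
      seen.add(hash);
      unique.push(article);
    }
  }
  return unique;
}
```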

Data Collection

Newsar aggregates news from RSS feeds across the political spectrum. We carefully select sources to ensure balanced coverage from left-leaning, centrist, and right-leaning outlets.

Feed Selection Criteria

  • Established news organizations with consistent publishing schedules
  • Diverse political perspectives to avoid echo chambers
  • Regional variety to capture different geographic viewpoints
  • Multiple languages to provide international coverage

Classification System

Every article is automatically classified using a hybrid approach that combines rule-based methods with AI-powered analysis.

Political Bias Scale

We use a continuous scale from -1.0 (far left) to +1.0 (far right), with 0 representing center/neutral coverage:

  • Far Left: -1.0 to -0.6
  • Center-Left: -0.6 to -0.2
  • Center: -0.2 to +0.2
  • Center-Right: +0.2 to +0.6
  • Far Right: +0.6 to +1.0
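Mapping a continuous score onto these categories is a simple bucketing step. The sketch below assumes half-open intervals, since the published scale does not specify which side a boundary value such as -0.6 falls on:

```typescript
type BiasLabel = "Far Left" | "Center-Left" | "Center" | "Center-Right" | "Far Right";

// Bucket a bias score in [-1, 1] into the five published categories.
// Assumption: each interval is half-open [lo, hi), so a boundary value
// falls into the more rightward bucket.
function biasLabel(score: number): BiasLabel {
  if (score < -0.6) return "Far Left";
  if (score < -0.2) return "Center-Left";
  if (score < 0.2) return "Center";
  if (score < 0.6) return "Center-Right";
  return "Far Right";
}
```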

Classification Methods

1. Rule-Based Classification

For sources with established bias ratings (e.g., from Media Bias/Fact Check), we apply known bias scores directly. This provides fast, consistent classification for major outlets.

2. AI-Powered Analysis

For unknown sources, or to verify existing ratings, we use a 14-billion-parameter language model (qwen2.5:14b) running on dedicated GPU infrastructure to analyze:

  • Language Detection: Automatic identification of article language
  • Political Bias: Content-based bias analysis using LLM reasoning
  • Geographic POV: Identifying regional perspective in coverage
  • Named Entities: Extracting people, organizations, locations, and events
  • Sentiment: Overall tone and emotional content analysis
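One way to picture the output of this analysis is a single structured record per article, validated before it enters the database. The schema below is an illustrative assumption, not Newsar's actual data model:

```typescript
// Illustrative shape of one article's AI classification result.
interface Classification {
  language: string;       // e.g. "en"
  biasScore: number;      // -1.0 (far left) … +1.0 (far right)
  geographicPov?: string; // regional perspective, e.g. "US"
  entities: { name: string; type: "person" | "organization" | "location" | "event" }[];
  sentiment: number;      // -1.0 (very negative) … +1.0 (very positive)
  confidence: number;     // 0–1 certainty of the assessment
}

// Range-check an LLM response before trusting it downstream.
function isValidClassification(c: Classification): boolean {
  return (
    c.language.length > 0 &&
    c.biasScore >= -1 && c.biasScore <= 1 &&
    c.sentiment >= -1 && c.sentiment <= 1 &&
    c.confidence >= 0 && c.confidence <= 1
  );
}
```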

Entity Intelligence

Newsar maintains a knowledge base of over 56,000 named entities extracted from articles. Entities are categorized into four types:

  • People — Politicians, business leaders, public figures
  • Organizations — Companies, governments, NGOs, political parties
  • Locations — Countries, cities, regions involved in the news
  • Events — Elections, conflicts, summits, disasters

Entity Summaries

Top trending entities receive AI-generated summaries that are periodically refreshed. Each summary includes a short description, a detailed overview, and trending scores based on recent mention velocity.

Entity Network Graph

Co-occurrence analysis reveals relationships between entities. When two entities are frequently mentioned together across different articles, a connection is established. These relationships are visualized as interactive network graphs on entity detail pages, helping you discover non-obvious connections between people, organizations, and events.
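The co-occurrence counting behind these graphs can be sketched as follows; representing each article as a list of entity names is an assumption for illustration:

```typescript
// Count how often each pair of entities appears in the same article.
// Each edge key is the sorted pair "A|B"; the value is the number of
// articles mentioning both.
function coOccurrence(articles: string[][]): Map<string, number> {
  const edges = new Map<string, number>();
  for (const entities of articles) {
    const unique = [...new Set(entities)].sort();
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = `${unique[i]}|${unique[j]}`;
        edges.set(key, (edges.get(key) ?? 0) + 1);
      }
    }
  }
  return edges;
}
```

A graph view would then keep only edges above some minimum count, so that one-off coincidences don't appear as connections.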

Story Clustering

Articles covering the same news event are automatically grouped into "stories" using semantic similarity analysis.

How It Works

  1. Embedding Generation: Each article is converted into a 768-dimensional vector using the nomic-embed-text model
  2. Similarity Search: We use pgvector's cosine similarity to find related articles
  3. Clustering: A DBSCAN-based algorithm groups articles that exceed a similarity threshold (typically 0.75) into clusters
  4. Story Creation: Each cluster becomes a "story" with articles from different perspectives
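The similarity and grouping steps can be shown in miniature. The greedy single-link grouping below is a deliberate simplification of the DBSCAN-based algorithm described above; only the cosine-similarity measure and the 0.75 default threshold come from the text:

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Assign each article vector a cluster label: any unlabeled article
// sufficiently similar to a cluster seed joins that cluster.
function clusterArticles(vectors: number[][], threshold = 0.75): number[] {
  const labels = new Array<number>(vectors.length).fill(-1);
  let next = 0;
  for (let i = 0; i < vectors.length; i++) {
    if (labels[i] !== -1) continue;
    labels[i] = next;
    for (let j = i + 1; j < vectors.length; j++) {
      if (labels[j] === -1 && cosineSimilarity(vectors[i], vectors[j]) >= threshold) {
        labels[j] = labels[i];
      }
    }
    next++;
  }
  return labels;
}
```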

Story Trending

Stories receive dynamic trending scores that reflect how actively they are being covered:

  • Article velocity: How many new articles per hour are being published
  • Recency: Scores decay over 48 hours unless new coverage arrives
  • Source diversity: Stories covered by more unique sources rank higher
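A score combining these three factors might look like the sketch below. The weights, saturation points, and linear decay curve are illustrative assumptions; only the factors themselves come from the methodology:

```typescript
// Illustrative trending score in [0, 1] built from the three factors above.
function trendingScore(
  articlesLastHour: number,
  hoursSinceLastArticle: number,
  uniqueSources: number
): number {
  const velocity = Math.min(articlesLastHour / 10, 1);          // saturates at 10/hr (assumed)
  const recency = Math.max(0, 1 - hoursSinceLastArticle / 48);  // decays to 0 over 48h
  const diversity = Math.min(uniqueSources / 20, 1);            // saturates at 20 sources (assumed)
  return 0.4 * velocity + 0.4 * recency + 0.2 * diversity;      // weights are assumed
}
```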

Coverage Diversity Score

Stories receive a diversity score (0–1) based on:

  • Number of unique sources covering the story
  • Distribution across political bias categories (left, center, right)
  • Geographic diversity of sources
  • Time span of coverage
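These four factors could be combined as sketched below. The weights, saturation points, and the use of normalized entropy for bias balance are illustrative assumptions:

```typescript
// Illustrative diversity score in [0, 1].
// Bias balance uses normalized entropy: 1 when left/center/right coverage
// is evenly split, 0 when all coverage comes from one category.
function diversityScore(
  uniqueSources: number,
  biasCounts: [number, number, number], // [left, center, right] article counts
  uniqueCountries: number,
  coverageHours: number
): number {
  const total = biasCounts[0] + biasCounts[1] + biasCounts[2];
  let entropy = 0;
  for (const n of biasCounts) {
    if (n > 0) {
      const p = n / total;
      entropy -= p * Math.log(p);
    }
  }
  const biasBalance = total > 0 ? entropy / Math.log(3) : 0;
  const sourceFactor = Math.min(uniqueSources / 10, 1);  // saturation points assumed
  const geoFactor = Math.min(uniqueCountries / 5, 1);
  const timeFactor = Math.min(coverageHours / 24, 1);
  return 0.35 * sourceFactor + 0.35 * biasBalance + 0.2 * geoFactor + 0.1 * timeFactor;
}
```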

Content Analysis

Beyond classification, we perform additional analysis to help you understand each article:

Keyword Extraction

AI-powered identification of the most important terms and concepts in each article, with relevance scoring and category labels.

Summary Generation

Each article receives three levels of summary—short, medium, and long—created by AI to help you quickly understand the article's main points at whatever depth you need.

Sentiment Analysis

Overall tone assessment on a scale from very negative (-1.0) to very positive (+1.0), helping you understand the emotional framing of the coverage.

Quality & Transparency

Confidence Scores

Every classification includes a confidence score (0–1) indicating how certain the AI is about its assessment. Lower confidence suggests the article may have mixed signals or be difficult to classify.

Method Indicators

Each article shows whether it was classified using:

  • Rule-based: Known source with established rating
  • AI-analyzed: Content-based analysis for unknown sources
  • Hybrid: Combination of both methods

Continuous Improvement

Our classification system is constantly being refined. We:

  • Monitor classification accuracy across sources
  • Update source ratings as outlets evolve
  • Improve AI prompts based on performance
  • Add new sources to maintain balanced coverage

Infrastructure

All AI inference runs on dedicated GPU infrastructure managed via an on-demand cloud system. GPU pods are created automatically when processing jobs are queued and terminated after idle periods to optimize costs. No user data or reading habits are sent to third-party AI services—only raw article content is processed.

The full technology stack includes:

  • Nuxt 4 for the web application
  • PostgreSQL + pgvector for storage and semantic search
  • Ollama serving qwen2.5:14b (chat) and nomic-embed-text (embeddings)
  • BullMQ + Redis for job queuing and worker orchestration

Limitations & Biases

While we strive for accuracy and fairness, it's important to acknowledge limitations:

  • AI models can make mistakes or misinterpret nuanced content
  • The left-right political spectrum is simplistic and doesn't capture all viewpoints
  • Source selection inherently involves editorial decisions
  • Breaking news may not be immediately classified or clustered
  • Entity extraction can produce duplicates or miss contextual nuance

We encourage critical thinking and using Newsar as one of many tools for staying informed.

Have questions about our methodology or suggestions for improvement?

Learn More About Newsar