Our Methodology
How we collect, classify, and analyze news from multiple perspectives
Data Collection
Newsar aggregates news from RSS feeds across the political spectrum. We carefully select sources to ensure balanced coverage from left-leaning, centrist, and right-leaning outlets.
Feed Selection Criteria
- Established news organizations with consistent publishing schedules
- Diverse political perspectives to avoid echo chambers
- Regional variety to capture different geographic viewpoints
- Multiple languages to provide international coverage
Classification System
Every article is automatically classified using a hybrid approach that combines rule-based methods with AI-powered analysis.
Political Bias Scale
We use a continuous scale from -1.0 (far left) to +1.0 (far right), with 0 representing center/neutral coverage:
- Far Left: -1.0 to -0.6
- Center-Left: -0.6 to -0.2
- Center: -0.2 to +0.2
- Center-Right: +0.2 to +0.6
- Far Right: +0.6 to +1.0
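The ranges above share their endpoints, so a score of exactly -0.6 or +0.2 could fall in either neighboring category. The sketch below shows one way to turn a score into a label. It assumes (this is not stated in the methodology) that a score exactly on a boundary goes to the category closer to the center; `bias_label` is a hypothetical name.

```python
def bias_label(score: float) -> str:
    """Map a bias score in [-1.0, 1.0] to one of the five categories."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    # Assumption: boundary values (-0.6, -0.2, +0.2, +0.6) are assigned
    # to the category nearer the center.
    if score < -0.6:
        return "Far Left"
    if score < -0.2:
        return "Center-Left"
    if score <= 0.2:
        return "Center"
    if score <= 0.6:
        return "Center-Right"
    return "Far Right"
```

For example, `bias_label(0.0)` gives "Center" and `bias_label(0.75)` gives "Far Right".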
Classification Methods
1. Rule-Based Classification
For sources with established bias ratings (e.g., from Media Bias/Fact Check), we apply known bias scores directly. This provides fast, consistent classification for major outlets.
2. AI-Powered Analysis
For sources without an established rating, or to verify a rule-based score, we use Ollama with llama3.2:3b to analyze:
- Language Detection: Automatic identification of article language
- Political Bias: Content-based bias analysis using LLM reasoning
- Geographic POV: Identifying regional perspective in coverage
- Entities: Extracting people, organizations, locations, and events
- Sentiment: Overall tone and emotional content analysis
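As a rough illustration of the analysis step, the sketch below sends an article to a local Ollama server through its `/api/generate` endpoint and parses a JSON reply. The prompt wording, the reply keys, and the helper names (`build_request`, `parse_analysis`, `analyze`) are assumptions made for this example, not Newsar's actual prompt or schema.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

# Hypothetical prompt: the wording Newsar actually uses is not documented here.
PROMPT_TEMPLATE = (
    "Analyze the news article below and reply with a JSON object with keys "
    "language, bias (-1.0 to 1.0), geographic_pov, entities, "
    "sentiment (-1.0 to 1.0), confidence (0 to 1).\n\nArticle:\n{text}"
)

def build_request(text: str, model: str = "llama3.2:3b") -> dict:
    """Build the request body for a single non-streaming generation."""
    return {
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(text=text),
        "stream": False,   # return one JSON envelope instead of a stream
        "format": "json",  # ask Ollama to constrain the output to JSON
    }

def parse_analysis(raw: str) -> dict:
    """Parse the model's JSON text and clamp numeric fields to their scales."""
    data = json.loads(raw)
    for key in ("bias", "sentiment"):
        data[key] = max(-1.0, min(1.0, float(data.get(key, 0.0))))
    data["confidence"] = max(0.0, min(1.0, float(data.get("confidence", 0.0))))
    return data

def analyze(text: str) -> dict:
    """Send the article to the local Ollama server and return the parsed result."""
    body = json.dumps(build_request(text)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        envelope = json.load(resp)
    # The generated text is in the "response" field of the envelope.
    return parse_analysis(envelope["response"])
```

Clamping in `parse_analysis` guards against a small model returning values outside the documented scales; `analyze` needs a running Ollama instance with the model already pulled.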
Story Clustering
Articles covering the same news event are automatically grouped into "stories" using semantic similarity analysis.
How It Works
- Embedding Generation: Each article is converted into a 768-dimensional vector using nomic-embed-text
- Similarity Search: We use pgvector's cosine distance operator (<=>) to find related articles
- Clustering: DBSCAN/HDBSCAN algorithms group similar articles together
- Story Creation: Each cluster becomes a "story" with articles from different perspectives
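The clustering step can be sketched with a minimal, plain-Python DBSCAN over cosine distance. This is an illustration only: it uses toy 3-dimensional vectors in place of 768-dimensional embeddings, the `eps` and `min_samples` values are hypothetical, and it assumes no embedding is the zero vector.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def dbscan(vectors, eps=0.3, min_samples=2):
    """Return one cluster label per vector; -1 marks noise."""
    n = len(vectors)
    # Neighborhood of each point (includes the point itself).
    neighbors = [
        [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels

# Toy embeddings: two pairs of near-duplicates and one outlier.
vectors = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.95, 0.05], [0, 0, 1]]
labels = dbscan(vectors, eps=0.1, min_samples=2)
```

Each non-negative label corresponds to one story; articles labeled -1 did not match any other article closely enough to be grouped.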
Coverage Diversity Score
Stories receive a diversity score (0-1) based on:
- Number of unique sources covering the story
- Distribution across political bias categories
- Geographic diversity of sources
- Time span of coverage
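The exact formula is not given here, so the following is only a sketch of how four such factors could be combined into a 0-1 score. The equal weighting, the saturation caps (`max_sources`, `max_regions`, `max_hours`), and the use of normalized entropy for the bias distribution are all assumptions made for illustration.

```python
import math
from collections import Counter

def normalized_entropy(labels, n_categories):
    """Shannon entropy of the label distribution, scaled to [0, 1]."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0 or n_categories < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n_categories)

def diversity_score(sources, bias_labels, regions, span_hours,
                    max_sources=10, max_regions=5, max_hours=72.0):
    """Hypothetical score: unweighted mean of four components, each in [0, 1]."""
    parts = [
        min(len(set(sources)) / max_sources, 1.0),  # unique sources
        normalized_entropy(bias_labels, 5),         # spread over 5 bias categories
        min(len(set(regions)) / max_regions, 1.0),  # geographic diversity
        min(span_hours / max_hours, 1.0),           # time span of coverage
    ]
    return sum(parts) / len(parts)
```

Under these assumptions, a story covered by one outlet in one region scores close to 0, while a story covered evenly across all five bias categories by many outlets over several days approaches 1.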
Content Analysis
Beyond classification, we perform additional analysis to help you understand each article:
Keyword Extraction
AI-powered identification of the most important terms and concepts in each article, with relevance scoring.
Summary Generation
Concise, neutral summaries created by AI to help you quickly understand the article's main points.
Sentiment Analysis
Overall tone assessment on a scale from very negative (-1.0) to very positive (+1.0), helping you understand the emotional framing of the coverage.
Quality & Transparency
Confidence Scores
Every classification includes a confidence score (0-1) indicating how certain the AI is about its assessment. Lower confidence suggests the article may have mixed signals or be difficult to classify.
Method Indicators
Each article shows whether it was classified using:
- Rule-based: Known source with established rating
- Ollama: AI-analyzed content for unknown sources
- Hybrid: Combination of both methods
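A sketch of how the method indicator could be chosen is shown below. The ratings table, the domain names, and the blending rule for the hybrid case are hypothetical; how the two signals are actually combined is not specified here.

```python
from typing import Optional, Tuple

# Hypothetical ratings table; real entries would come from an established
# source of bias ratings.
KNOWN_SOURCE_BIAS = {"example-left.test": -0.5, "example-right.test": 0.5}

def classify(source: str,
             ai_bias: Optional[float] = None,
             ai_confidence: float = 0.0) -> Tuple[float, str]:
    """Return (bias score, method indicator) for an article."""
    known = KNOWN_SOURCE_BIAS.get(source)
    if known is not None and ai_bias is None:
        return known, "rule-based"
    if known is None and ai_bias is not None:
        return ai_bias, "ollama"
    if known is not None and ai_bias is not None:
        # Assumption: the AI score gets at most half the weight,
        # scaled by its confidence.
        w = 0.5 * max(0.0, min(1.0, ai_confidence))
        return (1 - w) * known + w * ai_bias, "hybrid"
    raise ValueError("no established rating and no AI result for this source")
```

With this rule, a low-confidence AI result barely moves the established rating, while a fully confident one pulls the score halfway toward the AI estimate.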
Continuous Improvement
Our classification system is constantly being refined. We:
- Monitor classification accuracy
- Update source ratings as outlets evolve
- Improve AI prompts based on performance
- Add new sources to maintain balanced coverage
Privacy & Local Processing
All AI processing happens locally using Ollama; no article content is sent to external AI services. Your reading habits and preferences remain private.
Limitations & Biases
While we strive for accuracy and fairness, it's important to acknowledge limitations:
- AI models can make mistakes or misinterpret nuanced content
- The left-right political spectrum is simplistic and doesn't capture all viewpoints
- Source selection inherently involves editorial decisions
- Breaking news may not be immediately classified or clustered
We encourage critical thinking and recommend using Newsar as one of many tools for staying informed.