
Build a Smart News Aggregator That Cuts Through Noise

Michael

Michael is a software engineer and startup growth expert with 10+ years of software engineering and machine learning experience.




We’re excited to share this comprehensive guide with you. It covers best practices and real-world implementation strategies that we use at Slashdev when building apps for clients worldwide.

What’s Inside This Guide:

  • Why Your News Feed Is Broken: Understanding algorithmic manipulation
  • The Solution: AI-powered curation that actually works
  • 3 Powerful Code Scripts: Ready-to-use tools you can run today
  • Setup & Customization: Get running in 10 minutes
  • Key Takeaways: What you learned and where to go next

Overview:

Your news feed is designed to keep you scrolling, not keep you informed. Every algorithm optimizes for engagement, which usually means anger, fear, or outrage. That’s not a bug, it’s the business model.

But here’s what most people don’t realize: you can build your own system in about 100 lines of Python.

The Real Problem

Traditional news feeds throw everything at you with no filter except “what keeps you clicking.” You get:

  • The same story from 50 different outlets
  • Clickbait headlines that promise more than they deliver
  • Doomscrolling content designed to spike your cortisol
  • Zero transparency about why you’re seeing what you’re seeing

The average person spends 2+ hours daily consuming news that makes them anxious without making them informed.

How Smart Aggregation Works

Instead of relying on someone else’s algorithm, you’ll build a system that:

  1. Pulls from quality sources using RSS feeds (BBC, Reuters, TechCrunch, Hacker News)
  2. Analyzes each article for sentiment, relevance, and clickbait patterns
  3. Filters out noise based on your criteria, not an advertiser’s profit motive
  4. Presents clean summaries so you get information without manipulation

The technical stack is simple:

  • feedparser – Pulls RSS feeds cleanly
  • transformers – Pre-trained AI models for sentiment analysis
  • pandas – Organizes and filters your data

No machine learning degree required. If you can run a Python script, you can build this.
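
The core filtering idea fits in a few lines of plain Python before any AI model enters the picture. Here is a simplified sketch of the scoring logic used later in this guide (the word list and point values are illustrative, not a fixed standard):

```python
# Minimal sketch of headline scoring: penalize clickbait, reward calm framing.
CLICKBAIT_WORDS = ["shocking", "you won't believe", "what happened next"]

def score_headline(title, sentiment="NEUTRAL"):
    """Return a 0-100 relevance score for a headline (illustrative weights)."""
    title_lower = title.lower()
    is_clickbait = any(word in title_lower for word in CLICKBAIT_WORDS)
    score = 50                      # base score
    if sentiment == "POSITIVE":
        score += 15                 # mild boost for positive framing
    if not is_clickbait:
        score += 20                 # reward headlines without bait patterns
    return score

print(score_headline("Fed holds interest rates steady"))            # 70
print(score_headline("You won't believe what the Fed did next"))    # 50
```

The AI model's only job is supplying the `sentiment` label; everything else is transparent arithmetic you can tune.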

What You’re Actually Building

Three focused scripts that solve specific problems:

Script 1: The Basic Aggregator: Pulls articles from multiple sources, analyzes sentiment, filters clickbait. This is your foundation – a clean news feed that respects your time.

Script 2: The Duplicate Detector: Groups identical stories from different outlets so you see one comprehensive version instead of 20 variations of the same headline.

Script 3: The Personal Dashboard: Turns your curated feed into a beautiful web interface you can check every morning. Think Apple News, but you control the algorithm.

Each script is standalone. Use one, use all three, or mix and match. They work independently but are even better together.


Practical Code Examples

Code 1: Smart News Feed with AI Analysis

This is your core aggregator. It fetches articles, analyzes sentiment, detects clickbait, and ranks everything by actual relevance.

import feedparser
import pandas as pd
from datetime import datetime, timedelta
from transformers import pipeline

class SmartNewsFeed:
    def __init__(self):
        # Initialize AI sentiment analyzer
        print("Loading AI model...")
        self.sentiment_analyzer = pipeline(
            "sentiment-analysis",
            model="distilbert-base-uncased-finetuned-sst-2-english"
        )
        
        # Your news sources - customize these
        self.sources = {
            'Tech': [
                'https://hnrss.org/frontpage',
                'https://techcrunch.com/feed/'
            ],
            'Business': [
                'https://feeds.reuters.com/reuters/businessNews'
            ],
            'World': [
                'https://feeds.bbci.co.uk/news/world/rss.xml'
            ]
        }
    
    def fetch_articles(self, hours_back=24):
        """Pull fresh articles from all sources"""
        cutoff = datetime.now() - timedelta(hours=hours_back)
        articles = []
        
        for category, feeds in self.sources.items():
            for feed_url in feeds:
                feed = feedparser.parse(feed_url)
                
                for entry in feed.entries:
                    # Parse publish date
                    if hasattr(entry, 'published_parsed'):
                        pub_date = datetime(*entry.published_parsed[:6])
                        if pub_date < cutoff:
                            continue
                    
                    articles.append({
                        'title': entry.title,
                        'link': entry.link,
                        'summary': entry.get('summary', '')[:200],  # some feeds omit summaries
                        'source': feed.feed.title,
                        'category': category,
                        'published': pub_date if hasattr(entry, 'published_parsed') else None
                    })
        
        return articles
    
    def analyze_article(self, article):
        """Run AI analysis on article"""
        text = f"{article['title']} {article['summary']}"
        
        # Get sentiment
        sentiment = self.sentiment_analyzer(text[:512])[0]
        article['sentiment'] = sentiment['label']
        article['sentiment_score'] = sentiment['score']
        
        # Detect clickbait patterns
        title_lower = article['title'].lower()
        clickbait_words = ['shocking', 'you won\'t believe', 'what happened next']
        article['is_clickbait'] = any(word in title_lower for word in clickbait_words)
        
        # Calculate relevance score
        score = 50  # Base score
        if article['sentiment'] == 'POSITIVE':
            score += 15
        if not article['is_clickbait']:
            score += 20
        article['relevance_score'] = score
        
        return article
    
    def get_top_stories(self, limit=10, filter_clickbait=True):
        """Get your personalized news feed"""
        print("\n🔍 Fetching articles...")
        articles = self.fetch_articles(hours_back=24)
        
        print(f"📊 Analyzing {len(articles)} articles...")
        analyzed = [self.analyze_article(a) for a in articles]
        
        # Apply filters
        if filter_clickbait:
            analyzed = [a for a in analyzed if not a['is_clickbait']]
        
        # Sort by relevance
        analyzed.sort(key=lambda x: x['relevance_score'], reverse=True)
        
        return analyzed[:limit]
    
    def display_feed(self, articles):
        """Print your curated feed"""
        print("\n" + "="*70)
        print("YOUR SMART NEWS FEED")
        print("="*70 + "\n")
        
        for i, article in enumerate(articles, 1):
            print(f"{i}. {article['title']}")
            print(f"   📰 {article['source']} | {article['category']}")
            print(f"   😊 {article['sentiment']} | Score: {article['relevance_score']}/100")
            print(f"   🔗 {article['link']}\n")
    
    def save_to_csv(self, articles, filename='news_feed.csv'):
        """Export to CSV for later"""
        df = pd.DataFrame(articles)
        df.to_csv(filename, index=False)
        print(f"💾 Saved {len(articles)} articles to {filename}")


# RUN THIS:
if __name__ == "__main__":
    feed = SmartNewsFeed()
    top_stories = feed.get_top_stories(limit=15, filter_clickbait=True)
    feed.display_feed(top_stories)
    feed.save_to_csv(top_stories)

What this does: Pulls news from major sources, runs AI sentiment analysis on each article, detects clickbait, ranks by relevance, and shows you only what matters. Run it every morning for a clean news briefing.

Code 2: Duplicate Story Detector

Ever notice how the same breaking news appears 50 times from different outlets? This script groups identical stories so you see one version instead of drowning in duplicates.

import feedparser
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

class DuplicateDetector:
    def __init__(self):
        self.sources = [
            'https://feeds.bbci.co.uk/news/world/rss.xml',
            'https://feeds.reuters.com/Reuters/worldNews',
            'https://rss.nytimes.com/services/xml/rss/nyt/World.xml'
        ]
    
    def fetch_all_articles(self):
        """Get articles from all sources"""
        articles = []
        
        for feed_url in self.sources:
            feed = feedparser.parse(feed_url)
            for entry in feed.entries:
                articles.append({
                    'title': entry.title,
                    'link': entry.link,
                    'source': feed.feed.title
                })
        
        return articles
    
    def find_duplicates(self, articles, threshold=0.6):
        """Find articles covering the same story"""
        if len(articles) < 2:
            return []
        
        titles = [a['title'] for a in articles]
        
        # Convert titles to numerical vectors
        vectorizer = TfidfVectorizer(stop_words='english')
        vectors = vectorizer.fit_transform(titles)
        
        # Calculate similarity between all titles
        similarity = cosine_similarity(vectors)
        
        # Group similar articles
        clusters = []
        processed = set()
        
        for i in range(len(similarity)):
            if i in processed:
                continue
            
            # Find all articles similar to this one
            similar = [i]
            for j in range(i + 1, len(similarity)):
                if similarity[i][j] > threshold:
                    similar.append(j)
                    processed.add(j)
            
            if len(similar) > 1:  # Only clusters with duplicates
                clusters.append(similar)
        
        return clusters
    
    def display_clusters(self, articles, clusters):
        """Show grouped stories"""
        print("\n" + "="*70)
        print("DUPLICATE STORIES DETECTED")
        print("="*70 + "\n")
        
        for i, cluster in enumerate(clusters, 1):
            print(f"Story #{i} - Covered by {len(cluster)} sources:")
            for idx in cluster:
                article = articles[idx]
                print(f"  • {article['source']}")
                print(f"    {article['title']}")
                print(f"    {article['link']}\n")
            print("-" * 70 + "\n")
    
    def get_best_version(self, articles, cluster):
        """Pick the most comprehensive article from a cluster"""
        cluster_articles = [articles[i] for i in cluster]
        # Longest title usually = most detailed
        best = max(cluster_articles, key=lambda x: len(x['title']))
        return best


# RUN THIS:
if __name__ == "__main__":
    detector = DuplicateDetector()
    
    print("🔍 Fetching articles from multiple sources...")
    articles = detector.fetch_all_articles()
    print(f"Found {len(articles)} articles")
    
    print("\n🧠 Detecting duplicate stories...")
    clusters = detector.find_duplicates(articles, threshold=0.6)
    
    if clusters:
        detector.display_clusters(articles, clusters)
        print(f"\n✅ Found {len(clusters)} stories covered by multiple outlets")
        print("You just saved yourself from reading the same thing 50 times.")
    else:
        print("\n✅ No duplicate stories found")

What this does: Compares article titles using TF-IDF similarity scoring and groups stories that cover the same event. It shows you which outlets are reporting the same thing so you can read one comprehensive version instead of multiple shallow takes.
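
Before running the full script, it helps to see the similarity measure in isolation. A minimal sketch with invented headlines (same vectorizer and similarity function as the detector above):

```python
# Demo of the title-similarity idea behind the duplicate detector.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

titles = [
    "Apple unveils new iPhone with faster chip",
    "Apple announces new iPhone featuring faster chip",  # same story, reworded
    "Stock markets fall on inflation fears",             # unrelated story
]

# Turn each title into a weighted word-frequency vector, then compare angles.
vectors = TfidfVectorizer(stop_words="english").fit_transform(titles)
sim = cosine_similarity(vectors)

print(round(sim[0][1], 2))  # high: first two titles share most key words
print(round(sim[0][2], 2))  # near 0: no shared vocabulary
```

The 0.6 threshold in `find_duplicates` sits between these two regimes: reworded duplicates land above it, unrelated stories well below.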

Code 3: Personal News Dashboard

Turn your filtered feed into a beautiful web interface. Open it every morning instead of doomscrolling Twitter or checking 10 different news apps.

from flask import Flask, render_template_string
import feedparser
from datetime import datetime
from transformers import pipeline

app = Flask(__name__)

# Initialize AI model
sentiment_analyzer = pipeline("sentiment-analysis")

def get_curated_news():
    """Fetch and analyze news"""
    feeds = {
        'Tech': 'https://hnrss.org/frontpage',
        'Business': 'https://feeds.reuters.com/reuters/businessNews',
        'World': 'https://feeds.bbci.co.uk/news/world/rss.xml'
    }
    
    articles = []
    for category, url in feeds.items():
        feed = feedparser.parse(url)
        for entry in feed.entries[:5]:  # Top 5 from each source
            text = entry.title[:512]
            sentiment = sentiment_analyzer(text)[0]
            
            articles.append({
                'title': entry.title,
                'link': entry.link,
                'category': category,
                'sentiment': sentiment['label'],
                'score': round(sentiment['score'] * 100)
            })
    
    # Sort by sentiment score (most balanced first)
    articles.sort(key=lambda x: abs(x['score'] - 50))
    return articles[:15]  # Top 15 overall


HTML_TEMPLATE = """
<!DOCTYPE html>
<html>
<head>
    <title>Your News Dashboard</title>
    <style>
        * { margin: 0; padding: 0; box-sizing: border-box; }
        body {
            font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            padding: 40px 20px;
            min-height: 100vh;
        }
        .container {
            max-width: 800px;
            margin: 0 auto;
        }
        h1 {
            color: white;
            font-size: 2.5em;
            margin-bottom: 10px;
            text-shadow: 2px 2px 4px rgba(0,0,0,0.2);
        }
        .subtitle {
            color: rgba(255,255,255,0.9);
            margin-bottom: 30px;
            font-size: 1.1em;
        }
        .article {
            background: white;
            border-radius: 12px;
            padding: 25px;
            margin-bottom: 15px;
            box-shadow: 0 4px 6px rgba(0,0,0,0.1);
            transition: transform 0.2s, box-shadow 0.2s;
        }
        .article:hover {
            transform: translateY(-2px);
            box-shadow: 0 6px 12px rgba(0,0,0,0.15);
        }
        .article-title {
            font-size: 1.3em;
            margin-bottom: 10px;
            color: #1a1a1a;
        }
        .article-title a {
            color: #1a1a1a;
            text-decoration: none;
        }
        .article-title a:hover {
            color: #667eea;
        }
        .meta {
            display: flex;
            gap: 10px;
            flex-wrap: wrap;
        }
        .badge {
            padding: 5px 12px;
            border-radius: 20px;
            font-size: 0.85em;
            font-weight: 600;
        }
        .badge-tech { background: #3b82f6; color: white; }
        .badge-business { background: #10b981; color: white; }
        .badge-world { background: #f59e0b; color: white; }
        .badge-positive { background: #dcfce7; color: #166534; }
        .badge-negative { background: #fee2e2; color: #991b1b; }
    </style>
</head>
<body>
    <div class="container">
        <h1>Your Smart News Feed</h1>
        <p class="subtitle">Curated by AI β€’ Updated {{ time }}</p>
        
        {% for article in articles %}
        <div class="article">
            <h2 class="article-title">
                <a href="{{ article.link }}" target="_blank">{{ article.title }}</a>
            </h2>
            <div class="meta">
                <span class="badge badge-{{ article.category.lower() }}">
                    {{ article.category }}
                </span>
                <span class="badge badge-{{ article.sentiment.lower() }}">
                    {{ article.sentiment }} ({{ article.score }}%)
                </span>
            </div>
        </div>
        {% endfor %}
    </div>
</body>
</html>
"""

@app.route('/')
def home():
    articles = get_curated_news()
    current_time = datetime.now().strftime('%B %d, %Y at %I:%M %p')
    return render_template_string(HTML_TEMPLATE, 
                                 articles=articles, 
                                 time=current_time)

# RUN THIS:
if __name__ == "__main__":
    print("🚀 Starting your news dashboard...")
    print("📱 Open http://localhost:5000 in your browser")
    print("Press Ctrl+C to stop\n")
    app.run(debug=True, port=5000)

What this does: Creates a clean, beautiful web dashboard that pulls news, analyzes it, and presents only balanced, relevant stories. Bookmark localhost:5000 and open it every morning instead of checking 10 different apps.
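
The “most balanced first” ordering in `get_curated_news` is just a sort by distance from a neutral 50% sentiment score. In isolation, with invented scores:

```python
# The dashboard's "balanced first" ranking: sort by distance from neutral 50%.
articles = [
    {"title": "A", "score": 98},  # strongly positive
    {"title": "B", "score": 55},  # near neutral
    {"title": "C", "score": 12},  # strongly negative
]

articles.sort(key=lambda a: abs(a["score"] - 50))
print([a["title"] for a in articles])  # ['B', 'C', 'A']
```

Swap the key for `-a["score"]` if you'd rather see the most positive stories first; the point is that the ranking rule is one visible line of code, not a hidden algorithm.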

How to Run These Scripts

Step 1: Install Requirements

Open your terminal and run:

# Install the essentials
pip install feedparser pandas transformers torch scikit-learn flask

# This downloads the AI model (one-time, ~500MB)
# It happens automatically on first run

Step 2: Save and Run

For Code 1 (Smart News Feed):

  1. Save the code as news_feed.py
  2. Run: python news_feed.py
  3. Wait 30 seconds while it fetches and analyzes
  4. Your curated feed prints in the terminal
  5. Check news_feed.csv for the full data
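
The saved CSV is easy to slice afterwards with pandas. A sketch using in-memory rows with the same columns the script writes (the sample values are invented):

```python
import pandas as pd

# Rows shaped like the aggregator's CSV output (invented sample data).
rows = [
    {"title": "New battery tech ships", "category": "Tech", "relevance_score": 85},
    {"title": "Markets open flat", "category": "Business", "relevance_score": 70},
    {"title": "Shocking twist revealed", "category": "Tech", "relevance_score": 50},
]
df = pd.DataFrame(rows)

# Keep only high-scoring Tech stories.
tech = df[(df["category"] == "Tech") & (df["relevance_score"] >= 70)]
print(tech["title"].tolist())  # ['New battery tech ships']
```

With the real file, replace the `DataFrame` construction with `pd.read_csv("news_feed.csv")` and filter the same way.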

For Code 2 (Duplicate Detector):

  1. Save as duplicate_detector.py
  2. Run: python duplicate_detector.py
  3. It shows you which stories are covered by multiple outlets
  4. Pick one version to read instead of 10

For Code 3 (Dashboard):

  1. Save as dashboard.py
  2. Run: python dashboard.py
  3. Open your browser to http://localhost:5000
  4. Bookmark it and check it daily
  5. Press Ctrl+C in terminal to stop

Common Issues:

“No module named ‘transformers’”: Run: pip install transformers torch

“Model not found”: First run downloads the AI model. Takes 2-3 minutes. Be patient.

Dashboard won’t open: Make sure port 5000 isn’t already used. Change to 5001 in the code if needed.

Customization:

Change news sources: Edit the sources or feeds dictionary in any script. Find RSS feeds for your favorite sites (usually at website.com/rss or /feed).

Adjust filtering: In Code 1, change filter_clickbait=True to False if you want to see everything.

Change article limit: Modify limit=15 to show more or fewer stories.

Key Concepts

You’ve now built a Smart News Aggregator: a Python-powered alternative to algorithmic feeds. It pulls articles from RSS sources without platform manipulation, filters content using AI sentiment analysis to detect emotional framing and clickbait patterns, eliminates duplicate stories through cosine similarity across outlets, ranks articles by custom relevance criteria you control instead of engagement metrics, and presents information through clean interfaces that respect your time and attention.

Building this news aggregator teaches you RSS feed integration for direct content access, natural language processing fundamentals through transformer models, vector similarity algorithms for duplicate detection, and custom ranking systems that encode your information priorities. Along the way you learn to create personal AI infrastructure that serves your goals rather than advertiser interests, combining the clarity of professional curation with the control of owning your entire pipeline.

The key is understanding when engagement-optimized feeds actively harm your ability to stay informed, recognizing that transparent filtering beats black-box recommendations, and using these tools strategically to build an information diet that makes you smarter without making you anxious. News feeds aren’t broken by accident; they’re broken by design, and you just fixed yours.

About slashdev.io

At slashdev.io, we’re a global software engineering company specializing in building production web and mobile applications. We combine cutting-edge LLM technologies (Claude Code, Gemini, Grok, ChatGPT) with traditional tech stacks like ReactJS, Laravel, iOS, and Flutter to deliver exceptional results.

What sets us apart:

  • Expert developers at $50/hour
  • AI-powered development workflows for enhanced productivity
  • Full-service engineering support, not just code
  • Experience building real production applications at scale

Whether you’re building your next app or need expert developers to join your team, we provide ongoing developer relationships that go beyond one-time assessments.

Need Development Support?

Building something ambitious? We’d love to help. Our team specializes in turning ideas into production-ready applications using the latest AI-powered development techniques combined with solid engineering fundamentals.