brunns/rss-agg

RSS aggregator

Aggregate, de-duplicate and republish RSS feeds.

The problem I'm solving here is this: if I subscribe to the main Guardian RSS feed, I see a great many articles I'm not interested in[1]. But if I subscribe instead to the feeds for individual tags, I avoid the things I'm not interested in, yet I see a great many duplicates, because articles with multiple tags show up in multiple feeds. This little app gives me the best of both worlds: I see only[2] the articles I'm interested in in my reader[3], and only once.

Prerequisites

Local development requires:

Some tasks may require additional tools:

On a Mac, you can install most of these with Homebrew and asdf.

brew install uv xc gh 1password-cli awscli asdf colima libxml2 fzf gum # And follow any additional setup instructions brew gives you
asdf plugin add terraform
asdf plugin add nodejs
asdf install

Design

The application is a Flask async web app deployed as an AWS Lambda function running within the Lambda Web Adapter.

Good places to start investigating the code are:

  • web.py, the application entry point.
  • routes.py, the Flask route definitions.

On each request, RSSService orchestrates the full pipeline:

Services are wired together with wireup for dependency injection, with configuration sourced from environment variables as per the twelve-factor app. AWS API Gateway provides the public HTTP endpoint, backed by Terraform-managed infrastructure-as-code located in terraform/.
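
As a rough illustration of the aggregate-and-de-duplicate idea, the sketch below merges items from several already-parsed feeds, keeps one copy of each article, and caps the result. It is not the project's RSSService: the `Item` type, the `merge_items` function, the newest-first ordering, and the use of the item link as the de-duplication key are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a parsed RSS item."""

    link: str
    title: str
    published: datetime


def merge_items(feeds: list[list[Item]], max_items: int = 50) -> list[Item]:
    """Merge feeds, drop duplicate links, and return the newest items first."""
    seen: dict[str, Item] = {}
    for feed in feeds:
        for item in feed:
            # Assumption: two items with the same link are the same article,
            # so only the first copy encountered is kept.
            seen.setdefault(item.link, item)
    newest_first = sorted(seen.values(), key=lambda i: i.published, reverse=True)
    # Cap the output, mirroring the role MAX_ITEMS plays in the configuration.
    return newest_first[:max_items]
```

An article tagged with two topics would appear in two input feeds but only once in the merged output.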

Configuration

Configuration is sourced from environment variables:

| Variable | Description | Default |
|---|---|---|
| FEEDS_SERVICE | Feeds service implementation: FileFeedsService or S3FeedsService | FileFeedsService |
| FEEDS_FILE | Path to the feeds list file (used by FileFeedsService) | feeds.txt |
| MAX_ITEMS | Maximum number of items in the output feed | 50 |
| MAX_CONNECTIONS | Maximum number of concurrent HTTP connections | 16 |
| MAX_KEEPALIVE_CONNECTIONS | Maximum number of keep-alive HTTP connections | 16 |
| KEEPALIVE_EXPIRY | Keep-alive connection expiry in seconds | 5 |
| RETRIES | Number of HTTP retry attempts per feed | 3 |
| TIMEOUT | HTTP request timeout in seconds | 3 |
| LOG_LEVEL | Log verbosity: ERROR, WARNING, INFO, or DEBUG | INFO |

The following are additionally required when FEEDS_SERVICE=S3FeedsService:

| Variable | Description | Default |
|---|---|---|
| FEEDS_BUCKET_NAME | S3 bucket name containing the feeds list | brunns-rss-agg-feeds |
| FEEDS_OBJECT_NAME | S3 object key for the feeds list | feeds.txt |
| AWS_DEFAULT_REGION | AWS region for S3 access | boto3 default |
| AWS_ACCESS_KEY_ID | AWS access key ID for S3 access | boto3 default |
| AWS_SECRET_ACCESS_KEY | AWS secret access key for S3 access | boto3 default |
| S3_ENDPOINT | Custom S3-compatible endpoint URL, e.g. for local testing | AWS S3 |

Tasks

These tasks can be run using xc.

pc

Precommit tasks

Requires: test, lint, audit

RunDeps: async

#!/usr/bin/env python
import this

cli

Run CLI - outputs RSS to stdout

uv run cli -vv

web

Run web server

./run.sh

test

Run all tests

Requires: unit, integration

RunDeps: async

unit

Unit tests

uv run pytest tests/unit/ --durations=10 --cov-report term-missing --cov-fail-under 100 --cov src

integration

Integration tests

if command -v colima > /dev/null; then colima status || colima start; fi
uv run pytest tests/integration/ -s --durations=10

format

Format code

uv run ruff format .
uv run ruff check . --fix-only

lint

Code quality & security checks

Requires: lint-code, type-checking

RunDeps: async

lint-code

Lint code

uv run ruff format . --check
uv run ruff check .

type-checking

Type checking

uv run pyright

audit

Audit for known vulnerabilities

Requires: audit-py, audit-gha

RunDeps: async

audit-py

Audit Python dependencies for known vulnerabilities

uv audit

audit-gha

Scan GitHub Actions for vulnerabilities

uvx zizmor -o .

build

Build lambda image

Inputs: IMAGE_NAME

Environment: IMAGE_NAME=deployment_package.zip

rm -rf build/ terraform/"$IMAGE_NAME"
uv export --no-dev --python 3.14 --format requirements-txt --output-file requirements.txt
uv pip install -r requirements.txt --target build --python 3.14
cp -r src/rss_agg build/
cp run.sh build/
cp feeds.txt build/
chmod +x build/run.sh
cd build
zip -r ../terraform/"$IMAGE_NAME" .
cd ..

terraform-init

Initialise terraform

Directory: ./terraform

terraform init

plan

Plan infrastructure changes

Requires: build, terraform-init

RunDeps: async

Directory: ./terraform

terraform plan

push

Push to origin, and monitor CI workflow

Requires: pc

Inputs: WORKFLOW

Environment: WORKFLOW=ci.yml

git push
sleep 5
RUN_ID=$(gh run list --workflow="$WORKFLOW" --limit=1 --json databaseId --jq '.[0].databaseId')
gh run watch "$RUN_ID" --exit-status

deploy

Run deployment workflow

Inputs: WORKFLOW

Environment: WORKFLOW=cd.yml

gh workflow run "$WORKFLOW"
sleep 5
RUN_ID=$(gh run list --workflow="$WORKFLOW" --limit=1 --json databaseId --jq '.[0].databaseId')
gh run watch "$RUN_ID" --exit-status

healthcheck

Check feed is running and returning XML

Inputs: API_URL

Environment: API_URL=http://0.0.0.0:8080

set +x
echo "Testing API at: $API_URL"

# curl will retry up to 5 times with 5 second delays, and fail on HTTP error statuses (4xx/5xx)
if curl -fsSL --retry 5 --retry-delay 5 "$API_URL" | xmllint --noout - 2>/dev/null; then
    echo "✓ API returned valid XML"
else
    echo "✗ API check failed (ensure xmllint is installed: brew install libxml2)"
    exit 1
fi
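
For platforms without curl or xmllint, roughly the same check can be sketched with the Python standard library. The function names here are made up for the example, and unlike the curl command this version does not retry.

```python
import urllib.request
import xml.etree.ElementTree as ET


def is_well_formed_xml(payload: bytes) -> bool:
    """Return True if the payload parses as XML (well-formedness only)."""
    try:
        ET.fromstring(payload)
    except ET.ParseError:
        return False
    return True


def check(url: str, timeout: float = 10.0) -> bool:
    """Fetch the URL and report whether it returned well-formed XML."""
    try:
        # urlopen raises HTTPError (an OSError subclass) for 4xx/5xx responses.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_well_formed_xml(response.read())
    except OSError:
        # Covers connection failures, timeouts and HTTP error statuses.
        return False
```

Against a locally running server this could be called as `check("http://0.0.0.0:8080")`, matching the task's default API_URL.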

logs

Query CloudWatch logs for recent Lambda activity

Inputs: DURATION

Environment: DURATION=1h

aws logs tail /aws/lambda/rss_aggregator --since "$DURATION" --format short

create-s3-bucket

One-off commands to set up the AWS S3 bucket that terraform will use to store infrastructure state. Run aws configure first to authenticate if necessary.

aws s3 mb s3://brunns-rss-agg-terraform-state --region eu-west-2
aws s3api put-bucket-versioning --bucket brunns-rss-agg-terraform-state --versioning-configuration Status=Enabled

upload-feeds

Upload feeds.txt (by default) to the S3 feeds bucket

Inputs: FEEDS_FILE, BUCKET, OBJECT

Environment: FEEDS_FILE=feeds.txt

Environment: BUCKET=brunns-rss-agg-feeds

Environment: OBJECT=feeds.txt

aws s3 cp "$FEEDS_FILE" s3://"$BUCKET"/"$OBJECT"

Initial setup steps

Use brunns-python-template or similar.

Footnotes

  1. How is there so much sport in the world, and so many people writing and talking about it?

  2. Or mostly only - the sub-editors do seem to do some questionable tagging sometimes.

  3. Currently Feedly.

  4. On a Mac - I'm not sure what you might use on other platforms.
