Aggregate, de-duplicate and republish RSS feeds.
The problem I'm solving here is this: if I subscribe to the main Guardian RSS feed, I see a great many articles I'm not interested in. But if instead I subscribe to the feeds for individual tags, while I don't see the things I'm not interested in, I do see a great many duplicates - articles with multiple tags show up in multiple feeds. This little app allows me to have the best of both worlds - I see only the articles I'm interested in, in my reader, and only once.
Local development requires:
- uv for all things Python
- xc as a task runner
- colima for running the docker images we need for our integration tests
- Node.js for pyright (version pinned in .tool-versions)
Some tasks may require additional tools:
- gh for controlling GitHub actions
- 1Password CLI for secure secret usage
- AWS CLI for controlling & querying AWS
- terraform for deployment (version pinned in .tool-versions)
- libxml2 for testing our RSS output
- fzf and gum for command line shenanigans.
On a Mac, you can install most of these with homebrew and asdf.
```shell
brew install uv xc gh 1password-cli awscli asdf colima libxml2 fzf gum  # And follow any additional setup instructions brew gives you
asdf plugin add terraform
asdf plugin add nodejs
asdf install
```

The application is a Flask async web app deployed as an AWS Lambda function running within the Lambda Web Adapter.
Good places to start investigating the code are:
On each request, RSSService orchestrates the full pipeline:
- A FeedsService implementation reads a list of feed paths and constructs full Guardian RSS URLs.
  - FileFeedsService reads from feeds.txt, OR
  - S3FeedsService reads from an S3 object using boto3.
- Fetcher retrieves all feeds concurrently using httpx with HTTP/2 and connection pooling.
- RSSParser parses the responses with defusedxml and de-duplicates items by GUID.
- RSSGenerator sorts by date, applies a configurable item limit, and emits a fresh RSS feed.
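The de-duplication step is the heart of the pipeline: an article tagged several ways appears in several feeds, but shares one GUID. A minimal, library-free sketch of that step (the real RSSParser uses defusedxml rather than the stdlib parser, and this function is illustrative, not the app's actual API):

```python
import xml.etree.ElementTree as ET  # the real app parses with defusedxml


def dedupe_items(feeds_xml: list[str]) -> list[ET.Element]:
    """Merge <item> elements from several RSS documents, keeping only the
    first occurrence of each GUID (falling back to <link> if absent)."""
    seen: set[str] = set()
    merged: list[ET.Element] = []
    for xml_text in feeds_xml:
        for item in ET.fromstring(xml_text).iter("item"):
            guid = item.findtext("guid") or item.findtext("link") or ""
            if guid not in seen:
                seen.add(guid)
                merged.append(item)
    return merged


# Two tag feeds sharing one article: three items in, two out.
feed_a = "<rss><channel><item><guid>a1</guid></item></channel></rss>"
feed_b = ("<rss><channel><item><guid>a1</guid></item>"
          "<item><guid>b2</guid></item></channel></rss>")
print(len(dedupe_items([feed_a, feed_b])))  # 2
```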
Services are wired together with wireup for dependency injection, with configuration sourced from environment variables as per the twelve-factor app. AWS API Gateway provides the public HTTP endpoint, backed by Terraform-managed infrastructure-as-code located in terraform/.
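Wireup's own API aside, the shape of the wiring is plain constructor injection driven by environment configuration. A library-free sketch under that assumption (the class and method names here are illustrative, not the app's actual signatures):

```python
import os
from dataclasses import dataclass


@dataclass
class FileFeedsService:
    """Illustrative stand-in for the app's file-backed feeds service."""
    feeds_file: str

    def feed_paths(self) -> list[str]:
        # One feed path per line; blank lines ignored.
        with open(self.feeds_file) as f:
            return [line.strip() for line in f if line.strip()]


def build_feeds_service() -> FileFeedsService:
    # Twelve-factor style: the dependency is constructed from environment
    # variables, using the documented default for FEEDS_FILE.
    return FileFeedsService(feeds_file=os.environ.get("FEEDS_FILE", "feeds.txt"))
```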
Configuration is sourced from environment variables:
| Variable | Description | Default |
|---|---|---|
| FEEDS_SERVICE | Feeds service implementation: FileFeedsService or S3FeedsService | FileFeedsService |
| FEEDS_FILE | Path to the feeds list file (used by FileFeedsService) | feeds.txt |
| MAX_ITEMS | Maximum number of items in the output feed | 50 |
| MAX_CONNECTIONS | Maximum number of concurrent HTTP connections | 16 |
| MAX_KEEPALIVE_CONNECTIONS | Maximum number of keep-alive HTTP connections | 16 |
| KEEPALIVE_EXPIRY | Keep-alive connection expiry in seconds | 5 |
| RETRIES | Number of HTTP retry attempts per feed | 3 |
| TIMEOUT | HTTP request timeout in seconds | 3 |
| LOG_LEVEL | Log verbosity: ERROR, WARNING, INFO, or DEBUG | INFO |
The following are additionally required when FEEDS_SERVICE=S3FeedsService:
| Variable | Description | Default |
|---|---|---|
| FEEDS_BUCKET_NAME | S3 bucket name containing the feeds list | brunns-rss-agg-feeds |
| FEEDS_OBJECT_NAME | S3 object key for the feeds list | feeds.txt |
| AWS_DEFAULT_REGION | AWS region for S3 access | boto3 default |
| AWS_ACCESS_KEY_ID | AWS access key ID for S3 access | boto3 default |
| AWS_SECRET_ACCESS_KEY | AWS secret access key for S3 access | boto3 default |
| S3_ENDPOINT | Custom S3-compatible endpoint URL, e.g. for local testing | AWS S3 |
These tasks can be run using xc.
### Precommit tasks

Requires: test, lint, audit
RunDeps: async

```python
#!/usr/bin/env python
import this
```

### Run CLI - outputs RSS to stdout

```shell
uv run cli -vv
```

### Run web server

```shell
./run.sh
```

### Run all tests

Requires: unit, integration
RunDeps: async

### Unit tests

```shell
uv run pytest tests/unit/ --durations=10 --cov-report term-missing --cov-fail-under 100 --cov src
```

### Integration tests

```shell
if command -v colima > /dev/null; then colima status || colima start; fi
uv run pytest tests/integration/ -s --durations=10
```

### Format code

```shell
uv run ruff format .
uv run ruff check . --fix-only
```

### Code quality & security checks
Requires: lint-code, type-checking
RunDeps: async

### Lint code

```shell
uv run ruff format . --check
uv run ruff check .
```

### Type checking

```shell
uv run pyright
```

### Audit for known vulnerabilities

Requires: audit-py, audit-gha
RunDeps: async

### Audit Python dependencies for known vulnerabilities

```shell
uv audit
```

### Scan GitHub Actions for vulnerabilities

```shell
uvx zizmor -o .
```

### Build lambda image
Inputs: IMAGE_NAME
Environment: IMAGE_NAME=deployment_package.zip

```shell
rm -rf build/ terraform/"$IMAGE_NAME"
uv export --no-dev --python 3.14 --format requirements-txt --output-file requirements.txt
uv pip install -r requirements.txt --target build --python 3.14
cp -r src/rss_agg build/
cp run.sh build/
cp feeds.txt build/
chmod +x build/run.sh
cd build
zip -r ../terraform/"$IMAGE_NAME" .
cd ..
```

### Initialise terraform

Directory: ./terraform

```shell
terraform init
```

### Plan infrastructure changes
Requires: build, terraform-init
RunDeps: async
Directory: ./terraform

```shell
terraform plan
```

### Push to origin, and monitor CI workflow

Requires: pc
Inputs: WORKFLOW
Environment: WORKFLOW=ci.yml

```shell
git push
sleep 5
RUN_ID=$(gh run list --workflow="$WORKFLOW" --limit=1 --json databaseId --jq '.[0].databaseId')
gh run watch "$RUN_ID" --exit-status
```

### Run deployment workflow
Inputs: WORKFLOW
Environment: WORKFLOW=cd.yml

```shell
gh workflow run "$WORKFLOW"
sleep 5
RUN_ID=$(gh run list --workflow="$WORKFLOW" --limit=1 --json databaseId --jq '.[0].databaseId')
gh run watch "$RUN_ID" --exit-status
```

### Check feed is running and returning XML

Inputs: API_URL
Environment: API_URL=http://0.0.0.0:8080

```shell
set +x
echo "Testing API at: $API_URL"
# curl will retry up to 5 times with 5 second delays, fail on non-200 status
if curl -fsSL --retry 5 --retry-delay 5 "$API_URL" | xmllint --noout - 2>/dev/null; then
    echo "✓ API returned valid XML"
else
    echo "✗ API check failed (ensure xmllint is installed: brew install libxml2)"
    exit 1
fi
```

### Query CloudWatch logs for recent Lambda activity
Inputs: DURATION
Environment: DURATION=1h

```shell
aws logs tail /aws/lambda/rss_aggregator --since "$DURATION" --format short
```

### One-off commands to set up the AWS S3 bucket that terraform will use to store infrastructure state

Run aws configure first to authenticate if necessary.

```shell
aws s3 mb s3://brunns-rss-agg-terraform-state --region eu-west-2
aws s3api put-bucket-versioning --bucket brunns-rss-agg-terraform-state --versioning-configuration Status=Enabled
```

### Upload feeds.txt (by default) to the S3 feeds bucket

Inputs: FEEDS_FILE, BUCKET, OBJECT
Environment: FEEDS_FILE=feeds.txt
Environment: BUCKET=brunns-rss-agg-feeds
Environment: OBJECT=feeds.txt

```shell
aws s3 cp "$FEEDS_FILE" s3://"$BUCKET"/"$OBJECT"
```

Use brunns-python-template or similar.