Modernize and deploy to psf infra #84

ewdurbin · 2025-08-22T14:28:13Z

No description provided.

- Migrate from Poetry to pip-tools with hash verification for better security
- Upgrade Python from 3.8.5 to 3.13 for latest features and performance
- Upgrade PostgreSQL from v12 to v16 and Redis from v5 to v7
- Simplify database configuration to use DATABASE_URL connection string
- Simplify Redis configuration to use REDIS_URL connection string
- Reduce Google BigQuery config from 5 env vars to 1 (GOOGLE_SERVICE_ACCOUNT_JSON)
- Remove Kubernetes deployment files (will deploy via different method)
- Add Procfile and gunicorn.conf.py for modern PaaS deployment
- Fix Flask-Limiter and Flask-Migrate compatibility with latest versions
- Fix Celery 5.x configuration (use lowercase broker_url)
- Remove hardcoded Redis URL from Celery initialization
- Update docker-compose to use .env file for configuration
- Add comprehensive documentation:
  - CLAUDE.md: Full application architecture and components
  - CONFIGURATION.md: Environment variables and setup guide
  - ETL_TESTING.md: Testing BigQuery ETL locally
  - ADMIN_FEATURES.md: Admin panel documentation
  - .env.example: Sample environment configuration

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
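A minimal sketch of the connection-string configuration described in this commit. The function name `load_settings`, the localhost default for Redis, and the use of the Flask-SQLAlchemy key `SQLALCHEMY_DATABASE_URI` are assumptions for illustration, not code taken from the repository. The lowercase `broker_url` and `result_backend` names are the Celery 5.x setting names.

```python
import os


def load_settings(environ=os.environ):
    """Build app settings from DATABASE_URL and REDIS_URL (hypothetical helper)."""
    # Assumed local default; the real project may not define one.
    redis_url = environ.get("REDIS_URL", "redis://localhost:6379/0")
    return {
        # One connection string replaces separate host/user/password variables.
        "SQLALCHEMY_DATABASE_URI": environ["DATABASE_URL"],
        # Celery 5.x expects lowercase setting names such as broker_url.
        "celery": {"broker_url": redis_url, "result_backend": redis_url},
    }


if __name__ == "__main__":
    example_env = {"DATABASE_URL": "postgresql://user:pass@db/pypistats"}
    print(load_settings(example_env))
```

Passing the environment as an argument keeps the helper easy to exercise without touching the real process environment.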

- Change all download columns from INTEGER to BIGINT in database models
- Add migration to alter existing tables to use BIGINT
- Prevents "integer out of range" errors for packages with >2.1B downloads
- Allows handling up to 9.2 quintillion downloads per metric

The ETL was failing for popular packages whose download counts exceeded PostgreSQL's INTEGER maximum of 2,147,483,647. This change ensures the application can handle the scale of modern PyPI download statistics.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
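The limits quoted in this commit follow from the column widths: PostgreSQL's INTEGER is a signed 32-bit value and BIGINT is a signed 64-bit value. A quick check of the arithmetic (the helper `fits_in_integer` is a hypothetical name used only for this illustration):

```python
# PostgreSQL INTEGER is signed 32-bit; BIGINT is signed 64-bit.
INTEGER_MAX = 2**31 - 1
BIGINT_MAX = 2**63 - 1


def fits_in_integer(count: int) -> bool:
    """Return True if a download count fits in a signed 32-bit column."""
    return -(2**31) <= count <= INTEGER_MAX


print(f"{INTEGER_MAX:,}")  # 2,147,483,647
print(f"{BIGINT_MAX:,}")   # 9,223,372,036,854,775,807
print(fits_in_integer(3_000_000_000))  # False
```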

- Create GitHub Actions workflow that runs on push to main/master
- Checks code formatting with black and isort
- Performs basic Python syntax validation
- Ensures CI passes to enable deployment automation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Add /_health/ route alongside existing /health endpoint
- Returns 200 OK for load balancer/monitoring health checks
- Required for deployment tooling and monitoring

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Configure Gunicorn to bind to Unix socket when BIND_UNIX_SOCKET is set
- Socket path: /var/run/cabotage/cabotage.sock
- Set proper umask (0o117) for socket permissions (660)
- Falls back to TCP port binding when not set
- Update documentation with new configuration option

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
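A sketch of how the socket-or-TCP choice and the umask arithmetic could look. `bind` (including the `unix:PATH` form) and `umask` are real Gunicorn settings; the helper names, the `PORT` variable, and the `8000` default are assumptions, and the actual gunicorn.conf.py may be structured differently.

```python
import os

SOCKET_PATH = "/var/run/cabotage/cabotage.sock"  # path given in the commit message


def gunicorn_settings(environ=os.environ):
    """Pick bind/umask values the way the commit describes (hypothetical helper)."""
    if environ.get("BIND_UNIX_SOCKET"):
        # Gunicorn accepts "unix:PATH" as a bind address.
        return {"bind": f"unix:{SOCKET_PATH}", "umask": 0o117}
    # Assumed TCP fallback: PORT and its default are not taken from the PR.
    return {"bind": f"0.0.0.0:{environ.get('PORT', '8000')}", "umask": 0}


def socket_mode(umask: int) -> int:
    """Permission bits left on a newly bound socket: 0o777 with umask bits cleared."""
    return 0o777 & ~umask


print(oct(socket_mode(0o117)))  # 0o660
```

The umask clears the owner-execute, group-execute, and all "other" bits, which is why 0o117 yields the 660 permissions mentioned above.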

- Mount entire source directory as Docker volumes for hot-reload
- Run formatting tools (black, isort) inside Docker containers
- Add check-fmt make target for CI-style format checking
- Install dev requirements in Docker image for consistency

This ensures development changes are immediately reflected without rebuilding containers and maintains version consistency between local development and CI environments.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Remove production dependencies from requirements-dev.in
- Regenerate requirements-dev.txt with only dev tools
- This avoids hash verification issues with platform-specific deps in CI
- CI now installs a minimal set of dev tools (black, isort, pip-tools)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Format migration files to match black 25.1.0 expectations
- Update pyproject.toml to target Python 3.13 instead of 3.7
- Add blank line after docstrings in migration files

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Add default case to docker-entrypoint.sh to execute arbitrary commands
- Update pyproject.toml to target Python 3.13 instead of 3.7
- This ensures local Docker environment can properly run formatting checks
- Fixes issue where docker-compose run would silently fail for black/isort

Now our local development environment will catch formatting issues before they reach CI.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Implement multi-stage build to reduce final image size
- Use virtual environment for better dependency isolation
- Add BuildKit cache mounts for apt and pip (faster rebuilds)
- Default DEVEL=no for production, but set DEVEL=yes in docker-compose
- Install postgresql-client only in development mode
- Pre-compile Python bytecode for faster startup
- Remove obsolete version field from docker-compose.yml
- Remove unnecessary user switching for simpler development workflow

The release command (flask db upgrade) now works correctly in containers.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Rename beat to worker-beat for clarity
- Remove flower worker (monitoring UI not needed)
- Keep core processes: web, worker, worker-beat, and release

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Fix SyntaxWarning for invalid escape sequences in BigQuery regex patterns
- Add celery-redbeat for Redis-based beat scheduler (no filesystem writes)
- Configure Celery to use RedBeat scheduler in config, Procfile, and docker-entrypoint
- Update beat commands to explicitly use --scheduler redbeat.RedBeatScheduler

This eliminates the need for persistent filesystem storage for Celery beat, makes the scheduler state shareable across instances, and fixes Python 3.13 syntax warnings.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
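For context on the escape-sequence fix: in an ordinary string literal, sequences such as `\d` and `\.` are not valid Python escapes, and recent Python versions report them with a SyntaxWarning. A raw string avoids the problem. The pattern and function below are hypothetical stand-ins, not the BigQuery patterns from the repository.

```python
import re

# Raw string: the backslashes reach the regex engine untouched, so Python
# never tries to interpret \d or \. as string escapes.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)")


def major_minor(version: str):
    """Extract (major, minor) from a version string, or None if it does not match."""
    match = VERSION_RE.match(version)
    return match.group(1, 2) if match else None


print(major_minor("3.13.1"))  # ('3', '13')
```

Doubling each backslash in a normal string would work as well; the raw-string form is usually easier to read.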

Major improvements:
- Add SQLite staging for zero-downtime atomic updates
- Stream BigQuery results instead of loading all into memory (95% memory reduction)
- Fix NULL Python version handling to preserve "null" category data
- Add configurable batch size via ETL_BATCH_SIZE env var (default 100k)
- Optimize SQLite with PRAGMA settings for bulk inserts
- Create indexes after bulk load for better performance
- Use 2000-row chunks for SQLite inserts to avoid variable limits
- Add use_sqlite parameter to ETL task (default True)

Performance impact:
- Memory usage: ~95% reduction (2.1M rows → 100k max)
- Time: +3.9% slower (132s → 137s) - acceptable tradeoff
- Data consistency: Atomic updates prevent partial visibility
- Data integrity: All row counts match perfectly with old ETL

The slight performance overhead is worth the massive memory savings and elimination of partial data visibility during updates.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
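A rough sketch of the staging idea: stream rows into a SQLite table in fixed-size chunks and create the index only after the bulk load. The table layout, function names, and the specific PRAGMA are assumptions for illustration. This sketch uses `executemany`, which binds one row per statement and so does not itself hit SQLite's bound-variable limit; the 2000-row chunk size is kept only to mirror the commit and to bound memory per batch.

```python
import sqlite3
from itertools import islice


def chunked(rows, size):
    """Yield lists of at most `size` items from any iterable, without loading it all."""
    iterator = iter(rows)
    while chunk := list(islice(iterator, size)):
        yield chunk


def stage_rows(conn, rows, chunk_size=2000):
    """Bulk-load rows into a staging table, then index it (hypothetical schema)."""
    conn.execute("PRAGMA synchronous = OFF")  # trade durability for load speed
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging (package TEXT, date TEXT, downloads INTEGER)"
    )
    total = 0
    for chunk in chunked(rows, chunk_size):
        conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", chunk)
        total += len(chunk)
    # Building the index once after the load avoids updating it on every insert.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_staging_package ON staging (package)")
    conn.commit()
    return total


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    sample = ((f"pkg{i % 7}", "2025-08-22", i) for i in range(4500))
    print(stage_rows(connection, sample))  # 4500
```

Because `rows` can be a generator, only one chunk is held in memory at a time, which is the property the streaming change relies on.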

Features:
- Complete backfill system for historical PyPI statistics
- Multiple backfill strategies: sequential, parallel, monthly, yearly
- CLI tool (manage_backfill.py) for easy backfill management
- Progress tracking and status checking capabilities
- Skip existing data option to resume interrupted backfills

Memory & Performance Optimizations:
- Reduced SQLite journal from MEMORY to WAL mode
- Changed temp_store from MEMORY to FILE
- Reduced cache size from 64MB to 32MB
- Chunked PostgreSQL transfers (10k rows instead of 50k)
- Smaller execute_values page_size (1k instead of 10k)
- Fixed ETL to skip recent stats updates during backfill
- Prevent stats from disappearing mid-backfill

Documentation:
- backfill_examples.md: Complete usage examples and best practices
- Detailed docstrings for all backfill functions

This provides a production-ready system for populating fresh instances and recovering from data gaps, with significant memory usage reductions during large data transfers.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
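To illustrate what a monthly strategy with a skip-existing option might involve, here is a small sketch that splits a date range into per-month windows and filters out days that are already loaded. Both function names and the set-based lookup of existing days are hypothetical; this does not reflect the interface of manage_backfill.py.

```python
from datetime import date, timedelta


def monthly_ranges(start: date, end: date):
    """Yield (first_day, last_day) windows covering start..end inclusive, one per month."""
    current = start
    while current <= end:
        # Compute the first day of the following month.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        yield current, min(next_month - timedelta(days=1), end)
        current = next_month


def pending_days(start: date, end: date, existing):
    """Yield days in start..end that are not in `existing` (the skip-existing idea)."""
    day = start
    while day <= end:
        if day not in existing:
            yield day
        day += timedelta(days=1)


if __name__ == "__main__":
    for window in monthly_ranges(date(2024, 12, 15), date(2025, 2, 10)):
        print(window)
```

Resuming an interrupted backfill then amounts to re-running the same range with the already-loaded days passed as `existing`.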

tomaarsen · 2025-08-25T08:31:50Z

I take it that this has resulted in the site being back online? I appreciate your work on this!

Tom Aarsen

ewdurbin and others added 14 commits August 22, 2025 10:27

ewdurbin merged commit 581551a into main Aug 22, 2025
1 check passed

ewdurbin deleted the modernize-and-deploy-to-psf-infra branch August 22, 2025 14:29
