-
Notifications
You must be signed in to change notification settings - Fork 17
Modernize and deploy to psf infra #84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
- Migrate from Poetry to pip-tools with hash verification for better security - Upgrade Python from 3.8.5 to 3.13 for latest features and performance - Upgrade PostgreSQL from v12 to v16 and Redis from v5 to v7 - Simplify database configuration to use DATABASE_URL connection string - Simplify Redis configuration to use REDIS_URL connection string - Reduce Google BigQuery config from 5 env vars to 1 (GOOGLE_SERVICE_ACCOUNT_JSON) - Remove Kubernetes deployment files (will deploy via different method) - Add Procfile and gunicorn.conf.py for modern PaaS deployment - Fix Flask-Limiter and Flask-Migrate compatibility with latest versions - Fix Celery 5.x configuration (use lowercase broker_url) - Remove hardcoded Redis URL from Celery initialization - Update docker-compose to use .env file for configuration - Add comprehensive documentation: - CLAUDE.md: Full application architecture and components - CONFIGURATION.md: Environment variables and setup guide - ETL_TESTING.md: Testing BigQuery ETL locally - ADMIN_FEATURES.md: Admin panel documentation - .env.example: Sample environment configuration 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Change all download columns from INTEGER to BIGINT in database models - Add migration to alter existing tables to use BIGINT - Prevents "integer out of range" errors for packages with >2.1B downloads - Allows handling up to 9.2 quintillion downloads per metric The ETL was failing for popular packages whose download counts exceeded PostgreSQL's INTEGER maximum of 2,147,483,647. This change ensures the application can handle the scale of modern PyPI download statistics. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Create GitHub Actions workflow that runs on push to main/master - Checks code formatting with black and isort - Performs basic Python syntax validation - Ensures CI passes to enable deployment automation 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Add /_health/ route alongside existing /health endpoint - Returns 200 OK for load balancer/monitoring health checks - Required for deployment tooling and monitoring 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Configure Gunicorn to bind to Unix socket when BIND_UNIX_SOCKET is set - Socket path: /var/run/cabotage/cabotage.sock - Set proper umask (0o117) for socket permissions (660) - Falls back to TCP port binding when not set - Update documentation with new configuration option 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Mount entire source directory as Docker volumes for hot-reload - Run formatting tools (black, isort) inside Docker containers - Add check-fmt make target for CI-style format checking - Install dev requirements in Docker image for consistency This ensures development changes are immediately reflected without rebuilding containers and maintains version consistency between local development and CI environments. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Remove production dependencies from requirements-dev.in - Regenerate requirements-dev.txt with only dev tools - This avoids hash verification issues with platform-specific deps in CI - CI now installs a minimal set of dev tools (black, isort, pip-tools) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Format migration files to match black 25.1.0 expectations - Update pyproject.toml to target Python 3.13 instead of 3.7 - Add blank line after docstrings in migration files 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Add default case to docker-entrypoint.sh to execute arbitrary commands - Update pyproject.toml to target Python 3.13 instead of 3.7 - This ensures local Docker environment can properly run formatting checks - Fixes issue where docker-compose run would silently fail for black/isort Now our local development environment will catch formatting issues before they reach CI. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Implement multi-stage build to reduce final image size - Use virtual environment for better dependency isolation - Add BuildKit cache mounts for apt and pip (faster rebuilds) - Default DEVEL=no for production, but set DEVEL=yes in docker-compose - Install postgresql-client only in development mode - Pre-compile Python bytecode for faster startup - Remove obsolete version field from docker-compose.yml - Remove unnecessary user switching for simpler development workflow The release command (flask db upgrade) now works correctly in containers. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Rename beat to worker-beat for clarity - Remove flower worker (monitoring UI not needed) - Keep core processes: web, worker, worker-beat, and release 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Fix SyntaxWarning for invalid escape sequences in BigQuery regex patterns - Add celery-redbeat for Redis-based beat scheduler (no filesystem writes) - Configure Celery to use RedBeat scheduler in config, Procfile, and docker-entrypoint - Update beat commands to explicitly use --scheduler redbeat.RedBeatScheduler This eliminates the need for persistent filesystem storage for Celery beat, makes the scheduler state shareable across instances, and fixes Python 3.13 syntax warnings. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Major improvements: - Add SQLite staging for zero-downtime atomic updates - Stream BigQuery results instead of loading all into memory (95% memory reduction) - Fix NULL Python version handling to preserve "null" category data - Add configurable batch size via ETL_BATCH_SIZE env var (default 100k) - Optimize SQLite with PRAGMA settings for bulk inserts - Create indexes after bulk load for better performance - Use 2000-row chunks for SQLite inserts to avoid variable limits - Add use_sqlite parameter to ETL task (default True) Performance impact: - Memory usage: ~95% reduction (2.1M rows → 100k max) - Time: +3.9% slower (132s → 137s) - acceptable tradeoff - Data consistency: Atomic updates prevent partial visibility - Data integrity: All row counts match perfectly with old ETL The slight performance overhead is worth the massive memory savings and elimination of partial data visibility during updates. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Features: - Complete backfill system for historical PyPI statistics - Multiple backfill strategies: sequential, parallel, monthly, yearly - CLI tool (manage_backfill.py) for easy backfill management - Progress tracking and status checking capabilities - Skip existing data option to resume interrupted backfills Memory & Performance Optimizations: - Reduced SQLite journal from MEMORY to WAL mode - Changed temp_store from MEMORY to FILE - Reduced cache size from 64MB to 32MB - Chunked PostgreSQL transfers (10k rows instead of 50k) - Smaller execute_values page_size (1k instead of 10k) - Fixed ETL to skip recent stats updates during backfill - Prevent stats from disappearing mid-backfill Documentation: - backfill_examples.md: Complete usage examples and best practices - Detailed docstrings for all backfill functions This provides a production-ready system for populating fresh instances and recovering from data gaps, with significant memory usage reductions during large data transfers. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
|
I take it that this has resulted in the site being back online? I appreciate your work on this!
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.