🚀 A blazingly fast Python tool to harvest URLs and metadata from website sitemaps like a digital archaeologist!
pip install sitemap-harvester

# Harvest a website's sitemap
sitemap-harvester --url https://example.com
# Custom output file and timeout
sitemap-harvester --url https://example.com --output my_data.csv --timeout 15

- 📝 Page Title - The main title of each page
- 📄 Meta Description - SEO descriptions
- 🏷️ Keywords - Meta keywords (if present)
- 👤 Author - Page author information
- 🔗 Canonical URL - Canonical link references
- 🖼️ Open Graph Data - Social media metadata
- 🌐 Custom Meta Tags - Any additional meta information
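
To make the fields above concrete, here is a minimal sketch of how such metadata could be pulled out of a page using only Python's standard library. It is an illustration, not the tool's actual implementation: the `MetaExtractor` class, the `extract_metadata` function, and the output keys are all hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Hypothetical helper: collects title, meta tags, Open Graph data, canonical URL."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}        # named meta tags, e.g. description, keywords, author
        self.open_graph = {}  # meta tags whose property starts with "og:"
        self.canonical = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            content = attrs.get("content") or ""
            prop = attrs.get("property") or ""
            name = attrs.get("name") or ""
            if prop.startswith("og:"):
                self.open_graph[prop] = content
            elif name:
                self.meta[name.lower()] = content
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_metadata(html):
    """Return a dict of the fields listed above for one HTML document."""
    parser = MetaExtractor()
    parser.feed(html)
    known = {"description", "keywords", "author"}
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "keywords": parser.meta.get("keywords", ""),
        "author": parser.meta.get("author", ""),
        "canonical": parser.canonical,
        "open_graph": parser.open_graph,
        # Everything else counts as a "custom" meta tag in this sketch.
        "custom": {k: v for k, v in parser.meta.items() if k not in known},
    }
```

Calling `extract_metadata(html_text)` on a downloaded page returns one dictionary per page, which maps naturally onto one CSV row.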
- Use --timeout for slower websites or large sitemaps
- The tool automatically deduplicates URLs for you
- Check the console output for real-time progress updates
- Large sitemaps? Grab a coffee ☕ and let it work its magic!
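
For readers curious about what the timeout and deduplication tips amount to, the following is a rough sketch using only the standard library. It is not taken from the tool's source: `fetch` and `unique_urls` are hypothetical names, and treating the timeout as a number of seconds is an assumption.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def fetch(url, timeout=15.0):
    """Download a URL, giving up after `timeout` seconds (assumed unit)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def unique_urls(sitemap_xml):
    """Return the <loc> URLs from a sitemap, first occurrence only, in order."""
    root = ET.fromstring(sitemap_xml)
    seen = set()
    urls = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

A set tracks what has already been seen while a list preserves the sitemap's original order, so the result stays stable from run to run.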
Found a bug? Have a feature request? Contributions are welcome! Feel free to open an issue or submit a pull request.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Happy harvesting! 🌾