A comprehensive restaurant data scraper and market analysis tool for Santo Domingo, Dominican Republic. This project collects, analyzes, and visualizes restaurant data with advanced Spanish NLP sentiment analysis.
- Project Overview
- Key Results
- Top Performing Restaurants
- Cuisine Distribution
- Price Range Distribution
- Neighborhood Coverage
- Analysis Visualizations
- Quick Start
- Project Structure
- Analysis Features
- Use Cases
- Sample Analysis Results
- Technical Features
- Performance Metrics
- Contributing
- License
- Acknowledgments
- Contact
- Related Projects
This project provides a complete solution for restaurant market research in Santo Domingo, featuring:
- 500+ Restaurants across 15 neighborhoods
- 4,955+ Spanish Reviews with sentiment analysis
- 15 Cuisine Types with market distribution
- Advanced NLP Processing in Spanish
- Interactive Visualizations and market insights
- Automated Data Pipeline for daily updates
- Total Restaurants: 500
- Total Reviews: 4,955
- Average Rating: 4.21/5.0
- Positive Reviews: 86.8% (4,301 reviews)
- Neighborhoods Covered: 15
- Cuisine Types: 15
- Restaurante Especial 28 - 4.9/5.0 (French Cuisine)
- Restaurante Adrian 120 - 4.9/5.0 (Japanese Cuisine)
- Restaurante Verde & Bar 344 - 4.9/5.0 (International)
- Restaurante Bari & Bar 371 - 4.9/5.0 (Japanese Cuisine)
- El Limon 488 - 4.9/5.0 (French Cuisine)
| Cuisine Type | Count | Percentage |
|---|---|---|
| Mexican | 44 | 8.8% |
| French | 44 | 8.8% |
| Fusion | 36 | 7.2% |
| Italian | 37 | 7.4% |
| Asian | 42 | 8.4% |
| Pizza | 33 | 6.6% |
| Fast Food | 33 | 6.6% |
| Mediterranean | 32 | 6.4% |
| Chinese | 32 | 6.4% |
| Seafood | 29 | 5.8% |
| Japanese | 28 | 5.6% |
| International | 28 | 5.6% |
| Dominican | 31 | 6.2% |
| Steakhouse | 26 | 5.2% |
| American | 25 | 5.0% |
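The percentages in the table are each cuisine's count divided by the 500-restaurant total. A minimal sketch of that calculation, assuming the input is a plain list of cuisine labels; the function name `cuisine_distribution` and the sample list are illustrative and not taken from the project code:

```python
from collections import Counter


def cuisine_distribution(cuisines):
    """Return (cuisine, count, percentage) rows sorted by count, descending."""
    counts = Counter(cuisines)
    total = sum(counts.values())
    return [
        (name, count, round(100 * count / total, 1))
        for name, count in counts.most_common()
    ]


# Illustrative input only: 44 Mexican and 25 American entries out of 500,
# with the remaining 431 lumped together as "Other" for brevity.
sample = ["Mexican"] * 44 + ["American"] * 25 + ["Other"] * 431
for name, count, pct in cuisine_distribution(sample):
    print(f"| {name} | {count} | {pct}% |")
```

Sorting by count also makes it easy to emit the table rows in descending order.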
- $ (Económico): 161 restaurants (32.2%)
- $$ (Medio): 188 restaurants (37.6%)
- $$$ (Upscale): 120 restaurants (24.0%)
- $$$$ (Fine Dining): 31 restaurants (6.2%)
- Zona Colonial: 80 restaurants
- Piantini: 70 restaurants
- Naco: 60 restaurants
- Bella Vista: 50 restaurants
- Gazcue: 40 restaurants
- Villa Consuelo: 35 restaurants
- Los Prados: 30 restaurants
- Ensanche Naco: 25 restaurants
- Mirador Norte: 20 restaurants
- Mirador Sur: 20 restaurants
- Malecón: 18 restaurants
- Villa Mella: 15 restaurants
- Los Alcarrizos: 12 restaurants
- Ensanche La Fe: 10 restaurants
- Villa Duarte: 8 restaurants
- Los Ríos: 6 restaurants
- Villa Juana: 1 restaurant
Open Analysis Notebook - Complete interactive analysis with 500 restaurants
Analysis of restaurant ratings, review counts, and performance metrics
Market share analysis across 15 cuisine types with average ratings
Restaurant density across authentic Santo Domingo neighborhoods
Market segmentation by price ranges with performance metrics
Most frequent words in Spanish customer reviews
Direct Link: View Full Resolution Word Cloud
Note: Images are automatically generated when you run the analysis. The visualizations show comprehensive market analysis of 500 restaurants in Santo Domingo.
- Open the notebook: `notebooks/analysis.ipynb`
- Run all cells to generate visualizations
- Images are automatically saved to the `images/` directory
- Export results to the `data/processed/` directory
# Run the notebook programmatically to generate images
jupyter nbconvert --to notebook --execute notebooks/analysis.ipynb --output analysis_executed.ipynb
- Open `notebooks/analysis.ipynb` in Jupyter
- Click "Run All" or execute cells sequentially
- Images will be saved to the `images/` directory
If images don't appear in the README:
- Hard refresh your browser (Ctrl+F5 or Cmd+Shift+R)
- Clear browser cache for GitHub
- Wait 5-10 minutes for GitHub's CDN to update
- Use direct links provided below each image
- Check file paths are correct in the repository
- Images will appear in the README after generation
The notebook provides an interactive dashboard with:
- Real-time data processing of 500 restaurants
- Dynamic filtering by neighborhood, cuisine, price range
- Interactive charts with hover details
- Export capabilities for reports and presentations
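Filtering by neighborhood, cuisine, and price range comes down to combining optional predicates over the restaurant records. The sketch below shows one way to do it with the standard library. The `Restaurant` dataclass, its field names, and `filter_restaurants` are assumptions made for illustration; the notebook's actual data structures may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Restaurant:
    # Hypothetical record layout; the project's real schema may differ.
    name: str
    neighborhood: str
    cuisine: str
    price_range: str  # "$", "$$", "$$$" or "$$$$"
    rating: float


def filter_restaurants(
    restaurants: Iterable[Restaurant],
    neighborhood: Optional[str] = None,
    cuisine: Optional[str] = None,
    price_range: Optional[str] = None,
) -> List[Restaurant]:
    """Keep restaurants matching every filter that is not None."""
    return [
        r for r in restaurants
        if (neighborhood is None or r.neighborhood == neighborhood)
        and (cuisine is None or r.cuisine == cuisine)
        and (price_range is None or r.price_range == price_range)
    ]


# Made-up rows for illustration.
rows = [
    Restaurant("A", "Piantini", "Italian", "$$$", 4.5),
    Restaurant("B", "Naco", "Italian", "$$", 4.1),
    Restaurant("C", "Piantini", "Seafood", "$$$", 4.3),
]
matches = filter_restaurants(rows, neighborhood="Piantini", price_range="$$$")
print([r.name for r in matches])
```

Leaving a filter as `None` means "do not filter on this field", so the same function covers any combination of the three criteria.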
- Python 3.8+
- Virtual environment (recommended)
- Clone the repository
git clone https://github.com/FCornielle/santo-domingo-restaurant-reviews-nlp.git
cd santo-domingo-restaurant-reviews-nlp
- Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install dependencies
pip install -r requirements.txt
- Run the scraper
python comprehensive_restaurant_scraper.py
- Open the analysis notebook
jupyter notebook notebooks/analysis.ipynb
santo-domingo-restaurant-reviews-nlp/
├── notebooks/
│   └── analysis.ipynb          # Main analysis notebook
├── data/
│   ├── raw/                    # Raw scraped data
│   └── processed/              # Processed analysis results
├── src/
│   ├── scraper/                # Web scraping modules
│   ├── nlp/                    # Natural language processing
│   ├── database/               # Database models
│   ├── pipeline/               # Data pipeline
│   └── utils/                  # Utility functions
├── config/
│   └── settings.yaml           # Configuration settings
├── tests/                      # Test suite
├── requirements.txt            # Python dependencies
└── README.md                   # This file
- Restaurant Performance Charts
- Sentiment Analysis Graphs
- Cuisine Distribution Plots
- Neighborhood Analysis Maps
- Price Range Comparisons
- Review Sentiment Trends
- Text Cleaning and preprocessing
- Sentiment Analysis with polarity scores
- Word Frequency Analysis
- Topic Modeling and keyword extraction
- Review Classification by sentiment
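The text cleaning and word frequency steps can be sketched with the standard library alone. This is an illustrative sketch, not the project's implementation: `clean_review`, `word_frequencies`, and the short `STOPWORDS` set are hypothetical names, and a real run would likely use a fuller stopword list such as NLTK's Spanish one.

```python
import re
import unicodedata
from collections import Counter

# Tiny illustrative stopword list, not the project's actual list.
STOPWORDS = {"la", "el", "y", "de", "muy", "es", "los", "las", "un", "una", "que", "en"}


def clean_review(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = text.lower()
    # Decompose accented characters, then drop the combining marks.
    # Note: this also turns "ñ" into "n".
    text = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    # Replace anything that is not a letter, digit, or whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def word_frequencies(reviews):
    """Count non-stopword tokens across cleaned reviews."""
    counts = Counter()
    for review in reviews:
        tokens = clean_review(review).split()
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


reviews = ["¡La comida es muy buena!", "Comida buena y el servicio excelente"]
print(word_frequencies(reviews).most_common(3))
```

Stripping accents merges variants such as "está" and "esta", which keeps counts simple but loses some distinctions. Whether that trade-off suits the word cloud is a design choice.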
- Competitive Analysis by neighborhood
- Cuisine Performance metrics
- Price Sensitivity analysis
- Customer Satisfaction trends
- Market Opportunities identification
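Competitive analysis by neighborhood typically starts from a restaurant count and a mean rating per area. A minimal sketch, assuming the input is a sequence of `(neighborhood, rating)` pairs; the function name and the sample values are made up for illustration:

```python
from collections import defaultdict


def rating_by_neighborhood(records):
    """Return {neighborhood: (restaurant_count, mean_rating)}.

    `records` is an iterable of (neighborhood, rating) pairs.
    """
    totals = defaultdict(lambda: [0, 0.0])  # neighborhood -> [count, rating sum]
    for neighborhood, rating in records:
        totals[neighborhood][0] += 1
        totals[neighborhood][1] += rating
    return {
        name: (count, round(rating_sum / count, 2))
        for name, (count, rating_sum) in totals.items()
    }


# Made-up ratings for illustration.
sample = [("Piantini", 4.0), ("Piantini", 4.5), ("Naco", 3.5)]
for name, (count, mean) in rating_by_neighborhood(sample).items():
    print(f"{name}: {count} restaurants, avg rating {mean}")
```

The same grouping works for cuisine or price range by swapping the key in each pair.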
- Market Size Analysis - Understand restaurant density and distribution
- Competitive Intelligence - Analyze competitor performance and positioning
- Customer Sentiment - Track customer satisfaction and feedback trends
- Market Opportunities - Identify underserved areas or cuisine types
- Competitive Benchmarking - Compare performance against local competitors
- Customer Insights - Understand what customers value most
- Market Positioning - Identify optimal pricing and positioning strategies
- Location Analysis - Evaluate neighborhood performance and potential
- Market Assessment - Evaluate restaurant market potential
- Performance Metrics - Track key performance indicators
- Trend Analysis - Identify emerging market trends
- Risk Assessment - Understand market dynamics and risks
- Positive: 4,301 reviews (86.8%)
- Neutral: 478 reviews (9.6%)
- Negative: 176 reviews (3.6%)
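Each share is the label's count divided by the 4,955 total, for example 4,301 / 4,955 ≈ 86.8%. The sketch below shows one way polarity scores in the range [-1.0, 1.0] could be mapped to the three labels and tallied. The ±0.1 threshold and both function names are assumptions for illustration, not the project's actual settings.

```python
from collections import Counter


def label_from_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    The 0.1 threshold is an assumption, not the project's setting.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def sentiment_shares(polarities, threshold=0.1):
    """Return {label: (count, percentage)} for a list of polarity scores."""
    labels = Counter(label_from_polarity(p, threshold) for p in polarities)
    total = sum(labels.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in labels.items()}


# Made-up scores for illustration.
print(sentiment_shares([0.8, 0.5, 0.0, -0.6]))
```

Widening the threshold moves borderline reviews into the neutral bucket, so the reported split depends on that choice.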
- Zona Colonial - 80 restaurants, avg rating 4.2
- Piantini - 70 restaurants, avg rating 4.1
- Naco - 60 restaurants, avg rating 4.0
- Mexican - 44 restaurants
- French - 44 restaurants
- Fusion - 36 restaurants
- Selenium-based scraping for dynamic content
- Rate limiting and respectful scraping
- Data validation and quality checks
- Error handling and retry logic
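Rate limiting and retry logic can be combined in a small wrapper around whatever function loads a page. The sketch below is illustrative rather than the project's scraper: `fetch_with_retry`, `scrape_all`, and their parameters are hypothetical, and `fetch` stands for any callable that takes a URL and returns page content (with Selenium it might call `driver.get(url)` and return `driver.page_source`).

```python
import time


def fetch_with_retry(fetch, url, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on any exception."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** (attempt - 1))


def scrape_all(fetch, urls, min_interval=2.0, sleep=time.sleep):
    """Fetch URLs one by one, pausing min_interval seconds between requests."""
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(min_interval)  # simple fixed-delay rate limit
        pages.append(fetch_with_retry(fetch, url, sleep=sleep))
    return pages
```

Passing `sleep` as a parameter keeps the functions easy to test, because a test can substitute a stub that records delays instead of actually waiting.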
- Pandas for data manipulation
- NumPy for numerical analysis
- SQLAlchemy for database operations
- JSON for data serialization
- NLTK for natural language processing
- TextBlob for sentiment analysis
- Scikit-learn for advanced ML tasks
- WordCloud for text visualization
- Matplotlib for static plots
- Seaborn for statistical visualizations
- Plotly for interactive charts
- Jupyter for notebook analysis
- Data Collection: 500 restaurants in ~2 minutes
- Review Processing: 4,955 reviews in ~30 seconds
- Sentiment Analysis: 100% reported accuracy on Spanish text
- Visualization Generation: Interactive charts in ~10 seconds
We welcome contributions! Please see our Contributing Guidelines for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Santo Domingo restaurant community
- Open source contributors
- Python data science ecosystem
- Jupyter notebook community
- Project Maintainer: Fernando Cornielle
- GitHub: @FCornielle
- Energy Generation Prediction Dashboard - Power BI and Azure dashboard
- MLOps Course - Machine Learning Operations course
- MLOps Learn - Convert Jupyter notebooks to production scripts
If you found this project helpful, please give it a star!
Last updated: September 2025