Santo Domingo restaurant data pipeline: daily collection, Spanish NLP on reviews, and interactive market insights by cuisine, price, and neighborhood.

FCornielle/santo-domingo-restaurant-reviews-nlp

🍽️ Local Business Info Scraper - Santo Domingo

A comprehensive restaurant data scraper and market analysis tool for Santo Domingo, Dominican Republic. This project collects, analyzes, and visualizes restaurant data with advanced Spanish NLP sentiment analysis.

📊 Project Overview

This project provides a complete solution for restaurant market research in Santo Domingo, featuring:

  • 500+ Restaurants across 17 neighborhoods
  • 4,955+ Spanish Reviews with sentiment analysis
  • 15 Cuisine Types with market distribution
  • Advanced NLP Processing in Spanish
  • Interactive Visualizations and market insights
  • Automated Data Pipeline for daily updates

🎯 Key Results

📈 Market Statistics

  • Total Restaurants: 500
  • Total Reviews: 4,955
  • Average Rating: 4.21/5.0
  • Positive Reviews: 86.8% (4,301 reviews)
  • Neighborhoods Covered: 17
  • Cuisine Types: 15

🏆 Top Performing Restaurants

  1. Restaurante Especial 28 - 4.9/5.0 (French Cuisine)
  2. Restaurante Adrian 120 - 4.9/5.0 (Japanese Cuisine)
  3. Restaurante Verde & Bar 344 - 4.9/5.0 (International)
  4. Restaurante Bari & Bar 371 - 4.9/5.0 (Japanese Cuisine)
  5. El Limon 488 - 4.9/5.0 (French Cuisine)

🍽️ Cuisine Distribution

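| Cuisine Type  | Count | Percentage |
|---------------|-------|------------|
| Mexican       | 44    | 8.8%       |
| French        | 44    | 8.8%       |
| Fusion        | 36    | 7.2%       |
| Italian       | 37    | 7.4%       |
| Asian         | 42    | 8.4%       |
| Pizza         | 33    | 6.6%       |
| Fast Food     | 33    | 6.6%       |
| Mediterranean | 32    | 6.4%       |
| Chinese       | 32    | 6.4%       |
| Seafood       | 29    | 5.8%       |
| Japanese      | 28    | 5.6%       |
| International | 28    | 5.6%       |
| Dominican     | 31    | 6.2%       |
| Steakhouse    | 26    | 5.2%       |
| American      | 25    | 5.0%       |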
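
The percentages above follow directly from the counts. A minimal sketch of that arithmetic, assuming the counts in the table are the full dataset; the function name `cuisine_shares` is illustrative and not part of the repository:

```python
# Counts copied from the cuisine table above.
CUISINE_COUNTS = {
    "Mexican": 44, "French": 44, "Fusion": 36, "Italian": 37, "Asian": 42,
    "Pizza": 33, "Fast Food": 33, "Mediterranean": 32, "Chinese": 32,
    "Seafood": 29, "Japanese": 28, "International": 28, "Dominican": 31,
    "Steakhouse": 26, "American": 25,
}


def cuisine_shares(counts):
    """Return (cuisine, count, percent) rows sorted by count, largest first."""
    total = sum(counts.values())
    rows = [(name, n, round(100 * n / total, 1)) for name, n in counts.items()]
    # sorted() is stable, so cuisines with equal counts keep their table order.
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, n, pct in cuisine_shares(CUISINE_COUNTS):
        print(f"{name:<14} {n:>3} {pct:>5.1f}%")
```

Sorting by count makes the ranking explicit, which the table's original row order does not.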

💰 Price Range Distribution

  • $ (Budget): 161 restaurants (32.2%)
  • $$ (Mid-range): 188 restaurants (37.6%)
  • $$$ (Upscale): 120 restaurants (24.0%)
  • $$$$ (Fine Dining): 31 restaurants (6.2%)

🏘️ Neighborhood Coverage

  • Zona Colonial: 80 restaurants
  • Piantini: 70 restaurants
  • Naco: 60 restaurants
  • Bella Vista: 50 restaurants
  • Gazcue: 40 restaurants
  • Villa Consuelo: 35 restaurants
  • Los Prados: 30 restaurants
  • Ensanche Naco: 25 restaurants
  • Mirador Norte: 20 restaurants
  • Mirador Sur: 20 restaurants
  • Malecón: 18 restaurants
  • Villa Mella: 15 restaurants
  • Los Alcarrizos: 12 restaurants
  • Ensanche La Fe: 10 restaurants
  • Villa Duarte: 8 restaurants
  • Los Ríos: 6 restaurants
  • Villa Juana: 1 restaurant

📊 Analysis Visualizations

🎯 Interactive Analysis Notebook

👉 Open Analysis Notebook - Complete interactive analysis with 500 restaurants

📈 Key Visualizations Generated

1. Restaurant Performance Analysis

Figure: Restaurant Performance. Analysis of restaurant ratings, review counts, and performance metrics.

2. Cuisine Type Distribution

Figure: Cuisine Distribution. Market share analysis across 15 cuisine types with average ratings.

3. Neighborhood Analysis

Figure: Neighborhood Analysis. Restaurant density across authentic Santo Domingo neighborhoods.

4. Price Range Analysis

Figure: Price Analysis. Market segmentation by price ranges with performance metrics.

5. Word Cloud - Spanish Reviews

Figure: Word Cloud. Most frequent words in Spanish customer reviews.
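
A word cloud is driven by word frequencies. The following is a rough sketch of how such counts could be computed with the standard library. The stopword list and sample reviews are made up for illustration, and the notebook's actual tokenization may differ:

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real run would use a fuller Spanish list.
STOPWORDS = {"el", "la", "los", "las", "de", "y", "muy", "es",
             "un", "una", "en", "que", "pero"}

# Made-up sample reviews, not taken from the dataset.
SAMPLE_REVIEWS = [
    "La comida es excelente y el servicio muy bueno",
    "Excelente servicio, la comida deliciosa",
    "El ambiente es agradable pero la comida tardó",
]


def word_frequencies(reviews, top_n=10):
    """Count non-stopword tokens across all reviews, most frequent first."""
    counts = Counter()
    for text in reviews:
        # [^\W\d_]+ matches runs of letters, including accented ones.
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


if __name__ == "__main__":
    for word, n in word_frequencies(SAMPLE_REVIEWS, top_n=5):
        print(word, n)
```

The resulting `(word, count)` pairs are the kind of input a word cloud library expects as a frequency mapping.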

🔗 Direct Link: View Full Resolution Word Cloud

📝 Note: Images are automatically generated when you run the analysis. The visualizations show comprehensive market analysis of 500 restaurants in Santo Domingo.

🔍 How to Generate Visualizations

  1. Open the notebook: notebooks/analysis.ipynb
  2. Run all cells to generate visualizations
  3. Images are automatically saved to images/ directory
  4. Export results to data/processed/ directory

Quick Image Generation

# Run the notebook programmatically to generate images
jupyter nbconvert --to notebook --execute notebooks/analysis.ipynb --output analysis_executed.ipynb

Manual Image Generation

  1. Open notebooks/analysis.ipynb in Jupyter
  2. Click "Run All" or execute cells sequentially
  3. Images will be saved to images/ directory

🛠️ Troubleshooting Image Display

If images don't appear in the README:

  1. Hard refresh your browser (Ctrl+F5 or Cmd+Shift+R)
  2. Clear browser cache for GitHub
  3. Wait 5-10 minutes for GitHub's CDN to update
  4. Use direct links provided below each image
  5. Check file paths are correct in the repository

Direct Image Links

  1. Images will appear in the README after generation

📊 Live Analysis Dashboard

The notebook provides an interactive dashboard with:

  • Real-time data processing of 500 restaurants
  • Dynamic filtering by neighborhood, cuisine, price range
  • Interactive charts with hover details
  • Export capabilities for reports and presentations
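
The filtering described above amounts to combining optional criteria. A minimal sketch, assuming a flat record per restaurant. The `Restaurant` fields and the sample rows are hypothetical and may not match the notebook's actual columns:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Restaurant:
    # Hypothetical record layout; the real dataset's columns may differ.
    name: str
    neighborhood: str
    cuisine: str
    price_range: str  # one of "$", "$$", "$$$", "$$$$"
    rating: float


def filter_restaurants(
    rows: Iterable[Restaurant],
    neighborhood: Optional[str] = None,
    cuisine: Optional[str] = None,
    price_range: Optional[str] = None,
) -> List[Restaurant]:
    """Keep rows matching every filter that is set; None means 'any'."""
    return [
        r for r in rows
        if (neighborhood is None or r.neighborhood == neighborhood)
        and (cuisine is None or r.cuisine == cuisine)
        and (price_range is None or r.price_range == price_range)
    ]


# Made-up sample rows for illustration only.
SAMPLE = [
    Restaurant("Example A", "Piantini", "Japanese", "$$$", 4.6),
    Restaurant("Example B", "Piantini", "Italian", "$$", 4.2),
    Restaurant("Example C", "Zona Colonial", "Dominican", "$", 4.4),
    Restaurant("Example D", "Zona Colonial", "Italian", "$$", 3.9),
]

if __name__ == "__main__":
    for r in filter_restaurants(SAMPLE, cuisine="Italian", price_range="$$"):
        print(r.name, r.neighborhood, r.rating)
```

Treating `None` as "no constraint" lets the same function serve any combination of the three filters.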

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • Virtual environment (recommended)

Installation

  1. Clone the repository
     git clone https://github.com/FCornielle/santo-domingo-restaurant-reviews-nlp.git
     cd santo-domingo-restaurant-reviews-nlp
  2. Create virtual environment
     python -m venv .venv
     source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies
     pip install -r requirements.txt
  4. Run the scraper
     python comprehensive_restaurant_scraper.py
  5. Open the analysis notebook
     jupyter notebook notebooks/analysis.ipynb

📁 Project Structure

santo-domingo-restaurant-reviews-nlp/
├── 📊 notebooks/
│   └── analysis.ipynb              # Main analysis notebook
├── 🗃️ data/
│   ├── raw/                        # Raw scraped data
│   └── processed/                  # Processed analysis results
├── 🔧 src/
│   ├── scraper/                    # Web scraping modules
│   ├── nlp/                        # Natural language processing
│   ├── database/                   # Database models
│   ├── pipeline/                   # Data pipeline
│   └── utils/                      # Utility functions
├── ⚙️ config/
│   └── settings.yaml               # Configuration settings
├── 🧪 tests/                       # Test suite
├── 📋 requirements.txt             # Python dependencies
└── 📖 README.md                    # This file

🔍 Analysis Features

📊 Data Visualization

  • Restaurant Performance Charts
  • Sentiment Analysis Graphs
  • Cuisine Distribution Plots
  • Neighborhood Analysis Maps
  • Price Range Comparisons
  • Review Sentiment Trends

🤖 Spanish NLP Processing

  • Text Cleaning and preprocessing
  • Sentiment Analysis with polarity scores
  • Word Frequency Analysis
  • Topic Modeling and keyword extraction
  • Review Classification by sentiment
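
The cleaning, polarity scoring, and classification steps above can be sketched end to end. This README lists TextBlob for sentiment analysis; the example below instead uses a tiny hand-made lexicon as a stand-in so that it runs without extra dependencies. The word lists, the `threshold` value, and all function names are assumptions for illustration:

```python
import re
import unicodedata

# Hypothetical mini-lexicon; a real pipeline would use a proper Spanish resource.
POSITIVE = {"excelente", "bueno", "buena", "delicioso", "deliciosa", "recomendado"}
NEGATIVE = {"malo", "mala", "lento", "lenta", "caro", "cara", "terrible"}


def clean_text(text):
    """Lowercase, strip accents, drop punctuation, and split into tokens."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z\s]", " ", text).split()


def polarity(text):
    """Score in [-1, 1]: +1 if all lexicon hits are positive, -1 if all negative."""
    hits = [t for t in clean_text(text) if t in POSITIVE or t in NEGATIVE]
    if not hits:
        return 0.0
    positives = sum(t in POSITIVE for t in hits)
    return (2 * positives - len(hits)) / len(hits)


def classify(text, threshold=0.1):
    """Map a polarity score to a positive / neutral / negative label."""
    score = polarity(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for review in ["¡Excelente comida, muy deliciosa!",
                   "Servicio lento y caro",
                   "Fuimos el sábado"]:
        print(classify(review), "|", review)
```

A symmetric threshold around zero keeps reviews with no lexicon hits, or an even mix of them, in the neutral class.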

📈 Market Insights

  • Competitive Analysis by neighborhood
  • Cuisine Performance metrics
  • Price Sensitivity analysis
  • Customer Satisfaction trends
  • Market Opportunities identification

🎯 Use Cases

For Market Researchers

  • Market Size Analysis - Understand restaurant density and distribution
  • Competitive Intelligence - Analyze competitor performance and positioning
  • Customer Sentiment - Track customer satisfaction and feedback trends
  • Market Opportunities - Identify underserved areas or cuisine types

For Restaurant Owners

  • Competitive Benchmarking - Compare performance against local competitors
  • Customer Insights - Understand what customers value most
  • Market Positioning - Identify optimal pricing and positioning strategies
  • Location Analysis - Evaluate neighborhood performance and potential

For Investors

  • Market Assessment - Evaluate restaurant market potential
  • Performance Metrics - Track key performance indicators
  • Trend Analysis - Identify emerging market trends
  • Risk Assessment - Understand market dynamics and risks

📊 Sample Analysis Results

Sentiment Analysis Distribution

  • Positive: 4,301 reviews (86.8%)
  • Neutral: 478 reviews (9.6%)
  • Negative: 176 reviews (3.6%)

Top Performing Neighborhoods

  1. Zona Colonial - 80 restaurants, avg rating 4.2
  2. Piantini - 70 restaurants, avg rating 4.1
  3. Naco - 60 restaurants, avg rating 4.0

Most Popular Cuisine Types

  1. Mexican - 44 restaurants
  2. French - 44 restaurants
  3. Asian - 42 restaurants

🔧 Technical Features

Web Scraping

  • Selenium-based scraping for dynamic content
  • Rate limiting and respectful scraping
  • Data validation and quality checks
  • Error handling and retry logic
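
Rate limiting and retry logic can be illustrated without Selenium. The sketch below is not the repository's scraper code: `RateLimiter`, `fetch_with_retry`, and their parameters are hypothetical names, and `fetch` stands for any callable that loads a page:

```python
import time


class RateLimiter:
    """Enforce a minimum interval, in seconds, between consecutive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_with_retry(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on failure wait base_delay * 2**attempt, then try again."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    limiter = RateLimiter(min_interval=0.2)
    for page in range(3):
        limiter.wait()
        print(fetch_with_retry(lambda: f"page {page} ok", base_delay=0.1))
```

Passing `clock` and `sleep` as parameters keeps the timing logic testable without real delays.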

Data Processing

  • Pandas for data manipulation
  • NumPy for numerical analysis
  • SQLAlchemy for database operations
  • JSON for data serialization
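
Spanish names such as Malecón need care when serialized. A minimal sketch of a UTF-8 JSON round trip, assuming records are plain dictionaries. The field names and the `restaurants.json` filename are hypothetical, and only the `data/raw/` directory comes from the project structure:

```python
import json
import tempfile
from pathlib import Path


def save_records(records, path):
    """Write a list of dicts as UTF-8 JSON, keeping accents readable."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False writes "Malecón" instead of "Malec\u00f3n".
        json.dump(records, fh, ensure_ascii=False, indent=2)


def load_records(path):
    """Read the records back, decoding the file explicitly as UTF-8."""
    with Path(path).open("r", encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    records = [{"name": "Example A", "neighborhood": "Malecón", "rating": 4.5}]
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "data" / "raw" / "restaurants.json"
        save_records(records, target)
        print(load_records(target))
```

Stating the encoding on both write and read avoids the platform default, a common source of garbled accented characters.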

Machine Learning

  • NLTK for natural language processing
  • TextBlob for sentiment analysis
  • Scikit-learn for advanced ML tasks
  • WordCloud for text visualization

Visualization

  • Matplotlib for static plots
  • Seaborn for statistical visualizations
  • Plotly for interactive charts
  • Jupyter for notebook analysis

📈 Performance Metrics

  • Data Collection: 500 restaurants in ~2 minutes
  • Review Processing: 4,955 reviews in ~30 seconds
  • Sentiment Analysis: all 4,955 Spanish reviews assigned a sentiment label (accuracy against hand-labeled data is not reported)
  • Visualization Generation: Interactive charts in ~10 seconds

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Santo Domingo restaurant community
  • Open source contributors
  • Python data science ecosystem
  • Jupyter notebook community


⭐ If you found this project helpful, please give it a star!


Last updated: September 2025
