Multi-Task Learning with NLP

A comprehensive multi-task learning system for Natural Language Processing that simultaneously detects emotions, violence, and hate speech in text data.

🎯 Project Overview

This project implements a deep learning model that can perform three different NLP tasks simultaneously:

  • Emotion Detection: Classifies text into 6 emotions (sadness, joy, love, anger, fear, surprise)
  • Violence Detection: Identifies 5 types of violence (sexual, physical, emotional, harmful traditional practices, economic)
  • Hate Speech Detection: Classifies text into 3 classes (offensive speech, neutral content, hate speech)

🚀 Features

  • Multi-Task Learning: Single model handles multiple NLP tasks efficiently
  • Web Interface: User-friendly Flask web application
  • Real-time Prediction: Instant text analysis with confidence scores
  • Comprehensive Evaluation: Detailed metrics and visualizations
  • Modular Architecture: Well-organized, maintainable code structure

📁 Project Structure

├── dataset_load.py          # Dataset loading and preprocessing
├── data_preprocessing.py    # Text cleaning and tokenization
├── model.py                 # Multi-task neural network architecture
├── train.py                 # Training pipeline
├── test.py                  # Testing and prediction utilities
├── evaluate.py              # Model evaluation and metrics
├── utils.py                 # Utility functions
├── config.py                # Configuration settings
├── app.py                   # Flask web application
├── templates/
│   └── index.html           # Web interface
├── requirements.txt         # Python dependencies
└── README.md                # This file

🛠️ Installation

  1. Clone the repository:

    git clone https://github.com/uzzal2200/NLP-Multi-Task-Learning-System.git
    cd NLP-Multi-Task-Learning-System
  2. Install dependencies:

    pip install -r requirements.txt
  3. Download the NLTK data from a Python session (if not already downloaded):

    import nltk
    nltk.download('punkt')
    nltk.download('stopwords')

📊 Dataset Requirements

Place your datasets in the following structure:

Dataset/
├── Emotion/
│   └── text.csv
├── Violence/
│   └── Train.csv
└── Hate/
    └── labeled_data.csv

Dataset Formats:

  • Emotion Dataset: Columns should include 'text' and 'label' (0-5)
  • Violence Dataset: Columns should include 'tweet' and 'type'
  • Hate Dataset: Columns should include 'tweet' and 'class'
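Before training, it can help to confirm that each CSV actually has the columns listed above. The following is a minimal sketch using only the standard library; the helper names `missing_columns` and `check_datasets` are hypothetical and not part of the repository, and the paths assume the default layout shown above.

```python
import csv

# Required columns per dataset, taken from the formats listed above.
REQUIRED_COLUMNS = {
    "Dataset/Emotion/text.csv": {"text", "label"},
    "Dataset/Violence/Train.csv": {"tweet", "type"},
    "Dataset/Hate/labeled_data.csv": {"tweet", "class"},
}


def missing_columns(csv_path, required):
    """Return the required column names absent from the CSV header (hypothetical helper)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    return set(required) - {name.strip() for name in header}


def check_datasets(required_columns=REQUIRED_COLUMNS):
    """Map each dataset path to its missing columns; an empty dict means none are missing."""
    problems = {}
    for path, required in required_columns.items():
        missing = missing_columns(path, required)
        if missing:
            problems[path] = missing
    return problems
```

Calling `check_datasets()` from the project root returns an empty dictionary when every file has its expected columns, and raises `FileNotFoundError` if a file is absent.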

🏃‍♂️ Quick Start

1. Training the Model

python train.py

This will:

  • Load and preprocess all datasets
  • Create and train the multi-task model
  • Save the trained model to Save model/multi_task_model.h5
  • Save the tokenizer to Save model/tokenizer.pkl

2. Running the Web Application

python app.py

Then open your browser and go to: http://localhost:5000

3. Testing Individual Components

# Test dataset loading
python dataset_load.py

# Test data preprocessing
python data_preprocessing.py

# Test model creation
python model.py

# Test evaluation
python evaluate.py

🎮 Usage Examples

Web Interface

  1. Open the web application
  2. Enter text in the input field
  3. Click "Analyze Text"
  4. View results for all three tasks

Programmatic Usage

from test import MultiTaskPredictor
from utils import load_tokenizer

# Load model and tokenizer
tokenizer = load_tokenizer('Save model/tokenizer.pkl')
predictor = MultiTaskPredictor('Save model/multi_task_model.h5', tokenizer)

# Make predictions
text = "I am so happy today!"
result = predictor.predict_single(text)
print(f"Major Task: {result['major_task']}")
print(f"Recommended Label: {result['recommended_label']}")
print(f"Confidence: {result['recommended_confidence']:.4f}")
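The README does not say how `major_task` and `recommended_label` are chosen. One plausible rule, shown here purely as an assumption, is to pick the task whose top class has the highest softmax probability. The `summarize` function and the label index order below are hypothetical; only the three result keys and the label names come from this document.

```python
# Label names follow the Project Overview; their index order is an assumption.
TASK_LABELS = {
    "emotion": ["sadness", "joy", "love", "anger", "fear", "surprise"],
    "violence": ["sexual", "physical", "emotional",
                 "harmful traditional practices", "economic"],
    "hate": ["offensive speech", "neutral", "hate speech"],
}


def summarize(probabilities):
    """Pick the task whose top class has the highest probability.

    `probabilities` maps a task name to its list of softmax outputs.
    This selection rule is an assumption, not the repository's confirmed logic.
    """
    best = None
    for task, probs in probabilities.items():
        # Index of the most probable class for this task.
        index = max(range(len(probs)), key=probs.__getitem__)
        if best is None or probs[index] > best["recommended_confidence"]:
            best = {
                "major_task": task,
                "recommended_label": TASK_LABELS[task][index],
                "recommended_confidence": probs[index],
            }
    return best
```

Under this rule, a confident emotion prediction outranks near-uniform violence and hate outputs, which matches the intuition behind reporting a single "major" task.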

📈 Model Architecture

The multi-task model consists of:

  • Shared Embedding Layer: 128-dimensional word embeddings
  • Shared LSTM Layer: 64 units with return sequences
  • Shared Pooling: Global average pooling
  • Task-Specific Outputs: Dense layers for each task
    • Emotion: 6 classes (softmax)
    • Violence: 5 classes (softmax)
    • Hate: 3 classes (softmax)
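To get a feel for the size of this architecture, the parameter counts can be worked out by hand. The sketch below assumes a standard LSTM with one bias vector per gate and assumes each task head is a single dense layer attached directly to the 64-dimensional pooled output. The vocabulary size is not stated in this README, so `VOCAB_SIZE` is a placeholder.

```python
EMBEDDING_DIM = 128   # from the architecture list above
LSTM_UNITS = 64       # from the architecture list above
TASK_CLASSES = {"emotion": 6, "violence": 5, "hate": 3}
VOCAB_SIZE = 10_000   # assumption: the README does not state the vocabulary size


def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry.
    return vocab_size * dim


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias vector.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, classes):
    # Weight matrix plus one bias per output class.
    return input_dim * classes + classes


def count_parameters(vocab_size=VOCAB_SIZE):
    """Return per-layer and total trainable parameter counts (pooling has none)."""
    counts = {
        "embedding": embedding_params(vocab_size, EMBEDDING_DIM),
        "lstm": lstm_params(EMBEDDING_DIM, LSTM_UNITS),
    }
    for task, classes in TASK_CLASSES.items():
        counts[task + "_head"] = dense_params(LSTM_UNITS, classes)
    counts["total"] = sum(counts.values())
    return counts
```

Under these assumptions the shared LSTM has 49,408 parameters and the three heads together add only 910, so nearly all capacity sits in the shared embedding and LSTM layers.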

🔧 Configuration

Edit config.py to customize:

  • Dataset paths
  • Model hyperparameters
  • Training parameters
  • Evaluation settings
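The actual contents of config.py are not shown in this README. As a rough illustration of how such settings could be grouped, here is a hypothetical sketch: the `Config` class and its field names are assumptions, the paths and the two architecture values come from other sections of this document, and the remaining numbers are placeholders.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    # Paths as listed in the Dataset Requirements and Quick Start sections.
    emotion_csv: str = "Dataset/Emotion/text.csv"
    violence_csv: str = "Dataset/Violence/Train.csv"
    hate_csv: str = "Dataset/Hate/labeled_data.csv"
    model_path: str = "Save model/multi_task_model.h5"
    tokenizer_path: str = "Save model/tokenizer.pkl"

    # Hyperparameters stated in the Model Architecture section.
    embedding_dim: int = 128
    lstm_units: int = 64

    # Placeholder values: the README does not give these.
    max_length: int = 50
    batch_size: int = 32
    epochs: int = 5


CONFIG = Config()

# A lower-memory variant, in the spirit of the Troubleshooting advice.
LOW_MEMORY = replace(CONFIG, batch_size=16, max_length=32)
```

Keeping the settings in one immutable object makes it harder for training and inference code to drift apart on values such as `max_length`.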

📊 Evaluation Metrics

The system provides comprehensive evaluation including:

  • Accuracy: Overall classification accuracy
  • Precision: Precision for each class
  • Recall: Recall for each class
  • F1-Score: Harmonic mean of precision and recall
  • Confusion Matrices: Visual representation of predictions
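The metrics listed above can all be derived from a confusion matrix. The following pure-Python sketch shows the arithmetic; it is not the code in evaluate.py, the function names are hypothetical, and the example counts are made up for illustration.

```python
def per_class_metrics(confusion):
    """Compute precision, recall, and F1 per class.

    `confusion[i][j]` counts samples with true class i predicted as class j.
    """
    size = len(confusion)
    metrics = []
    for k in range(size):
        tp = confusion[k][k]
        predicted = sum(confusion[i][k] for i in range(size))  # column sum
        actual = sum(confusion[k])                              # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics


def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total if total else 0.0


# Made-up counts for a three-class task, for illustration only.
example = [
    [8, 2, 0],
    [1, 7, 2],
    [1, 1, 8],
]
```

For `example`, 23 of the 30 samples lie on the diagonal, and class 0 has precision and recall of 0.8 each (8 correct out of 10 predicted and 10 actual).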

🌐 API Endpoints

The Flask application provides these endpoints:

  • GET /: Main web interface
  • POST /predict: Text prediction API
  • GET /health: Health check
  • GET /model_info: Model information
  • GET /example_predictions: Example predictions

API Usage Example

curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "I am so happy today!"}'
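The same request can be sent from Python with the standard library alone. This is a sketch, assuming the endpoint replies with JSON; the response schema is not documented here, and the function names `build_predict_request` and `predict` are hypothetical.

```python
import json
import urllib.request


def build_predict_request(text, base_url="http://localhost:5000"):
    """Build the same POST request as the curl command above."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, base_url="http://localhost:5000", timeout=10):
    """Send the request and decode the JSON reply (requires app.py to be running)."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the Flask app running, `predict("I am so happy today!")` should return the decoded response body as a Python object.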

🎯 Performance

Typical performance metrics:

  • Emotion Detection: ~85-90% accuracy
  • Violence Detection: ~80-85% accuracy
  • Hate Speech Detection: ~85-90% accuracy

Note: Performance may vary with dataset quality and training parameters.

🛠️ Customization

Adding New Tasks

  1. Modify model.py to add new output layers
  2. Update config.py with new task configuration
  3. Modify data loading in dataset_load.py
  4. Update evaluation metrics in evaluate.py

Changing Model Architecture

  1. Edit the model creation function in model.py
  2. Adjust hyperparameters in config.py
  3. Retrain the model using train.py

🐛 Troubleshooting

Common Issues:

  1. Model not found error:

    • Ensure you've run train.py first
    • Check that Save model/ directory exists
  2. Tokenizer not found error:

    • The app auto-generates the tokenizer on first run
    • Or run train.py to create it manually
  3. Dataset path errors:

    • Verify dataset paths in config.py
    • Ensure CSV files exist and are readable
  4. Memory issues:

    • Reduce batch size in config.py
    • Use a smaller max_length for sequences

📝 License

This project is open source and available under the MIT License.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📞 Support

If you encounter any issues or have questions:

  1. Check the troubleshooting section above
  2. Review the code comments for guidance
  3. Open an issue on the repository

🔄 Version History

  • v1.0.0: Initial release with multi-task learning implementation
  • v1.1.0: Added web interface and API endpoints
  • v1.2.0: Enhanced evaluation and visualization features

Happy Coding! 🚀
