A comprehensive multi-task learning system for Natural Language Processing that simultaneously detects emotions, violence, and hate speech in text data.
This project implements a deep learning model that can perform three different NLP tasks simultaneously:
- Emotion Detection: Classifies text into 6 emotions (sadness, joy, love, anger, fear, surprise)
- Violence Detection: Identifies 5 types of violence (sexual, physical, emotional, harmful traditional practices, economic)
- Hate Speech Detection: Detects offensive speech, neutral content, and hate speech
- Multi-Task Learning: Single model handles multiple NLP tasks efficiently
- Web Interface: User-friendly Flask web application
- Real-time Prediction: Instant text analysis with confidence scores
- Comprehensive Evaluation: Detailed metrics and visualizations
- Modular Architecture: Well-organized, maintainable code structure
├── dataset_load.py          # Dataset loading and preprocessing
├── data_preprocessing.py    # Text cleaning and tokenization
├── model.py                 # Multi-task neural network architecture
├── train.py                 # Training pipeline
├── test.py                  # Testing and prediction utilities
├── evaluate.py              # Model evaluation and metrics
├── utils.py                 # Utility functions
├── config.py                # Configuration settings
├── app.py                   # Flask web application
├── templates/
│   └── index.html           # Web interface
├── requirements.txt         # Python dependencies
└── README.md                # This file
- Clone the repository (git names the checkout directory after the repository):
  git clone https://github.com/uzzal2200/NLP-Multi-Task-Learning-System.git
  cd NLP-Multi-Task-Learning-System
- Install dependencies:
  pip install -r requirements.txt
- Download NLTK data (if not already downloaded), from a Python session:
  import nltk
  nltk.download('punkt')
  nltk.download('stopwords')
Place your datasets in the following structure:
Dataset/
├── Emotion/
│   └── text.csv
├── Violence/
│   └── Train.csv
└── Hate/
    └── labeled_data.csv
- Emotion Dataset: Columns should include 'text' and 'label' (0-5)
- Violence Dataset: Columns should include 'tweet' and 'type'
- Hate Dataset: Columns should include 'tweet' and 'class'
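A quick header check before training can catch a misnamed column early. The sketch below is not part of the project: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and the in-memory samples stand in for the real CSV files.

```python
import csv
import io

# Required columns per dataset, taken from the list above.
REQUIRED_COLUMNS = {
    "emotion": {"text", "label"},
    "violence": {"tweet", "type"},
    "hate": {"tweet", "class"},
}


def missing_columns(csv_file, dataset):
    """Return the sorted list of required columns absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])
    return sorted(REQUIRED_COLUMNS[dataset] - header)


# In-memory samples stand in for the real files under Dataset/.
good = io.StringIO("text,label\nI feel great,1\n")
bad = io.StringIO("tweet,label\nsome tweet,0\n")
print(missing_columns(good, "emotion"))  # []
print(missing_columns(bad, "hate"))      # ['class']
```

With real data, pass an open file instead, for example `open("Dataset/Hate/labeled_data.csv", newline="")`.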
python train.py
This will:
- Load and preprocess all datasets
- Create and train the multi-task model
- Save the trained model to Save model/multi_task_model.h5
- Save the tokenizer to Save model/tokenizer.pkl
python app.py
Then open your browser and go to: http://localhost:5000
# Test dataset loading
python dataset_load.py
# Test data preprocessing
python data_preprocessing.py
# Test model creation
python model.py
# Test evaluation
python evaluate.py
- Open the web application
- Enter text in the input field
- Click "Analyze Text"
- View results for all three tasks
from test import MultiTaskPredictor
from utils import load_tokenizer
# Load model and tokenizer
tokenizer = load_tokenizer('Save model/tokenizer.pkl')
predictor = MultiTaskPredictor('Save model/multi_task_model.h5', tokenizer)
# Make predictions
text = "I am so happy today!"
result = predictor.predict_single(text)
print(f"Major Task: {result['major_task']}")
print(f"Recommended Label: {result['recommended_label']}")
print(f"Confidence: {result['recommended_confidence']:.4f}")
The multi-task model consists of:
- Shared Embedding Layer: 128-dimensional word embeddings
- Shared LSTM Layer: 64 units with return sequences
- Shared Pooling: Global average pooling
- Task-Specific Outputs: Dense layers for each task
  - Emotion: 6 classes (softmax)
  - Violence: 5 classes (softmax)
  - Hate: 3 classes (softmax)
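To make the layer sizes concrete, the sketch below traces tensor shapes through the shared stack and counts trainable weights. It assumes a standard LSTM layer and a single Dense layer per task placed directly on the pooled output, which the description suggests but does not confirm. The batch size and sequence length are made-up values, and the embedding weight count is left out because the vocabulary size is not stated.

```python
# Layer sizes from the list above; the single-Dense-head layout is an assumption.
EMBED_DIM = 128
LSTM_UNITS = 64
TASK_CLASSES = {"emotion": 6, "violence": 5, "hate": 3}


def lstm_params(input_dim, units):
    """Trainable weights of a standard LSTM layer: 4 gates, each with
    input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units


def output_shapes(batch, seq_len):
    """Tensor shape after each shared stage and each task head."""
    shapes = {
        "embedding": (batch, seq_len, EMBED_DIM),
        "lstm": (batch, seq_len, LSTM_UNITS),  # return sequences keeps seq_len
        "pooling": (batch, LSTM_UNITS),        # average over the time axis
    }
    for task, n_classes in TASK_CLASSES.items():
        shapes[task] = (batch, n_classes)
    return shapes


print(lstm_params(EMBED_DIM, LSTM_UNITS))  # 49408
for task, n_classes in TASK_CLASSES.items():
    print(task, dense_params(LSTM_UNITS, n_classes))
print(output_shapes(batch=32, seq_len=50))
```

Under these assumptions the three heads add only a few hundred weights each, so almost all capacity sits in the shared layers.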
Edit config.py to customize:
- Dataset paths
- Model hyperparameters
- Training parameters
- Evaluation settings
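As a rough illustration of the kinds of settings listed above, a configuration module might look like the sketch below. Every variable name here is hypothetical and the real config.py may be organised differently. The paths and the 128/64 layer sizes come from this README, while the remaining numbers are placeholders.

```python
# Hypothetical config.py layout; the real file may use different names.
import os

# Dataset paths (directory layout as described in this README).
DATASET_DIR = "Dataset"
EMOTION_CSV = os.path.join(DATASET_DIR, "Emotion", "text.csv")
VIOLENCE_CSV = os.path.join(DATASET_DIR, "Violence", "Train.csv")
HATE_CSV = os.path.join(DATASET_DIR, "Hate", "labeled_data.csv")

# Output locations written by train.py.
MODEL_DIR = "Save model"
MODEL_PATH = os.path.join(MODEL_DIR, "multi_task_model.h5")
TOKENIZER_PATH = os.path.join(MODEL_DIR, "tokenizer.pkl")

# Model hyperparameters (128 and 64 come from the architecture description).
EMBEDDING_DIM = 128
LSTM_UNITS = 64

# Placeholder training values, not taken from the project.
MAX_LENGTH = 50
BATCH_SIZE = 32
EPOCHS = 10
```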
The system provides comprehensive evaluation including:
- Accuracy: Overall classification accuracy
- Precision: Precision for each class
- Recall: Recall for each class
- F1-Score: Harmonic mean of precision and recall
- Confusion Matrices: Visual representation of predictions
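The metrics above all derive from the confusion matrix. The following is a minimal pure-Python sketch of those definitions, not the code in evaluate.py. The function names are illustrative and the labels are made-up toy data.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[t][p] counts samples with true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_metrics(matrix):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    n = len(matrix)
    results = []
    for c in range(n):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(n))  # column sum
        actual = sum(matrix[c])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy labels for a 3-class task (made-up data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
print(cm)  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(accuracy(cm))
print(per_class_metrics(cm))
```

Rows are true classes and columns are predictions, so precision reads down a column and recall reads across a row.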
The Flask application provides these endpoints:
- GET /: Main web interface
- POST /predict: Text prediction API
- GET /health: Health check
- GET /model_info: Model information
- GET /example_predictions: Example predictions
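The POST /predict endpoint can also be called from Python using only the standard library. This sketch assumes the app runs on localhost:5000, accepts a JSON body with a "text" field, and replies with JSON. The helper names are illustrative, and the response fields are not documented here, so the reply is returned as-is.

```python
import json
import urllib.request


def build_predict_request(text, base_url="http://localhost:5000"):
    """Build a POST request for /predict carrying {"text": ...} as JSON."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, base_url="http://localhost:5000", timeout=10):
    """Send the request and decode the reply, assuming it is JSON.

    Requires the Flask app to be running.
    """
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (needs the server started with `python app.py`):
# print(predict("I am so happy today!"))
```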
curl -X POST http://localhost:5000/predict \
-H "Content-Type: application/json" \
-d '{"text": "I am so happy today!"}'
Typical performance metrics:
- Emotion Detection: ~85-90% accuracy
- Violence Detection: ~80-85% accuracy
- Hate Speech Detection: ~85-90% accuracy
Note: Performance may vary based on dataset quality and training parameters
To add a new task:
- Modify model.py to add new output layers
- Update config.py with the new task configuration
- Modify data loading in dataset_load.py
- Update evaluation metrics in evaluate.py
To change the model architecture:
- Edit the model creation function in model.py
- Adjust hyperparameters in config.py
- Retrain the model using train.py
- Model not found error:
  - Ensure you've run train.py first
  - Check that the Save model/ directory exists
- Tokenizer not found error:
  - The app will auto-generate the tokenizer on first run
  - Or run train.py to create it manually
- Dataset path errors:
  - Verify dataset paths in config.py
  - Ensure CSV files exist and are readable
- Memory issues:
  - Reduce batch size in config.py
  - Use a smaller max_length for sequences
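To see why the two memory settings help, the sketch below clips or pads a token sequence to a fixed length and gives a rough count of the floats held by the shared layers for one batch. It is a generic illustration, not the project's preprocessing code. The function names are hypothetical, and whether the project pads at the start or the end of a sequence is not stated.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Clip a token-id list to max_length, or right-pad it with pad_id."""
    clipped = token_ids[:max_length]
    return clipped + [pad_id] * (max_length - len(clipped))


def activation_floats(batch_size, max_length, embed_dim=128, lstm_units=64):
    """Rough count of floats in the embedding and LSTM sequence outputs
    for one batch; it scales linearly with batch_size * max_length."""
    return batch_size * max_length * (embed_dim + lstm_units)


print(pad_or_truncate([5, 9, 2], 5))           # [5, 9, 2, 0, 0]
print(pad_or_truncate([5, 9, 2, 7, 1, 4], 4))  # [5, 9, 2, 7]
print(activation_floats(32, 100))              # 614400
print(activation_floats(16, 50))               # 153600
```

Halving both the batch size and max_length cuts this estimate to a quarter, which is why those two settings are the first to try.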
This project is open source and available under the MIT License.
Contributions are welcome! Please feel free to submit a Pull Request.
If you encounter any issues or have questions:
- Check the troubleshooting section above
- Review the code comments for guidance
- Open an issue on the repository
- v1.0.0: Initial release with multi-task learning implementation
- v1.1.0: Added web interface and API endpoints
- v1.2.0: Enhanced evaluation and visualization features
Happy Coding!