Draft
scripts/terraform/.gitignore
@@ -28,6 +28,14 @@ terraform.tfstate.lock.hcl
# Sensitive files
.env

# Keys and sensitive data
*.pem
*.key
vm_*_private_key.*

# Configuration files with sensitive data
config.yml
outputs.json

# Generated Ansible inventory
inventory_generated.yml

scripts/terraform/QUICK_REFERENCE.md

# AIOpsLab Automation Quick Reference

## 🚀 Quick Start
```bash
cd AIOpsLab/scripts/terraform/
az login
terraform init
python deploy.py
```

## 🔧 Advanced Usage

### Deploy with specific parameters
```bash
python deploy_unified.py deploy \
--resource-group "my-aiopslab-rg" \
--prefix "aiopslab" \
--location "westus2" \
--create-resource-group
```

### Deploy with configuration file
```bash
cp config.yml.example config.yml
# Edit config.yml with your settings
python deploy_unified.py deploy --config config.yml
```

### Verify deployment
```bash
python verify_deployment.py
```

### Destroy infrastructure
```bash
python deploy_unified.py destroy \
--resource-group "my-aiopslab-rg" \
--prefix "aiopslab"
```

## 📁 File Structure
```
scripts/terraform/
├── deploy.py # Simple deployment wrapper
├── deploy_unified.py # Advanced deployment orchestration
├── verify_deployment.py # Post-deployment verification
├── config.yml.example # Configuration template
├── test_deployment.py # Automated tests
├── main.tf # Core infrastructure definition
├── variables.tf # Input variables
├── outputs.tf # Output values
├── data.tf # Data sources and locals
├── ssh.tf # SSH key management
└── providers.tf # Provider configuration
```

## ⚡ What's Automated
- ✅ Azure infrastructure provisioning
- ✅ SSH key generation and management
- ✅ Ansible inventory generation
- ✅ Kubernetes cluster setup
- ✅ AIOpsLab installation
- ✅ Network connectivity validation
- ✅ Error handling and retries
- ✅ Post-deployment verification

## 🔍 Troubleshooting

### Common Issues
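- **SSH connectivity**: Wait 2-3 minutes after deployment for the VMs to fully initialize
- **Ansible failures**: Check VM accessibility with `verify_deployment.py`
- **Terraform state**: Run `terraform apply -refresh-only` (the replacement for the deprecated `terraform refresh`) if resources were changed outside Terraform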
- **Kubernetes issues**: SSH to nodes and check `systemctl status kubelet`
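
The "wait 2-3 minutes" advice can be scripted. The bash sketch below polls the SSH port until it accepts TCP connections. The function name `wait_for_port` and its defaults (36 attempts, 5 seconds apart, roughly the three-minute window above) are illustrative choices, not part of the AIOpsLab scripts. It relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with AIOpsLab): poll HOST:PORT until a TCP
# connection succeeds. Usage: wait_for_port HOST [PORT] [ATTEMPTS] [DELAY_SECONDS]
wait_for_port() {
  local host=$1 port=${2:-22} attempts=${3:-36} delay=${4:-5} i
  for ((i = 1; i <= attempts; i++)); do
    # Open a connection through bash's /dev/tcp; cap each probe at 3 seconds
    # so a port that silently drops packets cannot stall the loop.
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      echo "$host:$port is accepting connections (attempt $i)"
      return 0
    fi
    # No point sleeping after the final attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  echo "$host:$port still unreachable after $attempts attempts" >&2
  return 1
}
```

Example: `wait_for_port <controller-ip> && ssh -i vm_1_private_key.pem azureuser@<controller-ip>`.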

### Debug Commands
```bash
# Check deployment status
python verify_deployment.py

# View Terraform outputs
terraform output

# Test SSH connectivity
ssh -i vm_1_private_key.pem azureuser@<controller-ip>

# Check Kubernetes cluster (run on the controller node)
kubectl get nodes
```

## 🔐 Security Notes
- SSH keys are auto-generated and stored locally
- VM SSH ports are open to the public by default
- Update NSG rules in `main.tf` for production use
- Delete private key files after use if not needed
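
Because the private keys and `outputs.json` live in the working tree, it is worth confirming that Git will not pick them up. This bash sketch wraps the real `git check-ignore` command. The helper name `check_ignored` is hypothetical, and the paths in the example are taken from the `.gitignore` hunk in this change.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether Git would ignore each given path.
# Run it from inside the repository, e.g. in scripts/terraform/.
# Outside a Git repository every path is reported as NOT ignored.
check_ignored() {
  local path status=0
  for path in "$@"; do
    # check-ignore exits 0 when the path matches an ignore rule;
    # -q (quiet) is only valid with a single pathname, hence the loop.
    if git check-ignore -q "$path"; then
      echo "ignored: $path"
    else
      echo "NOT ignored: $path"
      status=1
    fi
  done
  return "$status"
}
```

Example: `check_ignored vm_1_private_key.pem outputs.json config.yml inventory_generated.yml .env`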

## 📞 Support
For issues and questions:
1. Check the troubleshooting section in README.md
2. Run `verify_deployment.py` for detailed diagnostics
3. Review logs from the deployment script
4. Check Azure portal for resource status

scripts/terraform/README.md

## Setting up AIOpsLab using Terraform and Ansible

This guide outlines the automated steps for provisioning Azure infrastructure using Terraform and configuring AIOpsLab using Ansible. The process has been significantly streamlined to reduce manual steps.

**NOTE**: This will incur cloud costs as resources are created on Azure.

### Prerequisites

- **Azure CLI:** Follow the official [Microsoft documentation](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) for installing the Azure CLI
- **Terraform:** Download and install from the [official HashiCorp website](https://developer.hashicorp.com/terraform/install)
- **Ansible:** Install using your package manager (e.g., `sudo apt install ansible` on Ubuntu)
- **Python 3.11+** with PyYAML: `pip install PyYAML`
- **Azure VPN Connection:** Set up a secure connection to your Azure environment using a VPN client
- **Privileges:** The user should have privileges to create resources (SSH keys, VMs, networking, storage) in Azure
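
Before running the steps below, a quick check that the prerequisites above are installed can save a failed run. The bash sketch below is illustrative and not part of the repository: `check_prereqs` is a hypothetical name, and checking for `ansible-playbook` (rather than `ansible`) is an assumption based on the playbook commands used later in this guide. Privileges and the VPN connection are not checked.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check: print one line per missing prerequisite.
# Usage: check_prereqs [COMMAND...]   (defaults to the tools this guide uses)
check_prereqs() {
  local cmd missing=0
  local -a cmds=("$@")
  if ((${#cmds[@]} == 0)); then
    cmds=(az terraform ansible-playbook python3)
  fi
  for cmd in "${cmds[@]}"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    elif [[ $cmd == python3 ]]; then
      # The guide requires Python 3.11+ with PyYAML (imported as "yaml").
      if ! python3 -c 'import sys; sys.exit(sys.version_info < (3, 11))'; then
        echo "too old: python3 (3.11+ required)"
        missing=1
      elif ! python3 -c 'import yaml' 2>/dev/null; then
        echo "missing: PyYAML (pip install PyYAML)"
        missing=1
      fi
    fi
  done
  return "$missing"
}
```

Example: `check_prereqs && echo "all prerequisites found"`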

### Quick Start (Recommended)

1. **Authenticate with Azure CLI**
```shell
az login
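# Pick the subscription "id" from the az login output
# (or list subscriptions with: az account list --output table)
az account set --subscription "<your-subscription-id>"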
```

2. **Navigate to the Terraform directory**
```shell
cd AIOpsLab/scripts/terraform/
```

3. **Initialize Terraform**
```shell
terraform init
```

4. **Run the simplified deployment script**
```shell
python deploy.py
```

The script will prompt you for:
- Resource group name
- Resource name prefix
- Azure region (default: westus2)
- Whether to create a new resource group

5. **Wait for completion**
The script will automatically:
- Provision Azure infrastructure using Terraform
- Generate Ansible inventory from Terraform outputs
- Install and configure Kubernetes using Ansible
- Set up AIOpsLab on the controller node
- Display SSH connection information

### Advanced Usage

For more control over the deployment process, use the unified deployment tool directly:

```shell
# Deploy with specific parameters
python deploy_unified.py deploy \
--resource-group "my-aiopslab-rg" \
--prefix "aiopslab" \
--location "westus2" \
--create-resource-group

# Destroy infrastructure
python deploy_unified.py destroy \
--resource-group "my-aiopslab-rg" \
--prefix "aiopslab"

# Use configuration file
cp config.yml.example config.yml
# Edit config.yml with your settings
python deploy_unified.py deploy --config config.yml
```

### Manual Steps (Legacy Process)

If you prefer the manual approach or need to troubleshoot, you can still use the individual components:

1. **Create Terraform plan**
```shell
terraform plan -out main.tfplan \
-var "resource_group_name=<rg>" \
-var "resource_name_prefix=<prefix>"
```

2. **Apply the plan**
```shell
terraform apply "main.tfplan"
```

3. **Generate Ansible inventory**
```shell
# The unified script does this automatically, but you can generate manually:
terraform output -json > outputs.json
# Then create inventory.yml based on the outputs
```

4. **Run Ansible playbooks**
```shell
cd ../ansible/
ansible-playbook -i inventory.yml setup_common.yml
ansible-playbook -i inventory.yml remote_setup_controller_worker.yml
```
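
The manual inventory step (step 3) is left open-ended. Below is a minimal bash sketch that writes an Ansible YAML inventory from `terraform output -raw`. Everything about its shape is an assumption to check against `outputs.tf` and the playbooks in `../ansible/`: the output names `controller_public_ip` and `worker_public_ip`, the group names `controller` and `workers`, and the worker key file `vm_2_private_key.pem` are hypothetical. Key paths are written as absolute paths because the playbooks are run from `../ansible/`.

```shell
#!/usr/bin/env bash
# Sketch: write an Ansible YAML inventory from Terraform outputs.
# HYPOTHETICAL names, check outputs.tf and ../ansible/*.yml before use:
#   outputs: controller_public_ip, worker_public_ip
#   groups:  controller, workers
#   keys:    vm_1_private_key.pem (controller), vm_2_private_key.pem (worker)
generate_inventory() {
  local out=${1:-inventory.yml} controller_ip worker_ip
  # -raw prints a string output without JSON quoting; stop on any failure.
  controller_ip=$(terraform output -raw controller_public_ip) || return 1
  worker_ip=$(terraform output -raw worker_public_ip) || return 1
  # Absolute key paths, because ansible-playbook runs from ../ansible/.
  cat >"$out" <<EOF
all:
  vars:
    ansible_user: azureuser
  children:
    controller:
      hosts:
        controller_node:
          ansible_host: $controller_ip
          ansible_ssh_private_key_file: $PWD/vm_1_private_key.pem
    workers:
      hosts:
        worker_node_1:
          ansible_host: $worker_ip
          ansible_ssh_private_key_file: $PWD/vm_2_private_key.pem
EOF
  echo "wrote $out"
}
```

Example, run from `scripts/terraform/`: `generate_inventory ../ansible/inventory.yml`, then run the playbooks as in step 4.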

### Post-Deployment

After successful deployment:

1. **SSH into the controller node**
```shell
ssh -i vm_1_private_key.pem azureuser@<controller-public-ip>
```

2. **Activate the AIOpsLab environment**
```shell
cd ~/AIOpsLab
source .venv/bin/activate
```

3. **Verify Kubernetes cluster**
```shell
kubectl get nodes
```
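
To script the cluster check (for example, repeating it until the worker has joined), the bash sketch below reads `kubectl get nodes --no-headers` on standard input and succeeds only when every node's STATUS column is exactly `Ready`. The function name is hypothetical; a cordoned node (`Ready,SchedulingDisabled`) is deliberately counted as not ready.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and succeed only if every node's STATUS (2nd column) is exactly "Ready".
all_nodes_ready() {
  awk '
    {
      total++
      if ($2 != "Ready") { print "not ready: " $1 " (" $2 ")"; bad++ }
    }
    END {
      if (total == 0) { print "no nodes found"; exit 1 }
      exit (bad > 0)
    }
  '
}
```

Example: `kubectl get nodes --no-headers | all_nodes_ready && echo "all nodes Ready"`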

### Cleanup

To destroy the infrastructure:

```shell
# Using the unified tool
python deploy_unified.py destroy --resource-group "my-aiopslab-rg" --prefix "aiopslab"

# Or manually, passing the same variable values used to create the resources
terraform destroy \
  -var "resource_group_name=<rg>" \
  -var "resource_name_prefix=<prefix>"
```
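
Terraform automatically loads a `terraform.tfvars` file from the working directory, so writing the two variable values once lets `plan`, `apply` and `destroy` reuse them without repeating `-var` flags (values passed with `-var` still take precedence). The bash helper below is a hypothetical convenience, not part of the repository; whether `deploy_unified.py` reads such a file is not covered by this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: record the two required variables in terraform.tfvars,
# which Terraform loads automatically from the working directory.
# Usage: write_tfvars RESOURCE_GROUP PREFIX [FILE]
write_tfvars() {
  local rg=$1 prefix=$2 file=${3:-terraform.tfvars}
  if [[ -z $rg || -z $prefix ]]; then
    echo "usage: write_tfvars RESOURCE_GROUP PREFIX [FILE]" >&2
    return 1
  fi
  # HCL assignments; values are quoted strings.
  cat >"$file" <<EOF
resource_group_name  = "$rg"
resource_name_prefix = "$prefix"
EOF
}
```

Example: `write_tfvars my-aiopslab-rg aiopslab`, after which `terraform plan -out main.tfplan` and `terraform destroy` need no `-var` flags.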

### Security Notes

- The SSH port of the VMs is open to the public by default. Update the NSG resources in main.tf to restrict incoming traffic using the `source_address_prefix` attribute (e.g., `source_address_prefix = "CorpNetPublic"`)
- SSH private keys are generated automatically and stored locally. Keep them secure and delete them after use if not needed
- Consider using Azure Key Vault for production deployments
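
OpenSSH ignores a private key that other users can read and prints an "UNPROTECTED PRIVATE KEY FILE" warning. If the generated keys end up with looser permissions on your machine, a sketch like the one below restricts them to the current user. The helper name is hypothetical, and the `vm_*_private_key.pem` pattern is an assumption based on the `vm_1_private_key.pem` file used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make generated private keys readable by the owner only
# (mode 600). Usage: secure_keys [DIRECTORY]   (defaults to the current one)
secure_keys() {
  local dir=${1:-.} key found=0
  for key in "$dir"/vm_*_private_key.pem; do
    # With no match, bash leaves the glob pattern itself in $key; skip it.
    [[ -f $key ]] || continue
    chmod 600 "$key"
    echo "secured: $key"
    found=1
  done
  if ((found == 0)); then
    echo "no vm_*_private_key.pem files in $dir" >&2
    return 1
  fi
}
```

Example, from `scripts/terraform/`: `secure_keys`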

### Troubleshooting

- **SSH connectivity issues**: Wait a few minutes after deployment for VMs to fully initialize
- **Ansible playbook failures**: Check that all VMs are accessible and have the correct SSH keys
- **Terraform state issues**: Run `terraform apply -refresh-only` (the replacement for the deprecated `terraform refresh`) to update state if resources were modified outside Terraform
- **Kubernetes issues**: SSH into nodes and check service status with `systemctl status kubelet`

### What's Automated

The enhanced deployment process now automates:
- ✅ Azure resource provisioning (VMs, networking, storage)
- ✅ SSH key generation and management
- ✅ Ansible inventory generation from Terraform outputs
- ✅ Kubernetes cluster setup and configuration
- ✅ AIOpsLab installation and configuration
- ✅ Network connectivity validation
- ✅ Error handling and progress reporting
- ✅ Infrastructure cleanup/destruction

This reduces the manual steps from ~15-20 individual commands to a single deployment command!