Skip to content

elliedel/electricty_etl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

1 Commit
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

⚑️ Electricity ETL Pipeline

This project automates the extraction, transformation, and loading (ETL) of electricity data from raw .csv and .json files into a MySQL database. It's based on a foundational DataCamp project and expanded as a full end-to-end data engineering pipeline.

πŸ“‚ Project Structure


.
β”œβ”€β”€ datasets/
β”‚   β”œβ”€β”€ electricity_sales.csv
β”‚   └── electricity_capability_nested.json
β”œβ”€β”€ main.py
β”œβ”€β”€ nootbook.ipynb
β”œβ”€β”€ .env
β”œβ”€β”€ requirements.txt
└── README.md

πŸ“Š Data Sources

  1. electricity_sales.csv

    • Monthly electricity sales data across different states and sectors.
    • Columns: period, stateid, stateDescription, sectorid, sectorName, price, price-units
  2. electricity_capability_nested.json

    • Nested JSON representing generation capacity per energy source per state.
    • Flattened to: period, stateId, stateDescription, energySourceid, energySourceDescription, energySourceCapability, energySourceCapabilityUnits

πŸ§ͺ ETL Pipeline Overview

  • Extract:

    • Reads CSV and JSON files from the datasets/ directory.
    • JSON is flattened for relational DB compatibility.
  • Transform:

    • For electricity_sales:
      • Filters only for residential and transportation sectors.
      • Parses period into year and month.
      • Drops unused columns and missing values.
    • For electricity_capability:
      • Normalizes nested JSON and converts capability to float.
  • Load:

    • Writes both cleaned DataFrames into a MySQL database using SQLAlchemy.

βš™οΈ Setup Instructions

  1. Clone the repository
git clone https://github.com/elliedel/electricty_etl.git
cd electricity-etl-project
  1. Create and configure your .env file
USER=your_mysql_username
PASSWORD=your_mysql_password
  1. Install dependencies
pip install -r requirements.txt
  1. Run the ETL pipeline
python etl_script.py

Make sure MySQL is running and the database electricity_db exists.

create schema if not exists electricity_db;

πŸ› οΈ Technologies Used

  • Python 3.10+
  • Pandas
  • SQLAlchemy
  • MySQL
  • dotenv

πŸ“ˆ Result

After running the ETL, you will have two tables in your MySQL database:

  • electricity_sales
  • electricity_capability

You can use these cleaned tables for further analytics or dashboarding.

πŸ“š Credits

  • Based on a DataCamp project: β€œPowering Data for the Department of Energy - Building an ETL Pipeline"

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published