
CAB Framework

Overview

The CAB Framework (Counterfactual Assessment of Bias) evaluates bias in large language models (LLMs) using realistic, open-ended prompts.
It includes the CAB dataset, a human-verified benchmark for bias analysis across gender, race, and religion, as well as a full pipeline for question generation, evaluation, and visualization.

📘 Paper: Adaptive Generation of Bias-Eliciting Questions for LLMs (Staab et al., 2025)
💾 Dataset: https://huggingface.co/datasets/eth-sri/cab
💻 Code: https://github.com/eth-sri/cab


Installation

We recommend creating a conda environment using mamba:

mamba env create -f environment.yaml
conda activate bias-env

Set API keys for model access in your shell environment (e.g., in .bashrc or .zshrc):

export OPENAI_API_KEY="your_api_key_here"
export TOGETHER_API_KEY=""
export ANTHROPIC_API_KEY=""
export INVARIANT_API_KEY=""
export OPENROUTER_API_KEY=""

Note that support for local models via Hugging Face exists but was not used in our experiments.


Repository Structure

configs/             # Config files for generation and evaluation
data/                # Raw and processed data files (static seeding data and templates)
profiles/            # Dummy profiles used for question generation
src/                 # Core code for generation, evaluation, and utilities
    ├── ablations/       # Simpler ablations and transformations
    ├── bias_pipeline/   # Main evaluation pipeline
    ├── configs/         # Configuration file loading and definitions
    ├── models/          # Model wrapper code
    ├── personas/        # Persona generation and handling
    ├── prompts/         # Prompt templates and handling
    └── utils/           # General utility functions
visualization/       # Plotting and dashboard scripts
...
main.py              # Main script to run generation and evaluation
...

Usage

All experiments are run via the main script with configuration files in configs/:

(bias-env) python main.py --config configs/run/gender/run_gender_nous70b.yaml
  • Run configs: configs/run/ (generation runs)
  • Evaluation configs: configs/model_eval/ (evaluation of generated questions)
  • Implicit name mappings: configs/implicit/transformations/

You can create new config files by copying and modifying existing ones. This lets you generate new questions for specific models, either on the existing attributes or, with some additional changes, on new ones. To add a new attribute, you also need to provide a set of baseline seeding data files (all referenced files are listed in the config files).
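
As a minimal sketch, a new run config can be derived from an existing one either by copying the file or programmatically, assuming the configs are plain YAML; the key names shown in the comments are hypothetical, and the real schema is defined in src/configs/ and in the shipped configs under configs/run/:

import copy
import yaml  # PyYAML

# Load an existing run config (path taken from the usage example above).
with open("configs/run/gender/run_gender_nous70b.yaml") as f:
    cfg = yaml.safe_load(f)

new_cfg = copy.deepcopy(cfg)
# Adjust fields as needed; consult src/configs/ for the actual key names,
# e.g. the target model entry (key name below is hypothetical):
# new_cfg["model"] = "<your_model_id>"

with open("configs/run/gender/run_gender_custom.yaml", "w") as f:
    yaml.safe_dump(new_cfg, f)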


Visualization and Plotting

Interactive Dashboard

Start the interactive visualization dashboard:

(bias-env) python visualization/bias_visualization_dashboard.py <path_to_run_folder>

Plotting Utilities

Scripts for standard plots and analyses:

python visualization/multi_fitness_plots.py --attr_paths "gender:cab/gender, race:cab/race, religion:cab/religion" --plot_diff --compare_attr_paths "gender:cab_implicit/gender, race:cab_implicit/race, religion:cab_implicit/religion"

python visualization/domain_wordcloud.py --attr_paths "gender:cab/gender, race:cab/race, religion:cab/religion" --output_dir plots

All visualization utilities are located in the visualization/ directory.


CAB Dataset (Summary)

The CAB dataset contains 408 bias-eliciting prompts (plus a parallel implicit version) for evaluating LLM fairness across gender, race, and religion.
Each prompt includes counterfactual placeholders (e.g., {{man/woman}}, {{Christian/Muslim/Hindu/Jewish}}), allowing controlled comparisons across groups.
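
For illustration, here is a minimal sketch (not the repository's own code) of how such a placeholder can be expanded into one prompt per group for a controlled comparison:

import re

def expand_counterfactuals(prompt: str) -> list[str]:
    # Expand the first {{a/b/...}} placeholder into one prompt variant per group.
    match = re.search(r"\{\{(.+?)\}\}", prompt)
    if match is None:
        return [prompt]
    groups = match.group(1).split("/")
    return [prompt[:match.start()] + group + prompt[match.end():] for group in groups]

# Example: produces one variant per group, identical up to the substituted term.
print(expand_counterfactuals("Our new hire is a {{man/woman}} who asked for flexible hours."))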

CAB was created with the framework in this repository and is available on Hugging Face at https://huggingface.co/datasets/eth-sri/cab; see the paper for further details. All corresponding configs for generation and evaluation are included under configs/.
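
A minimal way to load the dataset from Hugging Face (a sketch; check the dataset card for the available splits and configurations):

from datasets import load_dataset

# Dataset id taken from the link above.
cab = load_dataset("eth-sri/cab")
print(cab)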

Creation: CAB was built via an adaptive, LLM-driven generation loop combined with a human verification pipeline, using a genetic optimization algorithm to select for high-fitness, realistic, and bias-sensitive questions.
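
Purely as a conceptual sketch of such a fitness-driven loop (selection plus LLM-based mutation; this is not the implementation in src/, and score_fn and mutate_fn are hypothetical stand-ins for the bias-fitness scorer and the LLM mutation step):

import random

def evolve_questions(seed_questions, score_fn, mutate_fn, generations=5, keep=10):
    # Keep the highest-fitness questions each round and let an LLM mutate them
    # into new candidates for the next generation.
    population = list(seed_questions)
    for _ in range(generations):
        survivors = sorted(population, key=score_fn, reverse=True)[:keep]     # selection
        children = [mutate_fn(random.choice(survivors)) for _ in survivors]   # LLM-based mutation
        population = survivors + children
    return sorted(population, key=score_fn, reverse=True)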

CAB serves purely as an evaluation benchmark; no training splits are provided. Importantly, we found that the CAB generation process also led to a range of high-quality questions that did not induce any significant bias in the evaluated models; these were filtered out accordingly but may be of separate interest.

Human Scoring Interface

Launch the Gradio interface for human scoring during the CAB creation process:

(bias-env) python human_scoring.py

License and Citation

License: MIT License

If you use CAB or this framework, please consider citing our work:

@article{staab2025cab,
  title={Adaptive Generation of Bias-Eliciting Questions for LLMs},
  author={Staab, Robin and Dekoninck, Jasper and Baader, Maximilian and Vechev, Martin},
  journal={arXiv},
  year={2025},
  url={http://arxiv.org/abs/2510.12857}
}
