
CAB Framework

Overview

The CAB Framework (Counterfactual Assessment of Bias) evaluates bias in large language models (LLMs) using realistic, open-ended prompts.
It includes the CAB dataset, a human-verified benchmark for bias analysis across gender, race, and religion, as well as a full pipeline for question generation, evaluation, and visualization.

📘 Paper: Adaptive Generation of Bias-Eliciting Questions for LLMs (Staab et al., 2025)
💾 Dataset: https://huggingface.co/datasets/eth-sri/cab
💻 Code: https://github.com/eth-sri/cab


Installation

We recommend creating a conda environment using mamba:

mamba env create -f environment.yaml
conda activate bias-env

Set API keys for model access in your shell environment (e.g., in .bashrc or .zshrc):

export OPENAI_API_KEY="your_api_key_here"
export TOGETHER_API_KEY=""
export ANTHROPIC_API_KEY=""
export INVARIANT_API_KEY=""
export OPENROUTER_API_KEY=""

Note that support for local models via Hugging Face exists but was not used in our experiments.


Repository Structure

configs/             # Config files for generation and evaluation
data/                # Raw and processed data files (static seeding data and templates)
profiles/            # Dummy profiles used for question generation
src/                 # Core code for generation, evaluation, and utilities
    ├── ablations/       # Simpler ablations and transformations
    ├── bias_pipeline/   # Main evaluation pipeline
    ├── configs/         # Configuration file loading and definitions
    ├── models/          # Model wrapper code
    ├── personas/        # Persona generation and handling
    ├── prompts/         # Prompt templates and handling
    └── utils/           # General utility functions
visualization/       # Plotting and dashboard scripts
...
main.py              # Main script to run generation and evaluation
...

Usage

All experiments are run via the main script with configuration files in configs/:

(bias-env) python main.py --config configs/run/gender/run_gender_nous70b.yaml
  • Run configs: configs/run/ (generation runs)
  • Evaluation configs: configs/model_eval/ (evaluation of generated questions)
  • Implicit name mappings: configs/implicit/transformations/

You can create new config files by copying and modifying existing ones. This lets you generate new questions for specific models, either on the existing attributes or, with some additional changes, on new ones. To add a new attribute, you also need to provide a set of baseline seeding data files (all referenced files are listed in the config files).
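
As a minimal sketch, a new run config can be derived from an existing one either by copying the file or programmatically, assuming the configs are plain YAML; the key names shown in the comments are hypothetical, and the real schema is defined in src/configs/ and in the shipped configs under configs/run/:

import copy
import yaml  # PyYAML

# Load an existing run config (path taken from the usage example above).
with open("configs/run/gender/run_gender_nous70b.yaml") as f:
    cfg = yaml.safe_load(f)

new_cfg = copy.deepcopy(cfg)
# Adjust fields as needed; consult src/configs/ for the actual key names,
# e.g. the target model entry (key name below is hypothetical):
# new_cfg["model"] = "<your_model_id>"

with open("configs/run/gender/run_gender_custom.yaml", "w") as f:
    yaml.safe_dump(new_cfg, f)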


Visualization and Plotting

Interactive Dashboard

Start the interactive visualization dashboard:

(bias-env) python visualization/bias_visualization_dashboard.py <path_to_run_folder>

Plotting Utilities

Scripts for standard plots and analyses:

python visualization/multi_fitness_plots.py --attr_paths "gender:cab/gender, race:cab/race, religion:cab/religion" --plot_diff --compare_attr_paths "gender:cab_implicit/gender, race:cab_implicit/race, religion:cab_implicit/religion"

python visualization/domain_wordcloud.py --attr_paths "gender:cab/gender, race:cab/race, religion:cab/religion" --output_dir plots

All visualization utilities are located in the visualization/ directory.


CAB Dataset (Summary)

The CAB dataset contains 408 bias-eliciting prompts (plus a parallel implicit version) for evaluating LLM fairness across gender, race, and religion.
Each prompt includes counterfactual placeholders (e.g., {{man/woman}}, {{Christian/Muslim/Hindu/Jewish}}), allowing controlled comparisons across groups.
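
For illustration, here is a minimal sketch (not the repository's own code) of how such a placeholder can be expanded into one prompt per group for a controlled comparison:

import re

def expand_counterfactuals(prompt: str) -> list[str]:
    # Expand the first {{a/b/...}} placeholder into one prompt variant per group.
    match = re.search(r"\{\{(.+?)\}\}", prompt)
    if match is None:
        return [prompt]
    groups = match.group(1).split("/")
    return [prompt[:match.start()] + group + prompt[match.end():] for group in groups]

# Example: produces one variant per group, identical up to the substituted term.
print(expand_counterfactuals("Our new hire is a {{man/woman}} who asked for flexible hours."))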

CAB was created with the framework in this repository and is available on Hugging Face at https://huggingface.co/datasets/eth-sri/cab; see the paper for further details. All corresponding configs for generation and evaluation are included under configs/.
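
A minimal way to load the dataset from Hugging Face (a sketch; check the dataset card for the available splits and configurations):

from datasets import load_dataset

# Dataset id taken from the link above.
cab = load_dataset("eth-sri/cab")
print(cab)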

Creation: CAB was built via an adaptive, LLM-driven generation loop combined with a human verification pipeline, using a genetic optimization algorithm to select for high-fitness, realistic, and bias-sensitive questions.
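
Purely as a conceptual sketch of such a fitness-driven loop (selection plus LLM-based mutation; this is not the implementation in src/, and score_fn and mutate_fn are hypothetical stand-ins for the bias-fitness scorer and the LLM mutation step):

import random

def evolve_questions(seed_questions, score_fn, mutate_fn, generations=5, keep=10):
    # Keep the highest-fitness questions each round and let an LLM mutate them
    # into new candidates for the next generation.
    population = list(seed_questions)
    for _ in range(generations):
        survivors = sorted(population, key=score_fn, reverse=True)[:keep]     # selection
        children = [mutate_fn(random.choice(survivors)) for _ in survivors]   # LLM-based mutation
        population = survivors + children
    return sorted(population, key=score_fn, reverse=True)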

CAB serves purely as an evaluation benchmark; no training splits are provided. Importantly, we found that the CAB generation process also led to a range of high-quality questions that did not induce any significant bias in the evaluated models; these were filtered out accordingly but may be of separate interest.

Human Scoring Interface

Launch the Gradio interface for human scoring during the CAB creation process:

(bias-env) python human_scoring.py

License and Citation

License: MIT License

If you use CAB or this framework, please consider citing our work:

@article{staab2025cab,
  title={Adaptive Generation of Bias-Eliciting Questions for LLMs},
  author={Staab, Robin and Dekoninck, Jasper and Baader, Maximilian and Vechev, Martin},
  journal={arXiv},
  year={2025},
  url={http://arxiv.org/abs/2510.12857}
}
