
Added ban list evaluation#62

Open
rkritika1508 wants to merge 4 commits into main from feat/ban_list_evals

Conversation


@rkritika1508 rkritika1508 commented Feb 23, 2026

Summary

Target issue is #63.

  • Updated the script run_all_evaluations.sh to run evaluations for all validators, including ban list.
  • Added a dataset for ban list validator. The dataset can be found here.
  • Added code to evaluate the performance of ban list validator on the curated dataset.

To run evaluations, use this command:
scripts/run_all_evaluations.sh BAN_LIST_WORDS="sonography,gender check"
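A minimal sketch of how the script can pick up such KEY=VALUE arguments and export them before the evaluators run (using declare -x, as the review later recommends); the args array here stands in for the script's real "$@" and is illustrative only:

```shell
#!/usr/bin/env bash
# Simulated CLI arguments, as passed to run_all_evaluations.sh
args=('BAN_LIST_WORDS=sonography,gender check')

for arg in "${args[@]}"; do
  if [[ "$arg" == *=* ]]; then
    # declare -x exports the string-form assignment; avoids ShellCheck SC2163
    declare -x "$arg"
  fi
done

# Evaluators invoked afterwards would read the variable from the environment
echo "$BAN_LIST_WORDS"
```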

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is covered by test cases.

Notes

Please add any other information the reviewer may require here.

Summary by CodeRabbit

  • New Features

    • Added a ban-list evaluator with configurable banned-word checks, per-sample predictions, and aggregated metrics.
    • Runner now accepts environment-variable assignments when invoking evaluations.
  • Documentation

    • Updated evaluation guidance to document ban-list configuration, required environment variable, invocation examples, and output expectations.
    • Reworded test descriptions to clarify curated datasets and benchmark scope.
  • Refactor

    • Unified metrics and performance reporting across evaluators for consistent latency, memory, and summary payloads.

@rkritika1508 rkritika1508 marked this pull request as ready for review February 23, 2026 09:05

coderabbitai bot commented Feb 23, 2026

📝 Walkthrough

Walkthrough

A new ban-list evaluator and reporting helpers were added. Documentation and the evaluation orchestration script were updated to require and propagate BAN_LIST_WORDS. Existing evaluators were refactored to use a shared evaluation-report builder and performance summarization utilities.

Changes

  • Documentation (backend/README.md): documented BAN_LIST_WORDS usage, added the ban_list output path, and standardized the output structure.
  • Ban-list evaluator (backend/app/evaluation/ban_list/run.py): new evaluator that reads BAN_LIST_WORDS, loads the dataset, exposes run_ban_list(text) -> (redacted_text, fail_flag), profiles execution, computes binary/exact-match metrics, and writes outputs/ban_list/predictions.csv and metrics.json.
  • Evaluation helpers (backend/app/evaluation/common/helper.py): added summarize_latency, build_performance_payload, and build_evaluation_report to centralize latency/memory summarization and standardize metrics JSON payloads.
  • Existing evaluators (backend/app/evaluation/gender_assumption_bias/run.py, backend/app/evaluation/lexical_slur/run.py, backend/app/evaluation/pii/run.py): replaced inline metrics dicts with build_evaluation_report(...) calls and adjusted imports to use the new helper and Profiler wiring.
  • Orchestration script (backend/scripts/run_all_evaluations.sh): exports CLI env assignments of the form KEY=VALUE before running evaluators, and adds ban_list to the RUNNERS sequence.
  • Manifests / deps (manifest_file, requirements.txt, pyproject.toml): manifest and dependency records updated to account for the added code (no public API signature changes other than the new helper exports and run_ban_list).
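As a rough illustration of the run_ban_list(text) -> (redacted_text, fail_flag) contract described above, here is a self-contained stand-in that uses a plain regex instead of the guardrails BanList validator; the banned words are the ones from the example command, and the [REDACTED] placeholder is an assumption for the sketch:

```python
import re

# Illustrative stand-in: the real evaluator wraps the BanList validator;
# this sketch just redacts banned words so it runs without guardrails.
BANNED_WORDS = ["sonography", "gender check"]

def run_ban_list(text: str) -> tuple[str, int]:
    """Return (redacted_text, fail_flag), mirroring the evaluator's contract."""
    pattern = re.compile(
        "|".join(re.escape(word) for word in BANNED_WORDS), re.IGNORECASE
    )
    redacted, n_hits = pattern.subn("[REDACTED]", text)
    # fail_flag is 1 when any banned word was found, else 0
    return redacted, int(n_hits > 0)

print(run_ban_list("Clinic offers sonography services."))
# → ('Clinic offers [REDACTED] services.', 1)
```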

Sequence Diagram(s)

sequenceDiagram
    participant Runner as "run_all_evaluations.sh"
    participant Env as "Environment\n(BAN_LIST_WORDS)"
    participant Evaluator as "ban_list/run.py\n(BanList Validator)"
    participant Dataset as "Dataset\n(files)"
    participant Profiler as "Profiler"
    participant Outputs as "Outputs\n(predictions.csv, metrics.json)"

    Runner->>Env: export KEY=VALUE args
    Runner->>Evaluator: invoke ban_list runner
    Evaluator->>Env: read BAN_LIST_WORDS
    Evaluator->>Dataset: load dataset (source/target/labels)
    Evaluator->>Profiler: start profiling
    Evaluator->>Evaluator: run_ban_list(text) per sample -> (redacted_text, fail_flag)
    Evaluator->>Profiler: stop profiling
    Evaluator->>Outputs: write predictions.csv
    Evaluator->>Outputs: build_evaluation_report(...) -> write metrics.json

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

ready-for-review

Suggested reviewers

  • nishika26

Poem

🐰 I hopped through code with carrot cheer,
Ban words gathered, now crystal clear.
Metrics counted, profiles hum,
CSVs tidy — outputs come.
Hooray — the checks now bloom with vim! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title 'Added ban list evaluation' directly and clearly describes the main change in the changeset, the addition of ban list evaluation functionality and supporting infrastructure.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@rkritika1508 rkritika1508 self-assigned this Feb 23, 2026
@rkritika1508 rkritika1508 linked an issue Feb 23, 2026 that may be closed by this pull request

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
backend/app/evaluation/ban_list/run.py (1)

28-28: Consider wrapping module-level execution in if __name__ == "__main__".

pd.read_csv(DATASET_PATH), the BanList instantiation, and all evaluation logic execute unconditionally at import time. Wrapping them in an entry-point guard would prevent accidental side effects if run_ban_list is ever imported by another module (e.g., a test), and also makes the FileNotFoundError easier to trace.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/evaluation/ban_list/run.py` at line 28, The module currently
executes pd.read_csv(DATASET_PATH) and the BanList instantiation/evaluation at
import time; wrap that module-level work in an entry-point guard by moving the
dataset load, the BanList(...) instantiation and the evaluation/run logic into a
protected block (either a small main() function invoked under if __name__ ==
"__main__": or directly under that guard) so importing this module (e.g., for
tests) does not trigger pd.read_csv or run the evaluation automatically.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/evaluation/ban_list/run.py`:
- Around line 48-53: The code unconditionally accesses df["target_text"] when
"label" is absent which raises KeyError if target_text is missing; update the
conditional in run.py to explicitly handle three cases: if "label" in df.columns
set df["y_true"] from df["label"]; elif both "source_text" and "target_text" in
df.columns compute df["y_true"] = (df["source_text"].astype(str) !=
df["target_text"].astype(str)).astype(int); else (e.g., only "source_text"
present) set df["y_true"] to a sensible default (such as 0 or NaN) or raise a
clear ValueError explaining missing "target_text"/"label" so callers know the
dataset is invalid. Ensure you reference the df, "y_true", "label",
"source_text", and "target_text" symbols when making the change.
- Around line 65-77: The latency calculations in the write_json payload use
p.latencies directly and will raise ZeroDivisionError/ValueError/IndexError when
p.latencies is empty and also compute p95 with an off-by-one; fix by guarding on
n = len(p.latencies): if n == 0 set latency stats to None (or 0 per project
convention) and otherwise compute mean = sum(p.latencies)/n, compute
sorted_latencies = sorted(p.latencies), compute p95_index = max(0,
math.ceil(0.95 * n) - 1) (import math) and use sorted_latencies[p95_index], and
compute max = max(p.latencies); update the block that builds the "latency_ms"
dict inside the write_json call that references p.latencies to use this guarded
logic.
- Around line 30-38: Instantiate BanList with the explicit
on_fail=OnFailAction.FIX (update the validator creation to pass OnFailAction.FIX
and add an import for OnFailAction if missing) so intent matches
ban_list_safety_validator_config.py/PIIRemover; in run_ban_list (the function
using validator.validate) keep the existing branch that returns
(result.fix_value, 1) when isinstance(result, FailResult) and add a defensive
else branch that returns the original text (or result.value) with a 0 flag to
handle non-FailResult cases safely.

In `@backend/scripts/run_all_evaluations.sh`:
- Around line 10-14: The loop in run_all_evaluations.sh that iterates over
arguments and attempts to export assignments uses export "$arg", which triggers
ShellCheck SC2163; replace export "$arg" with declare -x "$arg" inside the
for-loop so the string-form assignment is exported in the portable/idiomatic way
(locate the for arg in "$@" loop and change the export call to declare -x).

---

Nitpick comments:
In `@backend/app/evaluation/ban_list/run.py`:
- Line 28: The module currently executes pd.read_csv(DATASET_PATH) and the
BanList instantiation/evaluation at import time; wrap that module-level work in
an entry-point guard by moving the dataset load, the BanList(...) instantiation
and the evaluation/run logic into a protected block (either a small main()
function invoked under if __name__ == "__main__": or directly under that guard)
so importing this module (e.g., for tests) does not trigger pd.read_csv or run
the evaluation automatically.
ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f7bd27 and a9e2db7.

📒 Files selected for processing (3)
  • backend/README.md
  • backend/app/evaluation/ban_list/run.py
  • backend/scripts/run_all_evaluations.sh


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
backend/scripts/run_all_evaluations.sh (1)

10-12: ⚠️ Potential issue | 🟡 Minor

Replace export "$arg" with declare -x "$arg" to avoid SC2163 and keep exports idiomatic.

This is the same ShellCheck warning noted earlier; the current form still doesn’t export the variable name reliably.

🔧 Proposed fix
 for arg in "$@"; do
   if [[ "$arg" == *=* ]]; then
-    export "$arg"
+    declare -x "$arg"
   fi
 done

To verify, re-run ShellCheck on this script after the change.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/scripts/run_all_evaluations.sh` around lines 10 - 12, The loop that
exports key=value args (for arg in "$@"; do ... export "$arg") should use
declare -x "$arg" instead of export "$arg" to avoid ShellCheck SC2163 and ensure
the variable name is exported reliably; update the token inside the loop (the
export call) to declare -x "$arg" in run_all_evaluations.sh and then re-run
ShellCheck to verify the warning is resolved.
🧹 Nitpick comments (1)
backend/app/evaluation/common/helper.py (1)

41-51: Prevent extra_fields from overriding reserved report keys.

Right now extra_fields can clobber guardrail or num_samples (and could attempt to set performance). Consider guarding against reserved keys to avoid silent overrides.

💡 Suggested guard
 def build_evaluation_report(
     guardrail: str,
     num_samples: int,
     profiler: "Profiler",
     **extra_fields: Any,
 ) -> dict[str, Any]:
+    reserved = {"guardrail", "num_samples", "performance"}
+    overlap = reserved.intersection(extra_fields)
+    if overlap:
+        raise ValueError(f"extra_fields includes reserved keys: {', '.join(sorted(overlap))}")
     return {
         "guardrail": guardrail,
         "num_samples": num_samples,
         **extra_fields,
         "performance": build_performance_payload(profiler),
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/evaluation/common/helper.py` around lines 41 - 51, In
build_evaluation_report, prevent extra_fields from overriding reserved keys
("guardrail", "num_samples", "performance") by checking extra_fields.keys() for
any intersection with that reserved set and failing fast or removing them;
update the function (build_evaluation_report) to compute reserved =
{"guardrail","num_samples","performance"}, compute conflicts = reserved ∩
extra_fields.keys(), and if conflicts is non-empty either raise a ValueError
listing the conflicting keys or remove those keys from extra_fields and log/warn
before building the return dict so the reserved values from the function cannot
be clobbered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@backend/scripts/run_all_evaluations.sh`:
- Around line 10-12: The loop that exports key=value args (for arg in "$@"; do
... export "$arg") should use declare -x "$arg" instead of export "$arg" to
avoid ShellCheck SC2163 and ensure the variable name is exported reliably;
update the token inside the loop (the export call) to declare -x "$arg" in
run_all_evaluations.sh and then re-run ShellCheck to verify the warning is
resolved.

---

Nitpick comments:
In `@backend/app/evaluation/common/helper.py`:
- Around line 41-51: In build_evaluation_report, prevent extra_fields from
overriding reserved keys ("guardrail", "num_samples", "performance") by checking
extra_fields.keys() for any intersection with that reserved set and failing fast
or removing them; update the function (build_evaluation_report) to compute
reserved = {"guardrail","num_samples","performance"}, compute conflicts =
reserved ∩ extra_fields.keys(), and if conflicts is non-empty either raise a
ValueError listing the conflicting keys or remove those keys from extra_fields
and log/warn before building the return dict so the reserved values from the
function cannot be clobbered.


📥 Commits

Reviewing files that changed from the base of the PR and between a9e2db7 and 946c705.

📒 Files selected for processing (6)
  • backend/app/evaluation/ban_list/run.py
  • backend/app/evaluation/common/helper.py
  • backend/app/evaluation/gender_assumption_bias/run.py
  • backend/app/evaluation/lexical_slur/run.py
  • backend/app/evaluation/pii/run.py
  • backend/scripts/run_all_evaluations.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/evaluation/ban_list/run.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
backend/app/evaluation/ban_list/run.py (3)

32-34: Pass on_fail=OnFailAction.FIX explicitly to BanList for codebase consistency.

This validator is instantiated without an on_fail argument, while every other location in the codebase that constructs a BanList (e.g., ban_list_safety_validator_config.py) and the PIIRemover validator both pass on_fail explicitly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/evaluation/ban_list/run.py` around lines 32 - 34, The BanList
validator is created without an explicit on_fail which is inconsistent with
other usages; update the constructor call for BanList (the variable validator
created with BANNED_WORDS) to pass on_fail=OnFailAction.FIX, and if OnFailAction
is not already imported where BanList is used, add the import for OnFailAction
so the symbol resolves.

52-57: ⚠️ Potential issue | 🟠 Major

KeyError when neither "label" nor "target_text" column is present.

The else branch at line 54 unconditionally accesses dataset["target_text"] without verifying the column exists — identical to the unresolved issue flagged in a prior review.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/evaluation/ban_list/run.py` around lines 52 - 57, The code
unconditionally accesses dataset["target_text"] in the else branch, causing a
KeyError when neither "label" nor "target_text" exists; update the logic that
assigns dataset["y_true"] to first check for the presence of "target_text" and
"source_text" in dataset.columns (or use dataset.get) and only compute the
difference when both exist, otherwise raise a clear ValueError indicating the
missing columns; reference the existing dataset variable and the assignment to
"y_true" so the change touches the same block that currently checks "label" and
falls back to comparing "source_text" vs "target_text".

39-41: ⚠️ Potential issue | 🟠 Major

or-based fallback silently returns unredacted text when fix_value is "".

When the entire text consists of banned terms, BanList._validate sets fix_value = "" (completely redacted). fix_value is Optional[Any] (the "auto-fix value that would be applied if the Validator's on_fail method is 'fix'"), so an empty string is a valid and meaningful result. Using or treats "" as falsy and substitutes the original text, returning unredacted content and corrupting both redacted_text and exact_match metrics.

🐛 Proposed fix
-        return (result.fix_value or text), 1
+        return (text if result.fix_value is None else result.fix_value), 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/evaluation/ban_list/run.py` around lines 39 - 41, The code uses a
falsy check ("result.fix_value or text") that treats an intentional empty-string
fix_value as absent; change the fallback logic in the return so that you only
substitute text when fix_value is None (not when it's empty string). Update the
return that references FailResult/result to use an explicit None check (e.g.,
use result.fix_value if result.fix_value is not None else text) so empty-string
fixes are preserved; adjust the logic in the function where FailResult is
handled (the return in run.py that currently uses "or") to implement this
explicit None-aware fallback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/evaluation/ban_list/run.py`:
- Line 28: The code builds BANNED_WORDS from BAN_LIST_WORDS_RAW but doesn't
guard against resulting empty list after filtering (e.g.,
BAN_LIST_WORDS_RAW=","), which can silently initialize BanList with no entries;
update the initialization logic that creates BANNED_WORDS to validate the
filtered result and raise/exit with a clear error (or fallback) when
BANNED_WORDS is empty—check the variable BAN_LIST_WORDS_RAW, compute
BANNED_WORDS as now, then if not BANNED_WORDS: log an explicit error and stop or
provide a sensible default before constructing the BanList to ensure BanList is
never created with an empty banned-words list.

---

Duplicate comments:
In `@backend/app/evaluation/ban_list/run.py`:
- Around line 32-34: The BanList validator is created without an explicit
on_fail which is inconsistent with other usages; update the constructor call for
BanList (the variable validator created with BANNED_WORDS) to pass
on_fail=OnFailAction.FIX, and if OnFailAction is not already imported where
BanList is used, add the import for OnFailAction so the symbol resolves.
- Around line 52-57: The code unconditionally accesses dataset["target_text"] in
the else branch, causing a KeyError when neither "label" nor "target_text"
exists; update the logic that assigns dataset["y_true"] to first check for the
presence of "target_text" and "source_text" in dataset.columns (or use
dataset.get) and only compute the difference when both exist, otherwise raise a
clear ValueError indicating the missing columns; reference the existing dataset
variable and the assignment to "y_true" so the change touches the same block
that currently checks "label" and falls back to comparing "source_text" vs
"target_text".
- Around line 39-41: The code uses a falsy check ("result.fix_value or text")
that treats an intentional empty-string fix_value as absent; change the fallback
logic in the return so that you only substitute text when fix_value is None (not
when it's empty string). Update the return that references FailResult/result to
use an explicit None check (e.g., use result.fix_value if result.fix_value is
not None else text) so empty-string fixes are preserved; adjust the logic in the
function where FailResult is handled (the return in run.py that currently uses
"or") to implement this explicit None-aware fallback.
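The three-case label derivation described above can be sketched as follows; derive_y_true is a hypothetical helper name for illustration, and pandas is assumed since the evaluator already loads its dataset with pd.read_csv:

```python
import pandas as pd

def derive_y_true(df: pd.DataFrame) -> pd.Series:
    """Three-case label derivation, per the review suggestion (sketch)."""
    if "label" in df.columns:
        # Case 1: an explicit label column wins
        return df["label"].astype(int)
    if {"source_text", "target_text"}.issubset(df.columns):
        # Case 2: a sample "fails" when redaction changed the text
        return (
            df["source_text"].astype(str) != df["target_text"].astype(str)
        ).astype(int)
    # Case 3: neither label nor a source/target pair: fail loudly
    raise ValueError(
        "dataset needs a 'label' column or both 'source_text' and 'target_text'"
    )

df = pd.DataFrame(
    {"source_text": ["a b", "c"], "target_text": ["a [REDACTED]", "c"]}
)
print(derive_y_true(df).tolist())  # → [1, 0]
```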


📥 Commits

Reviewing files that changed from the base of the PR and between 946c705 and 2d3314c.

📒 Files selected for processing (2)
  • backend/README.md
  • backend/app/evaluation/ban_list/run.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/README.md

"BAN_LIST_WORDS must be set for ban_list evaluation (comma-separated)."
)

BANNED_WORDS = [word.strip() for word in BAN_LIST_WORDS_RAW.split(",") if word.strip()]


⚠️ Potential issue | 🟡 Minor

Guard against an empty BANNED_WORDS list after filtering.

if not BAN_LIST_WORDS_RAW only rejects an unset/empty variable. A value like BAN_LIST_WORDS="," passes that guard but produces BANNED_WORDS = [], silently initializing BanList with an empty banned-words list and rendering the entire evaluation meaningless.

🛡️ Proposed fix
 BANNED_WORDS = [word.strip() for word in BAN_LIST_WORDS_RAW.split(",") if word.strip()]
+if not BANNED_WORDS:
+    raise ValueError(
+        "BAN_LIST_WORDS contained no valid words after stripping whitespace and commas."
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/evaluation/ban_list/run.py` at line 28, The code builds
BANNED_WORDS from BAN_LIST_WORDS_RAW but doesn't guard against resulting empty
list after filtering (e.g., BAN_LIST_WORDS_RAW=","), which can silently
initialize BanList with no entries; update the initialization logic that creates
BANNED_WORDS to validate the filtered result and raise/exit with a clear error
(or fallback) when BANNED_WORDS is empty—check the variable BAN_LIST_WORDS_RAW,
compute BANNED_WORDS as now, then if not BANNED_WORDS: log an explicit error and
stop or provide a sensible default before constructing the BanList to ensure
BanList is never created with an empty banned-words list.



Development

Successfully merging this pull request may close these issues.

Ban list evaluation

3 participants