
Added ban list evaluation#62

Open
rkritika1508 wants to merge 4 commits into main from feat/ban_list_evals

Conversation


@rkritika1508 rkritika1508 commented Feb 23, 2026

Summary

Target issue is #63.

  • Updated the script run_all_evaluations.sh to run evaluations for all validators, including ban list.
  • Added a dataset for ban list validator. The dataset can be found here.
  • Added code to evaluate the performance of ban list validator on the curated dataset.

To run evaluations, use this command:
scripts/run_all_evaluations.sh BAN_LIST_WORDS="sonography,gender check"
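A minimal sketch of how the script can pick up such KEY=VALUE arguments and export them before the evaluators run (using declare -x, as the review later recommends); the args array here stands in for the script's real "$@" and is illustrative only:

```shell
#!/usr/bin/env bash
# Simulated CLI arguments, as passed to run_all_evaluations.sh
args=('BAN_LIST_WORDS=sonography,gender check')

for arg in "${args[@]}"; do
  if [[ "$arg" == *=* ]]; then
    # declare -x exports the string-form assignment; avoids ShellCheck SC2163
    declare -x "$arg"
  fi
done

# Evaluators invoked afterwards would read the variable from the environment
echo "$BAN_LIST_WORDS"
```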

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is covered by test cases.

Notes

Please add any other information the reviewer may require here.

Summary by CodeRabbit

  • New Features

    • Added a ban-list evaluator with configurable banned-word checks, per-sample predictions, and aggregated metrics.
    • Runner now accepts environment-variable assignments when invoking evaluations.
  • Documentation

    • Updated evaluation guidance to document ban-list configuration, required environment variable, invocation examples, and output expectations.
    • Reworded test descriptions to clarify curated datasets and benchmark scope.
  • Refactor

    • Unified metrics and performance reporting across evaluators for consistent latency, memory, and summary payloads.

@rkritika1508 rkritika1508 marked this pull request as ready for review February 23, 2026 09:05

coderabbitai bot commented Feb 23, 2026

📝 Walkthrough

Walkthrough

A new ban-list evaluator and reporting helpers were added. Documentation and the evaluation orchestration script were updated to require and propagate BAN_LIST_WORDS. Existing evaluators were refactored to use a shared evaluation-report builder and performance summarization utilities.

Changes

  • Documentation (backend/README.md): documented BAN_LIST_WORDS usage, added the ban_list output path, and standardized the output structure.
  • Ban-list evaluator (backend/app/evaluation/ban_list/run.py): new evaluator that reads BAN_LIST_WORDS, loads the dataset, exposes run_ban_list(text) -> (redacted_text, fail_flag), profiles execution, computes binary/exact-match metrics, and writes outputs/ban_list/predictions.csv and metrics.json.
  • Evaluation helpers (backend/app/evaluation/common/helper.py): added summarize_latency, build_performance_payload, and build_evaluation_report to centralize latency/memory summarization and standardize metrics JSON payloads.
  • Existing evaluators (backend/app/evaluation/gender_assumption_bias/run.py, backend/app/evaluation/lexical_slur/run.py, backend/app/evaluation/pii/run.py): replaced inline metrics dicts with build_evaluation_report(...) calls and adjusted imports to use the new helper and Profiler wiring.
  • Orchestration script (backend/scripts/run_all_evaluations.sh): exports CLI env assignments of the form KEY=VALUE before running evaluators, and adds ban_list to the RUNNERS sequence.
  • Manifests / deps (manifest_file, requirements.txt, pyproject.toml): manifest and dependency records updated to account for the added code (no public API signature changes other than the new helper exports and run_ban_list).
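As a rough illustration of the run_ban_list(text) -> (redacted_text, fail_flag) contract described above, here is a self-contained stand-in that uses a plain regex instead of the guardrails BanList validator; the banned words are the ones from the example command, and the [REDACTED] placeholder is an assumption for the sketch:

```python
import re

# Illustrative stand-in: the real evaluator wraps the BanList validator;
# this sketch just redacts banned words so it runs without guardrails.
BANNED_WORDS = ["sonography", "gender check"]

def run_ban_list(text: str) -> tuple[str, int]:
    """Return (redacted_text, fail_flag), mirroring the evaluator's contract."""
    pattern = re.compile(
        "|".join(re.escape(word) for word in BANNED_WORDS), re.IGNORECASE
    )
    redacted, n_hits = pattern.subn("[REDACTED]", text)
    # fail_flag is 1 when any banned word was found, else 0
    return redacted, int(n_hits > 0)

print(run_ban_list("Clinic offers sonography services."))
# → ('Clinic offers [REDACTED] services.', 1)
```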

Sequence Diagram(s)

sequenceDiagram
    participant Runner as "run_all_evaluations.sh"
    participant Env as "Environment\n(BAN_LIST_WORDS)"
    participant Evaluator as "ban_list/run.py\n(BanList Validator)"
    participant Dataset as "Dataset\n(files)"
    participant Profiler as "Profiler"
    participant Outputs as "Outputs\n(predictions.csv, metrics.json)"

    Runner->>Env: export KEY=VALUE args
    Runner->>Evaluator: invoke ban_list runner
    Evaluator->>Env: read BAN_LIST_WORDS
    Evaluator->>Dataset: load dataset (source/target/labels)
    Evaluator->>Profiler: start profiling
    Evaluator->>Evaluator: run_ban_list(text) per sample -> (redacted_text, fail_flag)
    Evaluator->>Profiler: stop profiling
    Evaluator->>Outputs: write predictions.csv
    Evaluator->>Outputs: build_evaluation_report(...) -> write metrics.json

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

ready-for-review

Suggested reviewers

  • nishika26

Poem

🐰 I hopped through code with carrot cheer,
Ban words gathered, now crystal clear.
Metrics counted, profiles hum,
CSVs tidy — outputs come.
Hooray — the checks now bloom with vim! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title 'Added ban list evaluation' directly and clearly describes the main change in the changeset, the addition of ban list evaluation functionality and supporting infrastructure.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@rkritika1508 rkritika1508 self-assigned this Feb 23, 2026
@rkritika1508 rkritika1508 linked an issue Feb 23, 2026 that may be closed by this pull request

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
backend/app/evaluation/ban_list/run.py (1)

28-28: Consider wrapping module-level execution in if __name__ == "__main__".

pd.read_csv(DATASET_PATH), the BanList instantiation, and all evaluation logic execute unconditionally at import time. Wrapping them in an entry-point guard would prevent accidental side effects if run_ban_list is ever imported by another module (e.g., a test), and also makes the FileNotFoundError easier to trace.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/evaluation/ban_list/run.py` at line 28, The module currently
executes pd.read_csv(DATASET_PATH) and the BanList instantiation/evaluation at
import time; wrap that module-level work in an entry-point guard by moving the
dataset load, the BanList(...) instantiation and the evaluation/run logic into a
protected block (either a small main() function invoked under if __name__ ==
"__main__": or directly under that guard) so importing this module (e.g., for
tests) does not trigger pd.read_csv or run the evaluation automatically.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/evaluation/ban_list/run.py`:
- Around line 48-53: The code unconditionally accesses df["target_text"] when
"label" is absent which raises KeyError if target_text is missing; update the
conditional in run.py to explicitly handle three cases: if "label" in df.columns
set df["y_true"] from df["label"]; elif both "source_text" and "target_text" in
df.columns compute df["y_true"] = (df["source_text"].astype(str) !=
df["target_text"].astype(str)).astype(int); else (e.g., only "source_text"
present) set df["y_true"] to a sensible default (such as 0 or NaN) or raise a
clear ValueError explaining missing "target_text"/"label" so callers know the
dataset is invalid. Ensure you reference the df, "y_true", "label",
"source_text", and "target_text" symbols when making the change.
- Around line 65-77: The latency calculations in the write_json payload use
p.latencies directly and will raise ZeroDivisionError/ValueError/IndexError when
p.latencies is empty and also compute p95 with an off-by-one; fix by guarding on
n = len(p.latencies): if n == 0 set latency stats to None (or 0 per project
convention) and otherwise compute mean = sum(p.latencies)/n, compute
sorted_latencies = sorted(p.latencies), compute p95_index = max(0,
math.ceil(0.95 * n) - 1) (import math) and use sorted_latencies[p95_index], and
compute max = max(p.latencies); update the block that builds the "latency_ms"
dict inside the write_json call that references p.latencies to use this guarded
logic.
- Around line 30-38: Instantiate BanList with the explicit
on_fail=OnFailAction.FIX (update the validator creation to pass OnFailAction.FIX
and add an import for OnFailAction if missing) so intent matches
ban_list_safety_validator_config.py/PIIRemover; in run_ban_list (the function
using validator.validate) keep the existing branch that returns
(result.fix_value, 1) when isinstance(result, FailResult) and add a defensive
else branch that returns the original text (or result.value) with a 0 flag to
handle non-FailResult cases safely.

In `@backend/scripts/run_all_evaluations.sh`:
- Around line 10-14: The loop in run_all_evaluations.sh that iterates over
arguments and attempts to export assignments uses export "$arg", which triggers
ShellCheck SC2163; replace export "$arg" with declare -x "$arg" inside the
for-loop so the string-form assignment is exported in the portable/idiomatic way
(locate the for arg in "$@" loop and change the export call to declare -x).

---

Nitpick comments:
In `@backend/app/evaluation/ban_list/run.py`:
- Line 28: The module currently executes pd.read_csv(DATASET_PATH) and the
BanList instantiation/evaluation at import time; wrap that module-level work in
an entry-point guard by moving the dataset load, the BanList(...) instantiation
and the evaluation/run logic into a protected block (either a small main()
function invoked under if __name__ == "__main__": or directly under that guard)
so importing this module (e.g., for tests) does not trigger pd.read_csv or run
the evaluation automatically.
ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f7bd27 and a9e2db7.

📒 Files selected for processing (3)
  • backend/README.md
  • backend/app/evaluation/ban_list/run.py
  • backend/scripts/run_all_evaluations.sh


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
backend/scripts/run_all_evaluations.sh (1)

10-12: ⚠️ Potential issue | 🟡 Minor

Replace export "$arg" with declare -x "$arg" to avoid SC2163 and keep exports idiomatic.

This is the same ShellCheck warning noted earlier; the current form still doesn’t export the variable name reliably.

🔧 Proposed fix
 for arg in "$@"; do
   if [[ "$arg" == *=* ]]; then
-    export "$arg"
+    declare -x "$arg"
   fi
 done

To verify, re-run ShellCheck on this script after the change.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/scripts/run_all_evaluations.sh` around lines 10 - 12, The loop that
exports key=value args (for arg in "$@"; do ... export "$arg") should use
declare -x "$arg" instead of export "$arg" to avoid ShellCheck SC2163 and ensure
the variable name is exported reliably; update the token inside the loop (the
export call) to declare -x "$arg" in run_all_evaluations.sh and then re-run
ShellCheck to verify the warning is resolved.
🧹 Nitpick comments (1)
backend/app/evaluation/common/helper.py (1)

41-51: Prevent extra_fields from overriding reserved report keys.

Right now extra_fields can clobber guardrail or num_samples (and could attempt to set performance). Consider guarding against reserved keys to avoid silent overrides.

💡 Suggested guard
 def build_evaluation_report(
     guardrail: str,
     num_samples: int,
     profiler: "Profiler",
     **extra_fields: Any,
 ) -> dict[str, Any]:
+    reserved = {"guardrail", "num_samples", "performance"}
+    overlap = reserved.intersection(extra_fields)
+    if overlap:
+        raise ValueError(f"extra_fields includes reserved keys: {', '.join(sorted(overlap))}")
     return {
         "guardrail": guardrail,
         "num_samples": num_samples,
         **extra_fields,
         "performance": build_performance_payload(profiler),
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/evaluation/common/helper.py` around lines 41 - 51, In
build_evaluation_report, prevent extra_fields from overriding reserved keys
("guardrail", "num_samples", "performance") by checking extra_fields.keys() for
any intersection with that reserved set and failing fast or removing them;
update the function (build_evaluation_report) to compute reserved =
{"guardrail","num_samples","performance"}, compute conflicts = reserved ∩
extra_fields.keys(), and if conflicts is non-empty either raise a ValueError
listing the conflicting keys or remove those keys from extra_fields and log/warn
before building the return dict so the reserved values from the function cannot
be clobbered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@backend/scripts/run_all_evaluations.sh`:
- Around line 10-12: The loop that exports key=value args (for arg in "$@"; do
... export "$arg") should use declare -x "$arg" instead of export "$arg" to
avoid ShellCheck SC2163 and ensure the variable name is exported reliably;
update the token inside the loop (the export call) to declare -x "$arg" in
run_all_evaluations.sh and then re-run ShellCheck to verify the warning is
resolved.

---

Nitpick comments:
In `@backend/app/evaluation/common/helper.py`:
- Around line 41-51: In build_evaluation_report, prevent extra_fields from
overriding reserved keys ("guardrail", "num_samples", "performance") by checking
extra_fields.keys() for any intersection with that reserved set and failing fast
or removing them; update the function (build_evaluation_report) to compute
reserved = {"guardrail","num_samples","performance"}, compute conflicts =
reserved ∩ extra_fields.keys(), and if conflicts is non-empty either raise a
ValueError listing the conflicting keys or remove those keys from extra_fields
and log/warn before building the return dict so the reserved values from the
function cannot be clobbered.


📥 Commits

Reviewing files that changed from the base of the PR and between a9e2db7 and 946c705.

📒 Files selected for processing (6)
  • backend/app/evaluation/ban_list/run.py
  • backend/app/evaluation/common/helper.py
  • backend/app/evaluation/gender_assumption_bias/run.py
  • backend/app/evaluation/lexical_slur/run.py
  • backend/app/evaluation/pii/run.py
  • backend/scripts/run_all_evaluations.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/evaluation/ban_list/run.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
backend/app/evaluation/ban_list/run.py (3)

32-34: Pass on_fail=OnFailAction.FIX explicitly to BanList for codebase consistency.

This validator is instantiated without an on_fail argument, while every other location in the codebase that constructs a BanList (e.g., ban_list_safety_validator_config.py) and the PIIRemover validator both pass on_fail explicitly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/evaluation/ban_list/run.py` around lines 32 - 34, The BanList
validator is created without an explicit on_fail which is inconsistent with
other usages; update the constructor call for BanList (the variable validator
created with BANNED_WORDS) to pass on_fail=OnFailAction.FIX, and if OnFailAction
is not already imported where BanList is used, add the import for OnFailAction
so the symbol resolves.

52-57: ⚠️ Potential issue | 🟠 Major

KeyError when neither "label" nor "target_text" column is present.

The else branch at line 54 unconditionally accesses dataset["target_text"] without verifying the column exists — identical to the unresolved issue flagged in a prior review.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/evaluation/ban_list/run.py` around lines 52 - 57, The code
unconditionally accesses dataset["target_text"] in the else branch, causing a
KeyError when neither "label" nor "target_text" exists; update the logic that
assigns dataset["y_true"] to first check for the presence of "target_text" and
"source_text" in dataset.columns (or use dataset.get) and only compute the
difference when both exist, otherwise raise a clear ValueError indicating the
missing columns; reference the existing dataset variable and the assignment to
"y_true" so the change touches the same block that currently checks "label" and
falls back to comparing "source_text" vs "target_text".

39-41: ⚠️ Potential issue | 🟠 Major

or-based fallback silently returns unredacted text when fix_value is "".

When the entire text consists of banned terms, BanList._validate sets fix_value = "" (completely redacted). fix_value is Optional[Any] (the "auto-fix value that would be applied if the Validator's on_fail method is 'fix'"), so an empty string is a valid and meaningful result. Using or treats "" as falsy and substitutes the original text, returning unredacted content and corrupting both redacted_text and exact_match metrics.

🐛 Proposed fix
-        return (result.fix_value or text), 1
+        return (text if result.fix_value is None else result.fix_value), 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/evaluation/ban_list/run.py` around lines 39 - 41, The code uses a
falsy check ("result.fix_value or text") that treats an intentional empty-string
fix_value as absent; change the fallback logic in the return so that you only
substitute text when fix_value is None (not when it's empty string). Update the
return that references FailResult/result to use an explicit None check (e.g.,
use result.fix_value if result.fix_value is not None else text) so empty-string
fixes are preserved; adjust the logic in the function where FailResult is
handled (the return in run.py that currently uses "or") to implement this
explicit None-aware fallback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/evaluation/ban_list/run.py`:
- Line 28: The code builds BANNED_WORDS from BAN_LIST_WORDS_RAW but doesn't
guard against resulting empty list after filtering (e.g.,
BAN_LIST_WORDS_RAW=","), which can silently initialize BanList with no entries;
update the initialization logic that creates BANNED_WORDS to validate the
filtered result and raise/exit with a clear error (or fallback) when
BANNED_WORDS is empty—check the variable BAN_LIST_WORDS_RAW, compute
BANNED_WORDS as now, then if not BANNED_WORDS: log an explicit error and stop or
provide a sensible default before constructing the BanList to ensure BanList is
never created with an empty banned-words list.

---

Duplicate comments:
In `@backend/app/evaluation/ban_list/run.py`:
- Around line 32-34: The BanList validator is created without an explicit
on_fail which is inconsistent with other usages; update the constructor call for
BanList (the variable validator created with BANNED_WORDS) to pass
on_fail=OnFailAction.FIX, and if OnFailAction is not already imported where
BanList is used, add the import for OnFailAction so the symbol resolves.
- Around line 52-57: The code unconditionally accesses dataset["target_text"] in
the else branch, causing a KeyError when neither "label" nor "target_text"
exists; update the logic that assigns dataset["y_true"] to first check for the
presence of "target_text" and "source_text" in dataset.columns (or use
dataset.get) and only compute the difference when both exist, otherwise raise a
clear ValueError indicating the missing columns; reference the existing dataset
variable and the assignment to "y_true" so the change touches the same block
that currently checks "label" and falls back to comparing "source_text" vs
"target_text".
- Around line 39-41: The code uses a falsy check ("result.fix_value or text")
that treats an intentional empty-string fix_value as absent; change the fallback
logic in the return so that you only substitute text when fix_value is None (not
when it's empty string). Update the return that references FailResult/result to
use an explicit None check (e.g., use result.fix_value if result.fix_value is
not None else text) so empty-string fixes are preserved; adjust the logic in the
function where FailResult is handled (the return in run.py that currently uses
"or") to implement this explicit None-aware fallback.
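The three-case label derivation described above can be sketched as follows; derive_y_true is a hypothetical helper name for illustration, and pandas is assumed since the evaluator already loads its dataset with pd.read_csv:

```python
import pandas as pd

def derive_y_true(df: pd.DataFrame) -> pd.Series:
    """Three-case label derivation, per the review suggestion (sketch)."""
    if "label" in df.columns:
        # Case 1: an explicit label column wins
        return df["label"].astype(int)
    if {"source_text", "target_text"}.issubset(df.columns):
        # Case 2: a sample "fails" when redaction changed the text
        return (
            df["source_text"].astype(str) != df["target_text"].astype(str)
        ).astype(int)
    # Case 3: neither label nor a source/target pair: fail loudly
    raise ValueError(
        "dataset needs a 'label' column or both 'source_text' and 'target_text'"
    )

df = pd.DataFrame(
    {"source_text": ["a b", "c"], "target_text": ["a [REDACTED]", "c"]}
)
print(derive_y_true(df).tolist())  # → [1, 0]
```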


📥 Commits

Reviewing files that changed from the base of the PR and between 946c705 and 2d3314c.

📒 Files selected for processing (2)
  • backend/README.md
  • backend/app/evaluation/ban_list/run.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/README.md

"BAN_LIST_WORDS must be set for ban_list evaluation (comma-separated)."
)

BANNED_WORDS = [word.strip() for word in BAN_LIST_WORDS_RAW.split(",") if word.strip()]


⚠️ Potential issue | 🟡 Minor

Guard against an empty BANNED_WORDS list after filtering.

if not BAN_LIST_WORDS_RAW only rejects an unset/empty variable. A value like BAN_LIST_WORDS="," passes that guard but produces BANNED_WORDS = [], silently initializing BanList with an empty banned-words list and rendering the entire evaluation meaningless.

🛡️ Proposed fix
 BANNED_WORDS = [word.strip() for word in BAN_LIST_WORDS_RAW.split(",") if word.strip()]
+if not BANNED_WORDS:
+    raise ValueError(
+        "BAN_LIST_WORDS contained no valid words after stripping whitespace and commas."
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/evaluation/ban_list/run.py` at line 28, The code builds
BANNED_WORDS from BAN_LIST_WORDS_RAW but doesn't guard against resulting empty
list after filtering (e.g., BAN_LIST_WORDS_RAW=","), which can silently
initialize BanList with no entries; update the initialization logic that creates
BANNED_WORDS to validate the filtered result and raise/exit with a clear error
(or fallback) when BANNED_WORDS is empty—check the variable BAN_LIST_WORDS_RAW,
compute BANNED_WORDS as now, then if not BANNED_WORDS: log an explicit error and
stop or provide a sensible default before constructing the BanList to ensure
BanList is never created with an empty banned-words list.



Development

Successfully merging this pull request may close these issues.

Ban list evaluation

3 participants