
Conversation

@kyle-deprow

This PR fixes a critical bug in the CyberSOCEval4 Malware Analysis benchmark where prompts were being generated with "None" as the question text.

The issue was caused by a field name mismatch in malware_analysis.py: the code retrieved the question with test_case.get("question_text"), but the dataset file (questions.json) stores the question under the field name "question". As a result, every prompt contained the literal string "None" instead of the actual question text, and prompts like "Answer the following multi-choice question: None." were sent to the models under test.

The fix changes the field accessor from "question_text" to "question" to match the dataset schema, ensuring the actual question text is included in the generated prompts.
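
For readers skimming the diff, here is a minimal sketch of the change. The function and prompt-assembly details below are illustrative assumptions, not the actual code in malware_analysis.py; only the field names ("question_text" vs. "question") come from the fix itself:

```python
# Illustrative sketch only -- not the actual implementation in malware_analysis.py.
# Each entry in questions.json stores the question under the "question" key.

def build_prompt(test_case: dict) -> str:
    # Before the fix: there is no "question_text" key, so .get() returns None
    # and the f-string renders it as the literal string "None".
    # question = test_case.get("question_text")

    # After the fix: read the field that questions.json actually uses.
    question = test_case.get("question")
    return f"Answer the following multi-choice question: {question}."
```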

'None' questions being passed to model
@meta-cla meta-cla bot added the cla signed label Oct 23, 2025
Contributor

@laurendeason left a comment


This is indeed a critical bug. Thank you so much for bringing it to our attention! We will be regenerating results for this benchmark post fix.

@meta-codesync

meta-codesync bot commented Oct 24, 2025

@laurendeason has imported this pull request. If you are a Meta employee, you can view this in D85461327.

@meta-codesync

meta-codesync bot commented Oct 24, 2025

@laurendeason merged this pull request in 1d30117.

@lshariprasad

PR Summary:
This pull request fixes a critical bug in the CyberSOCEval4 Malware Analysis benchmark where prompts were incorrectly generated with "None" as the question text.

Root Cause:
The issue was caused by a field name mismatch in malware_analysis.py. The code attempted to access test_case.get("question_text"), but the dataset file (questions.json) actually stores the question under the field name "question".

Impact:
Because of this mismatch, all generated prompts used the literal string "None" instead of the intended question, producing malformed inputs such as:
Answer the following multi-choice question: None.

Fix Implemented:
The field accessor was updated from "question_text" to "question" in malware_analysis.py to correctly align with the dataset schema. This ensures that actual question text is now properly included in the generated prompts during model evaluation.
