
Conversation

@kyle-deprow

This PR fixes a critical bug in the CyberSOCEval4 Malware Analysis benchmark where prompts were being generated with "None" as the question text.

The issue was caused by a field name mismatch in malware_analysis.py: the code retrieved the question with test_case.get("question_text"), but the dataset file (questions.json) stores the question under the field name "question". As a result, every prompt contained the literal string "None" instead of the actual question text, and prompts like "Answer the following multi-choice question: None." were sent to the models under test.

The fix changes the field accessor from "question_text" to "question" to match the dataset schema, ensuring the actual question text is included in the generated prompts.
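
For readers skimming the diff, here is a minimal sketch of the change. The function and prompt-assembly details below are illustrative assumptions, not the actual code in malware_analysis.py; only the field names ("question_text" vs. "question") come from the fix itself:

```python
# Illustrative sketch only -- not the actual implementation in malware_analysis.py.
# Each entry in questions.json stores the question under the "question" key.

def build_prompt(test_case: dict) -> str:
    # Before the fix: there is no "question_text" key, so .get() returns None
    # and the f-string renders it as the literal string "None".
    # question = test_case.get("question_text")

    # After the fix: read the field that questions.json actually uses.
    question = test_case.get("question")
    return f"Answer the following multi-choice question: {question}."
```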

'None' questions being passed to model
@meta-cla meta-cla bot added the cla signed label Oct 23, 2025
Contributor

@laurendeason left a comment


This is indeed a critical bug. Thank you so much for bringing it to our attention! We will be regenerating results for this benchmark post fix.

@meta-codesync

meta-codesync bot commented Oct 24, 2025

@laurendeason has imported this pull request. If you are a Meta employee, you can view this in D85461327.

@meta-codesync

meta-codesync bot commented Oct 24, 2025

@laurendeason merged this pull request in 1d30117.

@lshariprasad

PR Summary:
This pull request fixes a critical bug in the CyberSOCEval4 Malware Analysis benchmark where prompts were incorrectly generated with "None" as the question text.

Root Cause:
The issue was caused by a field name mismatch in malware_analysis.py. The code attempted to access test_case.get("question_text"), but the dataset file (questions.json) actually stores the question under the field name "question".

Impact:
Because of this mismatch, all generated prompts used the literal string "None" instead of the intended question, producing malformed inputs such as:
Answer the following multi-choice question: None.

Fix Implemented:
The field accessor was updated from "question_text" to "question" in malware_analysis.py to correctly align with the dataset schema. This ensures that actual question text is now properly included in the generated prompts during model evaluation.
