Conversation
📝 Walkthrough — Adds a nullable `score_trace_url` column to `EvaluationRun` so per-trace evaluation scores can be offloaded to object storage, keeping the existing DB `score` field as a fallback.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Service as EvaluationService
    participant DB as Database
    participant S3 as Object Storage
    participant Langfuse as Langfuse API
    Client->>Service: get_evaluation_with_scores(resync_score=false)
    Service->>DB: get_evaluation_run(id)
    DB-->>Service: EvaluationRun (may include score_trace_url)
    alt score_trace_url present and not resync
        Service->>S3: load_json_from_object_store(score_trace_url)
        alt S3 returns traces
            S3-->>Service: traces list
            Service->>Service: merge traces with summary_scores
        else S3 fails
            S3-->>Service: error
            Service->>DB: read cached traces from eval_run.score
            DB-->>Service: cached traces (if any)
            Service->>Service: merge cached traces with summary_scores
        end
    else resync requested
        Service->>Langfuse: fetch_trace_scores_from_langfuse()
        Langfuse-->>Service: fresh traces & summary_scores
        Service->>Service: merge Langfuse data with any existing summaries
        Service->>S3: upload_jsonl_to_object_store(traces) (optional)
        S3-->>Service: score_trace_url (on success)
        Service->>DB: update_evaluation_run(score, score_trace_url)
    end
    Service-->>Client: evaluation with merged scores and traces
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Codecov Report: ❌ Patch coverage is …
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/app/core/storage_utils.py (1)
86-123: ⚠️ Potential issue | 🟡 Minor — Align JSONL metadata with JSON array payload.
The payload is now a JSON array, but the docstring and content-type still say JSONL. Update the metadata to avoid confusion (or revert to JSONL formatting).
🧾 Suggested update

```diff
- Upload JSONL (JSON Lines) content to object store.
+ Upload JSON array content to object store.
@@
-     results: List of dictionaries to be converted to JSONL
+     results: List of dictionaries to be serialized as a JSON array
@@
- headers = Headers({"content-type": "application/jsonl"})
+ headers = Headers({"content-type": "application/json"})
```

backend/app/crud/evaluations/core.py (1)
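For reference, the two serializations this comment distinguishes differ only in framing — a single JSON array versus one JSON document per line — which is why the content-type should match the payload (stdlib `json`, hypothetical records):

```python
import json

records = [{"trace_id": "t1", "score": 0.9}, {"trace_id": "t2", "score": 0.7}]

# JSON array: one document; pairs with content-type application/json
json_array = json.dumps(records)

# JSON Lines: one document per line; conventionally application/jsonl
json_lines = "\n".join(json.dumps(r) for r in records)

# Both round-trip to the same records, but the parsing strategy differs:
assert json.loads(json_array) == records
assert [json.loads(line) for line in json_lines.splitlines()] == records
```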
184-209: ⚠️ Potential issue | 🟡 Minor — Document the new `score_trace_url` parameter.

The Args section doesn't mention the newly added parameter.
✍️ Suggested update

```diff
-     object_store_url: New object store URL (optional)
-     score: New score dict (optional)
-     embedding_batch_job_id: New embedding batch job ID (optional)
+     object_store_url: New object store URL (optional)
+     score_trace_url: New per-trace score URL (optional)
+     score: New score dict (optional)
+     embedding_batch_job_id: New embedding batch job ID (optional)
```
🤖 Fix all issues with AI agents
In `@backend/app/alembic/versions/043_add_score_trace_url_to_evaluation_run.py`:
- Around line 19-32: The migration functions upgrade and downgrade lack return
type hints; update the function signatures for upgrade() and downgrade() to
declare return type -> None so they read as upgrade() -> None: and downgrade()
-> None: (keep existing bodies unchanged) to satisfy the type-hinting rule.
In `@backend/app/core/storage_utils.py`:
- Around line 151-152: The log statement in load_json_from_object_store has a
missing closing quote and brace in the f-string; update the logger.info call
(logger.info(...)) to properly close the string and include the url variable,
e.g., fix the f-string to something like f"[load_json_from_object_store] Loading
JSON from '{url}'" so the message is well-formed and includes the trailing quote
and bracket.
In `@backend/app/services/evaluations/evaluation.py`:
- Around line 233-267: The code in get_evaluation_with_scores treats S3 and DB
as mutually exclusive (uses `elif`), so if `eval_run.score_trace_url` exists but
`load_json_from_object_store` fails or returns None you never fall back to DB
traces (`has_cached_traces_db`); change the control flow so the DB check is a
separate conditional executed after the S3 attempt/exception (i.e., replace the
`elif has_cached_traces_db` with an independent `if has_cached_traces_db` and
ensure the S3-exception handler does not return or suppress the subsequent DB
check), using the existing symbols `eval_run`, `score_trace_url`,
`has_cached_traces_db`, `load_json_from_object_store`, and
`get_evaluation_with_scores` to locate and update the logic.
In `@backend/app/tests/core/test_storage_utils.py`:
- Around line 57-98: The test methods in TestLoadJsonFromObjectStore lack
parameter type hints; update each method signature (test_load_success,
test_load_returns_none_on_error, test_load_returns_none_on_invalid_json) to
include the implicit self parameter type (self: "TestLoadJsonFromObjectStore")
and an explicit return type -> None; ensure any local test helper parameters
would also be annotated (e.g., mock_storage: MagicMock) if present, and keep the
body unchanged so assertions against load_json_from_object_store and
mock_storage.stream continue to work.
- Around line 16-55: The test methods in class TestUploadJsonlToObjectStore lack
type annotations on the implicit self parameter; update the method signatures
for test_upload_success and test_upload_returns_none_on_error to annotate self
(e.g., self: TestUploadJsonlToObjectStore) while keeping the existing return
type hints, and run tests to ensure no behavioral changes; locate these methods
and the class name TestUploadJsonlToObjectStore and add the self parameter type
annotation accordingly.
In `@backend/app/tests/crud/evaluations/test_score_storage.py`:
- Around line 14-110: The fixture mock_eval_run should be a factory (callable)
and the test functions need parameter type hints; change the fixture name to
mock_eval_run_factory (or make mock_eval_run return a Callable[[], MagicMock])
that when called creates/returns a MagicMock(spec=EvaluationRun) with id=100,
then inside tests call mock_eval_run_factory() to get the eval_run; also add
type annotations to each test parameter (e.g., mock_engine: MagicMock,
mock_get_storage: MagicMock, mock_upload: MagicMock, mock_get_eval: MagicMock,
mock_update: MagicMock, mock_eval_run_factory: Callable[[], EvaluationRun] or
MagicMock) and return type -> None for
test_uploads_traces_to_s3_and_stores_summary_only,
test_fallback_to_db_when_s3_fails, and test_no_s3_upload_when_no_traces; keep
references to save_score and update_evaluation_run/get_evaluation_run_by_id
unchanged.
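As a rough sketch of the factory shape this prompt describes (the `EvaluationRun` class here is a stand-in for `app.models.EvaluationRun`; in the real test module the callable would be returned from a `@pytest.fixture`):

```python
from typing import Callable
from unittest.mock import MagicMock


class EvaluationRun:
    """Stand-in for app.models.EvaluationRun (illustration only)."""

    id: int = 0


def make_eval_run(eval_run_id: int = 100) -> MagicMock:
    """Factory body; in the test module this callable is what the fixture returns."""
    eval_run = MagicMock(spec=EvaluationRun)
    eval_run.id = eval_run_id
    return eval_run


# Tests would receive the factory and invoke it per test:
mock_eval_run_factory: Callable[[], MagicMock] = make_eval_run
run = mock_eval_run_factory()
print(run.id)  # → 100
```

Returning a callable (rather than a pre-built mock) lets each test construct as many independent runs as it needs.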
In `@backend/app/tests/services/evaluations/test_evaluation_service_s3.py`:
- Around line 42-120: The tests lack type hints: add annotations to the test
methods' parameters and fixtures referenced in this file—annotate mock
parameters (mock_get_storage, mock_load, mock_get_eval, mock_get_langfuse,
mock_fetch_langfuse, mock_save_score) as MagicMock, annotate fixture parameters
completed_eval_run_with_s3 and completed_eval_run_with_db_traces as
EvaluationRun, and annotate self as "TestGetEvaluationWithScoresS3" on the
methods test_loads_traces_from_s3, test_returns_db_traces_when_no_s3_url, and
test_resync_bypasses_cache_and_fetches_langfuse; also add return type
annotations for the fixtures (completed_eval_run_with_s3 -> EvaluationRun,
completed_eval_run_with_db_traces -> EvaluationRun) and import MagicMock and
EvaluationRun types where needed.
- Around line 14-37: Replace the two near-duplicate fixtures
completed_eval_run_with_s3 and completed_eval_run_with_db_traces with a single
factory fixture (e.g., eval_run_factory) that accepts parameters like id: int,
status: str, score: dict, score_trace_url: Optional[str], dataset_name:
Optional[str], run_name: Optional[str] and returns a
MagicMock(spec=EvaluationRun) populated accordingly; update tests to call
eval_run_factory(…) to create the two scenarios. Add explicit type hints to the
fixture and returned value (eval_run_factory: Callable[..., MagicMock] or ->
MagicMock) and to all test function parameters (e.g., mock_get_storage:
MagicMock, eval_run_factory: Callable[..., MagicMock]) so every fixture/test
signature is typed. Ensure existing symbols EvaluationRun,
completed_eval_run_with_s3 and completed_eval_run_with_db_traces are
removed/replaced and all tests updated to use the new eval_run_factory and typed
parameters.
🧹 Nitpick comments (1)
backend/app/crud/evaluations/core.py (1)
386-392: Resolve the TODO about DB storage fallback. This TODO leaves behavior undecided and includes a typo ("weather" → "whether"). Please either track it in an issue or implement the intended behavior.
If you want, I can draft the follow-up change or open an issue template entry.
```python
def upgrade():
    op.add_column(
        "evaluation_run",
        sa.Column(
            "score_trace_url",
            sqlmodel.sql.sqltypes.AutoString(),
            nullable=True,
            comment="S3 URL where per-trace evaluation scores are stored",
        ),
    )


def downgrade():
    op.drop_column("evaluation_run", "score_trace_url")
```
Add return type hints to migration functions.
Both upgrade() and downgrade() should declare -> None to comply with the type-hinting rule.
✍️ Suggested update

```diff
-def upgrade():
+def upgrade() -> None:
     op.add_column(
         "evaluation_run",
         sa.Column(
             "score_trace_url",
             sqlmodel.sql.sqltypes.AutoString(),
             nullable=True,
             comment="S3 URL where per-trace evaluation scores are stored",
         ),
     )

-def downgrade():
+def downgrade() -> None:
     op.drop_column("evaluation_run", "score_trace_url")
```

As per coding guidelines, `**/*.py`: Always add type hints to all function parameters and return values in Python code.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def upgrade() -> None:
    op.add_column(
        "evaluation_run",
        sa.Column(
            "score_trace_url",
            sqlmodel.sql.sqltypes.AutoString(),
            nullable=True,
            comment="S3 URL where per-trace evaluation scores are stored",
        ),
    )


def downgrade() -> None:
    op.drop_column("evaluation_run", "score_trace_url")
```
🤖 Prompt for AI Agents
In `@backend/app/alembic/versions/043_add_score_trace_url_to_evaluation_run.py`
around lines 19 - 32, The migration functions upgrade and downgrade lack return
type hints; update the function signatures for upgrade() and downgrade() to
declare return type -> None so they read as upgrade() -> None: and downgrade()
-> None: (keep existing bodies unchanged) to satisfy the type-hinting rule.
```python
def load_json_from_object_store(storage: CloudStorage, url: str) -> list | None:
    logger.info(f"[load_json_from_object_store] Loading JSON from '{url}")
```
Fix log message formatting.
The opening quote isn’t closed, which makes the log line confusing.
✍️ Suggested update
```diff
-    logger.info(f"[load_json_from_object_store] Loading JSON from '{url}")
+    logger.info(f"[load_json_from_object_store] Loading JSON from '{url}'")
```

🤖 Prompt for AI Agents
+ logger.info(f"[load_json_from_object_store] Loading JSON from '{url}'")🤖 Prompt for AI Agents
In `@backend/app/core/storage_utils.py` around lines 151 - 152, The log statement
in load_json_from_object_store has a missing closing quote and brace in the
f-string; update the logger.info call (logger.info(...)) to properly close the
string and include the url variable, e.g., fix the f-string to something like
f"[load_json_from_object_store] Loading JSON from '{url}'" so the message is
well-formed and includes the trailing quote and bracket.
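Worth noting: the flagged line is syntactically valid Python — the stray `'` is just literal text inside the f-string — so only the rendered log message is malformed, as this sketch shows:

```python
url = "s3://bucket/file.json"

# Missing closing quote in the rendered output (still valid Python).
broken = f"[load_json_from_object_store] Loading JSON from '{url}"
# Well-formed message with the closing quote.
fixed = f"[load_json_from_object_store] Loading JSON from '{url}'"

print(broken)  # ...Loading JSON from 's3://bucket/file.json
print(fixed)   # ...Loading JSON from 's3://bucket/file.json'
```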
```python
class TestUploadJsonlToObjectStore:
    """Test uploading JSON content to object store."""

    def test_upload_success(self) -> None:
        """Verify successful upload returns URL and content is correct."""
        mock_storage = MagicMock()
        mock_storage.put.return_value = "s3://bucket/path/traces.json"

        results = [{"trace_id": "t1", "score": 0.9}]

        url = upload_jsonl_to_object_store(
            storage=mock_storage,
            results=results,
            filename="traces.json",
            subdirectory="evaluations/score/70",
        )

        assert url == "s3://bucket/path/traces.json"
        mock_storage.put.assert_called_once()

        # Verify content is valid JSON
        call_args = mock_storage.put.call_args
        upload_file = call_args.kwargs.get("source")
        content = upload_file.file.read().decode("utf-8")
        assert json.loads(content) == results

    def test_upload_returns_none_on_error(self) -> None:
        """Verify returns None on CloudStorageError."""
        mock_storage = MagicMock()
        mock_storage.put.side_effect = CloudStorageError("Upload failed")

        url = upload_jsonl_to_object_store(
            storage=mock_storage,
            results=[{"id": 1}],
            filename="test.json",
            subdirectory="test",
        )

        assert url is None
```
Add parameter type hints to test methods.
Annotate self to satisfy the type-hint requirement for function parameters.
✍️ Suggested update

```diff
-    def test_upload_success(self) -> None:
+    def test_upload_success(self: "TestUploadJsonlToObjectStore") -> None:
         """Verify successful upload returns URL and content is correct."""
         mock_storage = MagicMock()
         mock_storage.put.return_value = "s3://bucket/path/traces.json"
@@
-    def test_upload_returns_none_on_error(self) -> None:
+    def test_upload_returns_none_on_error(
+        self: "TestUploadJsonlToObjectStore",
+    ) -> None:
         """Verify returns None on CloudStorageError."""
         mock_storage = MagicMock()
         mock_storage.put.side_effect = CloudStorageError("Upload failed")
```

As per coding guidelines, `**/*.py`: Always add type hints to all function parameters and return values in Python code.
🤖 Prompt for AI Agents
In `@backend/app/tests/core/test_storage_utils.py` around lines 16 - 55, The test
methods in class TestUploadJsonlToObjectStore lack type annotations on the
implicit self parameter; update the method signatures for test_upload_success
and test_upload_returns_none_on_error to annotate self (e.g., self:
TestUploadJsonlToObjectStore) while keeping the existing return type hints, and
run tests to ensure no behavioral changes; locate these methods and the class
name TestUploadJsonlToObjectStore and add the self parameter type annotation
accordingly.
```python
class TestLoadJsonFromObjectStore:
    """Test loading JSON from object store."""

    def test_load_success(self) -> None:
        """Verify successful load returns parsed JSON."""
        mock_storage = MagicMock()
        test_data = [{"id": 1, "value": "test"}]
        mock_storage.stream.return_value = BytesIO(
            json.dumps(test_data).encode("utf-8")
        )

        result = load_json_from_object_store(
            storage=mock_storage,
            url="s3://bucket/path/file.json",
        )

        assert result == test_data
        mock_storage.stream.assert_called_once_with("s3://bucket/path/file.json")

    def test_load_returns_none_on_error(self) -> None:
        """Verify returns None on CloudStorageError."""
        mock_storage = MagicMock()
        mock_storage.stream.side_effect = CloudStorageError("Download failed")

        result = load_json_from_object_store(
            storage=mock_storage,
            url="s3://bucket/file.json",
        )

        assert result is None

    def test_load_returns_none_on_invalid_json(self) -> None:
        """Verify returns None on invalid JSON content."""
        mock_storage = MagicMock()
        mock_storage.stream.return_value = BytesIO(b"not valid json")

        result = load_json_from_object_store(
            storage=mock_storage,
            url="s3://bucket/file.json",
        )

        assert result is None
```
Add parameter type hints to test methods.
Same type-hint requirement applies to this class’ methods.
✍️ Suggested update

```diff
-    def test_load_success(self) -> None:
+    def test_load_success(self: "TestLoadJsonFromObjectStore") -> None:
         """Verify successful load returns parsed JSON."""
         mock_storage = MagicMock()
         test_data = [{"id": 1, "value": "test"}]
@@
-    def test_load_returns_none_on_error(self) -> None:
+    def test_load_returns_none_on_error(
+        self: "TestLoadJsonFromObjectStore",
+    ) -> None:
         """Verify returns None on CloudStorageError."""
         mock_storage = MagicMock()
         mock_storage.stream.side_effect = CloudStorageError("Download failed")
@@
-    def test_load_returns_none_on_invalid_json(self) -> None:
+    def test_load_returns_none_on_invalid_json(
+        self: "TestLoadJsonFromObjectStore",
+    ) -> None:
         """Verify returns None on invalid JSON content."""
         mock_storage = MagicMock()
         mock_storage.stream.return_value = BytesIO(b"not valid json")
```

As per coding guidelines, `**/*.py`: Always add type hints to all function parameters and return values in Python code.
🤖 Prompt for AI Agents
In `@backend/app/tests/core/test_storage_utils.py` around lines 57 - 98, The test
methods in TestLoadJsonFromObjectStore lack parameter type hints; update each
method signature (test_load_success, test_load_returns_none_on_error,
test_load_returns_none_on_invalid_json) to include the implicit self parameter
type (self: "TestLoadJsonFromObjectStore") and an explicit return type -> None;
ensure any local test helper parameters would also be annotated (e.g.,
mock_storage: MagicMock) if present, and keep the body unchanged so assertions
against load_json_from_object_store and mock_storage.stream continue to work.
```python
    @pytest.fixture
    def mock_eval_run(self):
        """Create a mock EvaluationRun."""
        eval_run = MagicMock(spec=EvaluationRun)
        eval_run.id = 100
        return eval_run

    @patch("app.crud.evaluations.core.update_evaluation_run")
    @patch("app.crud.evaluations.core.get_evaluation_run_by_id")
    @patch("app.core.storage_utils.upload_jsonl_to_object_store")
    @patch("app.core.cloud.storage.get_cloud_storage")
    @patch("app.core.db.engine")
    def test_uploads_traces_to_s3_and_stores_summary_only(
        self,
        mock_engine,
        mock_get_storage,
        mock_upload,
        mock_get_eval,
        mock_update,
        mock_eval_run,
    ) -> None:
        """Verify traces uploaded to S3, only summary_scores stored in DB."""
        mock_get_eval.return_value = mock_eval_run
        mock_get_storage.return_value = MagicMock()
        mock_upload.return_value = "s3://bucket/traces.json"

        score = {
            "summary_scores": [{"name": "accuracy", "avg": 0.9}],
            "traces": [{"trace_id": "t1"}],
        }

        save_score(eval_run_id=100, organization_id=1, project_id=1, score=score)

        # Verify upload was called with traces
        mock_upload.assert_called_once()
        assert mock_upload.call_args.kwargs["results"] == [{"trace_id": "t1"}]

        # Verify DB gets summary only, not traces
        call_kwargs = mock_update.call_args.kwargs
        assert call_kwargs["score"] == {
            "summary_scores": [{"name": "accuracy", "avg": 0.9}]
        }
        assert call_kwargs["score_trace_url"] == "s3://bucket/traces.json"

    @patch("app.crud.evaluations.core.update_evaluation_run")
    @patch("app.crud.evaluations.core.get_evaluation_run_by_id")
    @patch("app.core.storage_utils.upload_jsonl_to_object_store")
    @patch("app.core.cloud.storage.get_cloud_storage")
    @patch("app.core.db.engine")
    def test_fallback_to_db_when_s3_fails(
        self,
        mock_engine,
        mock_get_storage,
        mock_upload,
        mock_get_eval,
        mock_update,
        mock_eval_run,
    ) -> None:
        """Verify full score stored in DB when S3 upload fails."""
        mock_get_eval.return_value = mock_eval_run
        mock_get_storage.return_value = MagicMock()
        mock_upload.return_value = None  # S3 failed

        score = {
            "summary_scores": [{"name": "accuracy", "avg": 0.9}],
            "traces": [{"trace_id": "t1"}],
        }

        save_score(eval_run_id=100, organization_id=1, project_id=1, score=score)

        # Full score stored in DB as fallback
        call_kwargs = mock_update.call_args.kwargs
        assert call_kwargs["score"] == score
        assert call_kwargs["score_trace_url"] is None

    @patch("app.crud.evaluations.core.update_evaluation_run")
    @patch("app.crud.evaluations.core.get_evaluation_run_by_id")
    @patch("app.core.storage_utils.upload_jsonl_to_object_store")
    @patch("app.core.cloud.storage.get_cloud_storage")
    @patch("app.core.db.engine")
    def test_no_s3_upload_when_no_traces(
        self,
        mock_engine,
        mock_get_storage,
        mock_upload,
        mock_get_eval,
        mock_update,
        mock_eval_run,
    ) -> None:
        """Verify S3 upload skipped when traces is empty."""
        mock_get_eval.return_value = mock_eval_run

        score = {"summary_scores": [{"name": "accuracy", "avg": 0.9}], "traces": []}

        save_score(eval_run_id=100, organization_id=1, project_id=1, score=score)

        mock_upload.assert_not_called()
```
Use a factory fixture and add parameter type hints.
Guidelines call for factory-style fixtures and typed parameters. Consider converting the fixture into a callable factory and annotating the mocked parameters.
✍️ Suggested update (apply pattern to other tests too)

```diff
+from typing import Callable
 from unittest.mock import MagicMock, patch
@@
 class TestSaveScoreS3Upload:
@@
     @pytest.fixture
-    def mock_eval_run(self):
+    def eval_run_factory(
+        self: "TestSaveScoreS3Upload",
+    ) -> Callable[[], EvaluationRun]:
         """Create a mock EvaluationRun."""
-        eval_run = MagicMock(spec=EvaluationRun)
-        eval_run.id = 100
-        return eval_run
+        def _factory() -> EvaluationRun:
+            eval_run = MagicMock(spec=EvaluationRun)
+            eval_run.id = 100
+            return eval_run
+
+        return _factory
@@
     def test_uploads_traces_to_s3_and_stores_summary_only(
-        self,
-        mock_engine,
-        mock_get_storage,
-        mock_upload,
-        mock_get_eval,
-        mock_update,
-        mock_eval_run,
+        self: "TestSaveScoreS3Upload",
+        mock_engine: MagicMock,
+        mock_get_storage: MagicMock,
+        mock_upload: MagicMock,
+        mock_get_eval: MagicMock,
+        mock_update: MagicMock,
+        eval_run_factory: Callable[[], EvaluationRun],
     ) -> None:
@@
-        mock_get_eval.return_value = mock_eval_run
+        mock_get_eval.return_value = eval_run_factory()
```

As per coding guidelines, `**/*.py`: Always add type hints to all function parameters and return values in Python code; `backend/app/tests/**/*.py`: Use factory pattern for test fixtures in backend/app/tests/.
🤖 Prompt for AI Agents
In `@backend/app/tests/crud/evaluations/test_score_storage.py` around lines 14 -
110, The fixture mock_eval_run should be a factory (callable) and the test
functions need parameter type hints; change the fixture name to
mock_eval_run_factory (or make mock_eval_run return a Callable[[], MagicMock])
that when called creates/returns a MagicMock(spec=EvaluationRun) with id=100,
then inside tests call mock_eval_run_factory() to get the eval_run; also add
type annotations to each test parameter (e.g., mock_engine: MagicMock,
mock_get_storage: MagicMock, mock_upload: MagicMock, mock_get_eval: MagicMock,
mock_update: MagicMock, mock_eval_run_factory: Callable[[], EvaluationRun] or
MagicMock) and return type -> None for
test_uploads_traces_to_s3_and_stores_summary_only,
test_fallback_to_db_when_s3_fails, and test_no_s3_upload_when_no_traces; keep
references to save_score and update_evaluation_run/get_evaluation_run_by_id
unchanged.
```python
    @pytest.fixture
    def completed_eval_run_with_s3(self):
        """Completed eval run with S3 URL."""
        eval_run = MagicMock(spec=EvaluationRun)
        eval_run.id = 100
        eval_run.status = "completed"
        eval_run.score = {"summary_scores": [{"name": "accuracy", "avg": 0.9}]}
        eval_run.score_trace_url = "s3://bucket/traces.json"
        eval_run.dataset_name = "test_dataset"
        eval_run.run_name = "test_run"
        return eval_run

    @pytest.fixture
    def completed_eval_run_with_db_traces(self):
        """Completed eval run with traces in DB."""
        eval_run = MagicMock(spec=EvaluationRun)
        eval_run.id = 101
        eval_run.status = "completed"
        eval_run.score = {
            "summary_scores": [{"name": "accuracy", "avg": 0.85}],
            "traces": [{"trace_id": "db_trace"}],
        }
        eval_run.score_trace_url = None
        return eval_run
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Script executed:

```shell
# Find the test file
find . -type f -name "test_evaluation_service_s3.py" 2>/dev/null
```

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 145

🏁 Script executed:

```shell
cat -n ./backend/app/tests/services/evaluations/test_evaluation_service_s3.py
```

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 5429

🏁 Script executed:

```shell
# Verify the guidelines apply and check if EvaluationRun type is available
rg "from app.models import EvaluationRun" ./backend/app/tests/services/evaluations/test_evaluation_service_s3.py
```

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 111
Use a factory fixture and add type hints to fixtures and test methods.
The fixtures duplicate setup logic and lack required type hints, violating the mandatory coding guidelines. A factory fixture reduces duplication and enables proper type annotations for all parameters and return values.
♻️ Suggested refactor

```diff
-from unittest.mock import MagicMock, patch
+from typing import Callable
+from unittest.mock import MagicMock, patch
 ...
 class TestGetEvaluationWithScoresS3:
     """Test get_evaluation_with_scores() S3 retrieval."""
     @pytest.fixture
-    def completed_eval_run_with_s3(self):
-        """Completed eval run with S3 URL."""
-        eval_run = MagicMock(spec=EvaluationRun)
-        eval_run.id = 100
-        eval_run.status = "completed"
-        eval_run.score = {"summary_scores": [{"name": "accuracy", "avg": 0.9}]}
-        eval_run.score_trace_url = "s3://bucket/traces.json"
-        eval_run.dataset_name = "test_dataset"
-        eval_run.run_name = "test_run"
-        return eval_run
+    def evaluation_run_factory(
+        self: "TestGetEvaluationWithScoresS3",
+    ) -> Callable[..., EvaluationRun]:
+        def _factory(
+            *,
+            eval_run_id: int,
+            status: str = "completed",
+            score: dict | None = None,
+            score_trace_url: str | None = None,
+            dataset_name: str = "test_dataset",
+            run_name: str = "test_run",
+        ) -> EvaluationRun:
+            eval_run = MagicMock(spec=EvaluationRun)
+            eval_run.id = eval_run_id
+            eval_run.status = status
+            eval_run.score = score
+            eval_run.score_trace_url = score_trace_url
+            eval_run.dataset_name = dataset_name
+            eval_run.run_name = run_name
+            return eval_run
+
+        return _factory
+
+    @pytest.fixture
+    def completed_eval_run_with_s3(
+        self: "TestGetEvaluationWithScoresS3",
+        evaluation_run_factory: Callable[..., EvaluationRun],
+    ) -> EvaluationRun:
+        """Completed eval run with S3 URL."""
+        return evaluation_run_factory(
+            eval_run_id=100,
+            score={"summary_scores": [{"name": "accuracy", "avg": 0.9}]},
+            score_trace_url="s3://bucket/traces.json",
+        )
     @pytest.fixture
-    def completed_eval_run_with_db_traces(self):
-        """Completed eval run with traces in DB."""
-        eval_run = MagicMock(spec=EvaluationRun)
-        eval_run.id = 101
-        eval_run.status = "completed"
-        eval_run.score = {
-            "summary_scores": [{"name": "accuracy", "avg": 0.85}],
-            "traces": [{"trace_id": "db_trace"}],
-        }
-        eval_run.score_trace_url = None
-        return eval_run
+    def completed_eval_run_with_db_traces(
+        self: "TestGetEvaluationWithScoresS3",
+        evaluation_run_factory: Callable[..., EvaluationRun],
+    ) -> EvaluationRun:
+        """Completed eval run with traces in DB."""
+        return evaluation_run_factory(
+            eval_run_id=101,
+            score={
+                "summary_scores": [{"name": "accuracy", "avg": 0.85}],
+                "traces": [{"trace_id": "db_trace"}],
+            },
+            score_trace_url=None,
+        )
```

Also add type hints to test method parameters (e.g., mock_get_storage: MagicMock instead of bare mock_get_storage).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    @pytest.fixture
    def evaluation_run_factory(
        self: "TestGetEvaluationWithScoresS3",
    ) -> Callable[..., EvaluationRun]:
        """Factory for creating evaluation run fixtures."""

        def _factory(
            *,
            eval_run_id: int,
            status: str = "completed",
            score: dict | None = None,
            score_trace_url: str | None = None,
            dataset_name: str = "test_dataset",
            run_name: str = "test_run",
        ) -> EvaluationRun:
            eval_run = MagicMock(spec=EvaluationRun)
            eval_run.id = eval_run_id
            eval_run.status = status
            eval_run.score = score
            eval_run.score_trace_url = score_trace_url
            eval_run.dataset_name = dataset_name
            eval_run.run_name = run_name
            return eval_run

        return _factory

    @pytest.fixture
    def completed_eval_run_with_s3(
        self: "TestGetEvaluationWithScoresS3",
        evaluation_run_factory: Callable[..., EvaluationRun],
    ) -> EvaluationRun:
        """Completed eval run with S3 URL."""
        return evaluation_run_factory(
            eval_run_id=100,
            score={"summary_scores": [{"name": "accuracy", "avg": 0.9}]},
            score_trace_url="s3://bucket/traces.json",
        )

    @pytest.fixture
    def completed_eval_run_with_db_traces(
        self: "TestGetEvaluationWithScoresS3",
        evaluation_run_factory: Callable[..., EvaluationRun],
    ) -> EvaluationRun:
        """Completed eval run with traces in DB."""
        return evaluation_run_factory(
            eval_run_id=101,
            score={
                "summary_scores": [{"name": "accuracy", "avg": 0.85}],
                "traces": [{"trace_id": "db_trace"}],
            },
            score_trace_url=None,
        )
```
🤖 Prompt for AI Agents
In `@backend/app/tests/services/evaluations/test_evaluation_service_s3.py` around
lines 14 - 37, Replace the two near-duplicate fixtures
completed_eval_run_with_s3 and completed_eval_run_with_db_traces with a single
factory fixture (e.g., eval_run_factory) that accepts parameters like id: int,
status: str, score: dict, score_trace_url: Optional[str], dataset_name:
Optional[str], run_name: Optional[str] and returns a
MagicMock(spec=EvaluationRun) populated accordingly; update tests to call
eval_run_factory(…) to create the two scenarios. Add explicit type hints to the
fixture and returned value (eval_run_factory: Callable[..., MagicMock] or ->
MagicMock) and to all test function parameters (e.g., mock_get_storage:
MagicMock, eval_run_factory: Callable[..., MagicMock]) so every fixture/test
signature is typed. Ensure existing symbols EvaluationRun,
completed_eval_run_with_s3 and completed_eval_run_with_db_traces are
removed/replaced and all tests updated to use the new eval_run_factory and typed
parameters.
def test_loads_traces_from_s3(
    self, mock_get_storage, mock_load, mock_get_eval, completed_eval_run_with_s3
) -> None:
    """Verify traces loaded from S3 and score reconstructed."""
    mock_get_eval.return_value = completed_eval_run_with_s3
    mock_get_storage.return_value = MagicMock()
    mock_load.return_value = [{"trace_id": "s3_trace"}]

    result, error = get_evaluation_with_scores(
        session=MagicMock(),
        evaluation_id=100,
        organization_id=1,
        project_id=1,
        get_trace_info=True,
        resync_score=False,
    )

    assert error is None
    mock_load.assert_called_once()
    assert result.score["traces"] == [{"trace_id": "s3_trace"}]
    assert result.score["summary_scores"] == [{"name": "accuracy", "avg": 0.9}]

@patch("app.services.evaluations.evaluation.get_evaluation_run_by_id")
@patch("app.core.cloud.storage.get_cloud_storage")
def test_returns_db_traces_when_no_s3_url(
    self, mock_get_storage, mock_get_eval, completed_eval_run_with_db_traces
) -> None:
    """Verify DB traces returned when no S3 URL."""
    mock_get_eval.return_value = completed_eval_run_with_db_traces

    result, error = get_evaluation_with_scores(
        session=MagicMock(),
        evaluation_id=101,
        organization_id=1,
        project_id=1,
        get_trace_info=True,
        resync_score=False,
    )

    assert error is None
    mock_get_storage.assert_not_called()
    assert result.score["traces"] == [{"trace_id": "db_trace"}]

@patch("app.services.evaluations.evaluation.save_score")
@patch("app.services.evaluations.evaluation.fetch_trace_scores_from_langfuse")
@patch("app.services.evaluations.evaluation.get_langfuse_client")
@patch("app.services.evaluations.evaluation.get_evaluation_run_by_id")
@patch("app.core.storage_utils.load_json_from_object_store")
@patch("app.core.cloud.storage.get_cloud_storage")
def test_resync_bypasses_cache_and_fetches_langfuse(
    self,
    mock_get_storage,
    mock_load,
    mock_get_eval,
    mock_get_langfuse,
    mock_fetch_langfuse,
    mock_save_score,
    completed_eval_run_with_s3,
) -> None:
    """Verify resync=True skips S3/DB and fetches from Langfuse."""
    mock_get_eval.return_value = completed_eval_run_with_s3
    mock_get_langfuse.return_value = MagicMock()
    mock_fetch_langfuse.return_value = {
        "summary_scores": [],
        "traces": [{"trace_id": "new"}],
    }
    mock_save_score.return_value = completed_eval_run_with_s3

    get_evaluation_with_scores(
        session=MagicMock(),
        evaluation_id=100,
        organization_id=1,
        project_id=1,
        get_trace_info=True,
        resync_score=True,
    )

    mock_load.assert_not_called()  # S3 skipped
    mock_fetch_langfuse.assert_called_once()  # Langfuse called
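One detail worth keeping in mind when reading the stacked `@patch` decorators in these tests: mocks are injected bottom-up, so the decorator closest to the function becomes the first mock parameter. A minimal demonstration, patching stdlib `json` attributes rather than any project code:

```python
from unittest.mock import MagicMock, patch
import json

# Decorators apply bottom-up: @patch("json.dumps") is closest to the
# function, so mock_dumps is the first injected argument.
@patch("json.loads")   # injected second
@patch("json.dumps")   # injected first
def check_order(mock_dumps: MagicMock, mock_loads: MagicMock) -> tuple[bool, bool]:
    json.dumps({})  # hits the dumps mock only
    return mock_dumps.called, mock_loads.called

dumps_called, loads_called = check_order()
```

This is why `test_resync_bypasses_cache_and_fetches_langfuse` lists `mock_get_storage` first even though its `@patch("app.core.cloud.storage.get_cloud_storage")` decorator appears last.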
🧩 Analysis chain

🏁 Script executed:

cd backend && find . -name "test_evaluation_service_s3.py" -type f

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 137

🏁 Script executed:

# Get file info
wc -l backend/app/tests/services/evaluations/test_evaluation_service_s3.py

Length of output: 147

🏁 Script executed:

# Read lines 42-120 to verify content
sed -n '42,120p' backend/app/tests/services/evaluations/test_evaluation_service_s3.py

Length of output: 3139

🏁 Script executed:

# Check imports and fixture definitions at the top of the file
head -50 backend/app/tests/services/evaluations/test_evaluation_service_s3.py

Length of output: 1956

🏁 Script executed:

# Search for the test class and fixture definitions
rg -n "class Test|@pytest.fixture|def test_" backend/app/tests/services/evaluations/test_evaluation_service_s3.py

Length of output: 308
Add type hints to test method parameters and fixture return types.
Test methods and fixtures are missing type hint annotations. Per the coding guidelines for `**/*.py` and `backend/app/tests/**/*.py`, all parameters and return values must have type hints.
Type hints to add
Test method parameters require type hints:

- Mock parameters (e.g., `mock_get_storage`, `mock_load`) should be annotated as `MagicMock`
- Fixture parameters (e.g., `completed_eval_run_with_s3`) should be annotated with their actual type (`EvaluationRun`)
- The `self` parameter should be annotated as `"TestGetEvaluationWithScoresS3"`
Fixtures also need return type annotations:

@pytest.fixture
def completed_eval_run_with_s3(self) -> EvaluationRun:
    ...

@pytest.fixture
def completed_eval_run_with_db_traces(self) -> EvaluationRun:
    ...

🤖 Prompt for AI Agents
In `@backend/app/tests/services/evaluations/test_evaluation_service_s3.py` around
lines 42 - 120, The tests lack type hints: add annotations to the test methods'
parameters and fixtures referenced in this file—annotate mock parameters
(mock_get_storage, mock_load, mock_get_eval, mock_get_langfuse,
mock_fetch_langfuse, mock_save_score) as MagicMock, annotate fixture parameters
completed_eval_run_with_s3 and completed_eval_run_with_db_traces as
EvaluationRun, and annotate self as "TestGetEvaluationWithScoresS3" on the
methods test_loads_traces_from_s3, test_returns_db_traces_when_no_s3_url, and
test_resync_bypasses_cache_and_fetches_langfuse; also add return type
annotations for the fixtures (completed_eval_run_with_s3 -> EvaluationRun,
completed_eval_run_with_db_traces -> EvaluationRun) and import MagicMock and
EvaluationRun types where needed.
Issue: #494