@lotif lotif commented Jan 28, 2026

Summary

Adds Langfuse integration code and an evaluation script for the Report Generation Agent.

ClickUp Ticket(s): N/A

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📝 Documentation update
  • 🔧 Refactoring (no functional changes)
  • ⚡ Performance improvement
  • 🧪 Test improvements
  • 🔒 Security fix

Changes Made

  • Small refactorings and fixes
  • Adding common Langfuse code in the langfuse.py file
  • Adding a ground truth dataset
  • Adding a script to upload the dataset to Langfuse
  • Adding an evaluation script to run an LLM-as-a-judge against the Report Generation Agent and the ground truth dataset
  • Updating the instructions in the README.md file
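
As a rough illustration of the upload and evaluation steps listed above (not the code in this PR), the flow could be sketched as follows. The `create_dataset` and `create_dataset_item` method names mirror the Langfuse Python SDK, but the `DatasetClient` protocol, the `Record` fields, and the `agent` and `judge` callables are hypothetical stand-ins chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Protocol


class DatasetClient(Protocol):
    """Hypothetical minimal interface; a Langfuse client is assumed to satisfy it."""

    def create_dataset(self, *, name: str) -> Any: ...

    def create_dataset_item(
        self, *, dataset_name: str, input: Any, expected_output: Any
    ) -> Any: ...


@dataclass
class Record:
    """One ground-truth example (field names are assumptions for this sketch)."""

    question: str
    expected_report: str


def upload_dataset(client: DatasetClient, dataset_name: str, records: list[Record]) -> int:
    """Create the dataset, add one item per record, and return the item count."""
    client.create_dataset(name=dataset_name)
    for record in records:
        client.create_dataset_item(
            dataset_name=dataset_name,
            input=record.question,
            expected_output=record.expected_report,
        )
    return len(records)


def evaluate(
    records: list[Record],
    agent: Callable[[str], str],
    judge: Callable[[str, str, str], float],
) -> float:
    """Run the agent on each record and return the mean judge score.

    `judge(question, candidate, reference)` stands in for an LLM-as-a-judge
    call and is expected to return a score between 0.0 and 1.0.
    """
    scores = [
        judge(record.question, agent(record.question), record.expected_report)
        for record in records
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

Keeping the client and the judge behind plain callables and a protocol means the same loop can be exercised with fakes in tests, which may also make it easier to move the upload helper into a shared package later.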

For an example of what the evaluation results look like:
https://us.cloud.langfuse.com/project/cmkwsswke005dad07gxujnipq/datasets/cmkyev4nd000nad084ds2xm30/runs/27328bba-9843-4ccb-940f-6fe1b9e3b0ea

Testing

  • Tests pass locally (uv run pytest tests/)
  • Type checking passes (uv run mypy <src_dir>)
  • Linting passes (uv run ruff check <src_dir>/)
  • Manual testing performed (describe below)

Manual testing details:

Performed manual testing by following the instructions in the README.md file.

Checklist

  • Code follows the project's style guidelines
  • Self-review of code completed
  • Documentation updated (if applicable)
  • No sensitive information (API keys, credentials) exposed

lotif and others added 30 commits January 16, 2026 17:56

@fcogidi fcogidi left a comment

A couple of comments:

  • Will you switch to google-adk later since Amrit and I are using that? Asking 'cause the langfuse integration may be different for that.
  • The langfuse_upload script may be general enough to be in the aieng-eval-agents package


lotif commented Jan 30, 2026

@fcogidi

Will you switch to google-adk later since Amrit and I are using that? Asking 'cause the langfuse integration may be different for that.

Yes. I will put out one more PR on Monday with the trajectory evals, and next in line is the move to google-adk.

The langfuse_upload script may be general enough to be in the aieng-eval-agents package

Good point. I'm planning to move things around in follow-up PRs as well and will keep that in mind.

@lotif lotif requested a review from fcogidi January 30, 2026 22:05

3 participants