Hi authors, thanks for the excellent work on UI‑TARS.
I have a question about whether you’ve considered exposing reflection tuning at inference time so that end‑users can interactively correct their agent’s errors on real tasks.
Question
After an annotator corrects the step and constructs T₊, do you then feed T₊ back to the model—continuing generation from the corrected state—to attempt completion of the same original task in that online session?
Have you considered enabling end users (not just annotators) to perform these corrections at inference time? Concretely, on a live task the user could:
1. Monitor the partial trace T₋,
2. Spot an invalid/suboptimal thought t_τ or action a_τ,
3. Edit the thought to t′_τ and/or delete the bad action,
4. Ask the agent to continue from that corrected state toward finishing the same task.
This interactive workflow would both:
- Improve the user's immediate experience by allowing on-the-spot recovery from mistakes, and
- Generate fresh online traces (the corrected T₊ and the completed trace) for further SFT or RLHF to continually strengthen the agent.