-
Notifications
You must be signed in to change notification settings - Fork 69
Avoid the explicit CATransaction
#275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
CATransaction commitCATransaction
cecb0bc to
e8ddecc
Compare
|
Wow. This is dramatically faster (up to 1000x !!!) for me. I'm seeing present times measured in 10s of microseconds rather than 10s of milliseconds. Specifically: running Blitz (a winit application) on my 14" MacBook Pro (M1), I'm getting the following for the times to call
To reproduce:
Then:
You should then see output like: It is the |
e8ddecc to
91c1904
Compare
|
Well, to be fair, this PR is not actually doing any work in the The actual work now happens in |
91c1904 to
e23a8fb
Compare
Hmm... I wasn't previously timing |
Sorry, that wasn't particularly clear; I meant that
Definitely agree. |
|
Reading the definition of Or were we accidentally waiting for the transaction to have completed, while this new model allows multiple implicit transactions to be created and submitted "asynchronously"? Just curious to map this to all other platforms' "compositor transaction" abstractions :) |
I think that's true, yeah. Disassembling I suspect that the real problem is actually in the way that Winit schedules redraws such that they happen outside |
|
Thanks for looking into that! Some quick local testing shows that Could it be that this call is blocking, when applications use it directly? Assigning a completion handler shows that it completes at around the same time. Curious how you "disassembled" so that we can look into Never Note that |
I did
Uhh, pretty sure that's invalid use of the API, otherwise you may be committing work done by something higher in your call stack.
I don't completely know how And yeah, with this PR, you will get a bit of latency here, in that the result is not actually (I'm pretty sure all of these issues just go away if the Winit issue was fixed, since then the |
|
Apologies, I meant to also skip Curious to see those complete out of order, some frames taking very long. I should've assumed the debugger might have "enough" debug symbols to see what is going on under the hood 😅 And yeah, I've been wanting to write better present-timing abstractions in Winit for years. |
No idea honestly, and unsure of how I'd test it? |
This is what I did, perhaps we could add it to that let s = Instant::now();
unsafe { self.imp.layer.setContents(Some(image.as_ref())) };
static mut FRAME: u32 = 0;
let frame = unsafe { FRAME };
unsafe { FRAME += 1 };
unsafe {
CATransaction::setCompletionBlock(Some(
// Does this clone or otherwise reference the block? After the move,
// the closure lifetime is 'static and could use StackBlock as well?
block2::RcBlock::new(move || {
println!("{frame:0>6}: {:?}", s.elapsed());
})
.deref(),
))
}; |
|
This PR seems to have a memory problem. I am able to get memory usage to spike as high as 3GB+ with this PR just by scrolling my app (which it causes it to render frames). Interestingly, it does drop if I resize the window, but only down to ~700mb. Rendering this same app with |
|
So, I reverse-engineered a bit, and found that CFRunLoopObserverCreate(
NULL, // allocator
kCFRunLoopBeforeWaiting | kCFRunLoopExit, // activities
1, // repeats
2000000, // order
CA::Transaction::observer_callback, // callout
ptr::null(), // context
);This clashes a bit with Winit (as expected), since Winit issues This also matches the documenation for Furthermore, I tried stepping through a debugger, and found that right after issuing This leads me to believe that To put it in Wayland terms:
This can be seen by drawing twice per frame: // Draw black
let buffer = surface.buffer_mut().unwrap();
buffer.present().unwrap();
let mut buffer = surface.buffer_mut().unwrap();
draw(&mut buffer);
buffer.present().unwrap();Before this PR, the window flickers between black and the actual contents, after this PR it only shows the final contents. Now the question becomes: What should See also #29. |
Hmm, weird. Is there a public example in Blitz that reproduces that? Or is there some other way I would be able to reproduce this? If not, maybe I could get you to try the following diff in Winit 0.30 + this PR? diff --git a/src/platform_impl/macos/observer.rs b/src/platform_impl/macos/observer.rs
index 833980308..3b83401fb 100644
--- a/src/platform_impl/macos/observer.rs
+++ b/src/platform_impl/macos/observer.rs
@@ -220,7 +220,7 @@ pub fn setup_control_flow_observers(mtm: MainThreadMarker, panic_info: Weak<Pani
);
run_loop.add_observer(
kCFRunLoopExit | kCFRunLoopBeforeWaiting,
- CFIndex::MAX,
+ 10000, // Less than `2000000`
control_flow_end_handler,
&mut context as *mut _,
); |
The
Will this also work with Winit 0.31? I upgraded Blitz |
Hmm, the
The specific patch there would only cleanly apply on 0.30, but should be the same situation on 0.31, the patch there is just instead: diff --git a/winit-appkit/src/observer.rs b/winit-appkit/src/observer.rs
index 48c934491..e4d98bcc7 100644
--- a/winit-appkit/src/observer.rs
+++ b/winit-appkit/src/observer.rs
@@ -160,7 +160,7 @@ pub fn setup_control_flow_observers(mtm: MainThreadMarker) {
);
run_loop.add_observer(
CFRunLoopActivity::Exit | CFRunLoopActivity::BeforeWaiting,
- CFIndex::MAX,
+ 2000,
Some(control_flow_end_handler),
&mut context as *mut _,
); |
That exact command works for me :/ Do you have an old version somehow? Perhaps I hadn't pushed the It's resizing the window a lot where I see high memory usage. I tried the Winit patch and it didn't seem to make any difference. |
|
Okay, so I pulled nicoburns/blitz@67ee43d (though I had to do a On macOS 15, Activity Monitor reports a steady <100MB memory usage for On macOS 26 though, if I moved the window offscreen and resized it to be larger than my screen, I could get the memory usage to spike a lot, easily a 1GB and more. But, the thing is, I can reproduce that before this PR too, it's just "only" roughly half as bad 1. And doing the Winit fix brings it down to before this PR levels. A guess: The allocator changed in macOS 26 (which has caused other problems), perhaps the memory allocations are more "lazily" released there, which means that Activity Monitor can't report it as accurately? That hypothesis is supported by the following screenshot where I resized the window ~20 times my screen width (i.e. with most of it offscreen). Does this match the behaviour you're seeing? Or is there something else going on? Footnotes |
|
The problematic case isn't when the window is resized once to a large size. It's when it's frequently resized repeatedly. That being said, I'm not seeing behaviour that was as bad as I remember seeing (possibly due to OS updates??? Or I suppose Winit 0.31 upgrade?), and I'm also seeing basically the same behaviour with Blitz Testing on macOS 15 btw. And this PR is still much faster than the released version. The speed is dependent on window size. The current version of softbuffer gets progressively slower the larger the window gets and this PR doesn't. Video of the testing process I've been using (apologies for poor quality - had to compress for file size reasons): clip.mp4 |
|
Hmm... I've just noticed that my original comment did explicitly say the issue happened during scrolling. Let me retest that. |
|
Ok, so I am able to get memory usage to increase pretty fast by running:
And then scrolling up and down like a madman for a few seconds. However, this seems to happen on With the kind of differences (in speed of memory growth) I'm seeing now I'd say it's entirely possible that memory usage is increasing a little faster with this PR simply because it's able to render frames faster so it's rendering more frames in the same period of time. |
Yeah, I tested that as well.
I can get the "Real Mem" up to about 1GB on macOS 15 too if I do that, but that's (seemingly) true both before and after this PR.
Sounds likely. I'll add, I'm not that familiar with exactly how memory allocation works, but I'm pretty sure that just because Activity Monitor is reporting a high memory usage doesn't actually mean that there are any leaks - it might again just be the allocator doing funny stuff. I tried running under Instruments' leak checker, the only reported leak is in Winit's |
Oh definitely! At the very least, you should allocate once and then re-use the buffer (and on window resize, as long as it's large enough, keep using that buffer it). Softbuffer is currently also allocating on macOS on every frame, which just makes the problem even worse (it should be double-buffering I know you're aware, but I'll just note it here too that linebender/vello#1382 is tracking allowing |
I think this is possible today. It just requires swizzling in-place. I'll probably give that another go if this PR lands. |
e23a8fb to
6964486
Compare
|
Thinking about it, I don't believe this PR actually affects performance real-world, sending the data to the compositor still has to happen somewhere. E.g. doing: let time = Instant::now();
CATransaction::begin();
CATransaction::commit();
println!("elapsed: {:?}", time.elapsed());Reveals times of around ~100 nanoseconds, which sounds probable to "just" be bookkeeping and a few checks to see if there's any pending modifications we need to send - it might not actually be sending anything to the compositor here! I also tried adding a simple FPS counter to the So what does this PR do? My understanding (now) is that it allows resizing to work more smoothly, since the updates to the window frame and decorations (which are fully under the control of the client / AppKit / the app) are sent to the compositor at the same time as the re-rendered content. Wayland solves this for subsurfaces IIUC with I'm a bit unsure what the default behaviour should be, but I think this PR is the right choice (especially if Winit got better at this). This also matches Wayland in that subsurfaces are synchronized by default (though it's a bit different in that I think the surface Winit's Wayland impl provides to Softbuffer is the root surface, whereas in the macOS impl the In the future, we might want to provide a way to configure this behaviour, but let's track that in #29 though since it's relevant for the other stuff in there. |
This effectively desyncs the surface from the rest of the window, which isn't desirable when resizing. Instead, we now commit in the implicit transaction that happens at the end of the current runloop iteration.
6964486 to
1c617ea
Compare


Transactions are expensive, and the layer should be able to figure out the timing of when to render by itself (by virtue of being installed in a view). The only reason why we did it before was to avoid a fade transition between layer content changes.
Part of #83. I have not benchmarked this, but I have visibly confirmed less stuttering when resizing.
Note that removing this does also remove any form of frame-limiting, that's tracked in #29 for that.