-
BitBot
[Github] (servo/webrender) [PR] bors-servo merged #3623 into master: Sync changes from mozilla-central -
git.io/fjsY9
-
pcwalton
ajeffrey|pto: I got reprojection mostly working, sorta, though there is one magic constant I determined through experimentation that I need to do some trig to be principled about :)
-
BitBot
[Github] (servo/mozjs) [PR] bors-servo merged #178 into master: Allow more recent SDK versions -
git.io/fjsqA
-
pcwalton
the quality loss is noticeable, but acceptable given the quality loss inherent in lens distortion correction anyway
-
pcwalton
besides, we can tweak parameters to improve quality
-
pcwalton
one problem is that I don't know why my initial technique, to move the camera back to unify frusta, didn't work
-
ajeffrey|pto
pcwalton: can we alternate which eye is rendered, and reproject the "off" eye from the previous frame? Might give less distortion than reprojecting from the "on" eye?
-
BitBot
[Github] (servo/servo) [PR] bors-servo merged #23250 into master: Sync WPT with upstream (24-04-2019) -
git.io/fj3bJ
-
crowbot
servo-mac6 is overdue! (build started 2 hours ago)
-
BitBot
[Github] (servo/servo) [PR] Akhilesh1996 requested #23259 merge into master: Finished: Implement AudioParam.setValueCurveAtTime #22897 -
git.io/fjsG9
-
BitBot
[Github] (servo/servo) [PR] servo-wpt-sync requested #23260 merge into master: Sync WPT with upstream (25-04-2019) -
git.io/fjsGF
-
BitBot
[Github] (servo/servo) [PR] CYBAI closed #23260: Sync WPT with upstream (25-04-2019) -
git.io/fjsGF
-
BitBot
[Github] (servo/servo) [issue] g10guang opened #23261: What's different from chrome V8? -
git.io/fjsZV
-
BitBot
[Github] (servo/servo) [issue] CYBAI closed #23261: What's different from chrome V8? -
git.io/fjsZV
-
BitBot
[Github] (servo/servo) [PR] paulrouget requested #23262 merge into master: Update SetCursor behavior -
git.io/fjscT
-
paul
Does anyone use reviewable.io for Servo's PRs?
-
taskcluster
Task "Servo daily: Linux x64: with Rust Nightly. On failure, ping: SimonSapin, nox, emilio" complete with status 'failed'. Inspect:
tools.taskcluster.net/tasks/JW0p9pXrRGCEitLMWhcdEQ
-
SimonSapin
it’d been a while
-
eijebong
SimonSapin: Don't know if a nightly with the new hashmaps is out yet, but it might be worth waiting for it?
-
SimonSapin
eijebong: good point, that might be it
-
eijebong
Yeah all the tests failures are on size_of
-
eijebong
I guess the hashmap is a bit bigger in a struct now
-
BitBot
[Github] (servo/servo) [PR] SimonSapin requested #23263 merge into master: Upgrade to rustc 1.36.0-nightly (e305df184 2019-04-24) -
git.io/fjs8X
-
SimonSapin
standups: rustup + size_of regression investigation
servo/servo #23263
-
crowbot
PR #23263: Upgrade to rustc 1.36.0-nightly (e305df184 2019-04-24) -
servo/servo #23263
-
crowbot
Status submitted successfully.
-
SimonSapin
-
crowbot
Issue #69: Reduce size_of<HashMap> ? -
rust-lang/hashbrown #69
-
crowbot
Status submitted successfully.
-
SimonSapin
paul: occasionally, though not in a while
-
BitBot
[Github] (servo/servo) [issue] jdm opened #23264: CopyTexImage2D needs to invalidate the texture -
git.io/fjsRh
-
BitBot
[Github] (servo/servo) [issue] jdm opened #23265: Verify that TexImage2D invalidates a texture that is attached to the framebuffer when a framebuffer is not bound -
git.io/fjs0v
-
nical
Manishearth: should I merge the rigid transform PR now?
-
nical
or do you want to do some modifications beforehand (I have no complaint)
-
jdm
standups: reading webgl/es 2.0/s3tc specs; reviews.
-
crowbot
Status submitted successfully.
-
jdm
ooh, will I actually clear off all of the urgent items on my todo list today :o
-
ajeffrey
standups: Got bindgen to generate the magicleap C API bindings.
asajeffrey/rust-webvr 900a723
-
crowbot
Status submitted successfully.
-
jdm
\o/
-
Manishearth
nical: sure
-
BitBot
[Github] (servo/euclid) [PR] bors-servo merged #337 into master: Simplify rigid transform premultiplication math -
git.io/fjsLg
-
BitBot
[Github] (servo/servo) [PR] bors-servo merged #23263 into master: Upgrade to rustc 1.36.0-nightly (e305df184 2019-04-24) -
git.io/fjs8X
-
BitBot
[Github] (servo/servo) [issue] jdm opened #23266: Create a helper API for entering a DOM object's compartment -
git.io/fjszo
-
BitBot
[Github] (servo/servo) [issue] asajeffrey opened #23267: Magic Leap app doesn't package with SDK v0.20.0 -
git.io/fjszD
-
ajeffrey
jdm: re ^ we can either get rid of the use of the deprecated API (and require v0.20.0 of the SDK, and a matching OS version on the device) or suppress the deprecation error; neither sounds great to me.
-
jdm
ajeffrey: we could ifdef, couldn't we?
-
ajeffrey
jdm: blech, yes we could but really???
-
jdm
ajeffrey: let's just suppress the deprecation for now
-
jdm
and kick that can down the road
-
ajeffrey
jdm: OK
-
BitBot
[Github] (servo/servo) [PR] asajeffrey requested #23268 merge into master: Magicleap package fixes -
git.io/fjsgU
-
bzm3r
-
bzm3r
someone asked me a good question though, which i am having trouble answering: "is my understanding correct that, due to accumulating signed areas additively to the alpha texture, pathfinder can't batch multiple separate but overlapping paths in one draw call?"
-
bzm3r
i believe the answer is: "pathfinder can do this, and in fact, doing this efficiently is what gives pathfinder its claim to fame"
-
bzm3r
but when i tried to back up my answer, by playing around with a silly model i made in inkscape, i ran into some issues
-
pcwalton
bzm3r: Pathfinder uses the GPU blender
-
pcwalton
to accumulate signed areas
-
pcwalton
so yes, it can
-
bzm3r
GPU blender, okay, will put that in as a note, and research it further
-
pcwalton
glBlendFunc()
-
pcwalton
this is how it accumulates the signed areas
-
pcwalton
btw this is the same thing that Skia's CCPR does I think
-
bzm3r
i see, i see
-
bzm3r
does pathfinder remake tiles as a user changes their "zoom level"?
-
bzm3r
a tile is 16x16 pixels, these are screen pixels, right?
-
pcwalton
yes and yes
-
pcwalton
everything gets blown away and rebuilt from scratch
-
pcwalton
this might seem inefficient, but there is really no alternative other than tessellation in the 3D case especially
-
pcwalton
and tessellation has its own problems
-
bzm3r
right
-
pcwalton
one could imagine just changing the tile size and resubmitting the commands to GPU without any CPU work, but this gets tricky (1) when the zoom is a subpixel amount; (2) when you want to further subdivide curves
-
pcwalton
furthermore we're often GPU bound anyway so this wouldn't really help that much
-
bzm3r
how does signed area coverage work with respect to anti aliasing?
-
pcwalton
I'm confused as to what you're asking -- signed area coverage is an implementation of antialiasing
-
bzm3r
pcwalton you're right, the question is confusing because i was confused
-
bzm3r
an old pathfinder blog post mentions:
nothings.org/gamedev/rasterize
-
raph
pcwalton: why is the bottom edge of the tile the tile size rather than max(from, to)?
github.com/pcwalton/pathfinder/blob…/resources/shaders/fill.vs.glsl#L48
-
pcwalton
raph: because of the fill rule
-
pcwalton
we fill top to bottom for historical reasons
-
raph
Got it, thanks.
-
raph
I get it, but I think explaining it for a general audience would be something :)
-
pcwalton
so if you have a tile that looks like ◀
-
pcwalton
then you will have two fills, one on the top that looks like / and one on the bottom that looks like \
-
pcwalton
the top one will add +1 to everything below it
-
pcwalton
and the bottom one will add -1 to everything below it
-
bzm3r
ahhh
-
pcwalton
the bottom area cancels out to 0 so you end up with the shape of ◀
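(Editor's note: a toy model of this cancellation, reduced to a single pixel column of a tile. Names and representation are hypothetical, not Pathfinder's actual code.)

```rust
// Toy model of the top-to-bottom fill rule described above, for one pixel
// column of a tile. Each fill (edge_y, delta) adds `delta` to every row at
// or below edge_y; the area below the bottom edge cancels back to zero.
fn accumulate(fills: &[(usize, i32)], height: usize) -> Vec<i32> {
    let mut winding = vec![0; height];
    for &(edge_y, delta) in fills {
        for w in winding.iter_mut().skip(edge_y) {
            *w += delta;
        }
    }
    winding
}

fn main() {
    // The top edge of the ◀ shape adds +1 below itself (row 1), and the
    // bottom edge adds -1 below itself (row 3).
    let winding = accumulate(&[(1, 1), (3, -1)], 4);
    assert_eq!(winding, vec![0, 1, 1, 0]);
}
```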
-
bzm3r
right
-
pcwalton
along the edges it'll be fractional positive/negative amounts
-
pcwalton
to create the antialiasing
-
raph
That sounds like it could be converted into a fine illustration (hint, hint, bzm3r)
-
bzm3r
raph already working on it! started a bunch of illustrator drawings yesterday
-
raph
The tricky bits are tile crossings
-
raph
I don't fully understand that yet.
-
bzm3r
you mean, path crossings?
-
pcwalton
the question is how to accumulate it but that's where the GPU's blending hardware comes in
-
raph
I mean lines that cross tile boundaries, or another way of putting it is shapes that are more than one tile.
-
pcwalton
I render in a mode where I say "if two fragments overlap the same spot, then add their results" -- glBlendFunc(GL_ONE, GL_ONE) with glBlendEquation(GL_FUNC_ADD)
-
pcwalton
and since it's a signed floating point framebuffer I'm rendering to, it all just works
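(Editor's note: a hedged CPU model of that blend state. The real accumulation happens in GPU blend hardware; this sketch only illustrates the dst = src + dst semantics.)

```rust
// CPU model of additive blending -- glBlendFunc(GL_ONE, GL_ONE) with an ADD
// blend equation into a signed float framebuffer: each fragment's signed
// area is simply summed into its pixel.
fn blend_add(framebuffer: &mut [f32], fragments: &[(usize, f32)]) {
    for &(pixel, signed_area) in fragments {
        // dst = src * 1 + dst * 1
        framebuffer[pixel] += signed_area;
    }
}

fn main() {
    let mut fb = vec![0.0f32; 4];
    // Two fragments land on pixel 2: +0.75 from a "/" fill, -0.25 from a "\" fill.
    blend_add(&mut fb, &[(2, 0.75), (2, -0.25)]);
    assert!((fb[2] - 0.5).abs() < 1e-6);
}
```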
-
raph
pcwalton: nice. That's basically the same as an atomic add, I imagine
-
raph
But blend hardware is super optimized
-
pcwalton
raph: yep, except the GPU has special hardware for it
-
pcwalton
on desktop, specialized ALUs that collect the outgoing fragments from the shading units and execute the operation
-
raph
pcwalton: I'm going to explore my compute idea during a cabin-in-the-woods retreat next week
-
raph
It's going to be very different from this, very much "pull" while you're doing "push"
-
bzm3r
pcwalton for path crossings (where there are two paths crossing within one tile), would pathfinder have two alpha tiles associated with that tile? one for each path?
-
raph
(not to derail this discussion)
-
pcwalton
(BTW, this is why GPU manufacturers don't want to add more blend modes -- it's because they're literally separate ALUs and they don't want to spend more die area down there)
-
bzm3r
and each alpha tile would sample from a solid tile that's relevant to a particular path (if the paths had different colours?)
-
raph
bzm3r: if you have a path with multiple lines, they all accumulate to the same tile
-
raph
but a line might cross multiple tiles, so that means multiple fill instances
-
raph
(sorry if I've gotten that wrong, patrick can probably explain better than i can)
-
raph
two paths with different colors are different tiles of course
-
bzm3r
raph ah yeah, i'm asking about the two paths with different colours case
-
bzm3r
they occupy the same "physical tile" (in terms of screen pixels)
-
bzm3r
but have two different alpha tiles?
-
raph
right, i think for the purpose of accumulation they're completely separate
-
pcwalton
bzm3r: they are separate "objects" and so will have completely different tiles allocated
-
pcwalton
and will eventually be blended during the alpha tile compositing phase
-
pcwalton
during the fill phase they're considered completely independent
-
bzm3r
right
-
raph
the "sampling from a solid tile" is something that happens in alpha tile compositing, not fill, right?
-
bzm3r
raph if i am understanding correctly, the answer is "yes", right?
-
bzm3r
a fill just generates the alpha tile
-
pcwalton
so we never sample from a solid tile
-
pcwalton
solid tiles only exist as rendering commands
-
bzm3r
oh true, i wrote this out in the condensation too...
-
pcwalton
there is a lookup table texture that records the color ("shader") of each object
-
pcwalton
but that's separate
-
pcwalton
raph: so, when it comes to tile crossings, there are basically 3 cases to handle
-
pcwalton
remember that we are doing a top to bottom sweep line
-
pcwalton
(1) horizontal tile crossings -- i.e. a single line crosses multiple tiles
-
pcwalton
in that case, we simply have a loop that clips and generates fill commands
-
pcwalton
let me link the code
-
pcwalton
-
pcwalton
I really should move that into tiles.rs
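(Editor's note: a toy sketch of case (1), simplified to just counting the tile columns a y-monotonic line touches. Names are hypothetical; the real loop clips and emits fill commands per column.)

```rust
// A line spanning several tile columns is chopped at each vertical tile
// boundary, yielding one fill per column crossed.
const TILE_WIDTH: f32 = 16.0;

fn tile_columns_crossed(x0: f32, x1: f32) -> usize {
    let (lo, hi) = if x0 <= x1 { (x0, x1) } else { (x1, x0) };
    let first = (lo / TILE_WIDTH).floor() as i32;
    let last = (hi / TILE_WIDTH).floor() as i32;
    (last - first + 1) as usize
}

fn main() {
    // A line running from x = 5 to x = 40 touches tile columns 0, 1, and 2,
    // so it generates three fill commands.
    assert_eq!(tile_columns_crossed(5.0, 40.0), 3);
    assert_eq!(tile_columns_crossed(3.0, 10.0), 1);
}
```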
-
pcwalton
(2) vertical tile crossings -- in that case we clip at the current tile strip lower edge, handle the top part, and push the bottom part onto the active edge list
github.com/pcwalton/pathfinder/blob/pf3/renderer/src/tiles.rs#L473
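(Editor's note: a hedged sketch of that clip, with hypothetical types; see tiles.rs for the real logic. Assumes the segment is y-monotonic, i.e. from.1 <= to.1.)

```rust
// Clip an edge at the current tile strip's lower edge: keep the top part,
// and return the bottom part for the active edge list.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Segment {
    from: (f32, f32),
    to: (f32, f32),
}

fn clip_at_strip_bottom(seg: Segment, strip_bottom: f32) -> (Segment, Option<Segment>) {
    if seg.to.1 <= strip_bottom {
        return (seg, None); // entirely inside this strip
    }
    let t = (strip_bottom - seg.from.1) / (seg.to.1 - seg.from.1);
    let split = (seg.from.0 + t * (seg.to.0 - seg.from.0), strip_bottom);
    let top = Segment { from: seg.from, to: split };
    let bottom = Segment { from: split, to: seg.to }; // goes on the active edge list
    (top, Some(bottom))
}

fn main() {
    let seg = Segment { from: (0.0, 0.0), to: (16.0, 32.0) };
    let (top, bottom) = clip_at_strip_bottom(seg, 16.0);
    assert_eq!(top.to, (8.0, 16.0));
    assert_eq!(bottom.unwrap().from, (8.0, 16.0));
}
```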
-
pcwalton
a "tile strip" is a horizontal row of tiles
-
raph
ok i'm gonna read those
-
bzm3r
me too
-
bzm3r
and there is a third case?
-
pcwalton
(3) is when you are processing an existing active edge -- and I actually have to page this back into my brain :(
-
pcwalton
-
bzm3r
will look through it!
-
bzm3r
ty very much, as always, for your time
-
pcwalton
the basic idea is that you have to handle fill coming through from tiles above
-
bzm3r
i hope to produce some nice illustrations for you too :)
-
pcwalton
and there are two cases to consider
-
pcwalton
(3a) there is fill coming from tiles above that partially, but not fully, spans the horizontal width of this tile
-
pcwalton
-
pcwalton
(3b) there is fill that completely spans the horizontal width of this tile.
github.com/pcwalton/pathfinder/blob/pf3/renderer/src/tiles.rs#L174
-
raph
so just to make sure i have this right, if you were rendering a shape that was n tiles wide and 1 tile high, you wouldn't need to worry about active edges
-
pcwalton
raph: yes
-
pcwalton
in case (3b) we do not explicitly add a fill operation, but add a constant amount of fill to the tile itself, known as "backdrop"
-
pcwalton
the reason why we do this is that it makes it easy to identify solid tiles -- they are tiles with backdrop != 0 that have no fill operations attached to them
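(Editor's note: the solid-tile rule just stated, as a minimal sketch with a hypothetical struct.)

```rust
// A tile with a nonzero backdrop and no fill operations attached is known
// to be fully covered, i.e. solid.
struct Tile {
    backdrop: i8,
    fill_count: u32,
}

fn is_solid(tile: &Tile) -> bool {
    tile.backdrop != 0 && tile.fill_count == 0
}

fn main() {
    assert!(is_solid(&Tile { backdrop: 1, fill_count: 0 }));
    assert!(!is_solid(&Tile { backdrop: 1, fill_count: 3 })); // edges present: alpha tile
    assert!(!is_solid(&Tile { backdrop: 0, fill_count: 0 })); // empty tile
}
```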
-
pcwalton
also, it cuts down on the load on the GPU blender a bit
-
raph
in that case the backdrop is always an integer winding number?
-
pcwalton
right.
-
pcwalton
it's actually an i8
-
pcwalton
(might need to be expanded in the future)
-
» raph makes a shape which nests 129 concentric paths
-
pcwalton
used to be i16, but then I needed to pack tighter and I made it i8
-
pcwalton
oh well, could always expand it
-
raph
sure sure, i don't think it's a big problem in practice
-
pcwalton
raph: oh, I also figured if it was a problem I could just handle overflow by adding explicit fill operations and making such tiles not solid
-
pcwalton
so "solid tile" would mean "tile whose winding number is between [-128,-1] or [1,127]"
-
pcwalton
which seems fine in practice
-
pcwalton
btw, you can see why I do this on CPU -- it's quite branchy and furthermore each curve in the path can produce a variable number of outputs (alpha tiles, solid tiles, fills). it is possible to do on GPU, but I'm not certain it will really be a speedup on most hardware
-
pcwalton
doing transforms and clipping on GPU would be more of an obvious win. unfortunately that happens before tiling, so it would be CPU -> GPU -> CPU -> GPU. still might be worth it, though
-
pcwalton
even clipping is difficult on GPU, though, because Sutherland-Hodgman clipping creates a varying number of output curves for each input curve
-
pcwalton
some amount of stream compaction will likely be necessary
-
raph
yeah, i don't see doing this exact thing on gpu
-
raph
i do wonder if it might be possible to do something a little more gpu-ish. for example, you could render delta-backdrop, then do cumulative sum to get the actual backdrop
-
raph
or, and i'm totally just spitballing here, you could do the tile-backdrop calculation in one dimension but not the other
-
raph
so you do a cumulative sum (font-rs style) in 1/16th the pixels
-
raph
as opposed to the current backdrop calculation needing to be done for 1/256 the pixels
-
jdm
oh good, there's a servo s3 bucket that I don't have access to
-
raph
(you can tell I like cumulative sum - it's not necessarily a good idea on gpu unless you have compute, and maybe not even then)
-
raph
Ok I understand the algorithm. Good stuff.
-
raph
Lemme see if I understand something (this might be useful for exposition)
-
raph
Say you have an axis-aligned rectangle
-
raph
If the whole thing fits within a single tile in height, then only the horizontal edges contribute, the vertical edges could basically be discarded
-
raph
But if the vertical edges cross tile boundaries, then those crossings generate "active tile fills"
-
raph
which are essentially equivalent to horizontal lines at the top of the tile strip
-
raph
(one difference being that if the active tile fill covers a whole tile, it becomes backdrop)
-
ajeffrey
standups: Uploaded the pathfinder magicleap demo to
dropbox.com/s/ovxfj3ghwghg7xk/PathfinderDemo.mpk
-
crowbot
Status submitted successfully.
-
Manishearth
standups: webxr spec discussions on editor-collab
-
crowbot
Status submitted successfully.
-
BitBot
[Github] (servo/servo) [PR] bors-servo merged #23256 into master: Use clang-cl for Windows builds -
git.io/fjsm3
-
pcwalton
raph: vertical edges can be completely ignored (they are not, but they contribute zero)
-
pcwalton
that is, vertical lines
-
pcwalton
raph: I considered doing prefix sum but decided against it for two reasons
-
pcwalton
(1) it doesn't really result in that many fewer pixels drawn -- the GPU hardware rasterizer does not know how to rasterize a line with anything other than Bresenham's algorithm (which is not sufficient) and therefore I draw the whole axis aligned bounding box for each line
-
pcwalton
the only thing that prefix sum would mean in terms of fragment shader invocation is that I wouldn't have to extend the AABB to the bottom of each tile
-
pcwalton
but I would be paying for that with the actual prefix sum
-
pcwalton
I did some back of the envelope measurements to test this and found that the gains from only drawing the AABB of each line would not be very large
-
pcwalton
(2) prefix sum is not realistically implementable without compute shader. I considered various techniques but they were too ugly for words, and they were way too expensive in terms of ROPs
-
pcwalton
it might be worth revisiting this choice if you had something like CUDA Dynamic Parallelism to be able to launch work groups from within the GPU itself
developer.download.nvidia.com/asset…ief_Dynamic_Parallelism_in_CUDA.pdf
-
pcwalton
but it's probably years before we get this in GL or Vulkan, if we ever get it (it's been 5 years and there has been no interest outside of CUDA)
-
raph
pcwalton: right, I see the limitations of prefix sum on GPU
-
pcwalton
raph: the real benefit of prefix sum would be that it allows for greater parallelism during the tiling phase
-
raph
and to be clear, I'm not advocating prefix sum for the actual device pixels (like font-rs), I'm talking about tile generation
-
pcwalton
I could envision a fully on-GPU tiling phase that uses compute shader on all curves in parallel to generate fills and allocate alpha tiles
-
pcwalton
and then uses prefix sum to propagate fill between the tiles and generate solid tiles
-
pcwalton
in fact I suspect that is how I will want to proceed if and when I have an optional GPU tiling mode
-
pcwalton
however, I don't really see this as practical without compute shader. it might be doable, but not maintainably :)
-
raph
Actually I don't think you need prefix sum for that; if all tiles see all curves, then you can just add up the contribution from each curve
-
pcwalton
so GL3 support demands that I do tiling on CPU, for now.
-
raph
I'm likely to experiment with this next week
-
pcwalton
I suspect that will be asymptotically too slow
-
pcwalton
but I'm happy to be proven wrong
-
pcwalton
raph: my gut feeling tells me to do this:
-
pcwalton
allocate one GPU thread per curve
-
raph
It's very likely what I end up with will be worse than PF3 :)
-
pcwalton
then for each curve sequentially output fills in the PF3 sense, using atomics to allocate alpha tiles as you go
-
pcwalton
happily, most curves in complex SVGs (which are, after all, the SVGs that we care about accelerating, since complex SVGs are the slow ones) only have 1 or 2 fills
-
pcwalton
so there should be quite good parallelism from that
-
raph
right
-
pcwalton
essentially write a geometry shader.
-
pcwalton
in fact, I thought about using geometry shaders and asked Kai from the Chrome team yesterday if we could have them in WebGPU for this -- he said no :)
-
pcwalton
(which I'm fine with)
-
raph
webgpu is on track to have an ok compute shader?
-
raph
so having a curve generate fills makes sense, but how do you generate the backdrops (and the "active edge fills" which can be seen as partial backdrop)?
-
raph
in your current code, this depends on the active edge list being sorted, doing the whole thing in scanline order
-
Manishearth
standups: some looking at webaudio stuff
-
crowbot
Status submitted successfully.
-
pcwalton
raph: yes, Kai demo'd it to me yesterday :)
-
raph
sweet. And it has stuff like ballot and shuffle?
-
pcwalton
dunno the specifics
-
raph
I know I'm limiting myself very much to the future by experimenting with compute
-
raph
And I should be very clear what my goals are here
-
pcwalton
no, it's valuable. if you succeed I'll probably seriously consider merging it into PF
-
pcwalton
given how we're skating right up to the perf edge in VR we need all the help we can get
-
raph
I'm looking to see if I can do something simple and with smooth performance characteristics
-
pcwalton
raph: so here's what I was thinking: you first generate fills in compute shader, and then you go to fill them as usual -- and as you fill, you accumulate "outgoing fill" either in a separate buffer or in the bottom row of each tile
-
pcwalton
or, you could do a separate pass to accumulate outgoing fill, either way
-
raph
Right, that's what I was talking about with the cumulative sums
-
pcwalton
then you use a prefix sum between tiles to propagate the fill around
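(Editor's note: a toy sketch of that propagation, with a hypothetical representation: each tile in a column records the winding delta its edges contribute, and a running sum down the column yields each tile's incoming backdrop.)

```rust
// Prefix (cumulative) sum down one tile column: deltas[i] is the winding
// delta contributed by tile row i; the output is the backdrop entering each
// tile from above.
fn column_backdrops(deltas: &[i32]) -> Vec<i32> {
    let mut backdrops = Vec::with_capacity(deltas.len());
    let mut acc = 0;
    for &delta in deltas {
        backdrops.push(acc); // winding entering this tile from above
        acc += delta;
    }
    backdrops
}

fn main() {
    // A rect whose top edge exits tile row 0 (+1) and whose bottom edge sits
    // in row 2 (-1): rows 1 and 2 receive a backdrop of 1, everything else 0.
    assert_eq!(column_backdrops(&[1, 0, -1, 0]), vec![0, 1, 1, 0]);
}
```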
-
raph
You say prefix sum, I say cumulative sum, let's call the whole thing off
-
pcwalton
:)
-
pcwalton
anyway, then you have one "backdrop" per *vertical column* of each tie
-
pcwalton
tie
-
pcwalton
tile. stupid macbook pro keyboard.
-
raph
Yup. I think that would work.
-
pcwalton
at that point you can allocate solid tiles
-
pcwalton
I do believe that you will never have to allocate any alpha tiles after initial fill generation as long as you make sure to allocate alpha tiles even if there are no actual fills
-
pcwalton
think vertical lines
-
pcwalton
they generate zero-size fills
-
pcwalton
however, you still need to allocate alpha tiles for them because they are partially filled
-
raph
I believe you're correct
-
raph
But I'm not sure we're totally on the same page
-
pcwalton
BTW, the "Massively Parallel Vector Graphics" paper has the same idea, but it calls active edge fills/backdrops "shortcuts"
-
pcwalton
based on the analogy of Sutherland-Hodgman clipping
-
pcwalton
I don't like that term but you might see it around
-
raph
I'm seeing generating those alpha tiles from intersections between vertical lines and tile boundaries
-
raph
But I thought I heard you say vertical lines don't contribute anything
-
pcwalton
right, so consider rendering an axis aligned (but not tile aligned) filled rect
-
pcwalton
the vertical sides need to get alpha tiles allocated
-
pcwalton
however there will not actually be any fills in them -- or at least if you generate a fill it will be a degenerate fill with zero width and therefore no pixels filled
-
pcwalton
all of the paint in those tiles will be generated from the prefix sum pass
-
pcwalton
this is fine, you just need to make sure that they get an alpha tile assigned to them at some point so that you render them
-
pcwalton
in other words, all of their paint will come from the column backdrops generated during prefix sum
-
raph
Yeah, this is making sense.
-
pcwalton
BTW compute shader is perfectly shippable on most devices these days. shipping large-scale desktop software is the exception, not the rule :)
-
pcwalton
(lucky us)
-
raph
Just took another scan through the MPVG paper. It seems pretty complex :/
-
pcwalton
raph: yes, it's very... engineered :)
-
raph
I'm not 100% convinced by quadtrees, they seem maybe more academic than practical
-
pcwalton
right
-
pcwalton
some gfx people proposed similar things to me at the last All Hands
-
pcwalton
I feel I'd need to see that the benefit is commensurate with the complexity
-
raph
that said, have you been able to compare pf3 performance against their approach?
-
pcwalton
I have not
-
raph
we really need people to do that kind of performance evaluation work
-
pcwalton
agreed, it would be good
-
pcwalton
I would like to; it's just, as always, time
-
raph
one thing i *do* potentially see as useful from that paper is green's theorem area evaluation of beziers
-
raph
i think it's not just time, i think there's also a case to be made for unbiased evaluation
-
pcwalton
yes, that is cool, especially since I think we're more memory bound than ALU bound in the fragment shader
-
pcwalton
might as well put those cycles to use
-
raph
that's the sense i get too, anything that reduces the number of primitives is a win, as long as it isn't a ridiculous amount of computation
-
raph
green's theorem itself is pretty simple, it's the crossing tests
-
pcwalton
note that splitting curves at tile boundaries is a pain
-
raph
my personal feeling is that quad bez might be the sweet spot
-
pcwalton
I used to do it before I switched to converting to lines on-the-fly
-
raph
cubics are really hard
-
pcwalton
possibly. monotonic quadratic beziers might help
-
pcwalton
definitely want monotonicity so you don't have to deal with two roots all the time
-
raph
i have a pretty darn good cubic to quadratic converter in kurbo
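(Editor's note: a hedged sketch of the simplest single-quadratic approximation step; kurbo's actual converter is adaptive and far better. The control-point formula (3(c1 + c2) - (p0 + p3)) / 4 is the standard midpoint approximation.)

```rust
// Approximate a cubic Bézier (p0, c1, c2, p3) by one quadratic sharing its
// endpoints, with control point (3(c1 + c2) - (p0 + p3)) / 4.
type Pt = (f64, f64);

fn cubic_to_quad_control(p0: Pt, c1: Pt, c2: Pt, p3: Pt) -> Pt {
    (
        (3.0 * (c1.0 + c2.0) - (p0.0 + p3.0)) / 4.0,
        (3.0 * (c1.1 + c2.1) - (p0.1 + p3.1)) / 4.0,
    )
}

fn main() {
    // A cubic that is a degree-elevated quadratic recovers that quadratic's
    // control point exactly.
    let (q0, q1, q2) = ((0.0, 0.0), (2.0, 4.0), (4.0, 0.0));
    let c1 = (q0.0 + 2.0 / 3.0 * (q1.0 - q0.0), q0.1 + 2.0 / 3.0 * (q1.1 - q0.1));
    let c2 = (q2.0 + 2.0 / 3.0 * (q1.0 - q2.0), q2.1 + 2.0 / 3.0 * (q1.1 - q2.1));
    let q = cubic_to_quad_control(q0, c1, c2, q2);
    assert!((q.0 - q1.0).abs() < 1e-9 && (q.1 - q1.1).abs() < 1e-9);
}
```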
-
pcwalton
a lot of PF3 is the desire to ship something simple so we can build on it later
-
pcwalton
it's been a couple of years, I feel I should show results
-
raph
oh yes, i'm also motivated by simplicity
-
pcwalton
and it's a fairly conservative design that's extensible over time
-
raph
actually, if you don't mind me bending your ear, i can explain more about where my motivation is coming from right now
-
pcwalton
sure
-
raph
i've been following the progress of makepad, and it's quite impressive
-
raph
long story short, it's a very direct generation of gl instances using imgui techniques, with as much intelligence as possible in the shader
-
raph
for example, they have one shader for an animated "tab close" icon
-
raph
performance is impressive, both cpu and gpu are in the 1ms range
-
raph
(they're doing msdf for text, which is a whole nother discussion)
-
raph
so basically if this is the future of gui, then i'm wasting my time with piet
-
raph
i want to explore the question, what if you put the 2d graphics scene graph on the gpu, and evaluate it with compute
-
pcwalton
raph: so before you jump in, note that Unity is rapidly deprecating their IMGUI in favor of Unity UI, which is traditional
-
pcwalton
they did the IMGUI experiment, and are having perf problems, and are throwing it out
-
pcwalton
the jury is still out here
-
raph
right, there are a lot of intertwined problems
-
raph
so i want to know whether gpu compute evaluation of a 2d scene graph can be done performantly
-
pcwalton
and yes, I would like to get Pathfinder to that point, essentially -- putting the 2D graphics scene graph on the GPU and having a full GPU pipeline
-
raph
if yes, then the druid/piet approach is valid
-
pcwalton
WebRender more or less does the Makepad approach
-
raph
if no, that's evidence that you have to design your ui around gpu-friendly primitives to get performance
-
pcwalton
they have special shaders for borders and whatnot
-
pcwalton
I wrote some of them, once upon a time :)
-
raph
rounded rectangles etc?
-
pcwalton
yeah
-
pcwalton
they have extremely optimized paths for rounded rects, CSS borders, even dashing and so forth
-
raph
i'm particularly interested in clipping and opacity
-
pcwalton
yeah, they have fast paths for that, rounded rect clips in particular are very optimized
-
pcwalton
in ways that do not work for arbitrary vector clips
-
raph
sure, easy to believe
-
raph
so basically I don't feel I have to *do* the whole thing
-
pcwalton
lots of axis aligned bounding box tricks
-
pcwalton
you might compare WR perf with Pathfinder on scenes that both can render
-
raph
but I just want to get a swag on whether it's possible
-
pcwalton
to get some initial numbers
-
raph
if so, then i'll push forward with piet and expect pretty good performance on direct2d at least this year
-
pcwalton
this is easy to do, just save a web page from a browser to PDF, then convert the PDF to SVG and render it
-
pcwalton
... via Pathfinder
-
pcwalton
then compare to WebRender rendering the page directly
-
raph
right
-
pcwalton
beware of text though, PF has no glyph caching optimizations
-
raph
text is kinda the interesting part :)
-
pcwalton
well, as far as text is concerned, the big question in my mind is whether you want (a) static 2D text, (b) smoothly zoomable 2D text, or (c) 3D text
-
raph
but the alternative hypothesis is that a 2d api such as piet is a performance bottleneck that will kill performance in the long run
-
pcwalton
I don't know…I used to be skeptical of traditional 2D APIs but now with PF's experience I'm warming up to them
-
raph
certainly if you have to do a lot of the work on cpu that is true
-
pcwalton
let me put it this way
-
pcwalton
I don't think that the current Web, or desktop UIs, on a flat screen is a particularly good fit for a traditional 2D vector API descended from PostScript
-
raph
so i basically want to explore uploading bits of scene graph to the gpu and letting the gpu basically run the whole pipeline
-
pcwalton
mostly because they involve translating relatively static axis aligned boxes
-
pcwalton
to accelerate this workload, it's mostly blitting. vector needs are minimal if any
-
pcwalton
for text, you typically just render to a glyph atlas and then call it a day
-
pcwalton
however, for game UI, or for UI in VR, that's a whole 'nother ball game
-
raph
I feel like I understand this part of the space pretty well
-
pcwalton
game UI is mostly arbitrary vector scenes, often designed in Flash, that are frequently scaled
-
pcwalton
and VR of course is even more so
-
pcwalton
so tl;dr I think WebRender/Makepad's approach is probably best for desktop and mobile apps
-
raph
I think the main part I don't understand well is getting the scene out of the UI logic and onto the GPU
-
pcwalton
ah, I see what you mean
-
raph
The other thing I want to explore here is immediate / retained
-
raph
druid is very old-school retained, in sharp contrast to makepad
-
pcwalton
well, you are always going to have a step that generates GPU buffers from a CPU-side display list. I don't see a way around it
-
raph
ah, but i think there *is* something we can do
-
pcwalton
a CPU-side data structure
-
pcwalton
one could imagine a GPU data structure that you incrementally update
-
pcwalton
that may be what you were about to propose :)
-
raph
which is generally inspired by flutter - basically widgets render into "layers", which are bits of display list
-
bzm3r
(just caught up to this discussion, ...so much to learn....but pretty cool :) )
-
raph
clipping and translation are just nodes in the scene graph, so if you want to scroll, just update those, the contents of the scrolled window are retained
-
raph
the sense i get is that wr tries to do a bunch of this, but makepad is aggressively immediate
-
raph
the world gets drawn from zero every frame
-
pcwalton
raph: I mean, that's true for WR too
-
pcwalton
I don't see how you don't redraw the world from zero every frame
-
raph
i mean it generates all instances on the cpu and uploads them every frame
-
pcwalton
That's what WR does too
-
pcwalton
well, ok, not for scrolling
-
raph
so any sense of retaining it has is to avoid traversing the dom?
-
raph
right, scrolling
-
bzm3r
(dom stands for?)
-
raph
maybe what i'm talking about is generalizing what wr does for scrolling
-
raph
document object model, it's the representation of the web content
-
raph
i see adding scaling and opacity, for pinch-to-zoom and layer fade effects
-
raph
anyway, you upload a tree root (i'm thinking in immutable style) and then tell the gpu "go" and it does the rest
-
raph
so very much *unlike* flutter and ios you're not rendering these layers to textures for compositing
-
raph
so it's entirely possible i'm smoking crack
-
raph
but i want to build a rough prototype and take it for a spin on real hardware
-
raph
i don't expect this retained stuff to cover for too many sins, it still has to be fast to build the scene graph
-
pcwalton
yeah, having a retained GPU data structure sounds doable
-
pcwalton
that you then render with indirect draw
-
raph
but there i'm actually not worried, i see this as a fairly thin serialization/packing step over the traditional 2d api
-
pcwalton
this is what indirect draw was made for -- putting scene graphs on GPU
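To make the indirect-draw idea concrete: instead of the CPU issuing each draw call, the GPU reads draw parameters from a buffer. The struct below mirrors OpenGL's `DrawArraysIndirectCommand` layout; the helper that builds commands from visible instance runs is a hypothetical sketch of how a culling pass could feed an indirect buffer:

```rust
// Matches the layout OpenGL expects for glMultiDrawArraysIndirect.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct DrawArraysIndirectCommand {
    count: u32,          // vertices per draw
    instance_count: u32, // instances per draw
    first: u32,          // first vertex
    base_instance: u32,  // first instance
}

// Hypothetical: a culling pass over the scene graph yields runs of visible
// instances as (base_instance, instance_count) pairs; emit one indirect
// command per run. A GPU-driven pipeline would append these on the GPU.
fn commands_for_visible(
    visible_instance_runs: &[(u32, u32)],
) -> Vec<DrawArraysIndirectCommand> {
    visible_instance_runs
        .iter()
        .map(|&(base, count)| DrawArraysIndirectCommand {
            count: 6, // one quad (two triangles) per instance
            instance_count: count,
            first: 0,
            base_instance: base,
        })
        .collect()
}
```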
-
raph
one thing that i do expect to be done on cpu is bounding boxes
-
raph
(basically per-tile i want to do a bbox culled traversal of the graph)
-
raph
i think for text glyphs etc you already have a bbox, i don't see this adding significant cost cpu-side
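The per-tile bbox-culled traversal raph mentions could be sketched like this (illustrative Rust; `CullNode`, `BBox`, and `prim_id` are invented names, not makepad types). Each node carries a bounding box, and a tile only descends into subtrees whose boxes intersect it:

```rust
#[derive(Clone, Copy)]
struct BBox { x0: f32, y0: f32, x1: f32, y1: f32 }

impl BBox {
    fn intersects(&self, other: &BBox) -> bool {
        self.x0 < other.x1 && other.x0 < self.x1 &&
        self.y0 < other.y1 && other.y0 < self.y1
    }
}

struct CullNode {
    bbox: BBox,
    prim_id: Option<u32>,     // leaf primitive (glyph, fill, ...)
    children: Vec<CullNode>,
}

// Collect the primitives that can affect one tile, skipping entire
// subtrees whose bounding boxes miss it.
fn cull(node: &CullNode, tile: &BBox, out: &mut Vec<u32>) {
    if !node.bbox.intersects(tile) {
        return;
    }
    if let Some(id) = node.prim_id {
        out.push(id);
    }
    for child in &node.children {
        cull(child, tile, out);
    }
}
```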
-
pcwalton
yeah, you will want to measure the perf of doing things on CPU before deciding to incur the complexity of moving to GPU
-
raph
yes, i think it's *similar* to indirect draw but very different because my primitives are 2d not 3d
-
raph
so this is why i'm going all in on compute, rather than trying to map to things that existing 3d pipelines do well
-
raph
in any case, it feels worth playing with
-
raph
in the worst case, it won't work but i will have learned a lot :)
-
bzm3r
pcwalton why is create_vertex_array only ever used by the GroundLineDrawing object in the demo?
-
pcwalton
bzm3r: because the actual vector rendering code is in renderer/src/gpu/renderer.rs
-
bzm3r
found it :)
-
bzm3r
so every fill has to create a vertex array
-
bzm3r
and we don't really know up front how many fills there are going to be
-
BitBot
[Github] (servo/servo) [issue] Manishearth opened #23269: Gamepad should be live -
git.io/fjsV8
-
Manishearth
standups: gamepad liveness digging
servo/servo #23269
-
crowbot
Issue #23269: Gamepad should be live -
servo/servo #23269
-
crowbot
Status submitted successfully.
-
raph
By my read, there's a vertex array for a *slice* of fills, might be amortized over a fairly large number
-
pcwalton
correct
-
bzm3r
raph there can be an indeterminate number of slices of fills too though?
-
raph
it tries to do MAX_FILLS_PER_BATCH, which is 16000
-
bzm3r
right
-
bzm3r
right right right
-
raph
16384, sorry
-
raph
over 9000 in any case
-
bzm3r
heh
-
bzm3r
my head's hurting a bit
-
pcwalton
raph: sorry, there are two MAX_FILLS_PER_BATCH, one of which is unused :(
-
» pcwalton makes a note to fix that
-
pcwalton
it's 0x1000
-
pcwalton
err, sorry, 0x4000
-
pcwalton
yeah, 16384
-
pcwalton
bzm3r: correct, there are an indeterminate number. we stream batches of 16K fills to the GPU as soon as they are ready.
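The streaming behaviour pcwalton describes can be sketched as follows (illustrative Rust, not Pathfinder's real types; `Fill` and the flush bookkeeping are placeholders). Fills accumulate and each full batch of MAX_FILLS_PER_BATCH (0x4000 = 16384) is sent to the GPU as soon as it's ready:

```rust
const MAX_FILLS_PER_BATCH: usize = 0x4000; // 16384

#[derive(Clone, Copy)]
struct Fill { tile_index: u32 }

struct FillBatcher {
    pending: Vec<Fill>,
    batches_flushed: usize,
}

impl FillBatcher {
    fn new() -> Self {
        FillBatcher { pending: Vec::new(), batches_flushed: 0 }
    }

    // Stream a fill; flush a full batch to the GPU as soon as it's ready.
    fn push(&mut self, fill: Fill) {
        self.pending.push(fill);
        if self.pending.len() == MAX_FILLS_PER_BATCH {
            self.flush();
        }
    }

    // In the real renderer this would upload a vertex buffer and draw;
    // here we only count batches.
    fn flush(&mut self) {
        if !self.pending.is_empty() {
            self.batches_flushed += 1;
            self.pending.clear();
        }
    }
}
```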
-
bzm3r
pcwalton so, opengl "manages" vertex array object creation for you, but in vulkan/gfx-hal you manage the equivalent ("AttributeDescs") yourself --- basically, there's going to be one vertex buffer each for fills and alpha tiles
-
bzm3r
the data within each buffer will be refreshed as new batches are created
-
pcwalton
yes
-
bzm3r
yeah, then i think i have a good handle on what `create_vertex_array` should do for the gfx-hal backend: it will create an attribute desc, and then these attribute descs will be stored within the Device struct
-
bzm3r
at "program" (pipeline in gfx-hal land) creation, use the stored attribute descs to fill out necessary info for pipeline creation
-
raph
bzm3r: you might want to have two buffers, so you can be filling one while the gpu is drawing the other
-
raph
you have to have barriers etc so you don't start writing into a buffer until the gpu is done consuming it
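Raph's double-buffering suggestion, as a minimal sketch (illustrative Rust; a real gfx-hal backend would pair each buffer with a fence or barrier and wait on it before reuse, which is elided here). The CPU writes one buffer while the GPU reads the other, and the roles swap each frame:

```rust
struct DoubleBuffered<T> {
    buffers: [T; 2],
    frame: usize,
}

impl<T> DoubleBuffered<T> {
    fn new(a: T, b: T) -> Self {
        DoubleBuffered { buffers: [a, b], frame: 0 }
    }

    // Buffer the CPU may fill this frame. A real backend would first wait
    // on the fence signalling the GPU has finished consuming it.
    fn write_buffer(&mut self) -> &mut T {
        &mut self.buffers[self.frame % 2]
    }

    // Buffer the GPU is reading (the one written last frame).
    fn read_buffer(&self) -> &T {
        &self.buffers[(self.frame + 1) % 2]
    }

    fn next_frame(&mut self) {
        self.frame += 1;
    }
}
```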
-
bzm3r
raph yeah, was reading a bunch about that yesterday
-
bzm3r
put the relevant resources into the graphics resource dump too
-
bzm3r
i'm still somewhat far from that point though
-
BitBot
[Github] (servo/servo) [PR] Manishearth requested #23270 merge into master: Add input validation for AudioParam methods -
git.io/fjsV6
-
bzm3r
for now i was just figuring out what create_vertex_array should do (recall our discussion yesterday? i've changed my mind again, after seeing how renderer.rs uses the device trait)
-
Manishearth
standups: webaudio validation
servo/servo #23270
-
crowbot
PR #23270: Add input validation for AudioParam methods -
servo/servo #23270
-
crowbot
Status submitted successfully.
-
bzm3r
(so i'll stick as closely as possible to the device trait, and only make small modifications like &mut)
-
raph
sounds right
-
BitBot
[Github] (servo/servo) [issue] Manishearth opened #23271: Audioparam tests seem to have lots of failures with the specific values -
git.io/fjsVp
-
pcwalton
yay, reprojection works now with improved quality
-
pcwalton
and no magic constants, everything is worked out on paper :)