Tries to reset returned queries in background when possible, rather than ending the render pass.
Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?)
This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful.
Avoids ending render passes to update buffer data (not all the time)
- 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI)
- Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI)
As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory)
Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here.
Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage.
This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now.
Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are.
Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator.
- Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module.
- NewInstruction is called instead of new Instruction()
- Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually.
- Estimate code size when creating the output MemoryStream
- LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast.
Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call.
TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads.
- Dictionary for lookups of type declarations, constants, extinst
- LiteralInteger internal data format -> ushort
- Deterministic HashCode implementation to avoid spirv result not being the same between runs
- Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost)
TODO: improve instruction allocation, structured program creator, ssa?
Passes BinaryWriter instead of the stream to Write and WriteOperand
- Removes creation of BinaryWriter for each instruction
- Removes allocations for literal string