Commit graph

165 commits

Author SHA1 Message Date
gdkchan 2232e4ae87
Vulkan backend (#2518)
* WIP Vulkan implementation

* No need to initialize attributes on the SPIR-V backend anymore

* Allow multithreading shaderc and vkCreateShaderModule

You'll only really see the benefit here with threaded-gal or parallel shader cache compile.

Fix shaderc multithreaded changes

Thread safety for shaderc Options constructor

Dunno how they managed to make a constructor not thread safe, but you do you. May avoid some freezes.

* Support multiple levels/layers for blit.

Fixes MK8D when scaled, maybe a few other games. AMD software "safe" blit not supported right now.

* TextureStorage should hold a ref of the foreign storage, otherwise it might be freed while in use

* New depth-stencil blit method for AMD

* Workaround for AMD driver bug

* Fix some tessellation related issues (still doesn't work?)

* Submit command buffer before Texture GetData. (UE4 fix)

* DrawTexture support

* Fix BGRA on OpenGL backend

* Fix rebase build break

* Support format aliasing on SetImage

* Fix uniform buffers being lost when bindings are out of order

* Fix storage buffers being lost when bindings are out of order

(also avoid allocations when changing bindings)

* Use current command buffer for unscaled copy (perf)

Avoids flushing commands and renting a command buffer when fulfilling copy dependencies and when games do unscaled copies.

* Update to .net6

* Update Silk.NET to version 2.10.1

Somehow, massive performance boost. Seems like their vtable for looking up vulkan methods was really slow before.

* Fix PrimitivesGenerated query, disable Transform Feedback queries for now

Lets Splatoon 2 work on nvidia. (mostly)

* Update counter queue to be similar to the OGL one

Fixes softlocks when games had to flush counters.

* Don't throw when ending conditional rendering for now

This should be re-enabled when conditional rendering is enabled on nvidia etc.

* Update findMSB/findLSB to match master's instruction enum

* Fix triangle overlay on SMO, Captain Toad, maybe others?

* Don't make Intel Mesa pay for Intel Windows bugs

* Fix samplers with MinFilter Linear or Nearest (fixes New Super Mario Bros U Deluxe black borders)

* Update Spv.Generator

* Add alpha test emulation on shader (but no shader specialisation yet...)

* Fix R4G4B4A4Unorm texture format permutation

* Validation layers should be enabled for any log level other than None

* Add barriers around vkCmdCopyImage

Write->Read barrier for src image (we want to wait for a write to read it)
Write->Read barrier for dst image (we want to wait for the copy to complete before use)

* Be a bit more careful with texture access flags, since it can be used for anything

* Device local mapping for all buffers

May avoid issues with drivers with NVIDIA on linux/older gpus on windows when using large buffers (?)
Also some performance things and fixes issues with opengl games loading textures weird.

* Cleanup, disable device local buffers for now.

* Add single queue support

Multiqueue seems to be a bit more responsive on NVIDIA. Should fix texture flush on intel. AMD has been forced to single queue for an experiment.

* Fix some validation errors around extended dynamic state

* Remove Intel bug workaround, it was fixed on the latest driver

* Use circular queue for checking consumption on command buffers

Speeds up games that spam command buffers a little. Avoids checking multiple command buffers if multiple are active at once.

* Use SupportBufferUpdater, add single layer flush

* Fix counter queue leak when game decides to use host conditional rendering

* Force device local storage for textures (fixes linux performance)

* Port #3019

* Insert barriers around vkCmdBlitImage (may fix some amd flicker)

* Fix transform feedback on Intel, gl_Position feedback and clears to inexistent depth buffers

* Don't pause transform feedback for multi draw

* Fix draw outside of render pass and missing capability

* Workaround for wrong last attribute on AMD (affects FFVII, STRIKERS1945, probably more)

* Better workaround for AMD vertex buffer size alignment issue

* More instructions + fixes on SPIR-V backend

* Allow custom aspect ratio on Vulkan

* Correct GTK UI status bar positions

* SPIR-V: Functions must always end with a return

* SPIR-V: Fix ImageQuerySizeLod

* SPIR-V: Set DepthReplacing execution mode when FragDepth is modified

* SPIR-V: Implement LoopContinue IR instruction

* SPIR-V: Geometry shader support

* SPIR-V: Use correct binding number on storage buffers array

* Reduce allocations for Spir-v serialization

Passes BinaryWriter instead of the stream to Write and WriteOperand

- Removes creation of BinaryWriter for each instruction
- Removes allocations for literal string

* Some optimizations to Spv.Generator

- Dictionary for lookups of type declarations, constants, extinst
- LiteralInteger internal data format -> ushort
- Deterministic HashCode implementation to avoid spirv result not being the same between runs
- Inline operand list instead of List<T>, falls back to array if many operands. (large performance boost)

TODO: improve instruction allocation, structured program creator, ssa?

* Pool Spv.Generator resources, cache delegates, spv opts

- Pools for Instructions and LiteralIntegers. Can be passed in when creating the generator module.
  - NewInstruction is called instead of new Instruction()
  - Ryujinx SpirvGenerator passes in some pools that are static. The idea is for these to be shared between threads eventually.
- Estimate code size when creating the output MemoryStream
- LiteralInteger pools using ThreadStatic pools that are initialized before and after creation... not sure of a better way since the way these are created is via implicit cast.

Also, cache delegates for Spv.Generator for functions that are passed around to GenerateBinary etc, since passing the function raw creates a delegate on each call.

TODO: update python spv cs generator to make the coregrammar with NewInstruction and the `params` overloads.

* LocalDefMap for Ssa Rewriter

Rather than allocating a large array of all registers for each block in the shader, allocate one array of all registers and clear it between blocks. Reduces allocations in the shader translator.

* SPIR-V: Transform feedback support

* SPIR-V: Fragment shader interlock support (and image coherency)

* SPIR-V: Add early fragment tests support

* SPIR-V: Implement SwizzleAdd, add missing Triangles ExecutionMode for geometry shaders, remove SamplerType field from TextureMeta

* Don't pass depth clip state right now (fix decals)

Explicitly disabling it is incorrect. OpenGL currently automatically disables based on depth clamp, which is the behaviour if this state is omitted.

* Multisampling support

* Multisampling: Use resolve if src samples count > dst samples count

* Multisampling: We can only resolve for unscaled copies

* SPIR-V: Only add FSI exec mode if used.

* SPIR-V: Use ConstantComposite for Texture Offset Vector

Fixes a bunch of freezes with SPIR-V on AMD hardware, and validation errors. Note: Obviously assumes input offsets are constant, which they currently are.

* SPIR-V: Don't OpReturn if we already OpExit'ed

Fixes spir-v parse failure and stack smashing in RADV (obviously you still need bolist)

* SPIR-V: Only use input attribute type for input attributes

Output vertex attributes should always be of type float.

* Multithreaded Pipeline Compilation

* Address some feedback

* Make this 32

* Update topology with GpuAccessorState

* Cleanup for merge (note: disables spir-v)

* Make more robust to shader compilation failure

- Don't freeze when GLSL compilation fails
- Background SPIR-V pipeline compile failure results in skipped draws, similar to GLSL compilation failure.

* Fix Multisampling

* Only update fragment scale count if a vertex texture needs a scale.

Fixes a performance regression introduced by texture scaling in the vertex stage where support buffer updates would be very frequent, even at 1x, if any textures were used on the vertex stage.

This check doesn't exactly look cheap (a flag in the shader stage would probably be preferred), but it is much cheaper than uploading scales in both vulkan and opengl, so it will do for now.

* Use a bitmap to do granular tracking for buffer uploads.

This path is only taken if the much faster check of "is the buffer rented at all" is triggered, so it doesn't actually end up costing too much, and the time saved by not ending render passes (and on gpu for not waiting on barriers) is probably helpful.

Avoids ending render passes to update buffer data (not all the time)
- 140-180 to 35-45 in SMO metro kingdom (these updates are in the UI)
- Very variable 60-150(!) to 16-25 in mario kart 8 (these updates are in the UI)

As well as allowing more data to be preloaded persistently, this will also allow more data to be loaded in the preload buffer, which should be faster as it doesn't need to insert barriers between draws. (and on tbdr, does not need to flush and reload tile memory)

Improves performance in GPU limited scenarios. Should notably improve performance on TBDR gpus. Still a lot more to do here.

* Copy query results after RP ends, rather than ending to copy

We need to end the render pass to get the data (submit command buffer) anyways...

Reduces render passes created in games that use queries.

* Rework Query stuff a bit to avoid render pass end

Tries to reset returned queries in background when possible, rather than ending the render pass.

Still ends render pass when resetting a counter after draws, but maybe that can be solved too. (by just pulling an empty object off the pool?)

* Remove unnecessary lines

Was for testing

* Fix validation error for query reset

Need to think of a better way to do this.

* SPIR-V: Fix SwizzleAdd and some validation errors

* SPIR-V: Implement attribute indexing and StoreAttribute

* SPIR-V: Fix TextureSize for MS and Buffer sampler types

* Fix relaunch issues

* SPIR-V: Implement LogicalExclusiveOr

* SPIR-V: Constant buffer indexing support

* Ignore unsupported attributes rather than throwing (matches current GLSL behaviour)

* SPIR-V: Implement tessellation support

* SPIR-V: Geometry shader passthrough support

* SPIR-V: Implement StoreShader8/16 and StoreStorage8/16

* SPIR-V: Resolution scale support and fix TextureSample multisample with LOD bug

* SPIR-V: Fix field index for scale count

* SPIR-V: Fix another case of wrong field index

* SPIRV/GLSL: More scaling related fixes

* SPIR-V: Fix ImageLoad CompositeExtract component type

* SPIR-V: Workaround for Intel FrontFacing bug

* Enable SPIR-V backend by default

* Allow null samplers (samplers are not required when only using texelFetch to access the texture)

* Fix some validation errors related to texel block view usage flag and invalid image barrier base level

* Use explicit subgroup size if we can (might fix some block flickering on AMD)

* Take componentMask and scissor into account when clearing framebuffer attachments

* Add missing barriers around CmdFillBuffer (fixes Monster Hunter Rise flickering on NVIDIA)

* Use ClampToEdge for Clamp sampler address mode on Vulkan (fixes Hollow Knight)

Clamp is unsupported on Vulkan, but ClampToEdge behaves almost the same. ClampToBorder on the other hand (which was being used before) is pretty different

* Shader specialization for new Vulkan required state (fixes remaining alpha test issues, vertex stretching on AMD on Crash Bandicoot, etc)

* Check if the subgroup size is supported before passing a explicit size

* Only enable ShaderFloat64 if the GPU supports it

* We don't need to recompile shaders if alpha test state changed but alpha test is disabled

* Enable shader cache on Vulkan and implement MultiplyHighS32/U32 on SPIR-V (missed those before)

* Fix pipeline state saving before it is updated.

This should fix a few warnings and potential stutters due to bad pipeline states being saved in the cache. You may need to clear your guest cache.

* Allow null samplers on OpenGL backend

* _unit0Sampler should be set only for binding 0

* Remove unused PipelineConverter format variable (was causing IOR)

* Raise textures limit to 64 on Vulkan

* No need to pack the shader binaries if shader cache is disabled

* Fix backbuffer not being cleared and scissor not being re-enabled on OpenGL

* Do not clear unbound framebuffer color attachments

* Geometry shader passthrough emulation

* Consolidate UpdateDepthMode and GetDepthMode implementation

* Fix A1B5G5R5 texture format and support R4G4 on Vulkan

* Add barrier before use of some modified images

* Report 32 bit query result on AMD windows (smo issue)

* Add texture recompression support (disabled for now)

It recompresses ASTC textures into BC7, which might reduce VRAM usage significantly on games that uses ASTC textures

* Do not report R4G4 format as supported on Vulkan

It was causing mario head to become white on Super Mario 64 (???)

* Improvements to -1 to 1 depth mode.

- Transformation is only applied on the last stage in the vertex pipeline.
- Should fix some issues with geometry and tessellation (hopefully)
- Reading back FragCoord Z on fragment will transform back to -1 to 1.

* Geometry Shader index count from ThreadsPerInputPrimitive

Generally fixes SPIR-V emitting too many triangles, may change games in OpenGL

* Remove gl_FragDepth scaling

This is always 0-1; the other two issues were causing the problems. Fixes regression with Xenoblade.

* Add Gl StencilOp enum values to Vulkan

* Update guest cache to v1.1 (due to specialization state changes)

This will explode your shader cache from earlier vulkan build, but it must be done. 😔

* Vulkan/SPIR-V support for viewport inverse

* Fix typo

* Don't create query pools for unsupported query types

* Return of the Vector Indexing Bug

One day, everyone will get this right.

* Check for transform feedback query support

Sometimes transform feedback is supported without the query type.

* Fix gl_FragCoord.z transformation

FragCoord.z is always in 0-1, even when the real depth range is -1 to 1. Turns out the only bug was geo and tess stage outputs.

Fixes Pokemon Sword/Shield, possibly others.

* Fix Avalonia Rebase

Vulkan is currently not available on Avalonia, but the build does work and you can use opengl.

* Fix headless build

* Add support for BC6 and BC7 decompression, decompress all BC formats if they are not supported by the host

* Fix BCn 4/5 conversion, GetTextureTarget

BCn 4/5 could generate invalid data when a line's size in bytes was not divisible by 4, which both backends expect.

GetTextureTarget was not creating a view with the replacement format.

* Fix dependency

* Fix inverse viewport transform vector type on SPIR-V

* Do not require null descriptors support

* If MultiViewport is not supported, do not try to set more than one viewport/scissor

* Bounds check on bitmap add.

* Flush queries on attachment change rather than program change

Occlusion queries are usually used in a depth only pass so the attachments changing is a better indication of the query block ending.

Write mask changes are also considered since some games do depth only pass by setting 0 write mask on all the colour targets.

* Add support for avalonia (#6)

* add avalonia support

* only lock around skia flush

* addressed review

* cleanup

* add fallback size if avalonia attempts to render but the window size is 0. read desktop scale after enabling dpi check

* fix getting window handle on linux. skip render is size is 0

* Combine non-buffer with buffer image descriptor sets

* Support multisample texture copy with automatic resolve on Vulkan

* Remove old CompileShader methods from the Vulkan backend

* Add minimal pipeline layouts that only contains used bindings

They are used by helper shaders, the intention is avoiding needing to recompile the shaders (from GLSL to SPIR-V) if the bindings changes on the translated guest shaders

* Pre-compile helper shader as SPIR-V, and some fixes

* Remove pre-compiled shaderc binary for Windows as its no longer needed by default

* Workaround RADV crash

Enabling the descriptor indexing extension, even if it is not used, forces the radv driver to use "bolist".

* Use RobustBufferAccess on NVIDIA gpus

Avoids the SMO waterfall triangle on older NVIDIA gpus.

* Implement GPU selector and expose texture recompression on the UI and config

* Fix and enable background compute shader compilation

Also disables warnings from shader cache pipeline misses.

* Fix error due to missing subpass dependency when Attachment Write -> Shader Read barriers are added

* If S8D24 is not supported, use D32FS8

* Ensure all fences are destroyed on dispose

* Pre-allocate arrays up front on DescriptorSetUpdater, allows the removal of some checks

* Add missing clear layer parameter after rebase

* Use selected gpu from config for avalonia (#7)

* use configured device

* address review

* Fix D32S8 copy workaround (AMD)

Fixes water in Pokemon Legends Arceus on AMD GPUs. Possibly fixes other things.

* Use push descriptors for uniform buffer updates (disabled for now)

* Push descriptor support check, buffer redundancy checks

Should make push descriptors faster, needs more testing though.

* Increase light command buffer pool to 2 command buffers, throw rather than returning invalid cbs

* Adjust bindings array sizes

* Force submit command buffers if memory in use by its resources is high

* Add workaround for AMD GCN cubemap view sins

`ImageCreateCubeCompatibleBit` seems to generally break 2D array textures with mipmaps... even if they are eventually aliased as a cubemap with mipmaps. Forcing a copy here works around the issue.

This could be used in future if enabling this bit reduces performance on certain GPUs. (mobile class is generally a worry)

Currently also enabled on Linux as I don't know if they managed to dodge this bug (someone please tell me). Not enabled on Vega at the moment, but easy to add if the issue is there.

* Add mobile, non-RX variants to the GCN regex.

Also make sure that the 3 digit ones only include numbers starting with 7 or 8.

* Increase image limit per stage from 8 to 16

Xenoblade Chronicles 2 was hiting the limit of 8

* Minor code cleanup

* Fix NRE caused by SupportBufferUpdater calling pipeline ClearBuffer

* Add gpu selector to Avalonia (#8)

* Add gpu selector to avalonia settings

* show backend label on window

* some fixes

* address review

* Minor changes to the Avalonia UI

* Update graphics window UI and locales. (#9)

* Update xaml and update locales

* locale updates

Did my best here but likely needs to be checked by native speakers, especially the use of ampersands in greek, russian and turkish?

* Fix locales with more (?) correct translations.

* add separator to render widget

* fix spanish and portuguese

* Add new IdList, replaces buffer list that could not remove elements and had unbounded growth

* Don't crash the settings window if Vulkan is not supported

* Fix Actions menu not being clickable on GTK UI after relaunch

* Rename VulkanGraphicsDevice to VulkanRenderer and Renderer to OpenGLRenderer

* Fix IdList and make it not thread safe

* Revert useless OpenGL format table changes

* Fix headless project build

* List throws ArgumentOutOfRangeException

* SPIR-V: Fix tessellation

* Increase shader cache version due to tessellation fix

* Reduce number of Sync objects created (improves perf in some specific titles)

* Fix vulkan validation errors for NPOT compressed upload and GCN workaround.

* Add timestamp to the shader cache and force rebuild if host cache is outdated

* Prefer Mail box present mode for popups (#11)

* Prefer Mail box present mode

* fix debug

* switch present mode when vsync is toggled

* only disable vsync on the main window

* SPIR-V: Fix geometry shader input load with transform feedback

* BC7 Encoder: Prefer more precision on alpha rather than RGB when alpha is 0

* Fix Avalonia build

* Address initial PR feedback

* Only set transform feedback outputs on last vertex stage

* Address riperiperi PR feedback

* Remove outdated comment

* Remove unused constructor

* Only throw for negative results

* Throw for QueueSubmit and other errors

No point in delaying the inevitable

* Transform feedback decorations inside gl_PerVertex struct breaks the NVIDIA compiler

* Fix some resolution scale issues

* No need for two UpdateScale calls

* Fix comments on SPIR-V generator project

* Try to fix shader local memory size

On DOOM, a shader is using local memory, but both Low and High size are 0, CRS size is 1536, it seems to store on that region?

* Remove RectangleF that is now unused

* Fix ImageGather with multiple offsets

Needs ImageGatherExtended capability, and must use `ConstantComposite` instead of `CompositeConstruct`

* Address PR feedback from jD in all projects except Avalonia

* Address most of jD PR feedback on Avalonia

* Remove unsafe

* Fix VulkanSkiaGpu

* move present mode request out of Create Swapchain method

* split more parts of create swapchain

* addressed reviews

* addressed review

* Address second batch of jD PR feedback

* Fix buffer <-> image copy row length and height alignment

AlignUp helper does not support NPOT alignment, and ASTC textures can have NPOT block sizes

* Better fix for NPOT alignment issue

* Use switch expressions on Vulkan EnumConversion

Thanks jD

* Fix Avalonia build

* Add Vulkan selection prompt on startup

* Grammar fixes on Vulkan prompt message

* Add missing Vulkan migration flag

Co-authored-by: riperiperi <rhy3756547@hotmail.com>
Co-authored-by: Emmanuel Hansen <emmausssss@gmail.com>
Co-authored-by: MutantAura <44103205+MutantAura@users.noreply.github.com>
2022-07-31 18:26:06 -03:00
gdkchan 37b6e081da
Fix DMA linear texture copy fast path (#3496)
* Fix DMA linear texture copy fast path

* Formatting
2022-07-28 13:46:12 -03:00
gdkchan 3c3bcd82fe
Add a sampler pool cache and improve texture pool cache (#3487)
* Add a sampler pool cache and improve texture pool cache

* Increase disposal timestamp delta more to be on the safe side

* Nits

* Use abstract class for PoolCache, remove factory callback
2022-07-27 21:07:48 -03:00
riperiperi 5811d121df
Avoid scaling 2d textures that could be used as 3d (#3464) 2022-07-15 09:24:13 -03:00
gdkchan 5afd521c5a
Bindless elimination for constant sampler handle (#3424)
* Bindless elimination for constant sampler handle

* Shader cache version bump

* Update TextureHandle.ReadPackedId for new bindless elimination
2022-07-02 15:03:35 -03:00
gdkchan 625f5fb88a
Account for pool change on texture bindings cache (#3420)
* Account for pool change on texture bindings cache

* Reduce the number of checks needed
2022-06-25 16:52:38 +02:00
gdkchan e747f5cd83
Ensure texture ID is valid before getting texture descriptor (#3406) 2022-06-24 02:41:57 +02:00
riperiperi 68f9091870
Account for res scale changes when updating bindings (#3403)
Fixes a regression introduced by the texture bindings PR.

Also renames TextureStatePerStage, as it's no longer per stage.
2022-06-17 17:41:38 -03:00
riperiperi 99ffc061d3
Optimize Texture Binding and Shader Specialization Checks (#3399)
* Changes 1

* Changes 2

* Better ModifiedSequence handling

This should handle PreciseEvents properly, and simplifies a few things.

* Minor changes, remove debug log

* Handle stage.Info being null

Hopefully fixes Catherine crash

* Fix shader specialization fast texture lookup

* Fix some things.

* Address Feedback Part 1

* Make method static.
2022-06-17 13:09:14 -03:00
gdkchan 851f56b08a
Support Array/3D depth-stencil render target, and single layer clears (#3400)
* Support Array/3D depth-stencil render target, and single layer clears

* Alignment
2022-06-14 13:30:39 -03:00
gdkchan a3e7bb8eb4
Copy dependency for multisample and non-multisample textures (#3382)
* Use copy dependency for textures that differs in multisample but are otherwise compatible

* Remove allowMs flag as it's no longer required for correctness, it's just an optimization now

* Dispose intermmediate pool
2022-06-05 14:06:47 -03:00
riperiperi d64594ec74
Fix various issues with texture sync (#3302)
* Fix various issues with texture sync

A variable called _actionRegistered is used to keep track of whether a tracking action has been registered for a given texture group handle. This variable is set when the action is registered, and should be unset when it is consumed. This is used to skip registering the tracking action if it's already registered, saving some time for render targets that are modified very often.

There were two issues with this. The worst issue was that the tracking action handler exits early if the handle's modified flag is false... which means that it never reset _actionRegistered, as that was done within the Sync() method called later. The second issue was that this variable was set true after the sync action was registered, so it was technically possible for the action to run immediately, set the flag to false, then set it to true.

Both situations would lead to the action never being registered again, as the texture group handle would be sure the action is already registered. This breaks the texture for the remaining runtime, or until it is disposed.

It was also possible for a texture to register sync once, then on future frames the last modified sync number did not update. This may have caused some more minor issues.

Seems to fix the Xenoblade flashing bug. Obviously this needs a lot of testing, since it was random chance. I typically had the most luck getting it to happen by switching time of day on the event theatre screen for a while, then entering the equipment screen by pressing X on an event.

May also fix weird things like random chance air swimming in BOTW, maybe a few texture streaming bugs.

* Exchange rather than CompareExchange
2022-04-29 18:34:11 -03:00
gdkchan 3139a85a2b
Allow copy texture views to have mismatching multisample state (#3152) 2022-04-08 11:26:48 +02:00
gdkchan 79408b68c3
De-tile GOB when DMA copying from block linear to pitch kind memory regions (#3207)
* De-tile GOB when DMA copying from block linear to pitch kind memory regions

* XML docs + nits

* Remove using

* No flush for regular buffer copies

* Add back ulong casts, fix regression due to oversight
2022-03-20 13:55:07 -03:00
gdkchan e5ad1dfa48
Implement S8D24 texture format and tweak depth range detection (#2458) 2022-03-15 03:42:08 +01:00
gdkchan 0a24aa6af2
Allow textures to have their data partially mapped (#2629)
* Allow textures to have their data partially mapped

* Explicitly check for invalid memory ranges on the MultiRangeList

* Update GetWritableRegion to also support unmapped ranges
2022-02-22 13:34:16 -03:00
riperiperi c9c65af59e
Perform unscaled 2d engine copy on CPU if source texture isn't in cache. (#3112)
* Initial implementation of fast 2d copy

TODO: Partial copy for mismatching region/size.

* WIP

* Cleanup

* Update Ryujinx.Graphics.Gpu/Engine/Twod/TwodClass.cs

Co-authored-by: gdkchan <gab.dark.100@gmail.com>

Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2022-02-22 11:21:29 -03:00
gdkchan b944941733
Fix bug that could cause depth buffer to be missing after clear (#3067) 2022-01-31 00:11:43 -03:00
riperiperi fd6d3ec88f
Fix res scale parameters not being updated in vertex shader (#3046)
This fixes an issue where the render scale array would not be updated when technically the scales on the flat array were the same, but the start index for the vertex scales was different.
2022-01-27 14:17:13 -03:00
gdkchan 42c75dbb8f
Add support for BC1/2/3 decompression (for 3D textures) (#2987)
* Add support for BC1/2/3 decompression (for 3D textures)

* Optimize and clean up

* Unsafe not needed here

* Fix alpha value interpolation when a0 <= a1
2022-01-22 19:23:00 +01:00
gdkchan 6e0799580f
Fix render target clear when sizes mismatch (#2994) 2022-01-11 20:15:17 +01:00
riperiperi ef24c8983d
Fix adjacent 3d texture slices being detected as Incompatible Overlaps (#2993)
This fixes some regressions caused by #2971 which caused rendered 3D texture data to be lost for most slices. Fixes issues with Xenoblade 2's colour grading, probably a ton of other games.

This also removes the check from TextureCache, making it the tiniest bit smaller (any win is a win here).
2022-01-11 09:37:40 +01:00
gdkchan 952c6e4d45
Fix sampled multisample image size (#2984) 2022-01-10 08:45:25 +01:00
riperiperi cda659955c
Texture Sync, incompatible overlap handling, data flush improvements. (#2971)
* Initial test for texture sync

* WIP new texture flushing setup

* Improve rules for incompatible overlaps

Fixes a lot of issues with Unreal Engine games. Still a few minor issues (some caused by dma fast path?) Needs docs and cleanup.

* Cleanup, improvements

Improve rules for fast DMA

* Small tweak to group together flushes of overlapping handles.

* Fixes, flush overlapping texture data for ASTC and BC4/5 compressed textures.

Fixes the new Life is Strange game.

* Flush overlaps before init data, fix 3d texture size/overlap stuff

* Fix 3D Textures, faster single layer flush

Note: nosy people can no longer merge this with Vulkan. (unless they are nosy enough to implement the new backend methods)

* Remove unused method

* Minor cleanup

* More cleanup

* Use the More Fun and Hopefully No Driver Bugs method for getting compressed tex too

This one's for metro

* Address feedback, ASTC+ETC to FormatClass

* Change offset to use Span slice rather than IntPtr Add

* Fix this too
2022-01-09 13:28:48 -03:00
riperiperi 79adba4402
Add support for render scale to vertex stage. (#2763)
* Add support for render scale to vertex stage.

Occasionally games read off textureSize on the vertex stage to inform the fragment shader what size a texture is without querying in there. Scales were not present in the vertex shader to correct the sizes, so games were providing the raw upscaled texture size to the fragment shader, which was incorrect.

One downside is that the fragment and vertex support buffer description must be identical, so the full size scales array must be defined when used. I don't think this will have an impact though. Another is that the fragment texture count must be updated when vertex shader textures are used. I'd like to correct this so that the update is folded into the update for the scales.

Also cleans up a bunch of things, like it making no sense to call CommitRenderScale for each stage.

Fixes render scale causing a weird offset bloom in Super Mario Party and Clubhouse Games. Clubhouse Games still has a pixelated look in a number of its games due to something else it does in the shader.

* Split out support buffer update, lazy updates.

* Commit support buffer before compute dispatch

* Remove unnecessary qualifier.

* Address Feedback
2022-01-08 14:48:48 -03:00
gdkchan c05c8e09d4
Add support for the R4G4 texture format (#2956) 2021-12-30 17:10:54 +01:00
gdkchan a87f7f2029
Fix DMA copy fast path line size when xCount < stride (#2942) 2021-12-26 13:05:26 -03:00
gdkchan e7c2dc8ec3
Fix for texture pool not being updated when it should + buffer texture related fixes (#2911) 2021-12-19 11:50:44 -03:00
riperiperi bc4e70b6fa
Move texture anisotropy check to SetInfo (#2843)
Rather than calculating this for every sampler, this PR calculates if a texture can force anisotropy when its info is set, and exposes the value via a public boolean.

This should help texture/sampler heavy games when anisotropic filtering is not Auto, like UE4 ones (or so i hear?). There is another cost where samplers are created twice when anisotropic filtering is enabled, but I'm not sure how relevant this one is.
2021-12-08 18:09:36 -03:00
riperiperi 788aec511f
Limit Custom Anisotropic Filtering to mipmapped textures with many levels (#2832)
* Limit Custom Anisotropic Filtering to only fully mipmapped textures

There's a major flaw with the anisotropic filtering setting that causes @GamerzHell9137 to report graphical bugs that otherwise wouldn't be there, because he just won't set it to Auto. This should fix those issues, hopefully.

These bugs are generally because anisotropic filtering is enabled on something that it shouldn't be, such as a post process filter or some data texture. This PR maintains two host samplers when custom AF is enabled, and only uses the forced AF one when the texture is 2d and fully mipmapped (goes down to 1x1). This is because game textures are the ideal target for this filtering, and they are typically fully mipmapped, unlike things like screen render targets which usually have 1 or just a few levels.

This also only enables AF on mipmapped samplers where the filtering is bilinear or trilinear. This should be self explanatory.

This PR also allows the changing of Anisotropic Filtering at runtime, and you can immediately see the changes. All samplers are flushed from the cache if the setting changes, causing them to be recreated with the new custom AF value. This brings it in line with our resolution scale. 😌

* Expected minimum mip count for large textures rather than all, address feedback

* Use Target rather than Info.Target

* Retrigger build?

* Fix rebase
2021-11-13 16:04:21 -03:00
gdkchan 611bec6e44
Implement DrawTexture functionality (#2747)
* Implement DrawTexture functionality

* Non-NVIDIA support

* Disable some features that should not affect draw texture (slow path)

* Remove space from shader source

* Match 2D engine names

* Fix resolution scale and add missing XML docs

* Disable transform feedback for draw texture fallback
2021-11-10 15:37:49 -03:00
gdkchan 25fd4ef10e
Extend bindless elimination to work with masked and shifted handles (#2727)
* Extent bindless elimination to work with masked handles

* Extend bindless elimination to catch shifted pattern, refactor handle packing/unpacking
2021-10-17 17:28:18 -03:00
gdkchan 75f4b1ff2d
Relax sampler pool requirement (#2703) 2021-10-04 14:35:28 -03:00
riperiperi d92fff541b
Replace CacheResourceWrite with more general "precise" write (#2684)
* Replace CacheResourceWrite with more general "precise" write

The goal of CacheResourceWrite was to notify GPU resources when they were modified directly, by looking up the modified address/size in a structure and calling a method on each resource. The downside of this is that each resource cache has to be queried individually, they all have to implement their own way to do this, and it can only signal to resources using the same PhysicalMemory instance.

This PR adds the ability to signal a write as "precise" on the tracking, which signals a special handler (if present) which can be used to avoid unnecessary flush actions, or maybe even more. For buffers, precise writes specifically do not flush, and instead punch a hole in the modified range list to indicate that the data on GPU has been replaced.

The downside is that precise actions must ignore the page protection bits and always signal - as they need to notify the target resource to ignore the sequence number optimization.

I had to reintroduce the sequence number increment after I2M, as removing it was causing issues in rabbids kingdom battle. However - all resources modified by I2M are notified directly to lower their sequence number, so the problem is likely that another unrelated resource is not being properly updated. Thankfully, doing this does not affect performance in the games I tested.

This should fix regressions from #2624. Test any games that were broken by that. (RF4, rabbids kingdom battle)

I've also added a sequence number increment to ThreedClass.IncrementSyncpoint, as it seems to fix buffer corruption in OpenGL homebrew. (this was a regression from removing sequence number increment from constant buffer update - another unrelated resource thing)

* Add tests.

* Add XML docs for GpuRegionHandle

* Skip UpdateProtection if only precise actions were called

This allows precise actions to skip reprotection costs.
2021-09-29 02:27:03 +02:00
riperiperi b6e093b0fc
Force copy when auto-deleting a texture with dependencies (#2687)
When a texture is deleted by falling to the bottom of the AutoDeleteCache, its data is flushed to preserve any GPU writes that occurred. This ensures that the data appears in any textures recreated in the future, but didn't account for a texture that already existed with a copy dependency.

This change forces copy dependencies to complete if a texture falls out from from the AutoDeleteCache. (not removed via overlap, as that would be wasted effort)

Fixes broken lighting caused by pausing in SMO's Metro Kingdom. May fix some other issues.
2021-09-29 02:11:05 +02:00
gdkchan fd7567a6b5
Only make render target 2D textures layered if needed (#2646)
* Only make render target 2D textures layered if needed

* Shader cache version bump

* Ensure topology is updated on channel swap
2021-09-29 01:55:12 +02:00
gdkchan 83bdafccda
Share scales array for graphics and compute (#2653) 2021-09-28 23:52:27 +02:00
riperiperi 7c5ead1c19
Fast path for Inline2Memory buffer write that skips write tracking (#2624)
* Fast path for Inline2Memory buffer write

This PR adds a method to PhysicalMemory that attempts to write all cached resources directly, so that memory tracking can be avoided. The goal of this is both to avoid flushing buffer data, and to avoid raising the sequence number when data is written, which causes buffer and texture handles to be re-checked.

This currently only targets buffers, with a side check on textures that falls back to a tracked write if any exist within the target range. It's not expected to write textures from here - this is just a mechanism to protect us if someone does decide to do that. It's possible to add a fast path for this in future (and for ShaderCache, once that starts using tracking)

The forced read before inline2memory begins has been skipped, as the data is fully written when the transfer is completed anyways. This allows us to flush on read in emergency situations, but still write the new data over the flushed data.

Improves performance on Xenoblade 2 and DE, which was flushing buffer data on the GPU thread when trying to write compute data. May improve performance in other games that write SSBOs from compute, and update data in the same/nearby pages often.

Super Smash Bros Ultimate should probably be tested to make sure the vertex explosions haven't returned, as I think that's what this AdvanceSequence was for.

* ForceDirty before write, to make sure data does not flush over the new write
2021-09-19 15:09:53 +02:00
riperiperi b0af010247
Set texture/image bindings in place rather than allocating and passing an array (#2647)
* Remove allocations for texture bindings and state

* Rent rather than stackalloc + copy

A bit faster.
2021-09-19 14:03:05 +02:00
gdkchan ac4ec1a015
Account for negative strides on DMA copy (#2623)
* Account for negative strides on DMA copy

* Should account for non-zero Y
2021-09-11 22:54:18 +02:00
riperiperi b0e410a828
Lift textures in the AutoDeleteCache for all modifications. (#2615)
* Lift textures in the AutoDeleteCache for all modifications.

Before, this would only apply to render targets and texture blit. Now it applies to image stores, the fast dma copy path and any other type of modification.

Image store always at least has one reference in the texture pool, so the function of the AutoDeleteCache keeping textures _alive_ is not useful, but a very important function for a while has been its use to flush textures in order of modification when they are dereferenced, so that their data is not lost.

Before, textures populated using image stores were being dereferenced and reloaded as garbage. Now, when these textures are dereferenced, their data will be put back into memory, and everything stays intact.

Fixes lighting breaking when switching levels in THPS1+2, and potentially some more UE4 games. I've tested a bunch more games for regressions and performance impact, but they all seem fine.

* Lift copy srcTexture so that it doesn't remain referenceless

* Perform lift before reference count change on unbind.

It's important to lift on unbind as that is the moment the texture was truly last modified, but definitely not after releasing every single reference.
2021-09-11 21:52:54 +02:00
riperiperi f0b00c1ae9
Fix TXQ for 3D textures. (#2613)
* Fix TXQ for 3D textures.

Assumes the texture is 3D if the component mask contains Z.

This fixes a bug in UE4 games where parts of the map had garbage pointers to lighting voxels, as the lookup 3D texture was not being initialized. Most notable game is THPS1+2.

May need another PR to keep image store data alive and properly flush it in order using the AutoDeleteCache.

* Get sampler type for TextureSize from bound textures.
2021-09-02 00:17:43 -03:00
riperiperi 15e7fe3ac9
Avoid deleting textures when their data does not overlap. (#2601)
* Avoid deleting textures when their data does not overlap.

It's possible that while two textures start and end addresses indicate an overlap, that the actual data contained within them is sparse due to a layer stride. One such possibility is array slices of a cubemap at different mip levels - they overlap on a whole, but the actual texture data fills the gaps between each other's layers rather than actually overlapping.

This fixes issues with UE4 games having incorrect lighting (solid white screen or really dark shadows). There are still remaining issues with games that use the 3D texture prebaked lighting, such as THPS1+2.

This PR also fixes a bug with TexturePool's resized texture handling where the base level in the descriptor was not considered.

* AllRegions granularity for 3d textures is now by level rather than by slice.

* Address feedback
2021-08-29 16:22:13 -03:00
riperiperi 76e8f9ac87
Only reupload the texture scale array if it changes. (#2595)
* Only reupload the texture scale array if it changes.

Before, this would be called all the time if any shader needed a scale value. The cost of doing this has increased with threaded-gal, as the scale array is copied to a span pool, and it's was called on pretty much every draw sometimes.

This improves GPU performance in games, scaled or not. Most affected game seems to be Xenoblade Chronicles: Definitive Edition.

* Just use = instead of |=
2021-08-27 17:08:30 -03:00
riperiperi bdc1f91a5b
Remove pool cache entries for incompatible overlapping textures (#2568)
This greatly reduces memory usage in games that aggressively reuse memory without removing dead textures from the pool, such as the Xenoblade games, UE3 games, and to a lesser extent, UE4/unity games.

This change stops memory usage from ballooning in xenoblade and some other games. It will also reduce texture view/dependency complexity in some games - for example in MK8D it will reduce the number of surface copies between lighting cubemaps generated for actors.

There shouldn't be any performance impact from doing this, though the deletion and creation of textures could be improved by improving the OpenGL texture storage cache, which is very simple and limited right now. This will be improved in future.

Another potential error has been fixed with the texture cache, which could prevent data loss when data is interchangably written to textures from both the GPU and CPU. It was possible that the dirty flag for a texture would be consumed without the data being synchronized on next use, due to the old overlap check. This check no longer consumes the dirty flag.

Please test a bunch of games to make sure they still work, and there are no performance regressions.
2021-08-20 17:52:09 -03:00
riperiperi 97aedc030d
Fix GetHandleInformation for mipmapped 3d textures (#2569)
Got this the wrong way round - was causing games to try synchronize mipmap levels of like 52 on a 3d texture with 6 levels. Also, corrected the variable name in the method that _was_ working.
2021-08-20 14:59:39 -03:00
riperiperi 0a80a837cb
Use "Undesired" scale mode for certain textures rather than blacklisting (#2537)
* Use "Undesired" scale mode for certain textures rather than blacklisting

* Nit

Co-authored-by: gdkchan <gab.dark.100@gmail.com>

Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2021-08-11 22:44:51 +02:00
riperiperi 4b60371e64
Return mapped buffer pointer directly for flush, WriteableRegion for textures (#2494)
* Return mapped buffer pointer directly for flush, WriteableRegion for textures

A few changes here to generally improve performance, even for platforms not using the persistent buffer flush.

- Texture and buffer flush now return a ReadOnlySpan<byte>. It's guaranteed that this span is pinned in memory, but it will be overwritten on the next flush from that thread, so it is expected that the data is used before calling again.
- As a result, persistent mappings no longer copy to a new array - rather the persistent map is returned directly as a Span<>. A similar host array is used for the glGet flushes instead of allocating new arrays each time.
- Texture flushes now do their layout conversion into a WriteableRegion when the texture is not MultiRange, which allows the flush to happen directly into guest memory rather than into a temporary span, then copied over. This avoids another copy when doing layout conversion.

Overall, this saves 1 data copy for buffer flush, 1 copy for linear textures with matching source/target stride, and 2 copies for block textures or linear textures with mismatching strides.

* Fix tests

* Fix array pointer for Mesa/Intel path

* Address some feedback

* Update method for getting array pointer.
2021-07-19 19:10:54 -03:00
riperiperi ca5ac37cd6
Flush buffers and texture data through a persistent mapped buffer. (#2481)
* Use persistent buffers to flush texture data

* Flush buffers via copy to persistent buffers.

* Log error when timing out, small refactoring.
2021-07-16 18:10:20 -03:00
gdkchan bb6fab2009
Ensure that DMA copy target textures are kept alive or flushed (#2478) 2021-07-14 14:48:57 -03:00