Ryujinx

mirror of https://git.naxdy.org/Mirror/Ryujinx.git synced 2024-11-15 09:35:27 +00:00

Author	SHA1	Message	Date
gdkchan	86fd0643c2	Implement support for page sizes > 4KB (#4252) * Implement support for page sizes > 4KB * Check and work around more alignment issues * Was not meant to change this * Use MemoryBlock.GetPageSize() value for signal handler code * Do not take the path for private allocations if host supports 4KB pages * Add Flags attribute on MemoryMapFlags * Fix dirty region size with 16kb pages Would accidentally report a size that was too high (generally 16k instead of 4k, uploading 4x as much data) Co-authored-by: riperiperi <rhy3756547@hotmail.com>	2023-01-17 05:13:24 +01:00
riperiperi	f0e27a23a5	Add short duration texture cache (#3754) * Add short duration texture cache This texture cache takes textures that lose their last pool reference and keeps them alive until the next frame, or until an incompatible overlap removes it. This is done since under certain circumstances, a texture's reference can be wiped from a pool despite it still being in use - though typically the reference will return when rendering the next frame. While this may slightly increase texture memory usage when quickly going through a bunch of temporary textures, it's still bounded due to the overlap removal rule. This greatly increases performance in Hyrule Warriors: Age of Calamity. It may positively affect some UE4 games which dip framerate severely under certain circumstances. * Small optimization * Don't forget this. * Add short cache dictionary * Address feedback * Address some feedback	2023-01-17 04:39:46 +01:00
gdkchan	93df366b2c	Fix texture flush from CPU WaitSync regression on OpenGL (#4289)	2023-01-14 11:23:57 -03:00
gdkchan	cd3a15aea5	Fix NRE when MemoryUnmappedHandler is called for a destroyed channel (#4285)	2023-01-14 00:16:06 -03:00
gdkchan	070136b3f7	Fix texture modified on CPU from GPU thread after being modified on GPU not being updated (#4284)	2023-01-13 23:46:45 -03:00
riperiperi	8fa248ceb4	Vulkan: Add workarounds for MoltenVK (#4202) * Add MVK basics. * Use appropriate output attribute types * 4kb vertex alignment, bunch of fixes * Add reduced shader precision mode for mvk. * Disable ASTC on MVK for now * Only request robustness2 when it is available. * It's just the one feature actually * Add triangle fan conversion * Allow NullDescriptor on MVK for some reason. * Force safe blit on MoltenVK * Use ASTC only when formats are all available. * Disable multilevel 3d texture views * Filter duplicate render targets (on backend) * Add Automatic MoltenVK Configuration * Do not create color attachment views with formats that are not RT compatible * Make sure that the host format matches the vertex shader input types for invalid/unknown guest formats * Fix rebase for Vertex Attrib State * Fix 4b alignment for vertex * Use asynchronous queue submits for MVK * Ensure color clear shader has correct output type * Update MoltenVK config * Always use MoltenVK workarounds on macOS * Make MVK supersede all vendors * Fix rebase * Various fixes on rebase * Get portability flags from extension * Fix some minor rebasing issues * Style change * Use LibraryImport for MVKConfiguration * Rename MoltenVK vendor to Apple. Intel and AMD GPUs on MoltenVK report with those vendors - only Apple silicon reports with vendor 0x106B. * Fix features2 rebase conflict * Rename fragment output type * Add missing check for fragment output types. Might have caused the crash in MK8 * Only do fragment output specialization on MoltenVK * Avoid copy when passing capabilities * Self feedback * Address feedback Co-authored-by: gdk <gab.dark.100@gmail.com> Co-authored-by: nastys <nastys@users.noreply.github.com>	2023-01-13 01:31:21 +01:00
gdkchan	94a64f2aea	Remove textures from cache on unmap if not mapped and modified (#4211)	2023-01-11 01:53:56 +00:00
gdkchan	9dfe81770a	Use vector outputs for texture operations (#3939) * Change AggregateType to include vector type counts * Replace VariableType uses with AggregateType and delete VariableType * Support new local vector types on SPIR-V and GLSL * Start using vector outputs for texture operations * Use vectors on more texture operations * Use vector output for ImageLoad operations * Replace all uses of single destination texture constructors with multi destination ones * Update textureGatherOffsets replacement to split vector operations * Shader cache version bump Co-authored-by: Ac_K <Acoustik666@gmail.com>	2022-12-29 16:09:34 +01:00
riperiperi	e20abbf9cc	Vulkan: Don't flush commands when creating most sync (#4087) * Vulkan: Don't flush commands when creating most sync When the WaitForIdle method is called, we create sync as some internal GPU method may read back written buffer data. Some games randomly intersperse compute dispatch into their render passes, which results in this happening an unbounded number of times depending on how many times they run compute. Creating sync in Vulkan is expensive, as we need to flush the current command buffer so that it can be waited on. We have a limited number of active command buffers due to how we track resource usage, so submitting too many command buffers will force us to wait for them to return to the pool. This PR allows less "important" sync (things which are less likely to be waited on) to wait on a command buffer's result without submitting it, instead relying on AutoFlush or another, more important sync to flush it later on. Because of the possibility of us waiting for a command buffer that hasn't submitted yet, any thread needs to be able to force the active command buffer to submit. The ability to do this has been added to the backend multithreading via an "Interrupt", though it is not supported without multithreading. OpenGL drivers should already be doing something similar so they don't blow up when creating lots of sync, which is why this hasn't been a problem for these games over there. Improves Vulkan performance on Xenoblade DE, Pokemon Scarlet/Violet, and Zelda BOTW (still another large issue here) * Add strict argument This is technically a separate concern from whether the sync is a host syncpoint. * Remove _interrupted variable * Actually wait for the invoke This is required by AMD GPUs, and also may have caused some issues on other GPUs. * Remove unused using. * I don't know why it added these ones. * Address Feedback * Fix typo	2022-12-29 15:39:04 +01:00
riperiperi	470be03c2f	GPU: Add fallback when 16-bit formats are not supported (#4108) * Add conversion for 16 bit RGBA formats (not supported in Rosetta) * Rebase fix Rebase fix * Forgot to remove this * Fix RGBA16 format conversion * Add RGBA4 -> RGBA8 conversion * Handle host stride alignment * Address Feedback Part 1 * Can't count * Don't zero out rgb when alpha is 0 * Separate RGBA4 and 5-bit component formats Not sure of a better way to name them... * Add A1B5G5R5 conversion * Put this in the right place. * Make format naming consistent for capabilities * Change method names	2022-12-26 15:50:27 -03:00
Hunter	c963b3c804	Added Generic Math to BitUtils (#3929) * Generic Math Update Updated Several functions in Ryujinx.Common/Utilities/BitUtils to use generic math * Updated BitUtil calls * Removed Whitespace * Switched decrement * Fixed changed method calls. The method calls were originally changed on accident due to me relying too much on intellisense doing stuff for me * Update Ryujinx.Common/Utilities/BitUtils.cs Co-authored-by: gdkchan <gab.dark.100@gmail.com> Co-authored-by: gdkchan <gab.dark.100@gmail.com>	2022-12-26 14:11:05 +00:00
gdkchan	f906eb06c2	Implement a software ETC2 texture decoder (#4121) * Implement a software ETC2 texture decoder * Fix output size calculation for non-2D textures * Address PR feedback	2022-12-21 20:39:58 -03:00
gdkchan	1cca3e99ab	GPU: Force rebind when pool changes (#4129)	2022-12-21 17:35:28 -03:00
gdkchan	cb70e7bb30	Fix DrawArrays vertex buffer size (#4141)	2022-12-21 19:08:12 +01:00
gdkchan	ec4cd57ccf	Implement another non-indexed draw method on GPU (#4123)	2022-12-16 12:06:38 -03:00
riperiperi	5a085cba0f	GPU: Fix layered attachment write (#4131) Fixes a regression caused by #4003 where the code that writes `_vtgWritesRtLayer` was removed, breaking the crowd in Mario Strikers.	2022-12-16 09:40:01 -03:00
gdkchan	f4d731ae20	Fix NRE when loading Vulkan shader cache with Vertex A shaders (#4124)	2022-12-15 17:52:12 +01:00
Isaac Marovitz	8ac53c66b4	Remove Half Conversion (#4106) * Remove HalfConversion * Update `CodeGenVersion`	2022-12-14 21:13:23 -03:00
Andrey Sukharev	edf7e628ca	Use method overloads that support trimming. Mark some types to be trimming friendly (#4083) * Use method overloads that support trimming. Mark some types to be trimming friendly * Use generic version of marshalling method	2022-12-12 15:10:05 +01:00
Isaac Marovitz	851d81d24a	Fix Redundant Qualifier Warnings (#4091) * Fix Redundant Qualifier Warnings * Remove unnecessary using	2022-12-10 21:21:13 +01:00
gdkchan	459c4caeba	Fix HasUnalignedStorageBuffers value when buffers are always unaligned (#4078)	2022-12-09 17:41:40 -03:00
gdkchan	8428bb6541	Fix shader FSWZADD instruction (#4069) * Fix shader FSWZADD instruction * Shader cache version bump	2022-12-08 14:08:07 -03:00
gdkchan	9a0330f7f8	Shader: Implement PrimitiveID (#4067) * Shader: Implement PrimitiveID * Shader cache version bump	2022-12-08 10:55:03 +01:00
riperiperi	f23b2878cc	Shader: Add fallback for LDG from "ube" buffer ranges. (#4027) We have a conversion from LDG on the compute shader to a special constant buffer binding that's used to exceed hardware limits on compute, but it was only running if the byte offset could be identified. The fallback that checks all of the bindings at runtime only checks the storage buffers. This PR adds checking ube ranges to the LoadGlobal fallback. This extends the changes in #4011 to only check ube entries which are accessed by the shader. Fixes particles affected by the wind in The Legend of Zelda: Breath of the Wild. May fix other weird issues with compute shaders in some games. Try a bunch of games and drivers to make sure they don't blow up loading constants willy-nilly from searchable buffers.	2022-12-06 23:15:44 +00:00
gdkchan	dde9bb5c69	Fix storage buffer access when match fails (#4037) * Fix storage buffer access when match fails * Shader cache version bump	2022-12-06 03:36:54 +00:00
gdkchan	de06ffb0f7	Fix shaders with global memory access from unknown locations (#4029) * Fix shaders with global memory access from unknown locations * Shader cache version bump	2022-12-06 01:09:24 +00:00
gdkchan	bbb24d8c7e	Restrict shader storage buffer search when match fails (#4011) * Restrict storage buffer search when match fails * Shader cache version bump	2022-12-05 19:11:32 +00:00
Andrey Sukharev	4da44e09cb	Make structs readonly when applicable (#4002) * Make all structs readonly when applicable. It should reduce amount of needless defensive copies * Make structs with trivial boilerplate equality code record structs * Remove unnecessary readonly modifiers from TextureCreateInfo * Make BitMap structs readonly too	2022-12-05 14:47:39 +01:00
gdkchan	17a1cab5d2	Allow SNorm buffer texture formats on Vulkan (#3957) * Allow SNorm buffer texture formats on Vulkan * Shader cache version bump	2022-12-04 15:36:03 -03:00
riperiperi	9ac66336a2	GPU: Use lazy checks for specialization state (#4004) * GPU: Use lazy checks for specialization state This PR adds a new class, the SpecializationStateUpdater, that allows elements of specialization state to be updated individually, and signal the state is checked when it changes between draws, instead of building and checking it on every draw. This also avoids building spec state when Most state updates have been moved behind the shader state update, so that their specialization state updates make it in before shaders are fetched. Downside: Fields in GpuChannelGraphicsState are no longer readonly. To counteract copies that might be caused by this I pass it as `ref` when possible, though maybe `in` would be better? Not really sure about the quirks of `in` and the difference probably won't show on a benchmark. The result is around 2 extra FPS on SMO in the usual spot. Not much right now, but it will remove costs when we're doing more expensive specialization checks, such as fragment output type specialization for macOS. It may also help more on other games with more draws. * Address Feedback * Oops	2022-12-04 18:41:17 +01:00
riperiperi	4965681e06	GPU: Swap bindings array instead of copying (#4003) * GPU: Swap bindings array instead of copying Reduces work on UpdateShaderState. Now the cost is a few reference moves for arrays, rather than copying data. Downside: bindings arrays are no longer readonly. * Micro optimisation * Add missing docs * Address Feedback	2022-12-04 18:18:40 +01:00
riperiperi	458452279c	GPU: Track buffer migrations and flush source on incomplete copy (#3952) * Track buffer migrations and flush source on incomplete copy Makes sure that the modified range list is always from the latest iteration of the buffer, and flushes earlier iterations of a buffer if the data has not been migrated yet. * Cleanup 1 * Reduce cost for redundant signal checks on Vulkan * Only inherit the range list if there are pending ranges. * Fix OpenGL * Address Feedback * Whoops	2022-12-01 16:30:13 +01:00
gdkchan	4905101df1	Remove shader dependency on SPV_KHR_shader_ballot and SPV_KHR_subgroup_vote extensions (#3943) * Remove shader dependency on SPV_KHR_shader_ballot and SPV_KHR_subgroup_vote extensions * Shader cache version bump	2022-11-30 18:24:15 -03:00
gdkchan	8750b90a7f	Ensure that vertex attribute buffer index is valid on GPU (#3942) * Ensure that vertex attribute buffer index is valid on GPU * Remove vertex buffer validation code from OpenGL * Remove some fields that are no longer necessary	2022-11-30 18:06:40 -03:00
riperiperi	476b4683cf	Fix CB0 alignment with addresses used for 8/16-bit LDG/STG (#3897) This replacement is meant to be done with the original identified byteOffset, not the one assigned later on by the below conditionals (that already has the constant offset added, for instance). This fixes videos being pixelated in Xenoblade 3, and other regressions that might have happened since #3847.	2022-11-25 14:39:03 +00:00
riperiperi	65778a6b78	GPU: Don't trigger uploads for redundant buffer updates (#3828) * Initial implementation * Actually do The Thing * Add remark about performance to IVirtualMemoryManager	2022-11-24 15:50:15 +01:00
riperiperi	ece36b274d	GAL: Send all buffer assignments at once rather than individually (#3881) * GAL: Send all buffer assignments at once rather than individually The `(int first, BufferRange[] ranges)` method call has very significant performance implications when the bindings are spread out, which they generally always are in Vulkan. This change makes it so that these methods are only called a maximum of one time per draw. Significantly improves GPU thread performance in Pokemon Scarlet/Violet. * Address Feedback Removed SetUniformBuffers(int first, ReadOnlySpan<BufferRange> buffers)	2022-11-24 07:50:59 +00:00
riperiperi	f3cc2e5703	GPU: Access non-prefetch command buffers directly (#3882) * GPU: Access non-prefetch command buffers directly Saves allocating new arrays for them constantly - they can be quite small so it can be very wasteful. About 0.4% of GPU thread in SMO, but was a bit higher in S/V when I checked. Assumes that non-prefetch command buffers won't be randomly clobbered before they finish executing, though that's probably a safe bet. * Small change while I'm here * Address feedback	2022-11-24 01:56:55 +00:00
riperiperi	5a39d3c4a1	GPU: Relax locking on Buffer Cache (#3883) I did this on ncbuffer2 when we were using it for LDN 3, but I noticed that it can apply to the current buffer manager too, and it's an easy performance win. The only buffer access that can come from another thread is the overlap search for buffers that have been unmapped. Everything else, including modifications, comes from the main GPU thread. That means we only need to lock the range list when it's being modified, as that's the only time where we'll cause a race with the unmapped handler. This has significant performance improvements in situations where FIFO is high, like the other two PRs. Joined together they give a nice boost (73.6 master -> 79 -> 83 fps in SMO).	2022-11-24 01:41:16 +00:00
gdkchan	f088c3d344	Do not update shader state for DrawTextures (#3876)	2022-11-21 18:16:00 +01:00
gdkchan	5de6ae426e	Unsubscribe MemoryUnmappedHandler even when GPU channel is destroyed (#3872)	2022-11-19 23:54:33 -03:00
gdkchan	69ced3a6e8	Fix shader cache on Vulkan when geometry shaders are inserted (#3868)	2022-11-19 10:24:23 +01:00
gdkchan	2e43d01d36	Move gl_Layer from vertex to geometry if GPU does not support it on vertex (#3866) * Move gl_Layer from vertex to geometry if GPU does not support it on vertex * Shader cache version bump * PR feedback	2022-11-18 23:27:54 -03:00
riperiperi	de162a648b	Gpu: Fix thread safety of ReregisterRanges (#3865) A quick fix to prevent reading the wrong value of Count when reregistering ranges for a new target buffer. Buffer flushes from another thread can modify the range list when the lock isn't active, which can change the count. This prevents some crashes in Pokemon Scarlet/Violet. It's probably likely that buffer migration during flush is causing some other issues in this game, but this at least prevents the crashing.	2022-11-18 21:47:29 +01:00
riperiperi	187372cbde	Prune ForceDirty and CheckModified caches on unmap (#3862) * Prune ForceDirty and CheckModified caches on unmap Since we're now using this for modified checks on the HLE indirect draw method, I'm worried that leaving these to forever gather cache entries isn't the best idea for performance in the long term, and it could keep old buffer objects alive for longer than they should be. This PR adds the ability to prune invalid entries before checking these caches, and queues it whenever gpu memory is unmapped. It also aligns modified checks to the page size, as I figured it would be possible for a huge number of overlapping over a game's runtime. This prevents Super Mario Odyssey from having 10s of thousands of entries in the modified cache in Metro Kingdom, and them duplicating when entering and leaving a building (should be cleared, as they were unmapped). * Address Feedback	2022-11-18 14:58:24 +00:00
riperiperi	7c53b69c30	SPIR-V: Fix unscaling helper not being able to find Array textures (#3863) The type in the `texOp` in the textureSize instruction doesn't have the exact type on SPIR-V (for example, it is missing the Array flag). This PR gives it the proper type before giving it to the unscaling helper. This fixes the ground textures being broken on Pokemon Scarlet/Violet when scaling. It wasn't finding the texture, so the descriptor index it provided was -1...	2022-11-18 02:37:37 +00:00
riperiperi	33a4d7d1ba	GPU: Eliminate CB0 accesses when storage buffer accesses are resolved (#3847) * Eliminate CB0 accesses Still some work to do, decouple from hle? * Forgot the important part somehow * Fix and improve alignment test * Address Feedback * Remove some complexity when checking storage buffer alignment * Update Ryujinx.Graphics.Shader/Translation/Optimizations/GlobalToStorage.cs Co-authored-by: gdkchan <gab.dark.100@gmail.com> Co-authored-by: gdkchan <gab.dark.100@gmail.com>	2022-11-17 18:47:41 +01:00
gdkchan	f1d1670b0b	Implement HLE macro for DrawElementsIndirect (#3748) * Implement HLE macro for DrawElementsIndirect * Shader cache version bump * Use GL_ARB_shader_draw_parameters extension on OpenGL * Fix DrawIndexedIndirectCount on Vulkan when extension is not supported * Implement DrawIndex * Alignment * Fix some validation errors * Rename BaseIds to DrawParameters * Fix incorrect index buffer and vertex buffer size in some cases * Add HLE macros for DrawArraysInstanced and DrawElementsInstanced * Perform a regular draw when indirect data is not modified * Use non-indirect draw methods if indirect buffer was not GPU modified * Only check if draw parameters match if the shader actually uses them * Expose Macro HLE setting on GUI * Reset FirstVertex and FirstInstance after draw * Update shader cache version again since some people already tested this * PR feedback Co-authored-by: riperiperi <rhy3756547@hotmail.com>	2022-11-16 14:53:04 -03:00
gdkchan	9daf029f35	Use vector transform feedback outputs if possible (#3832)	2022-11-12 20:20:40 -03:00
gdkchan	51a27032f0	Fix VertexId and InstanceId on Vulkan (#3833) * Fix VertexId and InstanceId on Vulkan * Shader cache version bump	2022-11-11 13:22:49 -03:00


477 commits