* Refactoring of KMemoryManager class
* Replace some trivial uses of DRAM address with VA
* Get rid of GetDramAddressFromVa
* Abstracting more operations on derived page table class
* Run auto-format on KPageTableBase
* Managed to make TryConvertVaToPa private, few uses remains now
* Implement guest physical pages ref counting, remove manual freeing
* Make DoMmuOperation private and call new abstract methods only from the base class
* Pass pages count rather than size on Map/UnmapMemory
* Change memory managers to take host pointers
* Fix a guest memory leak and simplify KPageTable
* Expose new methods for host range query and mapping
* Some refactoring of MapPagesFromClientProcess to allow proper page ref counting and mapping without KPageLists
* Remove more uses of AddVaRangeToPageList, now only one remains (shared memory page checking)
* Add a SharedMemoryStorage class, will be useful for host mapping
* Sayonara AddVaRangeToPageList, you served us well
* Start to implement host memory mapping (WIP)
* Support memory tracking through host exception handling
* Fix some access violations from HLE service guest memory access and CPU
* Fix memory tracking
* Fix mapping list bugs, including a race and a error adding mapping ranges
* Simple page table for memory tracking
* Simple "volatile" region handle mode
* Update UBOs directly (experimental, rough)
* Fix the overlap check
* Only set non-modified buffers as volatile
* Fix some memory tracking issues
* Fix possible race in MapBufferFromClientProcess (block list updates were not locked)
* Write uniform update to memory immediately, only defer the buffer set.
* Fix some memory tracking issues
* Pass correct pages count on shared memory unmap
* Armeilleure Signal Handler v1 + Unix changes
Unix currently behaves like windows, rather than remapping physical
* Actually check if the host platform is unix
* Fix decommit on linux.
* Implement windows 10 placeholder shared memory, fix a buffer issue.
* Make PTC version something that will never match with master
* Remove testing variable for block count
* Add reference count for memory manager, fix dispose
Can still deadlock with OpenAL
* Add address validation, use page table for mapped check, add docs
Might clean up the page table traversing routines.
* Implement batched mapping/tracking.
* Move documentation, fix tests.
* Cleanup uniform buffer update stuff.
* Remove unnecessary assignment.
* Add unsafe host mapped memory switch
On by default. Would be good to turn this off for untrusted code (homebrew, exefs mods) and give the user the option to turn it on manually, though that requires some UI work.
* Remove C# exception handlers
They have issues due to current .NET limitations, so the meilleure one fully replaces them for now.
* Fix MapPhysicalMemory on the software MemoryManager.
* Null check for GetHostAddress, docs
* Add configuration for setting memory manager mode (not in UI yet)
* Add config to UI
* Fix type mismatch on Unix signal handler code emit
* Fix 6GB DRAM mode.
The size can be greater than `uint.MaxValue` when the DRAM is >4GB.
* Address some feedback.
* More detailed error if backing memory cannot be mapped.
* SetLastError on all OS functions for consistency
* Force pages dirty with UBO update instead of setting them directly.
Seems to be much faster across a few games. Need retesting.
* Rebase, configuration rework, fix mem tracking regression
* Fix race in FreePages
* Set memory managers null after decrementing ref count
* Remove readonly keyword, as this is now modified.
* Use a local variable for the signal handler rather than a register.
* Fix bug with buffer resize, and index/uniform buffer binding.
Should fix flickering in games.
* Add InvalidAccessHandler to MemoryTracking
Doesn't do anything yet
* Call invalid access handler on unmapped read/write.
Same rules as the regular memory manager.
* Make unsafe mapped memory its own MemoryManagerType
* Move FlushUboDirty into UpdateState.
* Buffer dirty cache, rather than ubo cache
Much cleaner, may be reusable for Inline2Memory updates.
* This doesn't return anything anymore.
* Add sigaction remove methods, correct a few function signatures.
* Return empty list of physical regions for size 0.
* Also on AddressSpaceManager
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
* Use branch instead of tailcall for recursive calls
Use a branch instead of doing a tailcall for recursive calls. This
avoids having to store the dispatch address, setting up the epilogue and
keeps guest registers in host registers for longer.
The rejit check is moved down into the entry block so that the rejit
behaviour remains the same as before.
* Set PTC version
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
* Allow `LocalVariable` to be assigned more than once
This allows us to write flow controls like loops and if-elses with
LocalVariables participating in phi nodes.
* Add `GetLocalNumber` to operand
* Add EntryTable<TEntry>
* Add on translation call counting
* Add Counter
* Add PPTC support
* Make Counter a generic & use a 32-bit counter instead
* Return false on overflow
* Set PPTC version
* Print more information about the rejit queue
* Make Counter<T> disposable
* Remove Block.TailCall since it is not used anymore
* Apply suggestions from code review
Address gdkchan's feedback
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
* Fix more stale docs
* Remove rejit requests queue logging
* Make Counter<T> finalizable
Most certainly quite an odd use case.
* Make EntryTable<T>.TryAllocate set entry to default
* Re-trigger CI
* Dispose Counters before they hit the finalizer queue
* Re-trigger CI
Just for good measure...
* Make EntryTable<T> expandable
* EntryTable is now expandable instead of being a fixed slab.
* Remove EntryTable<T>.TryAllocate
* Remove Counter<T>.TryCreate
Address LDj3SNuD's feedback
* Apply suggestions from code review
Address LDj3SNuD's feedback
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
* Remove useless return
* POH approach, but the sequel
* Revert "POH approach, but the sequel"
This reverts commit 5f5abaa247.
The sequel got shelved
* Add extra documentation
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
* PPTC vs. giant ExeFS.
* InternalVersion = 2168
* Add new heuristic algorithm for calculating the number of threads for parallel translations that also takes into account the user's free physical memory and not just the number of CPU cores.
* Nit.
* Add an outer Header structure and add the hashes for both this new structure and the existing "inner" Header structure.
* InternalVersion = 2169
* Improve StoreToContext emission
Hoist StoreToContext in dynamic branch fast & slow paths out into
their predecessor.
Reduces register pressure, code size and compile time because we're
throwing less stuff down the pipeline.
* Set PTC internal version
* Turn EmitDynamicTableCall private
* Re-trigger CI
Signal and setup events correctly
Eliminate possible races
Use a single event
Mark volatiles and reduce scope of waithandles
Common handler
100ms -> 50ms
* Implement VCNT based on AArch64 CNT
Add tests
* Update PTC version
* Address LDj's comments
* Explicit size in encoding
* Tighter tests
* Replace SoftFallback with IR helper
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
* Reduce one BitwiseAnd from IR fallback
Based on popcount64b from https://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation
* Rename parameter and add assert
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
* PPTC & Pool Enhancements.
* Avoid buffer allocations in CodeGenContext.GetCode(). Avoid stream allocations in PTC.PtcInfo.
Refactoring/nits.
* Use XXHash128, for Ptc.Load & Ptc.Save, x10 faster than Md5.
* Why not a nice Span.
* Added a simple PtcFormatter library for deserialization/serialization, which does not require reflection, in use at PtcJumpTable and PtcProfiler; improves maintainability and simplicity/readability of affected code.
* Nits.
* Revert #1987.
* Revert "Revert #1987."
This reverts commit 998be765cf.
* Enable PTE null checks again
* Do address validation on EmitPtPointerLoad, and make it branchless
* PTC version increment
* Mask of pointer tag for exclusive access
* Move mask to the correct place
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
* Optimization | Modify Add Instruction to use LEA instead.
Currently, the add instruction requires 4 registers to take place. By using LEA, we can effectively perform the same working using 3 registers, reducing memory spills and improving translation efficiency.
* Fix IsSameOperandDestSrc1 Check for Add
* Use LEA if Dest != SRC1
* Update IsSameOperandDestSrc1 to account for Cases where Dest and Src1 can be same for add
* Fix error in logic
* Typo
* Add paranthesis for clarity
* Compare registers as requested.
* Cleanup if statement, use same comparison method as generateCopy
* Make change as recommended by gdk
* Perform check only when Add calls are made
* use ensure sametype for lea, fix else
* Update comment
* Update version #
* Add VCLZ fast path
* Add VCLZ.8B/16B SSSE3 fast path
* Add VCLZ.4H/8H SSSE3 fast path
* Add VCLZ.2S/4S SSE2 fast path
* Improve CLZ.4H/8H fast path
* Improve CLZ.2S/4S fast path
* Set PPTC version
* Added support for offline invalidation, via PPTC, of low cq translations replaced by high cq translations; both on a single run and between runs.
Added invalidation of .cache files in the event of reuse on a different user operating system.
Added .info and .cache files invalidation in case of a failed stream decompression.
Nits.
* InternalVersion = 1712;
* Nits.
* Address comment.
* Get rid of BinaryFormatter.
Nits.
* Move Ptc.LoadTranslations().
Nits.
* Nits.
* Fixed corner cases (in case backup copies have to be used). Added save logs.
* Not core fixes.
* Complement to the previous commit. Added load logs. Removed BinaryFormatter leftovers.
* Add LoadTranslations log.
* Nits.
* Removed the search and management of LowCq overlapping functions.
* Final increment of .info and .cache flags.
* Nit.
* Free up memory allocated by Pools during any PPTC translations at boot time.
* Nit due to rebase.
* Add a simple Pools Limiter.
* Nits.
* Fix missing JumpTable.RegisterFunction() due to rebase.
Clear MemoryStreams as soon as possible, when they are no longer needed.
* Code cleaning.
* Nit for retrigger Checks.
* Update Ptc.cs
* Contextual refactoring of Translator. Ignore resetting of pools for DirectCallStubs.
* Nit for retrigger Checks.
* Add Pmull_V Sse fast path only, both "8/16B -> 8H" and "1/2D -> 1Q" variants; with Test.
* Add Clmul fast path for the 128 bits variant.
* Small optimisation (save 60 instructions) for the Sse fast path about the 128 bits variant.
* Add slow path, both variants. Fix V128 Shl/Shr when shift = 0.
* A32: Add Vmull_I P64 variant (slow path); not tested.
* A32: Add Vmull_I_P8_P64 Test and fix P64 variant.
* Added support for offline invalidation, via PPTC, of low cq translations replaced by high cq translations; both on a single run and between runs.
Added invalidation of .cache files in the event of reuse on a different user operating system.
Added .info and .cache files invalidation in case of a failed stream decompression.
Nits.
* InternalVersion = 1712;
* Nits.
* Address comment.
* Get rid of BinaryFormatter.
Nits.
* Move Ptc.LoadTranslations().
Nits.
* Nits.
* Fixed corner cases (in case backup copies have to be used). Added save logs.
* Not core fixes.
* Complement to the previous commit. Added load logs. Removed BinaryFormatter leftovers.
* Add LoadTranslations log.
* Nits.
* Removed the search and management of LowCq overlapping functions.
* Final increment of .info and .cache flags.
* Nit.
* Free up memory allocated by Pools during any PPTC translations at boot time.
* Nit due to rebase.
* Added support for offline invalidation, via PPTC, of low cq translations replaced by high cq translations; both on a single run and between runs.
Added invalidation of .cache files in the event of reuse on a different user operating system.
Added .info and .cache files invalidation in case of a failed stream decompression.
Nits.
* InternalVersion = 1712;
* Nits.
* Address comment.
* Get rid of BinaryFormatter.
Nits.
* Move Ptc.LoadTranslations().
Nits.
* Nits.
* Fixed corner cases (in case backup copies have to be used). Added save logs.
* Not core fixes.
* Complement to the previous commit. Added load logs. Removed BinaryFormatter leftovers.
* Add LoadTranslations log.
* Nits.
* Removed the search and management of LowCq overlapping functions.
* Final increment of .info and .cache flags.
* Nit.
* GetIndirectFunctionAddress(): Validate that writing actually takes place in dynamic table memory range (and not elsewhere).
* Fix Ptc.UpdateInfo() due to rebase.
* Nit for retrigger Checks.
* Nit for retrigger Checks.
* Initial cache memory allocator implementation
* Get rid of CallFlag
* Perform cache cleanup on exit
* Basic cache invalidation
* Thats not how conditionals works in C# it seems
* Set PTC version to PR number
* Address PR feedback
* Update InstEmitFlowHelper.cs
* Flag clear on address is no longer needed
* Do not include exit block in function size calculation
* Dispose jump table
* For future use
* InternalVersion = 1519 (force retest).
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
* Implement VFNMA.F<32/64>
* Update PTC Version
* Update Implementation & Renames & Correct Order
* Fix alignment
* Update implementation to not trigger assert
* Actually use the intrinsic that makes sense :)
* Use backup when PTC compression is corrupt
* Apply suggestions from code review
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
* shader cache: Fix Linux boot issues
This rollback the init logic back to previous state, and replicate the
way PTC handle initialization.
* shader cache: set default state of ready for translation event to false
* Fix cpu unit tests
* WIP Range Tracking
- Texture invalidation seems to have large problems
- Buffer/Pool invalidation may have problems
- Mirror memory tracking puts an additional `add` in compiled code, we likely just want to make HLE access slower if this is the final solution.
- Native project is in the messiest possible location.
- [HACK] JIT memory access always uses native "fast" path
- [HACK] Trying some things with texture invalidation and views.
It works :)
Still a few hacks, messy things, slow things
More work in progress stuff (also move to memory project)
Quite a bit faster now.
- Unmapping GPU VA and CPU VA will now correctly update write tracking regions, and invalidate textures for the former.
- The Virtual range list is now non-overlapping like the physical one.
- Fixed some bugs where regions could leak.
- Introduced a weird bug that I still need to track down (consistent invalid buffer in MK8 ribbon road)
Move some stuff.
I think we'll eventually just put the dll and so for this in a nuget package.
Fix rebase.
[WIP] MultiRegionHandle variable size ranges
- Avoid reprotecting regions that change often (needs some tweaking)
- There's still a bug in buffers, somehow.
- Might want different api for minimum granularity
Fix rebase issue
Commit everything needed for software only tracking.
Remove native components.
Remove more native stuff.
Cleanup
Use a separate window for the background context, update opentk. (fixes linux)
Some experimental changes
Should get things working up to scratch - still need to try some things with flush/modification and res scale.
Include address with the region action.
Initial work to make range tracking work
Still a ton of bugs
Fix some issues with the new stuff.
* Fix texture flush instability
There's still some weird behaviour, but it's much improved without this. (textures with cpu modified data were flushing over it)
* Find the destination texture for Buffer->Texture full copy
Greatly improves performance for nvdec videos (with range tracking)
* Further improve texture tracking
* Disable Memory Tracking for view parents
This is a temporary approach to better match behaviour on master (where invalidations would be soaked up by views, rather than trigger twice)
The assumption is that when views are created to a texture, they will cover all of its data anyways. Of course, this can easily be improved in future.
* Introduce some tracking tests.
WIP
* Complete base tests.
* Add more tests for multiregion, fix existing test.
* Cleanup Part 1
* Remove unnecessary code from memory tracking
* Fix some inconsistencies with 3D texture rule.
* Add dispose tests.
* Use a background thread for the background context.
Rather than setting and unsetting a context as current, doing the work on a dedicated thread with signals seems to be a bit faster.
Also nerf the multithreading test a bit.
* Copy to texture with matching alignment
This extends the copy to work for some videos with unusual size, such as tutorial videos in SMO. It will only occur if the destination texture already exists at XCount size.
* Track reads for buffer copies. Synchronize new buffers before copying overlaps.
* Remove old texture flushing mechanisms.
Range tracking all the way, baby.
* Wake the background thread when disposing.
Avoids a deadlock when games are closed.
* Address Feedback 1
* Separate TextureCopy instance for background thread
Also `BackgroundContextWorker.InBackground` for a more sensible idenfifier for if we're in a background thread.
* Add missing XML docs.
* Address Feedback
* Maybe I should start drinking coffee.
* Some more feedback.
* Remove flush warning, Refocus window after making background context
* Changes to allow explicit management of service threads
* Remove now unused code
* Remove ThreadCounter, its no longer needed
* Allow and use separate server per service, also fix exit issues
* New policy change: PTC version now uses PR number
* Implement block placement
Implement a simple pass which re-orders cold blocks at the end of the
list of blocks in the CFG.
* Set PPTC version
* Use Array.Resize
Address gdkchan's feedback
* Relax block ordering constraints
Before `block.Next` had to follow `block.ListNext`, now it does not.
Instead `CodeGenerator` will now emit the necessary jump instructions
to ensure control flow.
This makes control flow and block order modifications easier. It also
eliminates some simple cases of redundant branches.
* Set PPTC version
* SIMD&FP load/store with scale > 4 should be undefined
* Catch more invalid encodings for FP&SIMD LDR/STR (reg variant)
* Set PTC version to PR number
* Allow launching with custom data directories
Don't load alternate keys when using custom directory
* Address gdkchan's comments
* Misc fixes to log levels
Added more enabled log levels by default
Moved successful config updation to Notice as
1. It's not a warning
2. Warnings could've been disabled by the config load and hence message
would be lost