201 commits

Author SHA1 Message Date
merry
9271abdff0 T16: Implement B (conditional) 2022-02-11 23:08:02 +00:00
merry
06f9a3dc60 T16: Implement SVC 2022-02-11 23:08:02 +00:00
merry
9c16e8695b T16: Implement LDM, STM 2022-02-11 23:08:02 +00:00
merry
108a6886f9 T16: Implement NOP 2022-02-11 23:08:02 +00:00
merry
41eab68113 T16: Implement REV, REV16, REVSH 2022-02-11 23:08:02 +00:00
merry
34290d38e2 T16: Implement PUSH, POP 2022-02-11 23:08:02 +00:00
merry
dabb5f2449 T16: Implement CBZ, CBNZ 2022-02-11 23:08:02 +00:00
merry
2055622c84 T16: Implement SXTH, SXTB, UXTH, UXTB 2022-02-11 23:08:02 +00:00
merry
e11cd2e50a T16: Implement ADD/SUB (SP) 2022-02-11 23:08:02 +00:00
merry
1a2ae16395 T16: Implement Add to SP (immediate) 2022-02-11 23:08:02 +00:00
merry
59e9c3d6b0 T16: Implement ADR 2022-02-11 23:08:02 +00:00
merry
baaf5e126e T16: Implement LDR/STR (SP) 2022-02-11 23:08:02 +00:00
merry
f3e068b94a T16: Implement {LDR,STR}{,B,H} (immediate) 2022-02-11 23:08:02 +00:00
merry
d3272c1498 T16: Implement {LDR,STR}{,H,B,SB,SH} (register) 2022-02-11 23:08:02 +00:00
merry
a9f952ad40 T16: Implement LDR (literal) 2022-02-11 23:08:02 +00:00
merry
d84c2417aa T16: Implement BLX (reg) 2022-02-11 23:08:02 +00:00
merry
2876344cca T16: Implement ADD, CMP, MOV (high reg) 2022-02-11 23:08:02 +00:00
merry
43feb68b11 T16: Implement ANDS, EORS, LSLS, LSRS, ASRS, ADCS, SBCS, RORS, TST, NEGS, CMP, CMN, ORRS, MULS, BICS, MVNS (low registers) 2022-02-11 23:08:02 +00:00
merry
284272854b T16: Implement MOVS, CMP, ADDS, SUBS (8-bit immediate) 2022-02-11 23:08:02 +00:00
merry
7a09aea0dc T16: Implement ADDS, SUBS (3-bit immediate) 2022-02-11 23:08:01 +00:00
merry
3d663a1c8c T16: Implement ADDS, SUBS (reg) 2022-02-11 23:08:01 +00:00
merry
15ccdff751 T16: Implement LSL/LSR/ASR (imm) 2022-02-11 23:08:01 +00:00
merry
cb4ccec421 T16: Implement BX 2022-02-11 23:08:01 +00:00
merry
e1bbf8d7b9 OpCodeTables: Improve thumb fast lookup 2022-02-11 23:08:01 +00:00
merry
19c6c1c11c OpCodeTable: Prepare for thumb instructions 2022-02-11 23:08:01 +00:00
merry
08e1e0c985 OpCodeTable: Remove existing thumb instruction implementations 2022-02-11 23:08:01 +00:00
merry
1379f41d5d OpCodeTable: Minor cleanup 2022-02-11 23:08:01 +00:00
merry
5c2e780d40 Decoders: Add InITBlock argument 2022-02-11 23:08:01 +00:00
merry
ce71f9144e
InstEmitMemory32: Literal loads always have word-aligned PC (#3104) 2022-02-11 17:51:03 -03:00
gdkchan
c3c3914ed3
Add a limit on the number of uses a constant may have (#3097) 2022-02-09 17:42:47 -03:00
merry
86b37d0ff7
ARMeilleure: A32: Implement SHSUB8 and UHSUB8 (#3089)
* ARMeilleure: A32: Implement UHSUB8

* ARMeilleure: A32: Implement SHSUB8
2022-02-08 10:46:42 +01:00
merry
88d3ffb97c
ARMeilleure: A32: Implement SHADD8 (#3086) 2022-02-06 12:25:45 -03:00
merry
222b1ad7da
ARMeilleure: OpCodeTable: Add CMN (RsReg) (#3087) 2022-02-06 02:01:05 +01:00
gdkchan
bd412afb9f
Fix small precision error on CPU reciprocal estimate instructions (#3061)
* Fix small precision error on CPU reciprocal estimate instructions

* PPTC version bump
2022-01-29 23:59:34 +01:00
gdkchan
f3bfd799e1
Fix calls passing V128 values on Linux (#3034)
* Fix calls passing V128 values on Linux

* PPTC version bump
2022-01-24 11:23:24 +01:00
gdkchan
f0824fde9f
Add host CPU memory barriers for DMB/DSB and ordered load/store (#3015)
* Add host CPU memory barriers for DMB/DSB and ordered load/store

* PPTC version bump

* Revert to old barrier order
2022-01-21 12:47:34 -03:00
sharmander
60f7cba30a
Implement FCVTNS (Scalar GP) (#2953)
* Implement FCVTNS (Scalar GP)

* Update Ptc Version
2022-01-19 22:21:44 -03:00
gdkchan
bd215e447d
Fix return type mismatch on 32-bit titles (#3000) 2022-01-16 08:39:43 -03:00
sharmander
e5f7ff1eee
CPU - Implement FCVTMS (Vector) (#2937)
* Add FCVTMS_V Implementation to Armeilleure

* Fix opcode designation

* Add tests

* Amend Ptc version

* Fix OpCode / Tests

* Create Math.Floor helper method + Update implementation

* Address gdk comments

* Re-address gdk comments

* Update ARMeilleure/Decoders/OpCodeTable.cs

Co-authored-by: gdkchan <gab.dark.100@gmail.com>

* Update Tests to use 2S (4S) and 2D

Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2022-01-04 16:45:28 -03:00
gdkchan
e24949ca2c
Implement CSDB instruction (#2927) 2021-12-19 11:19:05 -03:00
Mary
00c69f2098
Remove usage of Mono.Posix.NETStandard across all projects (#2906)
* Remove usage of Mono.Posix.NETStandard in Ryujinx project

* Remove usage of Mono.Posix.NETStandard in ARMeilleure project

* Remove usage of Mono.Posix.NETStandard in Ryujinx.Memory project

* Address gdkchan's comments
2021-12-08 18:24:26 -03:00
Piyachet Kanda
3e2f89b4fd
Implement UHADD8 instruction (#2908)
* Implement UHADD8 instruction along with a test unit

* Update PTC revision number
2021-12-08 17:05:59 -03:00
Mary
f39fce8f54
misc: Migrate usage of RuntimeInformation to OperatingSystem (#2901)
Very basic migration across the codebase.
2021-12-04 20:02:30 -03:00
Mary
57d3296ba4
infra: Migrate to .NET 6 (#2829)
* infra: Migrate to .NET 6

* Rollback version naming change

* Workaround .NET 6 ZipArchive API issues

* ci: Switch to VS 2022 for AppVeyor

CI is now ready for .NET 6

* Suppress WebClient warning in DoUpdateWithMultipleThreads

* Attempt to workaround System.Drawing.Common changes on 6.0.0

* Change keyboard rendering from System.Drawing to ImageSharp

* Make the software keyboard renderer multithreaded

* Bump ImageSharp version to 1.0.4 to fix a bug in Image.Load

* Add fallback fonts to the keyboard renderer

* Fix warnings

* Address caian's comment

* Clean up linux workaround as it's unneeded now

* Update readme

Co-authored-by: Caian Benedicto <caianbene@gmail.com>
2021-11-28 21:24:17 +01:00
FICTURE7
fbf40424f4
Add an early TailMerge pass (#2721)
* Add an early `TailMerge` pass

Some translations can have a lot of guest calls, and for each guest
call there is a call guard which may return. This can produce a lot of
epilogue code for returns. This pass merges the epilogues into a single
block.

```
Using filter 'hcq'.
Using metric 'code size'.

Total diff: -1648111 (-7.19 %) (bytes):
  Base: 22913847
  Diff: 21265736

Improved: 4567, regressed: 14, unchanged: 144
```

* Set PTC version

* Address feedback

* Handle `void` returning functions

* Actually handle `void` returning functions

* Fix `RegisterToLocal` logging
2021-10-18 19:51:22 -03:00
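The epilogue merging described in #2721 can be pictured with a toy example. The Python sketch below is not ARMeilleure's `TailMerge` pass: the `Block` class, the `"ret"`, `"copy"` and `"br"` opcodes, and the `tail_merge` function are hypothetical names invented for illustration, and a `void` return is simply modelled as `None`.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    ops: list = field(default_factory=list)  # each op is a tuple, e.g. ("ret", value)


def tail_merge(blocks):
    """Replace every ("ret", v) with a copy plus a branch to one shared return block."""
    returning = [b for b in blocks if b.ops and b.ops[-1][0] == "ret"]
    if len(returning) < 2:
        return blocks  # zero or one return: nothing to merge

    # A function returning void has no value to funnel into the shared block.
    returns_value = any(b.ops[-1][1] is not None for b in returning)
    exit_block = Block("exit", [("ret", "result" if returns_value else None)])

    for b in returning:
        _, value = b.ops.pop()
        if value is not None:
            b.ops.append(("copy", "result", value))
        b.ops.append(("br", "exit"))
    return blocks + [exit_block]


blocks = [
    Block("a", [("call", "f"), ("ret", "x")]),
    Block("b", [("call", "g"), ("ret", "y")]),
    Block("c", [("br", "a")]),
]
merged = tail_merge(blocks)
ret_count = sum(1 for b in merged for op in b.ops if op[0] == "ret")
```

After the call, only the shared `exit` block still contains a return, so the epilogue would be emitted once rather than once per returning block.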
FICTURE7
69093cf2d6
Optimize LSRA (#2563)
* Optimize `TryAllocateRegWithtoutSpill` a bit

* Add a fast path for when all registers are live.
* Do not query `GetOverlapPosition` if the register is already in use
  (i.e: free position is 0).

* Do not allocate child split list if not parent

* Turn `LiveRange` into a reference struct

`LiveRange` is now a reference wrapping struct like `Operand` and
`Operation`.

It has also been changed into a singly linked list. In micro-benchmarks,
traversing the linked list was faster than binary search on `List<T>`,
surprisingly even for quite large input sizes (e.g. 1,000,000).

This could be because the code gen for traversing the linked list is
much cleaner and there is no virtual dispatch happening when checking
whether intervals overlap.

* Turn `LiveInterval` into an iterator

The LSRA allocates in forward order and never inspects previous
`LiveInterval`s once they are expired. Something similar can be done for
the `LiveRange`s within the `LiveInterval`s themselves.

The `LiveInterval` is turned into an iterator which expires the
`LiveRange`s within it. The iterator is moved forward along with the
interval walking code, i.e. `AllocateInterval(context, interval, cIndex)`.

* Remove `LinearScanAllocator.Sources`

Local methods are less likely to allocate than lambdas.

* Optimize `GetOverlapPosition(interval)` a bit

Time complexity should be in O(n+m) instead of O(nm) now.

* Optimize `NumberLocals` a bit

Use the same idea as in `HybridAllocator` to store the visited state
in the MSB of the Operand's value instead of using a `HashSet<T>`.

* Optimize `InsertSplitCopies` a bit

Avoid allocating a redundant `CopyResolver`.

* Optimize `InsertSplitCopiesAtEdges` a bit

Avoid redundant allocations of `CopyResolver`.

* Use stack allocation for `freePositions`

Avoid redundant computations.

* Add `UseList`

Replace `SortedIntegerList` with an even more specialized data
structure. It allocates memory on the arena allocators and does not
require copying use positions when splitting it.

* Turn `LiveInterval` into a reference struct

`LiveInterval` is now a reference wrapping struct like `Operand` and
`Operation`.

The rationale behind turning this into a reference wrapping struct is
that a `LiveInterval` is associated with each local variable, and
these intervals may themselves be split further. I've seen translations
having up to 8000 local variables.

To make the `LiveInterval` unmanaged, a new data structure called
`LiveIntervalList` was added to store child splits. This differs from
`SortedList<,>` because it can contain intervals with the same start
position.

Really wished we got some more of C++ template in C#. :^(

* Optimize `GetChildSplit` a bit

No need to inspect the remaining ranges if we've reached a range which
starts after position, since the split list is ordered.

* Optimize `CopyResolver` a bit

Lazily allocate the fill, spill and parallel copy structures since most
of the time only one of them is needed.

* Optimize `BitMap.Enumerator` a bit

Marking `MoveNext` as `AggressiveInlining` allows RyuJIT to promote the
`Enumerator` struct into registers completely, reducing load/store code
a lot since it does not have to store the struct on the stack for ABI
purposes.

* Use stack allocation for `use/blockedPositions`

* Optimize `AllocateWithSpill` a bit

* Address feedback

* Make `LiveInterval.AddRange(,)` more conservative

Produces no diff against master, but just for good measure.
2021-10-08 18:15:44 -03:00
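The O(n+m) bound mentioned for `GetOverlapPosition(interval)` in #2563 is what a two-pointer sweep over two sorted range lists gives. The sketch below only illustrates that idea and is not the allocator's code: the function name `first_overlap` is hypothetical, and it assumes each interval is a list of half-open `(start, end)` ranges that are sorted and do not overlap each other.

```python
def first_overlap(a, b):
    """Return the first position covered by both sorted range lists, or None."""
    i = j = 0
    while i < len(a) and j < len(b):
        a_start, a_end = a[i]
        b_start, b_end = b[j]
        if a_start < b_end and b_start < a_end:
            return max(a_start, b_start)
        # Advance whichever range ends first; it cannot overlap anything later.
        if a_end <= b_end:
            i += 1
        else:
            j += 1
    return None


position = first_overlap([(0, 4), (10, 14)], [(4, 8), (12, 20)])
```

Each loop iteration advances one of the two indices, so the loop runs at most n+m times, whereas comparing every pair of ranges would take n·m comparisons.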
FICTURE7
ecc64c934d
Add Operand.Label support to Assembler (#2680)
* Add `Operand.Label` support to `Assembler`

This adds label support to `Assembler` and enables branch tightening
when compiling with relocatables. Jump management and patching has been
moved to the `Assembler`.

* Move instruction table to `Assembler.Table`

* Set PTC internal version

* Rename `Assembler.Table` to `AssemblerTable`
2021-10-05 14:04:55 -03:00
riperiperi
d92fff541b
Replace CacheResourceWrite with more general "precise" write (#2684)
* Replace CacheResourceWrite with more general "precise" write

The goal of CacheResourceWrite was to notify GPU resources when they were modified directly, by looking up the modified address/size in a structure and calling a method on each resource. The downside of this is that each resource cache has to be queried individually, they all have to implement their own way to do this, and it can only signal to resources using the same PhysicalMemory instance.

This PR adds the ability to signal a write as "precise" on the tracking, which signals a special handler (if present) which can be used to avoid unnecessary flush actions, or maybe even more. For buffers, precise writes specifically do not flush, and instead punch a hole in the modified range list to indicate that the data on GPU has been replaced.

The downside is that precise actions must ignore the page protection bits and always signal - as they need to notify the target resource to ignore the sequence number optimization.

I had to reintroduce the sequence number increment after I2M, as removing it was causing issues in rabbids kingdom battle. However - all resources modified by I2M are notified directly to lower their sequence number, so the problem is likely that another unrelated resource is not being properly updated. Thankfully, doing this does not affect performance in the games I tested.

This should fix regressions from #2624. Test any games that were broken by that. (RF4, rabbids kingdom battle)

I've also added a sequence number increment to ThreedClass.IncrementSyncpoint, as it seems to fix buffer corruption in OpenGL homebrew. (this was a regression from removing sequence number increment from constant buffer update - another unrelated resource thing)

* Add tests.

* Add XML docs for GpuRegionHandle

* Skip UpdateProtection if only precise actions were called

This allows precise actions to skip reprotection costs.
2021-09-29 02:27:03 +02:00
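The "punch a hole in the modified range list" behaviour described in #2684 can be sketched as plain interval subtraction. This is an illustration under assumptions, not Ryujinx's buffer code: `punch_hole` is a hypothetical name, and the modified ranges are assumed to be half-open `(start, end)` tuples.

```python
def punch_hole(ranges, start, end):
    """Remove [start, end) from a list of half-open (start, end) ranges."""
    result = []
    for r_start, r_end in ranges:
        if r_end <= start or r_start >= end:
            result.append((r_start, r_end))  # untouched by the write
            continue
        if r_start < start:
            result.append((r_start, start))  # piece left of the hole
        if r_end > end:
            result.append((end, r_end))  # piece right of the hole
    return result


remaining = punch_hole([(0, 100)], 40, 60)
```

The ranges that remain are the parts still considered modified on the GPU side; the removed span is the part the precise write has replaced.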
FICTURE7
312be74861
Optimize HybridAllocator (#2637)
* Store constant `Operand`s in the `LocalInfo`

Since the spill slot and register assigned is fixed, we can just store
the `Operand` reference in the `LocalInfo` struct. This allows skipping
hitting the intern-table for a look up.

* Skip `Uses`/`Assignments` management

Since the `HybridAllocator` is the last pass and we do not care about
uses/assignments we can skip managing that when setting destinations or
sources.

* Make `GetLocalInfo` inlineable

Also fix a possible issue with numbered locals. See the or-assignment
operator in `SetVisited(local)` before the patch.

* Do not run `BlockPlacement` in LCQ

With the host mapped memory manager, there is a lot less cold code to
split from hot code. So disabling this in LCQ gives some extra
throughput - where we need it.

* Address Mou-Ikkai's feedback

* Apply suggestions from code review

Co-authored-by: VocalFan <45863583+Mou-Ikkai@users.noreply.github.com>

* Move check to an assert

Co-authored-by: VocalFan <45863583+Mou-Ikkai@users.noreply.github.com>
2021-09-29 01:38:37 +02:00
riperiperi
1ae690ba2f
Use normal memory store path for DC ZVA (#2693)
Seems like this is used as an optimized way to clear memory in homebrew applications. Unfortunately, calling the software fallback method every 8 bytes was not very optimal.

The existing EmitStore is used by passing in ZR as the register to get a 0 write.
2021-09-29 01:21:30 +02:00