205 commits

Author SHA1 Message Date
merry
e870d4e089 fixup! T16: Implement Add to SP (immediate) 2022-02-13 19:40:21 +00:00
merry
848ff31a19 fixup! T16: Implement ADD/SUB (SP) 2022-02-13 19:40:05 +00:00
merry
b6078d2c00 T16: Implement IT 2022-02-11 23:08:02 +00:00
merry
09c84d6e91 T16: Implement B (unconditional) 2022-02-11 23:08:02 +00:00
merry
9271abdff0 T16: Implement B (conditional) 2022-02-11 23:08:02 +00:00
merry
06f9a3dc60 T16: Implement SVC 2022-02-11 23:08:02 +00:00
merry
9c16e8695b T16: Implement LDM, STM 2022-02-11 23:08:02 +00:00
merry
108a6886f9 T16: Implement NOP 2022-02-11 23:08:02 +00:00
merry
41eab68113 T16: Implement REV, REV16, REVSH 2022-02-11 23:08:02 +00:00
merry
34290d38e2 T16: Implement PUSH, POP 2022-02-11 23:08:02 +00:00
merry
dabb5f2449 T16: Implement CBZ, CBNZ 2022-02-11 23:08:02 +00:00
merry
2055622c84 T16: Implement SXTH, SXTB, UXTH, UXTB 2022-02-11 23:08:02 +00:00
merry
e11cd2e50a T16: Implement ADD/SUB (SP) 2022-02-11 23:08:02 +00:00
merry
1a2ae16395 T16: Implement Add to SP (immediate) 2022-02-11 23:08:02 +00:00
merry
59e9c3d6b0 T16: Implement ADR 2022-02-11 23:08:02 +00:00
merry
baaf5e126e T16: Implement LDR/STR (SP) 2022-02-11 23:08:02 +00:00
merry
f3e068b94a T16: Implement {LDR,STR}{,B,H} (immediate) 2022-02-11 23:08:02 +00:00
merry
d3272c1498 T16: Implement {LDR,STR}{,H,B,SB,SH} (register) 2022-02-11 23:08:02 +00:00
merry
a9f952ad40 T16: Implement LDR (literal) 2022-02-11 23:08:02 +00:00
merry
d84c2417aa T16: Implement BLX (reg) 2022-02-11 23:08:02 +00:00
merry
2876344cca T16: Implement ADD, CMP, MOV (high reg) 2022-02-11 23:08:02 +00:00
merry
43feb68b11 T16: Implement ANDS, EORS, LSLS, LSRS, ASRS, ADCS, SBCS, RORS, TST, NEGS, CMP, CMN, ORRS, MULS, BICS, MVNS (low registers) 2022-02-11 23:08:02 +00:00
merry
284272854b T16: Implement MOVS, CMP, ADDS, SUBS (8-bit immediate) 2022-02-11 23:08:02 +00:00
merry
7a09aea0dc T16: Implement ADDS, SUBS (3-bit immediate) 2022-02-11 23:08:01 +00:00
merry
3d663a1c8c T16: Implement ADDS, SUBS (reg) 2022-02-11 23:08:01 +00:00
merry
15ccdff751 T16: Implement LSL/LSR/ASR (imm) 2022-02-11 23:08:01 +00:00
merry
cb4ccec421 T16: Implement BX 2022-02-11 23:08:01 +00:00
merry
e1bbf8d7b9 OpCodeTables: Improve thumb fast lookup 2022-02-11 23:08:01 +00:00
merry
19c6c1c11c OpCodeTable: Prepare for thumb instructions 2022-02-11 23:08:01 +00:00
merry
08e1e0c985 OpCodeTable: Remove existing thumb instruction implementations 2022-02-11 23:08:01 +00:00
merry
1379f41d5d OpCodeTable: Minor cleanup 2022-02-11 23:08:01 +00:00
merry
5c2e780d40 Decoders: Add InITBlock argument 2022-02-11 23:08:01 +00:00
merry
ce71f9144e
InstEmitMemory32: Literal loads always have word-aligned PC (#3104) 2022-02-11 17:51:03 -03:00
gdkchan
c3c3914ed3
Add a limit on the number of uses a constant may have (#3097) 2022-02-09 17:42:47 -03:00
merry
86b37d0ff7
ARMeilleure: A32: Implement SHSUB8 and UHSUB8 (#3089)
* ARMeilleure: A32: Implement UHSUB8

* ARMeilleure: A32: Implement SHSUB8
2022-02-08 10:46:42 +01:00
merry
88d3ffb97c
ARMeilleure: A32: Implement SHADD8 (#3086) 2022-02-06 12:25:45 -03:00
merry
222b1ad7da
ARMeilleure: OpCodeTable: Add CMN (RsReg) (#3087) 2022-02-06 02:01:05 +01:00
gdkchan
bd412afb9f
Fix small precision error on CPU reciprocal estimate instructions (#3061)
* Fix small precision error on CPU reciprocal estimate instructions

* PPTC version bump
2022-01-29 23:59:34 +01:00
gdkchan
f3bfd799e1
Fix calls passing V128 values on Linux (#3034)
* Fix calls passing V128 values on Linux

* PPTC version bump
2022-01-24 11:23:24 +01:00
gdkchan
f0824fde9f
Add host CPU memory barriers for DMB/DSB and ordered load/store (#3015)
* Add host CPU memory barriers for DMB/DSB and ordered load/store

* PPTC version bump

* Revert to old barrier order
2022-01-21 12:47:34 -03:00
sharmander
60f7cba30a
Implement FCVTNS (Scalar GP) (#2953)
* Implement FCVTNS (Scalar GP)

* Update Ptc Version
2022-01-19 22:21:44 -03:00
gdkchan
bd215e447d
Fix return type mismatch on 32-bit titles (#3000) 2022-01-16 08:39:43 -03:00
sharmander
e5f7ff1eee
CPU - Implement FCVTMS (Vector) (#2937)
* Add FCVTMS_V Implementation to ARMeilleure

* Fix opcode designation

* Add tests

* Amend Ptc version

* Fix OpCode / Tests

* Create Math.Floor helper method + Update implementation

* Address gdk comments

* Re-address gdk comments

* Update ARMeilleure/Decoders/OpCodeTable.cs

Co-authored-by: gdkchan <gab.dark.100@gmail.com>

* Update Tests to use 2S (4S) and 2D

Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2022-01-04 16:45:28 -03:00
gdkchan
e24949ca2c
Implement CSDB instruction (#2927) 2021-12-19 11:19:05 -03:00
Mary
00c69f2098
Remove usage of Mono.Posix.NETStandard across all projects (#2906)
* Remove usage of Mono.Posix.NETStandard in Ryujinx project

* Remove usage of Mono.Posix.NETStandard in ARMeilleure project

* Remove usage of Mono.Posix.NETStandard in Ryujinx.Memory project

* Address gdkchan's comments
2021-12-08 18:24:26 -03:00
Piyachet Kanda
3e2f89b4fd
Implement UHADD8 instruction (#2908)
* Implement UHADD8 instruction along with a test unit

* Update PTC revision number
2021-12-08 17:05:59 -03:00
Mary
f39fce8f54
misc: Migrate usage of RuntimeInformation to OperatingSystem (#2901)
Very basic migration across the codebase.
2021-12-04 20:02:30 -03:00
Mary
57d3296ba4
infra: Migrate to .NET 6 (#2829)
* infra: Migrate to .NET 6

* Rollback version naming change

* Workaround .NET 6 ZipArchive API issues

* ci: Switch to VS 2022 for AppVeyor

CI is now ready for .NET 6

* Suppress WebClient warning in DoUpdateWithMultipleThreads

* Attempt to workaround System.Drawing.Common changes on 6.0.0

* Change keyboard rendering from System.Drawing to ImageSharp

* Make the software keyboard renderer multithreaded

* Bump ImageSharp version to 1.0.4 to fix a bug in Image.Load

* Add fallback fonts to the keyboard renderer

* Fix warnings

* Address caian's comment

* Clean up Linux workaround as it's unneeded now

* Update readme

Co-authored-by: Caian Benedicto <caianbene@gmail.com>
2021-11-28 21:24:17 +01:00
FICTURE7
fbf40424f4
Add an early TailMerge pass (#2721)
* Add an early `TailMerge` pass

Some translations can have a lot of guest calls, and each guest call
has a call guard which may return. This can produce a lot of epilogue
code for returns. This pass merges the epilogues into a single block.

```
Using filter 'hcq'.
Using metric 'code size'.

Total diff: -1648111 (-7.19 %) (bytes):
  Base: 22913847
  Diff: 21265736

Improved: 4567, regressed: 14, unchanged: 144
```

* Set PTC version

* Address feedback

* Handle `void` returning functions

* Actually handle `void` returning functions

* Fix `RegisterToLocal` logging
2021-10-18 19:51:22 -03:00
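The idea behind the `TailMerge` pass described above can be illustrated with a small sketch. This is not ARMeilleure's C# code: the toy IR (tuples such as `("ret", value)`), the `Block` class, the `tail_merge` function and the `ret_local` name are all hypothetical, chosen only to show how many returning blocks can be redirected to one shared exit block, including the `void` case.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A basic block in a toy IR (hypothetical, not ARMeilleure's IR)."""
    name: str
    ops: list = field(default_factory=list)


def tail_merge(blocks, returns_value=True):
    """Redirect every `ret` to one shared exit block so the epilogue is emitted once."""
    ret_blocks = [b for b in blocks if b.ops and b.ops[-1][0] == "ret"]
    if len(ret_blocks) <= 1:
        return blocks  # zero or one return: nothing to merge

    exit_block = Block("exit")
    for block in ret_blocks:
        ret = block.ops.pop()
        if returns_value:
            # Pass the return value to the exit block through a shared local.
            block.ops.append(("copy", "ret_local", ret[1]))
        block.ops.append(("br", exit_block.name))

    # The single remaining return; `void` functions return nothing.
    exit_block.ops.append(("ret", "ret_local") if returns_value else ("ret",))
    return blocks + [exit_block]


blocks = [
    Block("guard0", [("call", "f0"), ("ret", "t0")]),
    Block("guard1", [("call", "f1"), ("ret", "t1")]),
    Block("body", [("add", "t2"), ("ret", "t2")]),
]
merged = tail_merge(blocks)
```

After the call, each original block ends in a branch to `exit`, and only `exit` contains a `ret`, so a backend would need to emit the return epilogue once rather than once per call guard.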
FICTURE7
69093cf2d6
Optimize LSRA (#2563)
* Optimize `TryAllocateRegWithtoutSpill` a bit

* Add a fast path for when all registers are live.
* Do not query `GetOverlapPosition` if the register is already in use
  (i.e: free position is 0).

* Do not allocate child split list if not parent

* Turn `LiveRange` into a reference struct

`LiveRange` is now a reference wrapping struct like `Operand` and
`Operation`.

It has also been changed into a singly linked list. In micro-benchmarks,
traversing the linked list was faster than binary search on `List<T>`,
surprisingly even for quite large input sizes (e.g. 1,000,000).

This could be because the code gen for traversing the linked list is
much cleaner and there is no virtual dispatch happening when checking
whether intervals overlap.

* Turn `LiveInterval` into an iterator

The LSRA allocates in forward order and never inspects previous
`LiveInterval`s once they have expired. Something similar can be done for
the `LiveRange`s within the `LiveInterval`s themselves.

The `LiveInterval` is turned into an iterator which expires the
`LiveRange`s within it. The iterator is moved forward along with the
interval walking code, i.e. AllocateInterval(context, interval, cIndex).

* Remove `LinearScanAllocator.Sources`

Local functions are less likely to cause allocations than lambdas.

* Optimize `GetOverlapPosition(interval)` a bit

Time complexity should be in O(n+m) instead of O(nm) now.

* Optimize `NumberLocals` a bit

Use the same idea as in `HybridAllocator` to store the visited state
in the MSB of the Operand's value instead of using a `HashSet<T>`.

* Optimize `InsertSplitCopies` a bit

Avoid allocating a redundant `CopyResolver`.

* Optimize `InsertSplitCopiesAtEdges` a bit

Avoid redundant allocations of `CopyResolver`.

* Use stack allocation for `freePositions`

Avoid redundant computations.

* Add `UseList`

Replace `SortedIntegerList` with an even more specialized data
structure. It allocates memory on the arena allocators and does not
require copying use positions when splitting it.

* Turn `LiveInterval` into a reference struct

`LiveInterval` is now a reference wrapping struct like `Operand` and
`Operation`.

The rationale for turning this into a reference wrapping struct is that
a `LiveInterval` is associated with each local variable, and these
intervals may themselves be split further. I've seen translations
having up to 8000 local variables.

To make the `LiveInterval` unmanaged, a new data structure called
`LiveIntervalList` was added to store child splits. This differs from
`SortedList<,>` because it can contain intervals with the same start
position.

Really wished we got some more of C++ template in C#. :^(

* Optimize `GetChildSplit` a bit

No need to inspect the remaining ranges if we've reached a range which
starts after position, since the split list is ordered.

* Optimize `CopyResolver` a bit

Lazily allocate the fill, spill and parallel copy structures since most
of the time only one of them is needed.

* Optimize `BitMap.Enumerator` a bit

Marking `MoveNext` as `AggressiveInlining` allows RyuJIT to promote the
`Enumerator` struct into registers completely, reducing load/store code
a lot since it does not have to store the struct on the stack for ABI
purposes.

* Use stack allocation for `use/blockedPositions`

* Optimize `AllocateWithSpill` a bit

* Address feedback

* Make `LiveInterval.AddRange(,)` more conservative

Produces no diff against master, but just for good measure.
2021-10-08 18:15:44 -03:00
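The O(n+m) bound mentioned for `GetOverlapPosition(interval)` in the commit above is what a two-pointer walk over two sorted range lists would give. The sketch below is an illustration of that general technique, not the project's C# code. It assumes each interval is a list of half-open `(start, end)` ranges, sorted by start and non-overlapping within itself; the function name and the representation are hypothetical.

```python
def get_overlap_position(a, b):
    """Return the first position where two sorted range lists overlap, or None.

    `a` and `b` are lists of half-open (start, end) ranges, each sorted by
    start and non-overlapping within itself (an assumption of this sketch).
    """
    i = j = 0
    while i < len(a) and j < len(b):
        a_start, a_end = a[i]
        b_start, b_end = b[j]
        if a_start < b_end and b_start < a_end:
            # The ranges intersect; the overlap begins at the later start.
            return max(a_start, b_start)
        # Advance whichever range ends first; it cannot overlap anything later.
        if a_end <= b_end:
            i += 1
        else:
            j += 1
    return None


first = [(0, 4), (10, 14)]
second = [(4, 8), (12, 20)]
position = get_overlap_position(first, second)  # 12
```

Each loop iteration advances one of the two indices, so the walk visits at most n+m ranges, whereas comparing every range of one interval against every range of the other costs O(nm).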