Commit graph

107 commits

Author SHA1 Message Date
LDj3SNuD 8a33e884f8
Fix Vnmls_S fast path (F64: losing input d value). Fix Vnmla_S & Vnmls_S slow paths (using fused inst.s). Fix Vfma_V slow path not using StandardFPSCRValue(). (#1775)
* Fix Vnmls_S fast path (F64: losing input d value). Fix Vnmla_S & Vnmls_S slow paths (using fused inst.s).

Add Vfma_S & Vfms_S Fma fast paths.
Add Vfnma_S inst. with Fma/Sse fast paths and slow path.
Add Vfnms_S Sse fast path.

Add Tests for affected inst.s.

Nits.

* InternalVersion = 1775

* Nits.

* Fix Vfma_V slow path not using StandardFPSCRValue().

* Nit: Fix Vfma_V order.

* Add Vfms_V Sse fast path and slow path.

* Add Vfma_V and Vfms_V Test.
2020-12-17 20:43:41 +01:00
sharmander e901b7850c
CPU: Implement VRINTX.F32 | VRINTX.F64 (#1776)
* Start implementation

* Draft

* Updated opcode.

Needs verification.

* Clean up code.

* Update implementation and tests.

* Update implemenation + tests

* Get RM from FPSCR + Do not use emit/addintrinsic

* Remove "fast" path, as recommended by gdk.

* Variable DELETED.

* Update ARMeilleure/Decoders/OpCodeTable.cs

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>

* Update ARMeilleure/Instructions/InstEmitSimdCvt32.cs

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>

* Update ARMeilleure/Instructions/InstEmitSimdCvt32.cs

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>

* Update ARMeilleure/Instructions/InstEmitSimdCvt32.cs

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>

* Move method

* stringing things together :)

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
2020-12-16 20:27:15 -03:00
sharmander 3332b29f01
CPU: Implement VFMA (Vector) (#1762)
* Implement VFMA.F64

* Simplify switch

* Simplify FMA Instructions into their own IntrinsicType.

* Remove whitespace

* Fix indentation

* Change tests for Vfnms -- disable inf / nan

* Move args up, not description ;)

* Implementation Complete.

All Tests Pass (Slow / Fast Path)

* Move location of function in assembler + test updates.

* Shift params upwards

* Remove unused function

* Update PTC version.

* Add comments / re-oreder opcode table.

* Remove whitespace

* Fix nit

* Fix nit.

* Fix whitespace

* Wrong opcode was used by a bad merge.

* Addressed rip's comments.
2020-12-15 00:01:52 -03:00
sharmander 36f6bbf5b9
CPU: Implement VFNMA.F32 | F.64 (#1783)
* Implement VFNMA.F<32/64>

* Update PTC Version

* Update Implementation & Renames & Correct Order

* Fix alignment

* Update implementation to not trigger assert

* Actually use the intrinsic that makes sense :)
2020-12-07 21:04:01 -03:00
LDj3SNuD 567ea726e1
Add support for guest Fz (Fpcr) mode through host Ftz and Daz (Mxcsr) modes (fast paths). (#1630)
* Add support for guest Fz (Fpcr) mode through host Ftz and Daz (Mxcsr) modes (fast paths).

* Ptc.InternalVersion = 1630

* Nits.

* Address comments.

* Update Ptc.cs

* Address comment.
2020-12-07 10:37:07 +01:00
sharmander b479a43939
CPU: Implement VFNMS.F32/64 (#1758)
* Add necessary methods / op-code

* Enable Support for FMA Instruction Set

* Add Intrinsics / Assembly Opcodes for VFMSUB231XX.

* Add X86 Instructions for VFMSUB231XX

* Implement VFNMS

* Implement VFNMS Tests

* Add special cases for FMA instructions.

* Update PPTC Version

* Remove unused Op

* Move Check into Assert / Cleanup

* Rename and cleanup

* Whitespace

* Whitespace / Rename

* Re-sort

* Address final requests

* Implement VFMA.F64

* Simplify switch

* Simplify FMA Instructions into their own IntrinsicType.

* Remove whitespace

* Fix indentation

* Change tests for Vfnms -- disable inf / nan

* Move args up, not description ;)

* Undo vfma

* Completely remove vfms code.,

* Fix order of instruction in assembler
2020-12-03 20:20:02 +01:00
LDj3SNuD 0679084f11
CPU (A64): Add FP16/FP32 fast paths (F16C Intrinsics) for Fcvt_S, Fcvtl_V & Fcvtn_V Instructions. Now HardwareCapabilities uses CpuId. (#1650)
* net5.0

* CPU (A64): Add FP16/FP32 fast paths (F16C Intrinsics) for Fcvt_S, Fcvtl_V & Fcvtn_V Instructions. Switch to .NET 5.0.

Nits.

Tests performed successfully in both debug and release mode (for all instructions involved).

* Address comment.

* Update appveyor.yml

* Revert "Update appveyor.yml"

This reverts commit 27cdd59e8b.

* Remove Assembler CpuId.

* Update appveyor.yml

* Address comment.
2020-11-18 19:35:54 +01:00
Mary 863edae328
shader cache: Fix Linux boot issues (#1709)
* shader cache: Fix Linux boot issues

This rollback the init logic back to previous state, and replicate the
way PTC handle initialization.

* shader cache: set default state of ready for translation event to false

* Fix cpu unit tests
2020-11-17 22:40:19 +01:00
LDj3SNuD 2cb8bd7006
CPU (A64): Add Scvtf_S_Fixed & Ucvtf_S_Fixed with Tests. (#1492) 2020-08-31 20:48:21 -03:00
LDj3SNuD 6938988427
Fix Vcvt_FI & Vcvt_RM; Add Vfma_S & Vfms_S. Add Tests. (#1471)
* Fix Vcvt_FI & Vcvt_RM; Add Vfma_S & Vfms_S. Add Tests.

* Address PR feedback & Nit.
2020-08-13 02:34:02 -03:00
LDj3SNuD e36e97c64d
CPU: This PR fixes Fpscr, among other things. (#1433)
* CPU: This PR fixes Fpscr, among other things.

* Add Fpscr.Qc = 1 if sat. for Vqrshrn & Vqrshrun.

* Fix Vcmp & Vcmpe opcode table.

* Revert "Fix Vcmp & Vcmpe opcode table."

This reverts commit c117d9410d.

* Address PR feedbacks.
2020-08-08 17:18:51 +02:00
Valentin PONS 3af2ce74ec
Implements some 32-bit instructions (VBIC, VTST, VSRA) (#1192)
* Added some 32 bits instructions:

* VBIC
* VTST
* VSRA

* Incremented the PTC

* Add tests and fix implementation

* Fixed VBIC immediate opcode mapping

* Hey hey!

* Nit.

Co-authored-by: gdkchan <gab.dark.100@gmail.com>
Co-authored-by: LDj3SNuD <dvitiello@gmail.com>
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
2020-07-19 15:11:58 -03:00
LDj3SNuD 56a61a5758
CPU: A32: Fix Vabs_V & Vneg_V (S8, S16, S32 & F32); add Tests. (#1394)
* Fix Vabs_V & Vneg_V (S8, S16, S32 & F32); add Tests.

* Update Ptc.cs
2020-07-17 10:57:49 -03:00
LDj3SNuD 88619d71b8
CPU: A32: Add Vadd & Vsub Wide (S/U_8/16/32) Inst.s with Test. (#1390) 2020-07-17 14:21:40 +10:00
LDj3SNuD a804db6eed
Add Fmax/minv_V & S/Ushl_S Inst.s with Tests. Fix Maxps/d & Minps/d d… (#1335)
* Add Fmax/minv_V & S/Ushl_S Inst.s with Tests. Fix Maxps/d & Minps/d double zero sign handling. Allows better handling of NaNs.

* Optimized EmitSse2VectorIsNaNOpF() for multiple uses per opF.
2020-07-13 21:08:47 +10:00
riperiperi d7044b10a2
Add SSE4.2 Path for CRC32, add A32 variant, add tests for non-castagnoli variants. (#1328)
* Add CRC32 A32 instructions.

* Fix CRC32 instructions.

* Add CRC intrinsic and fast path.

Loop is currently unrolled, will look into adding temp vars after tests are added.

* Begin work on Crc tests

* Fix SSE4.2 path for CRC32C, finialize tests.

* Remove unused IR path.

* Fix spacing between prefix checks.

* This should be Src.

* PTC Version

* OpCodeTable Order

* Integer check improvement. Value and Crc can be either 32 or 64 size.

* This wasn't necessary...

* If size is 3, value type must be I64.

* Fix same src+dest handling for non crc intrinsics.

* Pre-fix (ha) issue with vex encodings
2020-07-13 20:48:14 +10:00
riperiperi 9a49f8aec9
Fix VMVN (immediate), Add VPMIN, VPMAX, VMVN (register) (#1303)
* Add Vmvn (register), tests for both Vmvn variants.

* Add Vpmin, Vpmax, improve Non-FastFp accuracy for Vpadd

* Rebase on top of PTC.

* Add Nopcode

* Increment PTC version.

* Fix nits.
2020-06-24 10:43:44 +10:00
LDj3SNuD 5e724cf24e
Add Profiled Persistent Translation Cache. (#769)
* Delete DelegateTypes.cs

* Delete DelegateCache.cs

* Add files via upload

* Update Horizon.cs

* Update Program.cs

* Update MainWindow.cs

* Update Aot.cs

* Update RelocEntry.cs

* Update Translator.cs

* Update MemoryManager.cs

* Update InstEmitMemoryHelper.cs

* Update Delegates.cs

* Nit.

* Nit.

* Nit.

* 10 fewer MSIL bytes for us

* Add comment. Nits.

* Update Translator.cs

* Update Aot.cs

* Nits.

* Opt..

* Opt..

* Opt..

* Opt..

* Allow to change compression level.

* Update MemoryManager.cs

* Update Translator.cs

* Manage corner cases during the save phase. Nits.

* Update Aot.cs

* Translator response tweak for Aot disabled. Nit.

* Nit.

* Nits.

* Create DelegateHelpers.cs

* Update Delegates.cs

* Nit.

* Nit.

* Nits.

* Fix due to #784.

* Fixes due to #757 & #841.

* Fix due to #846.

* Fix due to #847.

* Use MethodInfo for managed method calls.

Use IR methods instead of managed methods about Max/Min (S/U).
Follow-ups & Nits.

* Add missing exception messages.

Reintroduce slow path for Fmov_Vi.
Implement slow path for Fmov_Si.

* Switch to the new folder structure.

Nits.

* Impl. index-based relocation information. Impl. cache file version field.

* Nit.

* Address gdkchan comments.

Mainly:
- fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI.

* Address AcK77 comment.

* Address Thealexbarney, jduncanator & emmauss comments.

Header magic, CpuId (FI) & Aot -> Ptc.

* Adaptation to the new application reloading system.

Improvements to the call system of managed methods.
Follow-ups.
Nits.

* Get the same boot times as on master when PTC is disabled.

* Profiled Aot.

* A32 support (#897).

* #975 support (1 of 2).

* #975 support (2 of 2).

* Rebase fix & nits.

* Some fixes and nits (still one bug left).

* One fix & nits.

* Tests fix (by gdk) & nits.

* Support translations not only in high quality and rejit.

Nits.

* Added possibility to skip translations and continue execution, using `ESC` key.

* Update SettingsWindow.cs

* Update GLRenderer.cs

* Update Ptc.cs

* Disabled Profiled PTC by default as requested in the past by gdk.

* Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2).

Nits.

* Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations.

* Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling.

Modifications due to rebase.
Nits.

* Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only.

* Nits.

* Nits.

* Update Delegates.cs

* Nit.

* Update InstEmitSimdArithmetic.cs

* Address riperiperi comments.

* Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions.

* Implemented a simple redundant load/save mechanism.

Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator.
Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log.
Nits.

* Nit.

Improved Logger.PrintError in TexturePool.cs to avoid log spawn.
Added missing code for FZ handling (in output) for fp max/min instructions (slow paths).

* Add configuration migration for PTC

Co-authored-by: Thog <me@thog.eu>
2020-06-16 20:28:02 +02:00
LDj3SNuD 83d94b21d0
Add FMaxNmV & FMinNmV Inst.s with Test. (#1279)
Successful unit testing on Windows (debug and release mode).
2020-05-27 18:51:59 +02:00
gdkchan f77694e4f7
Implement a new physical memory manager and replace DeviceMemory (#856)
* Implement a new physical memory manager and replace DeviceMemory

* Proper generic constraints

* Fix debug build

* Add memory tests

* New CPU memory manager and general code cleanup

* Remove host memory management from CPU project, use Ryujinx.Memory instead

* Fix tests

* Document exceptions on MemoryBlock

* Fix leak on unix memory allocation

* Proper disposal of some objects on tests

* Fix JitCache not being set as initialized

* GetRef without checks for 8-bits and 16-bits CAS

* Add MemoryBlock destructor

* Throw in separate method to improve codegen

* Address PR feedback

* QueryModified improvements

* Fix memory write tracking not marking all pages as modified in some cases

* Simplify MarkRegionAsModified

* Remove XML doc for ghost param

* Add back optimization to avoid useless buffer updates

* Add Ryujinx.Cpu project, move MemoryManager there and remove MemoryBlockWrapper

* Some nits

* Do not perform address translation when size is 0

* Address PR feedback and format NativeInterface class

* Remove ghost parameter description

* Update Ryujinx.Cpu to .NET Core 3.1

* Address PR feedback

* Fix build

* Return a well defined value for GetPhysicalAddress with invalid VA, and do not return unmapped ranges as modified

* Typo
2020-05-04 08:54:50 +10:00
Ficture Seven e4ee61d6c3
Improve V128 (#1097)
* Improve V128

* Use LayoutKind.Sequential instead

* Add As<T>, Get<T> & Set<T>

* Fix CpuTest

* Rename Get<T> & Set<T> to Extract<T> & Insert<T>

* Add XML documentation

* Nit
2020-04-17 08:19:20 +10:00
LDj3SNuD 1de16f7653
Add Fcvtas_S/V & Fcvtau_S/V. (#1018) 2020-03-24 22:53:49 +01:00
riperiperi dd433c1296
Implement AESMC, AESIMC, AESE, AESD and VEOR AArch32 instructions (#982)
* Add VEOR and AES instructions.

* Add tests for crypto instructions.

* Update ValueSource name.
2020-03-14 10:29:58 +11:00
gdkchan c26f3774bd
Implement VMULL, VMLSL, VRSHR, VQRSHRN, VQRSHRUN AArch32 instructions + other fixes (#977)
* Implement VMULL, VMLSL, VQRSHRN, VQRSHRUN AArch32 instructions plus other fixes

* Re-align opcode table

* Re-enable undefined, use subclasses to fix checks

* Add test and fix VRSHR instruction

* PR feedback
2020-03-11 11:49:27 +11:00
gdkchan 89ccec197e
Implement VMOVL and VORR.I32 AArch32 SIMD instructions (#960)
* Implement VMOVL and VORR.I32 AArch32 SIMD instructions

* Rename <dt> to <size> on test description

* Rename Widen to Long and improve VMOVL implementation a bit
2020-03-10 16:17:30 +11:00
gdkchan fb0939f9b6
Add SSAT, SSAT16, USAT and USAT16 ARM32 instructions (#954)
* Implement SMULWB, SMULWT, SMLAWB, SMLAWT, and add tests for some multiply instructions

* Improve test descriptions

* Rename SMULH to SMUL__

* Add SSAT, SSAT16, USAT and USAT16 ARM32 instructions

* Fix new tests

* Replace AND 0xFFFF with 16-bits zero extension (more efficient)
2020-03-01 07:51:55 +11:00
gdkchan b8ee5b15ab
Implement FACGE and FACGT (Scalar and Vector) AArch64 SIMD instructions (#956) 2020-03-01 07:51:17 +11:00
riperiperi b1b6f294f2
Add most of the A32 instruction set to ARMeilleure (#897)
* Implement TEQ and MOV (Imm16)

* Initial work on A32 instructions + SVC. No tests yet, hangs in rtld.

* Implement CLZ, fix BFI and BFC

Now stops on SIMD initialization.

* Exclusive access instructions, fix to mul, system instructions.

Now gets to a break after SignalProcessWideKey64.

* Better impl of UBFX, add UDIV and SDIV

Now boots way further - now stuck on VMOV instruction.

* Many more instructions, start on SIMD and testing framework.

* Fix build issues

* svc: Rework 32 bit codepath

Fixing once and for all argument ordering issues.

* Fix 32 bits stacktrace

* hle debug: Add 32 bits dynamic section parsing

* Fix highCq mode, add many tests, fix some instruction bugs

Still suffers from critical malloc failure 😩

* Fix incorrect opcode decoders and a few more instructions.

* Add a few instructions and fix others. re-disable highCq for now.

Disabled the svc memory clear since i'm not sure about it.

* Fix build

* Fix typo in ordered/exclusive stores.

* Implement some more instructions, fix others.

Uxtab16/Sxtab16 are untested.

* Begin impl of pairwise, some other instructions.

* Add a few more instructions, a quick hack to fix svcs for now.

* Add tests and fix issues with VTRN, VZIP, VUZP

* Add a few more instructions, fix Vmul_1 encoding.

* Fix way too many instruction bugs, add tests for some of the more important ones.

* Fix HighCq, enable FastFP paths for some floating point instructions

(not entirely sure why these were disabled, so important to note this
commit exists)

Branching has been removed in A32 shifts until I figure out if it's
worth it

* Cleanup Part 1

There should be no functional change between these next few commits.
Should is the key word. (except for removing break handler)

* Implement 32 bits syscalls

Co-authored-by: riperiperi <rhy3756547@hotmail.com>

Implement all 32 bits counterparts of the 64 bits syscalls we currently
have.

* Refactor part 2: Move index/subindex logic to Operand

May have inadvertently fixed one (1) bug

* Add FlushProcessDataCache32

* Address jd's comments

* Remove 16 bit encodings from OpCodeTable

Still need to catch some edge cases (operands that use the "F" flag) and
make Q encodings with non-even indexes undefined.

* Correct Fpscr handling for FP vector slow paths

WIP

* Add StandardFPSCRValue behaviour for all Arithmetic instructions

* Add StandardFPSCRValue behaviour to compare instructions.

* Force passing of fpcr to FPProcessException and FPUnpack.

Reduces potential for code error significantly

* OpCode cleanup

* Remove urgency from DMB comment in MRRC

DMB is currently a no-op via the instruction, so it should likely still
be a no-op here.

* Test Cleanup

* Fix FPDefaultNaN on Ryzen CPUs

* Improve some tests, fix some shift instructions, add slow path for Vadd

* Fix Typo

* More test cleanup

* Flip order of Fx and index, to indicate that the operand's is the "base"

* Remove Simd32 register type, use Int32 and Int64 for scalars like A64 does.

* Reintroduce alignment to DecoderHelper (removed by accident)

* One more realign as reading diffs is hard

* Use I32 registers in A32 (part 2)

Swap default integer register type based on current execution mode.

* FPSCR flags as Registers (part 1)

Still need to change NativeContext and ExecutionContext to allow
getting/setting with the flag values.

* Use I32 registers in A32 (part 1)

* FPSCR flags as registers (part 2)

Only CMP flags are on the registers right now. It could be useful to use
more of the space in non-fast-float when implementing A32 flags
accurately in the fast path.

* Address Feedback

* Correct FP->Int behaviour (should saturate)

* Make branches made by writing to PC eligible for Rejit

Greatly improves performance in most games.

* Remove unused branching for Vtbl

* RejitRequest as a class rather than a tuple

Makes a lot more sense than storing tuples on a dictionary.

* Add VMOVN, VSHR (imm), VSHRN (imm) and related tests

* Re-order InstEmitSystem32

Alphabetical sorting.

* Address Feedback

Feedback from Ac_K, remove and sort usings.

* Address Feedback 2

* Address Feedback from LDj3SNuD

Opcode table reordered to have alphabetical sorting within groups,
Vmaxnm and Vminnm have split names to be less ambiguous, SoftFloat nits,
Test nits and Test simplification with ValueSource.

* Add Debug Asserts to A32 helpers

Mainly to prevent the shift ones from being used on I64 operands, as
they expect I32 input for most operations (eg. carry flag setting), and
expect I32 input for shift and boolean amounts. Most other helper
functions don't take Operands, throw on out of range values, and take
specific types of OpCode, so didn't need any asserts.

* Use ConstF rather than creating an operand.

(useful for pooling in future)

* Move exclusive load to helper, reference call flag rather than literal 1.

* Address LDj feedback (minus table flatten)

one final look before it's all gone. the world is so beautiful.

* Flatten OpCodeTable

oh no

* Address more table ordering

* Call Flag as int on A32

Co-authored-by: Natalie C. <cyuubiapps@gmail.com>
Co-authored-by: Thog <thog@protonmail.com>
2020-02-24 08:20:40 +11:00
LDj3SNuD 0915731a9d Implemented fast paths for: (#846)
* opt

* Nit.

* opt_p2

* Nit.
2019-12-29 22:22:47 -03:00
LDj3SNuD 7c111a3567 Add Mrs & Msr (Nzcv) Inst., with Tests. (#819)
* Add Mrs & Msr (Nzcv) Inst., with Tests.

* Don't use `NativeInterface`.
2019-11-14 13:08:07 +11:00
LDj3SNuD eff8379d2a Add Sli_S/V & Sri_S/V inst.s (fast & slow paths), with Tests. (#797)
* Add Sli & Sri.

* Add scalar variants.
2019-10-24 20:37:42 -03:00
LDj3SNuD 16869402bf Add Tbx Inst. (fast & slow paths), with Tests. (#782)
* Update OpCodeTable.cs

* Update InstName.cs

* Update InstEmitSimdMove.cs

* Update SoftFallback.cs

* Update DelegateTypes.cs

* Update CpuTestSimdTbl.cs

* Update CpuTest.cs

* Update Ryujinx.Tests.csproj

* Nit.
2019-10-04 11:43:20 -03:00
gdkchan a731ab3a2a Add a new JIT compiler for CPU code (#693)
* Start of the ARMeilleure project

* Refactoring around the old IRAdapter, now renamed to PreAllocator

* Optimize the LowestBitSet method

* Add CLZ support and fix CLS implementation

* Add missing Equals and GetHashCode overrides on some structs, misc small tweaks

* Implement the ByteSwap IR instruction, and some refactoring on the assembler

* Implement the DivideUI IR instruction and fix 64-bits IDIV

* Correct constant operand type on CSINC

* Move division instructions implementation to InstEmitDiv

* Fix destination type for the ConditionalSelect IR instruction

* Implement UMULH and SMULH, with new IR instructions

* Fix some issues with shift instructions

* Fix constant types for BFM instructions

* Fix up new tests using the new V128 struct

* Update tests

* Move DIV tests to a separate file

* Add support for calls, and some instructions that depends on them

* Start adding support for SIMD & FP types, along with some of the related ARM instructions

* Fix some typos and the divide instruction with FP operands

* Fix wrong method call on Clz_V

* Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes

* Implement SIMD logical instructions and more misc. fixes

* Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations

* Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes

* Implement SIMD shift instruction and fix Dup_V

* Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table

* Fix check with tolerance on tester

* Implement FP & SIMD comparison instructions, and some fixes

* Update FCVT (Scalar) encoding on the table to support the Half-float variants

* Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes

* Use old memory access methods, made a start on SIMD memory insts support, some fixes

* Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes

* Fix arguments count with struct return values, other fixes

* More instructions

* Misc. fixes and integrate LDj3SNuD fixes

* Update tests

* Add a faster linear scan allocator, unwinding support on windows, and other changes

* Update Ryujinx.HLE

* Update Ryujinx.Graphics

* Fix V128 return pointer passing, RCX is clobbered

* Update Ryujinx.Tests

* Update ITimeZoneService

* Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks

* Use generic GetFunctionPointerForDelegate method and other tweaks

* Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics

* Remove some unused code on the assembler

* Fix REX.W prefix regression on float conversion instructions, add some sort of profiler

* Add hardware capability detection

* Fix regression on Sha1h and revert Fcm** changes

* Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator

* Fix silly mistake introduced on last commit on CpuId

* Generate inline stack probes when the stack allocation is too large

* Initial support for the System-V ABI

* Support multiple destination operands

* Fix SSE2 VectorInsert8 path, and other fixes

* Change placement of XMM callee save and restore code to match other compilers

* Rename Dest to Destination and Inst to Instruction

* Fix a regression related to calls and the V128 type

* Add an extra space on comments to match code style

* Some refactoring

* Fix vector insert FP32 SSE2 path

* Port over the ARM32 instructions

* Avoid memory protection races on JIT Cache

* Another fix on VectorInsert FP32 (thanks to LDj3SNuD

* Float operands don't need to use the same register when VEX is supported

* Add a new register allocator, higher quality code for hot code (tier up), and other tweaks

* Some nits, small improvements on the pre allocator

* CpuThreadState is gone

* Allow changing CPU emulators with a config entry

* Add runtime identifiers on the ARMeilleure project

* Allow switching between CPUs through a config entry (pt. 2)

* Change win10-x64 to win-x64 on projects

* Update the Ryujinx project to use ARMeilleure

* Ensure that the selected register is valid on the hybrid allocator

* Allow exiting on returns to 0 (should fix test regression)

* Remove register assignments for most used variables on the hybrid allocator

* Do not use fixed registers as spill temp

* Add missing namespace and remove unneeded using

* Address PR feedback

* Fix types, etc

* Enable AssumeStrictAbiCompliance by default

* Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 21:56:22 +03:00
LDj3SNuD e5b88de22a Add Saddlv_V Inst. Improve Cnt_V, Dup_Gp & Ins_Gp Tests. Tuneup Cls_V & Clz_V Tests. (#720)
* Update PackageReferences.

* Improve Cnt_V Test. Tuneup Cls_V & Clz_V Tests.

Nit.

* Nit.

* Improve Dup_Gp & Ins_Gp Tests.

* Update for Saddlv_V Inst.

* Update for Saddlv_V Inst.

* Update for Saddlv_V Inst.
2019-07-08 11:55:37 -03:00
LDj3SNuD 10c74182ba Implement the remaining tests for Simd and Fp instructions of data processing type. Small opts. for Fmov_Ftoi/1 & Fmov_Itof/1 Insts. (#709)
* Update CpuTestSimdShImm.cs

* Update OpCodeTable.cs

* Update CpuTestSimdReg.cs

* Add Ins_Gp & Ins_V Tests.

Improve Smov_S & Umov_S Tests.

* Add Bic_Vi & Orr_Vi Tests.

* OpTable Fixes for Bic_Vi & Orr_Vi Insts.

* Add Saddlv_V & Uaddlv_V Tests.

* Nit.

* Add Smull_V & Umull_V Tests.

Improve Simd Permute Tests.

* Nit.

* Add Fcsel_S Test.

* Add Fnmadd_S, Fnmsub_S & Fnmul_S Tests.

* Fmov_V -> Fmov_Vi

* OpTable Fixes for Fmov_Si & Fmov_Vi Insts.

* Add Fmov_Vi Test.

* Add Fmov_S Test.

* Add Fmov_Si Test.

Add new test category SimdFmov.

* Nit.

* OpTable Fixes for Fmov_Ftoi/1 & Fmov_Itof/1 Insts.

* Small opts. for Fmov_Ftoi/1 & Fmov_Itof/1 Insts.

Small simpl. for Smov_S Inst.
Remove unnecessary method EmitIntZeroUpperIfNeeded.

* Add Fmov_Ftoi/1 & Fmov_Itof/1 Tests.
2019-06-29 20:02:48 -03:00
LDj3SNuD d87c5375f1 Implement a custom value generator for the Tests of the CLS and CLZ instructions (Base: 32, 64 bits. Simd: 8, 16, 32 bits). (#696)
* Update CpuTestAlu.cs

* Update CpuTestSimd.cs

* Update CpuTestMov.cs
2019-06-12 09:03:31 -03:00
LDj3SNuD ffbfbb5549 Add FCVT <Hd>, <Sn> and FCVT <Sd>, <Hn> Inst.; add Tests. (#692)
* Update OpCodeTable.cs

* Update InstEmitSimdCvt.cs

* Update CpuTestSimd.cs

* Address PR feedback.
2019-05-30 19:51:39 -03:00
LDj3SNuD 51ea6fa583 Add Smaxv_V, Sminv_V, Umaxv_V, Uminv_V Inst.; add Tests. (#691)
* Update InstEmitSimdHelper.cs

* Update InstEmitSimdArithmetic.cs

* Update OpCodeTable.cs

* Update CpuTestSimd.cs
2019-05-29 21:29:24 -03:00
LDj3SNuD 16de171c44 Sse optimized the Scalar & Vector fp-to-fp conversion instructions (MNPZ & IX); added the related Tests (AMNPZ & IX). Small refactoring of existing instructions. (#676)
* Nit.

* Update InstEmitSimdCvt.cs

* Update VectorHelper.cs

* Update InstEmitSimdArithmetic.cs

* Update CpuTestSimd.cs

* Superseded.
2019-04-26 08:58:29 +10:00
LDj3SNuD 74da8785a5 Sse optimized the 32-bit Vector & Scalar integer-to-fp conversion instructions (signed & unsigned); added the related Gp & V_Fixed Tests (signed & unsigned). (#662)
* Update CpuTestSimdCvt.cs

* Update CpuTestSimd.cs

* Update CpuTestSimdShImm.cs

* Update InstEmitSimdCvt.cs

* Update OpCodeTable.cs

* Update InstEmitSimdCvt.cs
2019-04-20 23:07:35 -03:00
LDj3SNuD 233fc95e1e Sse optimized the Vector & Scalar fp-to-integer conversion instructions (unsigned); improved the related Tests. (#656)
* Update InstEmitSimdCvt.cs

* Update CpuTestSimdCvt.cs

* Update CpuTestSimd.cs

* Update CpuTestSimdShImm.cs

* Update InstEmitSimdCvt.cs
2019-04-12 13:14:16 -03:00
LDj3SNuD febc2ad6f4 Sse optimized all the fp to integer conversion instructions (signed) with Tests (signed & unsigned). (#655)
* Update CpuTestSimdCvt.cs

* Update CpuTestSimd.cs

* Update CpuTestSimdShImm.cs

* Update InstEmitSimdCvt.cs

* Update InstEmitSimdMove.cs

* Update InstEmitSimdCmp.cs

* Update VectorHelper.cs

* Update InstEmitSimdHelper.cs

* Update OpCodeTable.cs

* Update InstEmitSimdCvt.cs

* Update InstEmitSimdHelper.cs

* Update InstEmitSimdMove.cs
2019-04-03 09:21:22 -03:00
LDj3SNuD c106ae9944 Add Tbl_V Sse opt. with Tests. (#651)
* Add v4, v5, v30, v31 required for Tbl_V Tests.

* Add Tests for Tbl_V.

* Add Tbl_V Sse opt..

* Nit.

* Small opt. on comparison constant vector.

* Nit.

* Add EmitLd/Stvectmp2/3.

* Nit.
2019-03-23 15:50:19 -03:00
LDj3SNuD 1bef70c068 Add Rshrn_V & Shrn_V Sse opt.. Add Mla_V, Mls_V & Mul_V Sse opt.; add Tests. (#614)
* Update CountLeadingZeros().

* Remove obsolete Tests.

* Follow-up.

* Follow-up.

* Follow-up.

* Add Mla_V, Mls_V & Mul_V Tests.

* Update PackageReferences.

* Remove EmitLd/Stvectmp2().

* Remove Dup. Nits.

* Remove EmitLd/Stvectmp2() & Dup; nits.

* Remove Tmp stuff & Dup; rework Fcvtz() as Fcvtn().

* Remove Tmp stuff, EmitLd/Stvectmp2() & Dup. Nits.

* Add (R)shrn_V Sse opt.; add "Part" & "Shift" opt..

Remove Tmp stuff; remove Dup.
Nits.

* Add Mla/Mls/Mul_V Sse opt.. Add "Part" opt..

Remove EmitLd/Stvectmp2(), remove Dup.
Nits.

* Nits.

* Nits.

* Nit.

* Add "Part" opt.. Nit.

* Nit.

* Nit.

* Add Cmhi_V & Cmhs_V Sse opt..
2019-03-13 19:23:52 +11:00
LDj3SNuD dbc105eafb Create CpuTestSimdImm.cs (#608) 2019-03-01 20:12:09 +11:00
LDj3SNuD a3d46e4133 Add Tests for instructions Fcvtzs_Gp_Fixed & Fcvtzu_Gp_Fixed, Scvtf_Gp_Fixed & Ucvtf_Gp_Fixed. (#603)
* Create CpuTestSimdCvt.cs

* Update CpuTestMisc.cs

* Update CpuTestSimdCvt.cs
2019-02-23 20:53:27 -03:00
LDj3SNuD 1b4809bde1
Update CpuTestMisc.cs 2019-02-18 00:52:01 +01:00
gdkchan a694420d11
Implement speculative translation on the CPU (#515)
* Implement speculative translation on the cpu, and change the way how branches to unknown or untranslated addresses works

* Port t0opt changes and other cleanups

* Change namespace from translation related classes to ChocolArm64.Translation, other minor tweaks

* Fix typo

* Translate higher quality code for indirect jumps aswell, and on some cases that were missed when lower quality (tier 0) code was available

* Remove debug print

* Remove direct argument passing optimization, and enable tail calls for BR instructions

* Call delegates directly with Callvirt rather than calling Execute, do not emit calls for tier 0 code

* Remove unused property

* Rename argument on ArmSubroutine delegate
2019-02-04 18:26:05 -03:00
LDj3SNuD 8f7fcede7f Add Smlal_Ve, Smlsl_Ve, Smull_Ve, Umlal_Ve, Umlsl_Ve, Umull_Ve Inst.; add Tests. Add Sse Opt. for Trn1/2_V and Uzp1/2_V Inst. Nits. (#566)
* Update OpCodeTable.cs

* Update InstEmitSimdArithmetic.cs

* Update InstEmitSimdHelper.cs

* Update CpuTestSimdRegElem.cs

* Update InstEmitSimdMove.cs

* Update InstEmitSimdCvt.cs

* Update SoftFallback.cs

* Update InstEmitSimdHelper.cs

* Update SoftFloat.cs

* Update CryptoHelper.cs

* Update InstEmitSimdArithmetic.cs

* Update InstEmitSimdCmp.cs

* Address PR feedback.

* Address PR feedback.
2019-01-29 10:54:39 -03:00
LDj3SNuD 0f5b6dfbe8 Fix Frecpe_S/V and Frsqrte_S/V (full FP emu.). Add Sse Opt. & SoftFloat Impl. for Fcmeq/ge/gt/le/lt_S/V (Reg & Zero), Faddp_S/V, Fmaxp_V, Fminp_V Inst.; add Sse Opt. for Shll_V, S/Ushll_V Inst.; improve Sse Opt. for Xtn_V Inst.. Add Tests. (#543)
* Update Optimizations.cs

* Update InstEmitSimdShift.cs

* Update InstEmitSimdHelper.cs

* Update InstEmitSimdArithmetic.cs

* Update InstEmitSimdMove.cs

* Update SoftFloat.cs

* Update InstEmitSimdCmp.cs

* Update CpuTestSimdShImm.cs

* Update CpuTestSimd.cs

* Update CpuTestSimdReg.cs

* Nit.

* Update SoftFloat.cs

* Update InstEmitSimdArithmetic.cs

* Update InstEmitSimdHelper.cs

* Update CpuTestSimd.cs

* Explicit some implicit casts.

* Simplify some powers; nits.

* Update OpCodeTable.cs

* Update InstEmitSimdArithmetic.cs

* Update CpuTestSimdReg.cs

* Update InstEmitSimdArithmetic.cs
2018-12-26 15:11:36 -02:00