* Add EntryTable<TEntry>
* Add on translation call counting
* Add Counter
* Add PPTC support
* Make Counter a generic & use a 32-bit counter instead
* Return false on overflow
* Set PPTC version
* Print more information about the rejit queue
* Make Counter<T> disposable
* Remove Block.TailCall since it is not used anymore
* Apply suggestions from code review
Address gdkchan's feedback
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
* Fix more stale docs
* Remove rejit requests queue logging
* Make Counter<T> finalizable
Most certainly quite an odd use case.
* Make EntryTable<T>.TryAllocate set entry to default
* Re-trigger CI
* Dispose Counters before they hit the finalizer queue
* Re-trigger CI
Just for good measure...
* Make EntryTable<T> expandable
* EntryTable is now expandable instead of being a fixed slab.
* Remove EntryTable<T>.TryAllocate
* Remove Counter<T>.TryCreate
Address LDj3SNuD's feedback
* Apply suggestions from code review
Address LDj3SNuD's feedback
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
* Remove useless return
* POH approach, but the sequel
* Revert "POH approach, but the sequel"
This reverts commit 5f5abaa247.
The sequel got shelved
* Add extra documentation
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
* Implement VCNT based on AArch64 CNT
Add tests
* Update PTC version
* Address LDj's comments
* Explicit size in encoding
* Tighter tests
* Replace SoftFallback with IR helper
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
* Reduce one BitwiseAnd from IR fallback
Based on popcount64b from https://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation
* Rename parameter and add assert
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
* Add Pmull_V Sse fast path only, both "8/16B -> 8H" and "1/2D -> 1Q" variants; with Test.
* Add Clmul fast path for the 128 bits variant.
* Small optimisation (save 60 instructions) for the Sse fast path about the 128 bits variant.
* Add slow path, both variants. Fix V128 Shl/Shr when shift = 0.
* A32: Add Vmull_I P64 variant (slow path); not tested.
* A32: Add Vmull_I_P8_P64 Test and fix P64 variant.