Compare commits

...

7 Commits

Author SHA1 Message Date
TSRBerry
e187a8870a HLE: Deal with empty title names properly (#4643)
* hle: Deal with empty titleNames in some languages

* gui: Fix displaying the wrong title name

* Remove unnecessary bounds check

* Fix a NRE when getting the version string

* Restore empty string logic
2023-04-12 01:09:47 +00:00
riperiperi
a64fee29dc Vulkan: add situational "Fast Flush" mode (#4667)
* Flush in the middle of long command buffers.

* Vulkan: add situational "Fast Flush" mode

The AutoFlushCounter class was added to periodically flush Vulkan command buffers throughout a frame, which reduces latency to the GPU as commands are submitted and processed much sooner. This was done by allowing command buffers to flush when framebuffer attachments changed.

However, some games have incredibly long render passes with a large number of draws, and really aggressive data access that forces GPU sync.

The Vulkan backend could potentially end up building a single command buffer for 4-5ms if a pass has enough draws, such as in BOTW. In the scenario where sync is waited on immediately after submission, this would have to wait for the completion of a much longer command buffer than usual.

The solution is to force command buffer submission periodically in a "fast flush" mode. This will end up splitting render passes, but it will only enable if sync is aggressive enough.

This should improve performance in GPU limited scenarios, or in games that aggressively wait on synchronization. In some games, it may only kick in when res scaling. It won't trigger in games like SMO where sync is not an issue.

Improves performance in Pokemon Scarlet/Violet (res scaled) and BOTW (in general).

* Add conversions in milliseconds next to flush timers.
2023-04-11 09:23:41 +02:00
riperiperi
9ef94c8292 ARMeilleure: Move TPIDR_EL0 and TPIDRRO_EL0 to NativeContext (#4661)
* ARMeilleure: Move TPIDR_EL0 and TPIDRRO_EL0 to NativeContext

Some games access these system registers several tens of thousands of times in a second from many different threads. While this isn't really crippling, it is a lot of wasted time spent in a reverse pinvoke transition.

Example games are Pokemon Scarlet/Violet and BOTW. These games have a lot of different potential bottlenecks so it's unlikely you will see a consistent improvement, but it definitely disappears from the cpu profile.

* Remove unreachable code.

* Add ulong conversion for offsets

* Nit
2023-04-11 08:55:04 +02:00
riperiperi
915d6d044c OpenGL: Fix OBS/Overlays again by binding FB before present (#4668)
This seems to have been removed by the Post-Processing PR, but it is required for the display in OBS to be the right way up and properly scaled.

I've tested this with AA and FSR on MK8D and it seems to behave properly. Testing is welcome.
2023-04-11 08:32:31 +02:00
MutantAura
a4780ab33b Force activate parent window before dialog is shown (#4663) 2023-04-11 00:04:31 +02:00
TSRBerry
a947a45d81 gtk: Fix a NRE when disposing OpenGL (#4648) 2023-04-10 17:00:23 +02:00
riperiperi
9db73f74cf ARMeilleure: Respect FZ/RM flags for all floating point operations (#4618)
* ARMeilleure: Respect Fz flag for all floating point operations.

This is a change in strategy for emulating the Fz FPCR flag. Before, it was set before instructions that "needed it" and reset after. However, this missed a few hot instructions like the multiplication instruction, and the entirety of A32.

The new strategy is to set the Fz flag only in the following circumstances:

- Set to match FPCR before translated functions/loop are executed.
- Reset when calling SoftFloat methods, set when returning.
- Reset when exiting execution.

This allows us to remove the code around the existing Fz aware instructions, and get the accuracy benefits on all floating point instructions executed while in translated code.

Single step executions now need to be called with a context wrapper - right now it just contains the Fz flag initialization, and won't actually do anything on ARM.

This fixes a bug in Breath of the Wild where some physics interactions could randomly crash the game due to subnormal values not flushing to zero.

This is draft right now because I need to answer the questions:
- Does dotnet avoid changing the value of Mxcsr?
- Is it a good idea to assume that? Or should the flag set/restore be done on every managed method call, not just softfloat?
- If we assume that, do we want a unit test to verify the behaviour?

I recommend testing a bunch of games, especially games affected when this was originally added, such as #1611.

* Remove unused method

* Use FMA for Fmadd, Fmsub, Fnmadd, Fnmsub, Fmla, Fmls

...when available.

Similar implementation to A32

* Use FMA for Frecps, Frsqrts

* Don't set DAZ.

* Add round mode to ARM FP mode

* Fix mistakes

* Add test for FP state when calling managed methods

* Add explanatory comment to test.

* Cleanup

* Add A64 FPCR flags

* Vrintx_S A32 fast path on A64 backend

* Address feedback 1, re-enable DAZ

* Fix FMA instructions By Elem

* Address feedback
2023-04-10 12:22:58 +02:00
36 changed files with 1046 additions and 331 deletions

View File

@@ -226,6 +226,8 @@ namespace ARMeilleure.CodeGen.Arm64
Add(Intrinsic.Arm64MlsVe, new IntrinsicInfo(0x2f004000u, IntrinsicType.VectorTernaryRdByElem));
Add(Intrinsic.Arm64MlsV, new IntrinsicInfo(0x2e209400u, IntrinsicType.VectorTernaryRd));
Add(Intrinsic.Arm64MoviV, new IntrinsicInfo(0x0f000400u, IntrinsicType.VectorMovi));
Add(Intrinsic.Arm64MrsFpcr, new IntrinsicInfo(0xd53b4400u, IntrinsicType.GetRegister));
Add(Intrinsic.Arm64MsrFpcr, new IntrinsicInfo(0xd51b4400u, IntrinsicType.SetRegister));
Add(Intrinsic.Arm64MrsFpsr, new IntrinsicInfo(0xd53b4420u, IntrinsicType.GetRegister));
Add(Intrinsic.Arm64MsrFpsr, new IntrinsicInfo(0xd51b4420u, IntrinsicType.SetRegister));
Add(Intrinsic.Arm64MulVe, new IntrinsicInfo(0x0f008000u, IntrinsicType.VectorBinaryByElem));

View File

@@ -268,11 +268,13 @@ namespace ARMeilleure.CodeGen.X86
Add(X86Instruction.Vblendvps, new InstructionInfo(BadOp, BadOp, BadOp, BadOp, 0x000f3a4a, InstructionFlags.Vex | InstructionFlags.Prefix66));
Add(X86Instruction.Vcvtph2ps, new InstructionInfo(BadOp, BadOp, BadOp, BadOp, 0x000f3813, InstructionFlags.Vex | InstructionFlags.Prefix66));
Add(X86Instruction.Vcvtps2ph, new InstructionInfo(0x000f3a1d, BadOp, BadOp, BadOp, BadOp, InstructionFlags.Vex | InstructionFlags.Prefix66));
Add(X86Instruction.Vfmadd231pd, new InstructionInfo(BadOp, BadOp, BadOp, BadOp, 0x000f38b8, InstructionFlags.Vex | InstructionFlags.Prefix66 | InstructionFlags.RexW));
Add(X86Instruction.Vfmadd231ps, new InstructionInfo(BadOp, BadOp, BadOp, BadOp, 0x000f38b8, InstructionFlags.Vex | InstructionFlags.Prefix66));
Add(X86Instruction.Vfmadd231sd, new InstructionInfo(BadOp, BadOp, BadOp, BadOp, 0x000f38b9, InstructionFlags.Vex | InstructionFlags.Prefix66 | InstructionFlags.RexW));
Add(X86Instruction.Vfmadd231ss, new InstructionInfo(BadOp, BadOp, BadOp, BadOp, 0x000f38b9, InstructionFlags.Vex | InstructionFlags.Prefix66));
Add(X86Instruction.Vfmsub231sd, new InstructionInfo(BadOp, BadOp, BadOp, BadOp, 0x000f38bb, InstructionFlags.Vex | InstructionFlags.Prefix66 | InstructionFlags.RexW));
Add(X86Instruction.Vfmsub231ss, new InstructionInfo(BadOp, BadOp, BadOp, BadOp, 0x000f38bb, InstructionFlags.Vex | InstructionFlags.Prefix66));
Add(X86Instruction.Vfnmadd231pd, new InstructionInfo(BadOp, BadOp, BadOp, BadOp, 0x000f38bc, InstructionFlags.Vex | InstructionFlags.Prefix66 | InstructionFlags.RexW));
Add(X86Instruction.Vfnmadd231ps, new InstructionInfo(BadOp, BadOp, BadOp, BadOp, 0x000f38bc, InstructionFlags.Vex | InstructionFlags.Prefix66));
Add(X86Instruction.Vfnmadd231sd, new InstructionInfo(BadOp, BadOp, BadOp, BadOp, 0x000f38bd, InstructionFlags.Vex | InstructionFlags.Prefix66 | InstructionFlags.RexW));
Add(X86Instruction.Vfnmadd231ss, new InstructionInfo(BadOp, BadOp, BadOp, BadOp, 0x000f38bd, InstructionFlags.Vex | InstructionFlags.Prefix66));

View File

@@ -249,10 +249,9 @@ namespace ARMeilleure.CodeGen.X86
case IntrinsicType.Mxcsr:
{
Operand offset = operation.GetSource(0);
Operand bits = operation.GetSource(1);
Debug.Assert(offset.Kind == OperandKind.Constant && bits.Kind == OperandKind.Constant);
Debug.Assert(offset.Type == OperandType.I32 && bits.Type == OperandType.I32);
Debug.Assert(offset.Kind == OperandKind.Constant);
Debug.Assert(offset.Type == OperandType.I32);
int offs = offset.AsInt32() + context.CallArgsRegionSize;
@@ -261,20 +260,22 @@ namespace ARMeilleure.CodeGen.X86
Debug.Assert(HardwareCapabilities.SupportsSse || HardwareCapabilities.SupportsVexEncoding);
context.Assembler.Stmxcsr(memOp);
if (operation.Intrinsic == Intrinsic.X86Mxcsrmb)
if (operation.Intrinsic == Intrinsic.X86Ldmxcsr)
{
context.Assembler.Or(memOp, bits, OperandType.I32);
}
else /* if (intrinOp.Intrinsic == Intrinsic.X86Mxcsrub) */
{
Operand notBits = Const(~bits.AsInt32());
context.Assembler.And(memOp, notBits, OperandType.I32);
}
Operand bits = operation.GetSource(1);
Debug.Assert(bits.Type == OperandType.I32);
context.Assembler.Mov(memOp, bits, OperandType.I32);
context.Assembler.Ldmxcsr(memOp);
}
else if (operation.Intrinsic == Intrinsic.X86Stmxcsr)
{
Operand dest = operation.Destination;
Debug.Assert(dest.Type == OperandType.I32);
context.Assembler.Stmxcsr(memOp);
context.Assembler.Mov(dest, memOp, OperandType.I32);
}
break;
}

View File

@@ -60,6 +60,7 @@ namespace ARMeilleure.CodeGen.X86
Add(Intrinsic.X86Haddpd, new IntrinsicInfo(X86Instruction.Haddpd, IntrinsicType.Binary));
Add(Intrinsic.X86Haddps, new IntrinsicInfo(X86Instruction.Haddps, IntrinsicType.Binary));
Add(Intrinsic.X86Insertps, new IntrinsicInfo(X86Instruction.Insertps, IntrinsicType.TernaryImm));
Add(Intrinsic.X86Ldmxcsr, new IntrinsicInfo(X86Instruction.None, IntrinsicType.Mxcsr));
Add(Intrinsic.X86Maxpd, new IntrinsicInfo(X86Instruction.Maxpd, IntrinsicType.Binary));
Add(Intrinsic.X86Maxps, new IntrinsicInfo(X86Instruction.Maxps, IntrinsicType.Binary));
Add(Intrinsic.X86Maxsd, new IntrinsicInfo(X86Instruction.Maxsd, IntrinsicType.Binary));
@@ -75,8 +76,6 @@ namespace ARMeilleure.CodeGen.X86
Add(Intrinsic.X86Mulps, new IntrinsicInfo(X86Instruction.Mulps, IntrinsicType.Binary));
Add(Intrinsic.X86Mulsd, new IntrinsicInfo(X86Instruction.Mulsd, IntrinsicType.Binary));
Add(Intrinsic.X86Mulss, new IntrinsicInfo(X86Instruction.Mulss, IntrinsicType.Binary));
Add(Intrinsic.X86Mxcsrmb, new IntrinsicInfo(X86Instruction.None, IntrinsicType.Mxcsr)); // Mask bits.
Add(Intrinsic.X86Mxcsrub, new IntrinsicInfo(X86Instruction.None, IntrinsicType.Mxcsr)); // Unmask bits.
Add(Intrinsic.X86Paddb, new IntrinsicInfo(X86Instruction.Paddb, IntrinsicType.Binary));
Add(Intrinsic.X86Paddd, new IntrinsicInfo(X86Instruction.Paddd, IntrinsicType.Binary));
Add(Intrinsic.X86Paddq, new IntrinsicInfo(X86Instruction.Paddq, IntrinsicType.Binary));
@@ -160,6 +159,7 @@ namespace ARMeilleure.CodeGen.X86
Add(Intrinsic.X86Sqrtps, new IntrinsicInfo(X86Instruction.Sqrtps, IntrinsicType.Unary));
Add(Intrinsic.X86Sqrtsd, new IntrinsicInfo(X86Instruction.Sqrtsd, IntrinsicType.Unary));
Add(Intrinsic.X86Sqrtss, new IntrinsicInfo(X86Instruction.Sqrtss, IntrinsicType.Unary));
Add(Intrinsic.X86Stmxcsr, new IntrinsicInfo(X86Instruction.None, IntrinsicType.Mxcsr));
Add(Intrinsic.X86Subpd, new IntrinsicInfo(X86Instruction.Subpd, IntrinsicType.Binary));
Add(Intrinsic.X86Subps, new IntrinsicInfo(X86Instruction.Subps, IntrinsicType.Binary));
Add(Intrinsic.X86Subsd, new IntrinsicInfo(X86Instruction.Subsd, IntrinsicType.Binary));
@@ -170,11 +170,13 @@ namespace ARMeilleure.CodeGen.X86
Add(Intrinsic.X86Unpcklps, new IntrinsicInfo(X86Instruction.Unpcklps, IntrinsicType.Binary));
Add(Intrinsic.X86Vcvtph2ps, new IntrinsicInfo(X86Instruction.Vcvtph2ps, IntrinsicType.Unary));
Add(Intrinsic.X86Vcvtps2ph, new IntrinsicInfo(X86Instruction.Vcvtps2ph, IntrinsicType.BinaryImm));
Add(Intrinsic.X86Vfmadd231pd, new IntrinsicInfo(X86Instruction.Vfmadd231pd, IntrinsicType.Fma));
Add(Intrinsic.X86Vfmadd231ps, new IntrinsicInfo(X86Instruction.Vfmadd231ps, IntrinsicType.Fma));
Add(Intrinsic.X86Vfmadd231sd, new IntrinsicInfo(X86Instruction.Vfmadd231sd, IntrinsicType.Fma));
Add(Intrinsic.X86Vfmadd231ss, new IntrinsicInfo(X86Instruction.Vfmadd231ss, IntrinsicType.Fma));
Add(Intrinsic.X86Vfmsub231sd, new IntrinsicInfo(X86Instruction.Vfmsub231sd, IntrinsicType.Fma));
Add(Intrinsic.X86Vfmsub231ss, new IntrinsicInfo(X86Instruction.Vfmsub231ss, IntrinsicType.Fma));
Add(Intrinsic.X86Vfnmadd231pd, new IntrinsicInfo(X86Instruction.Vfnmadd231pd, IntrinsicType.Fma));
Add(Intrinsic.X86Vfnmadd231ps, new IntrinsicInfo(X86Instruction.Vfnmadd231ps, IntrinsicType.Fma));
Add(Intrinsic.X86Vfnmadd231sd, new IntrinsicInfo(X86Instruction.Vfnmadd231sd, IntrinsicType.Fma));
Add(Intrinsic.X86Vfnmadd231ss, new IntrinsicInfo(X86Instruction.Vfnmadd231ss, IntrinsicType.Fma));

View File

@@ -0,0 +1,15 @@
using System;
namespace ARMeilleure.CodeGen.X86
{
[Flags]
enum Mxcsr
{
Ftz = 1 << 15, // Flush To Zero.
Rhi = 1 << 14, // Round Mode high bit.
Rlo = 1 << 13, // Round Mode low bit.
Um = 1 << 11, // Underflow Mask.
Dm = 1 << 8, // Denormal Mask.
Daz = 1 << 6 // Denormals Are Zero.
}
}

View File

@@ -120,12 +120,18 @@ namespace ARMeilleure.CodeGen.X86
break;
case Instruction.Extended:
if (node.Intrinsic == Intrinsic.X86Mxcsrmb || node.Intrinsic == Intrinsic.X86Mxcsrub)
if (node.Intrinsic == Intrinsic.X86Ldmxcsr)
{
int stackOffset = stackAlloc.Allocate(OperandType.I32);
node.SetSources(new Operand[] { Const(stackOffset), node.GetSource(0) });
}
else if (node.Intrinsic == Intrinsic.X86Stmxcsr)
{
int stackOffset = stackAlloc.Allocate(OperandType.I32);
node.SetSources(new Operand[] { Const(stackOffset) });
}
break;
}
}

View File

@@ -208,11 +208,13 @@ namespace ARMeilleure.CodeGen.X86
Vblendvps,
Vcvtph2ps,
Vcvtps2ph,
Vfmadd231pd,
Vfmadd231ps,
Vfmadd231sd,
Vfmadd231ss,
Vfmsub231sd,
Vfmsub231ss,
Vfnmadd231pd,
Vfnmadd231ps,
Vfnmadd231sd,
Vfnmadd231ss,

View File

@@ -614,8 +614,6 @@ namespace ARMeilleure.Instructions
EmitSse2VectorPairwiseOpF(context, (op1, op2) =>
{
return EmitSse41ProcessNaNsOpF(context, (op1, op2) =>
{
return EmitSseOrAvxHandleFzModeOpF(context, (op1, op2) =>
{
IOpCodeSimd op = (IOpCodeSimd)context.CurrOp;
@@ -623,7 +621,6 @@ namespace ARMeilleure.Instructions
return context.AddIntrinsic(addInst, op1, op2);
}, scalar: false, op1, op2);
}, scalar: false, op1, op2);
});
}
else
@@ -696,17 +693,33 @@ namespace ARMeilleure.Instructions
Operand n = GetVec(op.Rn);
Operand m = GetVec(op.Rm);
Operand res;
if (op.Size == 0)
{
Operand res = context.AddIntrinsic(Intrinsic.X86Mulss, n, m);
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfmadd231ss, a, n, m);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulss, n, m);
res = context.AddIntrinsic(Intrinsic.X86Addss, a, res);
}
context.Copy(d, context.VectorZeroUpper96(res));
}
else /* if (op.Size == 1) */
{
Operand res = context.AddIntrinsic(Intrinsic.X86Mulsd, n, m);
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfmadd231sd, a, n, m);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulsd, n, m);
res = context.AddIntrinsic(Intrinsic.X86Addsd, a, res);
}
context.Copy(d, context.VectorZeroUpper64(res));
}
@@ -729,11 +742,8 @@ namespace ARMeilleure.Instructions
else if (Optimizations.FastFP && Optimizations.UseSse41)
{
EmitSse41ProcessNaNsOpF(context, (op1, op2) =>
{
return EmitSseOrAvxHandleFzModeOpF(context, (op1, op2) =>
{
return EmitSse2VectorMaxMinOpF(context, op1, op2, isMax: true);
}, scalar: true, op1, op2);
}, scalar: true);
}
else
@@ -754,11 +764,8 @@ namespace ARMeilleure.Instructions
else if (Optimizations.FastFP && Optimizations.UseSse41)
{
EmitSse41ProcessNaNsOpF(context, (op1, op2) =>
{
return EmitSseOrAvxHandleFzModeOpF(context, (op1, op2) =>
{
return EmitSse2VectorMaxMinOpF(context, op1, op2, isMax: true);
}, scalar: false, op1, op2);
}, scalar: false);
}
else
@@ -885,12 +892,9 @@ namespace ARMeilleure.Instructions
EmitSse2VectorPairwiseOpF(context, (op1, op2) =>
{
return EmitSse41ProcessNaNsOpF(context, (op1, op2) =>
{
return EmitSseOrAvxHandleFzModeOpF(context, (op1, op2) =>
{
return EmitSse2VectorMaxMinOpF(context, op1, op2, isMax: true);
}, scalar: false, op1, op2);
}, scalar: false, op1, op2);
});
}
else
@@ -913,12 +917,9 @@ namespace ARMeilleure.Instructions
EmitSse2VectorAcrossVectorOpF(context, (op1, op2) =>
{
return EmitSse41ProcessNaNsOpF(context, (op1, op2) =>
{
return EmitSseOrAvxHandleFzModeOpF(context, (op1, op2) =>
{
return EmitSse2VectorMaxMinOpF(context, op1, op2, isMax: true);
}, scalar: false, op1, op2);
}, scalar: false, op1, op2);
});
}
else
@@ -939,11 +940,8 @@ namespace ARMeilleure.Instructions
else if (Optimizations.FastFP && Optimizations.UseSse41)
{
EmitSse41ProcessNaNsOpF(context, (op1, op2) =>
{
return EmitSseOrAvxHandleFzModeOpF(context, (op1, op2) =>
{
return EmitSse2VectorMaxMinOpF(context, op1, op2, isMax: false);
}, scalar: true, op1, op2);
}, scalar: true);
}
else
@@ -964,11 +962,8 @@ namespace ARMeilleure.Instructions
else if (Optimizations.FastFP && Optimizations.UseSse41)
{
EmitSse41ProcessNaNsOpF(context, (op1, op2) =>
{
return EmitSseOrAvxHandleFzModeOpF(context, (op1, op2) =>
{
return EmitSse2VectorMaxMinOpF(context, op1, op2, isMax: false);
}, scalar: false, op1, op2);
}, scalar: false);
}
else
@@ -1095,12 +1090,9 @@ namespace ARMeilleure.Instructions
EmitSse2VectorPairwiseOpF(context, (op1, op2) =>
{
return EmitSse41ProcessNaNsOpF(context, (op1, op2) =>
{
return EmitSseOrAvxHandleFzModeOpF(context, (op1, op2) =>
{
return EmitSse2VectorMaxMinOpF(context, op1, op2, isMax: false);
}, scalar: false, op1, op2);
}, scalar: false, op1, op2);
});
}
else
@@ -1123,12 +1115,9 @@ namespace ARMeilleure.Instructions
EmitSse2VectorAcrossVectorOpF(context, (op1, op2) =>
{
return EmitSse41ProcessNaNsOpF(context, (op1, op2) =>
{
return EmitSseOrAvxHandleFzModeOpF(context, (op1, op2) =>
{
return EmitSse2VectorMaxMinOpF(context, op1, op2, isMax: false);
}, scalar: false, op1, op2);
}, scalar: false, op1, op2);
});
}
else
@@ -1146,6 +1135,37 @@ namespace ARMeilleure.Instructions
{
InstEmitSimdHelperArm64.EmitScalarTernaryOpFRdByElem(context, Intrinsic.Arm64FmlaSe);
}
else if (Optimizations.UseFma)
{
OpCodeSimdRegElemF op = (OpCodeSimdRegElemF)context.CurrOp;
Operand d = GetVec(op.Rd);
Operand n = GetVec(op.Rn);
Operand m = GetVec(op.Rm);
int sizeF = op.Size & 1;
if (sizeF == 0)
{
int shuffleMask = op.Index | op.Index << 2 | op.Index << 4 | op.Index << 6;
Operand res = context.AddIntrinsic(Intrinsic.X86Shufps, m, m, Const(shuffleMask));
res = context.AddIntrinsic(Intrinsic.X86Vfmadd231ss, d, n, res);
context.Copy(d, context.VectorZeroUpper96(res));
}
else /* if (sizeF == 1) */
{
int shuffleMask = op.Index | op.Index << 1;
Operand res = context.AddIntrinsic(Intrinsic.X86Shufpd, m, m, Const(shuffleMask));
res = context.AddIntrinsic(Intrinsic.X86Vfmadd231sd, d, n, res);
context.Copy(d, context.VectorZeroUpper64(res));
}
}
else
{
EmitScalarTernaryOpByElemF(context, (op1, op2, op3) =>
@@ -1171,11 +1191,19 @@ namespace ARMeilleure.Instructions
int sizeF = op.Size & 1;
Operand res;
if (sizeF == 0)
{
Operand res = context.AddIntrinsic(Intrinsic.X86Mulps, n, m);
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfmadd231ps, d, n, m);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulps, n, m);
res = context.AddIntrinsic(Intrinsic.X86Addps, d, res);
}
if (op.RegisterSize == RegisterSize.Simd64)
{
@@ -1186,9 +1214,15 @@ namespace ARMeilleure.Instructions
}
else /* if (sizeF == 1) */
{
Operand res = context.AddIntrinsic(Intrinsic.X86Mulpd, n, m);
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfmadd231pd, d, n, m);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulpd, n, m);
res = context.AddIntrinsic(Intrinsic.X86Addpd, d, res);
}
context.Copy(d, res);
}
@@ -1224,8 +1258,15 @@ namespace ARMeilleure.Instructions
Operand res = context.AddIntrinsic(Intrinsic.X86Shufps, m, m, Const(shuffleMask));
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfmadd231ps, d, n, res);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulps, n, res);
res = context.AddIntrinsic(Intrinsic.X86Addps, d, res);
}
if (op.RegisterSize == RegisterSize.Simd64)
{
@@ -1240,8 +1281,15 @@ namespace ARMeilleure.Instructions
Operand res = context.AddIntrinsic(Intrinsic.X86Shufpd, m, m, Const(shuffleMask));
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfmadd231pd, d, n, res);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulpd, n, res);
res = context.AddIntrinsic(Intrinsic.X86Addpd, d, res);
}
context.Copy(d, res);
}
@@ -1261,6 +1309,37 @@ namespace ARMeilleure.Instructions
{
InstEmitSimdHelperArm64.EmitScalarTernaryOpFRdByElem(context, Intrinsic.Arm64FmlsSe);
}
else if (Optimizations.UseFma)
{
OpCodeSimdRegElemF op = (OpCodeSimdRegElemF)context.CurrOp;
Operand d = GetVec(op.Rd);
Operand n = GetVec(op.Rn);
Operand m = GetVec(op.Rm);
int sizeF = op.Size & 1;
if (sizeF == 0)
{
int shuffleMask = op.Index | op.Index << 2 | op.Index << 4 | op.Index << 6;
Operand res = context.AddIntrinsic(Intrinsic.X86Shufps, m, m, Const(shuffleMask));
res = context.AddIntrinsic(Intrinsic.X86Vfnmadd231ss, d, n, res);
context.Copy(d, context.VectorZeroUpper96(res));
}
else /* if (sizeF == 1) */
{
int shuffleMask = op.Index | op.Index << 1;
Operand res = context.AddIntrinsic(Intrinsic.X86Shufpd, m, m, Const(shuffleMask));
res = context.AddIntrinsic(Intrinsic.X86Vfnmadd231sd, d, n, res);
context.Copy(d, context.VectorZeroUpper64(res));
}
}
else
{
EmitScalarTernaryOpByElemF(context, (op1, op2, op3) =>
@@ -1286,11 +1365,19 @@ namespace ARMeilleure.Instructions
int sizeF = op.Size & 1;
Operand res;
if (sizeF == 0)
{
Operand res = context.AddIntrinsic(Intrinsic.X86Mulps, n, m);
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfnmadd231ps, d, n, m);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulps, n, m);
res = context.AddIntrinsic(Intrinsic.X86Subps, d, res);
}
if (op.RegisterSize == RegisterSize.Simd64)
{
@@ -1301,9 +1388,15 @@ namespace ARMeilleure.Instructions
}
else /* if (sizeF == 1) */
{
Operand res = context.AddIntrinsic(Intrinsic.X86Mulpd, n, m);
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfnmadd231pd, d, n, m);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulpd, n, m);
res = context.AddIntrinsic(Intrinsic.X86Subpd, d, res);
}
context.Copy(d, res);
}
@@ -1339,8 +1432,15 @@ namespace ARMeilleure.Instructions
Operand res = context.AddIntrinsic(Intrinsic.X86Shufps, m, m, Const(shuffleMask));
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfnmadd231ps, d, n, res);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulps, n, res);
res = context.AddIntrinsic(Intrinsic.X86Subps, d, res);
}
if (op.RegisterSize == RegisterSize.Simd64)
{
@@ -1355,8 +1455,15 @@ namespace ARMeilleure.Instructions
Operand res = context.AddIntrinsic(Intrinsic.X86Shufpd, m, m, Const(shuffleMask));
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfnmadd231pd, d, n, res);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulpd, n, res);
res = context.AddIntrinsic(Intrinsic.X86Subpd, d, res);
}
context.Copy(d, res);
}
@@ -1385,17 +1492,33 @@ namespace ARMeilleure.Instructions
Operand n = GetVec(op.Rn);
Operand m = GetVec(op.Rm);
Operand res;
if (op.Size == 0)
{
Operand res = context.AddIntrinsic(Intrinsic.X86Mulss, n, m);
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfnmadd231ss, a, n, m);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulss, n, m);
res = context.AddIntrinsic(Intrinsic.X86Subss, a, res);
}
context.Copy(d, context.VectorZeroUpper96(res));
}
else /* if (op.Size == 1) */
{
Operand res = context.AddIntrinsic(Intrinsic.X86Mulsd, n, m);
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfnmadd231sd, a, n, m);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulsd, n, m);
res = context.AddIntrinsic(Intrinsic.X86Subsd, a, res);
}
context.Copy(d, context.VectorZeroUpper64(res));
}
@@ -1669,25 +1792,39 @@ namespace ARMeilleure.Instructions
Operand n = GetVec(op.Rn);
Operand m = GetVec(op.Rm);
Operand res;
if (op.Size == 0)
{
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfnmsub231ss, a, n, m);
}
else
{
Operand mask = X86GetScalar(context, -0f);
Operand aNeg = context.AddIntrinsic(Intrinsic.X86Xorps, mask, a);
Operand res = context.AddIntrinsic(Intrinsic.X86Mulss, n, m);
res = context.AddIntrinsic(Intrinsic.X86Mulss, n, m);
res = context.AddIntrinsic(Intrinsic.X86Subss, aNeg, res);
}
context.Copy(d, context.VectorZeroUpper96(res));
}
else /* if (op.Size == 1) */
{
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfnmsub231sd, a, n, m);
}
else
{
Operand mask = X86GetScalar(context, -0d);
Operand aNeg = context.AddIntrinsic(Intrinsic.X86Xorpd, mask, a);
Operand res = context.AddIntrinsic(Intrinsic.X86Mulsd, n, m);
res = context.AddIntrinsic(Intrinsic.X86Mulsd, n, m);
res = context.AddIntrinsic(Intrinsic.X86Subsd, aNeg, res);
}
context.Copy(d, context.VectorZeroUpper64(res));
}
@@ -1716,25 +1853,39 @@ namespace ARMeilleure.Instructions
Operand n = GetVec(op.Rn);
Operand m = GetVec(op.Rm);
Operand res;
if (op.Size == 0)
{
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfmsub231ss, a, n, m);
}
else
{
Operand mask = X86GetScalar(context, -0f);
Operand aNeg = context.AddIntrinsic(Intrinsic.X86Xorps, mask, a);
Operand res = context.AddIntrinsic(Intrinsic.X86Mulss, n, m);
res = context.AddIntrinsic(Intrinsic.X86Mulss, n, m);
res = context.AddIntrinsic(Intrinsic.X86Addss, aNeg, res);
}
context.Copy(d, context.VectorZeroUpper96(res));
}
else /* if (op.Size == 1) */
{
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfmsub231sd, a, n, m);
}
else
{
Operand mask = X86GetScalar(context, -0d);
Operand aNeg = context.AddIntrinsic(Intrinsic.X86Xorpd, mask, a);
Operand res = context.AddIntrinsic(Intrinsic.X86Mulsd, n, m);
res = context.AddIntrinsic(Intrinsic.X86Mulsd, n, m);
res = context.AddIntrinsic(Intrinsic.X86Addsd, aNeg, res);
}
context.Copy(d, context.VectorZeroUpper64(res));
}
@@ -1830,13 +1981,22 @@ namespace ARMeilleure.Instructions
int sizeF = op.Size & 1;
Operand res;
if (sizeF == 0)
{
Operand mask = X86GetScalar(context, 2f);
Operand res = context.AddIntrinsic(Intrinsic.X86Mulss, n, m);
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfnmadd231ss, mask, n, m);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulss, n, m);
res = context.AddIntrinsic(Intrinsic.X86Subss, mask, res);
}
res = EmitSse41RecipStepSelectOpF(context, n, m, res, mask, scalar: true, sizeF);
context.Copy(GetVec(op.Rd), context.VectorZeroUpper96(res));
@@ -1845,9 +2005,16 @@ namespace ARMeilleure.Instructions
{
Operand mask = X86GetScalar(context, 2d);
Operand res = context.AddIntrinsic(Intrinsic.X86Mulsd, n, m);
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfnmadd231sd, mask, n, m);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulsd, n, m);
res = context.AddIntrinsic(Intrinsic.X86Subsd, mask, res);
}
res = EmitSse41RecipStepSelectOpF(context, n, m, res, mask, scalar: true, sizeF);
context.Copy(GetVec(op.Rd), context.VectorZeroUpper64(res));
@@ -1877,14 +2044,23 @@ namespace ARMeilleure.Instructions
int sizeF = op.Size & 1;
Operand res;
if (sizeF == 0)
{
Operand mask = X86GetAllElements(context, 2f);
Operand res = context.AddIntrinsic(Intrinsic.X86Mulps, n, m);
res = EmitSse41RecipStepSelectOpF(context, n, m, res, mask, scalar: false, sizeF);
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfnmadd231ps, mask, n, m);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulps, n, m);
res = context.AddIntrinsic(Intrinsic.X86Subps, mask, res);
}
res = EmitSse41RecipStepSelectOpF(context, n, m, res, mask, scalar: false, sizeF);
if (op.RegisterSize == RegisterSize.Simd64)
{
@@ -1897,10 +2073,17 @@ namespace ARMeilleure.Instructions
{
Operand mask = X86GetAllElements(context, 2d);
Operand res = context.AddIntrinsic(Intrinsic.X86Mulpd, n, m);
res = EmitSse41RecipStepSelectOpF(context, n, m, res, mask, scalar: false, sizeF);
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfnmadd231pd, mask, n, m);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulpd, n, m);
res = context.AddIntrinsic(Intrinsic.X86Subpd, mask, res);
}
res = EmitSse41RecipStepSelectOpF(context, n, m, res, mask, scalar: false, sizeF);
context.Copy(GetVec(op.Rd), res);
}
@@ -2113,21 +2296,33 @@ namespace ARMeilleure.Instructions
public static void Frintx_S(ArmEmitterContext context)
{
// TODO Arm64: Fast path. Should we set host FPCR?
if (Optimizations.UseAdvSimd)
{
InstEmitSimdHelperArm64.EmitScalarUnaryOpF(context, Intrinsic.Arm64FrintxS);
}
else
{
EmitScalarUnaryOpF(context, (op1) =>
{
return EmitRoundByRMode(context, op1);
});
}
}
public static void Frintx_V(ArmEmitterContext context)
{
// TODO Arm64: Fast path. Should we set host FPCR?
if (Optimizations.UseAdvSimd)
{
InstEmitSimdHelperArm64.EmitVectorUnaryOpF(context, Intrinsic.Arm64FrintxV);
}
else
{
EmitVectorUnaryOpF(context, (op1) =>
{
return EmitRoundByRMode(context, op1);
});
}
}
public static void Frintz_S(ArmEmitterContext context)
{
@@ -2237,15 +2432,24 @@ namespace ARMeilleure.Instructions
int sizeF = op.Size & 1;
Operand res;
if (sizeF == 0)
{
Operand maskHalf = X86GetScalar(context, 0.5f);
Operand maskThree = X86GetScalar(context, 3f);
Operand maskOneHalf = X86GetScalar(context, 1.5f);
Operand res = context.AddIntrinsic(Intrinsic.X86Mulss, n, m);
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfnmadd231ss, maskThree, n, m);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulss, n, m);
res = context.AddIntrinsic(Intrinsic.X86Subss, maskThree, res);
}
res = context.AddIntrinsic(Intrinsic.X86Mulss, maskHalf, res);
res = EmitSse41RecipStepSelectOpF(context, n, m, res, maskOneHalf, scalar: true, sizeF);
@@ -2257,9 +2461,16 @@ namespace ARMeilleure.Instructions
Operand maskThree = X86GetScalar(context, 3d);
Operand maskOneHalf = X86GetScalar(context, 1.5d);
Operand res = context.AddIntrinsic(Intrinsic.X86Mulsd, n, m);
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfnmadd231sd, maskThree, n, m);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulsd, n, m);
res = context.AddIntrinsic(Intrinsic.X86Subsd, maskThree, res);
}
res = context.AddIntrinsic(Intrinsic.X86Mulsd, maskHalf, res);
res = EmitSse41RecipStepSelectOpF(context, n, m, res, maskOneHalf, scalar: true, sizeF);
@@ -2290,15 +2501,24 @@ namespace ARMeilleure.Instructions
int sizeF = op.Size & 1;
Operand res;
if (sizeF == 0)
{
Operand maskHalf = X86GetAllElements(context, 0.5f);
Operand maskThree = X86GetAllElements(context, 3f);
Operand maskOneHalf = X86GetAllElements(context, 1.5f);
Operand res = context.AddIntrinsic(Intrinsic.X86Mulps, n, m);
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfnmadd231ps, maskThree, n, m);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulps, n, m);
res = context.AddIntrinsic(Intrinsic.X86Subps, maskThree, res);
}
res = context.AddIntrinsic(Intrinsic.X86Mulps, maskHalf, res);
res = EmitSse41RecipStepSelectOpF(context, n, m, res, maskOneHalf, scalar: false, sizeF);
@@ -2315,9 +2535,16 @@ namespace ARMeilleure.Instructions
Operand maskThree = X86GetAllElements(context, 3d);
Operand maskOneHalf = X86GetAllElements(context, 1.5d);
Operand res = context.AddIntrinsic(Intrinsic.X86Mulpd, n, m);
if (Optimizations.UseFma)
{
res = context.AddIntrinsic(Intrinsic.X86Vfnmadd231pd, maskThree, n, m);
}
else
{
res = context.AddIntrinsic(Intrinsic.X86Mulpd, n, m);
res = context.AddIntrinsic(Intrinsic.X86Subpd, maskThree, res);
}
res = context.AddIntrinsic(Intrinsic.X86Mulpd, maskHalf, res);
res = EmitSse41RecipStepSelectOpF(context, n, m, res, maskOneHalf, scalar: false, sizeF);
@@ -4728,53 +4955,6 @@ namespace ARMeilleure.Instructions
}
}
public static Operand EmitSseOrAvxHandleFzModeOpF(
ArmEmitterContext context,
Func2I emit,
bool scalar,
Operand n = default,
Operand m = default)
{
Operand nCopy = n == default ? context.Copy(GetVec(((OpCodeSimdReg)context.CurrOp).Rn)) : n;
Operand mCopy = m == default ? context.Copy(GetVec(((OpCodeSimdReg)context.CurrOp).Rm)) : m;
EmitSseOrAvxEnterFtzAndDazModesOpF(context, out Operand isTrue);
Operand res = emit(nCopy, mCopy);
EmitSseOrAvxExitFtzAndDazModesOpF(context, isTrue);
if (n != default || m != default)
{
return res;
}
int sizeF = ((IOpCodeSimd)context.CurrOp).Size & 1;
if (sizeF == 0)
{
if (scalar)
{
res = context.VectorZeroUpper96(res);
}
else if (((OpCodeSimdReg)context.CurrOp).RegisterSize == RegisterSize.Simd64)
{
res = context.VectorZeroUpper64(res);
}
}
else /* if (sizeF == 1) */
{
if (scalar)
{
res = context.VectorZeroUpper64(res);
}
}
context.Copy(GetVec(((OpCodeSimdReg)context.CurrOp).Rd), res);
return default;
}
private static Operand EmitSse2VectorMaxMinOpF(ArmEmitterContext context, Operand n, Operand m, bool isMax)
{
IOpCodeSimd op = (IOpCodeSimd)context.CurrOp;
@@ -4833,11 +5013,8 @@ namespace ARMeilleure.Instructions
mCopy = context.AddIntrinsic(Intrinsic.X86Blendvps, mCopy, negInfMask, mMask);
Operand res = EmitSse41ProcessNaNsOpF(context, (op1, op2) =>
{
return EmitSseOrAvxHandleFzModeOpF(context, (op1, op2) =>
{
return EmitSse2VectorMaxMinOpF(context, op1, op2, isMax: isMaxNum);
}, scalar: scalar, op1, op2);
}, scalar: scalar, nCopy, mCopy);
if (n != default || m != default)
@@ -4871,11 +5048,8 @@ namespace ARMeilleure.Instructions
mCopy = context.AddIntrinsic(Intrinsic.X86Blendvpd, mCopy, negInfMask, mMask);
Operand res = EmitSse41ProcessNaNsOpF(context, (op1, op2) =>
{
return EmitSseOrAvxHandleFzModeOpF(context, (op1, op2) =>
{
return EmitSse2VectorMaxMinOpF(context, op1, op2, isMax: isMaxNum);
}, scalar: scalar, op1, op2);
}, scalar: scalar, nCopy, mCopy);
if (n != default || m != default)

View File

@@ -356,9 +356,11 @@ namespace ARMeilleure.Instructions
? typeof(SoftFloat64_16).GetMethod(nameof(SoftFloat64_16.FPConvert))
: typeof(SoftFloat32_16).GetMethod(nameof(SoftFloat32_16.FPConvert));
context.ExitArmFpMode();
context.StoreToContext();
Operand res = context.Call(method, src);
context.LoadFromContext();
context.EnterArmFpMode();
InsertScalar16(context, op.Vd, op.T, res);
}
@@ -372,9 +374,11 @@ namespace ARMeilleure.Instructions
? typeof(SoftFloat16_64).GetMethod(nameof(SoftFloat16_64.FPConvert))
: typeof(SoftFloat16_32).GetMethod(nameof(SoftFloat16_32.FPConvert));
context.ExitArmFpMode();
context.StoreToContext();
Operand res = context.Call(method, src);
context.LoadFromContext();
context.EnterArmFpMode();
InsertScalar(context, op.Vd, res);
}
@@ -541,12 +545,19 @@ namespace ARMeilleure.Instructions
// VRINTX (floating-point).
public static void Vrintx_S(ArmEmitterContext context)
{
if (Optimizations.UseAdvSimd)
{
InstEmitSimdHelper32Arm64.EmitScalarUnaryOpF32(context, Intrinsic.Arm64FrintxS);
}
else
{
EmitScalarUnaryOpF32(context, (op1) =>
{
return EmitRoundByRMode(context, op1);
});
}
}
private static Operand EmitFPConvert(ArmEmitterContext context, Operand value, OperandType type, bool signed)
{

View File

@@ -1,3 +1,4 @@
using ARMeilleure.CodeGen.X86;
using ARMeilleure.Decoders;
using ARMeilleure.IntermediateRepresentation;
using ARMeilleure.State;
@@ -158,6 +159,75 @@ namespace ARMeilleure.Instructions
};
#endregion
public static void EnterArmFpMode(EmitterContext context, Func<FPState, Operand> getFpFlag)
{
if (Optimizations.UseSse2)
{
Operand mxcsr = context.AddIntrinsicInt(Intrinsic.X86Stmxcsr);
Operand fzTrue = getFpFlag(FPState.FzFlag);
Operand r0True = getFpFlag(FPState.RMode0Flag);
Operand r1True = getFpFlag(FPState.RMode1Flag);
mxcsr = context.BitwiseAnd(mxcsr, Const(~(int)(Mxcsr.Ftz | Mxcsr.Daz | Mxcsr.Rhi | Mxcsr.Rlo)));
mxcsr = context.BitwiseOr(mxcsr, context.ConditionalSelect(fzTrue, Const((int)(Mxcsr.Ftz | Mxcsr.Daz | Mxcsr.Um | Mxcsr.Dm)), Const(0)));
// X86 round modes in order: nearest, negative, positive, zero
// ARM round modes in order: nearest, positive, negative, zero
// Read the bits backwards to correct this.
mxcsr = context.BitwiseOr(mxcsr, context.ConditionalSelect(r0True, Const((int)Mxcsr.Rhi), Const(0)));
mxcsr = context.BitwiseOr(mxcsr, context.ConditionalSelect(r1True, Const((int)Mxcsr.Rlo), Const(0)));
context.AddIntrinsicNoRet(Intrinsic.X86Ldmxcsr, mxcsr);
}
else if (Optimizations.UseAdvSimd)
{
Operand fpcr = context.AddIntrinsicInt(Intrinsic.Arm64MrsFpcr);
Operand fzTrue = getFpFlag(FPState.FzFlag);
Operand r0True = getFpFlag(FPState.RMode0Flag);
Operand r1True = getFpFlag(FPState.RMode1Flag);
fpcr = context.BitwiseAnd(fpcr, Const(~(int)(FPCR.Fz | FPCR.RMode0 | FPCR.RMode1)));
fpcr = context.BitwiseOr(fpcr, context.ConditionalSelect(fzTrue, Const((int)FPCR.Fz), Const(0)));
fpcr = context.BitwiseOr(fpcr, context.ConditionalSelect(r0True, Const((int)FPCR.RMode0), Const(0)));
fpcr = context.BitwiseOr(fpcr, context.ConditionalSelect(r1True, Const((int)FPCR.RMode1), Const(0)));
context.AddIntrinsicNoRet(Intrinsic.Arm64MsrFpcr, fpcr);
// TODO: Restore FPSR
}
}
public static void ExitArmFpMode(EmitterContext context, Action<FPState, Operand> setFpFlag)
{
if (Optimizations.UseSse2)
{
Operand mxcsr = context.AddIntrinsicInt(Intrinsic.X86Stmxcsr);
// Unset round mode (to nearest) and ftz.
mxcsr = context.BitwiseAnd(mxcsr, Const(~(int)(Mxcsr.Ftz | Mxcsr.Daz | Mxcsr.Rhi | Mxcsr.Rlo)));
context.AddIntrinsicNoRet(Intrinsic.X86Ldmxcsr, mxcsr);
// Status flags would be stored here if they were used.
}
else if (Optimizations.UseAdvSimd)
{
Operand fpcr = context.AddIntrinsicInt(Intrinsic.Arm64MrsFpcr);
// Unset round mode (to nearest) and fz.
fpcr = context.BitwiseAnd(fpcr, Const(~(int)(FPCR.Fz | FPCR.RMode0 | FPCR.RMode1)));
context.AddIntrinsicNoRet(Intrinsic.Arm64MsrFpcr, fpcr);
// TODO: Store FPSR
}
}
public static int GetImmShl(OpCodeSimdShImm op)
{
return op.Imm - (8 << op.Size);
@@ -465,9 +535,11 @@ namespace ARMeilleure.Instructions
? typeof(SoftFloat32).GetMethod(name)
: typeof(SoftFloat64).GetMethod(name);
context.ExitArmFpMode();
context.StoreToContext();
Operand res = context.Call(info, callArgs);
context.LoadFromContext();
context.EnterArmFpMode();
return res;
}
@@ -1358,39 +1430,6 @@ namespace ARMeilleure.Instructions
}
}
[Flags]
public enum Mxcsr
{
Ftz = 1 << 15, // Flush To Zero.
Um = 1 << 11, // Underflow Mask.
Dm = 1 << 8, // Denormal Mask.
Daz = 1 << 6 // Denormals Are Zero.
}
public static void EmitSseOrAvxEnterFtzAndDazModesOpF(ArmEmitterContext context, out Operand isTrue)
{
isTrue = GetFpFlag(FPState.FzFlag);
Operand lblTrue = Label();
context.BranchIfFalse(lblTrue, isTrue);
context.AddIntrinsicNoRet(Intrinsic.X86Mxcsrmb, Const((int)(Mxcsr.Ftz | Mxcsr.Um | Mxcsr.Dm | Mxcsr.Daz)));
context.MarkLabel(lblTrue);
}
public static void EmitSseOrAvxExitFtzAndDazModesOpF(ArmEmitterContext context, Operand isTrue = default)
{
isTrue = isTrue == default ? GetFpFlag(FPState.FzFlag) : isTrue;
Operand lblTrue = Label();
context.BranchIfFalse(lblTrue, isTrue);
context.AddIntrinsicNoRet(Intrinsic.X86Mxcsrub, Const((int)(Mxcsr.Ftz | Mxcsr.Daz)));
context.MarkLabel(lblTrue);
}
public enum CmpCondition
{
// Legacy Sse.

View File

@@ -1197,9 +1197,11 @@ namespace ARMeilleure.Instructions
Array.Resize(ref callArgs, callArgs.Length + 1);
callArgs[callArgs.Length - 1] = Const(1);
context.ExitArmFpMode();
context.StoreToContext();
Operand res = context.Call(info, callArgs);
context.LoadFromContext();
context.EnterArmFpMode();
return res;
}

View File

@@ -33,8 +33,8 @@ namespace ARMeilleure.Instructions
case 0b11_011_0100_0010_000: EmitGetNzcv(context); return;
case 0b11_011_0100_0100_000: EmitGetFpcr(context); return;
case 0b11_011_0100_0100_001: EmitGetFpsr(context); return;
case 0b11_011_1101_0000_010: info = typeof(NativeInterface).GetMethod(nameof(NativeInterface.GetTpidrEl0)); break;
case 0b11_011_1101_0000_011: info = typeof(NativeInterface).GetMethod(nameof(NativeInterface.GetTpidrroEl0)); break;
case 0b11_011_1101_0000_010: EmitGetTpidrEl0(context); return;
case 0b11_011_1101_0000_011: EmitGetTpidrroEl0(context); return;
case 0b11_011_1110_0000_000: info = typeof(NativeInterface).GetMethod(nameof(NativeInterface.GetCntfrqEl0)); break;
case 0b11_011_1110_0000_001: info = typeof(NativeInterface).GetMethod(nameof(NativeInterface.GetCntpctEl0)); break;
case 0b11_011_1110_0000_010: info = typeof(NativeInterface).GetMethod(nameof(NativeInterface.GetCntvctEl0)); break;
@@ -49,19 +49,15 @@ namespace ARMeilleure.Instructions
{
OpCodeSystem op = (OpCodeSystem)context.CurrOp;
MethodInfo info;
switch (GetPackedId(op))
{
case 0b11_011_0100_0010_000: EmitSetNzcv(context); return;
case 0b11_011_0100_0100_000: EmitSetFpcr(context); return;
case 0b11_011_0100_0100_001: EmitSetFpsr(context); return;
case 0b11_011_1101_0000_010: info = typeof(NativeInterface).GetMethod(nameof(NativeInterface.SetTpidrEl0)); break;
case 0b11_011_1101_0000_010: EmitSetTpidrEl0(context); return;
default: throw new NotImplementedException($"Unknown MSR 0x{op.RawOpCode:X8} at 0x{op.Address:X16}.");
}
context.Call(info, GetIntOrZR(context, op.Rt));
}
public static void Nop(ArmEmitterContext context)
@@ -165,6 +161,28 @@ namespace ARMeilleure.Instructions
SetIntOrZR(context, op.Rt, fpsr);
}
private static void EmitGetTpidrEl0(ArmEmitterContext context)
{
OpCodeSystem op = (OpCodeSystem)context.CurrOp;
Operand nativeContext = context.LoadArgument(OperandType.I64, 0);
Operand result = context.Load(OperandType.I64, context.Add(nativeContext, Const((ulong)NativeContext.GetTpidrEl0Offset())));
SetIntOrZR(context, op.Rt, result);
}
private static void EmitGetTpidrroEl0(ArmEmitterContext context)
{
OpCodeSystem op = (OpCodeSystem)context.CurrOp;
Operand nativeContext = context.LoadArgument(OperandType.I64, 0);
Operand result = context.Load(OperandType.I64, context.Add(nativeContext, Const((ulong)NativeContext.GetTpidrroEl0Offset())));
SetIntOrZR(context, op.Rt, result);
}
private static void EmitSetNzcv(ArmEmitterContext context)
{
OpCodeSystem op = (OpCodeSystem)context.CurrOp;
@@ -192,6 +210,8 @@ namespace ARMeilleure.Instructions
SetFpFlag(context, (FPState)flag, context.BitwiseAnd(context.ShiftRightUI(fpcr, Const(flag)), Const(1)));
}
}
context.UpdateArmFpMode();
}
private static void EmitSetFpsr(ArmEmitterContext context)
@@ -210,6 +230,19 @@ namespace ARMeilleure.Instructions
SetFpFlag(context, (FPState)flag, context.BitwiseAnd(context.ShiftRightUI(fpsr, Const(flag)), Const(1)));
}
}
context.UpdateArmFpMode();
}
private static void EmitSetTpidrEl0(ArmEmitterContext context)
{
OpCodeSystem op = (OpCodeSystem)context.CurrOp;
Operand value = GetIntOrZR(context, op.Rt);
Operand nativeContext = context.LoadArgument(OperandType.I64, 0);
context.Store(context.Add(nativeContext, Const((ulong)NativeContext.GetTpidrEl0Offset())), value);
}
}
}

View File

@@ -23,8 +23,6 @@ namespace ARMeilleure.Instructions
return;
}
MethodInfo info;
switch (op.CRn)
{
case 13: // Process and Thread Info.
@@ -36,14 +34,12 @@ namespace ARMeilleure.Instructions
switch (op.Opc2)
{
case 2:
info = typeof(NativeInterface).GetMethod(nameof(NativeInterface.SetTpidrEl032)); break;
EmitSetTpidrEl0(context); return;
default:
throw new NotImplementedException($"Unknown MRC Opc2 0x{op.Opc2:X} at 0x{op.Address:X} (0x{op.RawOpCode:X}).");
}
break;
case 7:
switch (op.CRm) // Cache and Memory barrier.
{
@@ -64,8 +60,6 @@ namespace ARMeilleure.Instructions
default:
throw new NotImplementedException($"Unknown MRC 0x{op.RawOpCode:X8} at 0x{op.Address:X16}.");
}
context.Call(info, GetIntA32(context, op.Rt));
}
public static void Mrc(ArmEmitterContext context)
@@ -79,7 +73,7 @@ namespace ARMeilleure.Instructions
return;
}
MethodInfo info;
Operand result;
switch (op.CRn)
{
@@ -92,10 +86,10 @@ namespace ARMeilleure.Instructions
switch (op.Opc2)
{
case 2:
info = typeof(NativeInterface).GetMethod(nameof(NativeInterface.GetTpidrEl032)); break;
result = EmitGetTpidrEl0(context); break;
case 3:
info = typeof(NativeInterface).GetMethod(nameof(NativeInterface.GetTpidr32)); break;
result = EmitGetTpidrroEl0(context); break;
default:
throw new NotImplementedException($"Unknown MRC Opc2 0x{op.Opc2:X} at 0x{op.Address:X} (0x{op.RawOpCode:X}).");
@@ -110,13 +104,13 @@ namespace ARMeilleure.Instructions
if (op.Rt == RegisterAlias.Aarch32Pc)
{
// Special behavior: copy NZCV flags into APSR.
EmitSetNzcv(context, context.Call(info));
EmitSetNzcv(context, result);
return;
}
else
{
SetIntA32(context, op.Rt, context.Call(info));
SetIntA32(context, op.Rt, result);
}
}
@@ -321,6 +315,37 @@ namespace ARMeilleure.Instructions
SetFpFlag(context, (FPState)flag, context.BitwiseAnd(context.ShiftRightUI(fpscr, Const(flag)), Const(1)));
}
}
context.UpdateArmFpMode();
}
private static Operand EmitGetTpidrEl0(ArmEmitterContext context)
{
OpCode32System op = (OpCode32System)context.CurrOp;
Operand nativeContext = context.LoadArgument(OperandType.I64, 0);
return context.Load(OperandType.I64, context.Add(nativeContext, Const((ulong)NativeContext.GetTpidrEl0Offset())));
}
private static Operand EmitGetTpidrroEl0(ArmEmitterContext context)
{
OpCode32System op = (OpCode32System)context.CurrOp;
Operand nativeContext = context.LoadArgument(OperandType.I64, 0);
return context.Load(OperandType.I64, context.Add(nativeContext, Const((ulong)NativeContext.GetTpidrroEl0Offset())));
}
private static void EmitSetTpidrEl0(ArmEmitterContext context)
{
OpCode32System op = (OpCode32System)context.CurrOp;
Operand value = GetIntA32(context, op.Rt);
Operand nativeContext = context.LoadArgument(OperandType.I64, 0);
context.Store(context.Add(nativeContext, Const((ulong)NativeContext.GetTpidrEl0Offset())), context.ZeroExtend32(OperandType.I64, value));
}
}
}

View File

@@ -72,26 +72,6 @@ namespace ARMeilleure.Instructions
return (ulong)GetContext().DczidEl0;
}
public static ulong GetTpidrEl0()
{
return (ulong)GetContext().TpidrEl0;
}
public static uint GetTpidrEl032()
{
return (uint)GetContext().TpidrEl0;
}
public static ulong GetTpidrroEl0()
{
return (ulong)GetContext().TpidrroEl0;
}
public static uint GetTpidr32()
{
return (uint)GetContext().TpidrroEl0;
}
public static ulong GetCntfrqEl0()
{
return GetContext().CntfrqEl0;
@@ -106,16 +86,6 @@ namespace ARMeilleure.Instructions
{
return GetContext().CntvctEl0;
}
public static void SetTpidrEl0(ulong value)
{
GetContext().TpidrEl0 = (long)value;
}
public static void SetTpidrEl032(uint value)
{
GetContext().TpidrEl0 = (long)value;
}
#endregion
#region "Read"

View File

@@ -53,6 +53,7 @@ namespace ARMeilleure.IntermediateRepresentation
X86Haddpd,
X86Haddps,
X86Insertps,
X86Ldmxcsr,
X86Maxpd,
X86Maxps,
X86Maxsd,
@@ -68,8 +69,6 @@ namespace ARMeilleure.IntermediateRepresentation
X86Mulps,
X86Mulsd,
X86Mulss,
X86Mxcsrmb,
X86Mxcsrub,
X86Paddb,
X86Paddd,
X86Paddq,
@@ -153,6 +152,7 @@ namespace ARMeilleure.IntermediateRepresentation
X86Sqrtps,
X86Sqrtsd,
X86Sqrtss,
X86Stmxcsr,
X86Subpd,
X86Subps,
X86Subsd,
@@ -163,11 +163,13 @@ namespace ARMeilleure.IntermediateRepresentation
X86Unpcklps,
X86Vcvtph2ps,
X86Vcvtps2ph,
X86Vfmadd231pd,
X86Vfmadd231ps,
X86Vfmadd231sd,
X86Vfmadd231ss,
X86Vfmsub231sd,
X86Vfmsub231ss,
X86Vfnmadd231pd,
X86Vfnmadd231ps,
X86Vfnmadd231sd,
X86Vfnmadd231ss,
@@ -394,6 +396,8 @@ namespace ARMeilleure.IntermediateRepresentation
Arm64MlsVe,
Arm64MlsV,
Arm64MoviV,
Arm64MrsFpcr,
Arm64MsrFpcr,
Arm64MrsFpsr,
Arm64MsrFpsr,
Arm64MulVe,

View File

@@ -27,8 +27,17 @@ namespace ARMeilleure.State
// Since EL2 isn't implemented, CNTVOFF_EL2 = 0
public ulong CntvctEl0 => CntpctEl0;
public long TpidrEl0 { get; set; }
public long TpidrroEl0 { get; set; }
public long TpidrEl0
{
get => _nativeContext.GetTpidrEl0();
set => _nativeContext.SetTpidrEl0(value);
}
public long TpidrroEl0
{
get => _nativeContext.GetTpidrroEl0();
set => _nativeContext.SetTpidrroEl0(value);
}
public uint Pstate
{

View File

@@ -13,6 +13,8 @@ namespace ARMeilleure.State
public fixed ulong V[RegisterConsts.VecRegsCount * 2];
public fixed uint Flags[RegisterConsts.FlagsCount];
public fixed uint FpFlags[RegisterConsts.FpFlagsCount];
public long TpidrEl0;
public long TpidrroEl0;
public int Counter;
public ulong DispatchAddress;
public ulong ExclusiveAddress;
@@ -168,6 +170,12 @@ namespace ARMeilleure.State
}
}
public long GetTpidrEl0() => GetStorage().TpidrEl0;
public void SetTpidrEl0(long value) => GetStorage().TpidrEl0 = value;
public long GetTpidrroEl0() => GetStorage().TpidrroEl0;
public void SetTpidrroEl0(long value) => GetStorage().TpidrroEl0 = value;
public int GetCounter() => GetStorage().Counter;
public void SetCounter(int value) => GetStorage().Counter = value;
@@ -214,6 +222,16 @@ namespace ARMeilleure.State
}
}
public static int GetTpidrEl0Offset()
{
return StorageOffset(ref _dummyStorage, ref _dummyStorage.TpidrEl0);
}
public static int GetTpidrroEl0Offset()
{
return StorageOffset(ref _dummyStorage, ref _dummyStorage.TpidrroEl0);
}
public static int GetCounterOffset()
{
return StorageOffset(ref _dummyStorage, ref _dummyStorage.Counter);

View File

@@ -188,6 +188,21 @@ namespace ARMeilleure.Translation
}
}
public void EnterArmFpMode()
{
InstEmitSimdHelper.EnterArmFpMode(this, InstEmitHelper.GetFpFlag);
}
public void UpdateArmFpMode()
{
EnterArmFpMode();
}
public void ExitArmFpMode()
{
InstEmitSimdHelper.ExitArmFpMode(this, (flag, value) => InstEmitHelper.SetFpFlag(this, flag, value));
}
public Operand TryGetComparisonResult(Condition condition)
{
if (_optOpLastCompare == null || _optOpLastCompare != _optOpLastFlagSet)

View File

@@ -105,17 +105,11 @@ namespace ARMeilleure.Translation
SetDelegateInfo(typeof(NativeInterface).GetMethod(nameof(NativeInterface.GetDczidEl0)));
SetDelegateInfo(typeof(NativeInterface).GetMethod(nameof(NativeInterface.GetFunctionAddress)));
SetDelegateInfo(typeof(NativeInterface).GetMethod(nameof(NativeInterface.InvalidateCacheLine)));
SetDelegateInfo(typeof(NativeInterface).GetMethod(nameof(NativeInterface.GetTpidrroEl0)));
SetDelegateInfo(typeof(NativeInterface).GetMethod(nameof(NativeInterface.GetTpidr32))); // A32 only.
SetDelegateInfo(typeof(NativeInterface).GetMethod(nameof(NativeInterface.GetTpidrEl0)));
SetDelegateInfo(typeof(NativeInterface).GetMethod(nameof(NativeInterface.GetTpidrEl032))); // A32 only.
SetDelegateInfo(typeof(NativeInterface).GetMethod(nameof(NativeInterface.ReadByte)));
SetDelegateInfo(typeof(NativeInterface).GetMethod(nameof(NativeInterface.ReadUInt16)));
SetDelegateInfo(typeof(NativeInterface).GetMethod(nameof(NativeInterface.ReadUInt32)));
SetDelegateInfo(typeof(NativeInterface).GetMethod(nameof(NativeInterface.ReadUInt64)));
SetDelegateInfo(typeof(NativeInterface).GetMethod(nameof(NativeInterface.ReadVector128)));
SetDelegateInfo(typeof(NativeInterface).GetMethod(nameof(NativeInterface.SetTpidrEl0)));
SetDelegateInfo(typeof(NativeInterface).GetMethod(nameof(NativeInterface.SetTpidrEl032))); // A32 only.
SetDelegateInfo(typeof(NativeInterface).GetMethod(nameof(NativeInterface.SignalMemoryTracking)));
SetDelegateInfo(typeof(NativeInterface).GetMethod(nameof(NativeInterface.SupervisorCall)));
SetDelegateInfo(typeof(NativeInterface).GetMethod(nameof(NativeInterface.ThrowInvalidMemoryAccess)));

View File

@@ -3,4 +3,5 @@
namespace ARMeilleure.Translation
{
delegate void DispatcherFunction(IntPtr nativeContext, ulong startAddress);
delegate ulong WrapperFunction(IntPtr nativeContext, ulong startAddress);
}

View File

@@ -30,7 +30,7 @@ namespace ARMeilleure.Translation.PTC
private const string OuterHeaderMagicString = "PTCohd\0\0";
private const string InnerHeaderMagicString = "PTCihd\0\0";
private const uint InternalVersion = 4485; //! To be incremented manually for each change to the ARMeilleure project.
private const uint InternalVersion = 4661; //! To be incremented manually for each change to the ARMeilleure project.
private const string ActualDir = "0";
private const string BackupDir = "1";

View File

@@ -25,5 +25,10 @@ namespace ARMeilleure.Translation
{
return _func(context.NativeContextPtr);
}
public ulong Execute(WrapperFunction dispatcher, State.ExecutionContext context)
{
return dispatcher(context.NativeContextPtr, (ulong)FuncPointer);
}
}
}

View File

@@ -183,7 +183,7 @@ namespace ARMeilleure.Translation
Statistics.StartTimer();
ulong nextAddr = func.Execute(context);
ulong nextAddr = func.Execute(Stubs.ContextWrapper, context);
Statistics.StopTimer(address);
@@ -194,7 +194,7 @@ namespace ARMeilleure.Translation
{
TranslatedFunction func = Translate(address, context.ExecutionMode, highCq: false, singleStep: true);
address = func.Execute(context);
address = func.Execute(Stubs.ContextWrapper, context);
EnqueueForDeletion(address, func);

View File

@@ -21,6 +21,7 @@ namespace ARMeilleure.Translation
private readonly Translator _translator;
private readonly Lazy<IntPtr> _dispatchStub;
private readonly Lazy<DispatcherFunction> _dispatchLoop;
private readonly Lazy<WrapperFunction> _contextWrapper;
/// <summary>
/// Gets the dispatch stub.
@@ -64,6 +65,20 @@ namespace ARMeilleure.Translation
}
}
/// <summary>
/// Gets the context wrapper function.
/// </summary>
/// <exception cref="ObjectDisposedException"><see cref="TranslatorStubs"/> instance was disposed</exception>
public WrapperFunction ContextWrapper
{
get
{
ObjectDisposedException.ThrowIf(_disposed, this);
return _contextWrapper.Value;
}
}
/// <summary>
/// Initializes a new instance of the <see cref="TranslatorStubs"/> class with the specified
/// <see cref="Translator"/> instance.
@@ -77,6 +92,7 @@ namespace ARMeilleure.Translation
_translator = translator;
_dispatchStub = new(GenerateDispatchStub, isThreadSafe: true);
_dispatchLoop = new(GenerateDispatchLoop, isThreadSafe: true);
_contextWrapper = new(GenerateContextWrapper, isThreadSafe: true);
}
/// <summary>
@@ -202,6 +218,32 @@ namespace ARMeilleure.Translation
return Marshal.GetFunctionPointerForDelegate(func);
}
/// <summary>
/// Emits code that syncs FP state before executing guest code, or returns it to normal.
/// </summary>
/// <param name="context">Emitter context for the method</param>
/// <param name="nativeContext">Pointer to the native context</param>
/// <param name="enter">True if entering guest code, false otherwise</param>
private void EmitSyncFpContext(EmitterContext context, Operand nativeContext, bool enter)
{
if (enter)
{
InstEmitSimdHelper.EnterArmFpMode(context, (flag) =>
{
Operand flagAddress = context.Add(nativeContext, Const((ulong)NativeContext.GetRegisterOffset(new Register((int)flag, RegisterType.FpFlag))));
return context.Load(OperandType.I32, flagAddress);
});
}
else
{
InstEmitSimdHelper.ExitArmFpMode(context, (flag, value) =>
{
Operand flagAddress = context.Add(nativeContext, Const((ulong)NativeContext.GetRegisterOffset(new Register((int)flag, RegisterType.FpFlag))));
context.Store(flagAddress, value);
});
}
}
/// <summary>
/// Generates a <see cref="DispatchLoop"/> function.
/// </summary>
@@ -221,6 +263,8 @@ namespace ARMeilleure.Translation
Operand runningAddress = context.Add(nativeContext, Const((ulong)NativeContext.GetRunningOffset()));
Operand dispatchAddress = context.Add(nativeContext, Const((ulong)NativeContext.GetDispatchAddressOffset()));
EmitSyncFpContext(context, nativeContext, true);
context.MarkLabel(beginLbl);
context.Store(dispatchAddress, guestAddress);
context.Copy(guestAddress, context.Call(Const((ulong)DispatchStub), OperandType.I64, nativeContext));
@@ -229,6 +273,9 @@ namespace ARMeilleure.Translation
context.Branch(beginLbl);
context.MarkLabel(endLbl);
EmitSyncFpContext(context, nativeContext, false);
context.Return();
var cfg = context.GetControlFlowGraph();
@@ -237,5 +284,29 @@ namespace ARMeilleure.Translation
return Compiler.Compile(cfg, argTypes, retType, CompilerOptions.HighCq, RuntimeInformation.ProcessArchitecture).Map<DispatcherFunction>();
}
/// <summary>
/// Generates a <see cref="ContextWrapper"/> function.
/// </summary>
/// <returns><see cref="ContextWrapper"/> function</returns>
private WrapperFunction GenerateContextWrapper()
{
var context = new EmitterContext();
Operand nativeContext = context.LoadArgument(OperandType.I64, 0);
Operand guestMethod = context.LoadArgument(OperandType.I64, 1);
EmitSyncFpContext(context, nativeContext, true);
Operand returnValue = context.Call(guestMethod, OperandType.I64, nativeContext);
EmitSyncFpContext(context, nativeContext, false);
context.Return(returnValue);
var cfg = context.GetControlFlowGraph();
var retType = OperandType.I64;
var argTypes = new[] { OperandType.I64, OperandType.I64 };
return Compiler.Compile(cfg, argTypes, retType, CompilerOptions.HighCq, RuntimeInformation.ProcessArchitecture).Map<WrapperFunction>();
}
}
}

View File

@@ -0,0 +1,148 @@
using ARMeilleure.CodeGen.X86;
using ARMeilleure.IntermediateRepresentation;
using ARMeilleure.State;
using ARMeilleure.Translation;
using System;
using System.Runtime.InteropServices;
using static ARMeilleure.IntermediateRepresentation.Operand.Factory;
namespace ARMeilleure.Translation
{
public static class TranslatorTestMethods
{
public delegate int FpFlagsPInvokeTest(IntPtr managedMethod);
private static bool SetPlatformFtz(EmitterContext context, bool ftz)
{
if (Optimizations.UseSse2)
{
Operand mxcsr = context.AddIntrinsicInt(Intrinsic.X86Stmxcsr);
if (ftz)
{
mxcsr = context.BitwiseOr(mxcsr, Const((int)(Mxcsr.Ftz | Mxcsr.Um | Mxcsr.Dm)));
}
else
{
mxcsr = context.BitwiseAnd(mxcsr, Const(~(int)Mxcsr.Ftz));
}
context.AddIntrinsicNoRet(Intrinsic.X86Ldmxcsr, mxcsr);
return true;
}
else if (Optimizations.UseAdvSimd)
{
Operand fpcr = context.AddIntrinsicInt(Intrinsic.Arm64MrsFpcr);
if (ftz)
{
fpcr = context.BitwiseOr(fpcr, Const((int)FPCR.Fz));
}
else
{
fpcr = context.BitwiseAnd(fpcr, Const(~(int)FPCR.Fz));
}
context.AddIntrinsicNoRet(Intrinsic.Arm64MsrFpcr, fpcr);
return true;
}
else
{
return false;
}
}
private static Operand FpBitsToInt(EmitterContext context, Operand fp)
{
Operand vec = context.VectorInsert(context.VectorZero(), fp, 0);
return context.VectorExtract(OperandType.I32, vec, 0);
}
public static FpFlagsPInvokeTest GenerateFpFlagsPInvokeTest()
{
EmitterContext context = new EmitterContext();
Operand methodAddress = context.Copy(context.LoadArgument(OperandType.I64, 0));
// Verify that default dotnet fp state does not flush to zero.
// This is required for SoftFloat to function.
// Denormal + zero != 0
Operand denormal = ConstF(BitConverter.Int32BitsToSingle(1)); // 1.40129846432e-45
Operand zeroF = ConstF(0f);
Operand zero = Const(0);
Operand result = context.Add(zeroF, denormal);
// Must not be zero.
Operand correct1Label = Label();
context.BranchIfFalse(correct1Label, context.ICompareEqual(FpBitsToInt(context, result), zero));
context.Return(Const(1));
context.MarkLabel(correct1Label);
// Set flush to zero flag. If unsupported by the backend, just return true.
if (!SetPlatformFtz(context, true))
{
context.Return(Const(0));
}
// Denormal + zero == 0
Operand resultFz = context.Add(zeroF, denormal);
// Must equal zero.
Operand correct2Label = Label();
context.BranchIfTrue(correct2Label, context.ICompareEqual(FpBitsToInt(context, resultFz), zero));
SetPlatformFtz(context, false);
context.Return(Const(2));
context.MarkLabel(correct2Label);
// Call a managed method. This method should not change Fz state.
context.Call(methodAddress, OperandType.None);
// Denormal + zero == 0
Operand resultFz2 = context.Add(zeroF, denormal);
// Must equal zero.
Operand correct3Label = Label();
context.BranchIfTrue(correct3Label, context.ICompareEqual(FpBitsToInt(context, resultFz2), zero));
SetPlatformFtz(context, false);
context.Return(Const(3));
context.MarkLabel(correct3Label);
// Success.
SetPlatformFtz(context, false);
context.Return(Const(0));
// Compile and return the function.
ControlFlowGraph cfg = context.GetControlFlowGraph();
OperandType[] argTypes = new OperandType[] { OperandType.I64 };
return Compiler.Compile(cfg, argTypes, OperandType.I32, CompilerOptions.HighCq, RuntimeInformation.ProcessArchitecture).Map<FpFlagsPInvokeTest>();
}
}
}

View File

@@ -321,17 +321,15 @@ namespace Ryujinx.Ava
_viewModel.IsGameRunning = true;
var activeProcess = Device.Processes.ActiveApplication;
var nacp = activeProcess.ApplicationControlProperties;
int desiredLanguage = (int)Device.System.State.DesiredTitleLanguage;
string titleNameSection = string.IsNullOrWhiteSpace(nacp.Title[desiredLanguage].NameString.ToString()) ? string.Empty : $" - {nacp.Title[desiredLanguage].NameString.ToString()}";
string titleVersionSection = string.IsNullOrWhiteSpace(nacp.DisplayVersionString.ToString()) ? string.Empty : $" v{nacp.DisplayVersionString.ToString()}";
string titleIdSection = string.IsNullOrWhiteSpace(activeProcess.ProgramIdText) ? string.Empty : $" ({activeProcess.ProgramIdText.ToUpper()})";
string titleNameSection = string.IsNullOrWhiteSpace(activeProcess.Name) ? string.Empty : $" {activeProcess.Name}";
string titleVersionSection = string.IsNullOrWhiteSpace(activeProcess.DisplayVersion) ? string.Empty : $" v{activeProcess.DisplayVersion}";
string titleIdSection = $" ({activeProcess.ProgramIdText.ToUpper()})";
string titleArchSection = activeProcess.Is64Bit ? " (64-bit)" : " (32-bit)";
Dispatcher.UIThread.InvokeAsync(() =>
{
_viewModel.Title = $"Ryujinx {Program.Version}{titleNameSection}{titleVersionSection}{titleIdSection}{titleArchSection}";
_viewModel.Title = $"Ryujinx {Program.Version} -{titleNameSection}{titleVersionSection}{titleIdSection}{titleArchSection}";
});
_viewModel.SetUIProgressHandlers(Device);

View File

@@ -53,6 +53,8 @@ namespace Ryujinx.Ava.UI.Applet
bool opened = false;
_parent.Activate();
UserResult response = await ContentDialogHelper.ShowDeferredContentDialog(_parent,
title,
message,

View File

@@ -226,6 +226,7 @@ namespace Ryujinx.Graphics.OpenGL
// Set clip control, viewport and the framebuffer to the output to placate overlays and OBS capture.
GL.ClipControl(ClipOrigin.LowerLeft, ClipDepthMode.NegativeOneToOne);
GL.Viewport(0, 0, _width, _height);
GL.BindFramebuffer(FramebufferTarget.Framebuffer, drawFramebuffer);
swapBuffersCallback();

View File

@@ -1,4 +1,5 @@
using System;
using Ryujinx.Common.Logging;
using System;
using System.Diagnostics;
using System.Linq;
@@ -7,12 +8,26 @@ namespace Ryujinx.Graphics.Vulkan
internal class AutoFlushCounter
{
// How often to flush on framebuffer change.
private readonly static long FramebufferFlushTimer = Stopwatch.Frequency / 1000;
private readonly static long FramebufferFlushTimer = Stopwatch.Frequency / 1000; // (1ms)
// How often to flush on draw when fast flush mode is enabled.
private readonly static long DrawFlushTimer = Stopwatch.Frequency / 666; // (1.5ms)
// Average wait time that triggers fast flush mode to be entered.
private readonly static long FastFlushEnterThreshold = Stopwatch.Frequency / 666; // (1.5ms)
// Average wait time that triggers fast flush mode to be exited.
private readonly static long FastFlushExitThreshold = Stopwatch.Frequency / 10000; // (0.1ms)
// Number of frames to average waiting times over.
private const int SyncWaitAverageCount = 20;
private const int MinDrawCountForFlush = 10;
private const int MinConsecutiveQueryForFlush = 10;
private const int InitialQueryCountForFlush = 32;
private readonly VulkanRenderer _gd;
private long _lastFlush;
private ulong _lastDrawCount;
private bool _hasPendingQuery;
@@ -23,6 +38,16 @@ namespace Ryujinx.Graphics.Vulkan
private int _queryCountHistoryIndex;
private int _remainingQueries;
private long[] _syncWaitHistory = new long[SyncWaitAverageCount];
private int _syncWaitHistoryIndex;
private bool _fastFlushMode;
public AutoFlushCounter(VulkanRenderer gd)
{
_gd = gd;
}
public void RegisterFlush(ulong drawCount)
{
_lastFlush = Stopwatch.GetTimestamp();
@@ -69,6 +94,32 @@ namespace Ryujinx.Graphics.Vulkan
return _hasPendingQuery;
}
public bool ShouldFlushDraw(ulong drawCount)
{
if (_fastFlushMode)
{
long draws = (long)(drawCount - _lastDrawCount);
if (draws < MinDrawCountForFlush)
{
if (draws == 0)
{
_lastFlush = Stopwatch.GetTimestamp();
}
return false;
}
long flushTimeout = DrawFlushTimer;
long now = Stopwatch.GetTimestamp();
return now > _lastFlush + flushTimeout;
}
return false;
}
public bool ShouldFlushAttachmentChange(ulong drawCount)
{
_queryCount = 0;
@@ -102,11 +153,27 @@ namespace Ryujinx.Graphics.Vulkan
public void Present()
{
// Query flush prediction.
_queryCountHistoryIndex = (_queryCountHistoryIndex + 1) % 3;
_remainingQueries = _queryCountHistory.Max() + 10;
_queryCountHistory[_queryCountHistoryIndex] = 0;
// Fast flush mode toggle.
_syncWaitHistory[_syncWaitHistoryIndex] = _gd.SyncManager.GetAndResetWaitTicks();
_syncWaitHistoryIndex = (_syncWaitHistoryIndex + 1) % SyncWaitAverageCount;
long averageWait = (long)_syncWaitHistory.Average();
if (_fastFlushMode ? averageWait < FastFlushExitThreshold : averageWait > FastFlushEnterThreshold)
{
_fastFlushMode = !_fastFlushMode;
Logger.Debug?.PrintMsg(LogClass.Gpu, $"Switched fast flush mode: ({_fastFlushMode})");
}
}
}
}

View File

@@ -86,7 +86,7 @@ namespace Ryujinx.Graphics.Vulkan
Gd = gd;
Device = device;
AutoFlush = new AutoFlushCounter();
AutoFlush = new AutoFlushCounter(gd);
var pipelineCacheCreateInfo = new PipelineCacheCreateInfo()
{
@@ -1562,6 +1562,11 @@ namespace Ryujinx.Graphics.Vulkan
private void RecreatePipelineIfNeeded(PipelineBindPoint pbp)
{
if (AutoFlush.ShouldFlushDraw(DrawCount))
{
Gd.FlushAllCommands();
}
DynamicState.ReplayIfDirty(Gd.Api, CommandBuffer);
// Commit changes to the support buffer before drawing.

View File

@@ -1,6 +1,7 @@
using Ryujinx.Common.Logging;
using Silk.NET.Vulkan;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
namespace Ryujinx.Graphics.Vulkan
@@ -26,6 +27,7 @@ namespace Ryujinx.Graphics.Vulkan
private readonly Device _device;
private List<SyncHandle> _handles;
private ulong FlushId;
private long WaitTicks;
public SyncManager(VulkanRenderer gd, Device device)
{
@@ -130,6 +132,8 @@ namespace Ryujinx.Graphics.Vulkan
return;
}
long beforeTicks = Stopwatch.GetTimestamp();
if (result.NeedsFlush(FlushId))
{
_gd.InterruptAction(() =>
@@ -142,12 +146,14 @@ namespace Ryujinx.Graphics.Vulkan
}
bool signaled = result.Signalled || result.Waitable.WaitForFences(_gd.Api, _device, 1000000000);
if (!signaled)
{
Logger.Error?.PrintMsg(LogClass.Gpu, $"VK Sync Object {result.ID} failed to signal within 1000ms. Continuing...");
}
else
{
WaitTicks += Stopwatch.GetTimestamp() - beforeTicks;
result.Signalled = true;
}
}
@@ -188,5 +194,13 @@ namespace Ryujinx.Graphics.Vulkan
}
}
}
public long GetAndResetWaitTicks()
{
long result = WaitTicks;
WaitTicks = 0;
return result;
}
}
}

View File

@@ -49,6 +49,7 @@ namespace Ryujinx.Graphics.Vulkan
internal PipelineLayoutCache PipelineLayoutCache { get; private set; }
internal BackgroundResources BackgroundResources { get; private set; }
internal Action<Action> InterruptAction { get; private set; }
internal SyncManager SyncManager { get; private set; }
internal BufferManager BufferManager { get; private set; }
@@ -58,7 +59,6 @@ namespace Ryujinx.Graphics.Vulkan
private VulkanDebugMessenger _debugMessenger;
private Counters _counters;
private SyncManager _syncManager;
private PipelineFull _pipeline;
@@ -327,7 +327,7 @@ namespace Ryujinx.Graphics.Vulkan
BufferManager = new BufferManager(this, _device);
_syncManager = new SyncManager(this, _device);
SyncManager = new SyncManager(this, _device);
_pipeline = new PipelineFull(this, _device);
_pipeline.Initialize();
@@ -436,7 +436,7 @@ namespace Ryujinx.Graphics.Vulkan
internal void RegisterFlush()
{
_syncManager.RegisterFlush();
SyncManager.RegisterFlush();
}
public PinnedSpan<byte> GetBufferData(BufferHandle buffer, int offset, int size)
@@ -696,7 +696,7 @@ namespace Ryujinx.Graphics.Vulkan
public void PreFrame()
{
_syncManager.Cleanup();
SyncManager.Cleanup();
}
public ICounterEvent ReportCounter(CounterType type, EventHandler<ulong> resultHandler, bool hostReserved)
@@ -736,7 +736,7 @@ namespace Ryujinx.Graphics.Vulkan
public void CreateSync(ulong id, bool strict)
{
_syncManager.Create(id, strict);
SyncManager.Create(id, strict);
}
public IProgram LoadProgramBinary(byte[] programBinary, bool isFragment, ShaderInfo info)
@@ -746,12 +746,12 @@ namespace Ryujinx.Graphics.Vulkan
public void WaitSync(ulong id)
{
_syncManager.Wait(id);
SyncManager.Wait(id);
}
public ulong GetCurrentSync()
{
return _syncManager.GetCurrent();
return SyncManager.GetCurrent();
}
public void SetInterruptAction(Action<Action> interruptAction)

View File

@@ -5,6 +5,7 @@ using Ryujinx.Cpu;
using Ryujinx.HLE.HOS.SystemState;
using Ryujinx.HLE.Loaders.Processes.Extensions;
using Ryujinx.Horizon.Common;
using System.Linq;
namespace Ryujinx.HLE.Loaders.Processes
{
@@ -21,8 +22,9 @@ namespace Ryujinx.HLE.Loaders.Processes
public readonly ApplicationControlProperty ApplicationControlProperties;
public readonly ulong ProcessId;
public string Name;
public ulong ProgramId;
public readonly string Name;
public readonly string DisplayVersion;
public readonly ulong ProgramId;
public readonly string ProgramIdText;
public readonly bool Is64Bit;
public readonly bool DiskCacheEnabled;
@@ -52,17 +54,14 @@ namespace Ryujinx.HLE.Loaders.Processes
{
ulong programId = metaLoader.GetProgramId();
if (ApplicationControlProperties.Title.ItemsRo.Length > 0)
{
var langIndex = ApplicationControlProperties.Title.ItemsRo.Length > (int)titleLanguage ? (int)titleLanguage : 0;
Name = ApplicationControlProperties.Title[(int)titleLanguage].NameString.ToString();
Name = ApplicationControlProperties.Title[langIndex].NameString.ToString();
}
else
if (string.IsNullOrWhiteSpace(Name))
{
Name = metaLoader.GetProgramName();
Name = ApplicationControlProperties.Title.ItemsRo.ToArray().FirstOrDefault(x => x.Name[0] != 0).NameString.ToString();
}
DisplayVersion = ApplicationControlProperties.DisplayVersionString.ToString();
ProgramId = programId;
ProgramIdText = $"{programId:x16}";
Is64Bit = metaLoader.IsProgram64Bit();
@@ -85,16 +84,7 @@ namespace Ryujinx.HLE.Loaders.Processes
}
// TODO: LibHac npdm currently doesn't support version field.
string version;
if (ProgramId > 0x0100000000007FFF)
{
version = ApplicationControlProperties.DisplayVersionString.ToString();
}
else
{
version = device.System.ContentManager.GetCurrentFirmwareVersion().VersionString;
}
string version = ProgramId > 0x0100000000007FFF ? DisplayVersion : device.System.ContentManager.GetCurrentFirmwareVersion()?.VersionString ?? "?";
Logger.Info?.Print(LogClass.Loader, $"Application Loaded: {Name} v{version} [{ProgramIdText}] [{(Is64Bit ? "64-bit" : "32-bit")}]");

View File

@@ -0,0 +1,91 @@
using ARMeilleure.Translation;
using NUnit.Framework;
using Ryujinx.Cpu.Jit;
using Ryujinx.Tests.Memory;
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
namespace Ryujinx.Tests.Cpu
{
internal class EnvironmentTests
{
private static Translator _translator;
private void EnsureTranslator()
{
// Create a translator, as one is needed to register the signal handler or emit methods.
_translator ??= new Translator(new JitMemoryAllocator(), new MockMemoryManager(), true);
}
[MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
private float GetDenormal()
{
return BitConverter.Int32BitsToSingle(1);
}
[MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
private float GetZero()
{
return BitConverter.Int32BitsToSingle(0);
}
/// <summary>
/// This test ensures that managed methods do not reset floating point control flags.
/// This is used to avoid changing control flags when running methods that don't require it, such as SVC calls, software memory...
/// </summary>
[Test]
public void FpFlagsPInvoke()
{
EnsureTranslator();
// Subnormal results are not flushed to zero by default.
// This operation should not be allowed to do constant propagation, hence the methods that explicitly disallow inlining.
Assert.AreNotEqual(GetDenormal() + GetZero(), 0f);
bool methodCalled = false;
bool isFz = false;
var managedMethod = () =>
{
// Floating point math should not modify fp flags.
float test = 2f * 3.5f;
if (test < 4f)
{
throw new System.Exception("Sanity check.");
}
isFz = GetDenormal() + GetZero() == 0f;
try
{
if (test >= 4f)
{
throw new System.Exception("Always throws.");
}
}
catch
{
// Exception handling should not modify fp flags.
methodCalled = true;
}
};
var method = TranslatorTestMethods.GenerateFpFlagsPInvokeTest();
// This method sets flush-to-zero and then calls the managed method.
// Before and after setting the flags, it ensures subnormal addition works as expected.
// It returns a positive result if any tests fail, and 0 on success (or if the platform cannot change FP flags)
int result = method(Marshal.GetFunctionPointerForDelegate(managedMethod));
// Subnormal results are not flushed to zero by default, which we should have returned to exiting the method.
Assert.AreNotEqual(GetDenormal() + GetZero(), 0f);
Assert.True(result == 0);
Assert.True(methodCalled);
Assert.True(isFz);
}
}
}

View File

@@ -119,7 +119,7 @@ namespace Ryujinx.Ui
}
catch (Exception) { }
Device.DisposeGpu();
Device?.DisposeGpu();
NpadManager.Dispose();
// Unbind context and destroy everything
@@ -129,7 +129,7 @@ namespace Ryujinx.Ui
}
catch (Exception) { }
_openGLContext.Dispose();
_openGLContext?.Dispose();
}
}
}

View File

@@ -496,15 +496,13 @@ namespace Ryujinx.Ui
parent.Present();
var activeProcess = Device.Processes.ActiveApplication;
var nacp = activeProcess.ApplicationControlProperties;
int desiredLanguage = (int)Device.System.State.DesiredTitleLanguage;
string titleNameSection = string.IsNullOrWhiteSpace(nacp.Title[desiredLanguage].NameString.ToString()) ? string.Empty : $" - {nacp.Title[desiredLanguage].NameString.ToString()}";
string titleVersionSection = string.IsNullOrWhiteSpace(nacp.DisplayVersionString.ToString()) ? string.Empty : $" v{nacp.DisplayVersionString.ToString()}";
string titleIdSection = string.IsNullOrWhiteSpace(activeProcess.ProgramIdText) ? string.Empty : $" ({activeProcess.ProgramIdText.ToUpper()})";
string titleNameSection = string.IsNullOrWhiteSpace(activeProcess.Name) ? string.Empty : $" {activeProcess.Name}";
string titleVersionSection = string.IsNullOrWhiteSpace(activeProcess.DisplayVersion) ? string.Empty : $" v{activeProcess.DisplayVersion}";
string titleIdSection = $" ({activeProcess.ProgramIdText.ToUpper()})";
string titleArchSection = activeProcess.Is64Bit ? " (64-bit)" : " (32-bit)";
parent.Title = $"Ryujinx {Program.Version}{titleNameSection}{titleVersionSection}{titleIdSection}{titleArchSection}";
parent.Title = $"Ryujinx {Program.Version} -{titleNameSection}{titleVersionSection}{titleIdSection}{titleArchSection}";
});
Thread renderLoopThread = new Thread(Render)