Commit Graph

504 Commits

Author SHA1 Message Date
8d9d508dc7 Shader: Bias textureGather instructions on AMD/Intel (#4703)
* Experimental (GLSL, forced)

* SPIR-V attempt

* Add capability

* Fix pCount == 1 on glsl

* Fix typo
2023-04-22 18:02:39 -03:00
d9b63353b0 Support copy between multisample and non-multisample depth textures (#4676)
* Support copy between multisample and non-multisample depth textures

* PR feedback
2023-04-17 08:13:53 +00:00
c532118d94 Use index fragment shader output when dual source blend is enabled (#4404)
* Use index fragment shader output when dual source blend is enabled

* Shader cache version bump

* Actually set DualSourceBlendEnabled to true

* Fix XML doc

---------

Co-authored-by: Ac_K <Acoustik666@gmail.com>
2023-04-05 05:25:19 +02:00
9ecbee8032 Batch inline index buffer update (#4587) 2023-03-24 14:19:54 +01:00
80519af67d Update short cache textures if modified (#4586) 2023-03-24 12:54:58 +01:00
9f1cf6458c Vulkan: Migrate buffers between memory types to improve GPU performance (#4540)
* Initial implementation of migration between memory heaps

- Missing OOM handling
- Missing `_map` data safety when remapping
  - Copy may not have completed yet (needs some kind of fence)
  - Map may be unmapped before it is done being used. (needs scoped access)
- SSBO accesses are all "writes" - maybe pass info in another way.
- Missing keeping map type when resizing buffers (should this be done?)

* Ensure migrated data is in place before flushing.

* Fix issue where old waitable would be signalled.

- There is a real issue where existing Auto<> references need to be replaced.

* Swap bound Auto<> instances when swapping buffer backing

* Fix conversion buffers

* Don't try move buffers if the host has shared memory.

* Make GPU methods return PinnedSpan with scope

* Storage Hint

* Fix stupidity

* Fix rebase

* Tweak rules

Attempt to sidestep BOTW slowdown

* Remove line

* Migrate only when command buffers flush

* Change backing swap log to debug

* Address some feedback

* Disallow backing swap when the flush lock is held by the current thread

* Make PinnedSpan from ReadOnlySpan explicitly unsafe

* Fix some small issues

- Index buffer swap fixed
- Allocate DeviceLocal buffers using a separate block list to images.

* Remove alternative flags

* Address feedback
2023-03-19 17:56:48 -03:00
67b4e63cff Remove MultiRange Min/MaxAddress and rename GetSlice to Slice (#4566)
* Delete MinAddress and MaxAddress from MultiRange

* Rename MultiRange.GetSlice to MultiRange.Slice
2023-03-19 17:31:35 +01:00
5131b71437 Reducing memory allocations (#4537)
* add RecyclableMemoryStream dependency and MemoryStreamManager

* organize BinaryReader/BinaryWriter extensions

* add StreamExtensions to reduce need for BinaryWriter

* simple replacments of MemoryStream with RecyclableMemoryStream

* add write ReadOnlySequence<byte> support to IVirtualMemoryManager

* avoid 0-length array creation

* rework IpcMessage and related types to greatly reduce memory allocation by using RecylableMemoryStream, keeping streams around longer, avoiding their creation when possible, and avoiding creation of BinaryReader and BinaryWriter when possible

* reduce LINQ-induced memory allocations with custom methods to query KPriorityQueue

* use RecyclableMemoryStream in StreamUtils, and use StreamUtils in EmbeddedResources

* add constants for nanosecond/millisecond conversions

* code formatting

* XML doc adjustments

* fix: StreamExtension.WriteByte not writing non-zero values for lengths <= 16

* XML Doc improvements. Implement StreamExtensions.WriteByte() block writes for large-enough count values.

* add copyless path for StreamExtension.Write(ReadOnlySpan<int>)

* add default implementation of IVirtualMemoryManager.Write(ulong, ReadOnlySequence<byte>); remove previous explicit implementations

* code style fixes

* remove LINQ completely from KScheduler/KPriorityQueue by implementing a custom struct-based enumerator
2023-03-17 13:14:50 +01:00
da073fce61 GPU: Fast path for adding one texture view to a group (#4528)
* GPU: Fast path for adding one texture view to a group

Texture group handles must store a list of their overlapping views, so they can be properly notified when a write is detected, and a few other things relating to texture readback. This is generally created when the group is established, with each handle looping over all views to find its overlaps. This whole process was also done when only a single view was added (and no handles were changed), however...

Sonic Frontiers had a huge cubemap array with 7350 faces (175 cubemaps * 6 faces * 7 levels), so iterating over both handles and existing views added up very fast. Since we are only adding a single view, we only need to _add_ that view to the existing overlaps, rather than recalculate them all.

This greatly improves performance during loading screens and a few seconds into gameplay on the "open zone" sections of Sonic Frontiers. May improve loading times or stutters on some other games.

Note that the current texture cache rules will cause these views to fall out of the cache, as there are more than the hard cap, so the cost will be repaid when reloading the open zone.

I also added some code to properly remove overlaps when texture views are removed, since it seems that was missing.

This can be improved further by only iterating handles that overlap the view (filter by range), but so can a few places in TextureGroup, so better to do all at once. The full generation of overlaps could probably be improved in a similar way.

I recommend testing a few games to make sure nothing breaks.

* Address feedback
2023-03-14 17:33:44 -03:00
1fc90e57d2 Update range for remapped sparse textures instead of recreating them (#4442)
* Update sparsely mapped texture ranges without recreating

Important TODO in TexturePool. Smaller TODO: should I look into making textures with views also do this? It needs to be able to detect if the views can be instantly deleted without issue if they're now remapped.

* Actually do partial updates

* Signal group dirty after mappings changed

* Fix various issues (should work now)

* Further optimisation

Should load a lot less data (16x) when partial updating 3d textures.

* Improve stability

* Allow granular uploads on large textures, improve rules

* Actually avoid updating slices that aren't modified.

* Address some feedback, minor optimisation

* Small tweak

* Refactor DereferenceRequest

More specific initialization methods.

* Improve code for resetting handles

* Explain data loading a bit more

* Add some safety for setting null from different threads.

All texture sets come from the one thread, but null sets can come from multiple. Only decrement ref count if we succeeded the null set first.

* Address feedback 1

* Make a bit safer
2023-03-14 17:08:44 -03:00
6e9bd4de13 GPU: Scale counter results before addition (#4471)
* GPU: Scale counter results before addition

Counter results were being scaled on ReportCounter, which meant that the _total_ value of the counter was being scaled. Not only could this result in very large numbers and weird overflows if the game doesn't clear the counter, but it also caused the result to change drastically.

This PR changes scaling to be done when the value is added to the counter on the backend. This should evaluate the scale at the same time as before, on report counter, but avoiding the issue with scaling the total.

Fixes scaling in Warioware, at least in the demo, where it seems to compare old/new counters and broke down when scaling was enabled.

* Fix issues when result is partially uploaded.

Drivers tend to write the low half first, then the high half. Retry if the high half is FFFFFFFF.
2023-03-12 18:01:15 +01:00
4f3af839be Minor code formatting (#4498) 2023-03-04 14:43:08 +01:00
cedd200745 Move gl_Layer to vertex shader if geometry is not supported (#4368)
* Set gl_Layer on vertex shader if it's set on the geometry shader and it does nothing else

* Shader cache version bump

* PR feedback

* Fix typo
2023-02-25 10:39:51 +00:00
095ad923ad Account for multisample when calculating render target size hint (#4467) 2023-02-23 10:08:54 +01:00
c3a5716a95 Add copy dependency for some incompatible texture formats (#4380)
* Add copy dependency for some incompatible texture formats

* Simplify compatibility check
2023-02-21 19:21:57 -03:00
58d7a1fe97 Mark texture as modified and sync on I2M fast path (#4449) 2023-02-21 10:40:23 +01:00
7aa430f1a5 Add support for advanced blend (part 1/2) (#2801)
* Add blend microcode registers

* Add advanced blend support using host extension

* Remove debug message

* Use pre-generated table for blend functions

* XML docs

* Rename AdvancedBlendMode to AdvancedBlendOp for consistency

* Remove redundant code

* Fix some advanced blend related issues on Vulkan

* Formatting
2023-02-19 22:37:37 -03:00
efb135b74c Clear CPU side data on GPU buffer clears (#4125)
* Clear CPU side data on GPU buffer clears

* Implement tracked fill operation that can signal other resource types except buffer

* Fix tests, add missing XML doc

* PR feedback
2023-02-16 18:28:49 -03:00
a707842e14 Validate dimensions before creating texture (#4430) 2023-02-16 11:16:31 -03:00
e4f68592c3 Fix partial updates for textures. (#4401)
I was forcing some types of texture to partially update when investigating performance with games that stream in data, and noticed that partially loading texture data was really broken on both backends.

Fixes Vulkan texture set by getting the correct expected size for the texture. Fixes partial upload on both backends for both Texture 2D Array and Cubemap using the wrong offset and uploading to the first layer/level for a handle. 3D might also be affected.

This might fix textures randomly having incorrect data in games that render to it - jumbled in the case of OpenGL, and outdated/black in the case of Vulkan. This case typically happens in UE4 games.
2023-02-12 10:30:26 +01:00
61b1ce252f Allow partially mapped textures with unmapped start (#4394) 2023-02-10 11:47:59 -03:00
26bf13a65d Limit texture cache based on total texture size (#4350)
* Limit texture cache based on total texture size

* Formatting
2023-02-08 14:19:43 +01:00
96cf242bcf Handle mismatching texture size with copy dependencies (#4364)
* Handle mismatching texture size with copy dependencies

* Create copy and render textures with the minimum possible size

* Only align width for comparisons, assume that height is always exact

* Fix IsExactMatch size check

* Allow sampler and copy textures to match textures with larger width

* Delete texture ChangeSize related code

* Move AdjustSize to TextureInfo and give it a better name, adjust usages

* Fix GetMinimumWidthInGob when minimumWidth > width

* Only update render targets that are actually cleared for clear

Avoids creating textures with incorrect sizes

* Delete UpdateRenderTargetState method that is not needed anymore

Clears now only ever sets the render targets that will be cleared rather than all of them
2023-02-08 08:48:09 +01:00
43081c16c4 Insert bitcast for assignment of fragment integer outputs on GLSL (#4369)
* Insert bitcast for assignment of fragment integer outputs on GLSL

* Shader cache version bump
2023-02-05 18:52:57 -03:00
2fd819613f SPIR-V: Change BitfieldExtract and BitfieldInsert for SPIRV-Cross (#4336)
* SPIR-V: Change BitfieldExtract and BitfieldInsert types to make Metal MSL compiler happy

* Shader cache version bump
2023-01-23 19:20:40 -03:00
e3d0ccf8d5 Allow setting texture data from 1x to fix some textures resetting randomly (#2860)
* Allow setting texture data from 1x to fix some textures resetting randomly

Expected targets:

- Deltarune 1+2
- Crash Team Racing
- Those new pokemon games idk

* Allow scaling of MSAA textures, propagate scale on copy.

* Fix Rebase

Oops

* Automatic disable

* A bit more aggressive

* Without the debug log

* Actually decrement the score when writing.
2023-01-22 02:03:30 +00:00
6adf15e479 Implement CSET and CSETP shader instructions (#4318)
* Implement CSET and CSETP shader instructions

* Shader cache version bump

* Fix CC.HI
2023-01-21 12:18:05 -03:00
86fd0643c2 Implement support for page sizes > 4KB (#4252)
* Implement support for page sizes > 4KB

* Check and work around more alignment issues

* Was not meant to change this

* Use MemoryBlock.GetPageSize() value for signal handler code

* Do not take the path for private allocations if host supports 4KB pages

* Add Flags attribute on MemoryMapFlags

* Fix dirty region size with 16kb pages

Would accidentally report a size that was too high (generally 16k instead of 4k, uploading 4x as much data)

Co-authored-by: riperiperi <rhy3756547@hotmail.com>
2023-01-17 05:13:24 +01:00
f0e27a23a5 Add short duration texture cache (#3754)
* Add short duration texture cache

This texture cache takes textures that lose their last pool reference and keeps them alive until the next frame, or until an incompatible overlap removes it. This is done since under certain circumstances, a texture's reference can be wiped from a pool despite it still being in use - though typically the reference will return when rendering the next frame.

While this may slightly increase texture memory usage when quickly going through a bunch of temporary textures, it's still bounded due to the overlap removal rule.

This greatly increases performance in Hyrule Warriors: Age of Calamity. It may positively affect some UE4 games which dip framerate severely under certain circumstances.

* Small optimization

* Don't forget this.

* Add short cache dictionary

* Address feedback

* Address some feedback
2023-01-17 04:39:46 +01:00
93df366b2c Fix texture flush from CPU WaitSync regression on OpenGL (#4289) 2023-01-14 11:23:57 -03:00
cd3a15aea5 Fix NRE when MemoryUnmappedHandler is called for a destroyed channel (#4285) 2023-01-14 00:16:06 -03:00
070136b3f7 Fix texture modified on CPU from GPU thread after being modified on GPU not being updated (#4284) 2023-01-13 23:46:45 -03:00
8fa248ceb4 Vulkan: Add workarounds for MoltenVK (#4202)
* Add MVK basics.

* Use appropriate output attribute types

* 4kb vertex alignment, bunch of fixes

* Add reduced shader precision mode for mvk.

* Disable ASTC on MVK for now

* Only request robustnes2 when it is available.

* It's just the one feature actually

* Add triangle fan conversion

* Allow NullDescriptor on MVK for some reason.

* Force safe blit on MoltenVK

* Use ASTC only when formats are all available.

* Disable multilevel 3d texture views

* Filter duplicate render targets (on backend)

* Add Automatic MoltenVK Configuration

* Do not create color attachment views with formats that are not RT compatible

* Make sure that the host format matches the vertex shader input types for invalid/unknown guest formats

* FIx rebase for Vertex Attrib State

* Fix 4b alignment for vertex

* Use asynchronous queue submits for MVK

* Ensure color clear shader has correct output type

* Update MoltenVK config

* Always use MoltenVK workarounds on MacOS

* Make MVK supersede all vendors

* Fix rebase

* Various fixes on rebase

* Get portability flags from extension

* Fix some minor rebasing issues

* Style change

* Use LibraryImport for MVKConfiguration

* Rename MoltenVK vendor to Apple

Intel and AMD GPUs on moltenvk report with the those vendors - only apple silicon reports with vendor 0x106B.

* Fix features2 rebase conflict

* Rename fragment output type

* Add missing check for fragment output types

Might have caused the crash in MK8

* Only do fragment output specialization on MoltenVK

* Avoid copy when passing capabilities

* Self feedback

* Address feedback

Co-authored-by: gdk <gab.dark.100@gmail.com>
Co-authored-by: nastys <nastys@users.noreply.github.com>
2023-01-13 01:31:21 +01:00
94a64f2aea Remove textures from cache on unmap if not mapped and modified (#4211) 2023-01-11 01:53:56 +00:00
9dfe81770a Use vector outputs for texture operations (#3939)
* Change AggregateType to include vector type counts

* Replace VariableType uses with AggregateType and delete VariableType

* Support new local vector types on SPIR-V and GLSL

* Start using vector outputs for texture operations

* Use vectors on more texture operations

* Use vector output for ImageLoad operations

* Replace all uses of single destination texture constructors with multi destination ones

* Update textureGatherOffsets replacement to split vector operations

* Shader cache version bump

Co-authored-by: Ac_K <Acoustik666@gmail.com>
2022-12-29 16:09:34 +01:00
e20abbf9cc Vulkan: Don't flush commands when creating most sync (#4087)
* Vulkan: Don't flush commands when creating most sync

When the WaitForIdle method is called, we create sync as some internal GPU method may read back written buffer data. Some games randomly intersperse compute dispatch into their render passes, which result in this happening an unbounded number of times depending on how many times they run compute.

Creating sync in Vulkan is expensive, as we need to flush the current command buffer so that it can be waited on. We have a limited number of active command buffers due to how we track resource usage, so submitting too many command buffers will force us to wait for them to return to the pool.

This PR allows less "important" sync (things which are less likely to be waited on) to wait on a command buffer's result without submitting it, instead relying on AutoFlush or another, more important sync to flush it later on.

Because of the possibility of us waiting for a command buffer that hasn't submitted yet, any thread needs to be able to force the active command buffer to submit. The ability to do this has been added to the backend multithreading via an "Interrupt", though it is not supported without multithreading.

OpenGL drivers should already be doing something similar so they don't blow up when creating lots of sync, which is why this hasn't been a problem for these games over there.

Improves Vulkan performance on Xenoblade DE, Pokemon Scarlet/Violet, and Zelda BOTW (still another large issue here)

* Add strict argument

This is technically a separate concern from whether the sync is a host syncpoint.

* Remove _interrupted variable

* Actually wait for the invoke

This is required by AMD GPUs, and also may have caused some issues on other GPUs.

* Remove unused using.

* I don't know why it added these ones.

* Address Feedback

* Fix typo
2022-12-29 15:39:04 +01:00
470be03c2f GPU: Add fallback when 16-bit formats are not supported (#4108)
* Add conversion for 16 bit RGBA formats (not supported in Rosetta)

* Rebase fix

Rebase fix

* Forgot to remove this

* Fix RGBA16 format conversion

* Add RGBA4 -> RGBA8 conversion

* Handle host stride alignment

* Address Feedback Part 1

* Can't count

* Don't zero out rgb when alpha is 0

* Separate RGBA4 and 5-bit component formats

Not sure of a better way to name them...

* Add A1B5G5R5 conversion

* Put this in the right place.

* Make format naming consistent for capabilities

* Change method names
2022-12-26 15:50:27 -03:00
c963b3c804 Added Generic Math to BitUtils (#3929)
* Generic Math Update

Updated Several functions in Ryujinx.Common/Utilities/BitUtils to use generic math

* Updated BitUtil calls

* Removed Whitespace

* Switched decrement

* Fixed changed method calls.

The method calls were originally changed on accident due to me relying too much on intellisense doing stuff for me

* Update Ryujinx.Common/Utilities/BitUtils.cs

Co-authored-by: gdkchan <gab.dark.100@gmail.com>

Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2022-12-26 14:11:05 +00:00
f906eb06c2 Implement a software ETC2 texture decoder (#4121)
* Implement a software ETC2 texture decoder

* Fix output size calculation for non-2D textures

* Address PR feedback
2022-12-21 20:39:58 -03:00
1cca3e99ab GPU: Force rebind when pool changes (#4129) 2022-12-21 17:35:28 -03:00
cb70e7bb30 Fix DrawArrays vertex buffer size (#4141) 2022-12-21 19:08:12 +01:00
ec4cd57ccf Implement another non-indexed draw method on GPU (#4123) 2022-12-16 12:06:38 -03:00
5a085cba0f GPU: Fix layered attachment write (#4131)
Fixes a regression caused by #4003 where the code that writes `_vtgWritesRtLayer` was removed, breaking the crowd in mario strikers.
2022-12-16 09:40:01 -03:00
f4d731ae20 Fix NRE when loading Vulkan shader cache with Vertex A shaders (#4124) 2022-12-15 17:52:12 +01:00
8ac53c66b4 Remove Half Conversion (#4106)
* Remove HalfConversion

* Update `CodeGenVersion`
2022-12-14 21:13:23 -03:00
edf7e628ca Use method overloads that support trimming. Mark some types to be trimming friendly (#4083)
* Use method overloads that support trimming. Mark some types to be trimming friendly

* Use generic version of marshalling method
2022-12-12 15:10:05 +01:00
851d81d24a Fix Redundant Qualifer Warnings (#4091)
* Fix Redundant Qualifer Warnings

* Remove unnecessary using
2022-12-10 21:21:13 +01:00
459c4caeba Fix HasUnalignedStorageBuffers value when buffers are always unaligned (#4078) 2022-12-09 17:41:40 -03:00
8428bb6541 Fix shader FSWZADD instruction (#4069)
* Fix shader FSWZADD instruction

* Shader cache version bump
2022-12-08 14:08:07 -03:00
9a0330f7f8 Shader: Implement PrimitiveID (#4067)
* Shader: Implement PrimitiveID

* Shader cache version bump
2022-12-08 10:55:03 +01:00