Unity framebuffer fetch crashes on ARM Mali GPU devices

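While optimizing our game's bandwidth recently, I found that the copy depth / copy color passes in the URP pipeline interrupt rendering, and the bandwidth cost of flushing and reloading on-chip memory is very high. [data]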

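We only use the depth texture for water depth gradients, decals, and soft particles. None of these cases needs random access; each only needs the depth value at the current pixel.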

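This can be optimized with the framebuffer fetch feature of mobile chips, which is built into Unity. However, Unity only supports GL_EXT_shader_framebuffer_fetch, and this official extension can only read color. The simple workaround is to encode the depth into the alpha channel, then decode it in any fragment shader that needs to read depth.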

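#pragma only_renderers framebufferfetch

void frag(v2f i, inout float4 ocol : SV_Target)
{
	// ocol comes in holding the current framebuffer color and is written back out.
	// col is this fragment's own shaded color, computed elsewhere in the shader.
	ocol.rgb = col.rgb*col.a + ocol.rgb*(1.0-col.a);
}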

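Bandwidth is still a concern, though. The default RT on Android today is generally RGBA8888 (32 bits), and an 8-bit depth value is not precise enough in practice. The next step up is RGBA16 (64 bits), which would double the bandwidth again. So when encoding we spend the precision mainly on the area around the main character, and the result is acceptable.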
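To make the precision trade-off concrete, here is a minimal Python sketch of the arithmetic, not our actual shader code. It assumes a linear depth in metres that is mapped to 8 bits over a limited window around the main character. The function names and the 5 m to 30 m window are hypothetical values chosen purely for illustration.

```python
def encode_depth8(depth, window_min, window_max):
    """Quantize a linear depth to an 8-bit code (0..255), clamped to the window."""
    t = (depth - window_min) / (window_max - window_min)
    t = min(max(t, 0.0), 1.0)      # depths outside the window saturate
    return int(t * 255.0 + 0.5)    # round to the nearest 8-bit code


def decode_depth8(code, window_min, window_max):
    """Recover the approximate depth from an 8-bit code."""
    return window_min + (code / 255.0) * (window_max - window_min)


def step_size(window_min, window_max):
    """Depth difference between two adjacent 8-bit codes."""
    return (window_max - window_min) / 255.0


if __name__ == "__main__":
    # Hypothetical numbers: a 1000 m full range vs. a 5 m..30 m window.
    print("full-range step:", step_size(0.0, 1000.0))
    print("windowed step:  ", step_size(5.0, 30.0))
    code = encode_depth8(12.34, 5.0, 30.0)
    print("round trip:", code, decode_depth8(code, 5.0, 30.0))
```

Narrowing the window makes each 8-bit step much smaller, at the cost of everything outside the window clamping to the nearest end. That is only tolerable when the effects that read depth sit close to the region of interest.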

With this technique, bandwidth drops by roughly 1000 MB/s and current draw by about 100 mA, which is a dramatic improvement.

But there is a nasty catch: Unity's approach is not supported on ARM Mali GPUs.

Cannot fetch framebuffer color data in Mali-G72 GPU

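I had assumed that "not supported" meant I could simply detect ARM devices and fall back through shader LOD. I did not expect it to crash during shader compilation, which makes it very hard to work with.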

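That leaves hand-writing GLSL inside Unity, which is painful: Unity gives essentially no feedback on whether the shader is written correctly, unlike with the native Cg syntax, where it reports errors.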

Converting the HLSL code to GLSL by hand, the rough skeleton looks like this:

Shader "GLSLFetch" { // defines the name of the shader
    Properties {
        _MainTex("MainTex", 2D) = "white" {}
    }
    GLSLINCLUDE
    #include "UnityCG.glslinc"
    ENDGLSL
    SubShader { // Unity chooses the subshader that fits the GPU best
        Pass { // some shaders require multiple passes
            GLSLPROGRAM // here begins the part in Unity's GLSL

            // passed from the vertex shader to the fragment shader
            varying vec2 textureCoordinates;

            #ifdef VERTEX // here begins the vertex shader

            void main() // all vertex shaders define a main() function
            {
                gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
                textureCoordinates = gl_MultiTexCoord0.xy;
            }

            #endif // here ends the definition of the vertex shader

            #ifdef FRAGMENT // here begins the fragment shader

            void main() // all fragment shaders define a main() function
            {
                // gl_LastFragColorARM is a vec4 and requires the
                // GL_ARM_shader_framebuffer_fetch extension to be enabled;
                // only its red channel is used here
                gl_FragColor = vec4(gl_LastFragColorARM.r, 1.0, 0.0, 1.0);
            }

            #endif // here ends the definition of the fragment shader

            ENDGLSL // here ends the part in GLSL
        }
    }
}

Native GLSL syntax, very elegant. With it we can use the GLSL extension:

#extension GL_ARM_shader_framebuffer_fetch : enable

After that, gl_LastFragColorARM reads the value of the current pixel, and the rest works the same as before.