GPU assembler programming or assembly-like programming: is it possible?

I looked into this after reading through the OpenCL manual, where one of the examples analyzed a C OpenCL kernel and the GPU ISA code it compiles to. I barely understood anything in the assembly, even though I have written a lot of assembly for CPUs in the past.

This led me to the Vega 10 ISA document, as well as others such as RDNA, available at similar links:
https://developer.amd.com/wp-content/resources/Vega_Shader_ISA.pdf

I did some web searching and found that GPU assembly programming is largely impossible, and that the closest thing I can do is
https://computergraphics.stackexchange.com/questions/7809/what-does-gpu-assembly-look-like
OpenGL ARB assembly.

I understand that NVIDIA and AMD GPUs will have drastically different ISAs (I assume), whereas AMD and Intel CPUs should be very similar, save for advanced instructions.

It would be nice to do some basic mental exercises writing assembly in a native GPU ISA, to get more insight into the GPU’s internal architecture.
Plus, ARB assembly seems to be limited to graphics shaders; I am not sure it can do anything for compute shaders?


They don’t make their GPUs like their CPUs. It isn’t x86 or a variant thereof.

That was sort of a side statement; I am really looking for an assembler.

This is not strictly true. ARB assembly language support languished once GLSL took off. NVIDIA kept extending this with NV specific GL extensions. For instance, on the OpenGL Extension Registry, see all extensions matching NV_gpu_program*, NV_vertex_program*, NV_fragment_program*, NV_geometry_program*, NV_tessellation_program*, NV_compute_program*, NV_parameter_buffer_object*, such as:

Here’s an old quick ref guide:

For many years, the NVIDIA GLSL compiler (in Cg, in the graphics driver, etc.) compiled GLSL shaders down into this assembly language as a first step, and it was pretty well documented. This meant you could do pretty much anything you could in GLSL in NV Assembly, because the driver was making use of this path. With Turing/RTX* GPUs however, they’ve apparently switched to using “something else” for this 1st-step assembly language (possibly SPIR-V) as some of NV’s latest extensions (e.g. NV_mesh_shader) didn’t extend NV Assembly to add this support, but there is SPIR-V support for them.

However (to your next point)…

neither of these is the low-level ISA for the GPU. They’re a cross-GPU, portable, high-level assembly representation that the driver uses to generate the low-level assembly specific to that GPU. I’ve heard NVIDIA folks call that low-level assembly language SASS, and allegedly some NV tools will let you collect and inspect it. It’s likely what CUDA produces as well.

For NVIDIA’s high-level assembly language specifically, see:

and related NV Assembly extensions. Just keep in mind this is the high-level cross-GPU portable assembly, not the low-level ISA.

However, for the cross-vendor high-level assembly solution, take a look at SPIR-V. It’s the new shader hotness for Vulkan, and of course supports GLSL authoring. You can use open source tools like glslangValidator and glslc to compile GLSL compute shaders down to SPIR-V assembly and inspect to your heart’s content.

Just keep in mind that neither of these are the GPU’s low-level ISA. For those, see your GPU vendor.
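If you do compile a shader to a SPIR-V binary with one of those tools, the module starts with a five-word header (magic number, version, generator, id bound, schema) that is easy to decode by hand. Here is a minimal Python sketch, assuming the word layout from the published SPIR-V specification; the function name `parse_spirv_header` and the example header values are made up for illustration.

```python
import struct

SPIRV_MAGIC = 0x07230203  # first word of every SPIR-V module


def parse_spirv_header(data: bytes) -> dict:
    """Decode the five 32-bit words that start a SPIR-V binary."""
    if len(data) < 20:
        raise ValueError("a SPIR-V header is 5 words (20 bytes)")
    # The magic number doubles as an endianness marker, so try both orders.
    for fmt in ("<5I", ">5I"):
        magic, version, generator, bound, schema = struct.unpack(fmt, data[:20])
        if magic == SPIRV_MAGIC:
            break
    else:
        raise ValueError("not a SPIR-V module (bad magic number)")
    return {
        # Released SPIR-V packs the version as 0x00MMmm00 (major, minor).
        "version": ((version >> 16) & 0xFF, (version >> 8) & 0xFF),
        "generator": generator,  # identifies the tool that wrote the module
        "bound": bound,          # every <id> in the module is below this value
        "schema": schema,        # reserved, normally 0
    }


# Hypothetical header: SPIR-V 1.0, generator 0, ids bound by 23, schema 0.
example = struct.pack("<5I", SPIRV_MAGIC, 0x00010000, 0, 23, 0)
print(parse_spirv_header(example))
```

On a real file you would pass in the bytes read from disk, e.g. the contents of a hypothetical `add.spv`. Everything after the header is the instruction stream that the disassemblers print as text.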

A recent blog post related to all this high-level / low-level shader assembly and shader patching/ reoptimization w.r.t. OpenGL and Vulkan:

Thanks, very insightful info. The graphics shader material will probably take me some time to digest, as I am spending most of my time on compute (OpenCL).
However, for OpenCL, I see this path in the links:

OpenCL -> clang -> LLVM -> SPIR-V LLVM IR translator ↔ SPIR-V tools.

Hah! In the SPIR-V white paper I found example code. It does indeed look like high-level portable code, definitely not an ISA. Perhaps I can find some useful info if I can manage to dig through the OpenCL driver.

// Magic number 0x07230203
// SPIR-V Version 99
// Generated by (magic number): 1
// Id’s are bound by 23
// schema 0
Source OpenCL 120
EntryPoint Kernel 9
MemoryModel Physical64 OpenCL1.2
Name 4 "LocalInvocationId"
Name 9 "add"
Name 10 "in1"
Name 11 "in2"
Name 12 "out"
Name 13 "entry"
Name 15 "call"
Name 16 "arrayidx"
Name 18 "arrayidx1"
Name 20 "add"
Name 21 "arrayidx2"
Decorate 4(LocalInvocationId) Constant
Decorate 4(LocalInvocationId) Built-In LocalInvocationId
Decorate 10(in1) FuncParamAttr 5
Decorate 11(in2) FuncParamAttr 5
Decorate 12(out) FuncParamAttr 5
Decorate 17 Alignment 4
Decorate 19 Alignment 4
Decorate 22 Alignment 4
1: TypeInt 64 0
2: TypeVector 1(int) 3
3: TypePointer UniformConstant 2(ivec3)
5: TypeVoid
6: TypeInt 32 0
7: TypePointer WorkgroupGlobal 6(int)
8: TypeFunction 5 7(ptr) 7(ptr) 7(ptr)
4(LocalInvocationId): 3(ptr) Variable UniformConstant
9(add): 5 Function NoControl 8
10(in1): 7(ptr) FunctionParameter
11(in2): 7(ptr) FunctionParameter
12(out): 7(ptr) FunctionParameter
13(entry): Label
14: 2(ivec3) Load 4(LocalInvocationId)
15(call): 1(int) CompositeExtract 14 0
16(arrayidx): 7(ptr) InBoundsAccessChain 10(in1) 15(call)
17: 6(int) Load 16(arrayidx)
18(arrayidx1): 7(ptr) InBoundsAccessChain 11(in2) 15(call)
19: 6(int) Load 18(arrayidx1)
20(add): 6(int) IAdd 19 17
21(arrayidx2): 7(ptr) InBoundsAccessChain 12(out) 15(call)
Store 22 21(arrayidx2) 20
Return
FunctionEnd
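
Read top to bottom, the function body in the listing is short: id 14 loads the LocalInvocationId vector, 15 extracts component 0, 16 to 19 load in1[i] and in2[i], 20 adds them, and 21 plus the Store write the sum to out[i]. A minimal Python sketch of that data flow, one call per work-item, follows. The OpenCL C source is not part of the excerpt, so the `__kernel` signature in the docstring is a guess based on the Name entries, and `add_work_item` and `run_work_group` are purely illustrative names.

```python
MASK32 = 0xFFFFFFFF  # the loaded values and the sum are 32-bit integers, so IAdd wraps


def add_work_item(in1, in2, out, local_id):
    """Emulate one invocation of the listing's function body.

    Guessed OpenCL C equivalent (an assumption, not taken from the source):
        __kernel void add(__global int *in1, __global int *in2,
                          __global int *out) {
            size_t i = get_local_id(0);
            out[i] = in1[i] + in2[i];
        }
    """
    i = local_id[0]           # 14: Load LocalInvocationId, 15: CompositeExtract 0
    a = in1[i]                # 16: InBoundsAccessChain, 17: Load
    b = in2[i]                # 18: InBoundsAccessChain, 19: Load
    total = (b + a) & MASK32  # 20: IAdd 19 17
    out[i] = total            # 21: InBoundsAccessChain, then Store


def run_work_group(in1, in2):
    """Run every work-item in turn; a GPU would run them in parallel."""
    out = [0] * len(in1)
    for x in range(len(in1)):
        add_work_item(in1, in2, out, (x, 0, 0))
    return out


print(run_work_group([1, 2, 3, 4], [10, 20, 30, 40]))  # prints [11, 22, 33, 44]
```

Each SPIR-V id is assigned exactly once, which is why the listing needs a fresh id for every intermediate value that the Python version keeps in an ordinary local variable.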
