Wondering why old phones are faster than new ones

hterrolle · December 10, 2024, 8:16pm

I bought a Xiaomi 13T Pro with an ARM Immortalis-G715 GPU, and it runs my OpenCL code more slowly than my old Huawei Honor Play. I am working on Android.

I see a few possible explanations: 1) I need to update my OpenCL code; 2) nothing has been done on the phone to improve OpenCL performance; 3) the phone is already using OpenCL for other things.

By the way, I have the same problem with multithreading, which is also very slow.

How can this be explained?

hterrolle · December 11, 2024, 10:05am

Let me ask the question differently.

Does code written in 2019 in C++ for the ARM Mali-G72 need to be modified to run well on the ARM Immortalis-G715?

I read that ARM mentions a difference “between the use of SVM buffer and CL buffer”. So do I need to use SVM rather than CL buffers, or should the difference be small?

By the way, the Huawei Honor Play is nearly twice as fast as the Xiaomi 13T Pro.

It looks like something is wrong somewhere, but where?

hterrolle · December 11, 2024, 12:19pm

Adding

#pragma OPENCL EXTENSION cl_khr_priority_hints : enable

improves the speed of kernel execution and, at the same time, of the multithreaded execution. The gain is not huge, but I get a speed improvement of 20 to 30%.

But it is still 20% slower than the Mali-G72 with the Kirin processor.

locus · December 12, 2024, 3:02pm

Code written for the ARM Mali-G72 might need adjustments to run optimally on the ARM Immortalis-G715. Specifically:

SVM vs. CL Buffers: If the G715 supports Shared Virtual Memory (SVM) and you’re using OpenCL, leveraging SVM can improve performance and simplify memory management. However, if you continue with CL buffers, it may still work but might not fully utilize the G715’s capabilities.
Performance Difference: The Huawei Honor Play outperforming the Xiaomi 13T Pro suggests potential issues in driver optimization, thermal throttling, or inefficiencies in your code. Profiling the application on the G715 could help identify bottlenecks.

Focus on ensuring your code matches the new architecture’s best practices and verify system-level factors like drivers and thermal management.

hterrolle · December 16, 2024, 2:55pm

Thanks for the answer.

By the way, adding these defines improved the speed of the kernel:

#define CL_HPP_USE_IL_KHR
#define CL_HPP_USE_CL_SUB_GROUPS_KHR

I checked for thermal throttling using Arm Performance Studio and got “No throttle = 100%”.
Beyond that, I do not know what to look for.

The Xiaomi does not use all the processor cores, which the Huawei does.
The Xiaomi is very slow with cl::NDRange(16,16); I need to use cl::NDRange(1,1) on some kernels.
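
On the cl::NDRange(16,16) observation: a fixed 16x16 local size may simply not suit every kernel on every driver. A minimal sketch of one alternative, assuming the per-kernel limit has already been queried from the driver (for example the value reported for CL_KERNEL_WORK_GROUP_SIZE), is to derive the local size instead of hard-coding it. The helper name pickLocalSize is hypothetical and not part of OpenCL, and whether this helps on the G715 is an assumption that would need measuring.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an OpenCL API): returns the largest local size
// that is <= limit and evenly divides the global size in one dimension.
// `limit` is assumed to come from the driver, e.g. the kernel's
// CL_KERNEL_WORK_GROUP_SIZE (split across dimensions for a 2D range).
std::size_t pickLocalSize(std::size_t global, std::size_t limit) {
    if (global == 0 || limit == 0) {
        return 1;  // nothing sensible to pick, fall back to a single work-item
    }
    std::size_t local = (limit < global) ? limit : global;
    // Walk down until the candidate divides the global size; 1 always does.
    while (global % local != 0) {
        --local;
    }
    assert(global % local == 0);
    return local;
}
```

For a 1920x1080 global range and a per-dimension limit of 16, this gives 16 for the first dimension and 15 for the second. Passing cl::NullRange as the local size, so that the driver chooses, is another option worth timing against the fixed sizes.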

I think it is just the difference between the Huawei and Xiaomi implementations of libGLES_mali.so, because if I use the libOpenCL.so coming from the Huawei, it runs on the Xiaomi.

Also, all the files from the Xiaomi are dated 1 Jan 2009. Maybe this is also a problem.

hterrolle · December 18, 2024, 2:50pm

I checked the performance using Arm Performance Studio, and it looks like as soon as you use OpenCL on the Xiaomi, only 4 cores run.
But if you run only OpenGL, 8 cores run. And if I run OpenCL and OpenGL at the same time, only 4 cores are used and the kernels are even slower.

By the way, OpenGL performance looks good.

I think I found the problem, but not the solution. ;))
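
One way to cross-check the 4-core observation from inside the app is to ask the kernel which cores the current thread is allowed to run on. The sketch below is only an illustration: the helper name allowedCoreCount is hypothetical, it relies on the Linux sched_getaffinity call (available on Android), and whether the affinity mask is what limits the cores in this case is an assumption.

```cpp
#include <cassert>
#include <sched.h>

// Hypothetical helper (not from the thread): number of CPU cores the calling
// thread is allowed to run on, or -1 if the query fails. Linux/Android only.
int allowedCoreCount() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    // A pid of 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        return -1;
    }
    const int count = CPU_COUNT(&mask);
    assert(count >= 1);  // a running thread is always allowed on at least one core
    return count;
}
```

Logging this value from the OpenCL build and from the OpenGL-only build, and comparing it with std::thread::hardware_concurrency(), would show whether the process itself is confined to a subset of the cores.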

hterrolle · December 22, 2024, 1:41pm

It is definitely a problem with Android. Compiling with different NDK toolchains, I get different performance. OpenCL on Android is a mess. They do not like it. ;))

hterrolle · January 11, 2025, 1:26pm

Hi,

The problem on the Xiaomi is that the code needs to be compiled with a 64-bit compiler to get good performance.

It was not easy to do under Android, but in the end I managed it.

All the cores are running, and I get nearly the same performance as the Huawei Honor Play. But the Huawei is still more stable in frame time: the Xiaomi can vary from 35 ms to 80 ms from one frame to the next. On average, though, let's say it is the same speed.
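
A quick way to confirm which kind of binary is actually running is to check the pointer width and the compiler's target macros from inside the native library. This is a minimal sketch with hypothetical helper names; how the build was switched to 64-bit is not stated in the thread (with Gradle-based NDK builds the ABI is typically selected through abiFilters, for example arm64-v8a).

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Pointer width of the current build in bits:
// 32 for a 32-bit ABI such as armeabi-v7a, 64 for arm64-v8a.
constexpr std::size_t pointerBits() { return sizeof(void*) * CHAR_BIT; }

constexpr bool is64BitBuild() { return pointerBits() == 64; }

// Target architecture name derived from predefined compiler macros.
const char* targetArchName() {
#if defined(__aarch64__)
    return "arm64";
#elif defined(__arm__)
    return "arm32";
#elif defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "x86";
#else
    return "unknown";
#endif
}

// Hypothetical guard: aborts in debug builds if the library was built 32-bit.
inline void expect64BitBuild() {
    assert(is64BitBuild() && "library was built for a 32-bit ABI");
}
```

Logging targetArchName() once at startup makes it obvious when a 32-bit library has been packaged by mistake.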
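
To compare frame-time stability between the two phones with numbers rather than by eye, the per-frame timings can be reduced to minimum, maximum and mean. A minimal sketch, assuming the frame times have already been collected in milliseconds (for example with std::chrono::steady_clock around each frame); the struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Summary of a series of frame times, all in milliseconds.
struct FrameStats {
    double minMs;
    double maxMs;
    double meanMs;
};

// Reduces measured frame times to min, max and mean.
// Precondition: at least one sample.
FrameStats summarize(const std::vector<double>& frameMs) {
    assert(!frameMs.empty());
    const auto mm = std::minmax_element(frameMs.begin(), frameMs.end());
    const double sum = std::accumulate(frameMs.begin(), frameMs.end(), 0.0);
    return {*mm.first, *mm.second, sum / static_cast<double>(frameMs.size())};
}
```

With the illustrative samples {35, 80, 50, 35}, where only the 35 ms and 80 ms extremes come from the post above and the rest are made up, this yields a minimum of 35, a maximum of 80 and a mean of 50. Two devices with the same mean can still differ widely in the min-to-max spread.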

hterrolle · January 23, 2025, 10:26am

Hi,

I need to apologize to Xiaomi: after increasing the volume of data to be processed, the Xiaomi is a lot faster than the Huawei, 50% faster, especially for CPU multithreading (4 pthreads) in my last test.

hterrolle · February 15, 2025, 12:08pm

Last test after all optimizations.

The Xiaomi 13T Pro is really fast: 3 to 4 times faster than my old Huawei when mixing GPU and CPU processing.