I am working on optimizing a library using OpenMP. I benchmark the library on two different platforms: an x86 workstation and an Honor 5c Android phone with 8 cores.
To run code on the phone, I cross-compile everything on my workstation and drive the benchmarks with scripts that use adb. However, I had trouble getting the code as optimized as I wanted, i.e. close to the theoretical 8x speedup on the phone. The explanation seems to lie in the CPU usage when performing a simple matrix multiplication. I have this basic code that helps me measure the usage:
#include <cstdio>
#include <cstdlib>
// For custom types (int32, float32)
#include "smu/core.h"

// Build with -fopenmp; without it the #pragma omp line is ignored.
int main(void) {
    long double cpua[4], cpub[4], loadavg;
    FILE *fp;

    // Setting matrices: a and b get random values, c receives the product
    int32 nr = 500;
    int32 nc = 500;
    float32 *a = (float32*)malloc(nr * nc * sizeof(float32));
    float32 *b = (float32*)malloc(nr * nc * sizeof(float32));
    float32 *c = (float32*)malloc(nr * nc * sizeof(float32));
    for (int32 i = 0; i < nr; ++i) {
        float32 *adata = a + i * nc;
        float32 *bdata = b + i * nc;
        int32 cache_nc = nc;
        for (int32 j = 0; j < cache_nc; ++j) {
            adata[j] = (float32)rand() / (float32)RAND_MAX * 100.;
            bdata[j] = (float32)rand() / (float32)RAND_MAX * 100. - 50.;
        }
    }

    for (;;) {
        // Sequential multiplication between two /proc/stat snapshots.
        // The first line of /proc/stat is the aggregate "cpu" line; the four
        // fields read are user, nice, system and idle time (in jiffies).
        fp = fopen("/proc/stat", "r");
        fscanf(fp, "%*s %Lf %Lf %Lf %Lf", &cpua[0], &cpua[1], &cpua[2], &cpua[3]);
        fclose(fp);
        for (int32 i = 0; i < nr; ++i) {
            int32 cache_nc = nc;
            float32 *adata = a + i * cache_nc;
            float32 *cdata = c + i * cache_nc;
            for (int32 j = 0; j < cache_nc; ++j) {
                cdata[j] = 0.;
                for (int32 k = 0; k < cache_nc; ++k)
                    cdata[j] += adata[k] * b[k * cache_nc + j];
            }
        }
        fp = fopen("/proc/stat", "r");
        fscanf(fp, "%*s %Lf %Lf %Lf %Lf", &cpub[0], &cpub[1], &cpub[2], &cpub[3]);
        fclose(fp);
        // Usage = busy time / (busy + idle time) over the interval
        loadavg = ((cpub[0] + cpub[1] + cpub[2]) - (cpua[0] + cpua[1] + cpua[2])) /
                  ((cpub[0] + cpub[1] + cpub[2] + cpub[3]) - (cpua[0] + cpua[1] + cpua[2] + cpua[3]));
        printf("CPU usage : %Lf\n", loadavg);

        // Same multiplication, rows distributed over 8 OpenMP threads
        fp = fopen("/proc/stat", "r");
        fscanf(fp, "%*s %Lf %Lf %Lf %Lf", &cpua[0], &cpua[1], &cpua[2], &cpua[3]);
        fclose(fp);
        #pragma omp parallel for num_threads(8) schedule(dynamic, 1)
        for (int32 i = 0; i < nr; ++i) {
            int32 cache_nc = nc;
            float32 *adata = a + i * cache_nc;
            float32 *cdata = c + i * cache_nc;
            for (int32 j = 0; j < cache_nc; ++j) {
                cdata[j] = 0.;
                for (int32 k = 0; k < cache_nc; ++k)
                    cdata[j] += adata[k] * b[k * cache_nc + j];
            }
        }
        fp = fopen("/proc/stat", "r");
        fscanf(fp, "%*s %Lf %Lf %Lf %Lf", &cpub[0], &cpub[1], &cpub[2], &cpub[3]);
        fclose(fp);
        loadavg = ((cpub[0] + cpub[1] + cpub[2]) - (cpua[0] + cpua[1] + cpua[2])) /
                  ((cpub[0] + cpub[1] + cpub[2] + cpub[3]) - (cpua[0] + cpua[1] + cpua[2] + cpua[3]));
        printf("CPU usage with OpenMP : %Lf\n", loadavg);
    }

    // Never reached: the loop above runs until the program is killed.
    free(a);
    free(b);
    free(c);
    return(0);
}
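The snapshot/compute sequence in the program is repeated for both measurements. A minimal sketch of how it could be factored into helpers, using std::ifstream instead of fopen/fscanf so that an unreadable /proc/stat is detected rather than ignored. The names CpuSnapshot, parse_cpu_line, read_cpu_snapshot and cpu_usage are hypothetical; the formula is the same as in the original (user + nice + system over user + nice + system + idle), so iowait, irq and the remaining columns are still ignored.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Jiffy counters from the aggregate "cpu" line of /proc/stat.
// Only the first four fields are kept, as in the original program.
struct CpuSnapshot {
    long double user = 0, nice = 0, system = 0, idle = 0;
};

// Parse a line such as "cpu  user nice system idle ..." into a snapshot.
// Per-core lines ("cpu0 ...") are rejected.
bool parse_cpu_line(const std::string& line, CpuSnapshot& out) {
    std::istringstream in(line);
    std::string label;
    if (!(in >> label) || label != "cpu") return false;
    return static_cast<bool>(in >> out.user >> out.nice >> out.system >> out.idle);
}

// Read the first line of /proc/stat; returns false if it cannot be read or parsed.
bool read_cpu_snapshot(CpuSnapshot& out) {
    std::ifstream f("/proc/stat");
    std::string line;
    if (!std::getline(f, line)) return false;
    return parse_cpu_line(line, out);
}

// Fraction of non-idle time between two snapshots (same formula as loadavg).
long double cpu_usage(const CpuSnapshot& a, const CpuSnapshot& b) {
    long double busy = (b.user + b.nice + b.system) - (a.user + a.nice + a.system);
    long double total = busy + (b.idle - a.idle);
    return total > 0 ? busy / total : 0.0L;
}
```

Each measurement then reduces to `read_cpu_snapshot(a);`, the loop, `read_cpu_snapshot(b);` and `printf("%Lf\n", cpu_usage(a, b));`.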
On my x86 workstation, the results are as expected:
CPU usage : 0.267606
CPU usage with OpenMP : 1.000000
CPU usage : 0.271429
CPU usage with OpenMP : 1.000000
On the phone, however, it seems unable to use all the cores at once:
CPU usage : 0.143388
CPU usage with OpenMP : 0.495968
CPU usage : 0.129955
CPU usage with OpenMP : 0.496626
That is strange, as the non-OpenMP usage (about 0.13, roughly 1/8) makes me think that only 1 of the 8 cores is used. I checked the OpenMP platform info, and it correctly sees 8 cores on the Honor 5c.
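One way to narrow this down is to compare what OpenMP reports with what the kernel says is currently online. The sketch below is an assumed diagnostic, not part of the original program: on Linux and Android, sysconf(_SC_NPROCESSORS_CONF) counts the cores the system is configured with and sysconf(_SC_NPROCESSORS_ONLN) counts those currently online, so the two can differ when cores have been taken offline. omp_get_num_procs() is only queried when the file is compiled with OpenMP enabled. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <thread>
#include <unistd.h>
#ifdef _OPENMP
#include <omp.h>
#endif

struct CoreCounts {
    long configured;  // sysconf(_SC_NPROCESSORS_CONF)
    long online;      // sysconf(_SC_NPROCESSORS_ONLN)
    unsigned hw;      // std::thread::hardware_concurrency(), 0 if unknown
    int omp_procs;    // omp_get_num_procs(), or -1 when built without OpenMP
};

CoreCounts query_core_counts() {
    CoreCounts c;
    c.configured = sysconf(_SC_NPROCESSORS_CONF);
    c.online = sysconf(_SC_NPROCESSORS_ONLN);
    c.hw = std::thread::hardware_concurrency();
#ifdef _OPENMP
    c.omp_procs = omp_get_num_procs();
#else
    c.omp_procs = -1;
#endif
    return c;
}

void print_core_counts() {
    CoreCounts c = query_core_counts();
    std::printf("configured: %ld, online: %ld, hardware_concurrency: %u, omp_get_num_procs: %d\n",
                c.configured, c.online, c.hw, c.omp_procs);
}
```

Calling print_core_counts() right before and during the OpenMP loop would show whether the online count drops while the phone is under load.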
My questions are:
EDIT:
I tried to see directly how the OS handles the cores by running this simple script:
#!/system/bin/sh
i=0
while : ; do
i=$(($i + 1))
done
Even with 8 of these running at once, CPU usage never exceeded 50%.
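The aggregate "cpu" line used by the program does not say which cores are busy. A sketch of reading the per-core cpuN lines of /proc/stat instead, so that two snapshots taken while the 8 spinning scripts (or the OpenMP loop) run give a usage figure per core. The function names are hypothetical. On Linux kernels the per-core lines are normally listed only for online cores, so a core that is absent from the map would point to it being offline rather than idle.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <sstream>
#include <string>

struct CoreTimes {
    long double busy = 0;   // user + nice + system
    long double total = 0;  // busy + idle
};

// Parse every "cpuN ..." line of /proc/stat-formatted text, skipping the
// aggregate "cpu" line and all non-cpu lines.
std::map<int, CoreTimes> parse_per_core(const std::string& text) {
    std::map<int, CoreTimes> cores;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, 3, "cpu") != 0 || line.size() < 4 || line[3] < '0' || line[3] > '9')
            continue;
        std::istringstream ls(line.substr(3));
        int id;
        long double user, nice, system, idle;
        if (ls >> id >> user >> nice >> system >> idle)
            cores[id] = CoreTimes{user + nice + system, user + nice + system + idle};
    }
    return cores;
}

// Per-core usage between two snapshots; cores missing from either are skipped.
std::map<int, long double> per_core_usage(const std::map<int, CoreTimes>& a,
                                          const std::map<int, CoreTimes>& b) {
    std::map<int, long double> usage;
    for (const auto& [id, tb] : b) {
        auto it = a.find(id);
        if (it == a.end()) continue;
        long double total = tb.total - it->second.total;
        usage[id] = total > 0 ? (tb.busy - it->second.busy) / total : 0.0L;
    }
    return usage;
}

// Whole contents of /proc/stat (empty string if it cannot be read).
std::string read_proc_stat() {
    std::ifstream f("/proc/stat");
    std::ostringstream ss;
    ss << f.rdbuf();
    return ss.str();
}
```

Taking read_proc_stat() before and after a few seconds of load and printing per_core_usage() of the two parsed snapshots shows which cores actually did the work.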
I read an article explaining that a phone can run several operating systems, with only one of them usable. In my case, that would mean one per group of 4 cores. But then I don't understand why OpenMP would see 8 cores...
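The "one group of 4 cores" hypothesis can be checked against the kernel's own list of online cores. A sketch, assuming the usual Linux cpulist format (comma-separated ids and ranges such as `0-3` or `0-3,6`) found in /sys/devices/system/cpu/online and /sys/devices/system/cpu/possible. If only 4 ids are listed as online while the benchmark runs, half of the cores are offline at that moment. The helper names parse_cpu_list and read_cpu_list are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parse a cpulist string such as "0-3,6" into {0, 1, 2, 3, 6}.
// Returns an empty vector on malformed input.
std::vector<int> parse_cpu_list(const std::string& text) {
    std::vector<int> cpus;
    std::istringstream in(text);
    std::string item;
    while (std::getline(in, item, ',')) {
        // Drop a trailing newline or spaces left over from the sysfs file.
        while (!item.empty() && (item.back() == '\n' || item.back() == ' ')) item.pop_back();
        if (item.empty()) continue;
        std::size_t dash = item.find('-');
        try {
            int first = std::stoi(item.substr(0, dash));
            int last = dash == std::string::npos ? first : std::stoi(item.substr(dash + 1));
            if (last < first) return {};
            for (int c = first; c <= last; ++c) cpus.push_back(c);
        } catch (const std::exception&) {
            return {};
        }
    }
    return cpus;
}

// Read and parse one sysfs cpulist file, e.g. "/sys/devices/system/cpu/online".
std::vector<int> read_cpu_list(const std::string& path) {
    std::ifstream f(path);
    std::string line;
    std::getline(f, line);
    return parse_cpu_list(line);
}
```

Comparing `read_cpu_list("/sys/devices/system/cpu/online").size()` with the same call on `.../possible`, from an adb shell while the OpenMP loop is running, distinguishes "8 cores exist" from "8 cores are online".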