Very little CPU usage

Message boards : Number crunching : Very little CPU usage

Bryan
Joined: 3 Oct 19
Posts: 14
Credit: 32,908,253
RAC: 0
Message 23 - Posted: 4 Oct 2019, 14:04:02 UTC
Last modified: 4 Oct 2019, 14:04:52 UTC

I am running the native Linux t1/t2 tasks on multiple machines.

1. On a 64-thread machine I'm only seeing 7% CPU usage, even though the WUs are using all threads. Is this the level that will be used until the WU finishes, or is there a point at which the WU will increase the CPU load? I'm wondering whether I can crunch other WUs alongside so the threads don't sit idle 93% of the time.

2. The WUs reserve 1.2 GB of RAM, yet they only use 120 MB. Why reserve that much memory if it isn't needed?

I'm just trying to figure out the best way to crunch this project ... I'm not complaining.
Jim1348
Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 25 - Posted: 4 Oct 2019, 14:39:25 UTC - in response to Message 23.  

I am seeing only 25% CPU usage when running two cores at a time on both an i7-4790 (Ubuntu 16.04) and a Ryzen 2700 (Ubuntu 18.04).

I am going to try one core at a time, by setting "Max # CPUs" to 1, to see whether that increases it.
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 27 - Posted: 4 Oct 2019, 16:15:18 UTC

We are working on this issue.

On my system, it seems that the program binds the process to a specific core. If I run several jobs, they all end up on the same core...
Which exact version of the app did you see this problem on?
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 28 - Posted: 4 Oct 2019, 16:23:24 UTC - in response to Message 27.  

For the RAM, we chose an upper bound to avoid too many crashes (the amount of RAM needed depends on the chemistry problem).
Jim1348
Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 31 - Posted: 4 Oct 2019, 18:38:18 UTC - in response to Message 27.  
Last modified: 4 Oct 2019, 18:39:09 UTC

> Which exact version of the app did you see this problem on?

I see it on 0.07 (t2) and 0.08 (t2). They are about the same.
I knew you would work on it. Good luck.
Michael Goetz
Joined: 4 Oct 19
Posts: 15
Credit: 70,119
RAC: 0
Message 32 - Posted: 4 Oct 2019, 19:11:58 UTC - in response to Message 27.  

> We are working on this issue.
>
> On my system, it seems that the program binds the process to a specific core. If I run several jobs, they all end up on the same core...
> Which exact version of the app did you see this problem on?


Just turn off the flag to bind to specific CPU cores. Let the OS take care of thread scheduling.

I've been seeing this happening today on the native Linux apps (not VBOX). The (t1) apps run (4 of them) at 25%. The (t2) apps run at 50%. The 4-core (mt) apps crash after a few seconds. They seem completely broken.
Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.

Michael Goetz
Joined: 4 Oct 19
Posts: 15
Credit: 70,119
RAC: 0
Message 33 - Posted: 4 Oct 2019, 19:21:49 UTC
Last modified: 4 Oct 2019, 19:25:46 UTC

So here's the problem:

root@buster-64:~# ps -A |grep chem
13429 ?        00:36:02 nwchem
13430 ?        00:36:02 nwchem
13774 ?        00:34:37 nwchem
13775 ?        00:34:37 nwchem
root@buster-64:~# taskset -p 13429
pid 13429's current affinity mask: 1
root@buster-64:~# taskset -p 13430
pid 13430's current affinity mask: 2
root@buster-64:~# taskset -p 13774
pid 13774's current affinity mask: 1
root@buster-64:~# taskset -p 13775
pid 13775's current affinity mask: 2
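Reading the masks above: `taskset -p` prints the affinity as a hexadecimal bitmask in which bit n stands for CPU n, so `1` means CPU 0 only and `2` means CPU 1 only. Judging by the PID pairs and their identical CPU times, these look like two t2 tasks that have each put one process on CPU 0 and one on CPU 1, so two processes share each of those cores. That would be consistent with the roughly 50% per t2 process reported in Message 32. The small POSIX shell function below is a sketch for decoding such masks; the name `mask_to_cpus` is made up for illustration, and it assumes the mask fits in the shell's 64-bit arithmetic (at most 63 CPUs).

```sh
# mask_to_cpus MASK -> space-separated CPU numbers whose bits are set in MASK.
# MASK is hex as printed by `taskset -p` ("1", "f"), with or without "0x".
mask_to_cpus() {
  local mask cpu cpus
  mask=$(( 0x${1#0x} ))
  cpu=0
  cpus=""
  while [ "$mask" -gt 0 ]; do
    # the lowest remaining bit corresponds to CPU number $cpu
    if [ $(( mask & 1 )) -eq 1 ]; then
      cpus="${cpus:+$cpus }$cpu"
    fi
    mask=$(( mask >> 1 ))
    cpu=$(( cpu + 1 ))
  done
  echo "$cpus"
}
```

For example, `mask_to_cpus f` prints `0 1 2 3`, and `mask_to_cpus "$(taskset -p 13429 | awk '{print $NF}')"` decodes the mask of a running process.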


And the fix:
root@buster-64:~# taskset -p 0xffffffff 13429
pid 13429's current affinity mask: 1
pid 13429's new affinity mask: f
root@buster-64:~# taskset -p 0xffffffff 13430
pid 13430's current affinity mask: 2
pid 13430's new affinity mask: f
root@buster-64:~# taskset -p 0xffffffff 13774
pid 13774's current affinity mask: 1
pid 13774's new affinity mask: f
root@buster-64:~# taskset -p 0xffffffff 13775
pid 13775's current affinity mask: 2
pid 13775's new affinity mask: f
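A caveat on the mask used in this fix: the taskset man page notes that 0xFFFFFFFF covers processors #0 through #31, and the kernel keeps only CPUs that actually exist, which is why the new mask shows up as `f` on this 4-CPU VM. On a machine with more than 32 threads, such as the 64-thread box in Message 34, CPUs 32 and up would still be excluded. A sketch that avoids choosing a mask width uses taskset's CPU-list form (`-c`) and applies it to every thread of the process (`-a`). The helper names are hypothetical, and it assumes the script itself is not restricted to a subset of CPUs, since `nproc` only counts CPUs available to the caller.

```sh
# all_cpus_list N -> CPU list covering CPUs 0..N-1, e.g. "0-63".
all_cpus_list() {
  echo "0-$(( $1 - 1 ))"
}

# reset_affinity PID -> allow every thread of PID to run on any available CPU.
#   -a: apply to all threads (tasks) of the process, not just the main one
#   -c: the argument is a CPU list rather than a hex mask
#   -p: operate on an existing PID
reset_affinity() {
  taskset -a -c -p "$(all_cpus_list "$(nproc)")" "$1"
}
```

Usage, as root: `reset_affinity 13429`.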


That's assuming you're running the Linux app. Even if you're running Windows, you can build your own Linux VM using the VBOX that came with BOINC, and then run Linux BOINC apps inside *your* VBOX VM instead of BOINC's VBOX VM. That's what I'm doing: I'm running the native Linux apps on my Windows computer instead of the Windows VBOX app.
Bryan
Joined: 3 Oct 19
Posts: 14
Credit: 32,908,253
RAC: 0
Message 34 - Posted: 4 Oct 2019, 19:32:51 UTC - in response to Message 33.  
Last modified: 4 Oct 2019, 19:45:45 UTC

On a 2-CPU, 64-thread machine running 2 BOINC instances, I'm seeing 7% CPU utilization. Each instance is assigned 32 threads.

htop shows only 4 threads in use, each at 100% load: 2 active threads on each processor.
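As a rough consistency check, and assuming the overall figure is averaged across all 64 hardware threads, the four fully busy threads alone account for the reading:

$$\frac{4\ \text{busy threads}}{64\ \text{hardware threads}} = 0.0625 = 6.25\,\%$$

This is in line with the ~7% reported above; the other 60 threads are essentially idle.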
[AF>Le_Pommier] Jerome_C2005
Joined: 26 Aug 19
Posts: 15
Credit: 1,265,326
RAC: 0
Message 40 - Posted: 5 Oct 2019, 13:28:11 UTC

@Michael: but running a VM consumes a certain amount of CPU just for "running the VM" (I always had "a good 10% CPU" in mind), so aren't you paying the cost twice?

Of course this is better than "losing most of the CPU" that the current versions of the app seem unable to use properly under certain conditions (obviously linked to running several tasks in parallel), but it cannot be a long-term solution, right? (Whatever the reason, let's assume it will be fixed.)
Michael Goetz
Joined: 4 Oct 19
Posts: 15
Credit: 70,119
RAC: 0
Message 43 - Posted: 5 Oct 2019, 15:07:07 UTC - in response to Message 40.  
Last modified: 5 Oct 2019, 15:13:40 UTC

> @Michael: but running a VM consumes a certain amount of CPU just for "running the VM" (I always had "a good 10% CPU" in mind), so aren't you paying the cost twice?
>
> Of course this is better than "losing most of the CPU" that the current versions of the app seem unable to use properly under certain conditions (obviously linked to running several tasks in parallel), but it cannot be a long-term solution, right? (Whatever the reason, let's assume it will be fixed.)


Um... I'm running the exact same VBOX VM as the QuChemPedIA "Windows" apps use, so the overhead from the VM is exactly the same. The only difference is I get to control what's going on inside the VM, as well as the machine characteristics (CPU cores, memory, network, etc.). In both cases, a native Linux app is running inside a VBOX Linux VM. When I'm actually in control of the VM, I can go in and fix the affinity problem. If I run the Windows app with its integrated VBOX Linux VM, I can't do that.

And you are *vastly* overestimating the overhead from a virtual machine. They're exceptionally thin, lightweight programs that don't do much except for translating system calls. And our crunching apps don't do system calls, for the most part. Overhead from the VM itself is probably less than 1% when you're just crunching. It will be more, of course, if you're running a GUI in the VM and it has to translate screen drawing from Linux to Windows, but that's not applicable here.

Addendum to the affinity fix: it seems the task switches processes every couple of hours, so the fix doesn't last until the task finishes. So...

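<?php
// Run periodically (e.g. from cron): the task starts new processes every
// couple of hours, so a one-off affinity fix does not last.
$ncpus  = (int) trim(shell_exec("nproc"));  // CPUs available to this script
$cpus   = "0-" . ($ncpus - 1);              // CPU list, e.g. "0-3"; unlike the mask
                                            // 0xffffffff it is not capped at CPU 31
$output = [];
exec("pgrep -x nwchem", $pids);             // exact name match, one PID per line
foreach ($pids as $pid) {
  $pid = (int) $pid;                        // PIDs are numeric; cast before use in a command
  echo "nwchem pid $pid\n";
  // -a: every thread of the process; -c: the argument is a CPU list, not a mask
  exec("taskset -a -c -p $cpus $pid", $output);
}
echo implode("\n", $output)."\n";
?>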


I run that short PHP program periodically via cron to make sure the QuChemPedIA apps are not tripping over their own feet while setting CPU affinities. I could easily rewrite that to actually set (rational) CPU affinities, but I don't see the need. The OS's thread scheduler should do a good enough job on its own.

P.S. Most of the people over on the PrimeGrid Discord server who are running QuChemPedIA have decided the proper way to deal with this problem is to only run one QuChemPedIA task at a time. I guess I'm more stubborn than they are. :)
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 46 - Posted: 6 Oct 2019, 8:04:27 UTC - in response to Message 43.  

Thank you for your feedback. We have identified this problem and are investigating a solution. The straightforward solution is to reactivate VM jobs for GNU/Linux systems.

To add to Michael Goetz's response: VMs have a very small overhead. Worse, the native version is statically compiled, and I am almost certain its performance is sub-optimal compared to the optimized library versions used in the VM.
[AF>Le_Pommier] Jerome_C2005
Joined: 26 Aug 19
Posts: 15
Credit: 1,265,326
RAC: 0
Message 50 - Posted: 6 Oct 2019, 11:26:46 UTC

First, Michael: sorry, I misunderstood. Running the native (non-VM) Linux task in your own Linux VM on Windows should be more or less equivalent to the "native VM Windows task", agreed.

About the overhead of running *anything* in a VM on a host system: I strongly dispute that you only "lose" 1% of CPU. I have some experience with VMs. I switched to the Mac back when Apple moved to Intel, which made Windows VMs possible on the Mac, and I used them intensively for several years (not anymore). The VM's own processes (the programs that run and encapsulate whatever you are doing in the VM, i.e. the VirtualBox or Parallels Desktop processes) always used a fair amount of CPU.

I see the same thing with BOINC: the VB process that encapsulates BOINC VM tasks (I don't have any at the moment and I forget its name on the Mac), plus another process I need to re-test, always eats a good amount of CPU outside the BOINC VM tasks themselves. I'll have a closer look after we finish our AF RAID in a week, when I'll try running QCPIA on my Mac again.


Damotbe: if you switch the Linux app back to VM, I won't be able to help anymore. I only run it on the OVH Linux VM that I rent, and I cannot run any VM inside it.
zombie67 [MM]
Joined: 3 Oct 19
Posts: 11
Credit: 5,443,793
RAC: 0
Message 52 - Posted: 6 Oct 2019, 14:48:50 UTC

VBox now allows VMs to run inside VMs.
Reno, NV
Team: SETI.USA
Jim1348
Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 53 - Posted: 6 Oct 2019, 15:34:39 UTC - in response to Message 46.  

> The straightforward solution is to reactivate VM jobs for GNU/Linux systems.

OK with me, if that is what works best for you and the science.
Michael Goetz
Joined: 4 Oct 19
Posts: 15
Credit: 70,119
RAC: 0
Message 54 - Posted: 6 Oct 2019, 15:48:27 UTC - in response to Message 53.  

> > The straightforward solution is to reactivate VM jobs for GNU/Linux systems.
>
> OK with me, if that is what works best for you and the science.


I must be missing something here. Doesn't the problem exist when you're running two tasks simultaneously, regardless of whether it's the native app or the VBOX app?

The native app is better than the VBOX app because at least you have the option of manually changing the CPU affinity. If the Linux app were also a VBOX app, you would lose the ability to manually correct the problem.
Bryan
Joined: 3 Oct 19
Posts: 14
Credit: 32,908,253
RAC: 0
Message 60 - Posted: 7 Oct 2019, 13:48:31 UTC
Last modified: 7 Oct 2019, 13:50:21 UTC

There is absolutely nothing wrong with the native app other than the core/thread affinity assignment. Take that out of the executable so machines can use all threads, and you have a winner. Like Michael, I've been running it with an affinity script and it works quite well.

I don't run VBox projects unless I have no alternative.
[AF>Le_Pommier] Jerome_C2005
Joined: 26 Aug 19
Posts: 15
Credit: 1,265,326
RAC: 0
Message 67 - Posted: 7 Oct 2019, 20:41:15 UTC - in response to Message 52.  

> VBox now allows VMs to run inside VMs.

If you are referring to the last remark in my previous message, I meant "you cannot run any kind of virtualization software inside a virtualized environment (like the Linux VM/cloud machine I rent from a provider called OVH)". I even tried installing VB inside it (the installation works), but trying to run any actual VM inside it crashes the whole host (VM) machine...

This is not the same as "VB inside VB", AFAIK (?)
Bryan
Joined: 3 Oct 19
Posts: 14
Credit: 32,908,253
RAC: 0
Message 68 - Posted: 8 Oct 2019, 1:20:09 UTC - in response to Message 67.  

Some hypervisors will run a VM inside a VM; others do not support it.
zombie67 [MM]
Joined: 3 Oct 19
Posts: 11
Credit: 5,443,793
RAC: 0
Message 69 - Posted: 8 Oct 2019, 6:02:20 UTC - in response to Message 60.  
Last modified: 8 Oct 2019, 6:04:18 UTC

> There is absolutely nothing wrong with the native app other than the core/thread affinity assignment. Take that out of the executable so machines can use all threads, and you have a winner. Like Michael, I've been running it with an affinity script and it works quite well.


Yes please! Let's do that.
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 73 - Posted: 8 Oct 2019, 7:49:59 UTC - in response to Message 69.  

We've been experiencing affinity problems since the beginning of the tests. We already tested using all threads, and performance was very, very bad (worse than the single-threaded application in some cases).

Maybe we can include some kind of affinity script, or compile nwChem without MPI.
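Since leaving the processes unpinned reportedly performed badly, the alternative Michael mentions, setting rational CPU affinities, would spread processes over distinct CPUs instead of unpinning them. Below is a minimal sketch of that idea, not the project's planned fix. It assumes the processes are named `nwchem`, that `pgrep`, `nproc` and `taskset` are installed, and that nothing else is pinning work to the same CPUs; the function names are hypothetical. Like the unpinning script, it would have to be re-run (e.g. from cron) because the task starts new processes every couple of hours.

```sh
# next_cpu INDEX NCPUS -> CPU for the INDEX-th (0-based) process, round-robin.
next_cpu() {
  echo $(( $1 % $2 ))
}

# spread_nwchem -> pin each running nwchem process (all of its threads, -a)
# to its own CPU, wrapping back to CPU 0 once every CPU has one process.
spread_nwchem() {
  local ncpus i pid
  ncpus=$(nproc)
  i=0
  for pid in $(pgrep -x nwchem); do
    taskset -a -c -p "$(next_cpu "$i" "$ncpus")" "$pid" >/dev/null
    i=$(( i + 1 ))
  done
}
```

Round-robin in PID order is the simplest possible policy. It ignores which logical CPUs are hyper-threading siblings, so on SMT machines two processes may still end up sharing a physical core.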
©2024 Benoit DA MOTA - LERIA, University of Angers, France