Posts by Michael Goetz

1) Message boards : Number crunching : Error while computing with windows 10 (Message 159)
Posted 18 Oct 2019 by Michael Goetz
Post:
Also, what is "you should try PDW's instructions too"?


Exactly what you just did.

I'm afraid I don't have any other ideas. This isn't my app.
2) Message boards : Number crunching : Error while computing with windows 10 (Message 157)
Posted 17 Oct 2019 by Michael Goetz
Post:
Follow these steps in order:
1. Turn on VT-X in your BIOS
2. Reboot
3. Detach from QuChemPedIA
4. Attach to QuChemPedIA

That should fix it.
You don't need to do 3 and 4.
You do need to reset the p_vm_extensions_disabled tag in the client_state.xml file.


You are correct, but I instruct people to do it this way because it's simpler to explain.

Byron, you should try PDW's instructions too, but first double check (again) that VT-X is enabled in the BIOS.
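
For anyone who prefers PDW's route, here is a rough sketch of resetting the flag from a POSIX shell (for example on a Linux host). It assumes the tag is stored as <p_vm_extensions_disabled>1</p_vm_extensions_disabled>, and that you pass in the path to your own client_state.xml. The name reset_vm_flag is a made-up helper, not anything BOINC provides. Stop the BOINC client first, since it rewrites client_state.xml while it is running.

```shell
#!/bin/sh
# reset_vm_flag FILE (hypothetical helper)
# Rewrites <p_vm_extensions_disabled>1</...> to 0 in FILE and keeps FILE.bak.
# Assumption: the tag is written exactly as shown in the sed pattern below.
reset_vm_flag() {
    file=$1
    cp "$file" "$file.bak" || return 1   # keep a backup of the original
    sed 's|<p_vm_extensions_disabled>1</p_vm_extensions_disabled>|<p_vm_extensions_disabled>0</p_vm_extensions_disabled>|' \
        "$file.bak" > "$file"
}
```

Call it as reset_vm_flag /path/to/client_state.xml, then start the client again. Detaching and re-attaching remains the simpler option.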
3) Message boards : Number crunching : Error while computing with windows 10 (Message 141)
Posted 16 Oct 2019 by Michael Goetz
Post:
Follow these steps in order:
1. Turn on VT-X in your BIOS
2. Reboot
3. Detach from QuChemPedIA
4. Attach to QuChemPedIA

That should fix it.
4) Message boards : Number crunching : Virtual Box (Message 127)
Posted 14 Oct 2019 by Michael Goetz
Post:
I have attached to the project and have VirtualBox installed. But I've never used VirtualBox before. I receive this message when requesting work:
10/14/2019 2:06:52 PM | QuChemPedIA@home | Message from server: VirtualBox is not installed
I let VB install into its own folder, is that my problem? Does it need to be in the Boinc folder?


There are a couple of things that could be causing a problem.

First, go into your BIOS and make sure VT-X is turned on. If it is, great. If not, turn it on.

Then reboot.

Now, regardless of whether VT-X was turned on or not, DETACH the QuChemPedIA project.

Next, attach QuChemPedIA again, and see if the error has gone away.

VT-X must be turned on -- and BOINC (as of 7.14.x) will remember if you tried to run VBOX without VT-X and will never let you run it after that. This is why you need to detach from QuChemPedIA and then join the project again after turning on VT-X.
5) Message boards : Number crunching : Download Error's ? (Message 122)
Posted 13 Oct 2019 by Michael Goetz
Post:
damotbe wrote:
In the slot directory, you should find a bash script. Can you paste the code here so we can check it? Thank you.


If that was directed at me, I'm afraid I won't be able to do that for a while. That computer is busy with other tasks, and since your tasks require (or at least request) a lot of memory, I can't run these tasks at the moment.
6) Message boards : Number crunching : Download Error's ? (Message 119)
Posted 13 Oct 2019 by Michael Goetz
Post:
damotbe wrote:
Affinity problem patch to use CPU correctly


Sorry, but it's not fixed yet. This is with the Linux (t2) v0.11 apps that went in yesterday:

root@Stretch-Mel:~# ps -A | grep chem
29039 ?        00:22:49 nwchem
29040 ?        00:22:57 nwchem
30561 ?        00:04:39 nwchem
30562 ?        00:04:39 nwchem
root@Stretch-Mel:~# taskset -p 29039
pid 29039's current affinity mask: 1
root@Stretch-Mel:~# taskset -p 29040
pid 29040's current affinity mask: 2
root@Stretch-Mel:~# taskset -p 30561
pid 30561's current affinity mask: 1
root@Stretch-Mel:~# taskset -p 30562
pid 30562's current affinity mask: 2
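
To read those masks: taskset prints them in hex, one bit per CPU, so mask 1 is CPU 0 only and mask 2 is CPU 1 only. Both tasks have their two processes pinned to the same two cores. A small sketch for decoding a mask (mask_to_cpus is a hypothetical helper name, not part of taskset):

```shell
#!/bin/sh
# mask_to_cpus HEXMASK (hypothetical helper)
# Prints the CPU numbers whose bits are set in HEXMASK, lowest first.
mask_to_cpus() {
    mask=$(( 0x$1 ))
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="$out${out:+ }$cpu"      # append, space-separated
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$out"
}
```

For example, mask_to_cpus 2 prints 1. Running taskset -cp PID instead of taskset -p PID makes taskset print the CPU list directly.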
7) Message boards : Number crunching : Download Error's ? (Message 106)
Posted 12 Oct 2019 by Michael Goetz
Post:
It looks like a new version of the apps was just installed on the server. Since there are several different app_versions for Linux, and you're seeing two completely different errors, maybe only one of the app_versions has a bad download file or signature, while one or more of the others download fine but don't validate.
8) Message boards : News : Public opening (Message 82)
Posted 9 Oct 2019 by Michael Goetz
Post:
We are pleased to announce the official opening of the quchempedia@home project.
Thank you for your precious help !


Will you be exporting stats too? That would make our scores visible on the stats sites.


They're already being exported. I know Free-DC is picking up the stats.

9) Message boards : Number crunching : Monster wu (Message 62)
Posted 7 Oct 2019 by Michael Goetz
Post:
Runtime is not predictable (not deterministic); perhaps the scheduler's prediction is totally wrong.


OK, but after 24 hours I'm at less than 5%.
It seems the scheduler's prediction is OK... and I'll run out of time.

I've had a few on the native Linux app that were behaving the same way. Decided to just let them run and they finished way before the estimated finish and credited just fine. YMMV


Same here. I don't know if they're broken or not, but they behave very differently than the other tasks. They seem to not be making progress. I've been aborting them.
10) Message boards : Number crunching : Monster wu (Message 56)
Posted 6 Oct 2019 by Michael Goetz
Post:
The event log in BOINC manager indicates that the project is sending trickle-up messages as the work unit progresses. However, I also noticed that I am not receiving any credit for work done. Is this a problem?


Trickles don't necessarily mean you get credit. CPDN does that, but it's the only project I know of that does.

Most of PrimeGrid's apps trickle. We use that to let the server know that the task is progressing, and may extend the task deadline due to the trickles. But we never give out credit until the task is completed because it's not possible to determine if the result is correct until the end.

Giving out credit in the middle is tricky at best, and close to impossible in many situations. I'd be surprised if any other projects try to do what CPDN did. It also serves as a disincentive to users to complete the tasks, so I suspect few projects would ever want to do this.
11) Message boards : Number crunching : Very little CPU usage (Message 54)
Posted 6 Oct 2019 by Michael Goetz
Post:
The straightforward solution is to reactivate VM jobs for GNU/Linux systems.

OK with me, if that is what works best for you and the science.


I must be missing something here. Doesn't the problem exist when you're running two tasks simultaneously, regardless of whether it's the native app or the vbox app?

The native app is better than the vbox app because at least you have the option of manually changing the CPU affinity. If the Linux app were also a VBOX app, you would lose the ability to manually correct the problem.
12) Message boards : Number crunching : Very little CPU usage (Message 43)
Posted 5 Oct 2019 by Michael Goetz
Post:
@Michael: but running a VM consumes a certain amount of CPU just for "running the VM" (I always had "a good 10% CPU" in mind?), so there you are paying the cost twice?

Of course this is better than "losing most of the CPU" that the current versions of the app seem unable to use properly in certain conditions (obviously linked to parallel multi-tasking), but it cannot be a long-term solution, right? (Whatever the reason, let's assume it will be fixed.)


Um... I'm running the exact same VBOX VM as the QuChemPedIA "Windows" apps use, so the overhead from the VM is exactly the same. The only difference is I get to control what's going on inside the VM, as well as the machine characteristics (CPU cores, memory, network, etc.). In both cases, a native Linux app is running inside a VBOX Linux VM. When I'm actually in control of the VM, I can go in and fix the affinity problem. If I run the Windows app with its integrated VBOX Linux VM, I can't do that.

And you are *vastly* overestimating the overhead from a virtual machine. They're exceptionally thin, lightweight programs that don't do much except for translating system calls. And our crunching apps don't do system calls, for the most part. Overhead from the VM itself is probably less than 1% when you're just crunching. It will be more, of course, if you're running a GUI in the VM and it has to translate screen drawing from Linux to Windows, but that's not applicable here.

Addendum to the affinity fix: it seems the task switches processes every couple of hours, so the fix doesn't last until the task finishes. So...

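<?php
// Find every running process whose ps line contains "chem" (the nwchem workers).
$tasks = [];
exec("ps -A | grep chem", $tasks);
$output = [];
foreach ($tasks as $task) {
  echo $task."\n";
  // The PID is the first field of each ps line; skip lines that don't start with one.
  if (sscanf($task, "%d", $taskid) !== 1) {
    continue;
  }
  // Let the process run on any CPU instead of the single core it was pinned to.
  exec("taskset -p 0xffffffff $taskid", $output);
}
// Print taskset's "current/new affinity mask" lines for all processes.
echo implode("\n", $output)."\n";
?>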


I run that short PHP program periodically via cron to make sure the QuChemPedIA apps are not tripping over their own feet while setting CPU affinities. I could easily rewrite that to actually set (rational) CPU affinities, but I don't see the need. The OS's thread scheduler should do a good enough job on its own.

P.S. Most of the people over on the PrimeGrid Discord server who are running QuChemPedIA have decided the proper way to deal with this problem is to only run one QuChemPedIA task at a time. I guess I'm more stubborn than they are. :)
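
If you don't have PHP installed, the same idea can be sketched as a plain shell function. This is an illustration under a few assumptions: it uses pgrep instead of ps | grep, and widen_affinity is a made-up name.

```shell
#!/bin/sh
# widen_affinity PATTERN [MASK] (hypothetical helper)
# Runs taskset on every process whose name matches PATTERN so that it may
# use any CPU in MASK (default 0xffffffff, the same mask as the PHP script).
widen_affinity() {
    pattern=$1
    mask=${2:-0xffffffff}
    for pid in $(pgrep "$pattern"); do
        taskset -p "$mask" "$pid"
    done
}
```

Save it in a script that ends with a call such as widen_affinity nwchem and run that from cron. A crontab line like */10 * * * * /usr/local/bin/widen-affinity.sh would do it; both the script path and the 10-minute interval are only examples.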
13) Message boards : Number crunching : Very little CPU usage (Message 33)
Posted 4 Oct 2019 by Michael Goetz
Post:
So here's the problem:

root@buster-64:~# ps -A |grep chem
13429 ?        00:36:02 nwchem
13430 ?        00:36:02 nwchem
13774 ?        00:34:37 nwchem
13775 ?        00:34:37 nwchem
root@buster-64:~# taskset -p 13429
pid 13429's current affinity mask: 1
root@buster-64:~# taskset -p 13430
pid 13430's current affinity mask: 2
root@buster-64:~# taskset -p 13774
pid 13774's current affinity mask: 1
root@buster-64:~# taskset -p 13775
pid 13775's current affinity mask: 2


And the fix:
root@buster-64:~# taskset -p 0xffffffff 13429
pid 13429's current affinity mask: 1
pid 13429's new affinity mask: f
root@buster-64:~# taskset -p 0xffffffff 13430
pid 13430's current affinity mask: 2
pid 13430's new affinity mask: f
root@buster-64:~# taskset -p 0xffffffff 13774
pid 13774's current affinity mask: 1
pid 13774's new affinity mask: f
root@buster-64:~# taskset -p 0xffffffff 13775
pid 13775's current affinity mask: 2
pid 13775's new affinity mask: f


That's assuming you're running the Linux app. Even if you're running Windows, you can build your own Linux VM using the VBOX that came with BOINC, and then run Linux BOINC apps inside *your* VBOX VM instead of BOINC's VBOX VM. That's what I'm doing. I'm running the native Linux apps on my Windows computer instead of running the Windows VBOX app.
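
A note on the mask used in the fix above: 0xffffffff just means "more bits than I have cores". As the taskset output shows, the mask that comes back is f, which covers CPUs 0 through 3. If you would rather pass an exact mask, the arithmetic for N CPUs is (1 << N) - 1. A minimal sketch (full_mask is a hypothetical helper, not a taskset feature):

```shell
#!/bin/sh
# full_mask N (hypothetical helper)
# Prints the hex affinity mask with the lowest N CPU bits set.
full_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}
```

For example, full_mask 4 prints f. Assuming nproc is available, taskset -p "0x$(full_mask "$(nproc)")" PID would then set exactly the CPUs you have.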
14) Message boards : Number crunching : Very little CPU usage (Message 32)
Posted 4 Oct 2019 by Michael Goetz
Post:
We are working on this issue.

On my system, it seems that the program binds the process to a certain core. If I take several jobs, they are all on the same core...
On which exact version of the app did you see this problem?


Just turn off the flag to bind to specific CPU cores. Let the OS take care of thread scheduling.

I've been seeing this happening today on the native Linux apps (not VBOX). The (t1) apps run (4 of them) at 25%. The (t2) apps run at 50%. The 4-core (mt) apps crash after a few seconds. They seem completely broken.
15) Message boards : Number crunching : No work? (Message 9)
Posted 4 Oct 2019 by Michael Goetz
Post:
The title says it all. :)

That aside, congratulations on getting the project started!




©2024 Benoit DA MOTA - LERIA, University of Angers, France