1)
Message boards :
Number crunching :
Error while computing with windows 10
(Message 159)
Posted 18 Oct 2019 by Michael Goetz
Post:
Quote: also what is "you should try PDW's instructions too" ??
Exactly what you just did. I'm afraid I don't have any other ideas. This isn't my app.
2)
Message boards :
Number crunching :
Error while computing with windows 10
(Message 157)
Posted 17 Oct 2019 by Michael Goetz
Post:
Quote: Follow these steps in order:
Quote: You don't need to do 3 and 4.
You are correct, but I instruct people to do it this way because it's simpler to explain.

Byron, you should try PDW's instructions too, but first double-check (again) that VT-X is enabled in the BIOS.
3)
Message boards :
Number crunching :
Error while computing with windows 10
(Message 141)
Posted 16 Oct 2019 by Michael Goetz
Post:
Follow these steps in order:
1. Turn on VT-X in your BIOS
2. Reboot
3. Detach from QuChemPedIA
4. Attach to QuChemPedIA
That should fix it.
4)
Message boards :
Number crunching :
Virtual Box
(Message 127)
Posted 14 Oct 2019 by Michael Goetz
Post:
Quote: I have attached to the project and have VirtualBox installed. But I've never used VirtualBox before. I receive this message when requesting work:
There are a couple of things that could be causing a problem.

First, go into your BIOS and make sure VT-X is turned on. If it is, great. If not, turn it on. Then reboot.

Now, regardless of whether VT-X was already turned on or not, DETACH the QuChemPedIA project. Next, attach QuChemPedIA again, and see if the error has gone away.

VT-X must be turned on, and BOINC (as of 7.14.x) will remember if you tried to run VBOX without VT-X and will never let you run it after that. This is why you need to detach from QuChemPedIA and then join the project again after turning on VT-X.
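For anyone who prefers the command line to BOINC Manager, the detach and attach steps above can probably be done with boinccmd, the command-line tool that ships with the BOINC client. What follows is only a minimal sketch, not a procedure the project documents: the project URL and account key are placeholders (assumptions), and the helper name reattach_commands is made up for illustration. It prints the two commands without running them, so they can be reviewed first.

```shell
#!/bin/sh
# Placeholders (assumptions): substitute the project's real URL and your own
# account key before using the printed commands.
PROJECT_URL="https://project.example/"
ACCOUNT_KEY="YOUR_ACCOUNT_KEY"

# Hypothetical helper: print the boinccmd calls for "detach" and then
# "attach", in that order, without executing anything.
reattach_commands() (
    url=$1
    key=$2
    echo "boinccmd --project $url detach"
    echo "boinccmd --project_attach $url $key"
)

reattach_commands "$PROJECT_URL" "$ACCOUNT_KEY"
```

Once VT-X is enabled and the machine has been rebooted, the printed lines can be run by hand from the directory where boinccmd is installed.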
5)
Message boards :
Number crunching :
Download Error's ?
(Message 122)
Posted 13 Oct 2019 by Michael Goetz
Post:
damotbe wrote: In the slot directory, you should find a bash script. can you paste the code here to check ? Thank you
If that was directed at me, I'm afraid I won't be able to do that for a while. That computer is busy with other tasks, and since your tasks require (or at least request) a lot of memory, I can't run these tasks at the moment.
6)
Message boards :
Number crunching :
Download Error's ?
(Message 119)
Posted 13 Oct 2019 by Michael Goetz
Post:
damotbe wrote: Affinity problem patch to use CPU correctly
Sorry, but it's not fixed yet. This is with the linux (t2) v0.11 apps that went in yesterday:

root@Stretch-Mel:~# ps -A | grep chem
29039 ? 00:22:49 nwchem
29040 ? 00:22:57 nwchem
30561 ? 00:04:39 nwchem
30562 ? 00:04:39 nwchem
root@Stretch-Mel:~# taskset -p 29039
pid 29039's current affinity mask: 1
root@Stretch-Mel:~# taskset -p 29040
pid 29040's current affinity mask: 2
root@Stretch-Mel:~# taskset -p 30561
pid 30561's current affinity mask: 1
root@Stretch-Mel:~# taskset -p 30562
pid 30562's current affinity mask: 2
7)
Message boards :
Number crunching :
Download Error's ?
(Message 106)
Posted 12 Oct 2019 by Michael Goetz
Post:
It looks like a new version of the apps was just installed on the server. Since there are several different app_versions for linux, and you're seeing two completely different errors, maybe only one of the app_versions has a bad download file or signature, while one or more of the others download fine but don't validate.
8)
Message boards :
News :
Public opening
(Message 82)
Posted 9 Oct 2019 by Michael Goetz
Post:
Quote: We are pleased to announce the official opening of the quchempedia@home project.
They're already being exported. I know Free-DC is picking up the stats:
9)
Message boards :
Number crunching :
Monster wu
(Message 62)
Posted 7 Oct 2019 by Michael Goetz
Post:
Quote: Runtime is not predictable (not determinist), perhaps the prediction of the scheduler is totally erroneous.
Same here. I don't know if they're broken or not, but they behave very differently than the other tasks. They seem to not be making progress. I've been aborting them.
10)
Message boards :
Number crunching :
Monster wu
(Message 56)
Posted 6 Oct 2019 by Michael Goetz
Post:
Quote: The event log in BOINC manager indicates that the project is sending trickle-up messages as the work unit progresses. However, I also noticed that I am not receiving any credit for work done. Is this a problem?
Trickles don't necessarily mean you get credit. CPDN does that, but it's the only project I know of that does.

Most of PrimeGrid's apps trickle. We use that to let the server know that the task is progressing, and may extend the task deadline due to the trickles. But we never give out credit until the task is completed because it's not possible to determine if the result is correct until the end.

Giving out credit in the middle is tricky at best, and close to impossible in many situations. I'd be surprised if any other projects try to do what CPDN did. It also serves as a disincentive to users to complete the tasks, so I suspect few projects would ever want to do this.
11)
Message boards :
Number crunching :
Very little CPU usage
(Message 54)
Posted 6 Oct 2019 by Michael Goetz
Post:
Quote: The straightforward solution is to reactivate VM jobs for GNU/Linux system.
I must be missing something here. Doesn't the problem exist when you're running two tasks simultaneously, regardless of whether it's the native app or the vbox app?

The native app is better than the vbox app because at least you have the option of manually changing the CPU affinity. If the linux app was also a VBOX app, you would lose the ability to manually correct the problem.
12)
Message boards :
Number crunching :
Very little CPU usage
(Message 43)
Posted 5 Oct 2019 by Michael Goetz
Post:
Quote: @Michael : but running a VM it is consuming a certain amount of CPU just for "running the VM" (I always had "a good 10% CPU" in mind ?), so there you are paying twice the cost ?
Um... I'm running the exact same VBOX VM as the QuChemPedIA "Windows" apps use, so the overhead from the VM is exactly the same. The only difference is I get to control what's going on inside the VM, as well as the machine characteristics (CPU cores, memory, network, etc.).

In both cases, a native Linux app is running inside a VBOX Linux VM. When I'm actually in control of the VM, I can go in and fix the affinity problem. If I run the Windows app with its integrated VBOX Linux VM, I can't do that.

And you are *vastly* overestimating the overhead from a virtual machine. They're exceptionally thin, lightweight programs that don't do much except for translating system calls. And our crunching apps don't do system calls, for the most part. Overhead from the VM itself is probably less than 1% when you're just crunching. It will be more, of course, if you're running a GUI in the VM and it has to translate screen drawing from Linux to Windows, but that's not applicable here.

Addendum to the affinity fix: it seems the task switches processes every couple of hours, so the fix doesn't last until the task finishes. So...

<?php
// List the running QuChemPedIA (nwchem) processes; each line starts with the PID.
exec("ps -A | grep chem", $tasks);
$output = [];
foreach ($tasks as $task) {
    echo $task."\n";
    // Pull the leading PID out of the ps line.
    sscanf($task, "%d", $taskid);
    // Reset the affinity mask so the process may run on any CPU.
    exec("taskset -p 0xffffffff $taskid", $output);
}
// Print taskset's old/new mask report for every process.
echo implode("\n", $output)."\n";
?>

I run that short PHP program periodically via cron to make sure the QuChemPedIA apps are not tripping over their own feet while setting CPU affinities. I could easily rewrite that to actually set (rational) CPU affinities, but I don't see the need. The OS's thread scheduler should do a good enough job on its own.

P.S. Most of the people over on the PrimeGrid Discord server who are running QuChemPedIA have decided the proper way to deal with this problem is to only run one QuChemPedIA task at a time. I guess I'm more stubborn than they are. :)
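For the curious, here is one way the "rational" affinities mentioned above might look. This is a rough sketch under an assumption the post does not make, namely that giving each process its own CPU in round-robin order is what you want. The helper name plan_affinities is hypothetical, and it only prints taskset commands (a dry run) rather than changing anything.

```shell
#!/bin/sh
# Hypothetical helper: spread the given PIDs across CPUs 0..NCPUS-1 in
# round-robin order and print one taskset command per PID (dry run only).
# Usage: plan_affinities NCPUS PID...
plan_affinities() (
    ncpus=$1
    shift
    i=0
    for pid in "$@"; do
        # -c takes a CPU list instead of a hex mask; -p targets a running PID.
        echo "taskset -cp $((i % ncpus)) $pid"
        i=$((i + 1))
    done
)

# Example with four sample PIDs on a 4-CPU machine.
plan_affinities 4 13429 13430 13774 13775
```

To apply it for real, the output could be piped to sh from cron, for example `plan_affinities "$(nproc)" $(pgrep -x nwchem) | sh`. Since the apps appear to reset their own affinity every couple of hours, it would need to run periodically, just like the PHP version.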
13)
Message boards :
Number crunching :
Very little CPU usage
(Message 33)
Posted 4 Oct 2019 by Michael Goetz
Post:
So here's the problem:

root@buster-64:~# ps -A |grep chem
13429 ? 00:36:02 nwchem
13430 ? 00:36:02 nwchem
13774 ? 00:34:37 nwchem
13775 ? 00:34:37 nwchem
root@buster-64:~# taskset -p 13429
pid 13429's current affinity mask: 1
root@buster-64:~# taskset -p 13430
pid 13430's current affinity mask: 2
root@buster-64:~# taskset -p 13774
pid 13774's current affinity mask: 1
root@buster-64:~# taskset -p 13775
pid 13775's current affinity mask: 2

And the fix:

root@buster-64:~# taskset -p 0xffffffff 13429
pid 13429's current affinity mask: 1
pid 13429's new affinity mask: f
root@buster-64:~# taskset -p 0xffffffff 13430
pid 13430's current affinity mask: 2
pid 13430's new affinity mask: f
root@buster-64:~# taskset -p 0xffffffff 13774
pid 13774's current affinity mask: 1
pid 13774's new affinity mask: f
root@buster-64:~# taskset -p 0xffffffff 13775
pid 13775's current affinity mask: 2
pid 13775's new affinity mask: f

That's assuming you're running the linux app. Even if you're running Windows, you can build your own Linux VM using the VBOX that came with BOINC, and then run linux BOINC apps inside *your* VBOX VM instead of BOINC's VBOX VM.

That's what I'm doing. I'm running the native Linux apps on my Windows computer instead of running the Windows VBOX app.
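To make the masks above easier to read: an affinity mask is a hexadecimal bit field in which bit n set means the process may run on CPU n. So mask 1 is CPU 0 only, mask 2 is CPU 1 only, and f is CPUs 0 through 3. The sketch below decodes masks and counts how many processes are allowed on each CPU. The function names mask_to_cpus and cpu_usage_counts are made up for illustration; they are not part of taskset or BOINC.

```shell
#!/bin/sh
# Hypothetical helper: turn a hex affinity mask, written without the 0x
# prefix as taskset -p prints it, into a space-separated list of CPU indices.
mask_to_cpus() (
    mask=$((0x$1))
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            out="${out:+$out }$cpu"
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$out"
)

# Hypothetical helper: given one mask per process, report how many
# processes are allowed to run on each CPU.
cpu_usage_counts() (
    for m in "$@"; do
        for c in $(mask_to_cpus "$m"); do
            echo "$c"
        done
    done | sort -n | uniq -c | awk '{print "cpu" $2 ": " $1 " process(es)"}'
)

# The four masks from the "problem" listing above.
cpu_usage_counts 1 2 1 2
```

For the four masks in the problem listing, this reports two processes each on cpu0 and cpu1 and nothing on any other CPU. That would be consistent with two tasks competing for the same two cores while the rest sit idle.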
14)
Message boards :
Number crunching :
Very little CPU usage
(Message 32)
Posted 4 Oct 2019 by Michael Goetz
Post:
Quote: We work on this issue.
Just turn off the flag to bind to specific CPU cores. Let the OS take care of thread scheduling.

I've been seeing this happening today on the native Linux apps (not VBOX). The (t1) apps run (4 of them) at 25%. The (t2) apps run at 50%. The 4-core (mt) apps crash after a few seconds. They seem completely broken.
15)
Message boards :
Number crunching :
No work?
(Message 9)
Posted 4 Oct 2019 by Michael Goetz
Post:
The title says it all. :)

That aside, congratulations on getting the project started!
©2023 Benoit DA MOTA - LERIA, University of Angers, France