Application 0.15 (t1) (beta test) Result Not Completing Successfully
Author | Message |
---|---|
Joined: 4 Oct 19 Posts: 8 Credit: 3,108,300 RAC: 0 |
None of my 0.15 (t1) (beta test) results are doing any calculations, yet they keep running and never stop. It appears I will need to abort them. |
Joined: 14 Dec 19 Posts: 68 Credit: 45,744,261 RAC: 0 |
I'm seeing a lot of 0.15s that just sit there near completion with little to no CPU usage. Do they need to meditate before ascending??? |
Joined: 3 Jan 20 Posts: 5 Credit: 31,342,930 RAC: 0 |
+1 |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
I was just testing the latest BOINC wrapper version, because the one in 0.14 is very old (it is the one available on the BOINC website). It seems that the version on GitHub is bogus... I have downgraded to 0.14; you can cancel the jobs. |
Joined: 4 Oct 19 Posts: 8 Credit: 3,108,300 RAC: 0 |
Can't seem to get any work units. |
Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
"Do they need to meditate before ascending???" I will be running them on a Zen core. It might work. |
Joined: 14 Dec 19 Posts: 68 Credit: 45,744,261 RAC: 0 |
Are 0.16s and 0.17s behaving like 0.15s and stalling at the last moment??? |
Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
The 0.16s don't work either. I have over 20 of them running on two Ubuntu 18.04 machines, and they are up to about four hours with 0% CPU usage. I am aborting them. |
Joined: 14 Dec 19 Posts: 68 Credit: 45,744,261 RAC: 0 |
They're really piling up. Time for a server-side abort signal. |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
Finally, 0.17 is working well! I don't know how to send an abort signal from the server side with a condition on the version. I am looking at the documentation. |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
"I am looking at the documentation." Nothing is provided for aborting only the computations with a given app version ID... All these problems arose because I tried to use the official BOINC wrapper from GitHub... 0.17 is a rollback to the old wrapper plus a sentry process (which I just coded) to patch the bogus signal of this version. From the beginning, our application has worked, and we have had a lot of problems with the official BOINC tools. |
Joined: 9 Dec 19 Posts: 11 Credit: 19,162,966 RAC: 0 |
Hi, how do I tell good units from bad ones? On a few machines that were experiencing low CPU utilization or unusually long run times, I aborted them all, but I'd rather not make a habit of that. |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
0.15 and 0.16 are bogus on Linux. Windows and Mac depend on the official vbox_wrapper, so they could idle instead of computing... |
Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
Yes, 0.17 is working great. We are in business again. |
Joined: 14 Dec 19 Posts: 68 Credit: 45,744,261 RAC: 0 |
"0.15 and 0.16 are bogus on Linux" Then a Linux host should not even be sent a windoze WU. Right? |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
"0.15 and 0.16 are bogus on Linux" Normally, no. |
Joined: 12 Jan 20 Posts: 12 Credit: 220,914 RAC: 0 |
As far as I can see, the project sends the same WU to both Linux and Windows in many cases: Example |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
To be clear, one WU is sent to two volunteers, regardless of their operating system. It can be heterogeneous like the example you're showing. On the other hand, a Linux host will only receive calculations with the application for Linux. |
Joined: 7 Oct 19 Posts: 8 Credit: 2,016,759 RAC: 0 |
All errors so far; they don't seem to run very long ...
1719898; 1101058; 485; 13 Feb 2020, 22:34:27 UTC; 13 Feb 2020, 22:35:34 UTC; Validate error; 6.10; 1.68; ---; NWChem long v0.18 (t1) x86_64-pc-linux-gnu
1719741; 1101036; 485; 13 Feb 2020, 18:49:07 UTC; 13 Feb 2020, 18:51:07 UTC; Validate error; 5.30; 1.67; ---; NWChem long v0.18 (t1) x86_64-pc-linux-gnu
1718999; 1101014; 485; 13 Feb 2020, 10:41:25 UTC; 13 Feb 2020, 10:42:33 UTC; Validate error; 4.26; 0.00; ---; NWChem long v0.18 (t2) x86_64-pc-linux-gnu
1718890; 1101013; 485; 13 Feb 2020, 9:12:08 UTC; 13 Feb 2020, 9:15:40 UTC; Validate error; 3.32; 0.00; ---; NWChem long v0.18 (t4) x86_64-pc-linux-gnu
1716552; 1101042; 485; 12 Feb 2020, 21:55:03 UTC; 13 Feb 2020, 0:10:34 UTC; Validate error; 4.33; 0.00; ---; NWChem long v0.18 (t2) x86_64-pc-linux-gnu |
Joined: 3 Oct 19 Posts: 43 Credit: 40,548,179 RAC: 0 |
"All errors so far they don't seem to run very long ..." You want to get yourself one of these... https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1101056 I assume it is a validation problem again, but I've let it run as it is for testing, and I just got given a shiny new badge for testing :) |
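
For anyone wondering how to tell good units from bad ones, the rule of thumb from this thread (0.15 and 0.16 can idle instead of computing, while 0.17 works) could be sketched roughly as below. This is only an illustration: the `Task` record, the `looks_stalled` helper, and both thresholds are hypothetical, and the numbers would have to be copied by hand from whatever your BOINC client shows. Nothing here calls a real BOINC API.

```python
from dataclasses import dataclass

# Versions reported in this thread as idling instead of computing.
# Assumption: taken from the posts above, not from any official list.
KNOWN_BAD_VERSIONS = {"0.15", "0.16"}


@dataclass
class Task:
    """Hypothetical record of one running task (not a BOINC data structure)."""
    name: str
    app_version: str        # e.g. "0.17"
    elapsed_seconds: float  # wall-clock time since the task started
    cpu_seconds: float      # CPU time actually consumed so far


def looks_stalled(task, min_elapsed=1800.0, min_cpu_ratio=0.05):
    """Return True if the task is a candidate for aborting.

    The thresholds are arbitrary illustrative defaults, not project advice.
    """
    if task.app_version in KNOWN_BAD_VERSIONS:
        return True
    if task.elapsed_seconds < min_elapsed:
        return False  # too early to judge from CPU usage alone
    return task.cpu_seconds / task.elapsed_seconds < min_cpu_ratio


# Made-up sample data, purely for demonstration.
tasks = [
    Task("a", "0.16", 4 * 3600, 12.0),
    Task("b", "0.17", 4 * 3600, 4 * 3600 * 0.95),
    Task("c", "0.17", 600, 1.0),
    Task("d", "0.17", 7200, 30.0),
]
suspects = [t.name for t in tasks if looks_stalled(t)]
print(suspects)
```

With the made-up sample data, tasks `a` (a known-bad version) and `d` (two hours elapsed, 30 seconds of CPU) are flagged, while a young task like `c` is left alone because it is too early to judge.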
©2024 Benoit DA MOTA - LERIA, University of Angers, France