Application 0.15 (t1) (beta test) Result Not Completing Successfully

rbpeake

Joined: 4 Oct 19
Posts: 8
Credit: 3,108,300
RAC: 0
Message 506 - Posted: 6 Feb 2020, 16:25:38 UTC

None of my 0.15 (t1) (beta test) results are doing any calculations, yet they keep running and never stop. It appears I will need to abort them.
Aurum
Joined: 14 Dec 19
Posts: 68
Credit: 45,744,261
RAC: 0
Message 507 - Posted: 6 Feb 2020, 17:43:39 UTC
Last modified: 6 Feb 2020, 17:43:58 UTC

I'm seeing a lot of 0.15s that just sit there near completion with little to no CPU usage. Do they need to meditate before ascending???
Trotador

Joined: 3 Jan 20
Posts: 5
Credit: 31,342,930
RAC: 0
Message 509 - Posted: 6 Feb 2020, 18:13:42 UTC

+1
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 510 - Posted: 6 Feb 2020, 18:37:52 UTC

I was just testing the latest BOINC wrapper version, because 0.14 uses a very old one (the one available on the BOINC website). It seems that the version on GitHub is buggy... I have downgraded to 0.14; you can cancel these jobs.
rbpeake

Joined: 4 Oct 19
Posts: 8
Credit: 3,108,300
RAC: 0
Message 511 - Posted: 6 Feb 2020, 20:42:34 UTC

Can't seem to get any work units.
Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 512 - Posted: 6 Feb 2020, 23:02:57 UTC - in response to Message 507.  

> Do they need to meditate before ascending???

I will be running them on a Zen core. It might work.
Aurum
Joined: 14 Dec 19
Posts: 68
Credit: 45,744,261
RAC: 0
Message 518 - Posted: 7 Feb 2020, 16:49:43 UTC

Are 0.16s and 0.17s behaving like 0.15s and stalling at the last moment???
Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 519 - Posted: 7 Feb 2020, 16:50:54 UTC

The 0.16s don't work either. I have over 20 of them running on two Ubuntu 18.04 machines, and they are up to about four hours of run time with 0% CPU usage.
I am aborting them.
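
For anyone who wants to spot such tasks without watching a process monitor, the symptom described here (hours of elapsed time, almost no CPU time) can be turned into a rough check. This is only a sketch: the `Task` fields and both thresholds are assumptions made for illustration, and nothing in it is a BOINC API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical snapshot of one running task (field names are illustrative)."""
    name: str
    elapsed_s: float  # wall-clock seconds since the task started
    cpu_s: float      # CPU seconds consumed so far


def cpu_ratio(task: Task) -> float:
    """CPU time divided by elapsed time; close to 1.0 for a busy single-threaded task."""
    if task.elapsed_s <= 0:
        return 0.0
    return task.cpu_s / task.elapsed_s


def find_stalled(tasks, min_elapsed_s=1800.0, max_ratio=0.05):
    """Return tasks that have run for a while but used almost no CPU.

    Both thresholds are arbitrary assumptions, not project recommendations.
    """
    return [
        t for t in tasks
        if t.elapsed_s >= min_elapsed_s and cpu_ratio(t) <= max_ratio
    ]


if __name__ == "__main__":
    sample = [
        Task("wu_a", elapsed_s=4 * 3600, cpu_s=12.0),     # hours of wall time, seconds of CPU
        Task("wu_b", elapsed_s=4 * 3600, cpu_s=13900.0),  # healthy
        Task("wu_c", elapsed_s=60, cpu_s=0.0),            # too young to judge
    ]
    for t in find_stalled(sample):
        print(f"{t.name}: likely stalled (CPU ratio {cpu_ratio(t):.3f})")
```

The minimum elapsed time keeps freshly started tasks from being flagged. Multi-threaded tasks can have a ratio above 1.0, so the low threshold still applies to them.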
Aurum
Joined: 14 Dec 19
Posts: 68
Credit: 45,744,261
RAC: 0
Message 520 - Posted: 7 Feb 2020, 17:41:47 UTC

They're really piling up. Time for a server-side abort signal.
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 521 - Posted: 7 Feb 2020, 18:44:07 UTC - in response to Message 520.  

Finally, 0.17 is working well!

I don't know how to send an abort signal from the server side conditioned on the app version. I will look at the documentation.
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 522 - Posted: 7 Feb 2020, 19:30:39 UTC - in response to Message 521.  

> I will look at the documentation.


There is nothing provided to abort only the computations with a given app version ID...

All these problems arose because I tried to use the official BOINC wrapper from GitHub...
0.17 is a rollback to the old wrapper, plus a sentry process (that I just coded) to work around the buggy signal in this version.
From the beginning, our application has worked, and we have had a lot of problems with the official BOINC tools.
crashtech

Joined: 9 Dec 19
Posts: 11
Credit: 19,162,966
RAC: 0
Message 523 - Posted: 7 Feb 2020, 19:49:05 UTC

Hi, how do I tell good units from bad ones? On a few machines that were experiencing low CPU utilization or unusually long run times, I aborted them all, but I'd rather not make a habit of that.
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 524 - Posted: 7 Feb 2020, 20:11:37 UTC - in response to Message 523.  

0.15 and 0.16 are buggy on Linux.
Windows and Mac depend on the official vbox_wrapper, so they could idle instead of computing...
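
As a rough illustration of that rule, a task can be classified from its application label and platform string. The sketch below assumes labels shaped like `NWChem long v0.18 (t1)` and platform names such as `x86_64-pc-linux-gnu`; the helper names are hypothetical, and the set of bad versions simply restates what is reported in this thread.

```python
import re

# Versions reported in this thread as broken on Linux.
BAD_LINUX_VERSIONS = {"0.15", "0.16"}

# Assumption: the label contains the version as "<major>.<minor>", optionally prefixed by "v".
_VERSION_RE = re.compile(r"(\d+\.\d+)")


def app_version(label: str):
    """Extract the first "<major>.<minor>" from an application label, or None."""
    match = _VERSION_RE.search(label)
    return match.group(1) if match else None


def is_known_bad(label: str, platform: str) -> bool:
    """True when the task runs on Linux with one of the versions reported as broken."""
    return "linux" in platform.lower() and app_version(label) in BAD_LINUX_VERSIONS


if __name__ == "__main__":
    tasks = [
        ("NWChem long v0.15 (t1)", "x86_64-pc-linux-gnu"),
        ("NWChem long v0.17 (t1)", "x86_64-pc-linux-gnu"),
    ]
    for label, platform in tasks:
        verdict = "abort" if is_known_bad(label, platform) else "keep"
        print(f"{label} on {platform}: {verdict}")
```

Tasks on Windows and Mac are not covered by this check, since the post only says they could idle; for those, watching CPU usage remains the only indicator mentioned here.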
Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 525 - Posted: 7 Feb 2020, 20:30:03 UTC - in response to Message 524.  

Yes, 0.17 is working great. We are in business again.
Aurum
Joined: 14 Dec 19
Posts: 68
Credit: 45,744,261
RAC: 0
Message 528 - Posted: 8 Feb 2020, 10:41:57 UTC - in response to Message 524.  

> 0.15 and 0.16 are buggy on Linux.
> Windows and Mac depend on the official vbox_wrapper, so they could idle instead of computing...

Then a Linux host should not even be sent a windoze WU. Right?
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 529 - Posted: 8 Feb 2020, 10:47:44 UTC - in response to Message 528.  

>> 0.15 and 0.16 are buggy on Linux.
>> Windows and Mac depend on the official vbox_wrapper, so they could idle instead of computing...
>
> Then a Linux host should not even be sent a windoze WU. Right?


Normally, no.
Serg

Joined: 12 Jan 20
Posts: 12
Credit: 220,914
RAC: 0
Message 530 - Posted: 8 Feb 2020, 14:49:37 UTC - in response to Message 529.  

As I see it, the project sends the same WU to both Linux and Windows hosts in many cases: Example
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 531 - Posted: 8 Feb 2020, 16:39:27 UTC - in response to Message 530.  

To be clear, one WU is sent to two volunteers, regardless of their operating system. It can be heterogeneous like the example you're showing.

On the other hand, a Linux host will only receive calculations with the application for Linux.
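
The distinction can be sketched in a few lines. This is not the BOINC scheduler, only a toy model under stated assumptions: the build names in `APP_BUILDS` are hypothetical, and hosts are simply taken in order until two copies of the work unit are placed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    platform: str


# Hypothetical mapping from platform to the application build for that platform.
APP_BUILDS = {
    "x86_64-pc-linux-gnu": "nwchem_linux",
    "windows_x86_64": "nwchem_vbox_windows",
    "x86_64-apple-darwin": "nwchem_vbox_mac",
}


def assign(workunit_id, hosts, replicas=2):
    """Place `replicas` copies of one work unit, regardless of operating system.

    Each chosen host receives the build for its own platform, never another one.
    """
    chosen = []
    for host in hosts:
        build = APP_BUILDS.get(host.platform)
        if build is None:
            continue  # no application build exists for this platform
        chosen.append((workunit_id, host.name, build))
        if len(chosen) == replicas:
            break
    return chosen


if __name__ == "__main__":
    hosts = [
        Host("alice", "x86_64-pc-linux-gnu"),
        Host("bob", "windows_x86_64"),
        Host("carol", "x86_64-pc-linux-gnu"),
    ]
    for wu, name, build in assign(42, hosts):
        print(f"WU {wu} -> {name} runs {build}")
```

In this model the same work unit ends up on one Linux host and one Windows host, yet the Linux host only ever runs the Linux build.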
STE\/E

Joined: 7 Oct 19
Posts: 8
Credit: 2,016,759
RAC: 0
Message 562 - Posted: 14 Feb 2020, 10:48:24 UTC

All errors so far; they don't seem to run very long ...

Task | Work unit | Computer | Sent | Time reported | Status | Run time (s) | CPU time (s) | Credit | Application
1719898 | 1101058 | 485 | 13 Feb 2020, 22:34:27 UTC | 13 Feb 2020, 22:35:34 UTC | Validate error | 6.10 | 1.68 | --- | NWChem long v0.18 (t1) x86_64-pc-linux-gnu
1719741 | 1101036 | 485 | 13 Feb 2020, 18:49:07 UTC | 13 Feb 2020, 18:51:07 UTC | Validate error | 5.30 | 1.67 | --- | NWChem long v0.18 (t1) x86_64-pc-linux-gnu
1718999 | 1101014 | 485 | 13 Feb 2020, 10:41:25 UTC | 13 Feb 2020, 10:42:33 UTC | Validate error | 4.26 | 0.00 | --- | NWChem long v0.18 (t2) x86_64-pc-linux-gnu
1718890 | 1101013 | 485 | 13 Feb 2020, 9:12:08 UTC | 13 Feb 2020, 9:15:40 UTC | Validate error | 3.32 | 0.00 | --- | NWChem long v0.18 (t4) x86_64-pc-linux-gnu
1716552 | 1101042 | 485 | 12 Feb 2020, 21:55:03 UTC | 13 Feb 2020, 0:10:34 UTC | Validate error | 4.33 | 0.00 | --- | NWChem long v0.18 (t2) x86_64-pc-linux-gnu
PDW

Joined: 3 Oct 19
Posts: 43
Credit: 40,548,179
RAC: 0
Message 563 - Posted: 14 Feb 2020, 10:58:25 UTC - in response to Message 562.  

> All errors so far they don't seem to run very long ...

You want to get yourself one of these...

https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1101056

I assume it is a validation problem again but I've let it run as it is for testing and I just got given a new shiny badge for testing :)

©2024 Benoit DA MOTA - LERIA, University of Angers, France