New wus

Message boards : Number crunching : New wus

[VENETO] boboviz

Joined: 13 Sep 19
Posts: 69
Credit: 399,347
RAC: 0
Message 316 - Posted: 3 Dec 2019, 20:06:58 UTC

The new WUs (od9_athome) seem to be shorter than the previous ones (dsgdb9nsd_nwchem).
But my first WU of this batch... validation error!!
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 317 - Posted: 3 Dec 2019, 21:35:06 UTC - in response to Message 316.  

This new series should be fast (low level of theory) and, as usual, many WUs will fail. Especially with this series (and the next), because from now on the AI will try to find and push the frontier of known chemistry.
Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 318 - Posted: 3 Dec 2019, 22:48:59 UTC - in response to Message 317.  

The "od9" that fail (with a Validate error) usually do so quickly, in about a half-minute. It is really no problem.

I am reminded of CPDN, where the only way they know their initial conditions are realistic is to try them. That makes the science interesting.
[VENETO] boboviz

Joined: 13 Sep 19
Posts: 69
Credit: 399,347
RAC: 0
Message 321 - Posted: 4 Dec 2019, 17:09:08 UTC - in response to Message 318.  

The "od9" that fail (with a Validate error) usually do so quickly, in about a half-minute. It is really no problem.

Only 1 error (the first WU).
Since then, 9 WUs have crunched without problems.
[VENETO] boboviz

Joined: 13 Sep 19
Posts: 69
Credit: 399,347
RAC: 0
Message 322 - Posted: 4 Dec 2019, 17:09:54 UTC - in response to Message 317.  

because from now the AI will try to find and push the frontier of known chemistry.

Sounds exciting
swiftmallard
Joined: 13 Oct 19
Posts: 87
Credit: 6,026,455
RAC: 0
Message 324 - Posted: 5 Dec 2019, 3:06:58 UTC

The time estimates on these new WUs are far more accurate than before.
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 325 - Posted: 5 Dec 2019, 6:57:26 UTC - in response to Message 324.  

However, as last time, I put in an almost random value (as a number of operations).
[VENETO] boboviz

Joined: 13 Sep 19
Posts: 69
Credit: 399,347
RAC: 0
Message 326 - Posted: 5 Dec 2019, 8:06:48 UTC - in response to Message 325.  

however, as last time I put a almost random value (in number of operations).

Yep.
And this one is at 94% after 12 hours.
Not shorter than before :-P
Seems to be "variable".
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 327 - Posted: 5 Dec 2019, 8:46:06 UTC - in response to Message 326.  

Yes, high variability is the problem. Computation time and success depend on the initial conditions... That's why I use a random value based on the average runtime, and of course that's not perfect.
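For context: BOINC sizes each work unit as an estimated number of floating-point operations (`rsc_fpops_est`), and the client's runtime estimate is roughly that count divided by the host's speed. The sketch below illustrates the idea described above, turning an average runtime into an operation count with a random factor. It is a hypothetical illustration, not the project's server code: the function names, the reference speed `ref_flops`, and the ±50% `jitter` are all assumptions.

```python
import random
from typing import Optional


def randomized_fpops_est(avg_runtime_s: float, ref_flops: float,
                         jitter: float = 0.5,
                         rng: Optional[random.Random] = None) -> float:
    """Hypothetical: convert an average runtime into an operation count,
    scaled by a random factor drawn from [1 - jitter, 1 + jitter]."""
    if not 0.0 <= jitter < 1.0:
        raise ValueError("jitter must be in [0, 1)")
    rng = rng if rng is not None else random.Random()
    factor = rng.uniform(1.0 - jitter, 1.0 + jitter)
    # operations = seconds * (operations per second on the assumed reference host)
    return avg_runtime_s * ref_flops * factor


def estimated_runtime_s(fpops_est: float, host_flops: float) -> float:
    """Rough client-side view: estimated operations divided by host speed."""
    return fpops_est / host_flops


rng = random.Random(42)
est = randomized_fpops_est(3600.0, 1e9, rng=rng)  # 1 h average, 1 GFLOPS reference
print(estimated_runtime_s(est, 1e9))  # somewhere between 1800 and 5400 seconds
```

Because the real runtime depends on each WU's initial conditions, any single number chosen this way will be off for many tasks, which matches what volunteers report in this thread.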
[VENETO] boboviz

Joined: 13 Sep 19
Posts: 69
Credit: 399,347
RAC: 0
Message 328 - Posted: 6 Dec 2019, 10:27:35 UTC - in response to Message 326.  

And this one is at 94% after 12hs.
Seems to be "variable"


Now 99.957% after 33h.
Seems to slow down at the end.
swiftmallard
Joined: 13 Oct 19
Posts: 87
Credit: 6,026,455
RAC: 0
Message 330 - Posted: 6 Dec 2019, 14:10:53 UTC - in response to Message 328.  


Now 99.957% after 33h.
Seems to slow down at the end.

I've aborted two like this. Despite ~53K seconds of running time, one had only 60 seconds of CPU time and the other only 19 seconds.
Both showed as 99.999% complete.
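The symptom described here (many hours of elapsed time but only seconds of CPU time) can be checked mechanically by comparing CPU time with elapsed time. A minimal sketch; the function names, the 10% efficiency threshold, and the one-hour grace period are hypothetical values chosen for illustration, not project recommendations:

```python
def cpu_efficiency(cpu_s: float, elapsed_s: float) -> float:
    """Fraction of wall-clock (elapsed) time the task actually spent on a CPU."""
    if elapsed_s <= 0:
        return 0.0
    return cpu_s / elapsed_s


def looks_stalled(cpu_s: float, elapsed_s: float,
                  min_efficiency: float = 0.10,
                  grace_s: float = 3600.0) -> bool:
    """Flag a single-CPU task whose CPU time lags far behind its elapsed time.

    min_efficiency and grace_s are hypothetical thresholds, not project advice.
    """
    if elapsed_s < grace_s:
        return False  # too early to judge
    return cpu_efficiency(cpu_s, elapsed_s) < min_efficiency


# Figures from the post above: ~53K s elapsed, 60 s of CPU time.
print(round(cpu_efficiency(60, 53_000), 4))  # 0.0011
print(looks_stalled(60, 53_000))             # True
```

The case reported a few posts later (18 seconds of CPU time after 7 hours, i.e. 25,200 seconds) gives an efficiency of about 0.0007 and would be flagged the same way.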
[VENETO] boboviz

Joined: 13 Sep 19
Posts: 69
Credit: 399,347
RAC: 0
Message 336 - Posted: 9 Dec 2019, 16:27:45 UTC - in response to Message 330.  

I've aborted two like this. Despite ~53K seconds of running time, there was only 60 seconds of CPU time on one, 19 seconds on the other.
Both showed as 99.999% completed.


I've got a lot of these.
They start very quickly, then slow down, and after 7 hours of running, CPU usage is at 0% and the CPU time (in the WU properties) is 18 seconds.
Mmmm, does the app code need debugging?
Coleslaw
Joined: 11 Oct 19
Posts: 5
Credit: 2,896,554
RAC: 0
Message 338 - Posted: 10 Dec 2019, 3:49:45 UTC

I have a work unit that has run for 23 days of elapsed time. Looking at its properties, it has just over 1 day of CPU time. However, it has kept that CPU thread locked away from other BOINC work this entire time. The progress kept advancing, so I kept watching it to see what would happen. Quite frankly, this is one of the reasons I hate VirtualBox projects.
swiftmallard
Joined: 13 Oct 19
Posts: 87
Credit: 6,026,455
RAC: 0
Message 339 - Posted: 10 Dec 2019, 12:34:00 UTC - in response to Message 338.  
Last modified: 10 Dec 2019, 12:40:48 UTC

I have a work unit that has ran elapsed time of 23 days. I got to looking at the properties and it is just over 1 day of CPU time. However, it has locked down that CPU thread this entire time from other BOINC work. The progress kept going so I wanted to watch it for observation. Quite frankly one of the reasons I hate virtualbox projects.

I would have aborted that one 22 days ago; I've never had one complete successfully after running longer than 21 hours.
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 340 - Posted: 10 Dec 2019, 15:16:05 UTC - in response to Message 339.  

The record on the server for a valid WU is more than 10 days. But aborting a WU after 24 hours is a good rule of thumb in general!
[VENETO] boboviz

Send message
Joined: 13 Sep 19
Posts: 69
Credit: 399,347
RAC: 0
Message 341 - Posted: 10 Dec 2019, 18:45:42 UTC - in response to Message 338.  

The progress kept going so I wanted to watch it for observation. Quite frankly one of the reasons I hate virtualbox projects.

For now I'm crunching on my 4-core laptop (which I keep a constant eye on), and I'm tempted to try this project on my 24-core Xeon.
But I don't watch that machine constantly, so I'm afraid of wasting time....
swiftmallard
Joined: 13 Oct 19
Posts: 87
Credit: 6,026,455
RAC: 0
Message 368 - Posted: 20 Dec 2019, 15:49:12 UTC - in response to Message 340.  

the record on the server for valid WU is more than 10 days. But aborting WU after 24h is a good rule of thumb in general !

With the od9 units, I abort anything running longer than 12 hours.
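The two rules of thumb in this exchange (abort after 24 hours in general, after 12 hours for od9) can be expressed as a per-batch time limit. A sketch only: the function names are made up, and matching on an `od9` prefix in the task name is an assumption about how tasks are named, not something the project documents.

```python
DEFAULT_LIMIT_H = 24.0        # general rule of thumb from the moderator
LIMITS_H = {"od9": 12.0}      # stricter limit one volunteer uses for od9


def time_limit_h(task_name: str) -> float:
    """Elapsed-time limit in hours, chosen by (assumed) task-name prefix."""
    for prefix, limit in LIMITS_H.items():
        if task_name.startswith(prefix):
            return limit
    return DEFAULT_LIMIT_H


def should_abort(task_name: str, elapsed_s: float) -> bool:
    """True once the task has run longer than its limit."""
    return elapsed_s > time_limit_h(task_name) * 3600.0


# Hypothetical task names, for illustration only.
print(should_abort("od9_athome_example", 13 * 3600))        # True
print(should_abort("dsgdb9nsd_nwchem_example", 13 * 3600))  # False
```

Any fixed cutoff is a trade-off: since the longest valid WU on the server ran for more than 10 days, a 12- or 24-hour limit will occasionally discard a result that would have validated, in exchange for freeing the CPU sooner.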
Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 369 - Posted: 20 Dec 2019, 22:54:05 UTC

The current batch of "od9" is running quite well for me on Linux (Ubuntu 18.04.3) on a Ryzen 2600.

When I set "Max # CPUs 1", I am able to run 11 work units at once without problems (reserving one core for a GPU).
The work units are averaging around an hour. https://quchempedia.univ-angers.fr/athome/results.php?hostid=822
It used to be that I had to limit the old work units to two or four at a time, depending on the machine.

And the time estimates are much better too, so I download a reasonable amount.

I don't know whether the VirtualBox versions behave similarly.
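The arithmetic behind running 11 work units at once: the Ryzen 2600 has 6 cores and 12 threads, one is reserved for the GPU, and "Max # CPUs 1" keeps each task on a single CPU. A small sketch (the function names are made up for illustration), which also gives a rough upper bound on daily throughput if the ~1 hour average holds:

```python
def concurrent_tasks(hw_threads: int, reserved: int, cpus_per_task: int = 1) -> int:
    """How many tasks fit on the threads left after reserving some (e.g. for a GPU)."""
    return max(hw_threads - reserved, 0) // cpus_per_task


def tasks_per_day(concurrent: int, avg_hours: float) -> float:
    """Upper bound on completed tasks per day, ignoring failures and idle time."""
    return concurrent * 24.0 / avg_hours


n = concurrent_tasks(12, 1)       # Ryzen 2600: 12 threads, 1 reserved for the GPU
print(n)                          # 11
print(tasks_per_day(n, 1.0))      # 264.0
```

Real throughput will be lower, since some od9 tasks fail early or run far longer than the average.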


©2024 Benoit DA MOTA - LERIA, University of Angers, France