New wus
Author | Message |
---|---|
Joined: 13 Sep 19 Posts: 69 Credit: 399,347 RAC: 0 |
The new WUs (od9_athome) seem to be shorter than the previous ones (dsgdb9nsd_nwchem). But my first WU of this batch ended with a validation error! |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
This new series should be fast (low level of theory) and, as usual, many WUs will fail. That is especially true of this series (and the next), because from now on the AI will try to find and push the frontier of known chemistry. |
Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
The "od9" that fail (with a Validate error) usually do so quickly, in about a half-minute. It is really no problem. I am reminded of CPDN, where the only way they know their initial conditions are realistic is to try them. That makes the science interesting. |
Joined: 13 Sep 19 Posts: 69 Credit: 399,347 RAC: 0 |
> The "od9" that fail (with a Validate error) usually do so quickly, in about a half-minute. It is really no problem.

Only 1 error (the first WU). Since then, 9 WUs crunched without problems. |
Joined: 13 Sep 19 Posts: 69 Credit: 399,347 RAC: 0 |
> because from now on the AI will try to find and push the frontier of known chemistry.

Sounds exciting. |
Joined: 13 Oct 19 Posts: 87 Credit: 6,026,455 RAC: 0 |
The time estimates on these new WUs are far more accurate than before. |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
However, as last time, I put in an almost random value (in number of operations). |
Joined: 13 Sep 19 Posts: 69 Credit: 399,347 RAC: 0 |
> However, as last time, I put in an almost random value (in number of operations).

Yep. And this one is at 94% after 12 h. Not shorter than before :-P Seems to be "variable". |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
Yes, high variability is the problem. Computation time and success depend on the initial conditions... That's why I try a random value based on the average runtime, and of course that's not perfect. |
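
To illustrate why one estimate per batch cannot fit every WU, here is a minimal sketch, not the project's actual method. It assumes the operation count is simply the average past runtime multiplied by an assumed reference host speed (in BOINC this would, as far as I know, correspond to the work unit's `rsc_fpops_est` field). The function names, the 1 GFLOPS reference speed, and the sample runtimes are all made up for illustration.

```python
from statistics import mean


def estimate_fpops(past_runtimes_s, reference_flops):
    """Turn an average runtime (seconds) into an operation count.

    reference_flops is an ASSUMED speed for a typical host, in
    floating-point operations per second.
    """
    return mean(past_runtimes_s) * reference_flops


def predicted_runtime_s(fpops_est, host_flops):
    """Runtime a host of the given speed would expect for that estimate."""
    return fpops_est / host_flops


if __name__ == "__main__":
    # Made-up runtimes: two short tasks and one very long one.
    runtimes_s = [1800, 3600, 43200]
    reference_flops = 1e9  # hypothetical 1 GFLOPS host
    est = estimate_fpops(runtimes_s, reference_flops)
    predicted = predicted_runtime_s(est, reference_flops)
    # Every task gets the same prediction, whatever its real runtime.
    for actual in runtimes_s:
        print(f"actual {actual:>6} s, predicted {predicted:.0f} s")
```

With runtimes this spread out, the single prediction is too long for the short tasks and too short for the long one, which matches the variability described above.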
Joined: 13 Sep 19 Posts: 69 Credit: 399,347 RAC: 0 |
> And this one is at 94% after 12 h.

Now 99.957% after 33 h. It seems to slow down at the end. |
Joined: 13 Oct 19 Posts: 87 Credit: 6,026,455 RAC: 0 |
I've aborted two like this. Despite ~53K seconds of running time, there were only 60 seconds of CPU time on one and 19 seconds on the other. Both showed as 99.999% completed. |
Joined: 13 Sep 19 Posts: 69 Credit: 399,347 RAC: 0 |
> I've aborted two like this. Despite ~53K seconds of running time, there were only 60 seconds of CPU time on one and 19 seconds on the other.

I've got a lot of these. They start very quickly, then slow down, and after 7 h of running the CPU is at 0% use and the CPU time (in the WU properties) is 18 seconds. Mmmm, does the app code need debugging? |
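
The symptom reported here (long elapsed time, almost no CPU time) can be checked with a simple ratio. The sketch below is only an illustration: the function name and both thresholds (one hour of minimum elapsed time, 5% minimum CPU ratio) are assumptions, not values from the project. The elapsed and CPU times would be read by hand from the task properties in the BOINC Manager.

```python
def looks_stalled(elapsed_s, cpu_s, min_elapsed_s=3600.0, min_cpu_ratio=0.05):
    """Return True when a task has run a long time but used almost no CPU.

    Both thresholds are ASSUMED defaults for illustration:
    - min_elapsed_s: do not judge tasks that have only just started
    - min_cpu_ratio: CPU time / elapsed time below this counts as stalled
    """
    if elapsed_s < min_elapsed_s:
        return False
    return cpu_s / elapsed_s < min_cpu_ratio


if __name__ == "__main__":
    # Figures quoted in this thread: (elapsed seconds, CPU seconds).
    reported = [(53_000, 60), (53_000, 19), (7 * 3600, 18)]
    for elapsed_s, cpu_s in reported:
        print(elapsed_s, cpu_s, looks_stalled(elapsed_s, cpu_s))
```

A ratio test avoids punishing tasks that are merely long: a WU that is still using its core fully never trips it, however many hours it runs.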
Joined: 11 Oct 19 Posts: 5 Credit: 2,896,554 RAC: 0 |
I have a work unit that has run for an elapsed time of 23 days. I looked at the properties and it has just over 1 day of CPU time. However, it has locked that CPU thread away from other BOINC work this entire time. The progress kept going, so I wanted to keep watching it. Quite frankly, this is one of the reasons I hate VirtualBox projects. |
Joined: 13 Oct 19 Posts: 87 Credit: 6,026,455 RAC: 0 |
> I have a work unit that has run for an elapsed time of 23 days. I looked at the properties and it has just over 1 day of CPU time. However, it has locked that CPU thread away from other BOINC work this entire time. The progress kept going, so I wanted to keep watching it. Quite frankly, this is one of the reasons I hate VirtualBox projects.

I would have aborted that one 22 days ago. I've never had one complete successfully after running longer than 21 hours. |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
The record on the server for a valid WU is more than 10 days. But aborting a WU after 24 h is a good rule of thumb in general! |
Joined: 13 Sep 19 Posts: 69 Credit: 399,347 RAC: 0 |
> The progress kept going, so I wanted to keep watching it. Quite frankly, this is one of the reasons I hate VirtualBox projects.

Right now I'm crunching on my 4-core mobile PC (which is constantly under my control), and I'm tempted to try this project on my 24-core Xeon. But I don't watch that machine constantly, so I'm afraid of wasting time... |
Joined: 13 Oct 19 Posts: 87 Credit: 6,026,455 RAC: 0 |
> The record on the server for a valid WU is more than 10 days. But aborting a WU after 24 h is a good rule of thumb in general!

With the od9 units, I abort anything running longer than 12 hours. |
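
The two rules of thumb in this thread (24 h in general, 12 h for od9) can be written down as a small lookup. This is a sketch only: the function and table names are hypothetical, and matching on a task-name prefix is an assumption based on the batch names mentioned earlier (od9_athome, dsgdb9nsd_nwchem). Since the server record for a valid WU is more than 10 days, a rule like this will occasionally discard work that would have validated.

```python
# Per-batch limits in hours; the values are the ones suggested in this thread.
ABORT_AFTER_HOURS = {"od9": 12.0}
DEFAULT_ABORT_AFTER_HOURS = 24.0


def should_abort(task_name, elapsed_hours):
    """Apply the per-batch limit if the name matches, else the general one.

    Matching on a name prefix is an ASSUMPTION made for illustration.
    """
    for prefix, limit_hours in ABORT_AFTER_HOURS.items():
        if task_name.startswith(prefix):
            return elapsed_hours > limit_hours
    return elapsed_hours > DEFAULT_ABORT_AFTER_HOURS


if __name__ == "__main__":
    for name, hours in [("od9_athome", 13), ("dsgdb9nsd_nwchem", 13)]:
        print(name, hours, should_abort(name, hours))
```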
Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
The current batch of "od9" is running quite well for me on Linux (Ubuntu 18.04.3) on a Ryzen 2600. When I set "Max # CPUs 1", I am able to run 11 work units at once without problems (reserving one core for a GPU). The work units are averaging around an hour. https://quchempedia.univ-angers.fr/athome/results.php?hostid=822 It used to be that I had to limit the old work units to two or four at a time, depending on the machine. And the time estimates are much better too, so I download a reasonable amount. I don't know whether the VirtualBox versions are similar or not. |
©2024 Benoit DA MOTA - LERIA, University of Angers, France