Some hosts are delaying (long) batch completion
Author | Message |
---|---|
Joined: 7 Nov 19 Posts: 31 Credit: 4,245,903 RAC: 0 |
|
Joined: 6 Nov 19 Posts: 8 Credit: 156,845 RAC: 0 |
"Some hosts download work and do not crunch anything." On what fact do you base your statement? That they have not yet returned anything does not mean anything. These results have very long return deadlines. Until they time out, these hosts can still do all the work they have downloaded. |
Joined: 7 Nov 19 Posts: 31 Credit: 4,245,903 RAC: 0 |
Just thinking... It's uncommon behaviour that causes a high number of pending WUs, and it wouldn't be a problem if deadlines weren't so long. I would like to be sure that my completed work will get validated one day. I'm afraid that resends could again go to hosts that aren't crunching. Six months or more from now, we don't know whether this project will still be online. I guess it's someone bunkering for challenges like Formula BOINC sprints; otherwise it would be a waste of time to wait for resends when any active user could crunch them as soon as possible. |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
There are always extreme cases; don't generalize from them. You're whining about points, and on my end, it's about the scientific results I'm waiting for... At the moment, the project is running and we don't plan to shut it down. The deadlines are too long? In my experience, if they are too short, there are a lot of failures. After a certain number of failures, the workunit is abandoned and I don't get my result. |
Joined: 7 Nov 19 Posts: 31 Credit: 4,245,903 RAC: 0 |
"There are always extreme cases; don't generalize from them. You're whining about points, and on my end, it's about the scientific results I'm waiting for... At the moment, the project is running and we don't plan to shut it down." Your point is logically right. I'm not whining, I'm just raising a scenario that, as a volunteer, I would hate so much. If the project runs long enough, I have nothing more to say about it. "The deadlines are too long? In my experience, if they are too short, there are a lot of failures." Good. "After a certain number of failures, the workunit is abandoned and I don't get my result." Can't you run it locally on a trusted machine to get your results? Anyway, thanks for your quick response. ;) |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
You're welcome. We don't have enough computing power locally. We were hoping to be able to divide the calculation time with BOINC, but it's not that simple and it's very time-consuming. |
Joined: 7 Nov 19 Posts: 31 Credit: 4,245,903 RAC: 0 |
|
Joined: 21 Jun 20 Posts: 24 Credit: 68,559,000 RAC: 0 |
@damotbe, of the 469 nwchem-long tasks which are currently in progress, 359 are located on two compute servers of mine. I worked on nwchem-long until recently, but have this work suspended until January 1. Then these servers will be fully available again and will complete this work (plus any resends which they might receive in the process). These last remaining nwchem-long workunits have a disproportionately large number of inconclusive results, together with the occasional aborted or error results. That is, the chances that these last WUs will end up with two valid results are rather slim. Though if two hosts with the very same hardware/software configuration turn in results, would this improve the chance of a valid outcome somewhat? Besides the slim chances of validation, run times of some of these tasks now easily exceed a week. Nevertheless, the mentioned hosts are allocated for this work, restarting in January, for as long as it takes to reach whatever conclusion these workunits come to. (It's very stable hardware too, e.g. with ECC RAM, and therefore suited to long-running work.) PS, for the information of users who have the other 110 currently remaining nwchem-long tasks queued: The validation rate which I have seen with nwchem-long went down very sharply during the first half of December, even though I was typically in a position to contribute both of the results needed for validation. If points per day are important to you, then these last tasks are no longer viable from that perspective, because of the high ratio of inconclusive and error results, and because of the sometimes dramatically long runtimes. |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
It is highly probable that these molecules are inconclusive. I understand that it's very expensive and not worth the cost. This is really the major disadvantage of chemical space exploration. Whatever you choose to do, thank you for your help. |
Joined: 21 Jun 20 Posts: 24 Credit: 68,559,000 RAC: 0 |
All the nwchem-long work which I still had left from December, plus a dozen resends which I received in January, is already finished by now, except for 1 last task which is still running. This went a lot quicker than I thought, mostly because I received far fewer resends than I anticipated. (I expected inconclusive/invalid results from one computer of mine to turn into resends to another computer of mine, as still happened in December. But apparently the high ratio of inconclusive/invalid nwchem-long results from my hosts in January caused the scheduler to assign these resends to other hosts.) So in short, of the 216 nwchem-long tasks which are in progress at this time, I have only 1 left now, and the rest became somebody else's problem in the meantime. ;-) |
©2024 Benoit DA MOTA - LERIA, University of Angers, France