Some hosts are delaying (long) batch completion

Author	Message
Luigi R. Joined: 7 Nov 19 Posts: 31 Credit: 4,245,903 RAC: 0	Message 989 - Posted: 2 Aug 2020, 7:33:48 UTC
Some hosts download work and do not crunch anything. E.g. hosts 143, 540, 2447, 2537. If we exclude the case that they are working offline, there is something wrong here.

Henk Haneveld Joined: 6 Nov 19 Posts: 8 Credit: 156,845 RAC: 0	Message 990 - Posted: 2 Aug 2020, 13:33:59 UTC - in response to Message 989.
> Some hosts download work and do not crunch anything. E.g. hosts 143, 540, 2447, 2537. If we exclude the case that they are working offline, there is something wrong here.
On what fact do you base your statement? That they have not yet returned anything does not mean anything. These results have a very long return deadline. Until they time out, these hosts can still do all the work they have downloaded.

Luigi R. Joined: 7 Nov 19 Posts: 31 Credit: 4,245,903 RAC: 0	Message 991 - Posted: 2 Aug 2020, 15:27:39 UTC
Just thinking... It's an uncommon behaviour that causes a high number of pending WUs, and it wouldn't be a problem if deadlines weren't so long. I would like to be sure that my completed work will get validated one day. I'm afraid that resends could again go to hosts that don't crunch. After 6 months or more, we don't know if this project will still be online.
I guess it's someone bunkering for challenges like Formula BOINC sprints; otherwise it would be a waste of time to wait for resends when active users could crunch them as soon as possible.

damotbe Volunteer moderator Project administrator Project developer Project tester Project scientist Help desk expert Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0	Message 993 - Posted: 3 Aug 2020, 14:58:10 UTC - in response to Message 991.
There are always extreme cases, don't make it a generalization. You're whining about points, and on my end, it's about the scientific results I'm waiting for... At the moment, the project is running and we don't plan to shut it down.
The deadlines are too long? In my experience, if they are too short, there are a lot of failures. After a certain number of failures, the workunit is abandoned and I don't get my result.

Luigi R. Joined: 7 Nov 19 Posts: 31 Credit: 4,245,903 RAC: 0	Message 994 - Posted: 3 Aug 2020, 15:16:17 UTC - in response to Message 993.
> There are always extreme cases, don't make it a generalization. You're whining about points, and on my end, it's about the scientific results I'm waiting for... At the moment, the project is running and we don't plan to shut it down.
Your point is logically right. I'm not whining, I'm just raising a scenario that, as a volunteer, I would hate so much. If the project runs long enough, I have nothing more to say about it.
> The deadlines are too long? In my experience, if they are too short, there are a lot of failures.
Good.
> After a certain number of failures, the workunit is abandoned and I don't get my result.
Can't you run it locally on a trusted machine to get your results?
Anyway, thanks for your quick response. ;)

damotbe Volunteer moderator Project administrator Project developer Project tester Project scientist Help desk expert Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0	Message 999 - Posted: 3 Aug 2020, 15:31:23 UTC - in response to Message 994.
You're welcome. We don't have enough power locally. We were hoping to be able to divide the calculation time with BOINC, but it's not that simple and it's very time-consuming.

Luigi R. Joined: 7 Nov 19 Posts: 31 Credit: 4,245,903 RAC: 0	Message 1120 - Posted: 2 Oct 2020, 8:23:56 UTC - in response to Message 989.
> E.g. hosts 143, 540, 2447, 2537
Host 2537 reported all (completed) tasks on 25 September; hosts 143, 540 and 2447 did not crunch anything before the deadline. Who was 75% right? :(
P.S. Those hosts, if offline, could still report tasks successfully before the server resends them to other volunteers.

xii5ku Joined: 21 Jun 20 Posts: 24 Credit: 68,559,000 RAC: 0	Message 1287 - Posted: 24 Dec 2020, 11:37:38 UTC Last modified: 24 Dec 2020, 11:42:28 UTC
@damotbe, of the 469 nwchem-long tasks which are currently in progress, 359 are located on two compute servers of mine. I worked on nwchem-long until recently, but have this work suspended until January 1. Then these servers will be fully available again and will complete this work (plus any resends which they might receive in the process).
These last remaining nwchem-long workunits have a disproportionately large number of inconclusive results, together with the occasional aborted or error results. That is, chances that these last WUs will end up with two valid results are rather slim. Though if two hosts with the very same hardware/software configuration turn in results, would this improve the chance of a valid outcome somewhat?
Besides the slim chances of validation, run times of some of these tasks easily exceed a week now. Nevertheless, the mentioned hosts are allocated for this work, restarting in January, for as long as it takes to reach whatever conclusion of these workunits. (It's very stable hardware too, e.g. with ECC RAM, therefore suited for long-running work.)
PS, as information for users who have the other 110 currently remaining nwchem-long tasks queued: The validation rate which I have seen with nwchem-long went down very sharply during the first half of December, even though I was typically in the position to contribute both of the results needed for validation. If points-per-day are important to you, then these last tasks are no longer viable from that perspective, because of the high ratio of inconclusive and error results, and because of the sometimes dramatically long runtimes.

damotbe Volunteer moderator Project administrator Project developer Project tester Project scientist Help desk expert Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0	Message 1305 - Posted: 4 Jan 2021, 8:49:40 UTC - in response to Message 1287.
It is highly probable that these molecules are inconclusive. I understand that it's very expensive and not worth the cost. This is really the major disadvantage of chemical space exploration. Whatever you choose to do, thank you for your help.

xii5ku Joined: 21 Jun 20 Posts: 24 Credit: 68,559,000 RAC: 0	Message 1307 - Posted: 7 Jan 2021, 17:56:59 UTC Last modified: 7 Jan 2021, 17:57:40 UTC
All the nwchem-long work which I still had left from December, plus a dozen resends which I received in January, is already finished by now, except for 1 last task which is still running.
This went a lot quicker than I thought, mostly because I received far fewer resends than I anticipated. (I expected inconclusive/invalid results of one computer of mine to turn into resends to another computer of mine, as still happened in December. But apparently the high ratio of inconclusive/invalid nwchem-long results from my hosts in January caused the scheduler to assign these resends to other hosts.)
So in short, of the 216 nwchem-longs which are in progress at this time, I have got only 1 left now, and the rest became somebody else's problem in the meantime. ;-)