Message boards : Number crunching : Monster wu
Author | Message |
---|---|
Joined: 13 Sep 19 Posts: 69 Credit: 399,347 RAC: 0 |
I'm crunching this WU, and after 3 hours I'm at 0.6% and the remaining time is 20 days (but the deadline is 19 October, so I cannot finish it in time). What should I do? Kill it? |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
Runtime is not predictable (not deterministic); perhaps the scheduler's prediction is completely wrong. |
Joined: 3 Oct 19 Posts: 2 Credit: 70,877 RAC: 0 |
The event log in BOINC manager indicates that the project is sending trickle-up messages as the work unit progresses. However, I also noticed that I am not receiving any credit for work done. Is this a problem? |
Joined: 4 Oct 19 Posts: 15 Credit: 70,119 RAC: 0 |
"The event log in BOINC manager indicates that the project is sending trickle-up messages as the work unit progresses. However, I also noticed that I am not receiving any credit for work done. Is this a problem?" Trickles don't necessarily mean you get credit. CPDN does that, but it's the only project I know of that does. Most of PrimeGrid's apps trickle. We use that to let the server know that the task is progressing, and may extend the task deadline due to the trickles. But we never give out credit until the task is completed because it's not possible to determine if the result is correct until the end. Giving out credit in the middle is tricky at best, and close to impossible in many situations. I'd be surprised if any other projects try to do what CPDN did. It also serves as a disincentive to users to complete the tasks, so I suspect few projects would ever want to do this. Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG. |
Joined: 3 Oct 19 Posts: 43 Credit: 40,548,179 RAC: 0 |
They have discussed using trickles for credit here, but currently a task is validated when it completes and is awarded credit if found to be valid. DHEP used trickles for awarding credit, but that project has stopped for the moment due to lack of funding. |
Joined: 13 Sep 19 Posts: 69 Credit: 399,347 RAC: 0 |
"Runtime is not predictable (not deterministic); perhaps the scheduler's prediction is completely wrong." OK, but after 24 hours I'm at less than 5%. It seems that the scheduler's prediction is right... and I'll run out of time. |
Joined: 4 Oct 19 Posts: 1 Credit: 1,033,455 RAC: 0 |
"Runtime is not predictable (not deterministic); perhaps the scheduler's prediction is completely wrong." I've had a few on the native Linux app that were behaving the same way. Decided to just let them run and they finished way before the estimated finish and credited just fine. YMMV |
Joined: 4 Oct 19 Posts: 15 Credit: 70,119 RAC: 0 |
"Runtime is not predictable (not deterministic); perhaps the scheduler's prediction is completely wrong." Same here. I don't know if they're broken or not, but they behave very differently than the other tasks. They seem to not be making progress. I've been aborting them. |
Joined: 13 Sep 19 Posts: 69 Credit: 399,347 RAC: 0 |
"I've had a few on the native Linux app that were behaving the same way. Decided to just let them run and they finished way before the estimated finish and credited just fine." I'm crunching on Windows with VirtualBox, so I don't know whether to kill it or not. |
Joined: 4 Oct 19 Posts: 8 Credit: 3,108,300 RAC: 0 |
Same here, I decided to just let them run and they finished way before the estimated finish and credited just fine. It's the BOINC estimate of time to completion that is way off. (I use Windows.) |
Joined: 13 Sep 19 Posts: 69 Credit: 399,347 RAC: 0 |
"Same here, I decided to just let them run and they finished way before the estimated finish and credited just fine." It finished!!! After 34 hours and 680 points! Great |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
I don't think we implement trickles (but I'd like to reward the calculation time a little bit with a bigger bonus at the end). Perhaps the VM BOINC wrapper uses trickles? Since we use nwChem almost exclusively (not our code), we don't integrate progress-reporting points in the code, so computation time estimations are crazy sometimes. |
Joined: 3 Oct 19 Posts: 33 Credit: 197,169 RAC: 0 |
>>> computation time estimations are crazy sometimes They sure are! When I went to bed last night, the work unit I was watching had just over a minute left to run, but that was dropping VERY slowly. This morning, it had 18 seconds left to run, and now, perhaps six hours later, it shows 15 seconds remaining. Still, it is running, and there is loads of time before the deadline. I'm glad it is a 4GHz i7 though. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Joined: 29 Aug 19 Posts: 15 Credit: 159,816 RAC: 0 |
I've been crunching these in native Linux for 3+ weeks and this is certainly new behavior. The two current WUs report 245+ days until completion. All the prior WUs estimated a maximum of 3.5 days. I do not currently have <fraction_done_exact/> in the app_config.xml but will see if it repairs the issue. Doubtful, since the WU is reporting less than 1% complete after 7 hours. |
Joined: 3 Oct 19 Posts: 33 Credit: 197,169 RAC: 0 |
...and at bedtime tonight the work unit is still going, with 6 seconds left... it has just passed 4 days elapsed CPU time; now 99.998% done! |
Joined: 3 Oct 19 Posts: 33 Credit: 197,169 RAC: 0 |
... and this morning, it has just 2 seconds left to run... elapsed is 4.12.00.01. I've set no new tasks and have suspended the other work units you have sent. There is something wrong here. Looking at my task manager, I cannot see it using any CPU, but can see that I only have six BOINC tasks running. I have suspended the task. I want to KNOW if the damn thing is actually achieving anything. As soon as I suspended it, other BOINC tasks started running, and my CPU became 100% busy again. |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
From my point of view, all tasks validated by the server have done the work needed for the project, including those with huge computation times. The point is that some tasks with huge computation times also failed, and there is no reason for that. It seems to be a boinc or boincmgr issue. |
Joined: 3 Oct 19 Posts: 33 Credit: 197,169 RAC: 0 |
The work unit with 4.5 days of crunching aborted. I have deleted the others here and am leaving. >>> 32892 20227 52 4 Oct 2019, 19:34:16 UTC 9 Oct 2019, 13:45:10 UTC Error while computing 388,706.57 258.86 Look at the elapsed time and the CPU time values; as I have said before, there is something seriously wrong here. |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
Looking at your task: 2019-10-09 08:22:04 (26728): Status Report: Elapsed Time: '384170.065659'; 2019-10-09 08:22:04 (26728): Status Report: CPU Time: '256.312500'; and the log file begins with: 2019-10-07 13:10:12 (26728): Deleting stale snapshot.; 2019-10-07 13:10:12 (26728): Checkpoint completed. That is not the beginning of the computation. Something went wrong before that with vboxaddition.iso. Compared with other results, we think something is going wrong on your side, but I can't tell whether it's a recurring or a one-time error. |
Joined: 29 Aug 19 Posts: 15 Credit: 159,816 RAC: 0 |
adrianxw's issue with VBox aside, my two native nwchem tasks (1 t2 and 1 t1) both show 244+ days left after 50+ hours of run time. The previous tasks have always estimated somewhere over 3 days. One task went 5 days, but the machine was downclocked 50%, and these should complete within the 3 day period. I'll let them go 4 days. The estimate used to be fairly good. The problem with such drastic estimates is that all other work units from other projects will stop, as BOINC thinks the work cache is completely full. |
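
The figures quoted in this thread can be checked with a small Python sketch. This is not BOINC's actual estimator, whose internals are not described here; `linear_eta` and `cpu_efficiency` are hypothetical helper names introduced only for illustration. The first assumes progress continues at the average rate seen so far, which is exactly the assumption that seems to break down for these tasks. The second simply divides CPU time by elapsed time, using the two status-report values quoted above.

```python
from datetime import timedelta


def linear_eta(elapsed_hours: float, fraction_done: float) -> timedelta:
    """Remaining time, assuming progress continues at the average rate so far.

    Hypothetical helper: a plain linear extrapolation, not BOINC's own logic.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total_hours = elapsed_hours / fraction_done
    return timedelta(hours=total_hours - elapsed_hours)


def cpu_efficiency(cpu_seconds: float, elapsed_seconds: float) -> float:
    """Fraction of wall-clock time the task actually spent on the CPU."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return cpu_seconds / elapsed_seconds


if __name__ == "__main__":
    # First post in the thread: 3 hours elapsed at 0.6% done.
    eta = linear_eta(3, 0.006)
    print(f"linear ETA: {eta.days} days remaining")

    # Status report quoted in the thread (values in seconds).
    eff = cpu_efficiency(256.312500, 384170.065659)
    print(f"CPU efficiency: {eff:.4%}")
```

With 3 hours elapsed at 0.6%, the linear extrapolation gives a little over 20 days, in line with the estimate in the first post, even though that task finished after 34 hours. For the failed VirtualBox task, the CPU time is well under 0.1% of the elapsed time, which fits the observation that the task was barely using the CPU.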
©2024 Benoit DA MOTA - LERIA, University of Angers, France