Monster wu

Message boards : Number crunching : Monster wu

[VENETO] boboviz

Joined: 13 Sep 19
Posts: 69
Credit: 399,347
RAC: 0
Message 44 - Posted: 5 Oct 2019, 19:00:32 UTC

I'm crunching this WU, and after 3 hours I'm at 0.6% with 20 days remaining (but the deadline is 19 October, so I can't finish it in time).
What should I do? Kill it?
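The 20-day figure is what simple linear extrapolation gives: 3 hours at 0.6% implies roughly 500 hours in total. A minimal Python sketch of that arithmetic, assuming progress is linear in time (which need not hold for these tasks) and assuming the deadline falls at 00:00 UTC on 19 October, since the post gives only the date:

```python
from datetime import datetime, timedelta, timezone


def remaining_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Extrapolate linearly: assume the rest of the task runs at the rate seen so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total_hours = elapsed_hours / fraction_done
    return total_hours - elapsed_hours


def finishes_before(now: datetime, deadline: datetime,
                    elapsed_hours: float, fraction_done: float) -> bool:
    """True if the linearly projected finish time is not later than the deadline."""
    finish = now + timedelta(hours=remaining_hours(elapsed_hours, fraction_done))
    return finish <= deadline


# Figures from the post: 3 hours elapsed, 0.6% done.
posted = datetime(2019, 10, 5, 19, 0, 32, tzinfo=timezone.utc)
# Assumption: deadline taken as midnight UTC on 19 October 2019.
deadline = datetime(2019, 10, 19, tzinfo=timezone.utc)

left = remaining_hours(3.0, 0.006)
print(f"about {left:.0f} h (~{left / 24:.1f} days) left")
print("meets deadline:", finishes_before(posted, deadline, 3.0, 0.006))
```

Applied to the later figure in this thread (24 hours at under 5%), the same arithmetic gives at least 456 more hours, so extrapolation alone cannot tell a genuinely slow task from a progress value that simply isn't being updated.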
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 45 - Posted: 6 Oct 2019, 7:59:17 UTC - in response to Message 44.  

Runtime is not predictable (it is not deterministic), so perhaps the scheduler's prediction is totally wrong.
Rapture

Joined: 3 Oct 19
Posts: 2
Credit: 70,877
RAC: 0
Message 55 - Posted: 6 Oct 2019, 16:16:43 UTC - in response to Message 45.  

The event log in BOINC manager indicates that the project is sending trickle-up messages as the work unit progresses. However, I also noticed that I am not receiving any credit for work done. Is this a problem?
Michael Goetz

Joined: 4 Oct 19
Posts: 15
Credit: 70,119
RAC: 0
Message 56 - Posted: 6 Oct 2019, 16:50:37 UTC - in response to Message 55.  

>>> The event log in BOINC manager indicates that the project is sending trickle-up messages as the work unit progresses. However, I also noticed that I am not receiving any credit for work done. Is this a problem?


Trickles don't necessarily mean you get credit. CPDN does that, but it's the only project I know of that does.

Most of PrimeGrid's apps trickle. We use that to let the server know that the task is progressing, and may extend the task deadline due to the trickles. But we never give out credit until the task is completed because it's not possible to determine if the result is correct until the end.

Giving out credit in the middle is tricky at best, and close to impossible in many situations. I'd be surprised if any other projects try to do what CPDN did. It also serves as a disincentive to users to complete the tasks, so I suspect few projects would ever want to do this.
Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.

PDW

Joined: 3 Oct 19
Posts: 43
Credit: 40,548,179
RAC: 0
Message 57 - Posted: 6 Oct 2019, 17:57:02 UTC - in response to Message 55.  

They have discussed using trickles for credit here but currently when your task completes it will get validated and awarded credit if found to be valid.

DHEP used trickles for awarding credit but due to lack of funding that project has stopped at the moment.
[VENETO] boboviz

Joined: 13 Sep 19
Posts: 69
Credit: 399,347
RAC: 0
Message 58 - Posted: 7 Oct 2019, 7:27:57 UTC - in response to Message 45.  

>>> Runtime is not predictable (it is not deterministic), so perhaps the scheduler's prediction is totally wrong.


OK, but after 24 hours I'm at less than 5%.
It seems the scheduler's prediction is right... and I'll miss the deadline.
Skivelitis2

Joined: 4 Oct 19
Posts: 1
Credit: 1,033,455
RAC: 0
Message 59 - Posted: 7 Oct 2019, 12:03:27 UTC - in response to Message 58.  

>>> OK, but after 24 hours I'm at less than 5%.
>>> It seems the scheduler's prediction is right... and I'll miss the deadline.

I've had a few on the native Linux app that were behaving the same way. Decided to just let them run and they finished way before the estimated finish and credited just fine. YMMV
Michael Goetz

Joined: 4 Oct 19
Posts: 15
Credit: 70,119
RAC: 0
Message 62 - Posted: 7 Oct 2019, 14:47:36 UTC - in response to Message 59.  

>>> I've had a few on the native Linux app that were behaving the same way. Decided to just let them run and they finished way before the estimated finish and credited just fine. YMMV


Same here. I don't know if they're broken or not, but they behave very differently than the other tasks. They seem to not be making progress. I've been aborting them.
[VENETO] boboviz

Joined: 13 Sep 19
Posts: 69
Credit: 399,347
RAC: 0
Message 63 - Posted: 7 Oct 2019, 15:23:49 UTC - in response to Message 59.  

>>> I've had a few on the native Linux app that were behaving the same way. Decided to just let them run and they finished way before the estimated finish and credited just fine.

I'm crunching with Windows and VirtualBox.
So I don't know whether to kill it or not.
rbpeake

Joined: 4 Oct 19
Posts: 8
Credit: 3,108,300
RAC: 0
Message 64 - Posted: 7 Oct 2019, 18:13:07 UTC
Last modified: 7 Oct 2019, 18:14:14 UTC

Same here, I decided to just let them run and they finished way before the estimated finish and credited just fine. It's the BOINC estimate of time to completion that is way off.

(I use Windows.)
[VENETO] boboviz

Joined: 13 Sep 19
Posts: 69
Credit: 399,347
RAC: 0
Message 70 - Posted: 8 Oct 2019, 6:43:44 UTC - in response to Message 64.  

>>> Same here, I decided to just let them run and they finished way before the estimated finish and credited just fine.


It finished!!! After 34 hours, and 680 points!
Great!
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 74 - Posted: 8 Oct 2019, 8:01:51 UTC - in response to Message 70.  

I don't think we implement trickles (but I'd like to reward the calculation time a little with a bigger bonus at the end). Perhaps the BOINC VM wrapper uses trickles?

Since we use NWChem almost exclusively (not our code), we don't report progress points from within the code, so the computation time estimates are sometimes crazy.
adrianxw

Joined: 3 Oct 19
Posts: 33
Credit: 197,169
RAC: 0
Message 75 - Posted: 8 Oct 2019, 10:52:52 UTC

>>> computation time estimations are crazy sometimes

They sure are! When I went to bed last night, the work unit I was watching had just over a minute left to run, but that was dropping VERY slowly. This morning, it had 18 seconds left to run, and now, perhaps six hours later, it shows 15 seconds remaining. Still, it is running, and there is loads of time before the deadline. I'm glad it is a 4GHz i7 though.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
marmot

Joined: 29 Aug 19
Posts: 15
Credit: 159,816
RAC: 0
Message 77 - Posted: 8 Oct 2019, 19:36:21 UTC

I've been crunching these in native Linux for 3+ weeks and this is certainly new behavior.

The two current WUs report 245+ days until completion.
All prior WUs estimated 3.5 days at most.

I don't currently have <fraction_done_exact/> in app_config.xml, but I'll try it and see whether it fixes the issue.
I doubt it will, since the WU reports less than 1% complete after 7 hours.
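For context, `<fraction_done_exact/>` is a per-application option in the BOINC client's app_config.xml that tells the client to base the remaining-time estimate solely on the fraction done reported by the app. The Python sketch below is a deliberately simplified model, not the client's actual code: it blends a static up-front estimate with linear extrapolation from fraction done, and with the flag set it uses extrapolation alone. The blending weight (equal to fraction done) is an assumption made for illustration.

```python
SECONDS_PER_DAY = 86_400.0


def remaining_estimate(elapsed_s: float, fraction_done: float,
                       static_total_s: float, fraction_done_exact: bool = False) -> float:
    """Simplified remaining-time model (illustrative only, not BOINC's formula).

    static_total_s: the task's up-front runtime estimate, in seconds.
    """
    if fraction_done >= 1.0:
        return 0.0
    if fraction_done <= 0.0:
        # No progress reported: fall back to the static estimate.
        return max(static_total_s - elapsed_s, 0.0)
    extrapolated_total = elapsed_s / fraction_done
    if fraction_done_exact:
        total = extrapolated_total
    else:
        # Assumed weighting: trust extrapolation more as the task progresses.
        total = fraction_done * extrapolated_total + (1.0 - fraction_done) * static_total_s
    return max(total - elapsed_s, 0.0)


# Figures from the post: 7 hours elapsed, under 1% done (1% used as an upper bound).
elapsed = 7 * 3600.0
exact_days = remaining_estimate(elapsed, 0.01, static_total_s=0.0,
                                fraction_done_exact=True) / SECONDS_PER_DAY
print(f"with <fraction_done_exact/>: at least {exact_days:.1f} days left")
```

Under this model the flag would still predict roughly a month for such a task, which fits the doubt in the post: the flag changes how reported progress is turned into a time estimate, not the progress itself.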
adrianxw

Joined: 3 Oct 19
Posts: 33
Credit: 197,169
RAC: 0
Message 79 - Posted: 8 Oct 2019, 20:20:05 UTC - in response to Message 75.  
Last modified: 8 Oct 2019, 20:23:23 UTC

...and at bedtime tonight the work unit is still going, with 6 seconds left... it has just passed 4 days of elapsed CPU time and is now 99.998% done!
adrianxw

Joined: 3 Oct 19
Posts: 33
Credit: 197,169
RAC: 0
Message 86 - Posted: 9 Oct 2019, 6:47:02 UTC - in response to Message 79.  
Last modified: 9 Oct 2019, 7:44:12 UTC

... and this morning, it has just 2 seconds left to run... elapsed is 4.12.00.01.

I've set no new tasks and suspended the other work units you sent. There is something wrong here. Looking at Task Manager, I cannot see it using any CPU, and I can see that only six BOINC tasks are running. I have suspended the task. I want to KNOW whether the damn thing is actually achieving anything. As soon as I suspended it, other BOINC tasks started running and my CPU became 100% busy again.
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 87 - Posted: 9 Oct 2019, 8:46:55 UTC - in response to Message 86.  

From my point of view, all tasks validated by the server have done the work needed for the project, including those with huge computation times.
The problem is that some tasks also failed after huge computation times, and there is no apparent reason for that. It seems to be a BOINC or boincmgr issue.
adrianxw

Joined: 3 Oct 19
Posts: 33
Credit: 197,169
RAC: 0
Message 88 - Posted: 9 Oct 2019, 13:44:47 UTC - in response to Message 87.  
Last modified: 9 Oct 2019, 13:49:05 UTC

The work unit with 4.5 days of crunching aborted. I have deleted the others here and am leaving.

>>> Task 32892 | Work unit 20227 | Computer 52 | Sent 4 Oct 2019, 19:34:16 UTC | Reported 9 Oct 2019, 13:45:10 UTC | Error while computing | Run time 388,706.57 s | CPU time 258.86 s

Look at the elapsed time and CPU time values: as I have said before, there is something seriously wrong here.
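The two numbers at the end of that task row can be compared directly. A minimal Python sketch computing CPU time as a share of wall-clock run time; for this task it comes to under 0.1%, meaning the task sat almost entirely idle. The 10% cut-off used to flag a task as stalled is an arbitrary, hypothetical choice, and multi-threaded tasks can legitimately show CPU time above run time.

```python
def cpu_efficiency(run_time_s: float, cpu_time_s: float) -> float:
    """CPU time as a fraction of wall-clock run time (can exceed 1 for multi-threaded tasks)."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return cpu_time_s / run_time_s


def looks_stalled(run_time_s: float, cpu_time_s: float, threshold: float = 0.10) -> bool:
    """Hypothetical heuristic: flag tasks that used the CPU less than `threshold` of the time."""
    return cpu_efficiency(run_time_s, cpu_time_s) < threshold


# Run time and CPU time (seconds) from the errored task in the post.
run_s, cpu_s = 388_706.57, 258.86
print(f"CPU efficiency: {cpu_efficiency(run_s, cpu_s):.3%}")
print("looks stalled:", looks_stalled(run_s, cpu_s))
```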
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 93 - Posted: 10 Oct 2019, 11:50:55 UTC - in response to Message 88.  

Looking at your task:
2019-10-09 08:22:04 (26728): Status Report: Elapsed Time: '384170.065659'
2019-10-09 08:22:04 (26728): Status Report: CPU Time: '256.312500'



and the log file begins with:
2019-10-07 13:10:12 (26728): Deleting stale snapshot.
2019-10-07 13:10:12 (26728): Checkpoint completed.

That is not the beginning of the computation; something went wrong before that with vboxaddition.iso.
Compared with other results, we think something went wrong on your side, but I can't tell whether it's a recurring or a one-time error.
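The two "Status Report" lines quoted from the wrapper log tell the same story as the task row: about 384,170 s elapsed against 256 s of CPU time. A small Python sketch that pulls those values out of a log, assuming every status line has exactly the format shown in this post; other wrapper versions may word their output differently.

```python
import re
from typing import Dict, Iterable

# Assumed line format, taken from the two log lines quoted in the post.
STATUS_RE = re.compile(r"Status Report: (Elapsed Time|CPU Time): '([0-9.]+)'")


def parse_status_report(lines: Iterable[str]) -> Dict[str, float]:
    """Return the latest value seen for each status field, in seconds."""
    values: Dict[str, float] = {}
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            values[match.group(1)] = float(match.group(2))
    return values


log = [
    "2019-10-09 08:22:04 (26728): Status Report: Elapsed Time: '384170.065659'",
    "2019-10-09 08:22:04 (26728): Status Report: CPU Time: '256.312500'",
]
report = parse_status_report(log)
print(report)
```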
marmot

Joined: 29 Aug 19
Posts: 15
Credit: 159,816
RAC: 0
Message 97 - Posted: 10 Oct 2019, 20:01:09 UTC
Last modified: 10 Oct 2019, 20:02:32 UTC

adrianxw's issue with VBox aside.

My two native nwchem tasks (1 t2 and 1 t1) both show 244+ days left after 50+ hours run time.

Previous tasks always estimated somewhat over 3 days. One task ran 5 days, but that machine was downclocked by 50%; these should complete within the 3-day period. I'll let them run for 4 days.
The estimate used to be fairly accurate.

The problem with such drastic estimates is that all other work units from other projects will stop as BOINC thinks the work cache is completely full.

©2024 Benoit DA MOTA - LERIA, University of Angers, France