Posts by Henk Haneveld

1) Message boards : Number crunching : Some hosts are delaying (long) batch completion (Message 990)
Posted 2 Aug 2020 by Henk Haneveld
Post:
Some hosts download work and do not crunch anything.

E.g. 143 540 2447 2537

If we exclude the case they are working offline, there is something wrong here.

On what fact do you base your statement? That they have not yet returned something does not mean anything.

These results have a very long return date. Until they time out, these hosts can still do all the work they have downloaded.
2) Message boards : Number crunching : Long work units. (Message 931)
Posted 13 Jul 2020 by Henk Haneveld
Post:
Both my systems downloaded a work unit after doing that. One is running, the other is "Postponed: VM job unmanageable, restarting later" after 4 minutes.

It should restart after 24 hrs but if you are running VirtualBox version 6.xx then it is likely to happen again when your system is very busy.

I advise going back to VirtualBox version 5.2.38. This version is much more tolerant of this problem.
3) Message boards : Number crunching : 2,5 days long and counting... (Message 892)
Posted 17 Jun 2020 by Henk Haneveld
Post:
Just for all you folks thinking of long running WUs, I've one that's at 12 1/2 days and counting on a 4GHz Skylake/Windows 10 box. At 99.999% now. I'm letting it run, hopefully to completion.
(BTXv2_athome_b3lyp-321gd_long,batch01,000006411,nwchem_long,1587024800)

Have you checked if VBoxHeadless.exe is still running?
Also, every half hour there should be a trickle-up message in the BOINC Manager event log.
If not: the job is dead and will never finish.
4) Message boards : Number crunching : No new task sent out when wingman aborted or got a validation error (Message 881)
Posted 12 Jun 2020 by Henk Haneveld
Post:
Yeah, it's insane that re-sending isn't a priority. On my end, it takes a really long time to complete the batches...

I think there is a way to do that. Look at:

https://boinc.berkeley.edu/trac/wiki/ProjectOptions

under the header "Accelerating retries", although you will have to figure out how it needs to be set to work.
5) Message boards : Number crunching : 2,5 days long and counting... (Message 875)
Posted 11 Jun 2020 by Henk Haneveld
Post:
I did a bit more checking.

It looks to me like VirtualBox was killed by the reboot.
After the reboot, BOINC wanted to restart VirtualBox but could not find the snapshot.
The BOINC Manager shows that the job is running, but it is not.
You can check this by looking at the running processes on your system. If you don't find the process "VBoxHeadless.exe", the job is dead.
I don't know of a way to force a restart, so the only thing left is to abort the job.
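
The process check described above can be scripted. This is only a minimal sketch, not part of BOINC or VirtualBox: the helper names are made up for illustration, and it assumes that `tasklist` (Windows) or `ps` (elsewhere) is available and that the process name appears at the start of a line in the listing.

```python
import subprocess
import sys


def is_process_listed(listing: str, name: str) -> bool:
    """Return True if any line of a process listing starts with `name` (case-insensitive)."""
    wanted = name.lower()
    return any(
        line.strip().lower().startswith(wanted) for line in listing.splitlines()
    )


def current_process_listing() -> str:
    """Return a plain-text process listing: `tasklist` on Windows, `ps` elsewhere."""
    if sys.platform.startswith("win"):
        cmd = ["tasklist"]
    else:
        cmd = ["ps", "-e", "-o", "comm="]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Assumption: the process is called VBoxHeadless.exe on Windows
    # and VBoxHeadless on other systems.
    name = "VBoxHeadless.exe" if sys.platform.startswith("win") else "VBoxHeadless"
    try:
        listing = current_process_listing()
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"Could not list processes: {exc}")
    else:
        if is_process_listed(listing, name):
            print(f"{name} is running; the job may still be alive.")
        else:
            print(f"{name} not found; the job is probably dead.")
```

If the script reports that the process is missing while the BOINC Manager still shows the task as running, that matches the situation described in this post.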
6) Message boards : Number crunching : 2,5 days long and counting... (Message 872)
Posted 11 Jun 2020 by Henk Haneveld
Post:
I don't mind the long runtime. However, what is really bad is the lack of checkpoints.

Yesterday my running result was at a runtime of over 2 days. Because of a system reboot it jumped back to a runtime of about 8 hrs and started again from that point.

Edit to post:

After some checking I found this in the stderr.txt file:

2020-06-10 12:45:10 (224): Restore from previously saved snapshot.
2020-06-10 12:45:10 (224): Error 0x80010105 in vbox52::VBOX_VM::restore_snapshot (c:\users\david\documents\boinc_git\boinc\samples\vboxwrapper\vbox_mscom_impl.cpp:1835)
2020-06-10 12:45:10 (224): Error: Getting Error Info! hr = 0x1

User david does not exist on my system. Should it not be the user ID of the person who is running the BOINC program on the local system?

Why is the snapshot saved in a documents directory? It should be in a slot directory under the Boinc_data directory.
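
For anyone who wants to look for the same kind of entries in their own stderr.txt, here is a rough sketch of a filter. It assumes every line has the layout shown in the excerpt above (timestamp, a number in parentheses that is presumably a process ID, then the message); the function name and the regular expression are illustrative only and do not come from vboxwrapper.

```python
import re

# Assumed layout, taken from the excerpt: "YYYY-MM-DD HH:MM:SS (number): message"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\): (.*)$")


def find_errors(text: str) -> list[tuple[str, str]]:
    """Return (timestamp, message) pairs for lines whose message starts with 'Error'."""
    errors = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group(3).startswith("Error"):
            errors.append((match.group(1), match.group(3)))
    return errors


if __name__ == "__main__":
    # Hypothetical usage: read stderr.txt from the current directory if it exists.
    try:
        with open("stderr.txt", encoding="utf-8", errors="replace") as handle:
            for timestamp, message in find_errors(handle.read()):
                print(timestamp, message)
    except OSError as exc:
        print(f"Could not read stderr.txt: {exc}")
```

Run against the three lines quoted above, this would keep the two "Error" lines and skip the "Restore from previously saved snapshot." line.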
7) Message boards : Number crunching : 2,5 days long and counting... (Message 871)
Posted 11 Jun 2020 by Henk Haneveld
Post:
I don't mind the long runtime. However, what is really bad is the lack of checkpoints.

Yesterday my running result was at a runtime of over 2 days. Because of a system reboot it jumped back to a runtime of about 8 hrs and started again from that point.
8) Message boards : News : Updates and poll (Message 447)
Posted 15 Jan 2020 by Henk Haneveld
Post:
Do both and run them as separate applications.

Add an option to the user preference settings so users can choose which of these they want to run, or even run both.




©2024 Benoit DA MOTA - LERIA, University of Angers, France