Message boards : Number crunching : ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time
Joined: 2 Oct 21, Posts: 24, Credit: 68,200, RAC: 0
2022-01-13 20:56:42 (16248): VM state change detected. (old = 'poweroff', new = 'running')
2022-01-13 20:56:47 (16248): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2022-01-13 20:56:47 (16248): Guest Log: vboxguest: misc device minor 59, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2022-01-13 20:56:52 (16248): Preference change detected
2022-01-13 20:56:52 (16248): Setting CPU throttle for VM. (100%)
2022-01-13 20:56:53 (16248): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 180 seconds) or (Vbox_job.xml: 600 seconds))
2022-01-13 20:57:08 (16248): Guest Log: vboxsf: g_fHostFeatures=0x8000000f g_fSfFeatures=0x1 g_uSfLastFunction=29
2022-01-13 21:13:07 (16248): Creating new snapshot for VM.
2022-01-13 21:13:16 (16248): Checkpoint completed.
2022-01-13 21:19:46 (16248): ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time.
2022-01-13 21:19:46 (16248): Powering off VM.
2022-01-13 21:19:46 (16248): Successfully stopped VM.
Why? If it's running on just one core, it should be fine. Only RAH Python and LHC ATLAS are using Vbox as well: RAH on 1 core and ATLAS on 4 cores. Can't Vbox handle running 6 cores at once? If I close BOINC Manager and restart it, the task restarts and runs fine to completion. I've got 49 GB of memory just to handle all these big projects, and I'm only using 44% without QuChem. I haven't seen the combined usage even approach 50%. So what else is going on?
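A quick way to narrow this down when the wrapper reports lost communication is to ask VirtualBox itself whether the VM is still up. A minimal sketch, assuming VirtualBox is installed in its default Windows location (adjust the path otherwise):

import subprocess

# Ask VirtualBox directly which VMs it currently considers running.
# "VBoxManage list runningvms" is a standard VirtualBox command; the path
# below is the default install location on Windows and may differ locally.
VBOXMANAGE = r"C:\Program Files\Oracle\VirtualBox\VBoxManage.exe"

result = subprocess.run(
    [VBOXMANAGE, "list", "runningvms"],
    capture_output=True, text=True, timeout=60,
)
print(result.stdout or "(no VMs reported as running)")

If the BOINC VM still appears in that list after the wrapper has given up, the VM itself is alive and the failure is on the communication side rather than the VM having died.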
Joined: 21 Jun 20, Posts: 24, Credit: 68,559,000, RAC: 0
I am running QuChem on Linux, therefore am not observing this here. But I got the same occasionally at Cosmology@home with the "camb_boinc2docker" application, and very frequently at Rosetta@home with the "rosetta python projects" application. (I've got Vbox 6.1.28; that's apparently a factor in how often such events occur.) I suspect that vboxwrapper simply doesn't cope with the large latencies which a Vbox VM can sometimes exhibit. IOW, my guess is that someone set a timeout too small somewhere. I am currently running "rosetta python projects" (merely 16 or fewer tasks at once on a computer with plenty of cores and 256 GB RAM) and am restarting the BOINC client twice a day. Otherwise the client would run out of work eventually, since it does not request new work as long as there is one or more "postponed" tasks in the buffer. :-(
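If that guess is right, it should be reproducible outside the wrapper: a status query against the VM with a tight deadline fails on a busy host, while the same query with a generous deadline (or a retry) succeeds. A rough sketch of the idea, not the actual vboxwrapper code, using a hypothetical VM name and the default VBoxManage path:

import subprocess

VBOXMANAGE = r"C:\Program Files\Oracle\VirtualBox\VBoxManage.exe"
VM_NAME = "boinc_example_vm"  # hypothetical name; real wrapper VMs are named per task

def vm_answers(timeout_s):
    # Returns True if VirtualBox answers a status query within the deadline.
    try:
        subprocess.run(
            [VBOXMANAGE, "showvminfo", VM_NAME, "--machinereadable"],
            capture_output=True, timeout=timeout_s, check=True,
        )
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return False

# A tight deadline can flag a merely slow VM as "lost"; a generous one
# (or one retry before giving up) would tolerate the latency spike.
print("tight deadline:   ", vm_answers(2.0))
print("generous deadline:", vm_answers(60.0))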
Joined: 2 Oct 21, Posts: 24, Credit: 68,200, RAC: 0
I am running QuChem on Linux, therefore am not observing this here. But I got the same occasionally at Cosmology@home with the "camb_boinc2docker" application, and very frequently at Rosetta@home with the "rosetta python projects" application. (I've got Vbox 6.1.28, that's apparently a factor for the frequency of such events.)
I am an old-time Rosetta cruncher. At times Python stuffs 12 or more tasks on my system; maybe that is when QuChem crashes. I have never really paid attention. Python uses up a ton of resources in all the key areas, so that may be a clue.
Joined: 21 Jun 20, Posts: 24, Credit: 68,559,000, RAC: 0
I found the following idea via the Rosetta@home message board, originally posted by @computezrmle at the Cosmology@home message board: http://www.cosmologyathome.org/forum_thread.php?id=7769&postid=22921
On Dec 5, 2021, computezrmle wrote: Volunteers frequently affected by the postponed issue may try a different vboxwrapper.
I haven't tried this myself yet.
Joined: 3 Oct 19, Posts: 153, Credit: 32,412,973, RAC: 0
I haven't tried this myself yet.
I have. BOINC does a signature verification and won't accept the new wrapper. But maybe you should try it and see if you can make it work.
Joined: 2 Oct 21, Posts: 24, Credit: 68,200, RAC: 0
I found the following idea via the Rosetta@home message board, originally posted by @computezrmle at the Cosmology@home message board:
Not worth the hassle; every night when I go to bed I shut down the system. So I would have to rebuild this every morning? Forget it. And if other projects have issues, I would have to do it again? Nah... if QuChem can't figure this out, then they get the data back when I notice the problem or when BOINC restarts the next day.
Joined: 23 Feb 22, Posts: 23, Credit: 4,423,400, RAC: 0
Yesterday, I finally managed to attach to this project. Since then, I've had several such cases with the "postponed" issue. The even worse thing, though, is: as long as the faulty task is not removed manually, no new tasks are downloaded. In the BOINC event log it says "...don't need", regardless of how big the buffer is in the settings (even several days). So in each such "postponed" case one needs to abort the task manually; only then can new tasks be downloaded. Which is nonsense, of course. Hope that the project people can iron this problem out ASAP.
Joined: 23 Feb 22, Posts: 23, Credit: 4,423,400, RAC: 0
It would be the job of the project developers to test those vboxwrappers and distribute them to the clients.
The vboxwrapper used here is vboxwrapper_26200_windows_x86_64.exe. The one from the link above is newer and is the same one being used by LHC: vboxwrapper_26203_windows_x86_64.exe. So after replacing the 26200 version with the 26203 version, the newer one needs to be renamed to read 26200?
Joined: 23 Feb 22, Posts: 23, Credit: 4,423,400, RAC: 0
I have now exchanged the vboxwrapper file as described above, i.e. vboxwrapper_26203_windows_x86_64.exe is working under the name vboxwrapper_26200_windows_x86_64.exe, and three tasks have begun being processed "normally". So I will find out soon whether this helps to eliminate the "postponed" problem, or not.
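For anyone else trying this, a minimal sketch of the swap under the same assumptions (default BOINC data directory on Windows; the project folder name and the download path below are placeholders; stop the BOINC client before swapping and keep a backup):

import shutil
from pathlib import Path

# Placeholder paths -- adjust to the actual project folder under
# C:\ProgramData\BOINC\projects and to wherever the 26203 build was saved.
PROJECT_DIR = Path(r"C:\ProgramData\BOINC\projects\quchempedia_example_folder")
OLD = PROJECT_DIR / "vboxwrapper_26200_windows_x86_64.exe"
NEW = Path(r"C:\Users\example\Downloads\vboxwrapper_26203_windows_x86_64.exe")

shutil.copy2(OLD, OLD.with_name(OLD.name + ".bak"))  # keep the original wrapper
shutil.copy2(NEW, OLD)  # the 26203 binary now runs under the 26200 name

Whether the client's signature verification accepts the swapped file (see the earlier reply) still has to be checked on each machine.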