VM job unmanageable
Author | Message |
---|---|
Joined: 3 Oct 19 Posts: 11 Credit: 5,443,793 RAC: 0 |
On my Windows machines, I am getting many tasks stuck in the "VM job unmanageable" state. Quitting and restarting BOINC is required to get them unstuck. This is with the latest versions of BOINC and VirtualBox, and with version 0.07 of the application. I notice there is a new 0.08 version. Does it fix the issue? Edit: I also found a task in the same state on one of my Macs, so the problem is not limited to Windows. Reno, NV Team: SETI.USA |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
The same happens on my computer (Windows), and I have already seen this problem on other projects. It is probably a known VirtualBox issue. For me, two strategies brought some improvement:
- no task interruption (e.g. only one project, or a very long task-switching interval, such as several days) |
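For readers who would rather set a long switching interval outside the BOINC Manager dialog, here is a minimal Python sketch that builds the text of a `global_prefs_override.xml`. It assumes the "switch between tasks every N minutes" preference is stored as `cpu_scheduling_period_minutes`; the function name and the three-day value are illustrative only, and nothing in this thread confirms that the setting cures the stalls.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(switch_minutes: float) -> str:
    """Return XML text for a global_prefs_override.xml (hypothetical helper)."""
    if switch_minutes <= 0:
        raise ValueError("switch_minutes must be positive")
    root = ET.Element("global_preferences")
    # Assumption: this element backs the "switch between tasks" preference.
    period = ET.SubElement(root, "cpu_scheduling_period_minutes")
    period.text = f"{switch_minutes:.1f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Three days expressed in minutes, as one example of a "very long" interval.
    print(build_prefs_override(3 * 24 * 60))
```

The resulting text would be saved as `global_prefs_override.xml` in the BOINC data directory and should be picked up after a client restart or `boinccmd --read_global_prefs_override`.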
Joined: 3 Oct 19 Posts: 11 Credit: 5,443,793 RAC: 0 |
For what it's worth, I already had both of those set as you suggest. For example, I went to sleep with a 28-core machine (HT turned off) and no other CPU projects running. When I woke up six hours later, I had only 6 tasks running and 52 tasks stalled. This is a serious problem, IMO. Edit: If task interruption really is a cause of this problem, then the situation is doubly bad, because the only way to fix the stalled tasks is to quit and restart BOINC, which then interrupts the tasks that were still running. But I don't think task interruption is really the cause either. There are many other VirtualBox projects out there that do not have this issue. Reno, NV Team: SETI.USA |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
Curious behaviour... At the moment, I have no idea. An energy-saving configuration? The BOINC client deciding to switch tasks? |
Joined: 4 Oct 19 Posts: 1 Credit: 243,053 RAC: 0 |
Same problem here on my two hosts. Almost every WU runs for a few hours and then ends up in this state. At first I suspected memory problems and reduced the number of simultaneously running tasks to 2, but that doesn't seem to help. I have two different systems with different VirtualBox versions: Win 7 with VB 6.0.4 and Win 10 with VB 6.0.12. Edit: Only 1 task has finished so far, that was this one |
Joined: 3 Oct 19 Posts: 11 Credit: 5,443,793 RAC: 0 |
"Curious behaviour... At the moment, I have no idea." Nope. These are dedicated crunchers running 24/7, and there are no other projects to switch to. Reno, NV Team: SETI.USA |
Joined: 26 Aug 19 Posts: 15 Credit: 1,265,326 RAC: 0 |
I remember this was an old, recurrent issue with LHC when they started with VM applications years ago (and BOINC/Rom Walton was fighting to get a stable working wrapper). It would happen especially when "mixing" several VM tasks at the same time (including other VM projects like RNA...). I don't know how they solved it, but I never had it again (I think) with any LHC sub-project. Some days ago I think I found one QCPIA task in that same status on my iMac, but I believe it was at the start of the AF RAID, and I must have killed it because I didn't want to try anything at that time. Try searching or discussing on the LHC forum? |
Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
"On my windows machines, I am getting many tasks stuck in the 'VM job unmanageable' state. It requires quitting/restarting BOINC to get them un-stuck. This with the latest versions of BOINC and vbox." I see that on nanoHUB once every few days with VBox 5.2.10 and Ubuntu 18.04, and BOINC 7.14.1 (and 7.16.1). The work units are very short (less than 5 minutes), so I just abort them. |
Joined: 3 Oct 19 Posts: 11 Credit: 5,443,793 RAC: 0 |
|
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
Did they find a solution to this problem? |
Joined: 26 Aug 19 Posts: 15 Credit: 1,265,326 RAC: 0 |
I didn't see that with nanoHUB on my iMac. As for LHC, yes, they did, but I have no idea how... |
Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
"Did they find a solution to this problem?" Not on nanoHUB. The "VM job unmanageable" state is relatively rare for them, and they have bigger problems than that at the moment. It does happen on LHC too, and there are several threads on it. The most knowledgeable guy over there, computezrmle, has this to say about it: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4628&postid=34506#34506 |
Joined: 26 Aug 19 Posts: 15 Credit: 1,265,326 RAC: 0 |
This is a very interesting thread, and I read a good part of it. People have still been experiencing this issue recently, depending on the LHC subproject. It seems to be a very subtle problem that depends on many factors:
- the amount of RAM and other resources available on the machine, plus the BOINC preferences for memory, task-switch frequency, etc.,
- how many VM tasks you run concurrently (it seems that "the fewer, the better"),
- the version of VirtualBox installed on the machine (it seems that "the most recent is not necessarily the best one"),
- the OS used,
- the version of the VirtualBox BOINC wrapper used by the project application (not all LHC subprojects use the same wrapper version),
- and maybe other factors...
So it's a bit of alchemy! |
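The "fewer concurrent VM tasks" factor can be tried with a per-project `app_config.xml`. Below is a minimal Python sketch that builds such a file, assuming the `project_max_concurrent` element of BOINC's app_config format; the function name and the limit of 2 are illustrative, and the thread does not establish that any particular limit avoids the "VM job unmanageable" state.

```python
import xml.etree.ElementTree as ET


def build_app_config(max_tasks: int) -> str:
    """Return XML text for an app_config.xml (hypothetical helper)."""
    if max_tasks < 1:
        raise ValueError("max_tasks must be at least 1")
    root = ET.Element("app_config")
    # Assumption: this element caps how many tasks of the project run at once.
    limit = ET.SubElement(root, "project_max_concurrent")
    limit.text = str(max_tasks)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Allow at most two tasks of this project to run at the same time.
    print(build_app_config(2))
```

The text would go into a file named `app_config.xml` in the project's own folder under the BOINC data directory, and the client should re-read it after a restart or after choosing "Read config files" in BOINC Manager.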
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
"This is a very interesting thread, and I read a good part of it. People have still been experiencing this issue recently, depending on the LHC subproject." A good way to get a good headache! |
Joined: 13 Oct 19 Posts: 87 Credit: 6,026,455 RAC: 0 |
Moved to VirtualBox thread |
Joined: 8 Dec 19 Posts: 13 Credit: 652,594 RAC: 0 |
Hi all, I found this thread on GitHub that might interest those affected by this issue: https://github.com/BOINC/boinc/issues/3173 It seems that VBox thinks there isn't enough memory to complete the job, and hence it delays restarting for 1 day. The fix seems to be to restart the BOINC Manager client; VBox should then restart, assuming any local memory-intensive apps have stopped. A YouTube video claims that reducing the "Computing preferences > Computing > Use at most __% of the CPU time" setting before restarting BOINC Manager might also fix this. https://www.youtube.com/watch?v=2CK8Yxxylnw Regards, Tim |
©2024 Benoit DA MOTA - LERIA, University of Angers, France