Posts by Greg_BE

1) Message boards : Number crunching : ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time (Message 1654)
Posted 12 Feb 2022 by Greg_BE
Post:
I found the following idea via the Rosetta@home message board, originally posted by @computezrmle at the Cosmology@home message board:
http://www.cosmologyathome.org/forum_thread.php?id=7769&postid=22921

On Dec 5 2021 computezrmle wrote:
Volunteers frequently affected by the postponed issue may try a different vboxwrapper.

BOINC's wiki pages mention communication problems between vboxwrapper and VirtualBox 6.x, especially on Windows.
They offer premade executables that may solve the problems:
https://boinc.berkeley.edu/trac/wiki/VboxApps#Premadevboxwrapperexecutables

It would be the job of the project developers to test those vboxwrappers and distribute them to the clients.
As long as this is not done volunteers could use the following steps as a workaround:

1. Download an alternative vboxwrapper from the page mentioned above (or use one you got from another project, e.g. LHC@home)
2. Start the BOINC client but suspend computing
3. Change to the project directory, e.g. projects/www.cosmologyathome.org, and replace the vboxwrapper there with the test version; the filename must be the name of the old vboxwrapper
4. Resume computing -> check the logfiles of tasks started after the patch


Each restart of the BOINC client will replace the patch with the original vboxwrapper from the project server.
This can be avoided by setting <dont_check_file_sizes>1</dont_check_file_sizes> in cc_config.xml, but then all other automatic updates will also stop working.

I haven't tried this myself yet.
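
For anyone who would rather script step 3 than do it by hand, here is a minimal Python sketch. It assumes computing is already suspended (step 2). The function name swap_wrapper and the ".orig" backup suffix are made up for illustration and are not part of BOINC or vboxwrapper.

```python
import shutil
from pathlib import Path


def swap_wrapper(project_dir, old_name, new_wrapper):
    """Replace project_dir/old_name with new_wrapper, keeping the old filename.

    The original file is first saved next to it as old_name + ".orig".
    Returns the path of that backup copy.
    """
    target = Path(project_dir) / old_name
    if not target.is_file():
        raise FileNotFoundError(f"no existing wrapper at {target}")

    backup = target.with_name(target.name + ".orig")
    if not backup.exists():
        # Keep only the first backup so a second run never overwrites
        # the project's original wrapper with an already patched one.
        shutil.copy2(target, backup)

    # The test version must carry the name of the old vboxwrapper.
    shutil.copy2(new_wrapper, target)
    return backup
```

You would call it with the project directory, the exact filename of the wrapper already sitting there, and the path of the downloaded test version. As noted above, the next client restart will undo the swap unless file size checks are disabled.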


Not worth the hassle. Every night when I go to bed I shut down the system, so I would have to redo this every morning? Forget it. And if other projects have issues, I would have to do it again? Nah... if QuChem can't figure this out, then they get the data back when I notice the problem or when BOINC restarts the next day.
2) Message boards : Number crunching : Stuck tasks (Message 1651)
Posted 11 Feb 2022 by Greg_BE
Post:
Since a bunch of us are on both Rosetta and here, I'm thinking that Python eats up a ton of resources and causes this project's tasks to get postponed or stuck.

Have a look at my post and the answer I got to it in the VM thread.
I have had Rosetta tasks get stuck as well.
I knocked off 5 that got stuck the other week.
I think the key is Python. We know how much those tasks drag a system down when 12 or more run at the same time.
3) Message boards : Number crunching : ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time (Message 1650)
Posted 11 Feb 2022 by Greg_BE
Post:
I am running QuChem on Linux, therefore am not observing this here. But I got the same occasionally at Cosmology@home with the "camb_boinc2docker" application, and very frequently at Rosetta@home with the "rosetta python projects" application. (I've got Vbox 6.1.28, that's apparently a factor for the frequency of such events.)

I suspect that vboxwrapper simply doesn't cope with the large latencies which a Vbox VM can sometimes exhibit. IOW my guess is that someone set a timeout too small somewhere.

I am currently running "rosetta python projects" (merely 16 or fewer tasks at once on a computer with plenty of cores and 256 GB RAM) and am restarting the boinc client twice a day. Otherwise the client would run out of work eventually, since it does not request new work as long as there is one or more "postponed" task in the buffer. :-(



I am an old-time Rosetta cruncher. At times Python stuffs 12 or more tasks on my system.
Maybe it is there that QuChem crashes. I have never really paid attention.
Python uses up a ton of resources in all the key elements, so that may be a sign.
4) Message boards : Number crunching : ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time (Message 1638)
Posted 13 Jan 2022 by Greg_BE
Post:
2022-01-13 20:56:42 (16248): VM state change detected. (old = 'poweroff', new = 'running')
2022-01-13 20:56:47 (16248): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds

2022-01-13 20:56:47 (16248): Guest Log: vboxguest: misc device minor 59, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)

2022-01-13 20:56:52 (16248): Preference change detected
2022-01-13 20:56:52 (16248): Setting CPU throttle for VM. (100%)
2022-01-13 20:56:53 (16248): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 180 seconds) or (Vbox_job.xml: 600 seconds))
2022-01-13 20:57:08 (16248): Guest Log: vboxsf: g_fHostFeatures=0x8000000f g_fSfFeatures=0x1 g_uSfLastFunction=29

2022-01-13 21:13:07 (16248): Creating new snapshot for VM.
2022-01-13 21:13:16 (16248): Checkpoint completed.
2022-01-13 21:19:46 (16248): ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time.
2022-01-13 21:19:46 (16248): Powering off VM.
2022-01-13 21:19:46 (16248): Successfully stopped VM.
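
To see how long the VM had been quiet before the wrapper gave up, the timestamps can be pulled out of a log like the one above. This is only a rough Python sketch: it assumes every line has the "date time (pid): message" shape shown here, and the function names are my own, not anything vboxwrapper provides.

```python
import re
from datetime import datetime

# Matches lines such as "2022-01-13 21:13:16 (16248): Checkpoint completed."
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\): (.*)$")


def parse_log(text):
    """Return (timestamp, message) pairs for every line that matches LINE_RE."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((stamp, match.group(3)))
    return events


def seconds_from_checkpoint_to_loss(text):
    """Seconds between the last completed checkpoint and the first
    'lost communication' error, or None if either one is missing."""
    last_checkpoint = None
    for stamp, message in parse_log(text):
        if message.startswith("Checkpoint completed"):
            last_checkpoint = stamp
        elif "lost communication with VirtualBox" in message:
            if last_checkpoint is None:
                return None
            return (stamp - last_checkpoint).total_seconds()
    return None


SAMPLE = """\
2022-01-13 21:13:07 (16248): Creating new snapshot for VM.
2022-01-13 21:13:16 (16248): Checkpoint completed.
2022-01-13 21:19:46 (16248): ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time.
2022-01-13 21:19:46 (16248): Powering off VM.
"""

if __name__ == "__main__":
    print(seconds_from_checkpoint_to_loss(SAMPLE))  # 390.0
```

In this log the error shows up 390 seconds (six and a half minutes) after the last checkpoint finished. That number alone does not say what went wrong in between.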


Why? If it's running on just one core then it should be fine.
Only RAH Python and LHC ATLAS are using Vbox as well.
RAH 1 core and ATLAS 4 cores.
Can't Vbox handle running 6 cores at once?
If I close Boinc Mgr and restart it, the task will restart and run fine to completion.

I've got 49 Gigs of memory just to handle all these big projects and I am only using 44% without QuChem. I haven't seen the combined usage even approach 50%

So what else is going on?
5) Message boards : Number crunching : Inconclusive validation (Message 1635)
Posted 13 Jan 2022 by Greg_BE
Post:
curious issue...


It has not repeated, and I cannot find the task this came up on; the notices have since cleared that message out. So I cannot tell you which task it was. Seems to be a one-time thing.
6) Message boards : Number crunching : Inconclusive validation (Message 1628)
Posted 12 Jan 2022 by Greg_BE
Post:
ok new problem...

NWChem needs 3868.06MB more disk space. You currently have 227.98 MB available and it needs 4096.03 MB

C: is not even close to full. Still 97 GB free
BOINC and its projects have free rein. There are no restrictions on disk space.
The drive is not partitioned.

Only Windows and programs plus BOINC are stored on C:.
Long-term deep storage is on a separate physical drive.

So what kind of nonsense message is this?

These are the default settings:
Use no more than 100 GB
Leave at least 2 GB free
Use no more than 100 % of total

BOINC data folder (projects etc) is 47.9GB and the program folder is only 50.2KB
Drive is 208 GB
If a task needs more than 90 GB of disk space, then it's really out of control!
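
As a sanity check on the numbers above, here is a small Python sketch of how much room those three preferences would leave. It assumes the three settings act as independent caps and the tightest one wins; the real BOINC accounting (for example, how disk is shared between projects) may well differ, and the function name is just for illustration.

```python
def available_for_boinc_gb(total_gb, free_gb, boinc_used_gb,
                           max_used_gb, min_free_gb, max_used_pct):
    """Extra disk space (GB) BOINC could still claim if the tightest
    of the three disk preferences is the one that applies."""
    # "Use no more than X GB": room left under the absolute cap.
    room_under_abs_cap = max_used_gb - boinc_used_gb
    # "Leave at least X GB free": free space above the reserve.
    room_above_min_free = free_gb - min_free_gb
    # "Use no more than X % of total": room left under the percentage cap.
    room_under_pct_cap = total_gb * max_used_pct / 100.0 - boinc_used_gb
    return max(0.0, min(room_under_abs_cap,
                        room_above_min_free,
                        room_under_pct_cap))


# Figures taken from the post above.
room = available_for_boinc_gb(total_gb=208, free_gb=97, boinc_used_gb=47.9,
                              max_used_gb=100, min_free_gb=2, max_used_pct=100)

if __name__ == "__main__":
    print(f"{room:.1f} GB")  # 52.1 GB
```

Under that assumption the preferences alone would leave about 52 GB, nowhere near the 227.98 MB in the message, so the shortfall would have to come from somewhere other than these three settings.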
7) Message boards : Number crunching : Inconclusive validation (Message 1627)
Posted 12 Jan 2022 by Greg_BE
Post:
So burned a bunch of CPU time for a dead task. Nice. [heavy sarcasm]

Yes, I know. That is a very common situation.

I asked some time ago whether they could not reduce the number of work units before they declare it an "invalid".
But apparently they need that many results to ensure that it is in fact invalid.

I still expect that they could cut it down to maybe five (I think the "aborted" does not count).
After that, I have never seen one that turns out "valid". But I don't see the overall statistics either. The project admin does.



But look, the max results is 10
They got a bunch of invalids and inconclusives.
Since they have not reached the max results, shouldn't it be sent out again?
Now some 200 tasks or more later it still sits unsent and I lose credit.
Really good work.
8) Message boards : Number crunching : Inconclusive validation (Message 1625)
Posted 11 Jan 2022 by Greg_BE
Post:
But check this https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=3014719 out.

4 inconclusive results
2 validate errors
1 aborted
1 unsent (and nothing to say that it will be resent as it's a 301 series task and we are at 305 series)
So burned a bunch of CPU time for a dead task. Nice. [heavy sarcasm]
9) Message boards : Number crunching : Why does this project only want to support Vbox 5.4.x? (Message 1609)
Posted 28 Dec 2021 by Greg_BE
Post:
I have Einstein on another PC, a recent laptop with an AMD CPU with graphics capabilities. Another, my fastest AMD PC, runs WCG but also a Linux Virtual Machine running both Einstein and QuChem on a SuSE Tumbleweed Linux, which is a development version and has a 5.15.8 kernel. It is frequently updated and I have to reboot it. On the Intel PC from which I am writing I run QuChem, Rosetta@home and, alternating, LHC@home. Another older laptop runs QuChem on a SuSE Leap 15.0. I am trying to distribute my BOINC projects across 4 PCs plus one Virtual Machine. All this hardware costs me about 50 euros/month in electricity.
Tullio



Well I don't have that kind of luxury. Electric bills are high enough in Belgium to run one computer 16 hrs a day. And I can't afford to build another system. This system alone with some repairs and upgrades set me back 500 or so Euro. That's over a few years. New RAM will cost me another 70 for 2 sticks. That's enough money spent for volunteering.

So like I said, until I get the new RAM and then try running again, I'm withdrawing from this project because I can't run it the way it wants to run. Too many VM errors. Not stable, can't keep communication, etc. Doesn't matter if it's 6 or 5.
10) Message boards : Number crunching : Why does this project only want to support Vbox 5.4.x? (Message 1607)
Posted 28 Dec 2021 by Greg_BE
Post:
I have 12 GB RAM on an Intel i5 CPU 9400F running Windows 11 home edition and I am running two tasks at a time. The same PC can run two rosetta python tasks at the same time which need much more RAM. VBox uses 6344 MB.
Tullio



Yeah, but you're not using your system like I do.
I have Einstein, LHC, PrimeGrid, Rosetta, SiDock and WCG + FAH.
Right now, without RAH, I have enough memory, but the minute Python kicks in it's down to 30% free and this project dies. And short of hobbling RAH with project CPU limits, once it decides it wants to run 3-4 Pythons at once, that's the end of the discussion.

I think I will be fine once I up the memory some more.
But then I have to figure out if I can run Vbox 6.x or not without some sort of error.
For now I opt out of this project.
11) Message boards : Number crunching : Why does this project only want to support Vbox 5.4.x? (Message 1605)
Posted 27 Dec 2021 by Greg_BE
Post:
Screw it.
I am off this project for a while.
It doesn't work with my current configuration.
Need more RAM
And 6.x is not working properly; again, until I get more RAM I can't tell what is going on.
So again, screw it until later.
12) Message boards : Number crunching : Why does this project only want to support Vbox 5.4.x? (Message 1604)
Posted 27 Dec 2021 by Greg_BE
Post:
I am using VirtualBox 6.1.30 on this project, on Rosetta python and LHC@home without any problem.
Tullio



Well, I was doing that and then I got "Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time."

So I rolled back to 5.4.x and now I just get the occasional memory error, which I hope to fix with some new sticks in a few weeks.

And I noticed that if you "save" your data like you normally do when closing out BOINC, then the VM goes "unmanageable".

All this nonsense is not to my liking.
You would think this project could do better to make it work like the other projects in the BOINC environment.
13) Message boards : Number crunching : Why does this project only want to support Vbox 5.4.x? (Message 1602)
Posted 26 Dec 2021 by Greg_BE
Post:
I find it very annoying to have to downgrade to 5.4.x
Why can't they support 6?
I'm still trying to find the right combo for projects.
I can't remember how the others that normally run on 6 work.
Been too busy.
14) Message boards : Number crunching : Waiting for memory (Message 1584)
Posted 26 Nov 2021 by Greg_BE
Post:
What percentage of RAM do you allow BOINC to use (set in your preferences) ?


Unlimited (100%) in both boxes.
But, I run a whole slew of projects on BOINC plus FAH.
I was just looking at Boinc Tasks record of memory and the totals there put it at 22.4 GB
I run FAH as well and the current figure is 370MB of memory.
Because now I have a RAH task that is waiting for memory.
So I guess I need to upgrade my oldest sticks of RAM to eliminate this problem.
How much RAM would you suggest? Another 6 GB or more for 30 or more GB of RAM?

But the interesting thing is, Windows Task Manager says I am only using 36% of that 24 GB, but BOINC says otherwise. Why is this?
15) Message boards : Number crunching : Waiting for memory (Message 1581)
Posted 25 Nov 2021 by Greg_BE
Post:
I am not using close to half of my physical RAM.
This machine was built for multi project high ram usage.

So is it Vbox that is complaining, and how do I increase its RAM usage?
I am running LHC ATLAS, RAH Python (finally), this project, and SiDock, which all use Vbox. Maybe not all at the same time, but at least 3 tasks from different projects are using Vbox at the same time.
16) Message boards : Number crunching : Virtual enviorment unmanageable (Message 1579)
Posted 23 Nov 2021 by Greg_BE
Post:
My RAC is now 2955 yet I am out of the credits list. My last position was 40.
Tullio



Oh....I see your problem.
You're running 6.1.28.
This project does not like that.
Jim already pointed that out.
You need to roll back to 5.2.44

"You can usually solve it by going back to VirtualBox 5.2.44, at least on most projects.
https://www.virtualbox.org/wiki/Download_Old_Builds_5_2" - Jim1348

Once I rolled back, everything ran fine. Even LHC ATLAS does not mind it.
17) Message boards : Number crunching : Virtual enviorment unmanageable (Message 1576)
Posted 22 Nov 2021 by Greg_BE
Post:
Neither LHC@home nor rosetta python, which requests 6144 MB of RAM, ever signal an "unmanageable VM job". Only QuChem does and I kill and restart BOINC.
Tullio



I think Master Jim is right (as usual) about rolling back to 5.2.xx
Things are running smoothly now with each task taking just a hair over 2 hrs to run.
LHC and WCG are working just fine as well.

I quit RAH...you probably saw my posts.
My system got blacklisted by the python scheduler, part of the slash and burn to solve problems.
Then 4.2 doesn't have any work or very little on and off, so I bailed after 15 years.
18) Message boards : Number crunching : Virtual enviorment unmanageable (Message 1574)
Posted 22 Nov 2021 by Greg_BE
Post:
If you don't listen to what I say, you are on your own.

Well, since that doesn't work, I guess I'll roll back to 5.
Nothing to lose.



5.2.44 installed. Stable at 40+ minutes.
But now off to work.

How do you get BOINC to report the tasks immediately and download a new one?
19) Message boards : Number crunching : Virtual enviorment unmanageable (Message 1573)
Posted 21 Nov 2021 by Greg_BE
Post:
If you don't listen to what I say, you are on your own.

Well, since that doesn't work, I guess I'll roll back to 5.
Nothing to lose.
20) Message boards : Number crunching : Virtual enviorment unmanageable (Message 1571)
Posted 21 Nov 2021 by Greg_BE
Post:
You can usually solve it by going back to VirtualBox 5.2.44, at least on most projects.
https://www.virtualbox.org/wiki/Download_Old_Builds_5_2

You will still get a lot of invalids, but that is just the nature of the scientific work and not a VBox problem.



Not really sure about going that far back. But I went back two releases within 6.1, to 6.1.24.
What's the point of building Vbox jobs if they go into unmanageable all the time?
I don't have all day to sit here and restart BOINC every time.
I thought this project was a little more advanced.



©2024 Benoit DA MOTA - LERIA, University of Angers, France