Posts by Erich56

1) Message boards : Number crunching : Out of Work (Message 1794)
Posted 5 Sep 2022 by Erich56
Post:
Unfortunately, over the past few days more and more tasks have ended up with "validation not possible".
This is too bad, since all the work was then for nothing.
Very reluctantly, I am thinking about quitting the project now, as I think it no longer makes much sense :-(
2) Message boards : Number crunching : Out of Work (Message 1793)
Posted 2 Sep 2022 by Erich56
Post:
I took the above news as notice that the project was coming to an end:

https://quchempedia.univ-angers.fr/athome/forum_thread.php?id=183&postid=1783
:-( :-( :-(
3) Message boards : Number crunching : Out of Work (Message 1790)
Posted 2 Sep 2022 by Erich56
Post:
Tasks ready to send 0

This is unusual for this project. I will be out of work in about an hour.
It's been like that for quite a while. Some of my computers have been without work for this project for lengthy periods :-( Too bad :-(
4) Message boards : Number crunching : Stuck tasks (Message 1746)
Posted 12 May 2022 by Erich56
Post:
What's wrong with my computers ???
If you mean the "Vm job unmanageable" ones ....
No, Jim, I am talking about those tasks whose CPU activity suddenly stops a few seconds after the start, or at any later point. The task itself, though, keeps running forever until it is aborted manually. The tool BoincTasks, which you recommended to me, makes this problem much more easily visible. Still, I am not sitting at my computers day and night :-) This means it could well happen that a new task stops using the CPU right at the beginning but keeps running all night, uselessly :-(

The other issue you mentioned, "VM job unmanageable", happens once in a while, thank God not too often.
5) Message boards : Number crunching : Stuck tasks (Message 1741)
Posted 8 May 2022 by Erich56
Post:
The number of "stuck" tasks here has increased markedly within the past few days :-(
Any explanation for this?
Am I the only one experiencing this problem, or are other people seeing it as well?
Is no one else really having this problem?
What's wrong with my computers???
6) Message boards : Number crunching : Stuck tasks (Message 1740)
Posted 8 May 2022 by Erich56
Post:
The number of "stuck" tasks here has increased markedly within the past few days :-(
Any explanation for this?
Am I the only one experiencing this problem, or are other people seeing it as well?
Is no one else really having this problem?
What's wrong with my computers???
7) Message boards : Number crunching : Stuck tasks (Message 1738)
Posted 1 May 2022 by Erich56
Post:
The number of "stuck" tasks here has increased markedly within the past few days :-(
Any explanation for this?
Am I the only one experiencing this problem, or are other people seeing it as well?
8) Message boards : Number crunching : Stuck tasks (Message 1737)
Posted 29 Apr 2022 by Erich56
Post:
The number of "stuck" tasks here has increased markedly within the past few days :-(
Any explanation for this?
9) Message boards : Number crunching : use hyper threading? (Message 1725)
Posted 14 Apr 2022 by Erich56
Post:
I ran a full set of 16 WUs at once on my 8-core Linux machine, but they seem to finish a lot more slowly than if I let only 8 tasks run at once. It is a bit difficult to compare, since the run times are usually all over the place, but on average I would say they finish more quickly if only one task per actual CPU core is assigned. Does anybody have any data about this? Does hyper-threading make sense with this project? The 16 GB of memory is sufficient in both cases.
You are indeed running 16 WUs concurrently with 16 GB RAM?
So the RAM requirements seem to differ substantially between Linux and Windows. My Windows machines with 8 GB RAM (and 6 "real" cores) don't allow more than 3 WUs to run concurrently. For each WU, 1,900 MB of RAM is allocated, although the peak working set size does not exceed about 58 MB. This is really too bad, as I can use only half of the CPU capacity on 3 such machines :-(
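The 3-WU limit is plain division. A minimal sketch, assuming the BOINC client budgets each task by its allocated 1,900 MB and admits tasks only while the total stays under a memory limit expressed as a percentage of RAM (BOINC's memory preferences use percentages); the 90% figure and the function name are hypothetical, not values read from these machines:

```python
import math

def max_concurrent_tasks(total_ram_mb: float, per_task_mb: float, usable_fraction: float) -> int:
    """How many tasks fit if each one reserves per_task_mb of RAM.

    usable_fraction is an ASSUMED memory limit (0.9 means 90% of RAM),
    not a setting read from the machines in this post.
    """
    usable_mb = total_ram_mb * usable_fraction
    return math.floor(usable_mb / per_task_mb)

ram_mb = 8 * 1024   # 8 GB machine
per_wu_mb = 1900    # RAM allocated per WU, as reported above
cores = 6           # "real" cores

tasks = max_concurrent_tasks(ram_mb, per_wu_mb, usable_fraction=0.9)
print(tasks, f"tasks, {tasks / cores:.0%} of the cores")
```

Plugging in the 58 MB peak working set instead would give a limit far above the core count, so the allocated figure, not the actual use, appears to be what caps concurrency here.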
10) Message boards : Number crunching : High failure rate (Message 1722)
Posted 6 Apr 2022 by Erich56
Post:
Downloaded another batch today, same result: all but one failed quickly with the same error I mentioned above. One unit was different; it ran for 21:33 and then errored out with -108 (0xFFFFFF94) ERR_FOPEN.
I tried to attach a different machine to see if that helped, but it would not allow me to join that one.
Did you get your problem solved in the meantime?

I recently faced the same problem on one of my machines, at the time when there was a server problem lasting about a day. I suspected that, because of this server problem, one of the downloaded tasks arrived here corrupted and damaged the Oracle VM.
I tried to remove remnants of the crashed task in the VM media manager, but nothing was shown there. Still, I always received the same error message you cited. So I removed and reinstalled the VM, but the error still showed up.
Then I wanted to remove the VM again, but it was somehow damaged and could no longer be removed.
Finally, all I could do was a complete clean reinstallation of Windows 10 :-(
Now everything works well. It was interesting to see what severe damage a corrupt file can cause.
11) Message boards : Number crunching : Stuck tasks (Message 1716)
Posted 31 Mar 2022 by Erich56
Post:
The tool BoincTasks revealed something strange this morning:

I noticed that for one of the running tasks, a CPU usage of 131.47% was shown.
So I looked up the task properties in the BOINC Manager and saw that the task had a runtime of 1:07 hrs and a CPU time of 1:28 hrs. How come?
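The 131% figure follows from the two times shown, assuming BoincTasks computes CPU usage as CPU time divided by elapsed time (an assumption about the tool, not something it documents here): 88 minutes of CPU time over 67 minutes of runtime is about 131%. A value above 100% would presumably mean the task's processes kept more than one core busy on average, which a multi-threaded VirtualBox VM can do. A minimal sketch of the arithmetic:

```python
def parse_hm(text: str) -> int:
    """Convert an 'H:MM' duration string to seconds."""
    hours, minutes = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60

def cpu_percent(elapsed_s: int, cpu_s: int) -> float:
    """CPU time as a percentage of wall-clock (elapsed) time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return 100.0 * cpu_s / elapsed_s

# Values from the task properties: runtime 1:07, CPU time 1:28
print(f"{cpu_percent(parse_hm('1:07'), parse_hm('1:28')):.1f}%")
```

This gives about 131.3%; the small gap to 131.47% is expected, since the BOINC Manager shows the times rounded to whole minutes.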
12) Message boards : Number crunching : Stuck tasks (Message 1715)
Posted 27 Mar 2022 by Erich56
Post:
Have you tried BoincTasks?
https://efmer.com/boinctasks/download-boinctasks/

It is easy to install, and runs along with BOINC Manager. In fact, you can use it to control the BOINC tasks instead of BM if you want to, but that is not necessary.
It gives you an easy indication of the % of the CPU that any given task is using. So if it is not using much (i.e., will take forever to finish), then that will be easy to see.
And you can then easily abort the task.

I use it on my Win10 machine to monitor not only the work units on it, but also all my Ubuntu machines on the LAN. It makes them all readily available.
Thanks, Jim, for the hint. I have now installed it on the PC on which I run the highest number of concurrent tasks.
And yes, it helps to see at a glance what's going on.

Still, of course, the problem itself remains.

So does the "postponed ..." problem, which in a way is even worse, since it prevents new tasks from being downloaded :-(
13) Message boards : Number crunching : Stuck tasks (Message 1713)
Posted 24 Mar 2022 by Erich56
Post:
I've had quite a number of such "stuck tasks" lately.
Whereas before, what mostly happened was that CPU activity stopped after a few seconds (while the task kept running for hours and hours, until I noticed from an unusually high runtime that something must be wrong), I recently had cases like this one: https://quchempedia.univ-angers.fr/athome/result.php?resultid=10319732.
Unfortunately, I did not notice until almost 9 hours had passed that this task must be faulty; I then checked the task properties in the BOINC Manager and saw that the CPU time was only 3 hrs 47 mins.
This kind of behaviour seems new, at least to me.
With several tasks running concurrently on 7 computers, it is of course difficult to monitor everything continuously in order to detect such faulty tasks early.
And as these faulty tasks become more frequent, it is rather annoying, of course :-(
14) Message boards : Number crunching : Stuck tasks (Message 1708)
Posted 11 Mar 2022 by Erich56
Post:
I had such a "stuck task" last night:

https://quchempedia.univ-angers.fr/athome/result.php?resultid=10177061

Unfortunately, I found out only this morning, after 11.5 hours of runtime and only 21 seconds of CPU time.

Too bad that such a task does not stop automatically once it becomes faulty.
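Since the task did not stop by itself, the kind of check BoincTasks makes visible could be scripted. A minimal sketch, assuming elapsed and CPU times per task are already available (for example read off the BOINC Manager); the `TaskTimes` type, the `looks_stuck` function and both thresholds are hypothetical illustrations, not BOINC or project settings:

```python
from dataclasses import dataclass

@dataclass
class TaskTimes:
    name: str
    elapsed_s: float  # wall-clock runtime in seconds
    cpu_s: float      # accumulated CPU time in seconds

def looks_stuck(task: TaskTimes, min_elapsed_s: float = 1800,
                min_cpu_ratio: float = 0.1) -> bool:
    """Flag a task that has run for a while but barely used the CPU.

    Both thresholds are HYPOTHETICAL defaults to be tuned, not BOINC settings.
    """
    if task.elapsed_s < min_elapsed_s:
        return False  # too early to judge
    return task.cpu_s / task.elapsed_s < min_cpu_ratio

# The task above: 11.5 hours of runtime, 21 seconds of CPU time
stuck = TaskTimes("resultid=10177061", elapsed_s=11.5 * 3600, cpu_s=21)
# A made-up healthy task for comparison (hypothetical numbers)
healthy = TaskTimes("example", elapsed_s=4 * 3600, cpu_s=3.8 * 3600)
print(looks_stuck(stuck), looks_stuck(healthy))
```

A ratio over the whole lifetime reacts slowly to tasks that stall after working normally for hours; comparing CPU time between two readings taken some minutes apart would catch those sooner.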
15) Message boards : Number crunching : Inconclusive validation (Message 1706)
Posted 10 Mar 2022 by Erich56
Post:
There is another interesting phenomenon:

The three computers I have attached to this project so far have CPUs of quite different generations:

The oldest is an Intel i3-3110M CPU @ 2.40GHz: this one produces the fewest inconclusives.
Next comes an Intel i5-6300U CPU @ 2.40GHz: this one produces a few more inconclusives.
Last but not least, an AMD Ryzen 5 4500U @ 2.30GHz: this is clearly the one with the most inconclusives.

Is this only coincidence, or could it be that the better the CPU, the more inconclusives come out? That would be crazy, wouldn't it?
16) Message boards : Number crunching : Inconclusive validation (Message 1704)
Posted 10 Mar 2022 by Erich56
Post:
I joined 2 days ago and have since attached 3 of my computers to the project.
What catches my eye is the large number of "inconclusives": about a quarter of the tasks. In none of the projects I have been part of so far have I seen that many "inconclusives".

I have now looked at other members' contributions and noticed that some have only a few "inconclusives", while others have quite a lot, more than half of their tasks.
So the success of a task seems to depend on the individual computer.

Question: will the "inconclusive" status of a task ever change, or will it stay like this forever?
17) Message boards : Number crunching : ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time (Message 1703)
Posted 9 Mar 2022 by Erich56
Post:
I have now exchanged the vboxwrapper file as described above, i.e. vboxwrapper_26203_windows_x86_64.exe now runs under the name vboxwrapper_26200_windows_x86_64.exe, and three tasks have started processing "normally".
So I will soon find out whether or not this eliminates the "postponed" problem.
18) Message boards : Number crunching : ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time (Message 1702)
Posted 9 Mar 2022 by Erich56
Post:
It would be the job of the project developers to test those vboxwrappers and distribute them to the clients.
As long as this is not done, volunteers could use the following steps as a workaround:

1. Download an alternative vboxwrapper from the page mentioned above (or use one you got from another project, e.g. LHC@home)
2. Start the BOINC client but suspend computing
3. Change to the project directory, e.g. projects/www.cosmologyathome.org, and replace the vboxwrapper there with the test version; the filename must be the name of the old vboxwrapper
4. Resume computing -> check the logfiles of tasks started after the patch
The vboxwrapper used here is vboxwrapper_26200_windows_x86_64.exe. The one from the link above is newer, and it is the same one LHC uses: vboxwrapper_26203_windows_x86_64.exe.
So, when replacing the 26200 version with the 26203 version, the newer one needs to be renamed to 26200?
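Steps 3 and 4 of the quoted workaround amount to copying the newer wrapper over the old file name. A minimal sketch in Python, to be run only while computing is suspended; the function name and paths are hypothetical, and keeping a `.bak` copy of the original is an added precaution, not part of the quoted steps. The BOINC client may still replace or reject a modified application file, so treat this as an experiment:

```python
import shutil
from pathlib import Path

def swap_wrapper(project_dir: str, new_wrapper: str, old_name: str) -> Path:
    """Copy new_wrapper over project_dir/old_name, keeping a .bak of the original.

    Run only while BOINC computing is suspended. All paths are hypothetical.
    """
    target = Path(project_dir) / old_name
    backup = target.with_name(target.name + ".bak")
    if not target.exists():
        raise FileNotFoundError(target)
    if not backup.exists():  # keep the very first original, even if run twice
        shutil.copy2(target, backup)
    shutil.copy2(new_wrapper, target)  # the new binary now carries the old name
    return target

# Hypothetical usage (placeholder paths; adjust to your BOINC data directory):
# swap_wrapper(r"C:\ProgramData\BOINC\projects\<project dir>",
#              r"C:\Downloads\vboxwrapper_26203_windows_x86_64.exe",
#              "vboxwrapper_26200_windows_x86_64.exe")
```

Renaming back is then just copying the `.bak` file over the swapped one, again with computing suspended.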
19) Message boards : Number crunching : ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time (Message 1701)
Posted 9 Mar 2022 by Erich56
Post:
Yesterday, I finally managed to attach to this project.
Since then, I've had several such cases with the "postponed" issue.
The even worse thing, though, is that as long as the faulty task is not removed manually, no new tasks are downloaded. The BOINC event log says "...don't need", regardless of how big the buffer is in the settings (even several days).
So in each such "postponed" case one has to abort the task manually; only then can new tasks be downloaded. Which is nonsense, of course.
I hope the project team can iron out this problem ASAP.
20) Questions and Answers : Getting started : Problem joining QuChem (Message 1677)
Posted 24 Feb 2022 by Erich56
Post:
Yes, I did.

What else have you done that you haven't told us?
I did not think it was necessary to point out from the start that I had stopped and then restarted BOINC. This normally goes without saying, doesn't it?

And to answer your question: I did nothing else, since according to the instructions there was nothing else to do, unless I was misreading the instructions or misunderstanding something.



©2024 Benoit DA MOTA - LERIA, University of Angers, France