Posts by adrianxw

1) Message boards : Number crunching : Stuck tasks (Message 1775)
Posted 16 Aug 2022 by adrianxw
Post:
I'm on Windows 8.1 x64, and have just set no new tasks again. There are too many issues here. I'm seeing the "Postponed: VM job unmanageable restarting later" issue on pretty much all work units, but with the long deadline I've ignored that; they start again after 24 hours. This morning, I saw another running, 100% done, and 26 hours. The project is not suitable for production as it is (and as it was before...).
2) Message boards : Number crunching : High failure rate (Message 1769)
Posted 2 Aug 2022 by adrianxw
Post:
I re-enabled work fetch from the project to see if the earlier issues were just a memory. It downloaded 18 work units. Four jobs failed after a short period (i.e. less than two minutes), with an exit status of a helpful 0x00000000. The remainder started running, but within an hour, all had entered the "Postponed: VM job unmanageable, restarting later." state. "Later" appears to be 24 hours. With the long deadline, this appears to be tolerable; however, it simply makes a mess of the BOINC Manager screen. The exit status for these completed units is also 0x00000000, so clearly, failures are not discriminated against... I enabled work fetch again, and since doing so, four more units have arrived. I'll leave it running and see what happens.

Off topic:

This keeps appearing:

Your connection is not private
Attackers might be trying to steal your information from quchempedia.univ-angers.fr (for example, passwords, messages or credit cards). Learn more
NET::ERR_CERT_DATE_INVALID
3) Message boards : Number crunching : High failure rate (Message 1721)
Posted 3 Apr 2022 by adrianxw
Post:
Downloaded another batch today, same result: all but one failed quickly with the same error I mentioned above. One unit was different; it ran for 21:33 and then errored out with -108 (0xFFFFFF94) ERR_FOPEN.
I tried to attach a different machine to see if that helped, but it would not allow me to join that one.
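
For anyone puzzled by the two forms of the exit status quoted above, -108 and 0xFFFFFF94 are the same value: the hex form is just the 32-bit two's-complement view of the signed number. A minimal sketch of the conversion, with helper names (`to_hex32`, `to_signed32`) that are my own for illustration and not part of BOINC:

```python
def to_hex32(value: int) -> str:
    """Format a signed exit status as an unsigned 32-bit hex string."""
    # Masking with 0xFFFFFFFF gives the two's-complement bit pattern.
    return f"0x{value & 0xFFFFFFFF:08X}"


def to_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a signed integer."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the number is negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value


print(to_hex32(-108))           # 0xFFFFFF94
print(to_signed32(0xFFFFFF94))  # -108
print(to_hex32(1))              # 0x00000001
```

The same helpers reproduce the 0x00000001 and 0x00000000 codes quoted in the other posts.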
4) Message boards : Number crunching : High failure rate (Message 1719)
Posted 2 Apr 2022 by adrianxw
Post:
I re-enabled work from here this morning on one machine (Intel, Windows 8.1 x64), but all work that came crashed after about 15 seconds with...

>>> 1 (0x00000001) Unknown error code
5) Message boards : Science : NWChemEx (Message 1699)
Posted 6 Mar 2022 by adrianxw
Post:
This unit has 21+ hours on it.
6) Message boards : Science : NWChemEx (Message 1697)
Posted 6 Mar 2022 by adrianxw
Post:
Most of the work units finish reasonably quickly, but I have one on here at the moment which is showing 99.999% complete. Sounds like a faulty unit to me. Are there any LONG work units at the moment?
7) Message boards : Number crunching : Long work units. (Message 1003)
Posted 4 Aug 2020 by adrianxw
Post:
When you have several machines attached to numerous projects, spending time absorbing all threads on all forums of all projects is a non-starter. As I said, best of luck, I have detached my systems from the project.
8) Message boards : Number crunching : Long work units. (Message 988)
Posted 29 Jul 2020 by adrianxw
Post:
This machine has 15 projects in its portfolio, probably half of which have work running at any one time, and show no problems, so I doubt it is...

>>> (either busy host , or ram management unefficient , or power micro waves,...)

The other machine has a similar portfolio but without GPU projects, the GPU in that machine is older and showing signs of trouble, (arrays of black spots on part of the screen, the parts with spots move around over time, RAM issue I suspect).

The problem for me is that when your task gets into that state, it is preventing another project from using the resource. Looking at the line I posted above, that might be for a very long time. I am not available to watch it 24/7. I'll leave it at "no new tasks" for now.

Best of luck.
9) Message boards : Number crunching : Long work units. (Message 984)
Posted 28 Jul 2020 by adrianxw
Post:
I've set no new tasks. There is clearly something wrong here. The job that showed 100.000% complete and 00:00:00 remaining yesterday is still there and "running" this morning; it is approaching 9 CPU days now. The task manager is not showing 100% system in use though. I've stopped and started other projects' work units, and nothing brings it back up to 100%. I'm aborting it now.

19 Jul 2020, 10:18:01 UTC | 28 Jul 2020, 7:55:37 UTC | Aborted | 752,166.99 | 55,497.80 | --- | NWChem long v0.11 (vbox64_t1) windows_x86_64
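
To put the two figures in that task row into familiar units, here is a rough sketch. It assumes the usual BOINC task-table layout, where the first number is run time and the second is CPU time, both in seconds; the column headers are not shown here, so that reading is an assumption. The function names are mine.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400


def seconds_to_hours(seconds: float) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / SECONDS_PER_HOUR


def seconds_to_days(seconds: float) -> float:
    """Convert a duration in seconds to days."""
    return seconds / SECONDS_PER_DAY


run_time = 752_166.99   # assumed to be the run time column, in seconds
cpu_time = 55_497.80    # assumed to be the CPU time column, in seconds

print(f"run time: {seconds_to_days(run_time):.2f} days")    # run time: 8.71 days
print(f"CPU time: {seconds_to_hours(cpu_time):.1f} hours")  # CPU time: 15.4 hours
```

On that reading, the first figure is the one that matches the "approaching 9 days" mentioned in the post.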
10) Message boards : Number crunching : Long work units. (Message 979)
Posted 27 Jul 2020 by adrianxw
Post:
The time remaining has dropped to zero now, yet the task continues to run, certainly rather odd work units.
11) Message boards : Number crunching : Long work units. (Message 977)
Posted 27 Jul 2020 by adrianxw
Post:
>>> It would not be the first time someone has had a very long running task.

Indeed. I well recall climate prediction work units running for months.
12) Message boards : Number crunching : Long work units. (Message 974)
Posted 27 Jul 2020 by adrianxw
Post:
The task manager shows the CPU pretty much maxed out on all cores/threads, typically 94%, wobbling up and down 1-2%, which is what I would expect to see on these machines.
The tasks here have a ridiculously long deadline, so I guess he is expecting long runs, but the "remaining" item is WAY out of order.
13) Message boards : Number crunching : Validate error. (Message 972)
Posted 27 Jul 2020 by adrianxw
Post:
>>> Workunit 1379411

This work unit has been set "validate error" by all that have crunched it so far, after several days of CPU time each. Not happy about that.
14) Message boards : Number crunching : Long work units. (Message 970)
Posted 27 Jul 2020 by adrianxw
Post:
>>> Are you certain that your CPU is actually working on this?

Yes. I am really curious about what it is actually doing, I've asked, no reply. It is still running this morning but the remaining field is now down to 1 second.
15) Message boards : Number crunching : Long work units. (Message 966)
Posted 26 Jul 2020 by adrianxw
Post:
12 hours later and still crunching away, but now down to 00:00:03 remaining. Hours, minutes and seconds are relative with these jobs. It would be really good to know what it is they are doing, as they are using serious amounts of CPU time.
16) Message boards : Number crunching : Long work units. (Message 964)
Posted 26 Jul 2020 by adrianxw
Post:
The time remaining field is not reliable. Yesterday morning, I saw a work unit had 1:40 left to crunch, this morning, it is still there with 0:06 left to crunch. If you see a job like that, just leave it alone, the time does trickle down, and with the deadline being so long, this, and the other problem I've commented on should not become issues.
17) Message boards : Number crunching : Long work units. (Message 932)
Posted 13 Jul 2020 by adrianxw
Post:
The job restarted and has continued to run. It has a very long expiry date (October), so if it hangs for 24 hours every now and again, I don't suppose there is any harm done.
18) Message boards : Number crunching : Long work units. (Message 930)
Posted 12 Jul 2020 by adrianxw
Post:
Both my systems downloaded a work unit after doing that. One is running, the other is "Postponed: VM job unmanageable, restarting later" after 4 minutes.
19) Message boards : Number crunching : Long work units. (Message 929)
Posted 12 Jul 2020 by adrianxw
Post:
I have toggled that switch now.
20) Message boards : Number crunching : Long work units. (Message 918)
Posted 2 Jul 2020 by adrianxw
Post:
I have added the config file and set my preference to only crunch the longs, but still have not received a work unit.

©2024 Benoit DA MOTA - LERIA, University of Angers, France