Message boards : Number crunching : Monster wu
Author | Message |
---|---|
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
Yes, today we have a serious problem with the runtime estimates (140 days on my computer). We will try rolling back to the previous estimation method. |
Joined: 13 Oct 19 Posts: 87 Credit: 6,026,455 RAC: 0 |
I am getting invalids after 75 seconds of run time, "VM job unmanageable" errors, and 25-day estimated run times for WUs due in 10 days. Nothing has completed yet or looks likely to. |
Joined: 3 Oct 19 Posts: 33 Credit: 197,169 RAC: 0 |
"we think something goes wrong at your side" Maybe you do, but I find it very odd that none of the other projects on here are having any issues with VBox. I had set no new tasks, and will now remove the project from this machine's portfolio; I had not added it to any others. Good luck. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
What I meant was that the particular error I was looking at seemed to come from your side, not all errors. I also have problems on my Win7 computer with Quchempedia, LHC and other VBox projects. It fails sometimes (often, in fact), with no clear reason. Other projects have or had issues with VBox, at least at the beginning. This project is very young, and at the moment we have no manpower to continue development. |
Joined: 15 Oct 19 Posts: 2 Credit: 100,073 RAC: 0 |
Ok, so I have a runaway WU like adrianxw's. It is very long running, now over 7 days, and not using any appreciable CPU time (looking at Process Explorer). I have suspended it and am thinking of aborting it. Any last requests before it gets the axe? |
Joined: 7 Oct 19 Posts: 10 Credit: 650,307 RAC: 0 |
I too have many WUs which run for days apparently doing nothing. I usually abort them after 2 days. I would suggest that you install a hard stop after 2 days, 3 days or whatever, so the WUs will complete once that time has elapsed. In the meantime, I have set my 8 Windows machines to "no new tasks" and moved them to LHC. I will keep this project on my 2 Linux machines, as they seem to have a much higher success rate. Cheers |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
We are beginning to have a statistical distribution of runtimes, so the idea of a hard stop could be a very good compromise. Thank you for the suggestion! I have noted it as a task for next summer's internship. |
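As a rough illustration of how a hard stop could be derived from a runtime distribution, here is a minimal sketch. It is not the project's method: the function names, the 95th percentile, and the 1.5x safety margin are all assumptions chosen for the example.

```python
import statistics


def hard_stop_hours(completed_runtimes_hours, percentile=95, margin=1.5):
    """Suggest a cutoff from the runtimes of tasks that finished normally.

    Assumption: take a high percentile of observed runtimes and multiply it
    by a safety margin. Needs at least two data points.
    """
    # quantiles(n=100) returns the 99 cut points between percentiles 1..99
    cuts = statistics.quantiles(completed_runtimes_hours, n=100, method="inclusive")
    return cuts[percentile - 1] * margin


def should_stop(elapsed_hours, limit_hours):
    """Return True once a task has run for at least the cutoff."""
    return elapsed_hours >= limit_hours
```

With a cutoff computed this way, a task that has been running for 7 days would be stopped, while one at 9.5 hours would be left alone if typical runtimes are around 10 hours.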
Joined: 13 Oct 19 Posts: 87 Credit: 6,026,455 RAC: 0 |
I would not abort for at least two days. Remember, the estimated time remaining is not more accurate than the total estimated time. |
Joined: 10 Oct 19 Posts: 1 Credit: 39,416 RAC: 0 |
I noticed that the remaining time is just a number in this project. You just have to check whether each WU is keeping a CPU core busy. If the CPU time is similar to the elapsed time, it's working. Just wait until it finishes. ASUS X570 E-Gaming, AMD Ryzen 9 3950X (16 core / 32 thread, 4.4 GHz), AMD Radeon Sapphire RX 480 4GB Nitro+, Nvidia GTX 1080 Ti Gaming X Trio, 4x16 GB Corsair Vengeance RGB 3466 MHz |
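The rule of thumb above (CPU time close to elapsed time means the task is working) can be written as a small check. This is only a sketch: the function name and the 0.5 threshold are assumptions, and it presumes a task that uses a single core.

```python
def looks_stalled(cpu_seconds, elapsed_seconds, min_ratio=0.5):
    """Flag a task whose CPU time lags far behind its elapsed (wall-clock) time.

    Assumption: a healthy single-core task has a CPU/elapsed ratio near 1.0;
    the 0.5 threshold is an arbitrary illustrative value.
    """
    if elapsed_seconds <= 0:
        # Task has not started running yet, so there is nothing to judge.
        return False
    return cpu_seconds / elapsed_seconds < min_ratio
```

A task with 10 minutes of CPU time after 7 days of elapsed time would be flagged, while one with 35,000 CPU seconds after 36,000 elapsed seconds would not.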
Joined: 8 Oct 19 Posts: 13 Credit: 2,548,714 RAC: 0 |
Something happened to the tasks over the weekend. They either (1) run for a couple of seconds and end with validate errors, (2) run for days and never reach 100%, with the CPU usage dropping, or (3) get to 100% and never complete, with the GPU usage dropping. They just aren't completing like they were in the past week. |
Joined: 13 Oct 19 Posts: 87 Credit: 6,026,455 RAC: 0 |
I've not seen any changes. |
Joined: 8 Oct 19 Posts: 13 Credit: 2,548,714 RAC: 0 |
Mine aren't the vbox tasks, yours are. |
Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
89935 Have you tried limiting them to two? It works on native Linux. https://quchempedia.univ-angers.fr/athome/forum_thread.php?id=23&postid=195#195 You might be able to do four, depending on the CPU, but you will fall off a cliff at some point. It looks like you have reached it. EDIT: For example, I can run only two at a time on my Ryzen 2600, at less than 10 hours thus far. https://quchempedia.univ-angers.fr/athome/results.php?hostid=755 I am running four at a time on another Ryzen 2600, and they are up to 19 hours now. But I was running four at a time on a Ryzen 3600 without too much of a problem, probably because that has a larger L3 cache (32 MB). |
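One way to cap the number of simultaneous tasks in the BOINC client is an `app_config.xml` file placed in the project's directory, using the `project_max_concurrent` element. The linked post may describe a different method; the sketch below only generates the file contents, and the helper name `build_app_config` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_app_config(max_tasks: int) -> str:
    """Return app_config.xml contents limiting concurrent tasks for one project.

    Assumption: the limit is applied project-wide via project_max_concurrent,
    so no application name is needed.
    """
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(max_tasks)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print a config that limits the project to two running tasks.
    print(build_app_config(2))
```

After saving the output as `app_config.xml`, the client has to re-read its config files (or be restarted) before the limit takes effect.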
Joined: 11 Oct 19 Posts: 5 Credit: 2,896,554 RAC: 0 |
https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=37247 That was the work unit that was running for over 14 days on a Windows host. As you can see, a Linux host picked up the resend and finished it much quicker. I'm guessing there is an issue with the VM rather than the work units being monsters... |
©2024 Benoit DA MOTA - LERIA, University of Angers, France