Monster wu

Message boards : Number crunching : Monster wu

damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 98 - Posted: 10 Oct 2019, 20:21:48 UTC - in response to Message 97.  

Yes, today we have a serious problem with runtime estimates (140 days on my computer).
We will try to roll back to the previous estimation.
ID: 98 · Rating: 0
swiftmallard
Joined: 13 Oct 19
Posts: 87
Credit: 6,026,455
RAC: 0
Message 129 - Posted: 14 Oct 2019, 22:11:47 UTC - in response to Message 98.  

I am getting invalids after 75 seconds of run time, "VM job unmanageable" errors, and 25-day estimated run times for WUs due in 10 days. Nothing has completed or is likely to complete yet.
ID: 129 · Rating: 0
adrianxw
Joined: 3 Oct 19
Posts: 33
Credit: 197,169
RAC: 0
Message 207 - Posted: 23 Oct 2019, 7:05:04 UTC

>>> we think something goes wrong at your side

Maybe you do, but I find it very odd that none of the other projects on here are having any issues with VBox. I had set no new tasks, and will now remove the project from this machine's portfolio. I had not added it to any others.

Good luck.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 207 · Rating: 0
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 215 - Posted: 25 Oct 2019, 15:19:52 UTC - in response to Message 207.  

What I meant was that the particular error I was looking at seemed to come from your side. Not all errors... and I have problems with my win7 computer with Quchempedia, LHC and other Vbox projects. Sometimes (often) it fails, but there is no clear reason...

Other projects have or had issues with VBox, at least at the beginning. This project is very young, and at the moment we have no manpower to continue development.
ID: 215 · Rating: 0
Werinbert

Joined: 15 Oct 19
Posts: 2
Credit: 100,073
RAC: 0
Message 286 - Posted: 14 Nov 2019, 12:55:27 UTC

Ok, so I have a runaway WU like adrianxw's. It is very long running, now over 7 days, and not using any appreciable CPU time (looking at Process Explorer).
I have suspended it and am thinking of aborting it. Any last requests before it gets the axe?
ID: 286 · Rating: 0
Dataman
Joined: 7 Oct 19
Posts: 10
Credit: 650,307
RAC: 0
Message 287 - Posted: 14 Nov 2019, 14:24:35 UTC

I too have many WUs which run for days, apparently doing nothing. I usually abort them after 2 days. I would suggest that you install a hard stop after 2 days, 3 days or whatever, so the WUs will end once that time has elapsed.

In the meantime, I have set my 8 Windows machines to "no new tasks" and moved them to LHC. I will keep this project on my 2 Linux machines, as they seem to have a much higher success rate.

Cheers

ID: 287 · Rating: 0
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 288 - Posted: 15 Nov 2019, 7:13:45 UTC - in response to Message 287.  

We are beginning to have a statistical distribution of runtimes, so the idea of a hard stop could be a very good compromise. Thank you for the suggestion! I have noted it as a task for the internship next summer.
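
As a rough illustration of how a hard stop could be derived from a runtime distribution, here is a minimal Python sketch. It is not the project's method: the function names, the choice of the 99th percentile, and the 1.5x safety margin are all assumptions made for the example.

```python
import statistics


def hard_stop_cutoff(runtimes_hours, percentile=99, margin=1.5):
    """Return a hypothetical hard-stop limit in hours.

    Takes the requested percentile of the observed runtimes of
    successful tasks and multiplies it by a safety margin.
    Both defaults are illustrative assumptions.
    """
    if len(runtimes_hours) < 2:
        raise ValueError("need at least two observed runtimes")
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # With n=100, quantiles() returns the 99 cut points P1..P99.
    cut_points = statistics.quantiles(runtimes_hours, n=100, method="inclusive")
    return cut_points[percentile - 1] * margin


def should_abort(elapsed_hours, cutoff_hours):
    """True once a task has run longer than the hard-stop limit."""
    return elapsed_hours > cutoff_hours
```

With a cutoff computed this way, a task that ran for several days while typical tasks finish in hours would be stopped, while ordinary slow tasks would still be allowed to complete.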
ID: 288 · Rating: 0
[VENETO] boboviz

Joined: 13 Sep 19
Posts: 69
Credit: 399,347
RAC: 0
Message 296 - Posted: 19 Nov 2019, 7:17:22 UTC

Another big boy
8% after 37 hours.
Continue to crunch or abort?
ID: 296 · Rating: 0
swiftmallard
Joined: 13 Oct 19
Posts: 87
Credit: 6,026,455
RAC: 0
Message 298 - Posted: 19 Nov 2019, 13:35:44 UTC - in response to Message 296.  

I would not abort for at least two days. Remember, the estimated time remaining is no more accurate than the total estimated time.
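
For reference, a naive linear projection from the figures in Message 296 (8% done after 37 hours) can be sketched as below. It assumes the reported progress grows linearly with time, which this thread suggests is not reliable for this project, so treat the result as a rough guide only. The function name is made up for the example.

```python
def naive_total_hours(elapsed_hours, fraction_done):
    """Linear projection of total runtime from progress so far.

    Assumes progress is proportional to elapsed time, which may not
    hold for these work units.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


total = naive_total_hours(37, 0.08)
print(f"projected total: {total:.0f} hours ({total / 24:.0f} days)")
```

That projection comes out at roughly 460 hours, or about 19 days.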
ID: 298 · Rating: 0
Alessio Susi
Joined: 10 Oct 19
Posts: 1
Credit: 39,416
RAC: 0
Message 300 - Posted: 20 Nov 2019, 20:42:54 UTC

I noticed that the remaining time is just a number in this project. You just have to check whether each WU is actually using a CPU core. If the CPU time is similar to the elapsed time, it's working. Just wait until it finishes.
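
The check described above can be written as a small Python sketch. The function names and the 0.5 threshold are assumptions for illustration. The two inputs are the CPU time and elapsed time of a task, in seconds, as read from the BOINC client.

```python
def cpu_efficiency(cpu_seconds, elapsed_seconds):
    """Ratio of CPU time to wall-clock time for a single-core task."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return cpu_seconds / elapsed_seconds


def looks_stalled(cpu_seconds, elapsed_seconds, threshold=0.5):
    """True if the task uses far less CPU than its elapsed time suggests.

    The threshold is an arbitrary illustrative value, not a project rule.
    """
    return cpu_efficiency(cpu_seconds, elapsed_seconds) < threshold
```

A task with 35,000 s of CPU time over 36,000 s elapsed is working. One with ten minutes of CPU time after seven days is almost certainly stuck.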
ASUS X570 E-Gaming
AMD Ryzen 9 3950X, 16 core / 32 thread 4.4 GHz
AMD Radeon Sapphire RX 480 4GB Nitro+
Nvidia GTX 1080 Ti Gaming X Trio
4x16 GB Corsair Vengeance RGB 3466 MHz

ID: 300 · Rating: 0
mmonnin

Joined: 8 Oct 19
Posts: 13
Credit: 2,548,714
RAC: 0
Message 301 - Posted: 24 Nov 2019, 22:57:52 UTC

Something happened to the tasks over the weekend. They either
1 - Run for a couple of seconds with validate errors.
2 - Run for days and never reach 100% with the CPU usage dropping.
3 - Get to 100% and never complete with the GPU usage dropping.

They just aren't completing like they were in the past week.
ID: 301 · Rating: 0
swiftmallard
Joined: 13 Oct 19
Posts: 87
Credit: 6,026,455
RAC: 0
Message 302 - Posted: 25 Nov 2019, 3:08:24 UTC

I've not seen any changes
ID: 302 · Rating: 0
mmonnin

Joined: 8 Oct 19
Posts: 13
Credit: 2,548,714
RAC: 0
Message 303 - Posted: 25 Nov 2019, 3:47:25 UTC

Mine aren't the vbox tasks, yours are.
ID: 303 · Rating: 0
[VENETO] boboviz

Joined: 13 Sep 19
Posts: 69
Credit: 399,347
RAC: 0
Message 304 - Posted: 27 Nov 2019, 13:07:19 UTC

89935
Completed after 33 hours.
I need a CPU with more IPC!!
ID: 304 · Rating: 0
Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 305 - Posted: 27 Nov 2019, 16:29:26 UTC - in response to Message 304.  
Last modified: 27 Nov 2019, 16:47:16 UTC

>>> 89935
>>> Completed after 33 hours.
>>> I need a CPU with more IPC!!

Have you tried limiting them to two? It works on native Linux.
https://quchempedia.univ-angers.fr/athome/forum_thread.php?id=23&postid=195#195

You might be able to do four, depending on the CPU, but you will fall off a cliff at some point.
It looks like you have reached it.

EDIT: For example, I can run only two at a time on my Ryzen 2600, at less than 10 hours thus far.
https://quchempedia.univ-angers.fr/athome/results.php?hostid=755
I am running four at a time on another Ryzen 2600, and they are up to 19 hours now.

But I was running four at a time on a Ryzen 3600 without too much of a problem, probably because that has a larger L3 cache (32 MB).
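
One standard way to cap the number of simultaneously running tasks in the BOINC client is an app_config.xml file with a project_max_concurrent element, placed in the project's folder inside the BOINC data directory. Whether the linked post uses exactly this mechanism is an assumption here. The helper below is a hypothetical Python sketch that only generates the file's contents.

```python
import xml.etree.ElementTree as ET


def build_app_config(max_concurrent):
    """Return the text of an app_config.xml limiting running tasks.

    Uses the project-wide <project_max_concurrent> element, so no
    application name has to be guessed.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    root = ET.Element("app_config")
    limit = ET.SubElement(root, "project_max_concurrent")
    limit.text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Limit the project to two tasks at a time, as suggested above.
    print(build_app_config(2))
```

After saving the output as app_config.xml, the client has to re-read its config files (or be restarted) for the limit to take effect.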
ID: 305 · Rating: 0
Coleslaw
Joined: 11 Oct 19
Posts: 5
Credit: 2,896,554
RAC: 0
Message 307 - Posted: 30 Nov 2019, 14:48:39 UTC

I have a work unit that has been running for 14 days and is now past its deadline, another that has run for 11 days, and a third that has run for almost 9 days. If we are to continue getting monster work units, the completion deadline should be reassessed.
ID: 307 · Rating: 0
Coleslaw
Joined: 11 Oct 19
Posts: 5
Credit: 2,896,554
RAC: 0
Message 309 - Posted: 1 Dec 2019, 4:20:40 UTC

https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=37247

That was the work unit that was running for over 14 days on a Windows host. As you can see, a Linux host picked up the resend and finished it much more quickly. I'm guessing there is an issue with the VM rather than the work units being monsters...
ID: 309 · Rating: 0

©2024 Benoit DA MOTA - LERIA, University of Angers, France