Application 0.15 (t1) (beta test) Result Not Completing Successfully

STE\/E
Joined: 7 Oct 19
Posts: 8
Credit: 2,016,759
RAC: 0
Message 564 - Posted: 14 Feb 2020, 12:32:07 UTC

I would if I could ... lol

Aurum
Joined: 14 Dec 19
Posts: 68
Credit: 45,744,261
RAC: 0
Message 565 - Posted: 14 Feb 2020, 16:20:33 UTC
Last modified: 14 Feb 2020, 16:22:35 UTC

Just got my first t8 Long to validate with credit. It ran for 73 hours.

[VENETO] boboviz
Joined: 13 Sep 19
Posts: 69
Credit: 399,347
RAC: 0
Message 566 - Posted: 14 Feb 2020, 16:33:01 UTC - in response to Message 565.  

> It ran for 73 hours.

Which CPU?

Aurum
Joined: 14 Dec 19
Posts: 68
Credit: 45,744,261
RAC: 0
Message 567 - Posted: 14 Feb 2020, 20:44:15 UTC

Xeon E5-2686 v4. I'm not sure how the CPU time can be so much higher than the run time. Logic dictates that run time ≥ CPU time.

https://quchempedia.univ-angers.fr/athome/result.php?resultid=1719940

https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1101030

My wingman ran for 93 hours on an AMD Ryzen 7 1700 Eight-Core Processor. I wonder whether this t8 WU was the only thing running on that 8-core CPU.

PHILIPPE
Joined: 4 Jan 20
Posts: 60
Credit: 516,736
RAC: 0
Message 568 - Posted: 15 Feb 2020, 16:55:00 UTC - in response to Message 567.  

For a multi-core work unit, CPU time ≤ run time × number of CPUs used.
This is the main difference between a single-core work unit and a multi-core one.

In this case, both computers satisfy this relation:

1: computer ID 930: Linux Mint Tricia; 3.8 billion floating-point operations per second
2: computer ID 1170: Arch Linux; 4.7 billion floating-point operations per second

1: CPU time 263,313.10 s ≤ run time 46,692 s × 8 = 373,536 s
2: CPU time 334,408.30 s ≤ run time 45,903 s × 8 = 367,224 s

If we define CPU efficiency as CPU time / (run time × number of CPUs used):
1: efficiency = 263,313.10 / (46,692 × 8) ≈ 70 %
2: efficiency = 334,408.30 / (45,903 × 8) ≈ 91 %

So for this single work unit, computer 2 seems to behave better.
But it's true that we don't know the actual conditions the two computers were running under.
So it's too early to draw conclusions; more results are needed to understand the situation better.
(For those who want to optimize as well as possible... not a race, but better energy efficiency.)
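The relation and the efficiency figure above can be checked with a few lines of Python. This is a minimal sketch that only reuses the numbers quoted in this post (times in seconds, 8 threads per task); the function name `cpu_efficiency` is illustrative.

```python
def cpu_efficiency(cpu_time_s: float, run_time_s: float, n_cpus: int) -> float:
    """Fraction of the reserved core-seconds (run time x CPUs) actually used."""
    if run_time_s <= 0 or n_cpus <= 0:
        raise ValueError("run time and CPU count must be positive")
    return cpu_time_s / (run_time_s * n_cpus)

# (CPU time s, run time s, threads) for the two t8 results quoted above.
results = {
    930: (263313.10, 46692, 8),   # Linux Mint host
    1170: (334408.30, 45903, 8),  # Arch Linux host
}

for host_id, (cpu, run, n) in results.items():
    budget = run * n  # upper bound on CPU time for an n-thread task
    assert cpu <= budget, "CPU time cannot exceed run time x CPUs"
    eff = cpu_efficiency(cpu, run, n)
    print(f"host {host_id}: CPU {cpu / 3600:.1f} h, run {run / 3600:.1f} h, "
          f"efficiency {eff:.0%}")
# host 930: CPU 73.1 h, run 13.0 h, efficiency 70%
# host 1170: CPU 92.9 h, run 12.8 h, efficiency 91%
```

On these numbers, the 73 h and 93 h mentioned earlier in the thread match the CPU times of hosts 930 and 1170, while the wall-clock run times were about 13 h each. With 8 threads sharing that wall-clock time, CPU time can legitimately exceed run time.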

Aurum
Joined: 14 Dec 19
Posts: 68
Credit: 45,744,261
RAC: 0
Message 569 - Posted: 15 Feb 2020, 17:51:08 UTC

Ok, but there's no reason that two different runs should perform the same number of operations. I don't know exactly what they're simulating but usually it's two molecules placed in close proximity. Different orientations are tested seeking the configuration with lowest energy. No two simulations will get to the lowest energy configuration via the same path, even if run on the same computer.

BTW, I see thousands of Longs on the Server Status but only 2 users running them. Little help...

damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 570 - Posted: 15 Feb 2020, 21:20:56 UTC - in response to Message 569.  

> Ok, but there's no reason that two different runs should perform the same number of operations. I don't know exactly what they're simulating but usually it's two molecules placed in close proximity. Different orientations are tested seeking the configuration with lowest energy. No two simulations will get to the lowest energy configuration via the same path, even if run on the same computer.
>
> BTW, I see thousands of Longs on the Server Status but only 2 users running them. Little help...


Thomas can confirm and add more details.

Here there is only one molecule per calculation, and each workunit takes a different molecule. A workunit runs in three phases: first, the calculation of the ground state; then, the calculation of the frequencies, to check that we are actually in a ground state; and finally, the calculation of the excited states. So two workunits have no reason to perform the same number of operations, and this is difficult to estimate. What is more surprising is that the same workunit does not always converge to the same result, even though the calculation should be deterministic.
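One generic reason a calculation that is deterministic in principle can still give slightly different numbers is that floating-point addition is not associative: when a sum is split across a different number of threads (t1, t4, t8), the partial sums are combined in a different order. Whether this is what happens inside NWChem for these workunits is an assumption, not something confirmed in this thread. A minimal Python illustration, where `chunked_sum` is a hypothetical stand-in for a parallel reduction:

```python
import random

def chunked_sum(values, n_chunks):
    """Sum values the way a parallel reduction might: one partial sum per
    chunk, then a sum of the partials. The grouping depends on n_chunks."""
    if not values:
        return 0.0
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return sum(partials)

# Same three numbers, two groupings: the results differ in the last bits.
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6

# The same data summed with different "thread counts" may differ slightly.
rng = random.Random(42)
data = [rng.uniform(-1e6, 1e6) for _ in range(100_000)]
for n in (1, 2, 4, 8):
    print(n, repr(chunked_sum(data, n)))
```

Differences of this size are tiny, but in an iterative optimization they can in principle steer the run along a slightly different path.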

swiftmallard
Joined: 13 Oct 19
Posts: 87
Credit: 6,026,455
RAC: 0
Message 571 - Posted: 15 Feb 2020, 21:21:40 UTC - in response to Message 569.  

I wish I could...

Jim1348
Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 572 - Posted: 15 Feb 2020, 21:34:28 UTC - in response to Message 571.  

If you can't run the longs, you aren't missing much. They are all invalid.
https://quchempedia.univ-angers.fr/athome/results.php?hostid=822&offset=0&show_names=0&state=5&appid=3

matsu_pl
Joined: 5 Oct 19
Posts: 1
Credit: 1,387,981
RAC: 0
Message 574 - Posted: 16 Feb 2020, 15:17:44 UTC

There might be something wrong with the worker.sh script.
It works correctly on some computers, but fails on others.
NWChem finishes correctly:
Normal termination.
04:32:23 (2933): worker.sh exited; CPU time 263313.088107
04:32:23 (2933): called boinc_finish(0)
and then worker.sh fails with:
worker.sh: line 6: kill: (2937) - No such process

Zalster
Joined: 16 Dec 19
Posts: 25
Credit: 11,938,843
RAC: 0
Message 577 - Posted: 16 Feb 2020, 16:26:13 UTC - in response to Message 565.  

> Just got my first t8 Long to validate with credit. It ran for 73 hours.


Been away, just checked too. Only 2 have validated so far; the rest are pending. One ran 59 hours, the other 62 hours. Interesting to see how they run.

Trotador
Joined: 3 Jan 20
Posts: 5
Credit: 31,342,930
RAC: 0
Message 581 - Posted: 16 Feb 2020, 20:41:40 UTC

"worker.sh: line 6: kill: (2937) - No such process" also appears in the valid units, so it does not seem to be the problem.

What I have not found so far is a valid long WU crunched with just one core/thread (t1). More results are needed to confirm whether that is actually the case.

Jim1348
Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 582 - Posted: 16 Feb 2020, 22:45:42 UTC - in response to Message 581.  

> What I have not found at the moment is a valid long wu crunched with just one core/thread (t1). More results are needed to confirm if it is actually the case.

That could be. I have set up a second machine, identical to the first (both Ryzen 2600, Ubuntu 18.04.4), except that it is crunching only t2 longs.
https://quchempedia.univ-angers.fr/athome/results.php?hostid=1477

That should be a good enough test.

Jim1348
Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 583 - Posted: 17 Feb 2020, 7:42:28 UTC - in response to Message 582.  

It looks like the t1 longs are beginning to validate. Chances are, they have tweaked their validator.
https://quchempedia.univ-angers.fr/athome/results.php?hostid=822

Also, it seems that a t1 needs to be validated against another t1. As far as I can see, they don't validate against a t4 or t8.
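If, as guessed above, the validator was tweaked, one common approach for floating-point outputs is to compare results within a tolerance rather than requiring exact equality, so that runs with different thread counts can still match. The sketch below is hypothetical: the function name `energies_agree`, the idea of comparing a list of energies, the tolerances and the sample values are illustrative assumptions, not the project's validator.

```python
import math

def energies_agree(a, b, rel_tol=1e-6, abs_tol=1e-8):
    """Hypothetical fuzzy comparison of two results' energy lists.

    Bit-for-bit equality is too strict when hosts ran with different thread
    counts, so each pair of values only has to agree within a tolerance.
    The default tolerances are illustrative, not the project's settings.
    """
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
               for x, y in zip(a, b))

# Made-up energies for a t1 and a t8 run of the same workunit.
t1_result = [-76.026765, -75.812301]
t8_result = [-76.026765000001, -75.812301000004]
print(energies_agree(t1_result, t8_result))         # True
print(energies_agree(t1_result, [-76.02, -75.81]))  # False
```

The design choice is the tolerance: too tight and honest t1/t8 pairs are rejected, too loose and genuinely wrong results slip through.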

Jim1348
Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 585 - Posted: 17 Feb 2020, 9:28:39 UTC - in response to Message 583.  

I see a t1 long validated against both a t4 and a t8 now, so they are making progress.

damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 586 - Posted: 17 Feb 2020, 11:54:32 UTC - in response to Message 574.  

> There might be something wrong with the worker.sh script.
> It works correctly on some computers, but fails on others.
> NWChem finishes correctly:
> Normal termination.
> 04:32:23 (2933): worker.sh exited; CPU time 263313.088107
> 04:32:23 (2933): called boinc_finish(0)
> and then worker.sh fails with:
> worker.sh: line 6: kill: (2937) - No such process


The error message from worker.sh ("No such process") is harmless. I just forgot to hide this meaningless message.
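The message means `kill` targeted a process (PID 2937 in the log) that had already exited by the time worker.sh reached line 6. The script itself is not shown in the thread, so the following is only a Python analogue, assuming a POSIX host, of treating "No such process" as an expected outcome instead of printing it; `stop_if_running` is a hypothetical name. In a shell script, the usual equivalent is redirecting the `kill` command's stderr to `/dev/null`.

```python
import os
import signal
import subprocess
import sys

def stop_if_running(pid: int) -> bool:
    """Send SIGTERM to pid; return False quietly if it has already exited.

    Mirrors the harmless case in the log: the targeted process finished on
    its own, so "No such process" (ESRCH) is expected, not a failure.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:  # raised for errno ESRCH, "No such process"
        return False
    return True

if __name__ == "__main__":
    # A child that exits immediately; after wait() its PID has been reaped,
    # so the kill attempt below finds no such process.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()
    print(stop_if_running(child.pid))
```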

©2024 Benoit DA MOTA - LERIA, University of Angers, France