Application 0.15 (t1) (beta test) Result Not Completing Successfully
Author | Message |
---|---|
Send message Joined: 7 Oct 19 Posts: 8 Credit: 2,016,759 RAC: 0 |
I would if I could ... lol |
Send message Joined: 14 Dec 19 Posts: 68 Credit: 45,744,261 RAC: 0 |
Just got my first t8 Long to validate with credit. It ran for 73 hours. |
Send message Joined: 13 Sep 19 Posts: 69 Credit: 399,347 RAC: 0 |
"It ran for 73 hours." Which CPU? |
Send message Joined: 14 Dec 19 Posts: 68 Credit: 45,744,261 RAC: 0 |
Xeon E5-2686 v4. Not sure how CPU time is so much higher than run time; logic dictates that run time >= CPU time. https://quchempedia.univ-angers.fr/athome/result.php?resultid=1719940 https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1101030 My wingman ran for 93 hours on an AMD Ryzen 7 1700 Eight-Core Processor. I wonder if this t8 WU was the only thing running on this 8c CPU? |
Send message Joined: 4 Jan 20 Posts: 60 Credit: 516,736 RAC: 0 |
For a multi-core work unit, CPU time ≤ run time × number of CPUs used. This is the main difference between a single-core and a multi-core work unit. In this case, both computers satisfy the relation. Computer 1 (ID 930, Linux Mint Tricia, 3.8 billion floating-point operations): CPU time 263313.10 ≤ run time 46692 × 8 = 373,536. Computer 2 (ID 1170, Arch Linux, 4.7 billion floating-point operations): CPU time 334408.30 ≤ run time 45903 × 8 = 367,224. If we define CPU efficiency as CPU time / (run time × number of CPUs used): computer 1 = 263313.10 / (46692 × 8) = 70 %; computer 2 = 334408.30 / (45903 × 8) = 91 %. So for this one work unit, computer 2 seems to behave better. But it is true that we don't know the real conditions the two computers were facing, so it is too early to draw conclusions. More results are needed to understand the situation better. (For those who want to optimize as well as possible: not a race, but better energy efficiency.) |
Send message Joined: 14 Dec 19 Posts: 68 Credit: 45,744,261 RAC: 0 |
Ok, but there's no reason that two different runs should perform the same number of operations. I don't know exactly what they're simulating but usually it's two molecules placed in close proximity. Different orientations are tested seeking the configuration with lowest energy. No two simulations will get to the lowest energy configuration via the same path, even if run on the same computer. BTW, I see thousands of Longs on the Server Status but only 2 users running them. Little help... |
Send message Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
"Ok, but there's no reason that two different runs should perform the same number of operations. I don't know exactly what they're simulating but usually it's two molecules placed in close proximity. Different orientations are tested seeking the configuration with lowest energy. No two simulations will get to the lowest energy configuration via the same path, even if run on the same computer." Thomas can confirm and add more details. Here there is only one molecule per calculation, and each workunit takes a different molecule. A workunit is done in 3 phases: first, the calculation of the ground state; then, the frequencies, to check that we are in a ground state; and finally, the calculation of the excited states. So two workunits have no reason to do the same number of operations, and this is difficult to evaluate. What is more surprising is that the same workunit does not always converge towards the same result, despite the fact that the calculation should be deterministic. |
Send message Joined: 13 Oct 19 Posts: 87 Credit: 6,026,455 RAC: 0 |
I wish I could... |
Send message Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
If you can't run the longs, you aren't missing much. They are all invalid. https://quchempedia.univ-angers.fr/athome/results.php?hostid=822&offset=0&show_names=0&state=5&appid=3 |
Send message Joined: 5 Oct 19 Posts: 1 Credit: 1,387,981 RAC: 0 |
There might be something wrong with the worker.sh script. It works correctly on some computers, but fails on others. NWChem finishes correctly: "Normal termination. 04:32:23 (2933): worker.sh exited; CPU time 263313.088107 04:32:23 (2933): called boinc_finish(0)" and then worker.sh fails with: "worker.sh: line 6: kill: (2937) - No such process" |
Send message Joined: 16 Dec 19 Posts: 25 Credit: 11,938,843 RAC: 0 |
"Just got my first t8 Long to validate with credit. It ran for 73 hours." Been away, just checked too. Only 2 have validated so far; the rest are pending. One ran 59 hours, the other 62 hours. Interesting to see how they run. |
Send message Joined: 3 Jan 20 Posts: 5 Credit: 31,342,930 RAC: 0 |
"worker.sh: line 6: kill: (2937) - No such process" appears also in the valid units so does not seem to be the problem. What I have not found at the moment is a valid long wu crunched with just one core/thread (t1). More results are needed to confirm if it is actually the case. |
Send message Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
"What I have not found at the moment is a valid long wu crunched with just one core/thread (t1). More results are needed to confirm if it is actually the case." That could be. I have set up a second machine, identical to the first (both Ryzen 2600, Ubuntu 18.04.4), except that it is crunching only t2 longs. https://quchempedia.univ-angers.fr/athome/results.php?hostid=1477 That should be a good enough test. |
Send message Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
It looks like the t1 longs are beginning to validate. Chances are, they have tweaked their validator. https://quchempedia.univ-angers.fr/athome/results.php?hostid=822 Also, it seems that a t1 needs to be validated against another t1. They don't work with a t4 or t8 that I can see. |
Send message Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
I see a t1 long validated against both a t4 and a t8 now, so they are making progress. |
Send message Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
"There might be something wrong with the worker.sh script." The error message from worker.sh (No such process) is harmless. I just forgot to hide this meaningless message. |
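
For anyone who wants to repeat the CPU-efficiency arithmetic from earlier in the thread on their own results, here is a minimal Python sketch. The function name `cpu_efficiency` and the host labels are made up for illustration; the numbers are the CPU times, run times and thread count (8) quoted above for computers 930 and 1170. It only divides CPU time by (run time × threads) and says nothing about why one host was less busy than the other.

```python
def cpu_efficiency(cpu_time_s, run_time_s, n_threads):
    """Return CPU time as a fraction of the core-seconds available.

    For a multi-threaded work unit, CPU time <= run time * n_threads,
    so the result should fall between 0 and 1.
    """
    if run_time_s <= 0 or n_threads <= 0:
        raise ValueError("run time and thread count must be positive")
    return cpu_time_s / (run_time_s * n_threads)


# (label, CPU time in s, run time in s, threads) as quoted in the thread.
# The labels are hypothetical names chosen for this example.
results = [
    ("computer 930", 263313.10, 46692, 8),
    ("computer 1170", 334408.30, 45903, 8),
]

for label, cpu_s, run_s, threads in results:
    eff = cpu_efficiency(cpu_s, run_s, threads)
    print(f"{label}: {eff:.0%} of {run_s * threads} core-seconds used")
```

A value well below 100 % may just mean other tasks were sharing the cores, so it is better to compare several work units per host before drawing conclusions.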
©2024 Benoit DA MOTA - LERIA, University of Angers, France