How to run native Linux without errors

Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 239 - Posted: 31 Oct 2019, 12:43:11 UTC

Since I am now getting only repair jobs, which usually run perfectly on my four Ryzen PCs, it appears that many people have not configured their machines to run this project.
It takes a bit of experimentation, and I have explained here what works for me:
https://quchempedia.univ-angers.fr/athome/forum_thread.php?id=23&postid=195

The only errors I see now are the very short ones; not a problem.
https://quchempedia.univ-angers.fr/athome/results.php?hostid=331
https://quchempedia.univ-angers.fr/athome/results.php?hostid=454
https://quchempedia.univ-angers.fr/athome/results.php?hostid=455
https://quchempedia.univ-angers.fr/athome/results.php?hostid=550

And when the long time estimation problem gets fixed, I can do more.
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 247 - Posted: 4 Nov 2019, 9:46:18 UTC - in response to Message 239.  

Hi Jim!

Thanks for your help and motivation. To help you understand, here are a few facts. The code runs until a convergence criterion is met, which makes the runtime unpredictable, and there are no checkpoints. So the cases can be broken down as follows:
1/ successful jobs (from short to very, very long runtimes, i.e. from a couple of minutes to days)
2/ technical issue (very short runtime at initialization, or randomly during the run; a client-side problem most of the time)
3/ chemical issue = bad chemical question (very short runtime)
4/ convergence issue = sometimes the method fails to find a solution (very, very long runtime)
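
For anyone who wants to sort their own results along these lines, here is a minimal sketch in Python. It is only an illustration: the function name, the labels and the 120-second cutoff for "very short" are assumptions made for the example, not values used by the project, and a failed long run cannot really be told apart from a mid-run technical problem by runtime alone.

```python
def classify_result(succeeded: bool, run_seconds: float,
                    short_cutoff: float = 120.0) -> str:
    """Roughly sort a finished task into the cases listed above.

    short_cutoff is a hypothetical threshold for a "very short" run;
    pick whatever fits the results you see on your own hosts.
    """
    if run_seconds < 0:
        raise ValueError("run time cannot be negative")
    if succeeded:
        # Case 1: successful jobs can take minutes or days.
        return "successful"
    if run_seconds <= short_cutoff:
        # Cases 2 and 3 both fail quickly and look alike from runtime alone.
        return "short failure (technical or chemical issue)"
    # Case 4, or a technical problem that hit partway through the run.
    return "long failure (convergence or mid-run technical issue)"


if __name__ == "__main__":
    print(classify_result(False, 15.0))
    print(classify_result(False, 90000.0))
```

The point of keeping the short failures in one bucket is that only the project side can tell a bad chemical question from a client problem.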
Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 249 - Posted: 4 Nov 2019, 12:26:55 UTC - in response to Message 247.  

Thanks for the explanation. Normally, for the t1 work units, the CPU time is approximately the same as the run time, and for the t2 work units it is about twice the run time. I suppose that is the expected behavior.

Occasionally that rule is violated: a t1 work unit may have a CPU time of twice its run time, while conversely a t2 work unit may have a CPU time equal to its run time. I suppose that is an example of the "core affinity" problem, but I have not checked it out further. However, it is not a major problem on my machines.
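
The rule of thumb above can be checked with a few lines of Python. This is just a sketch under my own assumptions: the function names are made up, the 25% tolerance is arbitrary, and the CPU and run times have to be copied by hand from the results page.

```python
def cpu_ratio(cpu_seconds: float, run_seconds: float) -> float:
    """CPU time divided by run (wall-clock) time for one work unit."""
    if run_seconds <= 0:
        raise ValueError("run time must be positive")
    return cpu_seconds / run_seconds


def follows_rule(threads: int, cpu_seconds: float, run_seconds: float,
                 tolerance: float = 0.25) -> bool:
    """True if the ratio is close to the thread count (1 for t1, 2 for t2).

    tolerance is a hypothetical relative margin, not a project value.
    """
    ratio = cpu_ratio(cpu_seconds, run_seconds)
    return abs(ratio - threads) <= tolerance * threads


if __name__ == "__main__":
    # A t1 unit whose CPU time is twice its run time breaks the rule.
    print(follows_rule(1, 7200.0, 3600.0))
    # A t2 unit with CPU time near twice its run time follows it.
    print(follows_rule(2, 7000.0, 3600.0))
```

A work unit that fails this check is only a candidate for the affinity problem; normal variation in the work could also explain it.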

But I think the machines with hyper-threading (Ryzens, i7-8700 for example) do better than the one machine I have that has only full cores, an i7-9700 (eight full cores).
https://quchempedia.univ-angers.fr/athome/results.php?hostid=327
It seems that its run times are longer than you would expect on the i7-9700, though I have not done very many work units on it and it may just be the normal variation in the work being done.
But it appears that HT helps.
Dataman
Joined: 7 Oct 19
Posts: 10
Credit: 650,307
RAC: 0
Message 250 - Posted: 4 Nov 2019, 16:07:01 UTC - in response to Message 247.  

Thanks for the information, as it helps a lot to understand the process and my short-running validation errors. My settings are 4 wu's and no limit on cores used. This works very well on my 2 x i7 8700k's and 1 x i7 8700. The 3 x Threadrippers (1950x) are not performing as well, as I often find (using device manager) that one or more of the wu's have stopped processing. I have set them to no new tasks until I have some time to look into why that is happening. I have not tried this project on my Win 10 i7 970's or 920's, as they are busy on other projects. My sole Linux Ubuntu machine (i7 920) was a dismal failure: all the wu's used 2 CPUs and started off fine, but eventually stopped using any CPU. I need to look into that also, but "She who must be obeyed" has a different agenda on how I spend my time at home. LOL

Overall, I am pleased with the project and will continue to support it.

Cheers

Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 251 - Posted: 4 Nov 2019, 16:23:16 UTC - in response to Message 250.  

My settings are 4 wu's and no limit on cores used. This works very well on my 2 x i7 8700k's and 1 x i7 8700.

Thanks. I was wondering if my i7-8700 could do more. It should, and I will give it a try.


©2024 Benoit DA MOTA - LERIA, University of Angers, France