New T1 native nwchem work unit affinity problem.
Author | Message |
---|---|
Joined: 29 Aug 19 Posts: 15 Credit: 159,816 RAC: 0 |
On a Linux machine that has 4 cores and runs 4 nwchem T1 tasks at a time, all 4 nwchem processes bind to core 1 and share its time. The Linux process manager shows each nwchem process using 25% of one core while the other 3 CPU cores sit idle. The T1 WUs don't interact with each other or coordinate their CPU core affinity selection. Another issue is that I have no way to force the server to send T4 WUs so that all 4 cores get used. |
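One way to confirm the symptom described above is to read each process's allowed-CPU set directly. The following is a minimal sketch, assuming a Linux host and assuming the executable shows up under the name `nwchem` (as it does in the process list mentioned in the post); the helper names are made up for illustration.

```python
import os
from collections import Counter

PROCESS_NAME = "nwchem"  # assumption: the executable name as shown by top


def find_pids(name):
    """Return PIDs whose /proc/<pid>/comm matches `name` (Linux only)."""
    if not os.path.isdir("/proc"):
        return []
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return pids


def read_affinities(pids):
    """Return {pid: set of allowed CPUs} for the PIDs that can be queried."""
    result = {}
    if not hasattr(os, "sched_getaffinity"):
        return result  # not available on this platform
    for pid in pids:
        try:
            result[pid] = os.sched_getaffinity(pid)
        except OSError:
            continue
    return result


def contended_cores(affinities):
    """Return cores that are the sole allowed CPU of more than one process."""
    pinned = Counter(
        next(iter(cpus)) for cpus in affinities.values() if len(cpus) == 1
    )
    return sorted(core for core, count in pinned.items() if count > 1)


if __name__ == "__main__":
    affinities = read_affinities(find_pids(PROCESS_NAME))
    for pid, cpus in sorted(affinities.items()):
        print(f"pid {pid}: allowed CPUs {sorted(cpus)}")
    for core in contended_cores(affinities):
        print(f"more than one process is pinned to CPU {core} only")
```

If every nwchem process reports the same single CPU, the tasks are being pinned rather than merely scheduled badly.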
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
We stopped generating T4 WUs (not efficient, plus an affinity issue). Perhaps you have old WUs. Can you check the Bash script that is executed in your slots directory? Thank you |
Joined: 29 Aug 19 Posts: 15 Credit: 159,816 RAC: 0 |
run.sh is dated October 3, 2019 as are all *.nw files. Files in /bin folder are dated 8/30/2019 |
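The date check above can be repeated across all slots with a few lines of Python. This is only a sketch: the slots path is an assumption (it mirrors the /var/lib/boinc-client layout mentioned elsewhere in this thread), and `list_slot_files` is a hypothetical helper, not part of BOINC.

```python
import os
import time

SLOTS_DIR = "/var/lib/boinc-client/slots"  # assumption: default package layout


def list_slot_files(slots_dir, names=("run.sh",), suffixes=(".nw",)):
    """Return (path, mtime) pairs for matching files one level below slots_dir."""
    found = []
    if not os.path.isdir(slots_dir):
        return found
    for slot in sorted(os.listdir(slots_dir)):
        slot_path = os.path.join(slots_dir, slot)
        if not os.path.isdir(slot_path):
            continue
        try:
            entries = sorted(os.listdir(slot_path))
        except OSError:
            continue  # slot not readable by the current user
        for fname in entries:
            if fname in names or fname.endswith(suffixes):
                path = os.path.join(slot_path, fname)
                try:
                    found.append((path, os.path.getmtime(path)))
                except OSError:
                    continue  # file vanished or is a dangling link
    return found


if __name__ == "__main__":
    for path, mtime in list_slot_files(SLOTS_DIR):
        stamp = time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime))
        print(f"{stamp}  {path}")
```

Reading the slots directory may require running as the BOINC user or as root.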
Joined: 14 Oct 19 Posts: 7 Credit: 2,614,863 RAC: 0 |
I have the same problem on THREE different computers. I installed one of them yesterday, so all of its files and scripts must be completely fresh. (All of them seem to be running T1 Linux native tasks.) I would gladly turn on several more computers if I knew that they would work efficiently. Please let us know when Linux computers can run the tasks efficiently. Have a nice weekend!! Kindest regards, Gunnar |
Joined: 8 Oct 19 Posts: 13 Credit: 2,548,714 RAC: 0 |
Same for me for the most part. 2P system using just the 1st thread of each CPU. |
Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
I find that on Linux, the native work units run much better if you set "Max # CPUs" to 2 on your preferences page, https://quchempedia.univ-angers.fr/athome/prefs.php?subset=project , and also run a maximum of two work units at a time. You can control the maximum number downloaded at a time using the "Max # jobs" setting, but for more control over the number actually running, you can use an "app_config.xml" file placed in the "quchempedia.univ-angers.fr_athome" project folder (under /var/lib/boinc-client/projects). If you are not familiar with app_config.xml, you create it with a text editor such as Notepad and save it as an ".xml" file. It should contain: <app_config> <project_max_concurrent>2</project_max_concurrent> </app_config> Then do a "read config files", or just reboot your computer, to activate it. Other numbers of concurrent tasks may work better, but two works relatively well for me. It has turned work units that were impossible to run into successes for me, both t1 and t2, on several machines. |
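The file described above is small enough to generate from a script. Below is a minimal sketch that builds the same app_config.xml content; the folder path is the one given in the post, while `build_app_config` and `write_app_config` are hypothetical helper names. By default it only prints the XML and writes nothing.

```python
import os

# Path as given in the post above; adjust it if your BOINC data directory differs.
PROJECT_DIR = "/var/lib/boinc-client/projects/quchempedia.univ-angers.fr_athome"


def build_app_config(max_concurrent):
    """Return app_config.xml text limiting the project's concurrent tasks."""
    value = int(max_concurrent)
    if value < 1:
        raise ValueError("max_concurrent must be at least 1")
    return (
        "<app_config>\n"
        f"  <project_max_concurrent>{value}</project_max_concurrent>\n"
        "</app_config>\n"
    )


def write_app_config(project_dir, max_concurrent):
    """Write app_config.xml into project_dir and return its path."""
    path = os.path.join(project_dir, "app_config.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_app_config(max_concurrent))
    return path


if __name__ == "__main__":
    # Print only; call write_app_config(PROJECT_DIR, 2) to create the file.
    print(build_app_config(2), end="")
```

Writing into the project folder usually needs the same permissions as the BOINC client, and the client still has to re-read its config files afterwards.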
Joined: 29 Aug 19 Posts: 15 Credit: 159,816 RAC: 0 |
This was added to my configuration 30+ days ago and will do nothing to correct the issue discussed in this thread. The T1 WU's are all attaching to a single core and even max_concurrent = 2 will still leave a single CPU core unused. |
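Since limiting concurrency does not change which core a task binds to, one possible stopgap is to spread the already running processes over the available cores by hand. The sketch below is an assumption-laden illustration, not something the project provides: it takes PIDs on the command line, plans a round-robin assignment, and only prints the plan. `plan_affinity` and `apply_affinity` are hypothetical names, and `os.sched_setaffinity` is Linux only.

```python
import os
import sys


def plan_affinity(pids, cores):
    """Assign one core per PID, round-robin over the available cores."""
    cores = sorted(cores)
    if not cores:
        raise ValueError("no cores available")
    return {pid: cores[i % len(cores)] for i, pid in enumerate(sorted(pids))}


def apply_affinity(plan):
    """Pin each PID to its planned core (needs the same user or root)."""
    for pid, core in plan.items():
        # Affects the thread whose ID equals pid, i.e. the main thread.
        os.sched_setaffinity(pid, {core})


if __name__ == "__main__":
    pids = [int(arg) for arg in sys.argv[1:] if arg.isdigit()]
    if hasattr(os, "sched_getaffinity"):
        cores = os.sched_getaffinity(0)
    else:
        cores = range(os.cpu_count() or 1)
    # Dry run: print the plan; call apply_affinity(plan) to actually pin.
    for pid, core in plan_affinity(pids, cores).items():
        print(f"pid {pid} -> CPU {core}")
```

Whether the tasks keep such a setting depends on how run.sh starts them, so treat this as a diagnostic aid rather than a fix.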
Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
"The T1 WU's are all attaching to a single core and even max_concurrent = 2 will still leave a single CPU core unused." It works for me. Running the "top" command shows four cores fully utilized by "nwchem". Maybe it is not the same for your CPU architecture? |
Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
I have 16 virtual cores by the way, if it makes a difference. |
Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
I now see some long ones (25 days estimate) in the buffer. So this procedure may not be a fix for them. We will see. |
©2024 Benoit DA MOTA - LERIA, University of Angers, France