21)
Message boards :
Number crunching :
"Multithreading" in prefs
(Message 900)
Posted 21 Jun 2020 by xii5ku
Sorry for reviving an old thread, but the issue at hand still applies, so here we go:
dannyridel wrote:
> I've set the multithread settings to a max of 4 CPUs. Somehow I keep on getting work that runs on 1 CPU core only.
Check the apps.php page, i.e. "Computing -> Applications" at the top of the web page. At this time, it indicates that:
- the "NWChem long" application is single-threaded on Windows;
- until the 2t/4t/8t versions are promoted out of beta status, the "NWChem long" application is single-threaded on Linux unless "Run test applications?" is set to Yes in the project preferences.
At least that's how I understand it.
22)
Message boards :
Number crunching :
Suspicious near-instant results with NWChem long t4
(Message 899)
Posted 21 Jun 2020 by xii5ku
Luigi R. wrote:
> xii5ku wrote:
>> On my host, each nwchem_long task takes 8.2 MBytes in /tmp. (BTW, I have completed three tasks by now, and one of these three did not remove its "pid.*" subdirectory in /tmp/ompi.*/.)
>> Besides a full /tmp, or lacking access permissions to /tmp, another potential problem source could be [...]
> What do you mean by a full /tmp? 0 bytes?
8.2 MBytes is obviously not much. If there is no space left in /tmp even for this small amount, the host may exhibit other serious problems outside of BOINC as well.
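A minimal Python sketch of that free-space check, assuming the 8.2 MBytes per task measured on one host is representative and treating "MBytes" as MiB (both assumptions); the helper names `has_room` and `tmp_has_room` are hypothetical:

```python
import shutil

# Per-task /tmp footprint xii5ku measured on one host.
# Assumption: "MBytes" means MiB, and other hosts need about the same.
TASK_TMP_BYTES = int(8.2 * 1024 * 1024)


def has_room(free_bytes: int, tasks: int = 1) -> bool:
    """True if free_bytes can hold the /tmp files of `tasks` concurrent tasks."""
    return free_bytes >= tasks * TASK_TMP_BYTES


def tmp_has_room(path: str = "/tmp", tasks: int = 1) -> bool:
    """Check the file system that holds `path` (a scripted form of `df /tmp`)."""
    return has_room(shutil.disk_usage(path).free, tasks)


if __name__ == "__main__":
    free = shutil.disk_usage("/tmp").free
    print(f"/tmp free: {free / 1024**2:.1f} MiB")
    print("room for 4 concurrent tasks:", tmp_has_room("/tmp", tasks=4))
```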
23)
Message boards :
Number crunching :
Suspicious near-instant results with NWChem long t4
(Message 896)
Posted 21 Jun 2020 by xii5ku
Besides a full /tmp, or missing access permissions on /tmp, another potential source of problems could be the TCP port that MPI (Open MPI?) uses. I have one nwchem_long task running so far, and it occupies port 38253, for example. This command may show you which ports are (or were) in use:
    cat /tmp/ompi.*/pid.*/contact.txt
So maybe those who had failures after a few seconds of run time had some conflict that prevented the use of the TCP port?
Luigi R. wrote:
> P.S. Please don't mind the errors. They were caused by bash crashes, and I solved it with an OS restart. ;)
But maybe those bash crashes were caused by nwchem_long not cleaning up properly.
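The same lookup can be sketched in Python to list the port recorded in each contact.txt. This assumes each file contains an Open MPI contact URI of the form `<id>;tcp://<addr>[,<addr>...]:<port>`; that layout is an assumption, not something confirmed in the post, and the helpers `ports_in_contact` and `scan_tmp` are hypothetical:

```python
import glob
import re

# Assumed URI layout: "<id>;tcp://<addr>[,<addr>...]:<port>".
# Greedy match up to the last ":<digits>" before a ';' or whitespace.
PORT_RE = re.compile(r"tcp://[^;\s]*:(\d+)")


def ports_in_contact(text: str) -> list[int]:
    """Extract TCP port numbers from the contents of one contact.txt file."""
    return [int(port) for port in PORT_RE.findall(text)]


def scan_tmp(pattern: str = "/tmp/ompi.*/pid.*/contact.txt") -> dict[str, list[int]]:
    """Map each matching contact.txt to the ports it mentions."""
    result: dict[str, list[int]] = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, encoding="utf-8", errors="replace") as fh:
                result[path] = ports_in_contact(fh.read())
        except OSError:
            # The file may vanish if the task finishes while we scan.
            continue
    return result


if __name__ == "__main__":
    for path, ports in scan_tmp().items():
        print(path, ports)
```

Comparing the reported ports with what else is listening on the host (for example with `ss -tlnp`) is one way to look for the kind of conflict suspected above.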
24)
Message boards :
Number crunching :
Suspicious near-instant results with NWChem long t4
(Message 894)
Posted 21 Jun 2020 by xii5ku
Alien Seeker wrote:
> I've had the problem again, this time on the other computer and with only 1 core per task. I suspect the reason this time was a full /tmp; although I didn't check the size, the problem vanished when I removed the many leftover /tmp/ompi.hostname.123/pid.1234 directories from previous computations.
crashtech wrote:
> Has there been a resolution to this issue? One of my computers only runs WUs for a few seconds, then marks them as complete.
@crashtech, maybe this host has a full /tmp (as Alien Seeker suspected for their own host). Check with "df -h /tmp", for example. Or the boinc-client service on this host is set up in a way that does not permit it to create files outside of its data directory, or at least not in /tmp. What does /lib/systemd/system/boinc-client.service contain on this host?
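Both suspicions can be checked with a small Python sketch. `stale_session_dirs` lists leftover `pid.*` directories whose process no longer runs, assuming the number in `pid.NNNN` is the PID of the process that created the directory. `tmp_blocked_by_unit` is a rough heuristic for the service file: it flags systemd's `ProtectSystem=strict` (which mounts the file system read-only for the service) when `/tmp` is not listed in `ReadWritePaths=`; treating `PrivateTmp=true` as making `/tmp` writable again is an assumption. All function names are hypothetical, and the script only reports, it deletes nothing:

```python
import glob
import os
import re

# Assumption: the number in ".../pid.NNNN" is the PID of the creating process.
PID_DIR_RE = re.compile(r"^pid\.(\d+)$")
TRUE_VALUES = ("1", "yes", "true", "on")


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        return False  # 0 and negative values address process groups
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but is owned by another user
    return True


def stale_session_dirs(pattern: str = "/tmp/ompi.*/pid.*") -> list[str]:
    """List pid.* directories whose process is gone (report only)."""
    stale = []
    for path in sorted(glob.glob(pattern)):
        match = PID_DIR_RE.match(os.path.basename(path))
        if match and os.path.isdir(path) and not pid_alive(int(match.group(1))):
            stale.append(path)
    return stale


def tmp_blocked_by_unit(unit_text: str) -> bool:
    """Heuristic: does this systemd unit leave /tmp read-only for the service?"""
    settings: dict[str, str] = {}
    rw_paths: list[str] = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue  # skip blanks, comments and [Section] headers
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "ReadWritePaths":
            # Strip the optional "-"/"+" prefixes and any trailing slash.
            rw_paths.extend(p.lstrip("-+").rstrip("/") for p in value.split())
        else:
            settings[key] = value
    strict = settings.get("ProtectSystem", "").lower() == "strict"
    # Assumption: PrivateTmp=true gives the service its own writable /tmp.
    private_tmp = settings.get("PrivateTmp", "").lower() in TRUE_VALUES
    return strict and not private_tmp and "/tmp" not in rw_paths


if __name__ == "__main__":
    for path in stale_session_dirs():
        print("leftover:", path)
    unit = "/lib/systemd/system/boinc-client.service"
    try:
        with open(unit, encoding="utf-8") as fh:
            print("/tmp read-only for boinc-client?", tmp_blocked_by_unit(fh.read()))
    except OSError:
        print("could not read", unit)
```

The unit-file check reads only the packaged file and ignores drop-in overrides (e.g. from `systemctl edit boinc-client`); `systemctl cat boinc-client` shows the unit together with its overrides.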
©2024 Benoit DA MOTA - LERIA, University of Angers, France