21)
Message boards :
Number crunching :
Please Don't Abort WUs
(Message 933)
Posted 13 Jul 2020 by Luigi R. Post: Finally, the server has started resending my old inconclusive workunits. :)
22)
Message boards :
Number crunching :
Suspicious near-instant results with NWChem long t4
(Message 905)
Posted 22 Jun 2020 by Luigi R. Post: Luigi R. wrote: Maybe it's off-topic, but I found _bin_bash.1000.crash in /var/crash for the last bash crash. P.S. Please ignore the errors. They are caused by bash crashes, which I solved by restarting the OS. ;) But maybe those bash crashes were caused by nwchem_long not cleaning up properly. https://pastebin.com/j70fnPxW
23)
Message boards :
Number crunching :
Suspicious near-instant results with NWChem long t4
(Message 898)
Posted 21 Jun 2020 by Luigi R. Post: me wrote:
2950037504.0;tcp://192.168.1.6,192.168.1.4,172.17.0.1:60553 2084
2383937536.0;tcp://192.168.1.6,192.168.1.15,172.17.0.1:47451 10730
3311796224.0;tcp://192.168.1.6,192.168.1.4,172.17.0.1:39214 25236
3311337472.0;tcp://192.168.1.6,192.168.1.4,172.17.0.1:55077 25261
Note that 192.168.1.6 is the eth0 IP and 192.168.1.4 is the wlan0 IP. 192.168.1.15 is a wlan0 IP too, but that task is the oldest one and has been running for 11 hours.
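To compare such entries across tasks (which addresses each one advertises, and which port it uses), a small parser can help. This is a minimal sketch that assumes every line has the shape `<id>;tcp://<ip>[,<ip>...]:<port> <number>`, exactly as quoted above; the meaning of the leading id and of the trailing number is not stated in the thread, so the code keeps them as opaque fields. The names `Contact`, `LINE_RE` and `parse_contact` are hypothetical.

```python
import re
from typing import List, NamedTuple


class Contact(NamedTuple):
    ident: str            # leading field, e.g. "2950037504.0" (meaning assumed unknown)
    addresses: List[str]  # IP addresses listed after "tcp://"
    port: int             # TCP port after the colon
    trailing: int         # trailing number on the line (meaning assumed unknown)


# Hypothetical pattern matching the lines quoted in the post.
LINE_RE = re.compile(
    r"^(?P<ident>[\d.]+);tcp://(?P<ips>[\d.,]+):(?P<port>\d+)\s+(?P<tail>\d+)$"
)


def parse_contact(line: str) -> Contact:
    """Split one quoted line into its id, addresses, port and trailing number."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    return Contact(m["ident"], m["ips"].split(","), int(m["port"]), int(m["tail"]))


if __name__ == "__main__":
    sample = [
        "2950037504.0;tcp://192.168.1.6,192.168.1.4,172.17.0.1:60553 2084",
        "2383937536.0;tcp://192.168.1.6,192.168.1.15,172.17.0.1:47451 10730",
    ]
    for entry in map(parse_contact, sample):
        # The second address is the one that differs between tasks in the post.
        print(entry.port, entry.addresses[1])
```

Running it on the two sample lines prints each port next to the second advertised address, which makes a changed wlan0 address easy to spot.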
24)
Message boards :
Number crunching :
Suspicious near-instant results with NWChem long t4
(Message 897)
Posted 21 Jun 2020 by Luigi R. Post: Besides a full /tmp, or lacking access permissions to /tmp, another potential problem source could be issues with the TCP port which MPI (Open MPI?) uses.
What do you mean by a full /tmp? 0 bytes free? This morning I had 600 MB of free space. I deleted some log files and now it is 3.2 GB. I have only one nwchem_long task running so far, and it occupies port 38253, for example.
2950037504.0;tcp://192.168.1.6,192.168.1.4,172.17.0.1:60553 2084
2383937536.0;tcp://192.168.1.6,192.168.1.15,172.17.0.1:47451 10730
3311796224.0;tcp://192.168.1.6,192.168.1.4,172.17.0.1:39214 25236
3311337472.0;tcp://192.168.1.6,192.168.1.4,172.17.0.1:55077 25261
Random ports, I guess. I will check the ports of failed tasks if it happens again, assuming that is still possible after the failure. Otherwise we need to log the ports in use.
But maybe those bash crashes were caused by nwchem_long not cleaning up properly.
I don't know. I thought it wasn't related to this problem.
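Both /tmp conditions mentioned in the quote (too little free space, missing permissions) can be checked from a script. This is a minimal sketch using only the standard library; the 1024 MiB threshold is an arbitrary illustrative value, not a project requirement, and the function names are hypothetical. On Linux, `tempfile.gettempdir()` normally returns /tmp unless `TMPDIR` is set.

```python
import os
import shutil
import tempfile


def free_mib(path: str) -> float:
    """Free space, in MiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / (1024 * 1024)


def tmp_looks_full(path: str, threshold_mib: float = 1024.0) -> bool:
    # Threshold is an assumed example value; pick one that suits your tasks.
    return free_mib(path) < threshold_mib


def tmp_is_writable(path: str) -> bool:
    # A process needs write and search (execute) permission to create files there.
    return os.access(path, os.W_OK | os.X_OK)


if __name__ == "__main__":
    tmp = tempfile.gettempdir()
    print(f"{tmp}: {free_mib(tmp):.0f} MiB free, writable={tmp_is_writable(tmp)}")
```

Logging this output next to each failed task would show whether a full or read-only /tmp coincides with the failures.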
25)
Message boards :
Number crunching :
Suspicious near-instant results with NWChem long t4
(Message 895)
Posted 21 Jun 2020 by Luigi R. Post: This happened to me for the first time too. My computer usually runs QuChemPedIA successfully. Yesterday I increased my work cache to 10 days. I had ~80 in-progress tasks, downloaded at ~19:45. This morning at 4 AM they all failed (they went to pending/invalid). Use this result to see my host: https://quchempedia.univ-angers.fr/athome/result.php?resultid=2386836 P.S. Please ignore the errors. They are caused by bash crashes, which I solved by restarting the OS. ;)
26)
Message boards :
Number crunching :
2,5 days long and counting...
(Message 874)
Posted 11 Jun 2020 by Luigi R. Post: So, the only explanation, since I use the official wrapper, is that the error comes from the wrapper written by a David somewhere (BOINC team?) Yeah, I did a Google search and there are plenty of such errors from different BOINC projects that use VirtualBox. https://www.google.com/search?client=ubuntu&channel=fs&q=c%3A%5Cusers%5Cdavid%5Cdocuments%5Cboinc_git%5Cboinc%5Csamples%5Cvboxwrapper%5Cvbox_mscom_impl&ie=utf-8&oe=utf-8
27)
Message boards :
Number crunching :
Please Don't Abort WUs
(Message 865)
Posted 10 Jun 2020 by Luigi R. Post: I don't believe it is a real problem. The project server will resend aborted/errored WUs at the end of this batch. @damotbe could answer better than me. Anyway: Luigi R. wrote: Why hasn't the server sent them to a 3rd wingman yet after 10 days? I have the same problem. damotbe wrote: https://quchempedia.univ-angers.fr/athome/forum_thread.php?id=85&postid=802#802
28)
Message boards :
Number crunching :
Credits for t8=t4=t2=5.000 seriously?
(Message 833)
Posted 6 May 2020 by Luigi R. Post: Hi Diplomat, I don't understand what is wrong with your host (or whether anything is). So try editing your project preferences to get t1 tasks only, as ProDigit suggested. If your RAC improves, you will know your problem is solved, or at least that you have worked around it. ;)
29)
Message boards :
Number crunching :
Credits for t8=t4=t2=5.000 seriously?
(Message 820)
Posted 27 Apr 2020 by Luigi R. Post: Have you tried monitoring the process list? Are there 20 (8+8+4) nwchem processes running at 100%?
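One way to do that count is to parse the output of `ps -eo comm,pcpu`. This is a minimal sketch under two assumptions: the worker processes' names start with `nwchem`, and 90% counts as "busy". Note that the `%CPU` column of procps `ps` is CPU time divided by the process's lifetime, not an instantaneous value, so `top` gives a more current picture. The function names are hypothetical.

```python
import subprocess


def count_busy(ps_output: str, prefix: str = "nwchem", min_cpu: float = 90.0) -> int:
    """Count processes whose name starts with `prefix` and whose %CPU >= `min_cpu`."""
    count = 0
    for line in ps_output.splitlines()[1:]:  # first line is the header
        parts = line.split()
        if len(parts) < 2:
            continue
        name, cpu = parts[0], parts[-1]
        try:
            pct = float(cpu)
        except ValueError:
            continue
        if name.startswith(prefix) and pct >= min_cpu:
            count += 1
    return count


def ps_snapshot() -> str:
    # Lists every process with its command name and %CPU (procps syntax).
    return subprocess.run(
        ["ps", "-eo", "comm,pcpu"], capture_output=True, text=True, check=True
    ).stdout
```

Usage: `print(count_busy(ps_snapshot()))`. With t8 + t8 + t4 tasks running, a result well below 20 would suggest the tasks are not actually getting the threads they claim.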
30)
Message boards :
Number crunching :
Credits for t8=t4=t2=5.000 seriously?
(Message 805)
Posted 24 Apr 2020 by Luigi R. Post: It does not look like your CPU used 8 threads for the red-circled task, because its runtime equals its CPU time. Runtime should be much lower than CPU time, as you can see on the t2 tasks. One reason could be that your CPU is running too many processes/tasks, so that each one claims 8 threads but uses only 1.
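The reasoning above can be made concrete: CPU time divided by run time roughly estimates how many threads a task kept busy on average. This is a minimal sketch; the tolerance of 0.5 and the example timings are illustrative assumptions, not values from the project, and the function names are hypothetical.

```python
def effective_threads(run_time_s: float, cpu_time_s: float) -> float:
    """Average number of busy threads: CPU time spread over wall-clock run time."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return cpu_time_s / run_time_s


def looks_single_threaded(
    run_time_s: float, cpu_time_s: float, claimed_threads: int, tolerance: float = 0.5
) -> bool:
    # Flags a multi-threaded task whose CPU time barely exceeds its run time.
    return claimed_threads > 1 and effective_threads(run_time_s, cpu_time_s) < 1 + tolerance


if __name__ == "__main__":
    # Illustrative numbers: a t8 task with runtime == CPU time, and a healthy t2 task.
    print(effective_threads(3600, 3600), looks_single_threaded(3600, 3600, 8))
    print(effective_threads(1800, 3500), looks_single_threaded(1800, 3500, 2))
```

A ratio near 1 for a t8 task matches the red-circled case: the task claimed 8 threads but effectively ran on one.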
31)
Message boards :
Number crunching :
Suspicious near-instant results with NWChem long t4
(Message 802)
Posted 23 Apr 2020 by Luigi R. Post: Why hasn't the server sent them to a 3rd wingman yet after 10 days? I have the same problem.
©2024 Benoit DA MOTA - LERIA, University of Angers, France