Posts by Alien Seeker

1) Message boards : Number crunching : Personal record (Message 852)
Posted 30 May 2020 by Alien Seeker
Post:
Beat you in runtime, although my CPU time is only half yours: https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1357518 ;-)

Task 2329733 | Computer 1793 | Sent 9 May 2020, 2:58:42 UTC | Reported 30 May 2020, 6:45:51 UTC | Status: Completed, validation inconclusive | Run time 1,372,298.30 s | CPU time 305,437.80 s | Credit: pending | Application: NWChem long v0.19 (t1) x86_64-pc-linux-gnu

(I'm forcibly throttling BOINC or the fans go into a frenzy and the CPU even has to downclock itself to avoid overheating.)
2) Message boards : Number crunching : Tasks incorrectly marked as invalid: Please check validation rules (Message 830)
Posted 30 Apr 2020 by Alien Seeker
Post:
My result led to a validation error, so it must've diverged (after a long computing time; tasks that don't validate generally end early), and the wingmate converged. It'll be interesting to see what the next wingmates return.
3) Message boards : Number crunching : Tasks incorrectly marked as invalid: Please check validation rules (Message 816)
Posted 26 Apr 2020 by Alien Seeker
Post:
What's especially interesting is that in this WU's specific case, we both run the same Linux distribution, yet we get completely different results (the wingmate's converged, mine didn't). Is the computation purely deterministic, so that the only difference lies in our hardware (and presumably in rounding)? Or is a bit of randomness deliberately added when looking for the next step?

In the end, it probably means this particular molecule is borderline and only weakly stable. I'm not that surprised it happens; I remember some very problematic runs back in the day when I did theoretical chemistry.
4) Message boards : Number crunching : Suspicious near-instant results with NWChem long t4 (Message 815)
Posted 26 Apr 2020 by Alien Seeker
Post:
I've had the problem again, this time on the other computer and with only 1 core per task. I suspect the reason this time was a full /tmp; although I didn't check the size, the problem vanished when I removed the many leftover /tmp/ompi.hostname.123/pid.1234 directories from previous computations.

I think tasks should clean up after themselves when they end; even if each directory is rather small, they pile up after a while and the /tmp partition isn't meant to be very big.
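
Until the app cleans up after itself, the leftovers can at least be listed by hand. The following is a minimal sketch, assuming the directories sit directly under /tmp and match the ompi.* naming seen above; the function name and the 7-day age threshold are hypothetical, chosen so that directories belonging to a running task are left alone.

```python
import time
from pathlib import Path


def find_stale_ompi_dirs(tmp_root="/tmp", max_age_days=7.0, now=None):
    """Return ompi.* directories under tmp_root not modified for max_age_days.

    The ompi.* pattern follows the directory names quoted in the post;
    the age threshold is an arbitrary assumption.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for entry in Path(tmp_root).glob("ompi.*"):
        # Keep real directories only; never follow symlinks.
        if entry.is_symlink() or not entry.is_dir():
            continue
        if entry.stat().st_mtime < cutoff:
            stale.append(entry)
    return sorted(stale)


if __name__ == "__main__":
    # Dry run: only print the candidates. Once the list looks right,
    # each one could be removed with shutil.rmtree(path).
    for path in find_stale_ompi_dirs():
        print(path)
```

The script only lists candidates, so nothing is deleted before the list has been checked.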
5) Message boards : Number crunching : Tasks incorrectly marked as invalid: Please check validation rules (Message 801)
Posted 22 Apr 2020 by Alien Seeker
Post:
Besides credits, there's another problem with not distinguishing between unstable molecule configurations and incorrect computing. Take this task for example: is it deemed invalid because the app has a bug or because the molecule truly is unstable? The other wingmate's result apparently converged, but there's no way to know which of us is right.
6) Message boards : Number crunching : Tasks incorrectly marked as invalid: Please check validation rules (Message 784)
Posted 18 Apr 2020 by Alien Seeker
Post:
Couldn't the validators consider it a different kind of valid result though? You'd have valid-stable and valid-diverging, which might require more participants to agree but would still eventually end up as valid tasks.

The problem is that an 'invalid' result in BOINC projects normally means the host or application has an underlying problem, not that the task itself is intrinsically unstable. There should be a distinction between actual computational errors and expected chemically unstable molecules, if only to make sure you add only genuinely unstable compounds to your corpus and not computational errors.

To put it differently, if I understand correctly, we're sorting molecules into stable and unstable conformations. Both are valid (= expected) results of the computation. On the other hand, a host could appear to compute correctly but actually have hidden computational errors: a result that can't be reproduced by another host is invalid in the BOINC sense.
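
To make the suggestion concrete, here is a rough sketch of the two-kinds-of-valid idea. It is not the project's validator and not BOINC's validator API: the outcome labels, the quorum sizes (2 for converged, 3 for diverged) and both function names are assumptions for illustration, and a real validator would also have to compare the numerical outputs.

```python
from collections import Counter

# Hypothetical quorum sizes: a diverging outcome needs more agreeing hosts.
QUORUM = {"converged": 2, "diverged": 3}


def classify_workunit(outcomes):
    """Map per-host outcomes to 'valid-stable', 'valid-diverging' or 'pending'."""
    counts = Counter(outcomes)
    if counts["converged"] >= QUORUM["converged"]:
        return "valid-stable"
    if counts["diverged"] >= QUORUM["diverged"]:
        return "valid-diverging"
    # Not enough agreement yet: the WU would go to another host.
    return "pending"


def result_is_valid(outcome, status):
    """A result is invalid only if it contradicts the agreed outcome.

    Returns None while the work unit is still pending.
    """
    agreed = {"valid-stable": "converged", "valid-diverging": "diverged"}.get(status)
    if agreed is None:
        return None
    return outcome == agreed


if __name__ == "__main__":
    status = classify_workunit(["diverged", "converged", "converged"])
    print(status, result_is_valid("diverged", status))
```

With this split, a host is flagged only when it disagrees with the hosts that reached quorum, whichever way the molecule turned out.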
7) Message boards : Number crunching : Tasks incorrectly marked as invalid: Please check validation rules (Message 764)
Posted 14 Apr 2020 by Alien Seeker
Post:
I got one such task too: computation for WU 877899 ended very quickly, and all submitted results ended in validation errors.

I haven't looked in detail at the computations we're doing here, but from my past experience with computational chemistry, I assume it means the model diverges instead of converging towards a stable (low energy) state. If so, it's scientifically interesting information which should be accepted as valid, albeit perhaps with fewer credits than normal tasks as they tend to end more quickly.
8) Message boards : Number crunching : Suspicious near-instant results with NWChem long t4 (Message 759)
Posted 12 Apr 2020 by Alien Seeker
Post:
Now that you mention it, I've played a lot with max_ncpus_pct from the global preferences over the last few days while trying to find a setting that worked for me. (To answer your question, BOINC rounds down to the nearest whole number of CPUs, so 90% of 4 CPUs would be 3 threads running.) It may be that the t4 tasks failed when I allowed fewer than 100% of the CPUs. Now that I have a configuration I'm happy with, I'll keep an eye out for more t4 tasks.
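
The rounding can be written out as a small sketch. It only restates the behaviour described above and is not taken from the BOINC source; the function name is hypothetical, and the floor of one CPU is an assumption.

```python
import math


def usable_cpus(total_cpus, max_ncpus_pct):
    """CPUs BOINC may use for a given max_ncpus_pct, rounding down.

    The minimum of 1 is an assumption, not something stated in the post.
    """
    return max(1, math.floor(total_cpus * max_ncpus_pct / 100))


if __name__ == "__main__":
    for pct in (100, 90, 75, 50):
        print(pct, usable_cpus(4, pct))
```

On a 4-thread host, any setting below 100% therefore leaves at most 3 threads, which is too few for a t4 task.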

The failed tasks should still have shown up as errors, though; it would make debugging easier.
9) Message boards : Number crunching : Suspicious near-instant results with NWChem long t4 (Message 757)
Posted 12 Apr 2020 by Alien Seeker
Post:
It's either the work units are faulty or it's a lack of resources. I noticed that both are 4 thread work units and that machine is only a 4 thread CPU.


I can now confirm the problem came from the execution and not the WU itself: result 2278138 failed to validate now that the wingmate has returned their result. It should have appeared as a computing error; there must be a sanity check missing somewhere in the app.

I agree the 4-thread version of the app is the likely culprit; a t2 task is currently running successfully on the same host. If it happens again with more t4 tasks, I'll limit max_cpus. I assume that will stop the server from sending me t4 tasks?
10) Message boards : Number crunching : Suspicious near-instant results with NWChem long t4 (Message 753)
Posted 12 Apr 2020 by Alien Seeker
Post:
Two tasks supposedly terminated "successfully" after only a few seconds on one of my computers: 2276475 and 2278138. Both tasks were using the t4 version of NWChem long, and I suspect there's an error somewhere which wasn't detected properly.

The wingmates have yet to return their results, but the near-instant execution looks suspicious.
11) Message boards : Number crunching : Short Tasks run for 14 hours and counting. (Message 734)
Posted 8 Apr 2020 by Alien Seeker
Post:
My machine has been reliable with quite a few different BOINC tasks over the years.


It may not be your machine as such but a combination of your machine, VirtualBox and QuChemPedIA tasks. If you have a look at the forums, you'll see VirtualBox has caused a lot of headaches.
12) Message boards : Cafe : SETI@home expat (Message 668)
Posted 5 Mar 2020 by Alien Seeker
Post:
Thanks! It'll be nice to do some quantum chemistry again, even if just by lending CPU cycles to the project.
13) Message boards : Cafe : SETI@home expat (Message 665)
Posted 5 Mar 2020 by Alien Seeker
Post:
Hello! With the distributed part of SETI@home coming to an end, I've been looking for potential replacement BOINC projects and found this one. When I was a student, a long time ago, I worked in theoretical chemistry (on Gaussian95/Gaussian98 and a Valence Bond program called turtle, which, ahem, deserved its name).

Once the flow of WUs from SETI@home dries up at the end of this month, I'll start crunching QuChemPedIA. :-)




©2024 Benoit DA MOTA - LERIA, University of Angers, France