Tasks incorrectly marked as invalid: Please check validation rules
Author | Message |
---|---|
Joined: 11 Apr 20 Posts: 23 Credit: 442,800 RAC: 0 |
The following two tasks have been marked as invalid. However, the server cannot decide whether they are valid, because no one has ever returned them validly: https://quchempedia.univ-angers.fr/athome/results.php?userid=522&offset=0&show_names=0&state=5&appid= Moreover, when I check the logs of my tasks, the results look valid to me. Please check the validation rules on the server side and re-check these tasks (it seems you are wasting compute resources by discarding proper results). Michael. President of Rechenkraft.net - Germany's first and largest distributed computing organization. |
Joined: 5 Mar 20 Posts: 13 Credit: 805,400 RAC: 0 |
I got one such task too: computation for WU 877899 ended very quickly, and all submitted results ended in validation errors. I haven't looked in detail at the computations we're doing here, but from my past experience with computational chemistry, I assume it means the model diverges instead of converging towards a stable (low-energy) state. If so, that is scientifically interesting information which should be accepted as valid, albeit perhaps with fewer credits than normal tasks, as they tend to end more quickly. |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
There are execution and validation errors. Your workunits have been marked as invalid because as soon as the task is returned, without comparison, it is possible to see that the calculation went wrong. |
Joined: 11 Apr 20 Posts: 23 Credit: 442,800 RAC: 0 |
"There are execution and validation errors. Your workunits have been marked as invalid because as soon as the task is returned, without comparison, it is possible to see that the calculation went wrong." If that is the case, then you really need to check in detail what is going wrong, and here is why: in my case, of 307 tasks processed in total on two Windows machines, 7 so far had to be aborted manually because they ran indefinitely, and another 72 tasks (!) are marked as invalid due to the two issues we are discussing here (execution AND validation errors, as you said above). That is a failure rate of approx. 25%, meaning that a quarter of our compute time by definition goes to waste (and a quarter of our electricity heats up the air for nothing). If the problem were at my end, I would probably no longer run my machines, but that is certainly NOT the case: I have been running all my machines 24/7 for distributed computing projects since 2001, and in doing so I have acquired some experience. The two machines in question run all other currently active DC projects without producing errors, so there must be some sort of issue with your client. Your project is rather new, and it is known that there are sometimes issues with VirtualBox (knowing this, I do not run VirtualBox on all eight available cores but only on a fraction of them, and no other DC projects or applications worth mentioning are running in parallel). So I think it is normal that your project has some issues at this stage. But you will need to acquire some expertise in identifying exactly what is going wrong. Usually, the DC community is excellently suited to help find this out. Try testing different VirtualBox versions (choosing the correct version has solved VirtualBox issues for several other DC projects). Is it possible that the RAM size of the VirtualBox environment is set too small in some cases? Check for rounding inconsistencies when validating tasks (AMD vs. AMD, Intel vs. Intel, not cross-validation; probably long known to you). Ask people to run some of the invalid tasks on their machines to compare environments. You may even deploy hardware testing apps in the beta testing section and ask specific people to participate if you really suspect a machine issue. These are just a few ideas, of course. I do not want to rant about the problems. I like your project and would just like to help you make it better by reporting all inconsistencies I can find. A 25% failure rate is not a nice thing to have. You can assume that most other participants, at least those who have supported DC for a longer period of time, will be helpful. Michael. President of Rechenkraft.net - Germany's first and largest distributed computing organization. |
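As a quick check of the figures quoted in the post above (307 tasks, 7 aborted, 72 invalid), the failure rate can be recomputed in a few lines. This is only a sketch of the arithmetic; the variable names are illustrative and not taken from the project.

```python
# Figures quoted in the post above; variable names are illustrative only.
total_tasks = 307
aborted = 7    # aborted manually because they never finished
invalid = 72   # marked invalid (execution and validation errors combined)

failed = aborted + invalid
failure_rate = failed / total_tasks

# Prints: 79 of 307 tasks failed (25.7%)
print(f"{failed} of {total_tasks} tasks failed ({failure_rate:.1%})")
```

The result is slightly above the "approx. 25%" stated in the post, so the rounded figure is, if anything, conservative.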
Joined: 11 Apr 20 Posts: 23 Credit: 442,800 RAC: 0 |
I think a first step would be to cleanly separate the validation errors from the execution errors and then see how, for example, the execution-error group of tasks differs from the successfully completed ones. Do these have a greater RAM requirement, etc.? The fact that the same machines also return valid tasks hints, to me, that there is no hardware issue. Michael. President of Rechenkraft.net - Germany's first and largest distributed computing organization. |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
I explained this a lot to the different project participants at the beginning of the project. You can't know in advance if the calculation will converge and how long it will take. The consequence is unpredictable calculation times and invalid tasks when the chemistry doesn't work. This is how we explore the frontier between combinatorial chemistry and what is theoretically possible and stable. Scientifically, the invalid tasks are very important because they will allow us to train models (Artificial Intelligence) which will allow us in the future to predict the validity or not of the molecules studied with a good precision we hope. At the moment, we are already using these data and the results are encouraging. |
Joined: 5 Mar 20 Posts: 13 Credit: 805,400 RAC: 0 |
Couldn't the validators consider it a different kind of valid result though? You'd have valid-stable and valid-diverging, which might require more participants to agree but would still eventually end up as valid tasks. The problem is that an 'invalid' result in BOINC projects normally means the host or application has an underlying problem, not that the task itself is intrinsically unstable. There should be a difference between actual computational errors and expected chemically unstable molecules, if only to make sure you add only expected unstable compounds to your corpus, and not computational errors. To put it differently, if I understand correctly, we're sorting molecules between stable and unstable conformations. Both are valid (= expected) results of the computation. On the other hand, a host could pretend to compute correctly but actually turn out to have hidden computational errors: a result that can't be reproduced by another host is invalid in the BOINC sense. |
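The valid-stable / valid-diverging idea from the post above could be sketched roughly as follows. This is a hypothetical illustration, not the project's validator and not the BOINC validator API: the `Result` record, the `classify` function, the quorum sizes, and the energy tolerance are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class Verdict(Enum):
    VALID_STABLE = "valid-stable"        # replicas agree the calculation converged
    VALID_DIVERGING = "valid-diverging"  # replicas agree it did not converge
    INCONCLUSIVE = "inconclusive"        # replicas disagree, or too few results yet


@dataclass
class Result:
    converged: bool
    energy: Optional[float] = None  # hypothetical final energy, set only if converged


def classify(results: Sequence[Result], quorum: int = 2,
             diverging_quorum: int = 3, tol: float = 1e-3) -> Verdict:
    """Hypothetical workunit-level verdict; all thresholds are assumptions."""
    stable = [r for r in results if r.converged and r.energy is not None]
    diverged = [r for r in results if not r.converged]

    if stable and diverged:
        # Hosts disagree on convergence: more replicas are needed.
        return Verdict.INCONCLUSIVE
    if len(stable) >= quorum:
        ref = stable[0].energy
        if all(abs(r.energy - ref) <= tol for r in stable):
            return Verdict.VALID_STABLE
        return Verdict.INCONCLUSIVE
    # Divergence is accepted only when more hosts agree than for a stable result.
    if len(diverged) >= diverging_quorum:
        return Verdict.VALID_DIVERGING
    return Verdict.INCONCLUSIVE
```

Requiring a larger quorum for the diverging case mirrors the suggestion that such results "might require more participants to agree", and the inconclusive verdict keeps host disagreement separate from both kinds of valid result.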
Joined: 11 Apr 20 Posts: 23 Credit: 442,800 RAC: 0 |
"I explained this a lot to the different project participants at the beginning of the project. You can't know in advance if the calculation will converge and how long it will take. The consequence is unpredictable calculation times and invalid tasks when the chemistry doesn't work. This is how we explore the frontier between combinatorial chemistry and what is theoretically possible and stable. Scientifically, the invalid tasks are very important because they will allow us to train models (Artificial Intelligence) which will allow us in the future to predict the validity or not of the molecules studied with a good precision we hope. At the moment, we are already using these data and the results are encouraging." Now I understand. Thanks. But then there is still this VM configuration issue (never-ending tasks), for which I posted a few error log notes here. Michael. President of Rechenkraft.net - Germany's first and largest distributed computing organization. |
Joined: 3 Oct 19 Posts: 11 Credit: 5,443,793 RAC: 0 |
"I explained this a lot to the different project participants at the beginning of the project. You can't know in advance if the calculation will converge and how long it will take. The consequence is unpredictable calculation times and invalid tasks when the chemistry doesn't work. This is how we explore the frontier between combinatorial chemistry and what is theoretically possible and stable. Scientifically, the invalid tasks are very important because they will allow us to train models (Artificial Intelligence) which will allow us in the future to predict the validity or not of the molecules studied with a good precision we hope. At the moment, we are already using these data and the results are encouraging." If our machines are calculating the tasks properly, then they should be given credits for those tasks. Looking at my results: Valid (2346) · Invalid (2091). Almost half of my computing contributions are going to waste here, with regard to credits. Edit: Perhaps a solution is to roughly double the credits, so that the valid tasks pay for the invalid tasks? |
Joined: 5 Mar 20 Posts: 13 Credit: 805,400 RAC: 0 |
Besides credits, there's another problem with not distinguishing between unstable molecule configurations and incorrect computing. Take this task for example: is it deemed invalid because the app has a bug or because the molecule truly is unstable? The other wingmate's result apparently converged, but there's no way to know which of us is right. |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
I'll answer as a computer scientist. At the beginning and after some tests, the project chemist thought it was reasonable to think that the calculations would all lead to the same conformation and on known molecules, this is the case. As it happens, your calculations and contributions show us that this hypothesis does not hold within the limits of stable chemistry. It's scientifically interesting and as a specialist in Artificial Intelligence, I'm beginning to train models that could help in this direction. So far, it's not working very well, but we're working on it. |
Joined: 5 Mar 20 Posts: 13 Credit: 805,400 RAC: 0 |
What's especially interesting is that in this WU's specific case, we both run the same Linux distribution, yet we find completely different results (the wingmate's converged, mine didn't). Is the computation we run purely deterministic, with the only difference being our hardware (and presumably a difference in rounding)? Or is a bit of deliberate randomness added when looking for the next step? In the end, it probably means this particular molecule is borderline and only weakly stable. I'm not that surprised it happens; I remember some very problematic runs back in the day when I did theoretical chemistry. |
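On the rounding question raised above: a minimal sketch of why a deterministic algorithm can still give different numbers when the order of floating-point operations differs (for instance, across compilers, instruction sets, or math libraries). It only illustrates the general effect; whether this is what happens in the project's application is not established in this thread.

```python
# Floating-point addition is not associative: the grouping of the same
# three terms changes the last bit of the result.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c
right = a + (b + c)

print(left == right)  # False
print(left, right)
```

In an iterative optimisation, a difference this small can in principle be amplified over many steps, which is one way a borderline case might converge on one host and not on another.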
Joined: 11 Apr 20 Posts: 23 Credit: 442,800 RAC: 0 |
A plain rounding issue could be fatal. Maybe contact Prof. Gernot Frenking, theoretical chemist at Philipps-University of Marburg/Germany. Michael. President of Rechenkraft.net - Germany's first and largest distributed computing organization. |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
"What's especially interesting is that in this WU's specific case, we both run the same Linux distribution, yet we find completely different results (the wingmate's converged, mine didn't). Is the computation we run purely deterministic, with the only difference being our hardware (and presumably a difference in rounding)? Or is a bit of deliberate randomness added when looking for the next step?" It's a question I've asked Thomas, the theoretical chemist, several times, and you can imagine that it was a major concern in setting up the project... Is there any randomness (stochastic gradient descent, for example)? The (very big) code that does the calculations is not ours, so I am not able to tell. In any case, when I designed the calculation process for BOINC, I was told that the calculation was deterministic. That's why I suspect problems with floating-point precision (hardware). In any case, the validation code is very lax on the comparison of molecular energies, so this is a real divergence! |
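A "lax" comparison of molecular energies, as described above, might look something like the following. This is a sketch under assumptions: the function name `energies_agree`, the tolerance values, and the example energies are made up for illustration and are not the project's actual validation code.

```python
import math


def energies_agree(e1: float, e2: float,
                   rel_tol: float = 1e-4, abs_tol: float = 1e-6) -> bool:
    """Hypothetical tolerant comparison of two final energies.

    The tolerances are illustrative assumptions, not project values.
    """
    return math.isclose(e1, e2, rel_tol=rel_tol, abs_tol=abs_tol)


# Made-up example energies: a rounding-level difference passes the check,
# while a genuinely different final state does not.
print(energies_agree(-154.08812, -154.08813))  # True
print(energies_agree(-154.08812, -153.20000))  # False
```

With a tolerance this generous, hosts that differ only by rounding would still validate against each other, which is consistent with the point that a mismatch here indicates a real divergence.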
Joined: 27 Apr 20 Posts: 11 Credit: 714,200 RAC: 0 |
Validate Error on this WU for every person who has processed it. Linux or Windows VM, not one validation. The WU seems to have ended correctly (at least for me), but it is a very short WU, so maybe that is the problem. Please check this WU. Thanks, Conan |
Joined: 19 Apr 20 Posts: 1 Credit: 4,120,000 RAC: 0 |
"Scientifically, the invalid tasks are very important because they will allow us to train models (Artificial Intelligence) which will allow us in the future to predict the validity or not of the molecules studied with a good precision we hope." Well, if that is true, then you should also award credit for these 'invalid tasks', as they help your study... |
Joined: 12 Oct 20 Posts: 9 Credit: 1,502,000 RAC: 0 |
Let me add a few points to the discussion. On my machines I see many more invalid results on my Windows systems with the vbox app (> 50%) than on my Linux systems (< 33%), running on the same hardware. Based on this, it would be good to check whether the invalid Windows results stem from VirtualBox-on-Windows issues rather than from the calculation itself. This will also have a negative effect on your AI-based result approval if the AI is not able to identify such results, and it produces many more bad results than necessary. With VirtualBox 6.1 on a Windows host, the Windows vbox app produces many more errors during calculation (such as "the task is no longer manageable, will try again in 24 hours") than with VirtualBox 5.2 (VirtualBox 6.0 sits between the two versions on this issue). When a task is hard shut down (poweroff) with a "no longer manageable" issue, it will not yield a valid result even if it is able to finish the calculation after a restart (next reboot or after 24 hours), and the tasks show a CPU load of 100% when I see the issue happen, so the calculation is running. Tasks that do not hit the "no longer manageable" issue mostly yield a valid result. But VirtualBox 5.2 and 6.0 are no longer supported, and BOINC now ships with VirtualBox 6.1, so it would be good for you to have a better-running Windows vbox app. Matthias |
Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
as soon as we have an engineer... |
©2024 Benoit DA MOTA - LERIA, University of Angers, France