Validation Inconclusive
Author | Message |
---|---|
Send message Joined: 22 Jul 20 Posts: 10 Credit: 21,000 RAC: 0 |
Might just be the occasional errant WU, but there are 2 Validation Inconclusive results on Workunit 1413837. One is mine and one is my wingman's. A 3rd wingman hasn't been assigned yet. |
Send message Joined: 13 Oct 19 Posts: 87 Credit: 6,026,455 RAC: 0 |
An occasional invalid result is normal in this project. |
Send message Joined: 22 Jul 20 Posts: 10 Credit: 21,000 RAC: 0 |
Roger that. Just notable since it wasn't just my work. Had this been an actual emergency...everybody would be panicking anyways. ~Yav |
Send message Joined: 7 Nov 19 Posts: 31 Credit: 4,245,903 RAC: 0 |
There isn't any problem with inconclusive pairs. They are sent to a third host sooner or later. More later than sooner. :) I waited 3 months to learn that some inconclusive results of mine were the valid ones. Now resends may be faster because only a few tasks remain. |
Send message Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
Thank you for responding to this issue that has already been discussed. A little reading of the previous topics would have even allowed Yavanius to see that it's a problem that requires intervention on the server code and that it's not simple. |
Send message Joined: 27 Apr 20 Posts: 11 Credit: 714,200 RAC: 0 |
I have just completed WU 1384465 and I am the 5th person to do so. All are at Validation Inconclusive. Isn't at least one of them correct? I have another two I am about to start (1 has just started) and they also have had 4 previous completions all Validation Inconclusive. They are WU 1343731 and WU 1354673 If they can't be validated then they should be cancelled as they are wasting a huge amount of processing time. I know you said before that even the failures can help (at least I think you said that), but failure after failure (which is what they become if no validations occur), does not help the volunteers with no reward for lots of processing time. I am not complaining as this is a very well run project and you are a responsive Admin, but I may just abort these work units as I doubt I will get anything for them. Conan |
Send message Joined: 22 Jul 20 Posts: 10 Credit: 21,000 RAC: 0 |
Thank you for responding to this issue that has already been discussed. A little reading of the previous topics would have even allowed Yavanius to see that it's a problem that requires intervention on the server code and that it's not simple. A few points: 1. I did read. Validation Error is not the same as Validation Inconclusive...and Validation Error is what the most recent posts are about. 2. I don't see criticism of Validation Error posted more than once. 3. The last Validation Inconclusive topic was back in April. So May-June-July...3 months, which brings us to... 4. I don't see anything on the News page or the Top page warning folks that the project is acting as more of a beta than a stable project. They say for every one person who says something, 10 others don't. In BOINC it's probably 10 others who don't say anything and 10 who just go 'Oh well' (that's the nicer of the sentiments) and move on... Post it up front, instead of making folks search out that there are known issues. THAT'S simple... |
Send message Joined: 7 Nov 19 Posts: 31 Credit: 4,245,903 RAC: 0 |
I know you said before that even the failures can help (at least I think you said that), but failure after failure (which is what they become if no validations occur), does not help the volunteers with no reward for lots of processing time. As good tasks are validated (and crossed off), only problematic tasks remain and get resent out (e.g. 1374241). So failures and chances of not getting rewarded are increasing a bit. |
Send message Joined: 11 Apr 20 Posts: 23 Credit: 442,800 RAC: 0 |
On all my machines, not a single task has been validated since August 11th. That is a total of 37 tasks, many of which run quite long (sometimes a few days). I suspect something is generally wrong with the work packets handed out since that date for Linux (I have actually stopped supporting this project on Windows because the VirtualBox approach is far too resource hungry, while the similar Linux tasks ran smoothly until August 11th). I checked all the work packets and found that NONE of the many wingmen working on the companion tasks have returned a single valid task either. That is why I believe you need to check your work packets. I have now suspended retrieving work packets until this issue has been resolved. The machines returned most of this project's tasks properly before August 11th and currently work flawlessly for other DC projects in parallel. So it is not an issue at my end. Michael. President of Rechenkraft.net - Germany's first and largest distributed computing organization. |
Send message Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
Michael, I have many invalids and inconclusives too, but I have had several validate within the last day. Since I can't see your machines, I can't comment beyond that. https://quchempedia.univ-angers.fr/athome/results.php?userid=31 |
Send message Joined: 11 Apr 20 Posts: 23 Credit: 442,800 RAC: 0 |
A typical log: Stderr output <core_client_version>7.9.3</core_client_version> <![CDATA[ <stderr_txt> 04:57:24 (6476): wrapper (7.5.26014): starting 04:57:24 (6476): wrapper: running worker.sh () Jobs starts with 1 cores STEP OPT : Starting Create output archive OPT.out Normal termination. 13:30:20 (6476): worker.sh exited; CPU time 1.953940 13:30:20 (6476): called boinc_finish(0) </stderr_txt> ]]> I have only processed the more demanding NWChem long tasks on Linux since around August 11th. Before that, I had all task types in the works. The last correctly validated "long" tasks were returned on August 9th. Maybe it is just an issue with these long tasks then? The project lead has to take a specific look at the work packets injected into the system around August 11th, I think. Michael. President of Rechenkraft.net - Germany's first and largest distributed computing organization. |
Send message Joined: 7 Nov 19 Posts: 31 Credit: 4,245,903 RAC: 0 |
[...] Michael, there isn't any problem with long tasks. Mostly "multiple inconclusive"/"validate error" WUs are circulating. The good tasks are gone, all assigned to someone. If you request new tasks now, you will probably get tasks that concern unstable molecules. Sometimes you will get good tasks that expired on another host, which are probably not problematic. |
Send message Joined: 11 Apr 20 Posts: 23 Credit: 442,800 RAC: 0 |
Again, ALL tasks of type long released after August 11th appear faulty. For each invalidated task, you will find wingmen by the crateload confirming that these tasks are buggy. Why re-circulate them this often then? Can anyone show me a properly validated one meeting the specs I described above (Linux, long, released after August 11th)? Michael. President of Rechenkraft.net - Germany's first and largest distributed computing organization. |
Send message Joined: 7 Nov 19 Posts: 31 Credit: 4,245,903 RAC: 0 |
Again, ALL tasks of type long released after August 11th appear faulty. I have observed that they sometimes get validated. https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1383666 Admin said faulty tasks are useful too. https://quchempedia.univ-angers.fr/athome/forum_thread.php?id=104&postid=941#941 Can anyone show me a properly validated one meeting the specs I described above (Linux, long, released after August 11th)? Personally I can't. I downloaded all my tasks before then. Anyway, I can link something: https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1360632 https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1360629 https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1360365 https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1382404 https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1360317 https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1360885 etc... Go to the top computers and check for yourself by filtering for valid long tasks on Linux systems, e.g. https://quchempedia.univ-angers.fr/athome/results.php?hostid=884&offset=0&show_names=0&state=4&appid=3 |
Send message Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
In looking through my last 10 valids, one thing sticks out like a sore thumb (American expression): My Linux (Ubuntu 18.04) machine validates only against other Linux machines. It does not validate against Windows machines running VirtualBox. That is of no particular interest to me, and perhaps not very surprising. Whether it can be improved on I don't know, but I mention it for what it is worth. The fact that there have not been any new longs since August 11 accounts for the fact that there have not been any new valids since then, and is of no consequence. I can do the shorts. |
Send message Joined: 11 Apr 20 Posts: 23 Credit: 442,800 RAC: 0 |
Well, for some computations, validation between Linux and Windows does not work. If this is the case here too, the project lead should quickly change the server system so that WUs are not cross-OS validated. A simple test comparing a set of identical tasks calculated on Linux and Windows machines should be enough to clarify what is going on. Michael. [edit]: In my case, however, the LONG tasks are mainly delivered to Linux systems (only a beta-tester app for Windows has been released for these LONG tasks). Still, ALL tasks are invalidated, so it is not a cross-validation issue here. President of Rechenkraft.net - Germany's first and largest distributed computing organization. |
Send message Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
If this was the case here, too, the project lead should quickly change their server system such that WUs are not cross-OS validated. Yes, I would think they would want to do that, to save their time and ours. By the way, I am up to the _6, _7 and _8 on the longs, so they will be running out shortly if they do not add more. But there is a good supply of shorts. |
Send message Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
You always make a lot of assumptions... Our validation is not strict but loose. The difference in OS and/or CPU model has not been an issue for a few months now. On the other hand, we are at the end of the batch and indeed only the most unstable molecular systems remain. I understand how painful it is not to see any task validated. This is inherent to the project and I thank the volunteers who continue to help us despite the conditions. I wish I could reward your efforts, but I haven't figured out how to tweak the Validator to allow for sorting of results and earning of credits. |
Send message Joined: 3 Oct 19 Posts: 153 Credit: 32,412,973 RAC: 0 |
I wish I could reward your efforts, but I haven't figured out how to tweak the Validator to allow for sorting of results and earning of credits. I don't look at credits, so don't spend your time for me. But I am wondering if you need 8 attempts at validation? Maybe 5 would do. But you know that best of course. |
Send message Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
I also find that 8 tries is too much... I don't know if it's possible to change this behavior, but cancelled jobs are considered as error runs...I had been forced to go up to 8 because too many people were taking a lot of workunits and then cancelling. |
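The behaviour discussed in this thread (loose validation, wingmen being added while a workunit stays inconclusive, and a cap of 8 tries where cancelled jobs count as errors) can be illustrated with a minimal Python sketch. This is not the project's validator. The quorum of 2, the tolerance value, and the names `Result`, `agree` and `workunit_state` are assumptions made for illustration; the constants only loosely mirror BOINC's per-workunit `min_quorum` and `max_total_results` settings.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical values: the project's real settings are not given in the thread.
MIN_QUORUM = 2            # results that must agree before a workunit validates
MAX_TOTAL_RESULTS = 8     # the "8 tries" mentioned by the admin
ENERGY_TOLERANCE = 1e-4   # assumed tolerance for a "loose" comparison


@dataclass
class Result:
    host: str
    energy: Optional[float]  # None models a cancelled or failed task


def agree(a: Result, b: Result, tol: float = ENERGY_TOLERANCE) -> bool:
    """Loose comparison: results match if their energies differ by at most tol."""
    if a.energy is None or b.energy is None:
        return False
    return abs(a.energy - b.energy) <= tol


def workunit_state(results: List[Result]) -> str:
    """Return 'valid', 'inconclusive' (another wingman needed) or 'abandoned'."""
    successes = [r for r in results if r.energy is not None]
    for first in successes:
        # Count this result plus every other result that loosely agrees with it.
        matching = 1 + sum(
            agree(first, other) for other in successes if other is not first
        )
        if matching >= MIN_QUORUM:
            return "valid"
    # Cancelled tasks still count toward the total, so they use up the budget.
    if len(results) >= MAX_TOTAL_RESULTS:
        return "abandoned"
    return "inconclusive"
```

Under these assumptions, a workunit for an unstable molecule where no two hosts ever land within the tolerance stays "inconclusive" through every resend, which matches the 5-result case reported above. It also shows why cancellations matter: each cancelled task uses one of the 8 slots without contributing a comparable result.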
©2024 Benoit DA MOTA - LERIA, University of Angers, France