Validation Inconclusive

Yavanius
Joined: 22 Jul 20
Posts: 10
Credit: 21,000
RAC: 0
Message 961 - Posted: 24 Jul 2020, 21:29:20 UTC

This might just be the occasional errant WU, but there are 2 Validation Inconclusive results on Workunit 1413837.

One is mine and one is my wingman's. A third wingman hasn't been assigned yet.

swiftmallard
Joined: 13 Oct 19
Posts: 87
Credit: 6,026,455
RAC: 0
Message 962 - Posted: 24 Jul 2020, 21:52:55 UTC

An occasional invalid result is normal in this project.

Yavanius
Joined: 22 Jul 20
Posts: 10
Credit: 21,000
RAC: 0
Message 965 - Posted: 26 Jul 2020, 18:17:09 UTC - in response to Message 962.  

Roger that. Just notable since it wasn't just my work.


Had this been an actual emergency...everybody would be panicking anyways.

~Yav

Luigi R.
Joined: 7 Nov 19
Posts: 31
Credit: 4,245,903
RAC: 0
Message 986 - Posted: 28 Jul 2020, 8:19:53 UTC

There isn't any problem with inconclusive pairs.
They are sent to a third host sooner or later.
More later than sooner. :)

I waited 3 months to learn that some inconclusive results of mine were the valid ones.
Now resends may be faster because only a few tasks remain.
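
The "inconclusive until a third host reports" behaviour can be pictured with a small quorum check. This is only a sketch under assumptions: the quorum of 2, the `agree` comparison, and the numbers are hypothetical and are not taken from the project's server code.

```python
from typing import Callable, Optional, Sequence

MIN_QUORUM = 2  # assumption: two agreeing results are enough


def find_canonical(results: Sequence[float],
                   agree: Callable[[float, float], bool],
                   min_quorum: int = MIN_QUORUM) -> Optional[int]:
    """Return the index of a result that at least `min_quorum` results
    (itself included) agree with, or None if no such result exists."""
    for i, candidate in enumerate(results):
        matches = sum(1 for other in results if agree(candidate, other))
        if matches >= min_quorum:
            return i
    return None


def status(results: Sequence[float],
           agree: Callable[[float, float], bool]) -> str:
    """Classify a workunit from the results returned so far."""
    if len(results) < MIN_QUORUM:
        return "pending"
    if find_canonical(results, agree) is not None:
        return "valid"
    return "inconclusive"


def same(a: float, b: float) -> bool:
    # Hypothetical comparison; a real validator would compare output files.
    return abs(a - b) < 1e-6


print(status([-76.40, -76.12], same))          # the two hosts disagree
print(status([-76.40, -76.12, -76.40], same))  # a third host breaks the tie
```

With these made-up values, the first call reports `inconclusive` and the second reports `valid`, which matches the pattern of a pair waiting for a third host.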
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 997 - Posted: 3 Aug 2020, 15:25:08 UTC - in response to Message 986.  

Thank you for responding to this issue that has already been discussed. A little reading of the previous topics would have even allowed Yavanius to see that it's a problem that requires intervention on the server code and that it's not simple.

Conan
Joined: 27 Apr 20
Posts: 11
Credit: 714,200
RAC: 0
Message 1014 - Posted: 10 Aug 2020, 11:47:59 UTC

I have just completed WU 1384465 and I am the 5th person to do so. All are at Validation Inconclusive. Isn't at least one of them correct?

I have another two that I am about to start (one has just started), and they have also had 4 previous completions, all Validation Inconclusive. They are WU 1343731 and WU 1354673.
If they can't be validated, then they should be cancelled, as they are wasting a huge amount of processing time.

I know you said before that even the failures can help (at least I think you said that), but failure after failure (which is what they become if no validations occur), does not help the volunteers with no reward for lots of processing time.

I am not complaining as this is a very well run project and you are a responsive Admin, but I may just abort these work units as I doubt I will get anything for them.

Conan

Yavanius
Joined: 22 Jul 20
Posts: 10
Credit: 21,000
RAC: 0
Message 1015 - Posted: 10 Aug 2020, 16:30:40 UTC - in response to Message 997.  

Thank you for responding to this issue that has already been discussed. A little reading of the previous topics would have even allowed Yavanius to see that it's a problem that requires intervention on the server code and that it's not simple.


A few points:

1. I did read. Validation Error is not the same as Validation Inconclusive, and Validation Error is what the most recent posts are about.

2. I don't see criticism of Validation Error posted more than once.

3. The last Validation Inconclusive topic was back in April. So May, June, July: 3 months, which brings us to...

4. I don't see anything in the News section of the top page warning folks that the project is acting more like a beta than a stable project. They say that for every one person who says something, 10 others don't. In BOINC it's probably 10 others who don't say anything and 10 who just go 'Oh well' (that's the nicer of the sentiments) and move on...


Post it up front, instead of making folks search to find out that there are known issues. THAT'S simple...

Luigi R.
Joined: 7 Nov 19
Posts: 31
Credit: 4,245,903
RAC: 0
Message 1017 - Posted: 11 Aug 2020, 14:59:31 UTC - in response to Message 1014.  
Last modified: 11 Aug 2020, 15:20:08 UTC

I know you said before that even the failures can help (at least I think you said that), but failure after failure (which is what they become if no validations occur), does not help the volunteers with no reward for lots of processing time.
As good tasks are validated (and crossed off), only problematic tasks remain and get resent (e.g. 1374241).
So failures, and the chances of not getting rewarded, are increasing a bit.

Michael H.W. Weber
Joined: 11 Apr 20
Posts: 23
Credit: 442,800
RAC: 0
Message 1022 - Posted: 21 Aug 2020, 19:41:16 UTC
Last modified: 21 Aug 2020, 19:45:07 UTC

On all my machines, not a single task has been validated since August 11th.
That is a total of 37 tasks, many of which run quite long (sometimes a few days).
I suspect something is generally wrong with the work packets handed out since that date for Linux (I have actually stopped supporting this project on Windows because of the VirtualBox approach, which is far too resource-hungry, while the equivalent Linux tasks ran smoothly until August 11th).

I checked all work packets and found that NONE of the many wingmen working on the companion tasks have returned a single valid task either.
That is why I believe you need to check your work packets.
I have now suspended retrieving work packets until this issue has been resolved.
The machines returned most of this project's tasks properly before August 11th and currently work flawlessly for other DC projects in parallel. So the issue is not at my end.

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.

Jim1348
Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 1023 - Posted: 21 Aug 2020, 20:16:08 UTC - in response to Message 1022.  
Last modified: 21 Aug 2020, 20:32:17 UTC

Michael,

I have many invalids and inconclusives too, but I have had several validate within the last day.
Since I can't see your machines, I can't comment beyond that.
https://quchempedia.univ-angers.fr/athome/results.php?userid=31

Michael H.W. Weber
Joined: 11 Apr 20
Posts: 23
Credit: 442,800
RAC: 0
Message 1024 - Posted: 22 Aug 2020, 8:56:44 UTC
Last modified: 22 Aug 2020, 9:02:06 UTC

A typical log:

Stderr output

<core_client_version>7.9.3</core_client_version>
<![CDATA[
<stderr_txt>
04:57:24 (6476): wrapper (7.5.26014): starting
04:57:24 (6476): wrapper: running worker.sh ()
Jobs starts with 1 cores
STEP OPT : Starting
Create output archive
OPT.out
Normal termination.
13:30:20 (6476): worker.sh exited; CPU time 1.953940
13:30:20 (6476): called boinc_finish(0)

</stderr_txt>
]]>
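
The timestamps in a log like this can be set against the CPU time that the wrapper reports. The following is a minimal sketch, assuming the `HH:MM:SS (pid): message` line format seen in the log. The function names are hypothetical. Because the lines carry no date, the sketch can only handle a run that crosses midnight at most once.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

LOG = """\
04:57:24 (6476): wrapper (7.5.26014): starting
04:57:24 (6476): wrapper: running worker.sh ()
Jobs starts with 1 cores
STEP OPT : Starting
Normal termination.
13:30:20 (6476): worker.sh exited; CPU time 1.953940
13:30:20 (6476): called boinc_finish(0)
"""

# Lines written by the wrapper start with a time stamp and a process id.
STAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}) \(\d+\): ")


def wall_clock_seconds(log: str) -> float:
    """Seconds between the first and last time-stamped lines."""
    stamps = []
    for line in log.splitlines():
        match = STAMP.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    elapsed = stamps[-1] - stamps[0]
    if elapsed < timedelta(0):
        elapsed += timedelta(days=1)  # assumption: one midnight rollover
    return elapsed.total_seconds()


def reported_cpu_seconds(log: str) -> Optional[float]:
    """CPU time from the 'worker.sh exited' line, if present."""
    match = re.search(r"CPU time (\d+(?:\.\d+)?)", log)
    return float(match.group(1)) if match else None


print(wall_clock_seconds(LOG))    # 30776.0, i.e. 8 h 32 min 56 s
print(reported_cpu_seconds(LOG))  # 1.95394
```

The log itself does not say whether the reported CPU time covers child processes started by `worker.sh`, so the gap between the two numbers should be read with that in mind.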

I have only been processing the more demanding NWChem long tasks, on Linux, since around August 11th. Before that, I had all kinds of tasks in the works. The last correctly validated "long" tasks were returned on August 9th.

Maybe it is just an issue with these long tasks then?
The project lead has to specifically take a look at these work packets injected into the system around August 11th, I think.

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.

Luigi R.
Joined: 7 Nov 19
Posts: 31
Credit: 4,245,903
RAC: 0
Message 1025 - Posted: 22 Aug 2020, 10:48:49 UTC - in response to Message 1024.  

[...]
I have only been processing the more demanding NWChem long tasks, on Linux, since around August 11th. Before that, I had all kinds of tasks in the works. The last correctly validated "long" tasks were returned on August 9th.

Maybe it is just an issue with these long tasks then?
The project lead has to specifically take a look at these work packets injected into the system around August 11th, I think.

Michael.

Michael, there isn't any problem with long tasks.
Mostly "multiple inconclusive"/"validate error" WUs are circulating now.
The good tasks are gone, all assigned to someone.
If you request new tasks now, you will probably get tasks that concern unstable molecules.
Sometimes you will get good tasks that expired on another host, and those are probably not problematic.

Michael H.W. Weber
Joined: 11 Apr 20
Posts: 23
Credit: 442,800
RAC: 0
Message 1026 - Posted: 22 Aug 2020, 14:25:48 UTC - in response to Message 1025.  
Last modified: 22 Aug 2020, 14:34:29 UTC

Again, ALL tasks of type long released after August 11th appear faulty.
For each invalidated task, you will find wingmen by the crateload confirming that these tasks are buggy. Why recirculate them this often then?

Can anyone show me a properly validated one meeting the specs I described above (Linux, long, released after August 11th)?

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.

Luigi R.
Joined: 7 Nov 19
Posts: 31
Credit: 4,245,903
RAC: 0
Message 1027 - Posted: 22 Aug 2020, 18:34:40 UTC - in response to Message 1026.  

Again, ALL tasks of type long released after August 11th appear faulty.
For each invalidated task, you will find wingmen by the crateload confirming that these tasks are buggy. Why recirculate them this often then?
I have observed that they sometimes get validated.
https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1383666

Admin said faulty tasks are useful too.
https://quchempedia.univ-angers.fr/athome/forum_thread.php?id=104&postid=941#941



Can anyone show me a properly validated one meeting the specs I described above (Linux, long, released after August 11th)?

Michael.
Personally I can't. I downloaded all my tasks before that date.

Anyway, I can link some examples:
https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1360632
https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1360629
https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1360365
https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1382404
https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1360317
https://quchempedia.univ-angers.fr/athome/workunit.php?wuid=1360885
etc...
Go to the top computers list and check for yourself by filtering for valid long tasks on Linux systems.
e.g. https://quchempedia.univ-angers.fr/athome/results.php?hostid=884&offset=0&show_names=0&state=4&appid=3
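
A filtered results link like the one above can be built for any host. This is a small sketch: the query parameter names are copied from that URL, but reading `state=4` as "valid" and `appid=3` as the long application is an assumption inferred from the post, and `results_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = "https://quchempedia.univ-angers.fr/athome/results.php"


def results_url(hostid: int, state: int = 4, appid: int = 3,
                offset: int = 0, show_names: int = 0) -> str:
    """Build a results-page URL for one host with the given filters.

    Assumed meanings: state=4 selects valid tasks, appid=3 the long app.
    """
    query = {
        "hostid": hostid,
        "offset": offset,
        "show_names": show_names,
        "state": state,
        "appid": appid,
    }
    return f"{BASE}?{urlencode(query)}"


print(results_url(884))
```

Calling `results_url(884)` reproduces the example link; other host IDs from the top computers list can be substituted.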

Jim1348
Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 1028 - Posted: 22 Aug 2020, 21:41:31 UTC

In looking through my last 10 valids, one thing sticks out like a sore thumb (American expression):
My Linux (Ubuntu 18.04) machine validates only against other Linux machines. It does not validate against Windows machines running VirtualBox.

That is of no particular interest to me, and perhaps not very surprising. Whether it can be improved on I don't know, but I mention it for what it is worth.

The fact that there have not been any new longs since August 11 accounts for the fact that there have not been any new valids since then, and is of no consequence. I can do the shorts.

Michael H.W. Weber
Joined: 11 Apr 20
Posts: 23
Credit: 442,800
RAC: 0
Message 1030 - Posted: 23 Aug 2020, 12:42:35 UTC - in response to Message 1028.  
Last modified: 23 Aug 2020, 12:50:57 UTC

Well, for some computations, validating Linux results against Windows results does not work.
If this were the case here, too, the project lead should quickly change the server setup so that WUs are not validated across OSes.
A simple test comparing a set of identical tasks calculated on Linux and Windows machines should be enough to clarify what is going on.

Michael.

[edit]: In my case, however, the LONG tasks are mainly delivered to Linux systems (only a beta-tester app for Windows has been released for these LONG tasks). Still, ALL tasks are invalidated, so it is not a cross-validation issue here.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.

Jim1348
Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 1031 - Posted: 23 Aug 2020, 13:40:49 UTC - in response to Message 1030.  

If this were the case here, too, the project lead should quickly change the server setup so that WUs are not validated across OSes.
A simple test comparing a set of identical tasks calculated on Linux and Windows machines should be enough to clarify what is going on.

Yes, I would think they would want to do that, to save their time and ours.

By the way, I am up to the _6, _7 and _8 on the longs, so they will be running out shortly if they do not add more.
But there is a good supply of shorts.
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 1032 - Posted: 24 Aug 2020, 15:02:43 UTC - in response to Message 1031.  

You always make a lot of assumptions... Our validation is not strict but loose. The difference in OS and/or CPU model has not been an issue for a few months now.

On the other hand, we are at the end of the batch and indeed only the most unstable molecular systems remain. I understand how painful it is not to see any task validated. This is inherent to the project and I thank the volunteers who continue to help us despite the conditions.

I wish I could reward your efforts, but I haven't figured out how to tweak the Validator to allow for sorting of results and earning of credits.
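
To illustrate the difference between strict and loose validation, here is a minimal sketch. It assumes, hypothetically, that each result is reduced to a few named numbers. The quantity names, the values, and the tolerance are invented for the example, and the project's actual criteria are not described in this thread.

```python
import math
from typing import Mapping


def loosely_equal(a: Mapping[str, float], b: Mapping[str, float],
                  rel_tol: float = 1e-4) -> bool:
    """True if both results report the same quantities and every pair of
    values agrees within the relative tolerance."""
    if a.keys() != b.keys():
        return False
    return all(math.isclose(a[key], b[key], rel_tol=rel_tol) for key in a)


# Hypothetical results for one task computed on two different hosts.
linux_host = {"energy": -232.123456, "homo_lumo_gap": 0.201234}
vbox_host = {"energy": -232.123461, "homo_lumo_gap": 0.201236}

print(linux_host == vbox_host)               # strict comparison: False
print(loosely_equal(linux_host, vbox_host))  # loose comparison: True
```

A tolerance of this kind is one way small floating-point differences between OS or CPU models could stop mattering, while results from a genuinely unstable calculation would still fail to agree.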

Jim1348
Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 1037 - Posted: 24 Aug 2020, 21:57:21 UTC - in response to Message 1032.  

I wish I could reward your efforts, but I haven't figured out how to tweak the Validator to allow for sorting of results and earning of credits.

I don't look at credits, so don't spend your time for me.
But I am wondering if you need 8 attempts at validation? Maybe 5 would do. But you know that best of course.
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 1038 - Posted: 25 Aug 2020, 7:29:03 UTC - in response to Message 1037.  

I also find that 8 tries is too many... I don't know if it's possible to change this behavior, but cancelled jobs are counted as error runs... I was forced to go up to 8 because too many people were taking a lot of workunits and then cancelling them.
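
The trade-off can be sketched with a simple attempt budget. This is an illustration only: BOINC workunits carry limits such as `max_error_results` and `max_total_results`, but the thread does not say which one was raised to 8, so the single `max_total` cap, the outcome labels, and the history below are hypothetical.

```python
from typing import List, Sequence


def attempts_left(outcomes: Sequence[str], max_total: int = 8) -> int:
    """Every returned task uses one attempt, whether computed or aborted."""
    return max(0, max_total - len(outcomes))


def computed_results(outcomes: Sequence[str]) -> List[str]:
    """Results that were actually computed and can be compared."""
    return [outcome for outcome in outcomes if outcome != "aborted"]


# Hypothetical history of one workunit: three hosts aborted, two computed.
history = ["aborted", "aborted", "inconclusive", "aborted", "inconclusive"]

print(attempts_left(history, max_total=5))  # 0: no budget for a tie-breaker
print(attempts_left(history, max_total=8))  # 3: room for more hosts
print(len(computed_results(history)))       # 2 results available to compare
```

In this made-up case a cap of 5 would be exhausted with only two computed results, while a cap of 8 still leaves three attempts for further hosts.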

©2024 Benoit DA MOTA - LERIA, University of Angers, France