WU failures
Author | Message |
---|---|
Send message Joined: 3 Oct 19 Posts: 14 Credit: 32,908,253 RAC: 0 |
I'm running Linux Mint 19 and seeing immediate failures on the Intel_mt WUs. The t1 and t2 WUs appear to be running, although completion-time estimates vary between 5 minutes and 20 hours. The Intel_mt failure reports: execv() failed: : Permission denied. I'm trying two different machines, an Intel E5-2684 V4 and an AMD 2990WX, and both fail on the Intel_mt WUs. |
Send message Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
It seems that the executable was not found (or that there is a permission issue). Try detaching and reattaching the project. |
Send message Joined: 3 Oct 19 Posts: 14 Credit: 32,908,253 RAC: 0 |
I attached another instance and got the same almost instantaneous failure on 12 Intel_mt WUs. I then opened up the permissions on the project folder to rw for everyone, and another 13 WUs failed. The only executable I see in the folder is the wrapper. There are quite a few tarballs. If it will help, HERE are my hosts. I did have a t1 WU complete and validate. I have a few other t1 and t2 WUs that have been running for several hours. |
Send message Joined: 3 Oct 19 Posts: 33 Credit: 197,169 RAC: 0 |
>>> dsgdb9nsd_nwchem,bath01,000007288,nwchem,1569028807 That one failed after less than a minute. It has failed on every machine that it has been sent to. The end status is a helpful: Exit status 0 (0x00000000) >>> 2019-10-03 19:07:29 (25304): Successfully started VM. (PID = '26680') 2019-10-03 19:07:29 (25304): Reporting VM Process ID to BOINC. 2019-10-03 19:07:34 (25304): Guest Log: BIOS: KBD: unsupported int 16h function 03 2019-10-03 19:07:34 (25304): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000 2019-10-03 19:07:34 (25304): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds 2019-10-03 19:07:34 (25304): Guest Log: vboxguest: misc device minor 59, IRQ 20, I/O port d020, MMIO at 00000000f0000000 (size 0x400000) 2019-10-03 19:07:34 (25304): VM state change detected. (old = 'poweroff', new = 'running') 2019-10-03 19:07:39 (25304): Guest Log: vboxsf: g_fHostFeatures=0x1 g_fSfFeatures=0x0 g_uSfLastFunction=20 2019-10-03 19:07:44 (25304): Preference change detected 2019-10-03 19:07:44 (25304): Setting CPU throttle for VM. (100%) 2019-10-03 19:07:44 (25304): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds)) 2019-10-03 19:07:46 (25304): VM is no longer is a running state. It is in 'poweroff'. 2019-10-03 19:07:46 (25304): VM state change detected. (old = 'running', new = 'poweroff') 2019-10-03 19:07:46 (25304): Powering off VM. <<< Perhaps interestingly, not all of the machines it has been sent to exited in the same way. The one showing "Error while computing" ran "Intel_mt"; those showing "Validate error" ran "vbox_t1". Other work units have run to completion, been returned and validated. I will leave the project enabled on this machine for the time being at least, as it does not seem to be wasting too much crunching time. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Send message Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
Thank you for your help and patience. It has been 24 hours since we opened to new volunteers, and new problems have appeared. Sad, but not surprising... We are investigating; multiple code versions are running simultaneously, and I am a beginner in BOINC project management... Stability will come... one day ;-) |
Send message Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
Looking at this specific result and WU, it seems that the software runs but the chemistry problem itself crashes it. The chemist confirms that yes, this will occur 10-20% of the time, but when it does, the task will crash quickly! |
Send message Joined: 3 Oct 19 Posts: 33 Credit: 197,169 RAC: 0 |
I have not had a reply from Will yet, but he may simply be away. I'll hold that matter open. The failing tasks fail very quickly; I hope there is something in the error log that helps. Feel free to ask if there is something I can do to help, and there are lots of us. |
Send message Joined: 3 Oct 19 Posts: 33 Credit: 197,169 RAC: 0 |
The failing tasks have, until today, failed after less than a minute, but today, I have had two work units that failed after several hours of crunching. |
Send message Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
"The failing tasks have, until today, failed after less than a minute, but today, I have had two work units that failed after several hours of crunching." That is more annoying... Crashes are always possible after the start, but they are quite uncommon. |
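
A note on the `execv() failed: : Permission denied` error reported at the top of the thread: `execv()` returns this error when the target file lacks execute permission, or when it sits on a filesystem mounted `noexec`. Setting a folder to rw for everyone does not add the execute bit, so it would not be expected to help in either case. The thread never establishes the actual cause, so the following is only a minimal diagnostic sketch. The function name `exec_status` and the directory passed to it are hypothetical; point it at wherever your BOINC client keeps the project files.

```python
import os
import stat
import sys


def exec_status(project_dir):
    """List regular files in project_dir with their mode string and
    whether the current user is allowed to execute them."""
    rows = []
    for name in sorted(os.listdir(project_dir)):
        path = os.path.join(project_dir, name)
        if os.path.isfile(path):
            mode = os.stat(path).st_mode
            # stat.filemode gives the familiar "-rwxr-xr-x" form;
            # os.access checks execute permission for the calling user.
            rows.append((name, stat.filemode(mode), os.access(path, os.X_OK)))
    return rows


if __name__ == "__main__":
    # Hypothetical usage: pass the project folder as the first argument.
    project_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, mode, can_exec in exec_status(project_dir):
        print(f"{mode}  {'exec' if can_exec else 'no-exec'}  {name}")
```

Run it as the same user the BOINC client runs as, since `os.access` answers for the calling user only. Data files such as tarballs are expected to show `no-exec`; the entries to look at are the wrapper and any application binaries. The script does not detect a `noexec` mount, which would have to be checked separately in the system's mount options.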
©2024 Benoit DA MOTA - LERIA, University of Angers, France