Host ID 1388 corrupted

Message boards : Number crunching : Host ID 1388 corrupted

V0d01ey
Joined: 15 Apr 20
Posts: 31
Credit: 379,600
RAC: 0
Message 1211 - Posted: 25 Nov 2020, 18:10:54 UTC

V0d01ey
Joined: 15 Apr 20
Posts: 31
Credit: 379,600
RAC: 0
Message 1212 - Posted: 26 Nov 2020, 5:38:06 UTC

V0d01ey
Joined: 15 Apr 20
Posts: 31
Credit: 379,600
RAC: 0
Message 1214 - Posted: 26 Nov 2020, 21:06:09 UTC

PHILIPPE
Joined: 4 Jan 20
Posts: 60
Credit: 516,736
RAC: 0
Message 1215 - Posted: 26 Nov 2020, 21:19:54 UTC - in response to Message 1214.  

All the computers listed (the ones that failed) run Ubuntu 20.04.
As jim1348 said, it would be good to understand why the upgrade from 18.04 to 20.04 breaks this project. As time goes on, more and more computers will unfortunately become faulty.

damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 1216 - Posted: 27 Nov 2020, 10:05:08 UTC - in response to Message 1215.  

At the moment, I have no idea; there is no informative message to work with. I'll try to reproduce the bug.

damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 1217 - Posted: 27 Nov 2020, 14:36:29 UTC - in response to Message 1216.  

After checking, many Ubuntu 20.04 hosts work perfectly. Something is wrong, but I can't find what!

V0d01ey
Joined: 15 Apr 20
Posts: 31
Credit: 379,600
RAC: 0
Message 1221 - Posted: 28 Nov 2020, 19:20:34 UTC

damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 1222 - Posted: 29 Nov 2020, 9:18:41 UTC - in response to Message 1221.  

Done. Thank you

V0d01ey
Joined: 15 Apr 20
Posts: 31
Credit: 379,600
RAC: 0
Message 1224 - Posted: 2 Dec 2020, 20:54:25 UTC

damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 1225 - Posted: 3 Dec 2020, 9:50:45 UTC - in response to Message 1224.  

Not an Ubuntu 20.04 host! Thanks.

V0d01ey
Joined: 15 Apr 20
Posts: 31
Credit: 379,600
RAC: 0
Message 1231 - Posted: 12 Dec 2020, 20:45:59 UTC

PHILIPPE
Joined: 4 Jan 20
Posts: 60
Credit: 516,736
RAC: 0
Message 1232 - Posted: 12 Dec 2020, 22:05:55 UTC - in response to Message 1231.  

There is a small change in the stderr output of the latest faulty computers:

Before, it was:
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<stderr_txt>
09:56:58 (3961308): wrapper (7.5.26014): starting
09:56:58 (3961308): wrapper: running worker.sh ()
Jobs starts with 1 cores
STEP OPT : Starting
Create output archive
OPT.out

Normal termination.
09:57:00 (3961308): worker.sh exited; CPU time 1.060887
09:57:00 (3961308): called boinc_finish(0)

</stderr_txt>
]]>


Now it is:
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<stderr_txt>
04:22:02 (231697): wrapper (7.5.26014): starting
04:22:02 (231697): wrapper: running worker.sh ()
Jobs starts with 1 cores
STEP OPT : Starting
./run.sh: line 41: 231715 Segmentation fault (core dumped) mpirun --allow-run-as-root -np $NP nwchem OPT.nw &> OPT.out
Create output archive
OPT.out

Normal termination.
04:22:04 (231697): worker.sh exited; CPU time 1.438993
04:22:04 (231697): called boinc_finish(0)

</stderr_txt>
]]>


A segmentation fault appears, always at line 41 of run.sh.
Maybe here are some tips to solve it (in French).
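To tell these failures apart from good results without reading every log by hand, a small script can scan the stderr text for the shell's "Segmentation fault" line. This is a minimal sketch, assuming the stderr_txt content is available as a plain string (for example copied from a task page); the helper names find_segfaults and looks_successful are hypothetical. Note that the failing log above still shows "Normal termination." and "called boinc_finish(0)", so a success marker alone does not flag the crash.

```python
import re

# Matches the bash-style line printed when a child process crashes, e.g.
# "./run.sh: line 41: 231715 Segmentation fault (core dumped) mpirun ..."
SEGFAULT_RE = re.compile(
    r"^(?P<script>\S+): line (?P<line>\d+): (?P<pid>\d+) Segmentation fault"
)


def find_segfaults(stderr_txt: str) -> list[dict]:
    """Return one dict (script, line, pid) per segfault line in stderr_txt."""
    hits = []
    for raw in stderr_txt.splitlines():
        match = SEGFAULT_RE.match(raw.strip())
        if match:
            hits.append(
                {
                    "script": match["script"],
                    "line": int(match["line"]),
                    "pid": int(match["pid"]),
                }
            )
    return hits


def looks_successful(stderr_txt: str) -> bool:
    """True if the log reports boinc_finish(0), whatever happened before."""
    return "called boinc_finish(0)" in stderr_txt


if __name__ == "__main__":
    # Lines taken from the failing stderr output above.
    failing = """\
04:22:02 (231697): wrapper: running worker.sh ()
STEP OPT : Starting
./run.sh: line 41: 231715 Segmentation fault (core dumped) mpirun --allow-run-as-root -np $NP nwchem OPT.nw &> OPT.out
Normal termination.
04:22:04 (231697): called boinc_finish(0)
"""
    print(looks_successful(failing))  # True
    print(find_segfaults(failing))  # [{'script': './run.sh', 'line': 41, 'pid': 231715}]
```

A task would then be treated as suspect when looks_successful is true but find_segfaults returns a non-empty list, which is exactly the situation in the "Now it is" log.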

damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 1233 - Posted: 15 Dec 2020, 7:28:57 UTC - in response to Message 1232.  

Thank you.

Unfortunately, a "segmentation fault" only tells us the immediate cause, not the root of the problem :(
At the moment, I suspect major changes in glibc or the kernel.

damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert
Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 1240 - Posted: 17 Dec 2020, 16:34:55 UTC - in response to Message 1233.  

Here are all the blacklisted Linux hosts. If you have any ideas, they are welcome!

MariaDB [boinc]> select id, os_name, os_version  from host where max_results_day=-1 and os_name regexp "Linux.*";
+------+------------------+----------------------------------------------------------------------------------+
| id   | os_name          | os_version                                                                       |
+------+------------------+----------------------------------------------------------------------------------+
|  365 | Linux Debian     | Debian GNU/Linux bullseye/sid [5.9.0-3-amd64|libc 2.31 (Debian GLIBC 2.31-4)]    |
| 1003 | Linux Ubuntu     | Ubuntu 20.04.1 LTS [5.4.0-54-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.1)]   |
| 1006 | Linux Ubuntu     | Ubuntu 20.04.1 LTS [5.4.0-54-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.1)]   |
| 1007 | Linux Ubuntu     | Ubuntu 20.04.1 LTS [5.4.0-54-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.1)]   |
| 1008 | Linux Ubuntu     | Ubuntu 20.04.1 LTS [5.4.0-54-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.1)]   |
| 1388 | Linux Ubuntu     | Ubuntu 18.04.5 LTS [4.15.0-112-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1.2)] |
| 1912 | Linux Ubuntu     | Ubuntu 18.04.5 LTS [4.15.0-122-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1.2)] |
| 1967 | Linux Ubuntu     | Ubuntu 20.04.1 LTS [5.4.0-54-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.1)]   |
| 3664 | Linux Ubuntu     | Ubuntu 20.04 LTS [5.3.0-59-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]       |
| 3685 | Linux Ubuntu     | Ubuntu 20.04 LTS [5.4.0-47-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]       |
| 3712 | Linux Ubuntu     | Ubuntu 20.04 LTS [4.19.107-Unraid|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]        |
| 3713 | Linux Ubuntu     | Ubuntu 20.04.1 LTS [5.4.74-1-lts|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.1)]       |
| 3733 | Linux Ubuntu     | Ubuntu 20.04 LTS [5.4.0-47-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]       |
| 4129 | Linux Arch Linux | Arch Linux [5.8.14-arch1-1|libc 2.32 (GNU libc)]                                 |
| 4443 | Linux Ubuntu     | Ubuntu 20.04 LTS [5.4.68-1-lts|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]           |
| 4505 | Linux Ubuntu     | Ubuntu 18.04.4 LTS [5.4.0-48-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]     |
| 4790 | Linux Ubuntu     | Ubuntu 20.04 LTS [5.4.0-52-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]       |
| 4796 | Linux Ubuntu     | Ubuntu 18.04.4 LTS [5.4.0-52-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]     |
| 4832 | Linux Ubuntu     | Ubuntu 20.04 LTS [4.9.0-12-amd64|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]         |
| 5132 | Linux Ubuntu     | Ubuntu 20.04.1 LTS [5.4.0-47-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]     |
| 5157 | Linux Ubuntu     | Ubuntu 20.04.1 LTS [4.4.59+|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.1)]            |
| 5306 | Linux Ubuntu     | Ubuntu 20.04.1 LTS [5.4.0-56-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.1)]   |
+------+------------------+----------------------------------------------------------------------------------+
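Since damotbe suspects glibc or kernel changes, a quick tally of the table above shows how the blacklisted hosts spread across glibc versions and kernel series. This is a minimal sketch that parses the os_version strings exactly as they appear in the query output; the helper name parse is hypothetical. It only describes the blacklisted set, so it cannot show a correlation without the same tally over hosts that work (many Ubuntu 20.04 hosts do). It also makes visible that host 1388 itself runs glibc 2.27.

```python
import re
from collections import Counter

# os_version strings copied from the query output above, keyed by host id.
BLACKLISTED = {
    365: "Debian GNU/Linux bullseye/sid [5.9.0-3-amd64|libc 2.31 (Debian GLIBC 2.31-4)]",
    1003: "Ubuntu 20.04.1 LTS [5.4.0-54-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.1)]",
    1006: "Ubuntu 20.04.1 LTS [5.4.0-54-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.1)]",
    1007: "Ubuntu 20.04.1 LTS [5.4.0-54-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.1)]",
    1008: "Ubuntu 20.04.1 LTS [5.4.0-54-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.1)]",
    1388: "Ubuntu 18.04.5 LTS [4.15.0-112-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1.2)]",
    1912: "Ubuntu 18.04.5 LTS [4.15.0-122-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1.2)]",
    1967: "Ubuntu 20.04.1 LTS [5.4.0-54-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.1)]",
    3664: "Ubuntu 20.04 LTS [5.3.0-59-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]",
    3685: "Ubuntu 20.04 LTS [5.4.0-47-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]",
    3712: "Ubuntu 20.04 LTS [4.19.107-Unraid|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]",
    3713: "Ubuntu 20.04.1 LTS [5.4.74-1-lts|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.1)]",
    3733: "Ubuntu 20.04 LTS [5.4.0-47-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]",
    4129: "Arch Linux [5.8.14-arch1-1|libc 2.32 (GNU libc)]",
    4443: "Ubuntu 20.04 LTS [5.4.68-1-lts|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]",
    4505: "Ubuntu 18.04.4 LTS [5.4.0-48-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]",
    4790: "Ubuntu 20.04 LTS [5.4.0-52-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]",
    4796: "Ubuntu 18.04.4 LTS [5.4.0-52-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]",
    4832: "Ubuntu 20.04 LTS [4.9.0-12-amd64|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]",
    5132: "Ubuntu 20.04.1 LTS [5.4.0-47-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]",
    5157: "Ubuntu 20.04.1 LTS [4.4.59+|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.1)]",
    5306: "Ubuntu 20.04.1 LTS [5.4.0-56-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.1)]",
}

# Captures "[<kernel>|libc <major>.<minor>" from each os_version string.
VERSION_RE = re.compile(r"\[(?P<kernel>[^|\]]+)\|libc (?P<libc>\d+\.\d+)")


def parse(os_version: str) -> tuple[str, str]:
    """Return (kernel series such as '5.4', glibc version such as '2.31')."""
    match = VERSION_RE.search(os_version)
    if match is None:
        raise ValueError(f"unrecognised os_version: {os_version!r}")
    series = ".".join(match["kernel"].split(".")[:2])
    return series, match["libc"]


by_libc = Counter(parse(v)[1] for v in BLACKLISTED.values())
by_kernel = Counter(parse(v)[0] for v in BLACKLISTED.values())

if __name__ == "__main__":
    print(by_libc.most_common())  # [('2.31', 17), ('2.27', 4), ('2.32', 1)]
    print(by_kernel.most_common(2))  # [('5.4', 14), ('4.15', 2)]
```

Running the same tally over non-blacklisted Linux hosts (by dropping the max_results_day=-1 condition from the query) would be needed before reading anything into these counts.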

Tullio
Joined: 5 Sep 20
Posts: 103
Credit: 2,142,600
RAC: 0
Message 1242 - Posted: 18 Dec 2020, 5:38:09 UTC - in response to Message 1240.  

My Linux Virtual Machine enlisted in Science United has a 5.9.14 kernel and runs nwchem.
Tullio

[AF>France>Est>Alsace]PFLIEGER...
Joined: 16 Nov 20
Posts: 21
Credit: 3,661,600
RAC: 0
Message 1243 - Posted: 18 Dec 2020, 6:55:14 UTC

To get good results, you need to set the processors to 50% in the BOINC Manager, under Options > Computing preferences.
Learn to study such options; for QuChem, they are the key to success.

[AF>France>Est>Alsace]PFLIEGER...
Joined: 16 Nov 20
Posts: 21
Credit: 3,661,600
RAC: 0
Message 1244 - Posted: 18 Dec 2020, 12:04:42 UTC - in response to Message 1240.  

To be honest, the tasks must be run on 2 CPUs, as I am doing.
To do that, in the BOINC Manager computing preferences I set the number of processors to 50% and the CPU time to 100% (second line).
The result is that each task computes at full power instead of being split across many processors that cannot compute and produce bad results. LHC@home uses the same approach.
Please compare my results with other computers.
The Gaussian graphs I sent to Olivier LAurech at the university came from processing the computing times of my several computers on Q CHEM.

I don't know about the bad results you have, but nobody worries about me!

Best regards

maya2

Tullio
Joined: 5 Sep 20
Posts: 103
Credit: 2,142,600
RAC: 0
Message 1245 - Posted: 18 Dec 2020, 12:42:45 UTC

I have a very average CPU, an Intel i5 9400F running Windows 10, and an old HP laptop with an AMD E-450 CPU running SuSE Leap 15.0. Yet I am number 44 in the RAC ranking. My other machine, a Linux virtual machine with SuSE Tumbleweed and kernel 5.9.14, is enrolled in Science United and does not contribute to my credits, but I see it running nwchem.
Tullio

PHILIPPE
Joined: 4 Jan 20
Posts: 60
Credit: 516,736
RAC: 0
Message 1246 - Posted: 18 Dec 2020, 14:38:23 UTC - in response to Message 1245.  

I notice that some computers in your list are the same.
I think the owners of some faulty computers retried joining the Quchempedia project several times.

Looking at their features, these computers with different IDs seem to be identical:

(1003<->1007)
(1008<->1967)
(3713<->4443<->5132)
(3733<->4790<->5306)
(4505<->4796)
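PHILIPPE does not say which features he compared, and several of the groups he lists have different kernel versions in the table, so the comparison must rely on fields that survive an update. The idea can be sketched as grouping hosts by a fingerprint that leaves such fields out. In this minimal sketch the field names in FINGERPRINT_FIELDS, the helper candidate_duplicates and the sample records are all hypothetical placeholders, not the project's schema or data.

```python
from collections import defaultdict

# Hypothetical fingerprint: fields assumed to stay the same when a machine
# re-attaches under a new host ID. Kernel and OS patch level are left out
# because they change with routine updates.
FINGERPRINT_FIELDS = ("owner", "cpu_model", "ncpus", "ram_bytes")


def candidate_duplicates(hosts, fields=FINGERPRINT_FIELDS):
    """Group host IDs whose fingerprint fields are all equal.

    Only groups with at least two IDs are returned; each group is sorted.
    """
    groups = defaultdict(list)
    for host in hosts:
        key = tuple(host[field] for field in fields)
        groups[key].append(host["id"])
    return sorted(sorted(ids) for ids in groups.values() if len(ids) > 1)


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    hosts = [
        {"id": 10, "owner": "a", "cpu_model": "CPU-X", "ncpus": 8,
         "ram_bytes": 16_000_000_000, "kernel": "5.4.0-47-generic"},
        {"id": 11, "owner": "a", "cpu_model": "CPU-X", "ncpus": 8,
         "ram_bytes": 16_000_000_000, "kernel": "5.4.0-52-generic"},
        {"id": 12, "owner": "b", "cpu_model": "CPU-Y", "ncpus": 4,
         "ram_bytes": 8_000_000_000, "kernel": "5.4.0-52-generic"},
    ]
    print(candidate_duplicates(hosts))  # [[10, 11]]
```

A matching fingerprint only suggests a re-attached machine: two identical computers owned by the same person would collide too, so the groups are candidates to check, not proof.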

[AF>France>Est>Alsace]PFLIEGER...
Joined: 16 Nov 20
Posts: 21
Credit: 3,661,600
RAC: 0
Message 1251 - Posted: 18 Dec 2020, 17:24:21 UTC - in response to Message 1246.  

I had 2 accounts, but I always did good work.
The first account was deleted because my mailbox was hacked.
The second was created to help my Alsace team within l'Alliance francophone.
I didn't change computers.
I want France to move forward, and it is my country.
I suffer from a genetic disease and want to find a solution for such problems.
What I have seen in the last few days is that people don't read the technical requirements in the forum and do bad work without thinking that they are playing with the lives of unknown people.
People who only return errors after 1 to 7 seconds, when tasks need 690 to 14000 seconds to succeed with a good result, are bad guys.
Everybody must feel a sense of responsibility in their commitment.
Nobody wants bad medicine or faulty computers in the shop. This is why everybody must realize that a false game could come back to them in their flat, car, plane or boat.
I didn't want such politics to take place.
My computers are 5061, 5067, 5058 and 5062.
I hope it stays that way from now on.

To everybody: set the number of processors in your BOINC Manager to 50% and the computing time to 100%.

Best regards

maya2

©2024 Benoit DA MOTA - LERIA, University of Angers, France