Workunits failure after upgrading Debian to 11 (bullseye)

YAG

Joined: 19 Jun 20
Posts: 10
Credit: 2,792,400
RAC: 0
Message 1458 - Posted: 31 Aug 2021, 0:00:32 UTC

I am a Debian user with three computers (two PCs and a laptop). I had been running tasks successfully on all three. Recently, I upgraded Debian from 10 (buster) to 11 (bullseye). Since the upgrade, the BOINC client and other projects (Rosetta & Einstein) are working as expected, but QuChemPedIA is not. I tried detaching and reattaching the project, but the issue persists.

Now, the details. This is my laptop. All the tasks have "invalid" as output, and all of them last no more than 1s of CPU time. I will mention some examples. I could not see an easy way to fix it in the stderr logs of the tasks:

Tullio

Joined: 5 Sep 20
Posts: 103
Credit: 2,142,600
RAC: 0
Message 1459 - Posted: 31 Aug 2021, 8:25:50 UTC

Your Linux kernel has gone from 4.19 to 5.10. Evidently the Linux executable has not been updated.
Tullio
Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 1460 - Posted: 31 Aug 2021, 12:58:39 UTC - in response to Message 1459.  

I have been running Ubuntu 20.04.3 LTS [5.4.0-81-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.2)] for some time with no problems.
https://quchempedia.univ-angers.fr/athome/results.php?hostid=8557&offset=0&show_names=0&state=4&appid=
YAG

Joined: 19 Jun 20
Posts: 10
Credit: 2,792,400
RAC: 0
Message 1461 - Posted: 1 Sep 2021, 20:20:00 UTC

Interesting, Jim1348. I do not know the reason, Tullio, and there are other relevant changes in the Debian upgrade.

Damotbe said that the generator used by the project is EvoMol. It was developed on Ubuntu 18.04+ and is written in Python. Several packages built with Python 2 were removed in Debian 11 (bullseye):
«Python 2 is already beyond its End Of Life, and will receive no security updates. It is not supported for running applications, and packages relying on it have either been switched to Python 3 or removed. However, Debian bullseye does still include a version of Python 2.7, as well as a small number of Python 2 build tools such as python-setuptools. These are present only because they are required for a few application build processes that have not yet been converted to Python 3.»
I am sure this incident will be solved soon, because 15 of the 20 top computers in average credit are using Debian 10 (Buster).
Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 1462 - Posted: 3 Sep 2021, 16:19:40 UTC - in response to Message 1458.  

All the tasks have "invalid" as output, and all of them last no more than 1s of CPU time.

Now I am seeing the same thing. I just added a new machine, a Ryzen 3700X on Ubuntu 20.04.3 that I just updated to all the latest stuff.
https://quchempedia.univ-angers.fr/athome/results.php?hostid=8719&offset=0&show_names=0&state=5&appid=

And the last seven refuse to run at all. I will reboot, but try another project until this one is fixed.
It may be something to do with the libraries, but that is beyond me.
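
For anyone who wants to test the library theory, here is a minimal sketch that asks the dynamic loader which shared libraries a binary cannot resolve. The helper name missing_libs is made up for this example, and /bin/sh is only a stand-in target; the location of the project's science executable depends on the BOINC installation and is not given in this thread.

```shell
#!/bin/sh
# Assumption: missing_libs is a hypothetical helper name.
# Prints the lines of ldd output that report something as "not found".
# Returns 0 if any were printed, 1 if none, 2 if ldd is unavailable.
missing_libs() {
    command -v ldd >/dev/null 2>&1 || {
        echo "ldd is not installed" >&2
        return 2
    }
    # stderr is merged in case the loader reports problems there
    ldd "$1" 2>&1 | grep 'not found'
}

# Stand-in target; replace /bin/sh with the executable you want to inspect.
missing_libs /bin/sh
case $? in
    0) echo "some libraries could not be resolved" ;;
    1) echo "no missing libraries reported" ;;
    *) echo "could not run the check" ;;
esac
```

A clean result only rules out unresolved libraries; it says nothing about other incompatibilities between the executable and the system.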
YAG

Joined: 19 Jun 20
Posts: 10
Credit: 2,792,400
RAC: 0
Message 1463 - Posted: 3 Sep 2021, 23:00:18 UTC - in response to Message 1462.  
Last modified: 3 Sep 2021, 23:01:00 UTC

Jim1348, your new machine has the same Ubuntu version but a newer Linux kernel: 5.11.0-27. It could be the kernel version, as Tullio said...

And, very importantly, the issue is affecting several Linux distributions, not only Debian.
Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 1464 - Posted: 4 Sep 2021, 10:14:59 UTC - in response to Message 1463.  

It could be the kernel version, as Tullio said...
Yes, certainly. Then it would be between 5.4 and 5.10. Maybe someone can narrow it further. That would help the project find it.
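
To help narrow it down, a small sketch like the following reports whether a machine's kernel falls inside that 5.4 to 5.10 window. The helper names version_le and kernel_in_range are made up for this example, and it assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
#!/bin/sh
# Assumption: version_le and kernel_in_range are hypothetical helper names.
# version_le A B succeeds when version A <= version B (needs sort -V).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# kernel_in_range VERSION LOW HIGH succeeds when LOW <= VERSION <= HIGH.
kernel_in_range() {
    version_le "$2" "$1" && version_le "$1" "$3"
}

# Keep only major.minor, e.g. 5.11.0-27-generic -> 5.11
running=$(uname -r | cut -d. -f1,2)

if kernel_in_range "$running" 5.4 5.10; then
    echo "kernel $running is inside the 5.4 to 5.10 window"
else
    echo "kernel $running is outside the 5.4 to 5.10 window"
fi
```

Posting this line together with whether tasks validate on that host would make reports from different machines easier to compare.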
Tullio

Joined: 5 Sep 20
Posts: 103
Credit: 2,142,600
RAC: 0
Message 1465 - Posted: 4 Sep 2021, 17:38:53 UTC - in response to Message 1464.  

As a side note, I am running SuSE Tumbleweed with a 5.13.13 kernel on Einstein@home and the tasks all complete. For QuChem I am using Windows 10, since I saw that it is faster with VirtualBox than most Linux wingmen.
Tullio
Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 1466 - Posted: 4 Sep 2021, 21:52:40 UTC - in response to Message 1464.  
Last modified: 4 Sep 2021, 21:55:41 UTC

But it isn't just a question of the OS version.
I completed a work unit normally on an i9-10900F running Ubuntu 20.04.3 with the 5.11.0 kernel.
https://quchempedia.univ-angers.fr/athome/result.php?resultid=7650634

It ran the full time, and was inconclusive, but all the others who tried it thus far have produced validate errors with short run times. (They were all Intel CPUs too.)
So the CPU type may be another factor that is also important for success now, or maybe something else.
YAG

Joined: 19 Jun 20
Posts: 10
Credit: 2,792,400
RAC: 0
Message 1467 - Posted: 4 Sep 2021, 22:28:27 UTC - in response to Message 1466.  

It would be nice to know whether your computer with the i9-10900F CPU is able to finish a task successfully. I saw the following message in the LHC forum:
(...) After upgrading this machine to Linux kernel 5.4.0-58 (from 5.4.0-52) I started getting failures, so I had to abort them.
(...)
There is something strange about that upgrade. It is causing problems for me on QuChemPedIA also on two machines, a Ryzen 3900X and the i7-9700.
(Both are on Ubuntu 18.04.5).
Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 1468 - Posted: 4 Sep 2021, 22:54:33 UTC - in response to Message 1467.  

There is something strange about that upgrade. It is causing problems for me on QuChemPedIA also on two machines, a Ryzen 3900X and the i7-9700.

I have run several other projects on it without problem. At present, it is on TN-Grid, where it has been running the same on the new kernel as before.

As for LHC, that was my post. It turned out not to be the Linux kernel, but the BOINC version.
I posted on it here.
https://quchempedia.univ-angers.fr/athome/forum_thread.php?id=114&postid=1389#1389

That problem was due to using BOINC from the Locutus of Borg repository, and isn't the problem here, since I no longer use that one.
But there could be some other incompatibility of BOINC with something new in the libraries. That is possible.
Tullio

Joined: 5 Sep 20
Posts: 103
Credit: 2,142,600
RAC: 0
Message 1469 - Posted: 5 Sep 2021, 13:04:12 UTC

On my SuSE Tumbleweed Linux I have BOINC 7.18.0, with the warning that it is a development version and may not work. But it works.
Tullio
Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 1470 - Posted: 5 Sep 2021, 17:58:48 UTC
Last modified: 5 Sep 2021, 18:40:46 UTC

On my Ryzen 3700X, I tried upgrading BOINC from 7.16.11 (from Ubuntu Software) to 7.16.17 (from Locutus-of-Borg).
https://launchpad.net/~costamagnagianfranco/+archive/ubuntu/boinc

Surprisingly, it worked. I am now running normally again.
https://quchempedia.univ-angers.fr/athome/results.php?hostid=8719&offset=0&show_names=0&state=4&appid=

YMMV.
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 1471 - Posted: 7 Sep 2021, 8:05:55 UTC

I'm completely swamped and I don't have an intern or engineer to work on this. The person who compiled nwchem for the project has left, and I'm not able to update the executable. The straightforward workaround is to run an older Linux (in a VM) to compute for the project; otherwise it works only if you are lucky...

:( :( :(
Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 1472 - Posted: 7 Sep 2021, 16:21:08 UTC - in response to Message 1471.  
Last modified: 7 Sep 2021, 16:26:16 UTC

It worked for me with Ubuntu 20.04.3 and BOINC 7.16.11 on one machine.
On another machine it did not work on that OS until I upgraded BOINC to 7.16.17.

Some combinations work on some machines and others don't.
Tullio

Joined: 5 Sep 20
Posts: 103
Credit: 2,142,600
RAC: 0
Message 1473 - Posted: 7 Sep 2021, 16:42:49 UTC

On my Windows 10 machine it uses a Linux virtual machine that starts as "other Linux - 64 bit". I don't know which Linux this is.
Tullio.
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 1474 - Posted: 8 Sep 2021, 7:07:56 UTC - in response to Message 1473.  

You can find the Linux kernel version with the command
uname -a

and the OS with
cat /etc/os-release 
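
Since the reports in this thread also mention the C library and the BOINC version, the two commands above could be combined into a single report, sketched below. The helper name env_report is made up for this example; it assumes a glibc system for the libc line and that the BOINC client binary is called boinc and is on PATH, and it falls back to a placeholder when either is missing.

```shell
#!/bin/sh
# Assumption: env_report is a hypothetical helper name.
# Prints kernel, distribution, C library and BOINC client versions.
env_report() {
    # Kernel release as reported by the running system
    echo "kernel: $(uname -r)"

    # Distribution name; the file is read inside the command substitution
    # so its variables do not leak into the calling shell
    if [ -r /etc/os-release ]; then
        echo "os: $(. /etc/os-release && echo "$PRETTY_NAME")"
    else
        echo "os: unknown"
    fi

    # C library version; this getconf variable only exists on glibc systems
    echo "libc: $(getconf GNU_LIBC_VERSION 2>/dev/null || echo unknown)"

    # BOINC client version, only if the client binary is on PATH
    if command -v boinc >/dev/null 2>&1; then
        echo "boinc: $(boinc --version 2>/dev/null)"
    else
        echo "boinc: not found on PATH"
    fi
}

env_report
```

Pasting the four lines into a post gives the same kind of summary that the BOINC host pages show in brackets.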
YAG

Joined: 19 Jun 20
Posts: 10
Credit: 2,792,400
RAC: 0
Message 1475 - Posted: 12 Sep 2021, 8:18:06 UTC - in response to Message 1474.  

I finally created a virtual machine in VirtualBox with Debian 9 (Linux 4.9.0-16-amd64, BOINC 7.6.33), and I was able to run a test task successfully (200 credit). I am sure Debian 10 will work too, as that is what I was using before. Virtualization is a valid workaround.

When using virtualization, I prefer to run projects that were designed with it in mind from the start, such as Cosmology. I will keep an eye on further developments of QuChemPedIA, because it is a good idea. I love to support basic research, and quantum science is uncharted territory. I hope the project will find the resources to keep it alive.

Thank you all for your contributions to this thread, and thanks to the QuChemPedIA team for this amazing project. I wish you the best.
niarbeht

Joined: 2 Jan 22
Posts: 10
Credit: 665,400
RAC: 0
Message 1612 - Posted: 3 Jan 2022, 7:33:24 UTC

So, the whole "It's because Debian has a newer kernel" thing doesn't quite pan out, I think, because my Arch Linux boxes run work units just fine, but my Debian (Proxmox) box doesn't.

I dunno. I don't know anything about the build environment.

Doesn't work:
uname -a
Linux serverofpie 5.13.19-1-pve #1 SMP PVE 5.13.19-3 (Tue, 23 Nov 2021 13:31:19 +0100) x86_64 GNU/Linux


Works:
uname -a
Linux HolyPie 5.15.10-arch1-1 #1 SMP PREEMPT Fri, 17 Dec 2021 11:17:37 +0000 x86_64 GNU/Linux


Both have python2 and python3 installed.
niarbeht

Joined: 2 Jan 22
Posts: 10
Credit: 665,400
RAC: 0
Message 1613 - Posted: 3 Jan 2022, 7:50:04 UTC - in response to Message 1612.  

Never mind, neither works; they're just failing in different ways.

©2024 Benoit DA MOTA - LERIA, University of Angers, France