Posts by crashtech

1) Message boards : Number crunching : Out of Work (Message 1792)
Posted 2 Sep 2022 by crashtech
Post:
I took the above news as notice that the project was coming to an end:

https://quchempedia.univ-angers.fr/athome/forum_thread.php?id=183&postid=1783
2) Message boards : Number crunching : Suspicious near-instant results with NWChem long t4 (Message 919)
Posted 2 Jul 2020 by crashtech
Post:
@xii5ku , I'm out of ideas on this one. Thanks for your help, though.
3) Message boards : Number crunching : Suspicious near-instant results with NWChem long t4 (Message 910)
Posted 26 Jun 2020 by crashtech
Post:
@crashtech:
It looks like you have three "good" hosts with Mint 19.3 and boinc version 7.9.3,
and two "bad" hosts with Mint 19.3 and boinc version 7.17.0.
Right?

(On the other hand, when I look at wingmen of my own results, there are circa two hosts which are persistently spamming the project recently with bogus few-seconds results, and these two hosts have Mint 19.3 and boinc version 7.9.3. Their owner is anonymous, hence we have no way to wake up the pilot.)

I'm pretty sure those are two client instances on the same host.
4) Message boards : Number crunching : Suspicious near-instant results with NWChem long t4 (Message 908)
Posted 23 Jun 2020 by crashtech
Post:
@crashtech, in addition to ProtectSystem=full, you could try: PrivateTmp=false

Done, still nothing! One of the other things I tried was comparing boinc-client.service on a working host with the one on the non-working host, and commenting out all of the extra lines that are found in the non-working one. That also did not work. The temptation for me is to move my BOINC data directories to temporary storage, then "nuke and pave" the installation and start fresh. I realize that is more something out of the Windows noob playbook and is possibly offensive to a Linux pro.
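
The unit-file comparison described above can also be done mechanically. What follows is a minimal Python sketch, not something taken from the thread: the "working" text is a hypothetical stand-in (the working host's file is not shown anywhere in these posts), the "failing" text is an excerpt of the unit file posted in message 902, and the helper name extra_lines is made up for illustration.

```python
import difflib

# Hypothetical stand-in for the unit file on a working host (assumption).
WORKING = """
[Service]
Nice=10
User=boinc
WorkingDirectory=/var/lib/boinc
ExecStart=/usr/bin/boinc
""".strip()

# Excerpt of the unit file from the failing host, as posted in message 902.
FAILING = """
[Service]
ProtectHome=true
PrivateTmp=true
ProtectSystem=strict
ProtectControlGroups=true
ReadWritePaths=-/var/lib/boinc -/etc/boinc-client
Nice=10
User=boinc
WorkingDirectory=/var/lib/boinc
ExecStart=/usr/bin/boinc
""".strip()


def extra_lines(working_text, failing_text):
    """Return the lines that appear only in the failing host's unit file."""
    diff = difflib.unified_diff(
        working_text.splitlines(),
        failing_text.splitlines(),
        fromfile="working/boinc-client.service",
        tofile="failing/boinc-client.service",
        lineterm="",
    )
    # Added lines start with "+"; the "+++" file header is skipped.
    return [line[1:] for line in diff
            if line.startswith("+") and not line.startswith("+++")]


if __name__ == "__main__":
    for line in extra_lines(WORKING, FAILING):
        print(line)
```

On real hosts the two strings would be read from each machine's boinc-client.service instead of being embedded in the script.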
5) Message boards : Number crunching : Suspicious near-instant results with NWChem long t4 (Message 904)
Posted 22 Jun 2020 by crashtech
Post:
@crashtech, "df" reports "file system disk space usage", i.e. the used space and available space in the filesystem in which the optionally given file or directory resides. My main intention was to verify how much free space is left in your /tmp. We now know that there is plenty of space left in it. (There are 180 GBytes available in /tmp.)

As for the boinc-client.service unit file: Compared with the boinc-client.service file on my computers, yours has several extra lines. The following four, explained in "man systemd.exec", stick out to me:

ProtectHome=true

    Most likely harmless to the NWChem (...long) application.


PrivateTmp=true

    In theory this should be OK for NWChem long.


ProtectSystem=strict

    This is probably the culprit! As I understand the documentation, this will make /tmp read-only.
    Either relax this from strict to full, or append
      -/tmp

    to the ReadWritePaths line.
    Then restart the boinc-client service. Or maybe you even need to reboot, I don't know.
    Then fetch one QuChemPedIA task and see if it runs normally.


ProtectControlGroups=true

    In theory this should be OK.


(Documentation of the systemd service file format is spread over "man systemd.unit", "man systemd.service", and "man systemd.exec".)


Thank you xii5ku! First I appended -/tmp to the ReadWritePaths line and rebooted, but QuChemPedIA would not run. Then I changed "strict" to "full" and rebooted, but it still won't run! It's a real puzzle.
6) Message boards : Number crunching : Suspicious near-instant results with NWChem long t4 (Message 902)
Posted 21 Jun 2020 by crashtech
Post:

crashtech wrote:
Has there been a resolution to this issue? One of my computers only runs WUs for a few seconds, then marks them as complete.

https://quchempedia.univ-angers.fr/athome/results.php?hostid=1227

@crashtech, maybe this host has a full /tmp (as Alien Seeker suspected of their own host). Check with "df -h /tmp", for example.

Taking your suggestions one at a time, it looks as if "df -h /tmp" is not doing what it is intended to do in this case, which is to give the size of /tmp. What the command does do, after further experimentation, is give the total usage of /dev/sda5, at least when executed on this particular host. It does this no matter which directory is given as a target:

ga7pxsl@GAX570UD_test:~$ df -h /tmp
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5       228G   37G  180G  17% /
ga7pxsl@GAX570UD_test:~$ df -h /home/ga7pxsl
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5       228G   37G  180G  17% /
ga7pxsl@GAX570UD_test:~$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5       228G   37G  180G  17% /

It does do something different if no target directory is given, which might provide a clue to someone who knows something:

ga7pxsl@GAX570UD_test:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             16G     0   16G   0% /dev
tmpfs           3.2G  2.0M  3.2G   1% /run
/dev/sda5       228G   37G  180G  17% /
tmpfs            16G  208K   16G   1% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/sda1       511M  6.1M  505M   2% /boot/efi
tmpfs           3.2G   32K  3.2G   1% /run/user/1000

But, looking at /tmp in the graphical file manager (the thing I sort of know how to use) as root, the Properties tab tells me there is less than 100KB in /tmp.
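
The two readings above measure different things, and the difference can be reproduced in a few lines. This is a minimal Python sketch, not something from the thread: df's Avail column describes the whole filesystem that holds a path, while a file manager's Properties tab adds up the files inside the directory. The helper names are made up for illustration, and tempfile.gettempdir() is assumed to resolve to /tmp, as it usually does on Linux.

```python
import os
import shutil
import stat
import tempfile


def filesystem_free_bytes(path):
    """Free bytes on the filesystem that contains path (compare df's Avail)."""
    return shutil.disk_usage(path).free


def directory_size_bytes(path):
    """Total size of the regular files found under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                info = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # file vanished or is not readable; skip it
            if stat.S_ISREG(info.st_mode):
                total += info.st_size
    return total


def same_filesystem(path_a, path_b):
    """True if both paths live on the same mounted filesystem."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    target = tempfile.gettempdir()
    print("free on filesystem:", filesystem_free_bytes(target))
    print("bytes inside directory:", directory_size_bytes(target))
    print("same filesystem as /:", same_filesystem(target, os.sep))
```

If the last line prints True, /tmp is an ordinary directory on the root filesystem, which would explain why df shows the same /dev/sda5 row for /tmp, /home/ga7pxsl, and /.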

Or the boinc-client service on this host is set up in a way which does not permit it to create files outside of its data directory, or at least not in /tmp. What does /lib/systemd/system/boinc-client.service contain on this host?


[Unit]
Description=Berkeley Open Infrastructure Network Computing Client
Documentation=man:boinc(1)
After=network-online.target

[Service]
Type=simple
ProtectHome=true
PrivateTmp=true
ProtectSystem=strict
ProtectControlGroups=true
ReadWritePaths=-/var/lib/boinc -/etc/boinc-client
Nice=10
User=boinc
WorkingDirectory=/var/lib/boinc
ExecStart=/usr/bin/boinc
ExecStop=/usr/bin/boinccmd --quit
ExecReload=/usr/bin/boinccmd --read_cc_config
ExecStopPost=/bin/rm -f lockfile
IOSchedulingClass=idle
# The following options prevent setuid root as they imply NoNewPrivileges=true
# Since Atlas requires setuid root, they break Atlas
# In order to improve security, if you're not using Atlas,
# Add these options to the [Service] section of an override file using
# sudo systemctl edit boinc-client.service
#NoNewPrivileges=true
#ProtectKernelModules=true
#ProtectKernelTunables=true
#RestrictRealtime=true
#RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
#RestrictNamespaces=true
#PrivateUsers=true
#CapabilityBoundingSet=
#MemoryDenyWriteExecute=true

[Install]
WantedBy=multi-user.target
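
To pick the sandboxing directives out of a unit file like the one above without reading it line by line, something along these lines may help. It is a minimal Python sketch under stated assumptions: the standard configparser module is used as a rough stand-in for systemd's own parser (it does not reproduce systemd's handling of repeated keys or drop-in overrides), the embedded text is an excerpt of the file above, and the function name sandbox_settings is hypothetical.

```python
import configparser

# Directives discussed in this thread; see "man systemd.exec" for their meaning.
SANDBOX_KEYS = (
    "ProtectHome",
    "PrivateTmp",
    "ProtectSystem",
    "ProtectControlGroups",
    "ReadWritePaths",
)

# Excerpt of the unit file posted above.
UNIT_EXCERPT = """
[Unit]
Description=Berkeley Open Infrastructure Network Computing Client

[Service]
Type=simple
ProtectHome=true
PrivateTmp=true
ProtectSystem=strict
ProtectControlGroups=true
ReadWritePaths=-/var/lib/boinc -/etc/boinc-client
User=boinc
# Commented-out options are ignored by the parser:
#NoNewPrivileges=true
""".strip()


def sandbox_settings(unit_text):
    """Return the sandboxing directives set in the [Service] section."""
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, delimiters=("=",)
    )
    parser.optionxform = str  # keep the directive names case-sensitive
    parser.read_string(unit_text)
    if not parser.has_section("Service"):
        return {}
    service = parser["Service"]
    return {key: service[key] for key in SANDBOX_KEYS if key in service}


if __name__ == "__main__":
    for key, value in sandbox_settings(UNIT_EXCERPT).items():
        print(f"{key}={value}")
```

On the host itself, the text would come from /lib/systemd/system/boinc-client.service rather than from the embedded excerpt.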
7) Message boards : Number crunching : Suspicious near-instant results with NWChem long t4 (Message 890)
Posted 15 Jun 2020 by crashtech
Post:
Possibly there is a problem with the BOINC installation itself.
It would probably be easier just to upgrade to the latest version, which you can do with this PPA:

sudo add-apt-repository ppa:costamagnagianfranco/boinc
sudo apt-get update

https://launchpad.net/~costamagnagianfranco/+archive/ubuntu/boinc


Thanks, I have done so, and verified it in the Event Log:
Mon 15 Jun 2020 08:42:37 AM MDT |  | Starting BOINC client version 7.17.0 for x86_64-pc-linux-gnu


Alas, the tasks still error out immediately. There don't seem to be any clues in the stderr output of the failed tasks, either. I wonder if there aren't some installed libraries that this project relies on that I might check and/or re-install.
8) Message boards : Number crunching : Suspicious near-instant results with NWChem long t4 (Message 888)
Posted 14 Jun 2020 by crashtech
Post:
Very strange. I don't see anything wrong with your machines.
Maybe memory? Overclocking? It must be something different about that one.

Sometimes files get corrupted though. I would detach from the project, and then re-attach.

Thanks, I have done so more than once, checking the second time to be sure that the project directory was actually removed. There seems to be something about that particular host's configuration that causes QuChemPedIA to fail.
9) Message boards : Number crunching : Suspicious near-instant results with NWChem long t4 (Message 886)
Posted 14 Jun 2020 by crashtech
Post:
I run all of my work units as t1, by setting "Max # CPUs 1" in the preferences.

I have never seen the problem of the short runs that I can remember.
https://quchempedia.univ-angers.fr/athome/results.php?hostid=2356


Hi, based on your post, I set up a location in the preferences page here to only allow one CPU, but all the WUs still end prematurely. For now I can't run the project on it, but would like to figure out why the work is failing. It runs other projects without issues, so I don't think it's hardware related.
10) Message boards : Number crunching : Suspicious near-instant results with NWChem long t4 (Message 884)
Posted 13 Jun 2020 by crashtech
Post:
Has there been a resolution to this issue? One of my computers only runs WUs for a few seconds, then marks them as complete.

https://quchempedia.univ-angers.fr/athome/results.php?hostid=1227
11) Message boards : Number crunching : Application 0.15 (t1) (beta test) Result Not Completing Successfully (Message 523)
Posted 7 Feb 2020 by crashtech
Post:
Hi, how do I tell between good and bad units? On a few machines that were experiencing low CPU utilization or unusually long run times, I aborted them all, but I'd rather not make a habit of that.




©2024 Benoit DA MOTA - LERIA, University of Angers, France