Any alternative to the current taskset clobbering?

niarbeht

Joined: 2 Jan 22
Posts: 10
Credit: 665,400
RAC: 0
Message 1617 - Posted: 6 Jan 2022, 18:28:07 UTC

I've got my BOINC systemd unit file set up so that only certain processor cores will be used. That keeps my F@H GPU units from being held up waiting on the CPU, which F@H OpenCL units need for whatever reason.
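For anyone curious, the rough idea is a systemd override on the BOINC service; something like this (just a sketch: the service name boinc-client and the core list 0-5 are examples, adjust to your own setup):

# open an override file for the BOINC service (name assumed to be boinc-client)
sudo systemctl edit boinc-client
# and add something along these lines in the editor that opens:
#   [Service]
#   CPUAffinity=0-5
# then restart so the new affinity mask takes effect
sudo systemctl restart boinc-client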

Anyway, I noticed that when I'm running this project's units, they run on every core and thread on my system. As noted in the first paragraph, that's not the behavior I want, because it leaves my F@H GPU units choking.

I looked into things, and it appears that you guys are clobbering my processor limitations by using taskset to let your units run on every CPU. I poked and prodded things a bunch, and figured out that this is a workaround for a limitation of working with MPI. I can understand and appreciate this, but...

Well, it'd be nice if there were a good way for me to control what taskset is being fed. I don't know how you could set it up, but it sure would be nice if your run.sh files checked for a variable and, if present, tried to feed that to taskset instead. I dunno.
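Something along these lines inside run.sh is what I have in mind (just a sketch; the variable name is made up for illustration, and 0xFFFFFFFF stands in for whatever mask the script currently passes):

# use the user-supplied mask if one was exported, otherwise whatever the script uses today
AFFINITY_MASK="${BOINC_TASKSET_MASK:-0xFFFFFFFF}"
# ...and then pass $AFFINITY_MASK wherever the script currently calls taskset, e.g.
# taskset -a -p "$AFFINITY_MASK" "$pid" >/dev/null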

As it is, I'm gonna let my current units for this project finish and not get any new units on my desktop for now, I guess. Luckily, I can still let your units run on my server, since it doesn't run F@H GPU units at all.
Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 1618 - Posted: 6 Jan 2022, 19:21:15 UTC - in response to Message 1617.  

Your computers are hidden so I am not sure that you will get much advice. And you are doing things the hard way.

To reserve a core for the GPU (I do Folding on all my GPUs), just set the BOINC Preference to use less than 100% of the CPUs.
For example, on a 6-core/12-thread Ryzen 3600, I set it to use 95%, which leaves one virtual core free.

To run only one task per core, just set the QuChemPedIA preference "Max # CPUs" to 1.
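If you would rather set the CPU percentage locally than on the website, the same limit can go in global_prefs_override.xml in the BOINC data directory (rough sketch; the 95 is just my value, so double-check the element name against your client version):

# rough sketch: local override of the web preference, e.g.:
#
#   <global_preferences>
#     <max_ncpus_pct>95</max_ncpus_pct>
#   </global_preferences>
#
# then tell the client to re-read it (or just restart the client):
boinccmd --read_global_prefs_override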
niarbeht

Joined: 2 Jan 22
Posts: 10
Credit: 665,400
RAC: 0
Message 1619 - Posted: 7 Jan 2022, 10:41:26 UTC - in response to Message 1618.  

To reserve a core for the GPU (I do Folding on all my GPUs), just set the BOINC Preference to use less than 100% of the CPUs.
For example, on a 6-core/12-thread Ryzen 3600, I set it to use 95%, which leaves one virtual core free.

To run only one task per core, just set the QuChemPedIA preference "Max # CPUs" to 1.


Wow. That is not how any of that works. At all.

Setting the usage limit in the BOINC client doesn't restrict which cores get used; it restricts how many threads' worth of work units the client runs at once. So if you set that number to 75% on a 12-core, 24-thread system, you wind up with work units occupying a total of eighteen threads, and those threads can still be scheduled onto any core. If the operating system scheduler doesn't keep them contained to specific cores, the F@H work units pick up latency, because context switching takes time.
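You can check this yourself: set the percentage to whatever you like and then look at the client's affinity mask (sketch, assuming the client binary is named boinc):

# the affinity list stays at "all cores" no matter what CPU percentage BOINC is set to
taskset -cp "$(pgrep -x boinc | head -n 1)"
# on a 24-thread machine this will typically print something like
#   pid 1234's current affinity list: 0-23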

As for "To run only one task per core", that's not the issue. The issue is the project's run script using taskset to work around how mpirun handles unit execution.[/quote]
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 1630 - Posted: 13 Jan 2022, 10:14:48 UTC - in response to Message 1619.  

Back when we had a developer, this was the only solution that worked in conjunction with mpirun and the BOINC wrapper. It is not perfect, of course.
niarbeht

Joined: 2 Jan 22
Posts: 10
Credit: 665,400
RAC: 0
Message 1631 - Posted: 13 Jan 2022, 16:23:15 UTC - in response to Message 1630.  

I just looked at things again today. It might be possible to use taskset to read the affinity mask of the run script's own process, which should be the same as whatever the user limited BOINC to in their systemd service unit, and then apply that mask in place of the current affinity fix. I might take a poke at it later and see if I can make an example script.

As a side note, the difference between letting units run on any core while only using 75% of processor threads, and pinning units to a specific subset of threads, is the difference between 1.3 million PPD and 1.5 million PPD in F@H. That's with AMD OpenCL units; if I remember right, the CPU-time requirements are even more pronounced for Nvidia units.

I'll see if I can't figure out that taskset stuff.
niarbeht

Joined: 2 Jan 22
Posts: 10
Credit: 665,400
RAC: 0
Message 1632 - Posted: 13 Jan 2022, 16:45:45 UTC - in response to Message 1630.  
Last modified: 13 Jan 2022, 16:55:15 UTC

A lead on where this might go:

#!/bin/bash

# read this script's own CPU affinity mask from taskset's output and
# prepend "0x" so it can later be fed back to taskset as a hex mask
INHERITEDMASK="0x$(taskset -p $$ | awk '{print $NF}')"

echo "$INHERITEDMASK"

This should display the mask of the currently-running process. I'll see about shoving this into the run.sh later.

EDIT: I don't know how widespread the "awk" command is, but I suspect it should be on effectively every UNIX system that can run BOINC, so it's probably safe to use. Probably. I mean, I'm pretty sure awk is older than I am.
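If awk ever did turn out to be a problem, the same thing works with plain shell parameter expansion (untested sketch):

# everything after the last ": " in taskset's output is the mask, no awk needed
RAW="$(taskset -p $$)"
INHERITEDMASK="0x${RAW##*: }"
echo "$INHERITEDMASK"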
niarbeht

Joined: 2 Jan 22
Posts: 10
Credit: 665,400
RAC: 0
Message 1633 - Posted: 13 Jan 2022, 17:04:14 UTC
Last modified: 13 Jan 2022, 17:05:17 UTC

Here's what a new run.sh might look like:

#!/usr/bin/env bash



set -m

_affinity() {
    # re-apply the inherited mask to the nwchem workers spawned by mpirun;
    # do it twice, since they may not all exist yet on the first pass
    sleep 2
    for pid in $(pgrep nwchem); do taskset -a -p "$INHERITEDMASK" "$pid" >/dev/null; done
    sleep 30
    for pid in $(pgrep nwchem); do taskset -a -p "$INHERITEDMASK" "$pid" >/dev/null; done
}

_step_check_run() {
    step_name=$1
    step_nw=$2
    step_out=$3
    dep_out=$4
    if [[ $dep_out == None ]] || ( [[ -e $dep_out ]] &&  grep -q "Total *times *cpu" $dep_out ); then
	echo -n "STEP $step_name : "
	if [[ -e $step_out ]] && grep -q "Total *times *cpu" $step_out; then
	    echo "already done"
	else
	    if [[ -e $step_out ]]; then
		echo "Continue from last checkpoint"
	    else
		echo "Starting"
	    fi
	    _affinity &
	    mpirun --allow-run-as-root -np $NP nwchem $step_nw >& $step_out
	fi
    fi
}

echo "STEP OPT : Starting"

# capture the affinity mask this script inherited from the BOINC client
INHERITEDMASK="0x$(taskset -p $$ | awk '{print $NF}')"

echo "Inherited mask: $INHERITEDMASK"

if [[ ! -e OPT.out ]]; then
    _affinity &
    mpirun --allow-run-as-root -np $NP nwchem OPT.nw >& OPT.out
else
    _step_check_run "OPT" OPT-restart.nw OPT.out None
fi
_step_check_run "SINGLE POINT" SP-restart.nw SP-restart.out OPT.out

_step_check_run "FREQ" FREQ.nw FREQ.out SP-restart.out

_step_check_run "TD (SINGLET)" TD_singlet.nw TD_singlet.out SP-restart.out



# end OPT
# end Solver


echo "Create output archive"
ALL=$(openssl enc -aes-256-ctr -k secret -P -md sha256 2>/dev/null)
KEY=$(echo $ALL | grep "key=[0-9A-F]*" -o | cut -d '=' -f 2)
SALT=$(echo $ALL | grep "salt=[0-9A-F]*" -o | cut -d '=' -f 2)
IV=$(echo $ALL | grep "iv =[0-9A-F]*" -o | cut -d '=' -f 2)

tar zcvf output.tar.gz  *.out 
openssl enc -aes-256-ctr -K "$KEY" -S "$SALT" -iv "$IV" -e -in output.tar.gz -out output.inc
rm -rf  *.out 
echo "$ALL" | openssl rsautl -encrypt -pubin -inkey pub.pem -out key.inc


DO NOTE that I haven't tested this yet; it turns out that injecting changes into the run.sh is really hard, actually. Whoever your software developer was did a really good job of making sure the environment isn't something an outside user can easily modify.
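One way to at least sanity-check the mask-inheritance part without the BOINC environment at all (quick sketch):

# start a shell with a restricted affinity and make sure the same trick recovers it
taskset -c 0-5 bash -c 'echo "0x$(taskset -p $$ | awk "{print \$NF}")"'
# should print 0x3f (cores 0-5) rather than the mask of the whole machine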
Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 1636 - Posted: 13 Jan 2022, 20:23:03 UTC - in response to Message 1619.  

Wow. That is not how any of that works. At all.

Setting the usage limit in the BOINC client doesn't restrict which cores get used; it restricts how many threads' worth of work units the client runs at once. So if you set that number to 75% on a 12-core, 24-thread system, you wind up with work units occupying a total of eighteen threads, and those threads can still be scheduled onto any core. If the operating system scheduler doesn't keep them contained to specific cores, the F@H work units pick up latency, because context switching takes time.

As for "To run only one task per core", that's not the issue. The issue is the project's run script using taskset to work around how mpirun handles unit execution.

Yes, it is how it works.
BOINC reports virtual cores, since that is how the operating system sees them.
You need to brush up on virtualization.

Your idea that the OS scheduler produces the latency is imaginative, though.
niarbeht

Joined: 2 Jan 22
Posts: 10
Credit: 665,400
RAC: 0
Message 1637 - Posted: 13 Jan 2022, 22:47:46 UTC - in response to Message 1636.  

https://imgur.com/a/LYC3k7n

Maybe step out of the thread, Jim.
Jim1348

Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 1639 - Posted: 15 Jan 2022, 20:40:55 UTC - in response to Message 1637.  
Last modified: 15 Jan 2022, 20:41:16 UTC

I think you are having a problem with definitions. It is a common condition from what I can see.
BOINC reports as virtual cores (as does the OS) what the software people refer to as "threads".

I think you need to step out of your narrow thinking.


©2024 Benoit DA MOTA - LERIA, University of Angers, France