Credits for t8=t4=t2=5.000 seriously?

Message boards : Number crunching : Credits for t8=t4=t2=5.000 seriously?

Diplomat
Joined: 7 Feb 20
Posts: 10
Credit: 6,625,400
RAC: 0
Message 804 - Posted: 24 Apr 2020, 12:40:39 UTC
Last modified: 24 Apr 2020, 12:43:07 UTC

I recently returned to the project and saw the new NWChem Long tasks.

I dedicated 20 threads, and after some jumping back and forth the server started to give me primarily t8 tasks.

It is really surprising that bigger tasks give you less reward :)


ID: 804 · Rating: 0
Luigi R.
Joined: 7 Nov 19
Posts: 31
Credit: 4,245,903
RAC: 0
Message 805 - Posted: 24 Apr 2020, 13:00:41 UTC - in response to Message 804.  
Last modified: 24 Apr 2020, 13:03:37 UTC

It does not look like your CPU used 8 threads for the red-circled task, because run time = CPU time.
Run time should be much lower than CPU time, as you can see on the t2 tasks.
One reason could be that your CPU is running too many processes/tasks, so that each task claims 8 threads but actually uses only 1.
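The check described above comes down to one division. Here is a minimal sketch, assuming the CPU time shown in the task list is summed over all threads of a task; the function name and the sample numbers are made up for illustration and are not taken from the screenshots in this thread.

```python
def effective_threads(cpu_time_s: float, run_time_s: float) -> float:
    """Average number of cores kept busy: CPU time divided by wall-clock run time."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return cpu_time_s / run_time_s


# Hypothetical (CPU time, run time) pairs in seconds.
examples = {
    "t8 task that really used 8 threads": (36_000.0, 4_500.0),
    "t8 task where run time = CPU time": (36_000.0, 36_000.0),
}

for label, (cpu_s, run_s) in examples.items():
    print(f"{label}: about {effective_threads(cpu_s, run_s):.1f} threads busy")
```

A ratio close to 1 on a t8 task suggests that only one thread was doing work, whatever the task claimed.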
ID: 805 · Rating: 0
Diplomat
Joined: 7 Feb 20
Posts: 10
Credit: 6,625,400
RAC: 0
Message 806 - Posted: 24 Apr 2020, 14:29:03 UTC
Last modified: 24 Apr 2020, 14:39:33 UTC

I see what you mean, but it doesn't look like the tasks are sitting idle and not using the CPUs; 8+8+4 is exactly what I expect.



Some t8 tasks are pending validation, and their CPU time vs. run time doesn't look right either =(
ID: 806 · Rating: 0
Zalster
Joined: 16 Dec 19
Posts: 25
Credit: 11,938,843
RAC: 0
Message 807 - Posted: 24 Apr 2020, 18:08:20 UTC - in response to Message 806.  

Credit for long tasks is fixed at the same value for all of them.

You may be overcommitting your CPU if you are using all the threads. You need to leave some headroom just in case. I set mine to 90% of all CPUs.

Homepage ---->computer preferences--->computing--->usage limits.

Hopefully that will give better results.

Z
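To see what a percentage limit means in threads on a given host, here is a rough sketch. It assumes the client rounds down and always keeps at least one thread; that rounding rule and the function name are assumptions, not something confirmed in this thread.

```python
def usable_threads(total_threads: int, cpu_percent: float) -> int:
    """Threads left to the client under a 'use at most N% of the CPUs' limit.

    Assumption: the result is rounded down, but never below one thread.
    """
    return max(1, int(total_threads * cpu_percent / 100))


# The 20-thread host from the first post, at a few limits mentioned in the thread.
for pct in (90, 99, 100):
    print(f"{pct}% of 20 threads -> {usable_threads(20, pct)} threads")
```

Under these assumptions, 90% on a 20-thread host leaves two threads free, so an 8+8+4 mix would no longer fit at the same time.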
ID: 807 · Rating: 0
ProDigit
Joined: 16 Nov 19
Posts: 44
Credit: 21,290,949
RAC: 0
Message 813 - Posted: 26 Apr 2020, 2:28:18 UTC

If you set your CPU to 99% and can see that one or more threads are running below 50-75%, setting CPU to 100% is NOT overcommitting the CPU!
If, on the other hand, some GPU projects report incorrect CPU usage (e.g. they report 0.29 CPU per GPU and you have 3 GPUs, but in reality they're using 0.997 CPU), then you're overcommitting the CPU. Setting CPU to 99% will still show 100% CPU utilization in this scenario.

In most scenarios, an overcommitted CPU will not result in a task taking 3x longer to finish, since threads are always shuffled around.
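The GPU example above can be put into numbers. This is only a sketch, and it assumes the 0.997 CPU figure is per GPU task, which the post does not state explicitly; the function name is made up.

```python
def unbudgeted_cpu(num_gpus: int, reported_per_gpu: float, actual_per_gpu: float) -> float:
    """CPU threads consumed by GPU tasks beyond what the scheduler reserved for them."""
    return num_gpus * (actual_per_gpu - reported_per_gpu)


reserved = 3 * 0.29    # what the client thinks the 3 GPU tasks need
consumed = 3 * 0.997   # what they would really use under the assumption above
print(f"reserved {reserved:.2f} CPU, consumed {consumed:.2f} CPU")
print(f"unbudgeted: about {unbudgeted_cpu(3, 0.29, 0.997):.2f} threads")
```

In that reading, roughly two threads' worth of CPU work is not accounted for, and it has to come out of the time given to the CPU tasks.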
ID: 813 · Rating: 0
Diplomat
Joined: 7 Feb 20
Posts: 10
Credit: 6,625,400
RAC: 0
Message 819 - Posted: 27 Apr 2020, 16:07:54 UTC
Last modified: 27 Apr 2020, 16:09:01 UTC

Indeed, I don't see the point about "overcommitting". I have been running 100% CPU on different PCs for 5+ years and never saw any problems, maybe because of Linux :)

Sometimes I have to turn off the GPU when I need to use virtual machines, or free up to 2 threads to watch YouTube at resolutions higher than 720p30.
ID: 819 · Rating: 0
Luigi R.
Joined: 7 Nov 19
Posts: 31
Credit: 4,245,903
RAC: 0
Message 820 - Posted: 27 Apr 2020, 16:45:57 UTC - in response to Message 819.  

Have you tried monitoring the process list? Are there 20 (8+8+4) nwchem processes running at 100%?
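One way to answer that question without eyeballing the list is to count the matching rows. Here is a small sketch that parses two-column text in the style of `ps -eo pcpu,comm` on Linux; the sample text, the 90% threshold and the function name are made up for illustration, and `ps` reports CPU use averaged over a process's lifetime rather than an instantaneous value.

```python
def busy_processes(ps_output: str, name: str = "nwchem",
                   threshold: float = 90.0) -> tuple[int, int]:
    """Return (processes matching `name`, those at or above `threshold` % CPU)."""
    total = busy = 0
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # ignore blank or malformed rows
        pcpu, command = parts
        if name in command:
            total += 1
            if float(pcpu) >= threshold:
                busy += 1
    return total, busy


# Hypothetical sample, not output from the host discussed here.
sample = """%CPU COMMAND
99.8 nwchem
99.5 nwchem
12.0 nwchem
 3.1 boinc
"""

total, busy = busy_processes(sample)
print(f"{total} nwchem processes, {busy} of them busy")
```

On a host running 8+8+4 one would hope to see 20 matching processes with nearly all of them busy.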
ID: 820 · Rating: 0
Diplomat
Joined: 7 Feb 20
Posts: 10
Credit: 6,625,400
RAC: 0
Message 821 - Posted: 27 Apr 2020, 17:28:46 UTC - in response to Message 820.  

Have you tried monitoring the process list? Are there 20 (8+8+4) nwchem processes running at 100%?


It's not the best example right now; only t4 and t2 are running.

ID: 821 · Rating: 0
ProDigit
Joined: 16 Nov 19
Posts: 44
Credit: 21,290,949
RAC: 0
Message 831 - Posted: 4 May 2020, 13:18:09 UTC
Last modified: 4 May 2020, 13:20:31 UTC

Your CPU isn't overcommitted.
Why do you use 2 CPUs on a WU?
Is this a setting you have, or are you running beta projects?
I only get single-core WUs (1 CPU per WU).
If you're not running the CPU at 100% yet, I'd increase the setting so that 1 more core is used.
It appears Goofy and Prime aren't using a lot of CPU resources.
ID: 831 · Rating: 0
Diplomat
Joined: 7 Feb 20
Posts: 10
Credit: 6,625,400
RAC: 0
Message 832 - Posted: 5 May 2020, 5:22:38 UTC - in response to Message 831.  

Hi ProDigit, there are no extra settings involved. I chose to run NWChem long, and the server sends me t1-t8 according to its own judgment.
I'm happy with the current CPU workload.
ID: 832 · Rating: 0
Luigi R.
Joined: 7 Nov 19
Posts: 31
Credit: 4,245,903
RAC: 0
Message 833 - Posted: 6 May 2020, 10:00:57 UTC - in response to Message 832.  

Hi Diplomat, I don't understand what is wrong with your host (or whether anything is).

So try to edit your project preferences to get t1 tasks only, as ProDigit suggested.



If your RAC improves, you will know your problem is solved, or at least that you have bypassed it. ;)
ID: 833 · Rating: 0
ProDigit
Joined: 16 Nov 19
Posts: 44
Credit: 21,290,949
RAC: 0
Message 847 - Posted: 21 May 2020, 3:54:49 UTC

Again, to keep the conversation in line with the OP's original thread:
The LONG WUs should get higher credit.
Especially now that I see more and more units hitting 3 and even 4 or 5 days!
In that timespan I could finish a whole lot of small WUs and get a much more consistent score.
I think QChemp needs to adjust the PPD scoring on these units.
And perhaps they should start thinking about running them on pairs of CPU cores instead (find a way to do parallel computing on those WUs).
They take way too long!

A 3.5-day task can easily be given to systems with 8 CPU threads. It should be done in about 10 hours that way, which is way more reasonable!
So find a way to create those WUs using parallel computing on 4 threads or more.
I think 8 threads is easy to get, as most PCs nowadays have 8 threads.
And leave the short WUs for PCs with fewer than 8 threads, doing WUs on a single thread as they do now.
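The "about 10 hours" figure matches perfect scaling: 3.5 days is 84 hours, and 84 / 8 = 10.5. As a rough sketch, the second function below swaps in Amdahl's law to show what happens if part of the work cannot be parallelised; the 95% parallel fraction is a made-up value, not a measured property of these WUs.

```python
def ideal_hours(single_thread_days: float, threads: int) -> float:
    """Run time in hours if the work splits perfectly across all threads."""
    return single_thread_days * 24 / threads


def amdahl_hours(single_thread_days: float, threads: int,
                 parallel_fraction: float) -> float:
    """Run time in hours under Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    speedup = 1.0 / (serial_fraction + parallel_fraction / threads)
    return single_thread_days * 24 / speedup


print(f"perfect scaling on 8 threads: {ideal_hours(3.5, 8):.1f} h")
# Hypothetical parallel fraction, for illustration only.
print(f"95% parallel on 8 threads:    {amdahl_hours(3.5, 8, 0.95):.1f} h")
```

Any serial portion pushes the real run time above the 10.5-hour ideal, so the estimate in the post is a best case.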

I'm sure you could learn something from other projects, how they do it.
Lots of projects serve 4, 6, 8, 10, 12, and even 16 core WUs.
And they might do 24 soon, as it will become more common to find that many cores in a PC in the near future!
ID: 847 · Rating: 0
Conan
Joined: 27 Apr 20
Posts: 11
Credit: 714,200
RAC: 0
Message 848 - Posted: 21 May 2020, 6:56:01 UTC
Last modified: 21 May 2020, 6:57:33 UTC

The problem is that no one is sure how long the WU will run, be it for 1, 2 or 3 days or more. So that is why there is a 5,000 fixed credit bonus for these work units.

However, if a work unit runs for 3 days using 4 cores, they are tied up for the whole 3 days: that's 24 hours x 3 = 72 hours, and then x 4 cores = 288 core-hours that those cores are in use.

5,000 credits is then only 17.36 credits per core-hour, a bit low for the time spent working on the WU. If it ran for 2 days this jumps to 26 cr/h, a bit better.
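The same arithmetic as a small sketch, so other run times and thread counts can be plugged in. The 5,000 fixed credit and the 3-day, 2-day and 4-core cases come from the post above; the function name is made up.

```python
def credits_per_core_hour(credit: float, days: float, cores: int) -> float:
    """Fixed credit divided by the core-hours the task kept busy."""
    core_hours = days * 24 * cores
    return credit / core_hours


for days in (3, 2):
    rate = credits_per_core_hour(5000, days, 4)
    print(f"{days} days on 4 cores: {rate:.2f} credits per core-hour")
```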

On average the 5,000 is OK, as most work units don't run for days on end, for the ones that do it is on the low side.

Conan
ID: 848 · Rating: 0
Conan
Joined: 27 Apr 20
Posts: 11
Credit: 714,200
RAC: 0
Message 849 - Posted: 24 May 2020, 2:16:34 UTC
Last modified: 24 May 2020, 2:19:02 UTC

Well, 5,000 points would have been great, even though low, for my long-running work unit, but alas it was not meant to be.

After 3 days 14 hours, tying up 4 cores during this time, the WU decided to error out with EXIT_CHILD_FAILED, whatever that means.

Maybe it happened when the site went down and the trickle-up messages could not be sent?

Very disappointed. This WU

Conan
ID: 849 · Rating: 0
Gibson Praise
Joined: 27 Nov 19
Posts: 1
Credit: 1,755,655
RAC: 0
Message 850 - Posted: 25 May 2020, 15:18:17 UTC
Last modified: 25 May 2020, 15:57:56 UTC

I am sure this post will jinx me ..

I have only run max cpus = 2. With the short tasks I had trouble with more than two threads, and I never bothered to change it when the longs became available. The few I've run have run long (up to a day or a little longer, and down to half that), but I haven't experienced anything like what is reported here.

g

Edit: Thinking back, when I moved to maxcpu = 2 is also when I ditched running this on Windows. YMMV.
ID: 850 · Rating: 0


©2024 Benoit DA MOTA - LERIA, University of Angers, France