Posts by ProDigit

21) Message boards : Number crunching : 2,5 days long and counting... (Message 751)
Posted 12 Apr 2020 by ProDigit
Post:
Without any official representation here from QuChemPedIA, and my question unanswered,
and without there being any higher PPD allocated for projects running longer than the 'long' ones,

I will abort any WU I see running past 1 day, unless someone official can assure me there's nothing wrong with these long WUs where the percentage counter doesn't seem to work.
We both shoot ourselves in the foot this way, but most 'long' WUs on a 3.5 GHz CPU run no longer than 17 hours.
Even on a 3 GHz CPU you'd finish a WU in 20 hours.

So cancelling it is!
22) Message boards : Number crunching : 2,5 days long and counting... (Message 750)
Posted 11 Apr 2020 by ProDigit
Post:
Thread count shouldn't matter, as each WU runs in its own thread.
But to answer the question: 24.
My other PC with 32 threads currently also has 2 WUs that have been running for 1+ days.
23) Message boards : Number crunching : 2,5 days long and counting... (Message 746)
Posted 10 Apr 2020 by ProDigit
Post:
Current WU has been running for 2 days and 15 hours and counting.
It's at 99.070%, and the counter is moving very, very slowly.
Should I cancel, or is this OK for a long WU on a 3.5-4 GHz CPU?

Application: NWChem long 0.19 (t1)
Name: BTXv2_athome_b3lyp-321gd_long,batch02,000001835,nwchem_long,1586073408
State: Running
Received: Tue 07 Apr 2020 01:41:44 PM EDT
Report deadline: Mon 06 Jul 2020 01:41:43 PM EDT
Estimated computation size: 500,000 GFLOPs
CPU time: 2d 14:00:57
CPU time since checkpoint: 00:00:03
Elapsed time: 2d 15:13:59
Estimated time remaining: 00:35:37
Fraction done: 99.070%
Virtual memory size: 1.28 GB
Working set size: 133.29 MB
Directory: slots/17
Process ID: 46850
Progress rate: 1.440% per hour
Executable: wrapper_26014_x86_64-pc-linux-gnu
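
For anyone wanting to sanity-check the "Estimated time remaining" figure in the task properties above, here is a minimal sketch that extrapolates linearly from the elapsed time and the fraction done. The function names are made up for illustration, and it is only an assumption that the manager's estimate is computed along these lines; a counter that crawls near 99% clearly breaks the linear assumption anyway.

```python
def parse_elapsed(text: str) -> int:
    """Convert a string such as '2d 15:13:59' (or '15:13:59') to seconds."""
    days = 0
    if "d" in text:
        day_part, text = text.split("d", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.strip().split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def linear_remaining_seconds(elapsed_s: float, fraction_done_pct: float) -> float:
    """Remaining time if the rest of the task progressed at the average rate so far."""
    if not 0 < fraction_done_pct <= 100:
        raise ValueError("fraction_done_pct must be in (0, 100]")
    done = fraction_done_pct / 100.0
    return elapsed_s * (1.0 - done) / done


# Values taken from the task properties in the post.
elapsed = parse_elapsed("2d 15:13:59")
remaining = linear_remaining_seconds(elapsed, 99.070)
print(f"{remaining / 60:.1f} minutes remaining at the average rate so far")
```

With the numbers from the post this comes out at a little under 36 minutes, close to the 00:35:37 shown, so the estimate says nothing about whether the task is actually stuck.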
24) Message boards : Number crunching : Supported CPU and OS types? (Message 743)
Posted 9 Apr 2020 by ProDigit
Post:
I just upgraded the 3900X to a 3950X.
Expensive, but worth it!
In the process, I found that the BIOS needs to have XMP enabled for the memory to overclock properly to its rated setting. (Yes, you read that right.)
When I was still fiddling with PCs, CPUs, RAM, and sound cards, a memory module's rated speed would be printed on the RAM.
And if your system didn't run at that speed, it meant either the mobo or the CPU didn't support it.

Nowadays, they all run at 2133 MHz. All of them. If they print 3200 on them, it means the system needs to overclock them to 3200 MHz.
Weird.

Anyway, this was the reason for my glitches. Both the 3900X and the 3950X now work with 3200 MHz RAM speed.


XMP is enabled and I have 256 GB of RAM

Do your mobo and CPU support that much RAM?
Some are limited to 128GB.
25) Message boards : Number crunching : Results inconclusive? (Message 742)
Posted 9 Apr 2020 by ProDigit
Post:
What does it mean, 'results inconclusive'?
I have several tasks that are categorized as such.

If you click on the WU, you'll see it is still "unsent" to the minimum number of hosts needed for validation/replication.

Shouldn't it read 'waiting for validation' then?
26) Message boards : Number crunching : credits for WUs? (Message 741)
Posted 9 Apr 2020 by ProDigit
Post:
For the current batches:
short WUs: 200 credits
long WUs: 5000 credits


Woo Hoo.. Liking those long WUs..

They also take 25x longer to finish.
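
To put numbers on that, here is a small sketch comparing credits per unit of run time. The credit values are the ones quoted above; the 25x run-time ratio is my own rough figure, and the function name is just for illustration.

```python
SHORT_CREDITS = 200   # credits per short WU, as quoted above
LONG_CREDITS = 5000   # credits per long WU, as quoted above


def long_vs_short_credit_rate(runtime_ratio: float) -> float:
    """Credit rate of long WUs relative to short WUs.

    runtime_ratio is how many times longer a long WU runs than a short one.
    A result of 1.0 means both earn the same credits per hour.
    """
    if runtime_ratio <= 0:
        raise ValueError("runtime_ratio must be positive")
    return (LONG_CREDITS / SHORT_CREDITS) / runtime_ratio


print(long_vs_short_credit_rate(25))  # long WUs taking 25x as long
print(long_vs_short_credit_rate(60))  # a long WU that drags on far past that
```

At a 25x run-time ratio the result is 1.0, so long WUs pay the same per hour as short ones; any long WU that runs beyond that ratio earns less per hour.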
27) Message boards : Number crunching : Short Tasks run for 14 hours and counting. (Message 740)
Posted 9 Apr 2020 by ProDigit
Post:
I agree Virtual Box is also highly flawed.

And slow!
I prefer to just install Ubuntu on a USB drive and install BOINC from there.

Like another user posted, did you verify that you have VT (for VirtualBox) and XMP enabled in the BIOS, and enough RAM to run that many threads? (You'll need about 24-32GB of RAM if you want to run 32 threads of QuChemPedIA tasks.)

One of the reasons mine was running so slowly was that I had only 16GB of RAM for 24 threads, and it was doing a lot of disk access (about 2 GB of swap).
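
As a rough sizing aid, here is a sketch that turns the 24-32GB for 32 threads rule of thumb above into a per-thread figure (0.75 to 1 GB per thread). These per-thread numbers are derived from my estimate, not an official project requirement, and the function names are made up.

```python
import math

# Derived from "24-32GB for 32 threads": an assumption, not a project spec.
GB_PER_THREAD_LOW = 24 / 32    # 0.75 GB
GB_PER_THREAD_HIGH = 32 / 32   # 1.0 GB


def ram_needed_gb(threads: int, gb_per_thread: float) -> float:
    """RAM needed to run the given number of tasks without swapping."""
    return threads * gb_per_thread


def max_threads_for_ram(ram_gb: float, gb_per_thread: float) -> int:
    """Largest number of tasks that fits in the given amount of RAM."""
    return math.floor(ram_gb / gb_per_thread)


# The 24-thread machine with 16 GB mentioned in the post.
print(ram_needed_gb(24, GB_PER_THREAD_LOW), ram_needed_gb(24, GB_PER_THREAD_HIGH))
print(max_threads_for_ram(16, GB_PER_THREAD_HIGH), max_threads_for_ram(16, GB_PER_THREAD_LOW))
```

By this estimate 24 threads want 18-24 GB, and 16 GB covers only about 16-21 tasks, which fits with the swapping I saw.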
28) Message boards : Science : Contribute your expertise to design inhibitors of the SARS-CoV-2 main protease (Message 730)
Posted 8 Apr 2020 by ProDigit
Post:
A little off topic perhaps, but:
Why invest time and energy in CPU-bound projects, when GPU projects are heaps faster, and also support (and will increasingly support) double precision?
I'm sure it's interesting to see how CPU projects can be improved upon, but you won't see a 5x improvement like the one you get from running a GPU project instead of a CPU project, when comparing CPUs and GPUs of similar price and power draw.
29) Message boards : Number crunching : Waiting for validation (Message 729)
Posted 8 Apr 2020 by ProDigit
Post:
I have WUs that have been waiting for validation since 8 March.
Please re-send these WUs or give me a wingman.

I have about 670 pending, going back to February.
Not worried, results will be validated in time.
30) Message boards : Number crunching : Short Tasks run for 14 hours and counting. (Message 726)
Posted 7 Apr 2020 by ProDigit
Post:
You say 'short tasks', but I only know of standard and 'long' tasks they're sending out.
The standard tasks take about 1 hour to finish on a 3.5-4 GHz processor.
Then there are the 'long' ones, which the manager displays as 'NWChem long 0.19 (t1)'.
On a 1950X CPU those should run about 20 hours on average.
If your CPU is 100% occupied, it means the tasks are still crunching.

There are 2 things I would look at.
First, is there a way you can check your CPU temperature? If it's above 80C, you may have a cooling issue that causes the CPU to lower its frequency. Above 90-95C the CPU would run at a very low speed (like 700 MHz or so).
In this case, either increase cooling capability or decrease CPU voltage/frequency.

Second, I would check whether you're running any GPUs. If you're using 32 out of 32 threads for QuChemPedIA but have some misconfigured GPU projects running (e.g. ones that claim 0.2 CPU but use much more than that), you may overload your CPU. BOINC tells me you're running Win 10 and 1 GPU; in that case, you could either reconfigure the GPU project or just set your CPU usage to 99% instead of 100%. That should give the CPU enough breathing room.
This would work if a misconfigured GPU project is overloading your CPU.
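
For the temperature check, here is a minimal sketch for Linux hosts only (on Windows you would need a vendor or third-party monitoring tool instead). It assumes the kernel exposes sensors under /sys/class/thermal with values in millidegrees Celsius, which is common but not guaranteed on every board. The 80C and 90C cut-offs are just the rough figures from this post, not manufacturer limits.

```python
from pathlib import Path


def classify(temp_c: float) -> str:
    """Label a temperature using the rough thresholds from the post."""
    if temp_c >= 90:
        return "severe throttling likely"
    if temp_c > 80:
        return "possible cooling issue"
    return "ok"


def read_thermal_zones(base: str = "/sys/class/thermal") -> dict:
    """Return {zone name: temperature in Celsius} for every readable zone."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone without a readable temperature
        readings[zone.name] = millidegrees / 1000.0
    return readings


if __name__ == "__main__":
    for name, temp in read_thermal_zones().items():
        print(f"{name}: {temp:.1f} C ({classify(temp)})")
```

If no zones show up, the board probably reports temperatures through a different interface, and a tool such as lm-sensors is the better route.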
31) Message boards : Number crunching : credits for WUs? (Message 708)
Posted 31 Mar 2020 by ProDigit
Post:
What are the PPDs assigned to WUs (both normal and long)?
32) Message boards : Number crunching : Results inconclusive? (Message 707)
Posted 31 Mar 2020 by ProDigit
Post:
What does it mean, 'results inconclusive'?
I have several tasks that are categorized as such.
33) Message boards : Number crunching : NWChem 0.11 (vbox64_t1) tasks do not suspend with BOINC preferences (Message 706)
Posted 31 Mar 2020 by ProDigit
Post:
I have BOINC set to suspend processing when the computer is in use so that it will not conflict with people actually using the computers. Most BOINC projects follow this preference and pause computing when appropriate. However, the NWChem 0.11 (vbox64_t1) tasks do not. They continue running and using full CPU and RAM resources even though the BOINC client itself is suspended. This makes the computer completely unusable and will probably result in many participants like myself being unable to contribute resources to this project. The only way to resolve this is to manually kill the VBox processes in Task Manager (which also renders the results invalid) and that should not be necessary if the tasks are configured to follow the BOINC computing preferences.

If you're running the long processes, it may take a while for the client to completely stop. My observation is that it can take up to a full 2 minutes before the client closes the connection.
The client probably stops right away, but is still performing garbage collection (or compressing the data or so).


I do want to say that QuChemPedIA LONG tasks don't seem to honor the 600 minute timeout I have set in the manager.
Some of the tasks were running past 11 hours non-stop.
34) Message boards : Number crunching : No work? (Message 704)
Posted 26 Mar 2020 by ProDigit
Post:
I'd rather run it straight from Linux
35) Message boards : Number crunching : Supported CPU and OS types? (Message 701)
Posted 23 Mar 2020 by ProDigit
Post:
Perhaps related: is an EPYC CPU supported?
36) Message boards : Number crunching : Supported CPU and OS types? (Message 700)
Posted 23 Mar 2020 by ProDigit
Post:
I just upgraded the 3900X to a 3950X.
Expensive, but worth it!
In the process, I found that the BIOS needs to have XMP enabled for the memory to overclock properly to its rated setting. (Yes, you read that right.)
When I was still fiddling with PCs, CPUs, RAM, and sound cards, a memory module's rated speed would be printed on the RAM.
And if your system didn't run at that speed, it meant either the mobo or the CPU didn't support it.

Nowadays, they all run at 2133 MHz. All of them. If they print 3200 on them, it means the system needs to overclock them to 3200 MHz.
Weird.

Anyway, this was the reason for my glitches. Both the 3900X and the 3950X now work with 3200 MHz RAM speed.
37) Message boards : Number crunching : Supported CPU and OS types? (Message 688)
Posted 12 Mar 2020 by ProDigit
Post:
I don't know about the Windows client, but for the Linux client I use this BOINC PPA.
It works pretty much straight out of the box.
For Ryzen and Threadripper, setting CPU usage to 99% seems to work as well.
38) Message boards : Number crunching : Supported CPU and OS types? (Message 680)
Posted 12 Mar 2020 by ProDigit
Post:
The program crashes a few seconds after starting. Can you copy/paste the OPT.out file from one of your slots?

Ryzen arrived, and I had the same error.

Here are 2 ways to optimize it and have the client work fine:
1- In Linux you'll need at least 1 CPU core available (unused). I can imagine in Windows it's 1 or 2 more (because... Windows).

For the Ryzen 9 3900X this means setting CPU to 96% (23 CPU tasks max on a 12-core, 24-thread machine). I do run 3 GPUs, each of which takes up less than 1 CPU core, but all in all there are 21 cores working on QuChemPedIA and 3 cores on GPUs. Total CPU usage is around 96-98%. Enabling all cores will crash the client. If you do the math, it sounds like there are 24 cores working, but BOINC combines some CPU threads for GPUs into a single CPU core if they're not using a lot of CPU processing. Thus, a setting of 96% CPU in my case runs 23 cores, even if it runs 24 tasks (with GPU combined).

2- The stock cooler is garbage! Buy a $160 watercooler for this CPU! I did.
I currently have to run the CPU in Eco mode (set in the BIOS overclock menu), where all cores run at ~3.1 GHz instead of 3.8 GHz.
But as soon as I have a better cooling solution, I might or might not switch over. If the newer cooling enables higher boost clocks on all cores, I might just leave it in Eco mode.
Running Eco mode with an all-core overclock to 3.8 GHz crashes the system.
I can clearly see that up to 12 tasks run fine at 3.8 GHz, but once I add another 4 or 5, the CPU frequency drops to 3.5 GHz, and all the way down to 3.1 GHz at 23 tasks.
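
For the CPU percentage arithmetic in point 1, here is a small sketch of how a percentage setting maps to a number of usable threads. It assumes the client rounds down to a whole thread, which matches the 23 I see at 96% on 24 threads; I have not checked this against the BOINC source, and the function name is made up.

```python
def usable_threads(total_threads: int, cpu_percent: int) -> int:
    """Threads available to tasks at a given 'use at most N% of the CPUs' setting.

    Assumes rounding down, with a minimum of one thread.
    """
    if total_threads < 1 or not 0 < cpu_percent <= 100:
        raise ValueError("need at least 1 thread and a percentage in (0, 100]")
    return max(1, total_threads * cpu_percent // 100)


print(usable_threads(24, 96))  # 12-core, 24-thread Ryzen at 96%
print(usable_threads(32, 99))  # 32-thread machine at 99%
```

Under that assumption, both 96% and 99% leave exactly one thread free on a 24-thread CPU, and 99% leaves one free on a 32-thread CPU.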
39) Message boards : News : NWChem long (Message 677)
Posted 10 Mar 2020 by ProDigit
Post:
RAM nowadays is cheap, especially DDR3.
My last Xeon server got maxed out with 4x4GB. The board doesn't support higher-capacity DIMMs. 4x4GB cost me like $40, since I already had 2x4GB DIMMs in there.
40) Message boards : Number crunching : Supported CPU and OS types? (Message 671)
Posted 9 Mar 2020 by ProDigit
Post:
I am running Linux Mint with AMD threadripper and every job is erroring out as follows.


<core_client_version>7.16.3</core_client_version>
<![CDATA[
<stderr_txt>
12:56:13 (50832): wrapper (7.5.26014): starting
12:56:13 (50832): wrapper: running worker.sh ()
Jobs starts with 1 cores
OPT
Create output archive
*** WARNING : deprecated key derivation used.
Using -iter or -pbkdf2 would be better.
OPT.out
Normal termination.
12:56:15 (50832): worker.sh exited; CPU time 0.957374
12:56:15 (50832): called boinc_finish(0)

</stderr_txt>
]]>

That would seriously suck, because I just purchased a Ryzen 9 for this specific job!
I'm currently running Linux (Lubuntu) with a Core i3 9300F, a Core i5 4700, as well as a 9400F, and they all work well!

Edit: It seems others are able to run Threadripper and Ryzen 7 CPUs:
https://quchempedia.univ-angers.fr/athome/cpu_list.php
I guess this list answers my question!
Thanks!
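
For spotting tasks like the one quoted above, where worker.sh exits after under a second of CPU time, here is a minimal sketch that pulls the CPU time out of the stderr text. The regular expression is based only on the single log line shown in this post, and the 5-second cut-off is an arbitrary assumption for illustration.

```python
import re
from typing import Optional

# Pattern taken from the one stderr line quoted in the post.
CPU_TIME_RE = re.compile(r"worker\.sh exited; CPU time ([0-9]+(?:\.[0-9]+)?)")


def worker_cpu_seconds(stderr_text: str) -> Optional[float]:
    """Return the CPU time reported by the wrapper, or None if not found."""
    match = CPU_TIME_RE.search(stderr_text)
    return float(match.group(1)) if match else None


def looks_like_early_exit(stderr_text: str, min_seconds: float = 5.0) -> bool:
    """True if the worker reported less CPU time than min_seconds (assumed cut-off)."""
    seconds = worker_cpu_seconds(stderr_text)
    return seconds is not None and seconds < min_seconds


SAMPLE = "12:56:15 (50832): worker.sh exited; CPU time 0.957374"
print(worker_cpu_seconds(SAMPLE), looks_like_early_exit(SAMPLE))
```

A task flagged this way has almost certainly not done any real computation, even though the wrapper reports a normal exit.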


©2024 Benoit DA MOTA - LERIA, University of Angers, France