Long work units.

adrianxw
Joined: 3 Oct 19
Posts: 33
Credit: 197,169
RAC: 0
Message 696 - Posted: 18 Mar 2020, 19:16:31 UTC
Last modified: 18 Mar 2020, 19:18:10 UTC

A question about the new long work units that are coming: I have set my preferences to use 1 CPU. What I want is one work unit. What I don't want is eight work units crunching for days on one CPU each. Is this setting correct? If not, how do I achieve what I want?
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 696 · Rating: 0
UBT - Timbo
Joined: 8 Dec 19
Posts: 13
Credit: 652,594
RAC: 0
Message 745 - Posted: 10 Apr 2020, 12:38:05 UTC - in response to Message 696.  
Last modified: 10 Apr 2020, 12:41:07 UTC

Go to your "QuChemPedIA@home preferences" and select as follows:

Run only the selected applications:
NWChem: yes
NWChem long: no

Max # CPUs: 1

That might help and you won't get any LONG tasks.

If that doesn't work as you want, then you should create an "app_config.xml" file (in ordinary plain-text "txt" format, using, say, Notepad) in the "BOINC/projects/quchempedia.univ-angers.fr_athome" folder of your PC.

This should contain the following:

<app_config>
<app>
<name>nwchem</name>
<max_concurrent>1</max_concurrent>
</app>
</app_config>

Note: I *think* the app name is "nwchem" but it might be something else...so once you create the above file, use "BOINC Manager > Options > Read config files" and then immediately check "BOINC Manager>Tools>Event Log" and it will tell you the correct name of the QuChem app.

So, re-edit the "app_config.xml" file with Notepad and use the correct name instead of "nwchem". Then save the file, and close Notepad. Then use the "Read config files" function again.
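
Before asking BOINC to re-read the file, it can help to check that the XML is well-formed and says what was intended. The following is only a minimal sketch: `read_limits` is a hypothetical helper, not part of BOINC, and "nwchem" is still the guessed app name from above.

```python
import xml.etree.ElementTree as ET

# Same content as the app_config.xml suggested above; "nwchem" is the
# poster's guess at the app name and may need to be changed.
APP_CONFIG = """\
<app_config>
<app>
<name>nwchem</name>
<max_concurrent>1</max_concurrent>
</app>
</app_config>
"""

def read_limits(xml_text):
    """Return {app name: max_concurrent} for every <app> entry."""
    root = ET.fromstring(xml_text)  # raises ParseError on an unclosed tag
    if root.tag != "app_config":
        raise ValueError("root element must be <app_config>")
    limits = {}
    for app in root.findall("app"):
        name = app.findtext("name", default="").strip()
        limits[name] = int(app.findtext("max_concurrent", default="0"))
    return limits

limits = read_limits(APP_CONFIG)
print(limits)
```

A parse error here means the file has a typo such as a missing closing tag. This check says nothing about whether the app name is the right one; the Event Log remains the place to confirm that.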
ID: 745 · Rating: 0
adrianxw
Joined: 3 Oct 19
Posts: 33
Credit: 197,169
RAC: 0
Message 747 - Posted: 10 Apr 2020, 18:49:19 UTC
Last modified: 10 Apr 2020, 18:51:34 UTC

I think, in fact I am sure, that you misunderstand me. I WANT a long work unit, but just one. I have already set 1 CPU.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 747 · Rating: 0
xii5ku
Joined: 21 Jun 20
Posts: 24
Credit: 68,559,000
RAC: 0
Message 901 - Posted: 21 Jun 2020, 17:19:20 UTC - in response to Message 747.  
Last modified: 21 Jun 2020, 17:21:58 UTC

adrianxw wrote:
I WANT a long work unit, but just one. I have set already 1 CPU.

This "projects/quchempedia.univ-angers.fr_athome/app_config.xml" file limits boinc-client to start at most one "NWChem long" task at any time:
<app_config>
    <app>
        <name>nwchem_long</name>
        <max_concurrent>1</max_concurrent>
    </app>
</app_config>


The following simpler "projects/quchempedia.univ-angers.fr_athome/app_config.xml" file limits boinc-client to start at most one QuChemPedIA task of any of the available applications:
<app_config>
    <project_max_concurrent>1</project_max_concurrent>
</app_config>


The following setting in your project preferences limits you to one QuChemPedIA task in progress on each of your hosts:

    Max # jobs: 1


The limit on tasks in progress is enforced by the server, i.e. works independently of any client-side settings.
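
As an illustration of the two client-side variants above, here is a minimal sketch that produces either file body as text. The function names are hypothetical, and the script only prints the result; saving it as app_config.xml in the project folder is left to the reader.

```python
from xml.sax.saxutils import escape

def per_app_config(app_name, limit):
    """Limit one application, e.g. "nwchem_long", to `limit` running tasks."""
    return (
        "<app_config>\n"
        "    <app>\n"
        f"        <name>{escape(app_name)}</name>\n"
        f"        <max_concurrent>{int(limit)}</max_concurrent>\n"
        "    </app>\n"
        "</app_config>\n"
    )

def project_wide_config(limit):
    """Limit all applications of the project together to `limit` running tasks."""
    return (
        "<app_config>\n"
        f"    <project_max_concurrent>{int(limit)}</project_max_concurrent>\n"
        "</app_config>\n"
    )

print(per_app_config("nwchem_long", 1))
print(project_wide_config(1))
```

Both variants only limit how many tasks the client runs at once. The "Max # jobs" preference is a separate, server-side limit on how many tasks are sent to a host.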

ID: 901 · Rating: 0
ProDigit
Joined: 16 Nov 19
Posts: 44
Credit: 21,290,949
RAC: 0
Message 913 - Posted: 27 Jun 2020, 8:28:15 UTC

As far as I know, QuChemPedIA doesn't send out multi-CPU WUs.
They're all single-threaded WUs.
If you just want your PC to crunch 1 WU, set your CPU count to 1.
If you're sharing your CPU with other projects, adjust the app_config.xml file to allow at most a set number of WUs at a time.
ID: 913 · Rating: 0
adrianxw
Joined: 3 Oct 19
Posts: 33
Credit: 197,169
RAC: 0
Message 918 - Posted: 2 Jul 2020, 11:14:25 UTC

I have added the config file and set my preference to only crunch the longs, but still have not received a work unit.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 918 · Rating: 0
xii5ku
Joined: 21 Jun 20
Posts: 24
Credit: 68,559,000
RAC: 0
Message 926 - Posted: 11 Jul 2020, 12:49:00 UTC - in response to Message 918.  

adrianxw wrote:
I have added the config file and set my preference to only crunch the longs, but still have not received a work unit.
There is only "NWChem long" work available (server_status.php), which requires either Linux, or Windows with VirtualBox in beta testing (apps.php).
Beta applications require "Run test applications?" switched on in the project preferences.
ID: 926 · Rating: 0
adrianxw
Joined: 3 Oct 19
Posts: 33
Credit: 197,169
RAC: 0
Message 929 - Posted: 12 Jul 2020, 12:02:48 UTC

I have toggled that switch now.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 929 · Rating: 0
adrianxw
Joined: 3 Oct 19
Posts: 33
Credit: 197,169
RAC: 0
Message 930 - Posted: 12 Jul 2020, 14:18:28 UTC

Both my systems downloaded a work unit after doing that. One is running, the other is "Postponed: VM job unmanageable, restarting later" after 4 minutes.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 930 · Rating: 0
Henk Haneveld
Joined: 6 Nov 19
Posts: 8
Credit: 156,845
RAC: 0
Message 931 - Posted: 13 Jul 2020, 9:44:01 UTC - in response to Message 930.  

Both my systems downloaded a work unit after doing that. One is running, the other is "Postponed: VM job unmanageable, restarting later" after 4 minutes.

It should restart after 24 hrs, but if you are running VirtualBox version 6.xx, it is likely to happen again when your system is very busy.

I advise going back to VirtualBox version 5.2.38. This version is much more tolerant of this problem.
ID: 931 · Rating: 0
adrianxw
Joined: 3 Oct 19
Posts: 33
Credit: 197,169
RAC: 0
Message 932 - Posted: 13 Jul 2020, 14:44:24 UTC

The job restarted and has continued to run. It has a very long expiry date (October), so if it hangs for 24 hours every now and again, I don't suppose there is any harm done.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 932 · Rating: 0
adrianxw
Joined: 3 Oct 19
Posts: 33
Credit: 197,169
RAC: 0
Message 964 - Posted: 26 Jul 2020, 8:26:35 UTC

The "time remaining" field is not reliable. Yesterday morning I saw a work unit with 1:40 left to crunch; this morning it is still there, with 0:06 left to crunch. If you see a job like that, just leave it alone. The time does trickle down, and with the deadline being so long, neither this nor the other problem I've commented on should become an issue.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 964 · Rating: 0
adrianxw
Joined: 3 Oct 19
Posts: 33
Credit: 197,169
RAC: 0
Message 966 - Posted: 26 Jul 2020, 18:31:38 UTC

12 hours later and it is still crunching away, but now down to 00:00:03 remaining; hours, minutes and seconds are relative with these jobs. It would be really good to know what they are doing, as they are using serious amounts of CPU time.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 966 · Rating: 0
Stephen Uitti
Joined: 23 Dec 19
Posts: 1
Credit: 55,725
RAC: 0
Message 967 - Posted: 26 Jul 2020, 21:37:48 UTC - in response to Message 913.  

I have a 4-core machine. I have a 4-core unit waiting to run: "Ready to start (4 CPUs)".

What I hear is that these units really churn the disk, and this isn't an SSD, it's spinning magnets. But this is a Linux system with 16 GB RAM (I was going for 8, but 16 was the same price...). If I create, say, a 5 GB RAM disk and get work units to somehow use it, it could really speed them up. Does anyone do this sort of thing?

I'm not hearing this literally. I've never been in the room when a unit was crunching. I've only run short units to date, and they use just under 5 GB of disk, I think. Anyway, I've had a unit abort in under 2 hours, having run out of time.

Stephen.
ID: 967 · Rating: 0
Jim1348
Joined: 3 Oct 19
Posts: 153
Credit: 32,412,973
RAC: 0
Message 968 - Posted: 26 Jul 2020, 21:52:50 UTC - in response to Message 967.  

What i hear is that these units really churn the disk, and this isn't SSD, it's spinning magnets. But this is a Linux system with 16 GB RAM (i was going for 8, but 16 was the same price...). If i create, say, a 5 GB RAM disk and get work units to somehow use it, it could really speed them up. Anyone do this sort of thing?

A reasonably large write cache will do nicely on Linux, and is easier to set up than a ramdisk. On Ubuntu 18.04 (and probably others), this will set up a 2 GB cache with a 30 minute write-delay.
It will also reduce the swapping to disk, but it still will do it if necessary. I have also included the default values, in case you want to return to them.

Swappiness:
sudo sysctl vm.swappiness=0

Set write cache to 2 GB/2.5 GB (for 16 GB main memory):
sudo sysctl vm.dirty_background_bytes=2000000000    # default: 268435456
sudo sysctl vm.dirty_bytes=2500000000               # default: 1073741824 x4
sudo sysctl vm.dirty_writeback_centisecs=500        # checks the cache every 5 seconds
sudo sysctl vm.dirty_expire_centisecs=180000        # page flush after 30 min.; default: 3000


Note that the first cache value sets the size to 2 GB, while the second value means that if the cache fills up to 2.5 GB before writing the contents to disk (not likely with a fast SSD), then all writes will be halted until the cache is emptied to disk.
These values should easily handle the QuChemPedIA case, which is not very hard I think.
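
For anyone adapting these numbers to a different amount of memory or a different delay, the arithmetic can be sketched as below. This is only an illustration: `write_cache_settings` is a hypothetical helper, it uses decimal gigabytes (10^9 bytes) as the values above do, and it only prints the commands rather than running them.

```python
def write_cache_settings(background_gb, limit_gb, check_seconds, expire_minutes):
    """Translate sizes and delays into the four vm.dirty_* values.

    The *_centisecs settings are in hundredths of a second, so
    seconds are multiplied by 100 and minutes by 60 * 100.
    """
    return {
        "vm.dirty_background_bytes": int(background_gb * 1_000_000_000),
        "vm.dirty_bytes": int(limit_gb * 1_000_000_000),
        "vm.dirty_writeback_centisecs": int(check_seconds * 100),
        "vm.dirty_expire_centisecs": int(expire_minutes * 60 * 100),
    }

# The values from this post: 2 GB cache, 2.5 GB hard limit,
# a check every 5 seconds, and a 30 minute write delay.
settings = write_cache_settings(2, 2.5, 5, 30)
for key, value in settings.items():
    print(f"sudo sysctl {key}={value}")
```

Settings made with sysctl on the command line do not survive a reboot, so they have to be applied again after a restart unless they are made permanent some other way.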
ID: 968 · Rating: 0
swiftmallard
Joined: 13 Oct 19
Posts: 87
Credit: 6,026,455
RAC: 0
Message 969 - Posted: 26 Jul 2020, 23:19:52 UTC - in response to Message 966.  

12 Hours later and still crunching away, but now down to 00:00:03 remaining, hours minutes and seconds are relative with these jobs. Would be really good to know what it is they are doing, they are using serious amounts of CPU time.

Are you certain that your CPU is actually working on this? The time/progress indicator will show that it is even when it is not. A better indicator is the Performance tab of the Windows Task Manager. The number of processors crunching should equal the number you have set in your BOINC preferences. If the total comes up short (and it does occasionally happen), then one of your cores is not crunching.
ID: 969 · Rating: 0
adrianxw
Joined: 3 Oct 19
Posts: 33
Credit: 197,169
RAC: 0
Message 970 - Posted: 27 Jul 2020, 3:48:44 UTC

>>> Are you certain that your CPU is actually working on this?

Yes. I am really curious about what it is actually doing; I've asked, but had no reply. It is still running this morning, but the remaining field is now down to 1 second.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 970 · Rating: 0
swiftmallard
Joined: 13 Oct 19
Posts: 87
Credit: 6,026,455
RAC: 0
Message 973 - Posted: 27 Jul 2020, 12:38:45 UTC - in response to Message 970.  

>>> Are you certain that your CPU is actually working on this?

Yes. I am really curious about what it is actually doing, I've asked, no reply. It is still running this morning but the remaining field is now down to 1 second.

Press Ctrl-Alt-Del and then select the Task Manager. Choose the Performance tab. There you will see a graph of your CPU activity. If it does not correspond accurately to what the BOINC Manager says is happening, then one or more of your cores is not actually crunching. Pause all your WUs in the Tasks tab of BOINC and then start them slowly, one by one. Watch the Windows Task Manager to see which one does not cause a rise in CPU activity. (It's most likely to be the one you have been posting about.) That is your problem WU and you should abort it.
ID: 973 · Rating: 0
adrianxw
Joined: 3 Oct 19
Posts: 33
Credit: 197,169
RAC: 0
Message 974 - Posted: 27 Jul 2020, 13:37:58 UTC
Last modified: 27 Jul 2020, 14:02:01 UTC

The Task Manager shows the CPU pretty much maxed out on all cores/threads, typically at 94% and wobbling up and down by 1-2%, which is what I would expect to see on these machines.
The tasks here have a ridiculously long deadline, so I guess he is expecting long runs, but the "remaining" item is WAY out of order.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 974 · Rating: 0
swiftmallard
Joined: 13 Oct 19
Posts: 87
Credit: 6,026,455
RAC: 0
Message 975 - Posted: 27 Jul 2020, 13:42:09 UTC - in response to Message 974.  

The task manager shows the CPU pretty much maxed out on all cores/threads.

If you are satisfied that all cores are crunching properly, then let the WU run. It would not be the first time someone has had a very long running task.
ID: 975 · Rating: 0

©2024 Benoit DA MOTA - LERIA, University of Angers, France