Compile for AVX-512 VNNI

Message boards : Number crunching : Compile for AVX-512 VNNI

Aurum
Joined: 14 Dec 19
Posts: 68
Credit: 45,744,261
RAC: 0
Message 1141 - Posted: 15 Oct 2020, 2:13:21 UTC

I came across this article:
https://www.nas.nasa.gov/hecc/support/kb/cascade-lake-processors_579.html#:~:text=Cascade%20Lake%20also%20introduces%20in,floating%2Dpoint%20operations%20per%20cycle
"In addition to the instruction sets SSE, SSE2, SSE3, Supplemental SSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, and AVX512[F,CD,BW,DQ,VL], which are available in its Skylake predecessor, Cascade Lake also includes the new AVX-512 Vector Neural Network Instructions (VNNI), which provide significant, more efficient deep-learning inference acceleration. Cascade Lake also introduces in-hardware mitigations for the Spectre and Meltdown security flaws.

With 512-bit floating-point vector registers and two floating-point functional units, each capable of Fused Multiply-Add (FMA), a Cascade Lake core can deliver 32 double-precision floating-point operations per cycle.

Use the Intel compiler flag -xCORE-AVX512 for Skylake and Cascade Lake-SP specific optimizations. The optimization flag -qopt-zmm-usage=high -xCORE-AVX512 may benefit floating-point heavy applications running on Skylake and Cascade Lake.

Tip: If you want a single executable that will run on any of the Aitken, Electra, Pleiades and Merope processor types, with suitable optimization to be determined at run time, you can compile your application using the option -O3 -ipo -axCORE-AVX512,CORE-AVX2,AVX -xSSE4.2."

Note that 32 DP FLOPs per cycle is double what we've had with 256-bit AVX2 (16 per cycle).
If you want to compile for AVX-512 VNNI I'll be glad to test it.
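
For anyone wondering where the 32 in the quoted article comes from, it can be reproduced with simple arithmetic. This is only a back-of-the-envelope sketch: the function name is made up for illustration, the result is a theoretical peak rather than something a real WU will sustain, and the 256-bit comparison assumes a core that also has two FMA units.

```python
def peak_dp_flops_per_cycle(vector_bits, fma_units):
    """Theoretical peak double-precision FLOPs per cycle for one core."""
    lanes = vector_bits // 64   # number of 64-bit doubles per vector register
    flops_per_fma = 2           # a fused multiply-add counts as one mul plus one add
    return lanes * flops_per_fma * fma_units


# 512-bit registers with two FMA units, as described in the quoted article
print(peak_dp_flops_per_cycle(512, 2))
# 256-bit registers (AVX2) with two FMA units, assumed here for comparison
print(peak_dp_flops_per_cycle(256, 2))
```

The factor of two between the results comes entirely from the wider registers, so it only helps code that actually issues 512-bit FMA instructions.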
Aurum
Joined: 14 Dec 19
Posts: 68
Credit: 45,744,261
RAC: 0
Message 1142 - Posted: 15 Oct 2020, 2:15:43 UTC

As an aside, do QuChem WUs use more integer or floating-point operations?
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 1144 - Posted: 16 Oct 2020, 6:27:16 UTC - in response to Message 1142.  

I guess that most of the operations are floating-point (FMAD, ADD, MUL) on vectors (size <= 256 bits, probably determined at runtime). I have neither the Intel compiler nor the time to start such a project.

But I'm pretty sure that a volunteer who compiles a new version of NWChem for themselves and modifies the local configuration files in the quchempedia directory can manage to test such things.
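
Before spending time on a custom build, a volunteer may want to confirm which vector extensions their CPU actually reports. The sketch below is one way to do that on Linux x86, assuming the flag names the kernel prints in /proc/cpuinfo (avx512_vnni, avx512f, avx2, avx, sse4_2). The function names and the preference order are hypothetical helpers for illustration; nothing here is part of NWChem or the project's application.

```python
from pathlib import Path

# Preference order, most capable first. These are Linux /proc/cpuinfo flag names.
PREFERENCE = ["avx512_vnni", "avx512f", "avx2", "avx", "sse4_2"]


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (for example, a non-x86 system)


def best_target(flags):
    """Pick the first extension in PREFERENCE that the CPU reports."""
    for name in PREFERENCE:
        if name in flags:
            return name
    return "baseline"


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""
    print(best_target(cpu_flags(text)))
```

If this prints avx2 or lower, a binary built only for AVX-512 would not run on that machine, so it is worth checking before testing.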
Aurum
Joined: 14 Dec 19
Posts: 68
Credit: 45,744,261
RAC: 0
Message 1146 - Posted: 16 Oct 2020, 12:00:16 UTC - in response to Message 1144.  

I have neither the Intel compiler nor the time to start such a project.
But I'm pretty sure that a volunteer who compiles a new version of NWChem for themselves and modifies the local configuration files in the quchempedia directory can manage to test such things.

Sometimes that happens, e.g. http://asteroidsathome.net/boinc/ had someone compile it for CUDA 10.2.

If that were something I had or knew how to do, I'd gladly give it a try. It looks like those compilers are expensive.

So would your WUs run faster if they were compiled for high-end CPUs?
[AF] fansyl

Joined: 31 Jul 19
Posts: 2
Credit: 5,023,564
RAC: 0
Message 1148 - Posted: 16 Oct 2020, 13:39:58 UTC

Are the sources available somewhere? Github?
Aurum
Joined: 14 Dec 19
Posts: 68
Credit: 45,744,261
RAC: 0
Message 1151 - Posted: 19 Oct 2020, 15:24:04 UTC - in response to Message 1148.  

Are the sources available somewhere? Github?

Doctor Google suggests: https://github.com/bharismendy/QuChemPedIA
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 289
Credit: 464,119,561
RAC: 0
Message 1155 - Posted: 20 Oct 2020, 9:22:56 UTC - in response to Message 1151.  

Are the sources available somewhere? Github?

Doctor Google suggests: https://github.com/bharismendy/QuChemPedIA



No, that's a side project (results visualization for the community).

The executable for the application is built from NWChem: https://nwchemgit.github.io/Compiling-NWChem.html
Aurum
Joined: 14 Dec 19
Posts: 68
Credit: 45,744,261
RAC: 0
Message 1156 - Posted: 21 Oct 2020, 16:40:08 UTC

Thanks. I sure hope someone compiles this for Linux Mint 20 (Ubuntu-based); I'll be glad to test it.
Aurum
Joined: 14 Dec 19
Posts: 68
Credit: 45,744,261
RAC: 0
Message 1157 - Posted: 22 Oct 2020, 17:21:05 UTC

A free compiler may be available:
https://software.intel.com/content/www/us/en/develop/tools/parallel-studio-xe/choose-download/open-source-contributor.html


©2024 Benoit DA MOTA - LERIA, University of Angers, France