Compile for AVX-512 VNNI
Author | Message |
---|---|
Send message Joined: 14 Dec 19 Posts: 68 Credit: 45,744,261 RAC: 0 |
I came across this article: https://www.nas.nasa.gov/hecc/support/kb/cascade-lake-processors_579.html#:~:text=Cascade%20Lake%20also%20introduces%20in,floating%2Dpoint%20operations%20per%20cycle "In addition to the instruction sets SSE, SSE2, SSE3, Supplemental SSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, and AVX512[F,CD,BW,DQ,VL], which are available in its Skylake predecessor, Cascade Lake also includes the new AVX-512 Vector Neural Network Instructions (VNNI), which provide significant, more efficient deep-learning inference acceleration. Cascade Lake also introduces in-hardware mitigations for the Spectre and Meltdown security flaws. With 512-bit floating-point vector registers and two floating-point functional units, each capable of Fused Multiply-Add (FMA), a Cascade Lake core can deliver 32 double-precision floating-point operations per cycle. Use the Intel compiler flag -xCORE-AVX512 for Skylake and Cascade Lake-SP specific optimizations. The optimization flag -qopt-zmm-usage=high -xCORE-AVX512 may benefit floating-point heavy applications running on Skylake and Cascade Lake. Tip: If you want a single executable that will run on any of the Aitken, Electra, Pleiades and Merope processor types, with suitable optimization to be determined at run time, you can compile your application using the option -O3 -ipo -axCORE-AVX512,CORE-AVX2,AVX -xSSE4.2." Note that 32 DP FLOPs per cycle is double what we've had. If you want to compile for AVX-512 VNNI I'll be glad to test it. |
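The 32 FLOPs-per-cycle figure in the quoted article can be reproduced with simple arithmetic. The sketch below assumes two FMA units per core, as the quote describes, and counts each FMA as two operations per 64-bit lane. The function name is only illustrative. The 256-bit comparison is an assumption about what "double what we've had" refers to.

```python
def peak_dp_flops_per_cycle(vector_bits: int, fma_units: int) -> int:
    """Peak double-precision FLOPs per core per cycle (illustrative helper)."""
    lanes = vector_bits // 64   # number of 64-bit doubles in one vector register
    flops_per_fma = 2           # a fused multiply-add counts as one multiply plus one add
    return lanes * flops_per_fma * fma_units


avx512 = peak_dp_flops_per_cycle(512, 2)  # 512-bit registers, two FMA units
avx2 = peak_dp_flops_per_cycle(256, 2)    # 256-bit registers, two FMA units (assumed baseline)
print(avx512, avx2)
```

Under these assumptions the doubling comes from the wider 512-bit FMA units rather than from VNNI itself.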
Send message Joined: 14 Dec 19 Posts: 68 Credit: 45,744,261 RAC: 0 |
As an aside, do QuChem WUs use more integer operations or more floating-point operations? |
Send message Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
I guess that most of the operations are floating point (FMAD, ADD, MUL) on vectors (size <= 256 bits, probably determined at runtime). I have neither the Intel compiler nor the time to start such a project. But I'm pretty sure that if a volunteer compiles a new version of NWChem for himself and modifies the local configuration files in the quchempedia directory, he can manage to test such things. |
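For context on the integer versus floating-point question: the AVX-512 VNNI instructions are integer dot-product operations, so they would probably not speed up a workload dominated by floating-point FMA. The sketch below emulates one 32-bit lane of the VPDPBUSD instruction in plain Python. The function name and the example values are illustrative only, and this is a model of the arithmetic, not how any compiler or NWChem uses it.

```python
from typing import Sequence


def vpdpbusd_lane(acc: int, a_bytes: Sequence[int], b_bytes: Sequence[int]) -> int:
    """Emulate one 32-bit lane of VPDPBUSD (non-saturating form).

    Multiplies four unsigned 8-bit values by four signed 8-bit values,
    sums the four products, and adds the sum to a signed 32-bit accumulator.
    """
    assert len(a_bytes) == 4 and len(b_bytes) == 4
    assert all(0 <= a <= 255 for a in a_bytes)      # unsigned bytes
    assert all(-128 <= b <= 127 for b in b_bytes)   # signed bytes
    total = acc + sum(a * b for a, b in zip(a_bytes, b_bytes))
    # Wrap the result to a signed 32-bit integer.
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total >= (1 << 31) else total


result = vpdpbusd_lane(10, [1, 2, 3, 4], [5, -6, 7, -8])
print(result)  # 10 + (5 - 12 + 21 - 32) = -8
```

A full 512-bit register holds sixteen such lanes, which is why the instruction targets int8 inference rather than double-precision chemistry codes.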
Send message Joined: 14 Dec 19 Posts: 68 Credit: 45,744,261 RAC: 0 |
"I have neither the Intel compiler nor the time to start such a project." Sometimes that happens, e.g. http://asteroidsathome.net/boinc/ had someone compile it for CUDA 10.2. If that was something I had or knew how to do, I'd gladly give it a try. It looks like those compilers are expensive. So would your WUs run faster if they were compiled for high-end CPUs? |
Send message Joined: 31 Jul 19 Posts: 2 Credit: 5,023,564 RAC: 0 |
Are the sources available somewhere? GitHub? |
Send message Joined: 14 Dec 19 Posts: 68 Credit: 45,744,261 RAC: 0 |
"Are the sources available somewhere? GitHub?" Doctor Google suggests: https://github.com/bharismendy/QuChemPedIA |
Send message Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0 |
"Are the sources available somewhere? GitHub?" No, that's a side project (results visualization for the community). The code of the executable for the application is NWChem: https://nwchemgit.github.io/Compiling-NWChem.html |
Send message Joined: 14 Dec 19 Posts: 68 Credit: 45,744,261 RAC: 0 |
Thanks. I sure hope someone compiles this for Linux Mint 20 (Ubuntu) and I'll be glad to test it. |
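Before testing a build aimed at a specific instruction set, a volunteer on Linux could confirm that the CPU actually reports it. The sketch below parses the `flags` line of `/proc/cpuinfo`. The helper name is hypothetical, and the flag names (`avx2`, `avx512f`, `avx512_vnni`) are the ones the Linux kernel uses in that file.

```python
def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like "flags : fpu vme ... avx2 ..."
            _, _, value = line.partition(":")
            return set(value.split())
    return set()  # no flags line found (e.g. non-x86 or non-Linux system)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            flags = parse_cpu_flags(fh.read())
    except OSError:
        flags = set()  # file not available on this system
    for name in ("avx2", "avx512f", "avx512_vnni"):
        print(name, "yes" if name in flags else "no")
```

If `avx512f` is reported as missing, a binary compiled only for AVX-512 would be expected to fail with an illegal-instruction error on that machine.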
Send message Joined: 14 Dec 19 Posts: 68 Credit: 45,744,261 RAC: 0 |
A free compiler may be available: https://software.intel.com/content/www/us/en/develop/tools/parallel-studio-xe/choose-download/open-source-contributor.html |
©2024 Benoit DA MOTA - LERIA, University of Angers, France