What are you calculating? Some explanation

Author	Message
tcauchy Joined: 4 Aug 19 Posts: 11 Credit: 74,704,720 RAC: 0	Message 313 - Posted: 2 Dec 2019, 21:55:06 UTC

Hi all! Thanks again for contributing to this scientific project. I will try to briefly present the calculations that you are running on your machines. Since we have finished the first recalculation of the original data set, we are entering a new phase.

We calculate ONE molecule at a time with quantum mechanics. A molecule is a set of atoms bonded together. For now, we limit ourselves to small molecules containing only H, C, N, O and F. In quantum mechanics, we use an approximation of the Schrödinger equation. Therefore, calculations can only be compared if they use the same approximations (usually referred to as the level of theory).

The first calculation is the optimization of the atomic positions: a geometry optimization to obtain the most stable 3D atomic positions. If the starting atomic positions are far from the stable ones, this step can take a very long time. Hence the sometimes unpredictable, very long calculation times! (See https://en.wikipedia.org/wiki/Energy_minimization but it is quite mathematical.)

The second step is to calculate the full derivative of the energy with respect to the positions of the atoms, which tells us the forces between the atoms. This calculation also gives us the infrared absorption frequencies.

Finally, the last step is the calculation of the electronic excitations to simulate the UV-visible spectrum. It gives valuable information for photovoltaic applications, for example.

In the private BOINC project we have a proprietary program that is more than 10 times faster. But for the public part we use NWChem, an open solution. In NWChem, with the current level of theory, the optimization step takes around 5 h on average, and the frequency and electronic excitation steps take around 25 h each.
Since we want to generate a lot of new, unknown molecules, calculations could take much more time than before! So we are searching for a lower level of theory that could help us discriminate between good and bad candidates. In the private part, we will only calculate the good ones. We will need you as a super filter. We could generate several thousand new molecules per day, probably with a lot of errors!

By the way, we would like to give you the opportunity to see drawings of the molecules that you have calculated. But we probably won't have time until next year. ;D If some of you are proficient in Python, it could be useful later for some small tasks... ^^

Kindly, Thomas

----------------------------------------------------------------------------------------------------------------------

French version, translated into English:

Hi everyone! Thanks again for contributing to this scientific project. Since we are entering the second phase of the project, I will try to briefly explain what you are calculating.

This quantum chemistry project looks at a single molecule at a time, that is, a set of atoms chemically bonded to one another. To start with, we limit ourselves to small molecules containing H, C, N, O and F. In quantum mechanics, we use approximations of the Schrödinger equation. It is very important to compare only calculations made at the same level of approximation (often called the level of theory).

The first step is the geometry optimization of the atomic positions. From a given starting point, we search for the positions that minimize the total energy. So if the starting positions are far from equilibrium, this step can take a long time, which is very hard to predict.

Next, we differentiate the energy with respect to the atomic positions to obtain the forces between the atoms. This gives us access to the absorbed infrared frequencies.

Finally, the last step is the calculation of the electronic excited states. This step is very interesting because it tells us about the UV-visible absorption of the molecule, which is essential for applications such as organic photovoltaics.

In the private BOINC project, we use a fairly efficient proprietary program. But for the public part we chose an open code, NWChem. We now want to generate new molecules, and the calculations are likely to take even longer. Already, the optimization step takes 5 h on average, and the other two steps are more on the order of 25 h each. So we are currently looking for a compromise by lowering the level of theory, in order to quickly discard molecules that are not realistic at all. Since we can easily produce 1000 molecules a day, with many errors, expect run times to yo-yo :D

We would like to let you see drawings of the molecules you calculate, but I don't know whether we will have time before the end of the year. If any of you are proficient in Python, send us a message.

See you, Thomas
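For readers who want a feel for why the optimization step is so unpredictable, here is a minimal sketch in Python. It is a toy, not what NWChem does: it assumes a single bond described by a one-dimensional Morse potential and uses plain steepest descent, and all parameter values (`depth`, `width`, `r_eq`, `step`, `tolerance`) are illustrative assumptions rather than project settings. It only shows the general idea of walking downhill in energy until the force almost vanishes.

```python
import math


def morse_gradient(r, depth=1.0, width=1.0, r_eq=1.0):
    """dE/dr for the Morse potential E(r) = depth * (1 - exp(-width * (r - r_eq)))**2."""
    decay = math.exp(-width * (r - r_eq))
    return 2.0 * depth * width * (1.0 - decay) * decay


def optimize_bond_length(r_start, step=0.1, tolerance=1e-6, max_steps=100_000):
    """Steepest descent: follow the force downhill until the gradient is nearly zero."""
    r = r_start
    for n_steps in range(max_steps):
        gradient = morse_gradient(r)
        if abs(gradient) < tolerance:
            return r, n_steps
        r -= step * gradient
    raise RuntimeError("optimization did not converge")


# Same toy molecule, two different starting geometries (arbitrary units).
near_r, near_steps = optimize_bond_length(1.2)
far_r, far_steps = optimize_bond_length(4.0)
print(f"start near the minimum: r = {near_r:.4f} after {near_steps} steps")
print(f"start far from it:      r = {far_r:.4f} after {far_steps} steps")
```

Both runs end at the same bond length, but the one that starts far from equilibrium needs more steps, because the force on a stretched bond is weak and each step moves the atoms only a little. A real calculation has many coordinates and a far more expensive energy evaluation per step, so the same effect costs hours instead of microseconds.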

Aurum Joined: 14 Dec 19 Posts: 68 Credit: 45,744,261 RAC: 0	Message 343 - Posted: 15 Dec 2019, 14:59:03 UTC - in response to Message 313. Last modified: 15 Dec 2019, 14:59:19 UTC

"In the BOINC private we have a proprietary program that is more than 10 times faster."

Why can't we help you crunch the fast code???

damotbe Volunteer moderator Project administrator Project developer Project tester Project scientist Help desk expert Joined: 23 Jul 19 Posts: 289 Credit: 464,119,561 RAC: 0	Message 345 - Posted: 15 Dec 2019, 15:07:11 UTC - in response to Message 343.

It is proprietary software (not our code), very expensive and with a very, very restrictive license... At the moment, we can't share this code.

Aurum Joined: 14 Dec 19 Posts: 68 Credit: 45,744,261 RAC: 0	Message 346 - Posted: 15 Dec 2019, 15:39:55 UTC - in response to Message 345.

I appreciate that. Can you say what the software is? I thought that since it's compiled and running in a BOINC wrapper it would be safe. Still glad to help.

Aurum Joined: 14 Dec 19 Posts: 68 Credit: 45,744,261 RAC: 0	Message 367 - Posted: 20 Dec 2019, 15:03:36 UTC

I see a lot of Trickle Up messages. What's trickling up?

mmonnin Joined: 8 Oct 19 Posts: 13 Credit: 2,548,714 RAC: 0	Message 372 - Posted: 21 Dec 2019, 22:43:10 UTC - in response to Message 313.

"The first calculation is the optimization of the atomic positions. A geometrical optimization to obtain the 3D atomic positions that is the most stable. If the starting atomic positions are far from the stable ones, this step can take a very long time. Hence, the sometimes unpredictable very long calculations times! (See https://en.wikipedia.org/wiki/Energy_minimization but it is quite mathematical.)"

Hi Thomas, is this the reason that every few days all tasks are resends? Every task is either A) very old, like a dsgdb9nsd_nwchem with multiple errors, or B) a validate error from another user. For days everything is a new task with _0, then everything is _1 or sometimes _5, _6. During these periods of resends it seems like much of our work is wasted, along with the many, many....many validate errors.

tcauchy Joined: 4 Aug 19 Posts: 11 Credit: 74,704,720 RAC: 0	Message 375 - Posted: 24 Dec 2019, 10:30:55 UTC - in response to Message 372.

Hi mmonnin,

For OD9, we are using a lower level of theory than for the first batch [B3LYP with 3-21G instead of B3LYP with 6-31G(2df,p)]. Therefore, to be able to compare the results, we need to recalculate the previous molecules (dsgdb9...). We know that some will crash, but with different computational parameters the crashes could affect different molecules. This has never been documented at such a large scale! That is why you are seeing "old" calculations reappearing.

As for the numbers, we have generated a new dataset of 211k totally new molecules (with another 200k just in case). A first rapid estimate shows that 30% of the calculations on those seem to fail, and 30% of the calculations radically change the molecule (probably the longest calculations). That means that 40% of the newly generated molecules will be kept. After the holidays, we will try our first machine learning predictions on those and maybe generate new ones to reach 211k.

Merry Christmas to all of you. Thomas
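To put those percentages into absolute numbers, here is a small back-of-the-envelope sketch in Python. The 211k and 30% / 30% figures come from the post above. The second half is a hypothetical: it assumes the same 40% yield would hold for any further batch, which is not stated in the thread.

```python
GENERATED = 211_000   # newly generated molecules (figure from the post)
FAIL_PCT = 30         # calculations that seem to fail
CHANGED_PCT = 30      # calculations that radically change the molecule
KEPT_PCT = 100 - FAIL_PCT - CHANGED_PCT

# Integer arithmetic keeps the counts exact.
failed = GENERATED * FAIL_PCT // 100
changed = GENERATED * CHANGED_PCT // 100
kept = GENERATED * KEPT_PCT // 100

# Hypothetical: total molecules to generate so that the kept ones reach
# 211k, ASSUMING the same yield applies to later batches (ceiling division).
total_needed = -(-(GENERATED * 100) // KEPT_PCT)
extra_needed = total_needed - GENERATED

print(f"failed: {failed}, changed: {changed}, kept: {kept}")
print(f"total to generate for {GENERATED} kept: {total_needed} (extra: {extra_needed})")
```

Under that assumption, keeping 211k molecules means generating well over twice as many as the first batch, which is one way to read why a cheap lower level of theory matters as a first filter.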

Aurum Joined: 14 Dec 19 Posts: 68 Credit: 45,744,261 RAC: 0	Message 378 - Posted: 26 Dec 2019, 10:38:22 UTC - in response to Message 375.

And I presume you'd get another set if you changed the solvent polarity. I really like hearing about the science of your project. The more you tell us about what's going on, the more crunchers will follow you. I would especially enjoy hearing about future applications that you dream may come of this work, e.g. catalysts, drugs...