Preliminary results?

[VENETO] boboviz

Joined: 13 Sep 19
Posts: 69
Credit: 399,347
RAC: 0
Message 243 - Posted: 2 Nov 2019, 21:52:00 UTC

The first month of the project has passed.
Any preliminary results? Is the code OK? Do you need more computational power?
damotbe
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Help desk expert

Joined: 23 Jul 19
Posts: 288
Credit: 463,652,361
RAC: 23,661
Message 248 - Posted: 4 Nov 2019, 9:59:33 UTC - in response to Message 243.  

We wrote an article ("Dataset’s chemical diversity limits the generalizability of machine learning predictions") that has been accepted and will be published (in open access) soon.
At the moment, we (the crunchers and our team) are computing a better dataset to train AI models, and we are working on several such models (an article is under review).
We are also working on an open database and on tools for quality control and chemical information enrichment.

Computational power? Never enough! ;-p
tcauchy

Joined: 4 Aug 19
Posts: 11
Credit: 74,704,720
RAC: 0
Message 252 - Posted: 4 Nov 2019, 21:37:48 UTC

Hello,

I am the chemist on this project. The publication mentioned by damotbe was written when we launched the BOINC project, but I can quote its abstract to show what we have in mind:

"Abstract: The QM9 dataset has become the golden standard for Machine Learning (ML) predictions of various chemical properties. QM9 is based on the GDB, which is a combinatorial exploration of the chemical space. ML molecular predictions have been recently published with an accuracy on par with Density Functional Theory calculations. Such ML models need to be tested and generalized on real data. PC9, a new QM9 equivalent dataset (only H, C, N, O and F and up to 9 "heavy" atoms) of the PubChemQC project is presented in this article. A statistical study of bonding distances and chemical functions shows that this new dataset encompasses more chemical diversity. Kernel Ridge Regression, Elastic Net and the Neural Network model provided by SchNet have been used on both datasets. The overall accuracy in energy prediction is higher for the QM9 subset. However, a model trained on PC9 shows a stronger ability to predict energies of the other dataset."
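
The selection rule quoted in the abstract (only H, C, N, O and F, and up to 9 "heavy", i.e. non-hydrogen, atoms) can be illustrated with a small sketch. This is not the actual extraction procedure used for PC9, which is not described in this thread; the function names and the use of simple molecular formula strings are assumptions made purely for illustration.

```python
import re

# Elements and size limit as stated in the abstract.
ALLOWED_ELEMENTS = {"H", "C", "N", "O", "F"}
MAX_HEAVY_ATOMS = 9


def parse_formula(formula):
    """Parse a simple formula such as 'C2H6O' into a dict of element counts.

    Hypothetical helper: handles only flat formulas (no brackets or charges).
    """
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def fits_qm9_style_criteria(formula):
    """Return True if the formula uses only H, C, N, O, F and has at most
    9 heavy (non-hydrogen) atoms."""
    counts = parse_formula(formula)
    if not counts or not set(counts) <= ALLOWED_ELEMENTS:
        return False
    heavy_atoms = sum(n for element, n in counts.items() if element != "H")
    return heavy_atoms <= MAX_HEAVY_ATOMS


if __name__ == "__main__":
    for formula in ["C2H6O", "C9H20", "C10H22", "C6H5Cl"]:
        print(formula, fits_qm9_style_criteria(formula))
```

Ethanol (C2H6O) and nonane (C9H20) pass such a filter, while decane (C10H22, ten heavy atoms) and chlorobenzene (C6H5Cl, contains chlorine) do not.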

The QM9 dataset has around 130k small molecules, whereas our PC9 has 119k (extracted from a different type of calculation). The problem is that the full results of QM9 are not openly available: its authors extracted some results from the costly quantum mechanics calculations and discarded the logs. We are not satisfied with PC9 either, which was only a simple demonstration that more diversity is needed.

For the moment, the BOINC project aims to recalculate the interesting molecules of QM9 and PC9, this time at the same level of calculation. All the results will be available in the quchempedia document base (https://quchempedia.univ-angers.fr) once the platform is a little more robust (early 2020), along with the quality control tool my colleague mentioned.
We are not fully happy with NWChem yet. For the same project, damotbe and I are also using Gaussian (proprietary), which is much more efficient. But NWChem is open source...
Thanks to your help, we have calculated roughly 130k out of 200k molecules!
In December, we hope to ask the community to calculate new molecules, which may not even exist or be stable, in order to help machine learning tools generalize better. These new molecules will be generated by a machine learning procedure, which is too long to explain here right now.

If you have any questions...
Kind regards,
Thomas
Dataman

Joined: 7 Oct 19
Posts: 10
Credit: 650,307
RAC: 0
Message 253 - Posted: 4 Nov 2019, 22:29:50 UTC

Many thanks to both of you for taking the time to explain your research. Please keep us posted when you publish.
Cheers

[AF>Le_Pommier] Jerome_C2005

Joined: 26 Aug 19
Posts: 15
Credit: 1,265,326
RAC: 1,342
Message 254 - Posted: 4 Nov 2019, 23:05:31 UTC

Thanks a lot for this very interesting feedback. Even if most of us won't understand "a variable part of it", this is typically the kind of "high level information" that pleases the crunching community and shows us we are helping an initiative with a lot of value!

Good luck, we are here to help :)
Jim1348

Joined: 3 Oct 19
Posts: 146
Credit: 28,969,973
RAC: 33,093
Message 255 - Posted: 5 Nov 2019, 0:03:45 UTC - in response to Message 254.  

Exactly so. It is cutting-edge, and there will be problems along the way. That is why we are here.
Rapture

Joined: 3 Oct 19
Posts: 2
Credit: 70,877
RAC: 0
Message 256 - Posted: 5 Nov 2019, 5:38:46 UTC - in response to Message 252.  

Thanks for the detailed explanation. This makes participating in this new project so worthwhile. I hope you will be able to give more reports on a regular basis in the future.
[VENETO] boboviz

Joined: 13 Sep 19
Posts: 69
Credit: 399,347
RAC: 0
Message 257 - Posted: 6 Nov 2019, 11:04:16 UTC - in response to Message 252.  

"Thanks to your help, we have calculated roughly 130k out of 200k molecules!
In December, we hope to ask the community to calculate new molecules, which may not even exist or be stable, in order to help machine learning tools generalize better. These new molecules will be generated by a machine learning procedure, which is too long to explain here right now."


Great!! We are ready!

©2022 Benoit DA MOTA - LERIA, University of Angers, France