The project was developed a few years ago with an intern and a very limited budget. Despite many project submissions, we did not get any funding. Sometimes we are told that our project is too ambitious, sometimes not enough... It is discouraging! The truth is that the public funding of research in France is extremely low.
Our prototype programs are outdated. At the beginning, I struggled with VMs, then security changes in the Linux kernel started to prevent the application from running properly.
Despite everything and thanks to you (thank you!), we are coming to the end of a computing campaign, and integrating the data will take me a lot of time. The project is obviously not finished and we could still submit many calculations. But today the priority is to analyze and make the most of what we already have. I'll go through the last calculations that had problems and see if I can resubmit some tasks, but very soon there will be no more WUs.
We have made a lot of progress with little funding, and we will continue to share our scientific publications with you. Moreover, we are currently finishing an article, and I can already tell you that we have developed a way to greatly reduce the number of calculations spent on unstable molecules.
See you soon
29 Aug 2022, 9:42:32 UTC · Discuss
Presentation of our work
In the context of the Molecular Modeling and Drug Discovery (M2D2) talks, organized by the Valence Discovery lab, Thomas Cauchy gave a presentation of our work. This video is intended for researchers, but Thomas makes, as always, a real effort to popularize the subject. I hope you will enjoy it!
31 Mar 2022, 9:14:23 UTC · Discuss
New scientific publication
We are very pleased to announce the release of our latest publication associated with the quchempedia project.
Link : https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00554-8
The article is freely available. Feel free to have a look at it even if some parts of the article are quite technical and intended for specialists.
Most importantly, the results of this article are based on the calculations of this BOINC project!
With the molecules we generated and you calculated, it was possible to probe the vast space of chemistry.
You are mentioned in the acknowledgements but we would like to renew our gratitude to you here.
We have many more projects.
Benoit and Thomas
5 Oct 2021, 8:31:59 UTC · Discuss
Server shutdown for maintenance
Because of electrical inspections across the whole university, the server must be shut down from tomorrow, August 16th, for about a week. I hope to be able to restart everything on August 23rd.
15 Aug 2021, 20:08:02 UTC · Discuss
Big server failure
The server has been offline for 5 days due to a failure on the system disks. With a lot of work, we managed to get the server back online without any data loss. The disk redundancy is back online.
Sorry for the lack of news. The current campaign is still ongoing and we are also working on scientific publications. The health situation gives us a lot of extra work, but we are not giving up!
17 Mar 2021, 14:51:21 UTC · Discuss
Scientific publication and news
Thank you very much for your help.
I am pleased to announce the publication of our latest open access article describing EvoMol, our open-source molecule generator.
EvoMol: a flexible and interpretable evolutionary algorithm for unbiased de novo molecular generation
The objective of this work is to design a molecular generator capable of exploring known as well as unfamiliar areas of the chemical space.
Our method must be flexible to adapt to very different problems. Therefore, it has to be able to work with or without the influence of prior data and knowledge. Moreover, regardless of the success, it should be as interpretable as possible to allow for diagnosis and improvement.
We propose here a new open source generation method using an evolutionary algorithm to sequentially build molecular graphs. It is independent of starting data and can generate totally unseen compounds. To be able to search a large part of the chemical space, we define an original set of 7 generic mutations close to the atomic level.
Our method achieves excellent performances and even records on the QED, penalised logP, SAscore, CLscore as well as the set of goal-directed functions defined in GuacaMol. To demonstrate its flexibility, we tackle a very different objective issued from the organic molecular materials domain. We show that EvoMol can generate sets of optimised molecules having high energy HOMO or low energy LUMO, starting only from methane. We can also set constraints on a synthesizability score and structural features. Finally, the interpretability of EvoMol allows for the visualisation of its exploration process as a chemically relevant tree.
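To give a rough feel for the idea described in the abstract above (an evolutionary algorithm that grows molecules through small atom-level mutations, starting only from methane), here is a toy sketch. It is not the EvoMol code: molecules are reduced to simple chains of heavy atoms instead of real molecular graphs, only three mutations are used instead of seven, and the objective function `toy_score` is purely hypothetical.

```python
import random

ATOMS = ["C", "N", "O", "F"]   # heavy atoms only, hydrogens are implicit
MAX_HEAVY_ATOMS = 9            # assumed size limit for this toy example


def mutate(chain, rng):
    """Apply one random atom-level edit to a chain of heavy atoms."""
    chain = list(chain)
    action = rng.choice(["append", "remove", "substitute"])
    if action == "append" and len(chain) < MAX_HEAVY_ATOMS:
        chain.append(rng.choice(ATOMS))
    elif action == "remove" and len(chain) > 1:
        chain.pop(rng.randrange(len(chain)))
    else:
        # Substitution, also used as a fallback when the chosen edit is not allowed.
        chain[rng.randrange(len(chain))] = rng.choice(ATOMS)
    return tuple(chain)


def toy_score(chain):
    """Hypothetical objective: reward element diversity, penalise size."""
    return len(set(chain)) - 0.1 * len(chain)


def evolve(steps=200, pop_size=10, seed=0):
    """Evolve a population that starts only from methane ("C")."""
    rng = random.Random(seed)
    population = [("C",)] * pop_size
    history = []
    for _ in range(steps):
        parent = max(population, key=toy_score)
        child = mutate(parent, rng)
        worst = min(range(pop_size), key=lambda i: toy_score(population[i]))
        # Keep the child only if it improves on the worst individual.
        if toy_score(child) > toy_score(population[worst]):
            population[worst] = child
        history.append(max(toy_score(m) for m in population))
    return max(population, key=toy_score), history


if __name__ == "__main__":
    best, history = evolve()
    print("best chain:", "".join(best), "score:", toy_score(best))
```

Because every accepted change is a single small mutation of a known parent, the path from methane to any generated molecule can be traced step by step, which is the kind of interpretability the abstract refers to.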
You can find the full article for free here:
or here:
You can help us a little more by visiting these pages to give us visibility, and you can also share them on your team forums and/or social media such as Twitter (@b_damota).
We have been working for some time now on the next article. It will deal in particular with the calculations you have made since the beginning of the project. The result will be an open access dataset. While we are writing this article, we are also working on the next steps. Without giving too much away, I can only tell you that your calculations will help us to propose a unique tool that is particularly useful for chemists. We will of course keep you informed! Currently, two campaigns are in progress that do not concern the work mentioned above. The "nwchem long" units and the new "nwchem" tasks with the prefix "CL9" will also bring new results for articles, probably at the end of 2021 or in 2022.
29 Sep 2020, 8:09:19 UTC · Discuss
Scientific publication and new WUs
First of all, thank you very much for your help and for your interest in our research.
We are proud to announce the imminent publication of our work on the generation of molecules with AI. You can already read the first draft here: https://www.researchsquare.com/article/rs-36676/v1. It's a raw version with almost no formatting. We have a more polished version that should come out in a few weeks. The article will be open access, the molecule generator open source and the data open data. Open Science!
Our exploration of chemical space continues and we have just generated more than 2.5 million small molecules. I know that some people are eagerly awaiting the return of the short WUs, and here they are! As before, many calculations will be considered invalid because of unstable molecules, but this is the price of an unbiased cartography of the chemical space. The first results are very encouraging and we hope that these 2.5 million new molecules will help to provide extremely useful tools for many chemists.
18 Jul 2020, 9:31:44 UTC · Discuss
NWChem long no longer in beta soon
The latest tests for long units (a few dozen hours) are conclusive with a good success rate. I don't plan to put this application in a VM for Windows and Mac, short units are already enough of a problem. The "NWChem Long" application will therefore come out of its beta phase soon. If you don't want to get these units that can last really long, don't forget to update your preferences.
Thank you for your help and your patience which allowed us to find good parameters.
24 Feb 2020, 17:43:47 UTC · Discuss
Thank you very much! You are a really great help.
Initially, only two hundred "NWChem long" workunits were supposed to be submitted for testing. Your numerous results, and also the numerous failures, allow us to better evaluate the parameters for this new simulation. I still have to write some scripts for the new inputs, but I should be able to submit jobs soon. Don't hesitate to cancel in-progress workunits (NWChem long).
18 Feb 2020, 9:01:04 UTC · Discuss
Molecules are coming!
The new batches of molecules are coming! With them come the new credit system (200 credits per WU) and quorum validation. The expected runtime is 2-3 hours on a recent personal computer.
You can also see a new beta application (NWChem long) that will be used for the bigger calculations we talked about. The inputs have been ready since the poll, but I still have to perform some tests. Stay tuned!
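For those curious about how fixed credits and quorum validation fit together, here is a minimal sketch. It is not the server's validator: the quorum size of 2, the energy tolerance and the host names are assumptions chosen for illustration. Only the 200 credits per WU come from the announcement above.

```python
FIXED_CREDIT = 200   # credits per WU, as announced
QUORUM = 2           # assumed number of agreeing results required
ENERGY_TOL = 1e-6    # hypothetical tolerance when comparing energies


def validate(results, quorum=QUORUM, tol=ENERGY_TOL):
    """Compare results returned by several hosts for the same workunit.

    `results` maps a host name to the energy it reported.
    Returns (canonical_energy, credits), where credits maps each host
    to the fixed credit if its result agrees with the canonical one.
    """
    hosts = sorted(results)
    for ref in hosts:
        agreeing = [h for h in hosts if abs(results[h] - results[ref]) <= tol]
        if len(agreeing) >= quorum:
            credits = {h: (FIXED_CREDIT if h in agreeing else 0) for h in hosts}
            return results[ref], credits
    # No group of hosts is large enough: nothing is validated yet.
    return None, {h: 0 for h in hosts}


if __name__ == "__main__":
    energy, credits = validate({"host_a": -76.4, "host_b": -76.4, "host_c": -75.0})
    print(energy, credits)
```

With fixed credits, the reward no longer depends on the runtime or benchmark reported by the host, which is what makes the scheme harder to fool.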
3 Feb 2020, 16:10:53 UTC · Discuss
Credits and Gridcoin
Yesterday, I had to suspend an account for two weeks and remove credits, in order to investigate what looked like obvious credit cheating. I'm quite annoyed that instead of doing science, I have to deal with this kind of behavior. We're small and we're short on time and it doesn't help scientific research...
EDIT : after investigations and fruitful exchanges, the problem has been identified and I'm sorry to have been a bit rough with this user.
The current credit system is too easy to fool, so I'm going to move to something simpler, more robust and more generous on average: fixed credits. For short tasks (such as od9), I'm going to award 200 credits. This change requires draining the task queue. Once it is empty, I will submit new tasks. These new tasks will be the opportunity to deploy the new code with checkpoints, system signals and affinity management for large systems (>32 cores). Some errors are to be expected; I can't test everything.
The last point concerns the requests for Gridcoin. I've been asked by the developers and by some of you. I am not against this possibility, but three issues currently prevent the project from being whitelisted. First, I can't guarantee that there will always be tasks waiting to be calculated. Secondly, the incentive to cheat will increase, and I find that increasing the quorum is a waste of resources. Thirdly, I'm struggling to keep the server up. The upcoming arrival of larger molecules should settle the first point. For the second point, we are thinking about a validation based on analyzing the result. For the third point, I have already made many optimizations and at the moment it's much better.
Benoit Da Mota
30 Jan 2020, 9:03:31 UTC · Discuss
New Linux app and new WU
I have written a new version of the application for Linux (0.12), which is deployed in beta. Checkpoints have been added, but the display of the task progress is not correct. Don't worry, the computation does resume from where it stopped. Moreover, I've added ad hoc management of system signals, to interrupt and resume tasks correctly. WARNING: this code is in beta and has a very high chance of failing. Please only use it if you want to monitor what is going on and help with debugging. If the code does not cause problems, it will quickly become the new reference code (i.e. not in beta).
For Mac and Windows users, I am currently looking for workarounds for problems with VirtualBox.
I'll soon be putting out short tasks for small molecules in the od9 series. Stay tuned!
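For those who want to see the general principle, here is a minimal sketch of checkpointing combined with signal handling. It is not the application's code: the JSON checkpoint file, the function names and the dummy computation step are all assumptions made for illustration.

```python
import json
import os
import signal
import tempfile
import threading

stop_event = threading.Event()   # set when the task is asked to stop


def request_stop(signum=None, frame=None):
    """Signal handler: ask the main loop to stop after the current step."""
    stop_event.set()


def install_signal_handlers():
    """Register the handler (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, request_stop)
    signal.signal(signal.SIGINT, request_stop)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0.0}


def save_checkpoint(path, state):
    """Write to a temporary file, then rename, so the file is never half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path, n_steps, on_step=None):
    """Run (or resume) a computation of n_steps steps, checkpointing each step."""
    state = load_checkpoint(path)
    while state["step"] < n_steps and not stop_event.is_set():
        state["total"] += state["step"]   # stand-in for one real calculation step
        state["step"] += 1
        save_checkpoint(path, state)
        if on_step is not None:
            on_step(state)
    return state
```

A real program would call `install_signal_handlers()` once at startup and then `run(...)`. After an interruption, calling `run` again with the same checkpoint path resumes from the last saved step instead of starting over, even if a progress display computed elsewhere restarts from zero.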
28 Jan 2020, 11:08:23 UTC · Discuss
Updates and poll
Dear Quchempedia crunchers!
The first generation of our newly generated small molecules is almost finished. Thanks again.
We have two proposals for the new phase of calculations:
1. Take a break (maybe a month or so), in order to parse and process the recent calculations, learn from the successes and failures of the calculations and then generate new small molecules, probably with a few more than 9 atoms.
2. Take some of the newly generated compounds and add them to a core (BTX) used in the chemistry lab here in Angers (see the abstract of this article https://pubs.rsc.org/en/content/articlelanding/2019/nj/c9nj05804d/unauth#!divAbstract) to demonstrate how we can use our newly generated molecules inside a real system, to show how a fragment can modify the core properties and to serve as a screening example. These calculations are very interesting and can lead to very nice applications (drugs and materials).
Beware that the second choice means that the molecules will have more than 9 heavy atoms, probably more than 30, and so calculations could take days. The good news is that the next workunits will implement checkpointing. BOINC will not be able to display the real level of progress and will think that the calculation starts again from the beginning. But we've run some tests and the calculations restart from the very last step. The expected calculation times will always be very approximate and unreliable; we will deliberately choose a slightly high value.
If you choose the first option, we will calculate the BTX ones with our private resources and we will post a news item once we have processed the results and generated new small molecules.
Thank you for giving your choices and opinions under this post.
Thomas and Benoit
14 Jan 2020, 14:24:40 UTC · Discuss
Our article titled "Dataset’s chemical diversity limits the generalizability of machine learning predictions" was accepted and published! It is an Open Access article:
If you have any questions, feel free to contact us on the forum of the project (under this message).
Here is a message from Thomas Cauchy about our research:
I am the chemist of this project. The publication mentioned by Benoit Da Mota was written when we launched the BOINC project. But I can extract some sentences from this article to show what we have in mind:
"Abstract: The QM9 dataset has become the golden standard for Machine Learning (ML) predictions of various chemical properties. QM9 is based on the GDB, which is a combinatorial exploration of the chemical space. ML molecular predictions have been recently published with an accuracy on par with Density Functional Theory calculations. Such ML models need to be tested and generalized on real data. PC9, a new QM9 equivalent dataset (only H, C, N, O and F and up to 9 "heavy" atoms) of the PubChemQC project is presented in this article. A statistical study of bonding distances and chemical functions shows that this new dataset encompasses more chemical diversity. Kernel Ridge Regression, Elastic Net and the Neural Network model provided by SchNet have been used on both datasets. The overall accuracy in energy prediction is higher for the QM9 subset. However, a model trained on PC9 shows a stronger ability to predict energies of the other dataset."
The QM9 dataset has around 130k small molecules, while our PC9 has 119k (but was extracted from another type of calculation). The problem is that the full results of QM9 are not openly available. Its authors extracted some results from the costly quantum mechanics calculations and discarded the logs. We are not satisfied with PC9, which was a simple demonstration that more diversity is needed.
For the moment, the BOINC project is aiming at recalculating the interesting molecules of QM9 and PC9, this time with the same level of calculation. All the results will be available in the quchempedia document base https://quchempedia.univ-angers.fr once this platform is a little more robust (early 2020), together with our quality control tool, as my colleague wrote.
We are not fully happy with NWChem yet. In the same project, Benoit Da Mota and I are also using Gaussian (proprietary), which is much more efficient. But NWChem is open source...
We have calculated roughly 130k out of 200k thanks to your help!
For December, we hope to ask the community to calculate new molecules that may not even exist and may not be stable, in order to help machine learning tools generalize better. Those new molecules will be generated by a machine learning procedure. It is too long to explain here right now.
If you have any questions...
Errors and failures
Thank you for your participation and patience.
We are facing new problems, and this was expected with the arrival of so many volunteers. Don't worry about failures. The ones I am concerned about are software failures (errors), but there will always be failures related to the chemistry question being asked (invalid). At the moment, I'm not sure whether the server correctly classifies these two types of failures. We are working to make the code more stable. The project is not yet in a stable version and many versions of the code will coexist for some time. If you use a VM (Windows and Mac) and you notice a lot of errors, you can try two things. First, install the latest available versions of BOINC and VirtualBox. Second, check whether your processor supports virtualization instructions (and that they are enabled).
Thank you for your understanding.
4 Oct 2019, 10:23:53 UTC · Discuss
We are pleased to announce the official opening of the quchempedia@home project.
Thank you for your precious help!
3 Oct 2019, 12:41:42 UTC · Discuss
©2023 Benoit DA MOTA - LERIA, University of Angers, France