Andrew Dalke: mmpdb crowdfunding consortium
(published: Oct. 3, 2019, noon)
How can we raise money to fund open source software development in cheminformatics? It's a hard question. Asking for donations doesn't work – companies might not even have a mechanism to make donations. Consultant-based funding doesn't work that well either, because developing a general-purpose tool costs several times more than developing a tool which only meets the specialized needs of one client, and few clients are willing to subsidize the rest of the field. Proprietary software development solves the problem by getting many people to pay for the same product. Can we learn from the success of proprietary software to get the funds which would certainly be useful in improving open source software?
I have started the mmpdb crowdfunding consortium to see if crowdfunding can be used to fund further development of the matched molecular pair program mmpdb.
The deadline to join is 1 February 2020 – join now!
mmpdb is an open source success story. It started as the program developed by Jameed Hussain and Ceara Rea. Their employer, GSK, contributed it to the RDKit project. There was no more GSK funding, but others could study and improve the code.
Roche then funded me, Christian Kramer, and Jérôme Hert to add several improvements:
- better support for symmetry, which results in fully canonical pair descriptions
- support for chirality, including matching chiral with prochiral structures
- optional inclusion of the chemical environment when finding pairs
- generate property change statistics for each pair, environment, and property type
- parallelized fragmentation
- fragmentation can re-use fragmentations from a previous run
- performance speedups during indexing
- pair, environment, and property statistics are stored in a SQLite database
- analysis tools to propose possible transforms to an input structure, or to predict property shifts between two structures
Mmpdb is popular. Several people at the 2019 RDKit User Group meeting in Hamburg presented work which used it or at least referenced it.
But who supports it? Who adds features? There is no more funding from GSK or Roche, so all we have is precious and scarce volunteer time. Others might fund their own developers to improve mmpdb, but the code is pretty complicated and it will take a while for new developers to get up to speed.
There is a long and ongoing discussion about how to fund open source projects. I won't even attempt to summarize it here, though I will point to Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure as one starting point.
My question is, are mmpdb users willing to fund its further development? If not, the project is not sustainable. I believe they are willing; the problem is that it's hard to justify paying money for software anyone can download for free.
I previously tried to develop chemfp as a purely open source commercial product. When customers bought the product, they got the software under the MIT license. This ended up being difficult, for reasons I'll likely blog about later. I now also offer chemfp with proprietary licensing, at a cheaper price.
With mmpdb, I am trying crowdfunding, along the lines of Kickstarter. The basic goals are:
- Postgres support
- new command-line option ("proprulecat") to export property tables as CSV
Beyond that are stretch goals. The one many people want is to store the chemical environment in the database as a fragment SMILES, rather than a hex-encoded SHA256 hash of the rooted Morgan fingerprints.
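To see why people ask for this, here is a minimal sketch of what a hex-encoded SHA256 key looks like. It is not mmpdb's code: the `environment_key` helper and the text-serialized fingerprints are hypothetical stand-ins, since the way mmpdb actually serializes its rooted Morgan fingerprints is not described here. The point is only that a hash is a fixed-length, deterministic identifier which tells a reader nothing about the chemistry, while a fragment SMILES can be read directly.

```python
import hashlib


def environment_key(fingerprint_text: str) -> str:
    """Return the hex-encoded SHA256 digest of a text-serialized fingerprint.

    Hypothetical helper for illustration; not mmpdb's actual function.
    """
    return hashlib.sha256(fingerprint_text.encode("utf-8")).hexdigest()


# Made-up serialized fingerprints for two nearly identical environments.
fp_a = "radius=1;bits=12,87,341"
fp_b = "radius=1;bits=12,87,342"

key_a = environment_key(fp_a)
key_b = environment_key(fp_b)

# Each key is 64 hex characters. The two keys look unrelated even though
# the inputs differ by a single bit number.
print(key_a)
print(key_b)
```

The same input always gives the same key, so the hash works well for database lookups, but there is no way to go from the key back to a structure someone can look at.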
As more people sign up, I'll develop mmpdb further. Many of the stretch goals are related to documentation and testing. Mmpdb was developed as a research project, and needs those sorts of infrastructure improvements to allow future growth.
If enough people join, there will definitely be future crowdfunding efforts, perhaps for a web interface, or support for categorical statistics, or other features people have asked me about.
I don't think people will pay for features that are available for free, so these changes will not be made available to the public until specific funding goals are reached.
How do you explain crowdfunding to accounting?
Don't. (Unless you really want to.) Tell them you are going to purchase a new version of mmpdb with Postgres and "proprulecat" support. You will receive these within two weeks of sending me – that is, my Sweden-based software company – a purchase order.
In addition, the purchase includes membership in the mmpdb consortium. As more people join, and additional funding goals are met, I will continue to improve mmpdb, and you will get those improvements as part of your membership.