Back in 2008 I posted a thread on my forum. It related in many ways to the OLXportal I was thinking about and creating at the time, but it also reflected, in a small footnote, an idea I had wanted to build since university and had chatted about on and off with the CDS people at Daresbury over lunch and coffee.

I once dreamed of creating my own - perhaps now would be the best time to do that? I wanted to make a system available via open source, and the system I planned on developing would also work in a hive manner to **allow users of it to post bond lengths, angles, cell parameters to a central system anonymously (without reference to the actual compound/structure) unless authorised to do so.** Don't get me wrong: all their stuff would live on their local network. They could, if they wanted, then get their local server to talk to the brain, and the brain would then share its information with them. That way the data could be used to create databases of useful structural information for searching. Often it is the trends that need predicting through having good basis sets, irrespective of the actual structures themselves, unless the structure is an outlier of course. In that case the information would refer to a reference code which would allow them to contact the hive node, and the node would contact the submitter and ask whether they wanted to contact the requester, ignore the request, or decline it.

The concept was simple: it was a structural database, but with a difference. Rather than keeping all the data in a central location, as the CSD does, and pushing out periodic updates on disc, each “user” (user is probably the wrong word here; client, perhaps, or node?) would have their own hosted system in their institution.

The system would provide for them a totally self-contained database system utilising many of the aspects I identified in the original post:

  1. A lab management system - e-lab book type thing
  2. A crystal structure database system - ideally linked to the above
  3. Both to be portable, or at least to create portable files, to allow working off the grid and syncing snapshots later
  4. A way to upload all my existing work
  5. The ability to generate summaries/reports
  6. Connectivity with Olex2
  7. Also, not forgetting a multi-OS approach: that is, I want it for Linux

By doing this they benefit from having the system. Their network hosts the local traffic and they pay a small fee for the privilege of using the system. Why only a small fee?

Importantly, “finished” structures would be split up into key descriptors such as bond lengths and angles, torsions, H-bond tables, non-bonding interactions and solvents. That information would be sent to a central system and then distributed out over the network. This means that the information could be made available to commercial interests, who would pay a larger fee for the same system.
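To make the splitting step concrete, here is a minimal sketch of how a node might turn one finished structure into descriptor records. It is only an illustration: the `Descriptor` record, the `extract_descriptors` function and the input layout (Cartesian coordinates plus a bond list) are all my assumptions, and it covers bonds and angles only, not torsions, H-bonds or solvents.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical record: one geometric parameter, stripped of atom labels."""
    kind: str        # "bond" or "angle"
    elements: tuple  # element symbols only, never atom labels or a structure code
    value: float     # angstroms for bonds, degrees for angles


def _angle_deg(centre, a, b):
    """Angle a-centre-b in degrees, from Cartesian coordinates."""
    va = [p - q for p, q in zip(a, centre)]
    vb = [p - q for p, q in zip(b, centre)]
    cos = sum(x * y for x, y in zip(va, vb)) / (math.hypot(*va) * math.hypot(*vb))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def extract_descriptors(atoms, bonds):
    """atoms: {label: (element, (x, y, z))} in Cartesian angstroms (assumed layout).
    bonds: list of (label, label) pairs."""
    out = []
    neighbours = {label: [] for label in atoms}
    for i, j in bonds:
        (el_i, xyz_i), (el_j, xyz_j) = atoms[i], atoms[j]
        out.append(Descriptor("bond", (el_i, el_j), round(math.dist(xyz_i, xyz_j), 4)))
        neighbours[i].append(j)
        neighbours[j].append(i)
    # One angle for every pair of atoms bonded to the same centre.
    for centre, bonded in neighbours.items():
        for a, b in combinations(bonded, 2):
            value = _angle_deg(atoms[centre][1], atoms[a][1], atoms[b][1])
            elements = (atoms[a][0], atoms[centre][0], atoms[b][0])
            out.append(Descriptor("angle", elements, round(value, 2)))
    return out


# Toy three-atom fragment with made-up coordinates, purely for illustration.
fragment = {
    "O1": ("O", (0.0, 0.0, 0.0)),
    "H1": ("H", (0.96, 0.0, 0.0)),
    "H2": ("H", (0.0, 0.96, 0.0)),
}
descriptors = extract_descriptors(fragment, [("O1", "H1"), ("O1", "H2")])
```

The records carry element symbols and values but nothing that names the structure, which is the property the rest of the scheme relies on.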

So what we would have is a Central Server as shown below:

[Figure: Concept of Node - Hive Structure]

This server talks to all the “node servers” - the nodes sit at each institution that operates the system. They are populated with the data from each of the different institutions. It could also host under license other databases such as the CSD and COD. In fact each node can also talk to every other node.

This produces a comprehensive list of crystallographic and structural parameters which can be used to generate complex basis-sets for computational “in silico” modelling projects. Something I would expect is worth a little money to various parties. It would also obviously allow each host a fully searchable repository for all their in-house data. Nothing lost when a PhD student leaves, etc.

The amount of information released into the network would be set per structure, but the system would allow you to set up blanket defaults. Those defaults would fail safe to private unless deliberately changed to something less restrictive. So there would be a choice to allow publication into the network unmasked, or a structure could be masked - say if an author is getting ready to publish or is worried about being scooped.
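A rough sketch of how those settings could look, assuming three levels (private, masked, unmasked). The level names and the `NodePolicy` class are hypothetical and reflect my reading of the idea rather than a finished design.

```python
from dataclasses import dataclass, field
from enum import Enum


class Visibility(Enum):
    # Assumed levels, not a fixed specification.
    PRIVATE = "private"    # nothing leaves the node
    MASKED = "masked"      # parameters are shared, the structure itself is withheld
    UNMASKED = "unmasked"  # the full structure may be retrieved over the network


@dataclass
class NodePolicy:
    """Hypothetical per-node settings: a blanket default plus per-structure overrides."""
    blanket: Visibility = Visibility.PRIVATE  # fails safe to private
    per_structure: dict = field(default_factory=dict)

    def visibility(self, structure_id: str) -> Visibility:
        # A per-structure setting wins; otherwise fall back to the blanket default.
        return self.per_structure.get(structure_id, self.blanket)


policy = NodePolicy()
policy.per_structure["struct-001"] = Visibility.MASKED  # about to be published
```

Anything nobody has made a decision about, such as a hypothetical `struct-999`, resolves to `Visibility.PRIVATE`, so forgetting to configure a structure can never leak it.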

If you carried out a search and found 1000 hits for a particular coordination geometry and wanted to get the full structural information on, say, all those with a particular metal type, you could. But in that list would be data that is not “accessible”, that is, data not yet ready for public consumption. It may still be a key bit of data. You can ask the system to contact the originator of the data, all anonymously, and they can respond or not. Neither party knows the details of the request per se, other than that a request has been made - with an option to include a comment, obviously.
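The relay could be sketched roughly like this, with the hive acting as the only party that knows who asked. Everything here (`Hive`, `Node`, `Request`, the ticket scheme) is a hypothetical illustration and runs in memory rather than over a real network.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Request:
    """What the originating node sees: no requester identity, just a ticket."""
    ticket: str
    reference_code: str
    comment: str
    status: str = "pending"  # stays "pending" if the originator ignores it


class Node:
    def __init__(self):
        self.inbox = []

    def receive(self, request):
        self.inbox.append(request)


class Hive:
    def __init__(self):
        self._node_for_code = {}         # reference code -> owning node
        self._requests = {}              # ticket -> Request
        self._requester_for_ticket = {}  # known to the hive only

    def register(self, reference_code, node):
        self._node_for_code[reference_code] = node

    def request_access(self, requester_id, reference_code, comment=""):
        request = Request(secrets.token_hex(8), reference_code, comment)
        self._requests[request.ticket] = request
        self._requester_for_ticket[request.ticket] = requester_id
        self._node_for_code[reference_code].receive(request)
        return request

    def respond(self, ticket, accept):
        self._requests[ticket].status = "accepted" if accept else "declined"


hive, node = Hive(), Node()
hive.register("REF-1", node)
request = hive.request_access("requester-42", "REF-1", "Interested in the coordination geometry")
status_before = request.status
hive.respond(request.ticket, accept=True)
```

The originator only ever handles a ticket and an optional comment; what happens after an acceptance (an introduction, a data release) is left out of the sketch.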

The key is that each parameter is uniquely identified. So if you were smart and tried to find a new material by looking for all the instances of a particular code, that would fail. You couldn’t use a structure code to find all its parameters and then rebuild the structure unless it had been made “public”.
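One way to get that property, as a sketch only: give every parameter its own random identifier and keep the map back to the structure on the node. The `NodeIndex` class and the record layout are assumptions made for illustration.

```python
import uuid


class NodeIndex:
    """Hypothetical node-side map from opaque parameter ids back to structures."""

    def __init__(self):
        self._structure_for_ref = {}  # never leaves the node

    def publish(self, structure_code, parameters):
        """parameters: list of (kind, value) pairs. Returns network-safe records."""
        records = []
        for kind, value in parameters:
            ref = uuid.uuid4().hex  # a fresh random id per parameter, not per structure
            self._structure_for_ref[ref] = structure_code
            records.append({"ref": ref, "kind": kind, "value": value})
        return records

    def resolve(self, ref):
        """Only the owning node can turn a reference back into a structure."""
        return self._structure_for_ref.get(ref)


index = NodeIndex()
records = index.publish("struct-001", [("bond", 1.54), ("bond", 1.39), ("angle", 109.5)])
```

Because no two records share an identifier, nothing on the network side groups them by structure. This only hides the explicit link; the values themselves could still be correlated, which is part of why the unit cell and powder pattern cases need separate handling.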

Unit cell searches and simulated powder patterns are of course slightly more problematic. Again, whether to release these for “non-published” data would be up to the node institution. Either way they would have access to them in-house.

In my head everybody wins with this system. Institutions get a robust storage facility with many great features. Their data is protected within the hive and can be drilled down into: if a supervisor wants all of their group's or students' structures, they can get them. Or if the departmental crystallographer needs statistics on their department for budgetary reasons - whatever. The information would be there and stored in a protected manner.

The commercial sector and, importantly, the community also win, as they get access to structures that otherwise end up locked away in filing cabinets or falling between the cracks because the PhD student made something unexpected and uninteresting to them. It will never be published, but it turns out to be of interest to someone else doing something else somewhere else.

I would have really liked to develop this system - I don’t think I will get the chance. It is interesting to note that the new “CDS” from the RSC may be taking a leaf out of my book with their service, who knows.

An aside:

A good friend of mine had also talked with me, before leaving Liverpool, about her idea of an analysis system, and I can see correlations between the two approaches. Where I had focused only on single crystal and powder information in the “nexus of information”, she had thought about other analysis techniques such as IR/Raman, etc. This concept could also be brought into the system so that IR/Raman and other pattern matching techniques could be hosted and disseminated just like the structural data, adding a whole new dimension to the concept.

An Update

I’ve been thinking about what to do with all the structures I have done over the years that have never been published. I was just going to submit them on here, COD, and CSD. BUT then, after reading the CSD guidelines, it struck me that I had completely forgotten that publishing structural data in a database could jeopardize the publication of the same structure in a journal. An interesting dilemma, but one that would be solved by the system I describe above. I guess the issue is only relevant if the paper is just about reporting the structure; if not, then it should not be too much of an issue.