BACKGROUND

(Note: I’ve managed to lose some of the pictures from this post - I will update and replace them in due course)

A system was required to deposit structural data in a format which could be readily accessed by users of the single crystal X-ray service at UMIST. The data had to be protected and available to every user via software found on any computer running the most common operating systems, e.g. Linux, Mac-OS, NT, Windows 95, 98, and 2000.

After looking at various sources for crystallographic software, e.g. sincris, it was decided to develop an in-house software package to handle this problem.

Why?

  • Commercial software solutions, for example pre-quest, were beyond the X-ray department's yearly budget,

  • No free/shareware software could readily be found at that time.

It was also decided that an internet interface should be used to distribute the data as all the operating systems have a web browser and therefore a universal system could readily be developed. The internet was the ideal medium for delivering this system.

The program Depo (data _depo_siting software) was developed with the initial idea of making it compatible with a data server hosted through a website interface which could be run using any common web server, e.g. Apache. The cross-compatibility was important so that even Microsoft Personal Web Server (available with Win98) and the IIS server available with Win2000 Professional could be used to distribute the data if necessary.

DESIGN BRIEF

In order to develop the source code a design brief needed to be constructed. This design brief would set the initial goals of the software being written.

The initial model followed the manual movement of files into a logical folder hierarchy via a networked connection initially using mapped network drives.

Each single crystal structure was defined by a unique six-character code: a three-letter prefix taken from the initials of the supervisor of the group submitting the structure, followed by a two-digit number defined by the number of structures submitted, and finally a single-letter suffix to define the year of submission (a = 2000, b = 2001, c = 2002, etc.).
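The code format described above can be sketched in Python as a small parser. This is only an illustration and not Depo's own source: the names `StructureCode` and `parse_structure_code` are hypothetical, and treating the code as case-insensitive is an assumption.

```python
import re
from typing import NamedTuple


class StructureCode(NamedTuple):
    """Hypothetical container for the three parts of a structure code."""
    supervisor: str  # three-letter supervisor initials
    number: int      # two-digit running number of the structure
    year: int        # year decoded from the single-letter suffix


# Three letters, two digits, one letter (e.g. "abc07c").
_CODE_PATTERN = re.compile(r"([a-z]{3})(\d{2})([a-z])")


def parse_structure_code(code: str) -> StructureCode:
    """Split a structure code into supervisor, number and year."""
    match = _CODE_PATTERN.fullmatch(code.strip().lower())  # assumption: case-insensitive
    if match is None:
        raise ValueError(f"not a valid structure code: {code!r}")
    prefix, number, suffix = match.groups()
    # a = 2000, b = 2001, c = 2002, ... as described in the text.
    year = 2000 + ord(suffix) - ord("a")
    return StructureCode(prefix, int(number), year)
```

For example, `parse_structure_code("abc07c")` would give supervisor `abc`, structure number 7 and year 2002.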

The program also had to go further than the manual movement of files, which was already being carried out in the department (although it was slow and hard to teach to new users), by constructing an individual HTML file (for use as a webpage) containing important information on the crystal structure.

Finally, the program would also have to generate a database file by extracting information from the CIF; this information would also be added to the individual webpage. The design brief was simplified into three objectives for the software. Follow the links to see how each objective was achieved in the application of the design brief:

OBJECTIVES

  • Data Management

  • Presentation of Data

  • Database

Data Management

INITIAL TRANSFER PROTOCOL

The initial program had the file paths stored within the source code, and the program was called from a local PC using a batch file which linked to a remote server. Users had to identify themselves via an individual username to provide the correct path connections for their machine. This was because different machines had different mapped drives and devices, and therefore different drive letters. The file structure maintained the same system used when manual movement of data was employed.

This approach was basic: the source code needed recompiling every time a new user or machine was added to the Depo system and, although the year code suffix was used, data from all years were stored in the same root directory under the supervisor initials.

FILE MANAGEMENT

This was amended in the current version of Depo with a massive rewrite of the source code and a new approach to the file management and transfer protocols. Now all data would be split into those files just used for publication (www) and those files used for structure solution and processing (data).

This approach meant that archiving of data could be done much more efficiently by allowing the data part of the server, which is memory intensive, to be copied then removed whilst maintaining the www information, which would always be online via the webserver and is considerably smaller than the data part. Archiving the www part of the server would also be faster as only the essential files would be present.

This split in data handling meant that a further step could be added to the file transfer: the compression of the raw data and processing files into a single zip file, further reducing memory usage.
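The compression step might look something like the following Python sketch, which uses the standard `zipfile` module. It is not how Depo itself did it (Depo relied on separate compression files hosted on the server); the function name `zip_structure_files` and the folder layout are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def zip_structure_files(source_dir: Path, archive_path: Path) -> list[str]:
    """Pack every file under source_dir into one compressed zip archive.

    Returns the archive member names in the order they were written.
    """
    names = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source_dir.rglob("*")):
            if not path.is_file():
                continue
            # Never try to add the archive to itself if it lives in source_dir.
            if path.resolve() == archive_path.resolve():
                continue
            arcname = path.relative_to(source_dir).as_posix()
            zf.write(path, arcname)
            names.append(arcname)
    return names
```

Storing names relative to the structure folder keeps the archive independent of drive letters, so it can be unpacked anywhere when data are restored.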

The data management was further improved by taking advantage of the redundant year suffix and adding a new layer into the file hierarchy. This reduced clutter of data and increased ease of access.

Using this extra layer in both the data and www parts of the server meant that older years could be removed preferentially when archiving, and it aided backing up and storage management.

The existing source code will allow up to 26 years' worth of data to be stored using single-letter suffixes, by which point I would hope someone else has come up with something better!
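As a rough sketch of the idea, the target folders for a structure could be derived from its code as below. The exact layout is an assumption: here the year layer sits directly under the data and www roots, above the supervisor initials, and the year folder is named by the full year. Depo's real hierarchy may differ, and `structure_dirs` is a hypothetical name.

```python
from pathlib import PureWindowsPath


def structure_dirs(data_root: str, www_root: str, code: str):
    """Return the (data, www) folders for one structure code.

    Assumed layout: <root>\\<year>\\<supervisor initials>\\<structure code>
    """
    code = code.lower()
    supervisor, year_suffix = code[:3], code[-1]
    year = 2000 + ord(year_suffix) - ord("a")  # a = 2000, b = 2001, ...
    relative = PureWindowsPath(str(year), supervisor, code)
    return PureWindowsPath(data_root) / relative, PureWindowsPath(www_root) / relative
```

With the year as a single folder in each tree, removing or archiving an old year becomes one operation per tree rather than a search through every supervisor's folder.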

PATH MANAGEMENT

With the increased complexity of the file structure, and the need to alter the source code to implement it, further alterations were made to the file management part of the program by modifying the path handling protocols. As mentioned above, the use of paths written into the source code was most undesirable.

Depo would still be hosted on a remote server, along with all the files needed to compress the X-ray data and database files, but when run on a new system it would look for, and create if not present, a configuration file and a temporary folder to increase file manipulation speeds. When created, the configuration file contains a set of default paths and instructions, and the user alters them for their machine. This removed user IDs and the need to recompile Depo for new users and machines.

<code>*** Please delete this line. If these values do no work please edit depo.conf and change the default paths ***
 edit this to just the path Line 1: Default path to the server data :g:\data
 edit this to just the path Line 2: Default path to the server www :g:\www
 edit this to just the path Line 3: Default path to the database :g:\info\database.txt
 edit this to just the path Line 4: Default path to ortep.exe :c:\wingx\ortep32\ortep32.exe</code>

Furthermore, this new system also allowed UNC format paths, which removed the need to map network drives:

\\xray\......\data
\\xray\.......\www
\\xray\......\database.txt
c:\wingx\ortep32\ortep32.exe
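The look-for-or-create behaviour of the configuration file can be sketched in Python as follows. This assumes that, once edited, depo.conf holds exactly four lines containing just the paths, in the order data, www, database, ortep. The names `DepoPaths` and `load_config` are hypothetical, and the instruction text written by the real program is omitted.

```python
from pathlib import Path
from typing import NamedTuple


class DepoPaths(NamedTuple):
    """Hypothetical container for the four configured paths."""
    data: str
    www: str
    database: str
    ortep: str


# Default values taken from the generated depo.conf shown above.
DEFAULTS = DepoPaths(
    r"g:\data",
    r"g:\www",
    r"g:\info\database.txt",
    r"c:\wingx\ortep32\ortep32.exe",
)


def load_config(conf_file: Path) -> DepoPaths:
    """Read the four paths, creating the file with defaults if it is absent."""
    if not conf_file.exists():
        conf_file.write_text("\n".join(DEFAULTS) + "\n")
        return DEFAULTS
    lines = [ln.strip() for ln in conf_file.read_text().splitlines() if ln.strip()]
    if len(lines) != 4:
        raise ValueError(f"{conf_file} should contain exactly four paths")
    # Paths are kept as plain strings, so drive letters and UNC paths both work.
    return DepoPaths(*lines)
```

Because the paths are read at run time, adding a new machine only means editing four lines of text rather than recompiling the program.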

Presentation of Data

HTML PAGE

The program automatically extracts important crystallographic information from a standard archive CIF generated from WINGX and writes it into a database and custom webpage for each crystal structure. If an archive CIF is not present the program will request another suitable file and prompt the user to fill in any fields not successfully retrieved.
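A minimal Python sketch of this kind of extraction is given below. It only handles simple single-line `_tag value` items, not loops or multi-line text fields. The three tags chosen are standard CIF data names, but which fields Depo actually extracted is not stated here, so the selection is an assumption, as is the function name `extract_cif_fields`.

```python
# Illustrative selection of CIF data names; Depo's real field list is not given.
WANTED_TAGS = (
    "_chemical_formula_sum",
    "_symmetry_space_group_name_H-M",
    "_cell_length_a",
)


def extract_cif_fields(cif_text: str, wanted=WANTED_TAGS):
    """Return (found, missing) for simple single-line CIF items.

    found maps each tag to its value with surrounding quotes removed;
    missing lists the tags the user would have to fill in by hand.
    """
    found = {}
    for line in cif_text.splitlines():
        parts = line.strip().split(None, 1)  # tag, then the rest of the line
        if len(parts) == 2 and parts[0] in wanted:
            found[parts[0]] = parts[1].strip().strip("'\"")
    missing = [tag for tag in wanted if tag not in found]
    return found, missing
```

The `missing` list corresponds to the fields that were not successfully retrieved, which is where a prompt to the user would come in.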

FILE NAMING AND CHIME

The program also creates links within the webpage to the CIF file, an RTF format ciftable file extracted from the CIF tables using CIFTAB, and two XYZ files, using the unique structure code to name the files. The program attempts to check the existence of the files being linked to within the HTML document and, as a safeguard, if no XYZ files are found the program will invoke ORTEP and load the current CIF file (or any file) so that the user can generate XYZ files. XYZ files can be associated with the MDL plug-in CHIME, and in a browser enhanced with CHIME a user can readily extract bond lengths, angles and other important statistics directly from the website.
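The existence check before linking could be sketched in Python like this. The file naming is an assumption (structure code plus extension, with a single `.xyz` standing in for the two XYZ files, whose exact names are not given), and `build_links` is a hypothetical helper rather than anything taken from Depo.

```python
from html import escape
from pathlib import Path


def build_links(www_dir: Path, code: str, suffixes=(".cif", ".rtf", ".xyz")):
    """Return (links, missing) for the files belonging to one structure.

    links holds HTML anchors for files that exist in www_dir;
    missing holds the names of files that were not found.
    """
    links, missing = [], []
    for suffix in suffixes:
        name = code + suffix
        if (www_dir / name).is_file():
            links.append(f'<a href="{escape(name, quote=True)}">{escape(name)}</a>')
        else:
            missing.append(name)
    return links, missing
```

A caller could inspect `missing` and, if the XYZ file is listed there, launch ORTEP so the user can generate it before the page is written.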

The program can also accept a comment from the crystallographer and this is written into the document along with a link to a subdirectory created if the picture file option is activated by the user.

INVOICE SUBROUTINE

A further feature is the creation of an invoice file for each structure. This is loaded into WordPad by Depo, from which the user can print and e-mail the invoice to the person who submitted the crystals for analysis to notify them of completion. The invoice can also be reopened and reprinted at a later date.

The emphasis on presenting the data well extended from the Depo output to the webserver, with the aim of providing a more user-friendly and informative service.

Database

SIMPLE DELIMITED

Originally, Depo output the important crystallographic information extracted from the CIF file into a simple semicolon-delimited text file, and a custom search program was to be written. However, as the text file can be read into database packages such as MS Access and Lotus Approach, or into spreadsheet packages such as MS Excel or Lotus 1-2-3, and searched from within these programs, the search program was not seen as a priority.
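To illustrate how little is needed to write and search such a file, here is a short Python sketch using the standard `csv` module. The three column names are hypothetical, since the real column layout of the Depo database file is not given, and the search shown is only a simple exact match.

```python
import csv
import io

# Hypothetical column layout; the real Depo file's fields are not listed here.
FIELDS = ["code", "formula", "space_group"]


def append_record(stream, record: dict) -> None:
    """Write one structure as a semicolon-delimited line."""
    csv.writer(stream, delimiter=";").writerow([record[f] for f in FIELDS])


def search(stream, field: str, value: str) -> list:
    """Return every row whose given field matches value exactly."""
    reader = csv.DictReader(stream, fieldnames=FIELDS, delimiter=";")
    return [row for row in reader if row[field] == value]


if __name__ == "__main__":
    buffer = io.StringIO(newline="")
    append_record(buffer, {"code": "abc07c", "formula": "C10 H12 N2 O", "space_group": "P 21/c"})
    buffer.seek(0)
    print(search(buffer, "space_group", "P 21/c"))
```

The same file opens directly in a spreadsheet or database package once the delimiter is set to a semicolon, which is why a dedicated search program was not a priority.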

PHP AND MYSQL

With recent updates to the X-ray webserver, a new method of access to the Depo database was developed using MySQL and custom PHP database scripts.

With little modification, the data in the Depo database could be loaded into a custom MySQL database. This allowed manipulation of the database either via a PHP webpage or directly by an authorized MySQL user.

A reduced-capability trial version of the search engine and database interface is also available via the database link on the webpage.
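The loading step can be sketched as below. Note that this uses Python's built-in `sqlite3` module as a stand-in for MySQL and PHP, purely so the example is self-contained. The table name, the three columns and both function names are assumptions, not the real Depo schema.

```python
import csv
import io
import sqlite3


def load_database(conn: sqlite3.Connection, delimited_text: str) -> int:
    """Load semicolon-delimited rows into a table and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS structures "
        "(code TEXT PRIMARY KEY, formula TEXT, space_group TEXT)"
    )
    reader = csv.reader(io.StringIO(delimited_text), delimiter=";")
    rows = [tuple(row) for row in reader if len(row) == 3]  # skip malformed lines
    conn.executemany("INSERT OR REPLACE INTO structures VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def structures_for_supervisor(conn: sqlite3.Connection, prefix: str) -> list:
    """List structure codes that start with a supervisor's initials."""
    cursor = conn.execute(
        "SELECT code FROM structures WHERE code LIKE ? ORDER BY code",
        (prefix + "%",),
    )
    return [row[0] for row in cursor]
```

Since the structure code begins with the supervisor initials, a prefix match on the code is enough to list one group's structures.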

Database Maintenance

The database system for Depo was described previously; only the management of the database via the group's private website is discussed here.

Both the administrative and group functions allow manipulation of the database and are only accessible through the private group area. On selecting administration, a list of all structures currently in the database is generated. Below this list is a manual data entry form allowing data to be added manually without submitting a structure via Depo.

From the automatically generated list each structure can be viewed/updated or deleted. On clicking on a structure code the data is called up for viewing or editing.

These functions are mirrored in the group feature, but without the ability to delete an entry from the database. Data are accessed by supervisor code from a drop-down menu.

Note: The current version of the code operates with debugging active, which will create a debug.txt file when run. The program was compiled for Win32-based OSs (I know, it was before I really got into Linux!). Use at your own risk! Download link coming soon.