WO1991016682A1 - A method of structuring or storing data within a file - Google Patents

A method of structuring or storing data within a file

Info

Publication number
WO1991016682A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
file
structuring
name
preceded
Prior art date
Application number
PCT/GB1991/000666
Other languages
French (fr)
Inventor
Sydney Reading Hall
Original Assignee
International Union Of Crystallography
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Union Of Crystallography
Publication of WO1991016682A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Abstract

A method of structuring or storing data within a file has the following steps: (i) arranging the file into a plurality of data blocks each preceded by a respective data block code; and (ii) arranging the data within each block into a plurality of data items each preceded by a respective data name; wherein the data block codes are taken from a first predetermined set, may occur in any order, and have a first common feature, and wherein the data names are taken from a second predetermined set, may occur in any order, and have a second common feature, the first and second common features being readily distinguishable. The file is visually readable as text in addition to being machine readable.

Description

A method of structuring or storing data within a file.
The present invention relates to handling data and more particularly to a method of structuring or storing text data within a file and to a file containing such data.
Many existing procedures for computer archiving use a 'fixed format' file in which the data structure is determined by specific data requirements. A fixed format file is simple and fast to access but the data structure cannot be modified without reformatting existing files.
Other archival files are based on 'pre-defined free formats'. This approach does not restrict data to specific positions in the file. Data 'keys' are often used to aid in data recognition and this permits fewer restrictions on the ordering of data lines and items. This is an important advantage over the fixed format files. Access to free format files currently in use still requires some advance knowledge of the expected data types and the data structure. The addition of any new data types or structures also requires that processing software be modified. This means that existing data processing software must be altered to provide common access to files which pre- and post-date the file changes. The term 'free format' is therefore misleading because it really refers to an improved flexibility within a relatively restricted data structure.
The inflexibility of the two traditional archival approaches described above restricts the exchange of data, even within the same discipline, especially if the number and nature of data types changes rapidly and continually. This is the case in many data processing fields and as a result a vast repertoire of specialized and 'local' file formats has evolved over the years. A diversity of file formats is tolerable when electronic data transfer is infrequent and processing speeds require that file formats be finely tuned to specific applications. Rapid increases in computing power and in computer networks have signalled an end to this rationale. In the era of widespread data exchange, global data bases, electronic mail and electronic publication submission, the critical need is for a general, flexible and extensible file format. The present invention seeks to provide an improved file format and an associated method of handling data which overcome one or more of the above problems.
According to a first aspect of the present invention there is provided a method of structuring or storing data within a file comprising the following steps:
(i) arranging the file into a plurality of data blocks each preceded by a respective data block code; and
(ii) arranging the data within each block into a plurality of data items each preceded by a respective data name; wherein the data block codes are taken from a first predetermined set, may occur in any order, and have a first common feature, and wherein the data names are taken from a second predetermined set, may occur in any order, and have a second common feature, the first and second common features being readily distinguishable.
The nature of the file is preferably such that it is visually readable as text in addition to being machine readable. Each text line contains up to a pre-set maximum number of visible ascii characters. The limit will normally be set at eighty. Each data item may be directly preceded by the respective data name. Alternatively, a plurality of data names in a group may be followed by a like plurality of data items repeated a desired number of times.
The first common feature may be the text string 'data_', and the members of the first set are of the form 'data_blockcode' where 'blockcode' is a unique block code in each case. The second common feature may be just an underline '_', and the members of the second set are of the form '_name' where 'name' is a respective data name.
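Because the two prefixes are readily distinguishable, a reader can sort every text string into a block code, a data name or a data item by looking only at how it starts. The following is a minimal Python sketch of that idea, not part of the original disclosure. The function name group_blocks is hypothetical, and the sketch assumes the file has already been split into text strings and contains no looped data.

```python
def group_blocks(tokens):
    """Group text strings into {block code: {data name: data item}}.

    Assumptions (illustrative only): tokens are already separated,
    every data name is followed by exactly one data item, no loops.
    """
    blocks = {}
    items = None      # dictionary of the block currently being filled
    pending = None    # data name still waiting for its data item
    for tok in tokens:
        is_block = tok.startswith("data_")
        is_name = not is_block and tok.startswith("_")
        if (is_block or is_name) and pending is not None:
            raise ValueError(f"data name without a data item: {pending}")
        if is_block:
            code = tok[len("data_"):]
            if code in blocks:
                raise ValueError(f"duplicate block code: {code}")
            items = blocks[code] = {}
        elif is_name:
            if items is None:
                raise ValueError("data name found before any block code")
            pending = tok
        else:
            if pending is None:
                raise ValueError(f"data item without a data name: {tok}")
            items[pending] = tok
            pending = None
    if pending is not None:
        raise ValueError(f"data name without a data item: {pending}")
    return blocks


example = group_blocks(
    ["data_a", "_x", "1", "_y", "2", "data_b", "_x", "3"]
)
```

The same data name may appear in different blocks, since each block keeps its own dictionary.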
The data handled may relate to any desired subject, but the method is especially suitable for crystallographic data. Another suitable use is for chemical data (especially molecular data) in the chemical and pharmaceutical fields.
The method is especially suitable for the archiving of data and for inputting data to data-bases because of its facility for upwards compatibility and flexibility.
The method is also particularly advantageous for the electronic transport of text and data, via computer networks or magnetic media. It is particularly well-suited for submitting publications to technical journals.
An advantage of the method is that each data item being stored in the file as a text, character or numerical quantity is uniquely identified by a text name. Thus the text name serves as an identifier which can be interpreted visually as well as by machine. Furthermore, the data items may appear in any order.
It will be seen that the method relates to a process for handling text data; although it is primarily intended for computer application it does not itself relate to a computer program. Nor does it relate to a method of presenting information because its format of presentation is arbitrary.
According to a second aspect of the present invention there is provided means for structuring or storing data within a file comprising:
(i) means for arranging the file into a plurality of data blocks each preceded by a respective data block code; and
(ii) means for arranging the data within each block into a plurality of data items each preceded by a respective data name; wherein the data block codes are taken from a first predetermined set, may occur in any order, and have a first common feature, and wherein the data names are taken from a second predetermined set, may occur in any order, and have a second common feature, the first and second common features being readily distinguishable.
According to a third aspect of the invention there is provided a data file comprising a plurality of data blocks, each preceded by a respective data block code, and, within each block, a plurality of data items each preceded by a respective data name, wherein the data block codes are taken from a first predetermined set, may occur in any order, and have a first common feature, and wherein the data names are taken from a second predetermined set, may occur in any order, and have a second common feature, the first and second common features being readily distinguishable.
The many types of information to which the file is well suited include crystallographic data.
According to a fourth aspect of the present invention there is provided a method of retrieving data from a file of the above type, comprising listing the requested data items and outputting the requested data items in the order requested, the output file having the same format as the accessed file.
A preferred embodiment of the present invention will now be described, by way of example. First of all it will be of assistance to review three existing "pre-defined free format" files.
The BCCAB archive file is used by the Cambridge Data Centre (U.K.) to prepare the packed crystallographic organic structural data base file ASER. Appendix 1 is an extract from one entry of the BCCAB file. The format is "free" in the sense that many lines have an identifying code (e.g. #Author) which provides flexibility in the order of lines, and for optional line input. Certain data items are "free" in that they are separated either by a single blank or comma. However, all line identifying codes, and many data sequences, are predefined and have a fixed function within the BCCAB definitions. Software processing this format expects predefined protocols to be observed. Violations of this protocol, or the presence of foreign data, will of necessity be treated as a processing error and terminate data access.
The second example of a "pre-defined free format" file is that used by the XTAL3.0 Crystallographic Program System (Hall & Stewart, 1990), as shown in Appendix 2. It is classed as a "free format" file because every line, and many individual data items, are tagged with an identification code. This provides for variations in the order of line input but only within strict guidelines. In this file the program initiation lines (those with the line codes in upper case letters) may be in order but the optional control lines (codes in lower case letters) are specific to a particular program. Data items, and data codes, are also specific to a line. Violation of an input rule will terminate data processing of this file. These types of restrictions are typical of those placed on many "pre-defined free format" files.
The last example of a "pre-defined free format" file is the Standard Crystallographic File Structure (SCFS) (Brown, 1988) as shown in Appendix 3. This is an archival file structure which is more restrictive than the previous two examples. There is some flexibility in the order of data sequences (note the end-of-sequence code *EOS) but the data items and the character positions within a sequence are fixed. The addition of extra data types to an SCFS file is almost impossible without invalidating the format of previously archived data.
Turning now to the present invention, a Self-defining Text Archive and Retrieval (STAR) file is proposed, especially for the computer archiving and electronic transmission of text and numerical data. This file contains standard ascii text which defines both the data structure (i.e. the arrangement of the data) and the data items. Each data item is explicitly identified by a name and these may be stored in any order. Simple syntactical rules applied to the data names provide access to each data item in a STAR file. No other knowledge of the data items is required.
A STAR file is normal text data that can be edited and read with a text editor. Its contents are intelligible as text and can be stored or transmitted electronically without conversion. The structure of a STAR file is simple. Each file is divided into a sequence of data blocks which contain individual data items. The identity of each data item is determined by a preceding data name. It is possible to repeat data items by placing them within simple looping structures.
It should be noted that a STAR file can be defined by only a few simple rules. This ensures maximum flexibility in data storage and its widest possible applicability. No assumptions are made about the order of the data blocks or data items, other than the requirement that identifying names be unique. There are no rules regarding the placement of data names or data items within a data block, other than the requirement that the name must precede the item. Access to data in a STAR file is made simply by requesting a specific data name within a specific data block. No prior knowledge is needed about either the data type, whether the item is looped, or whether an item exists in the file.
As an introduction to STAR file concepts, here are some examples of data syntax. A data block is identified by a unique string with the construction 'data_blockcode'. An example follows.
data_crystal_structure
A data item is identified by a unique data name which starts with an underline '_'. Three examples of data names, followed by their associated data items, follow.
_cell_volume 2310(2)
_chemical_formula 'C23 H36 O7'
_publication_author_address
; Prof Barry O'Connell
  Department of Chemistry
  University of Kalamazoo
  Michigan U.S.A.
;
A data item may be repeated individually or in a group. These are referred to as looped data items and are specified with a 'loop_' string. Here is an example of looped data items.
loop_
_exptl_crystal_face_h
_exptl_crystal_face_k
_exptl_crystal_face_l
_exptl_crystal_face_distance
0 0 1 0.012    0 0 1 0.012
1 0 0 0.023   -1 0 0 0.023
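A loop can be read as a small table: the data names listed after 'loop_' act as column headings, and the data items that follow are taken in groups of that size. The Python sketch below illustrates this reading under simplifying assumptions (whitespace-separated items, no quoted or multi-line text). The function name read_loop is hypothetical and is not taken from the original disclosure.

```python
def read_loop(text):
    """Split one loop_ structure into a list of rows keyed by data name.

    Assumes whitespace-separated text strings and no quoted or
    semicolon-bounded items.
    """
    tokens = text.split()
    if not tokens or tokens[0] != "loop_":
        raise ValueError("expected the text to start with 'loop_'")
    names = []
    i = 1
    # Data names start with an underline; collect them up to the first item.
    while i < len(tokens) and tokens[i].startswith("_"):
        names.append(tokens[i])
        i += 1
    values = tokens[i:]
    if not names or len(values) % len(names) != 0:
        raise ValueError("items do not fill a whole number of rows")
    width = len(names)
    return [dict(zip(names, values[j:j + width]))
            for j in range(0, len(values), width)]


example = """
loop_
_exptl_crystal_face_h
_exptl_crystal_face_k
_exptl_crystal_face_l
_exptl_crystal_face_distance
1 0 0 0.023   -1 0 0 0.023
"""
rows = read_loop(example)
```

Because items are consumed in groups, it makes no difference whether one or several rows share a physical line.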
A STAR file is a formatted sequential file containing text lines of standard visible ascii characters. It may be viewed or edited with any standard text editor. A STAR file is divided into any number of sequential data blocks. The information within a data block defines the data structure (i.e. the data order), and the data items. All of this information is intelligible as text.
The "save frame" command will now be described. The principal purpose of the "save frame" command is to define a block of data items that can be internally referenced within a data block via a single code. This code is the "save frame" code which is used within the data block as a character string preceded by a "$" character.
The save frame command enables data definitions to be repeated within a data block, and yet these definitions are insulated from one another. A save frame definition may precede or follow its reference as a $<frame-code>.
Frame codes may also be referenced within other save frames. Recursive references to save frames are not permitted.
The following nine syntax rules provide the specifications for a STAR file.
1. A text string is defined as either a sequence of non-blank characters, a sequence of characters bounded by matching single or double quotes (i.e. <'> or <">), or a sequence of lines bounded by a semicolon <;> as the first character of a line. A text string must not span more than one line, except if bounded by semicolons.
2. A data name is a text string starting with an underline '_'.
3. A data item is a text string not starting with an underline '_', and preceded by the identifying data name.
4. A data loop is a list of data names, followed by a repeated list of data items, and preceded by the text string 'loop_'.
5. A save frame is a sequence of data names, data items and data loops preceded by the text string 'save_framecode' where 'framecode' is a unique identifying code within a data block. A save frame sequence is closed by another save frame command, by the text string 'stop_' or by a data block command.
6. A data block is a sequence of data names, data items, data loops and save frames preceded by the text string 'data_blockcode' where 'blockcode' is a unique identifying code within a STAR file. The data block sequence is closed by another data block command or the end of the STAR file.
7. A data name must be unique within each save frame sequence and a data block sequence. A save frame declaration must be unique within a data block sequence. The save frame code may be referred to within a data block as the data item '$framecode'.
8. Except if contained within a text string, a sequence of blank or tab characters is used only to separate text strings.
9. Except if contained within a text string, a single sharp '#' signals that the characters following on a line are used for comment only.
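Rules 1, 8 and 9 are enough to split a file into its text strings. The Python sketch below is one possible reading of those three rules, not the patented method itself and not the QUASAR implementation. The function name tokenize is hypothetical, and the sketch assumes that a quoted string ends at the next matching quote character on the same line.

```python
def tokenize(source):
    """Yield the text strings of a STAR-like file (rules 1, 8 and 9).

    Simplifying assumption: a quoted string ends at the next matching
    quote character, wherever it occurs on the line.
    """
    lines = source.splitlines()
    n = 0
    while n < len(lines):
        line = lines[n]
        if line.startswith(";"):
            # Semicolon-bounded text runs to the next line starting with ';'.
            body = [line[1:]]
            n += 1
            while n < len(lines) and not lines[n].startswith(";"):
                body.append(lines[n])
                n += 1
            if n == len(lines):
                raise ValueError("unterminated semicolon-bounded text")
            yield "\n".join(body).strip()
            line = lines[n][1:]  # scan whatever follows the closing ';'
        pos = 0
        while pos < len(line):
            ch = line[pos]
            if ch in " \t":
                pos += 1                      # rule 8: separators only
            elif ch == "#":
                break                         # rule 9: rest is comment
            elif ch in "'\"":
                end = line.find(ch, pos + 1)  # rule 1: quoted string
                if end == -1:
                    raise ValueError("unterminated quoted string")
                yield line[pos + 1:end]
                pos = end + 1
            else:
                end = pos                     # rule 1: non-blank sequence
                while end < len(line) and line[end] not in " \t":
                    end += 1
                yield line[pos:end]
                pos = end
        n += 1


sample = (
    "data_demo\n"
    "_cell_volume 2310(2)  # a comment\n"
    "_chemical_formula 'C23 H36 O7'\n"
    "_address\n"
    "; Line one\n"
    "  Line two\n"
    ";\n"
)
tokens = list(tokenize(sample))
```

The sketch discards the quote characters, so a caller that needs to distinguish quoted from unquoted items would have to keep that information separately.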
The key to accessing a STAR file is the data name. It is essential that the data names needed for a given application be defined carefully and precisely in a distributed Glossary. Data names and their definitions must not be changed in the lifetime of the archive file, but new names and definitions may be added as needed. A glossary does not restrict the data that can be stored in a STAR file; it is only to provide information about data items in general use.
One application of the STAR file is as a basis for a Crystallographic Information File (CIF). This application will be used to illustrate the STAR file concepts.
Since the CIF is intended only for crystallographic data and text, this application has imposed some formatting constraints, other than those of the STAR syntax, which simplify data handling but do not inhibit flexibility. These constraints involve certain data typing and text string limitations which may be of use in other scientific applications and are cited here.
1. Lines may not exceed 80 characters in length.
2. Data names and block codes may not exceed 32 characters in length.
3. A data item is assumed to be of type number if it is not bounded by matching single or double quotes, and starts with a digit 0-9, a plus '+', a minus '-', or a period '.'. A number may be in integer, real or scientific format. If a number is concatenated with another number bounded by parentheses, the latter is taken to be the standard deviation [e.g. nn.nnn(m)].
4. A data item is assumed to be of type text if it extends over more than one line.
5. A data item is assumed to be of type character if it is surrounded by matching single or double quotes and is not either of type number or type text.
6. Only one level of loop_ data is permitted. Additional levels of repeated data must be stored as lists within a single text string.
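The typing constraints 3 to 5 can be expressed as a short decision procedure. The Python sketch below is an illustration under stated assumptions, not a CIF implementation: the names classify and parse_number are hypothetical, the input is the raw item with any surrounding quotes still present, and an unquoted single-line item that does not start like a number is treated as type character because the constraints above do not cover that case.

```python
import re

# A number with an optional standard deviation in parentheses, e.g. 2310(2).
NUMBER = re.compile(
    r"^([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(?:\((\d+)\))?$"
)


def classify(raw):
    """Return 'text', 'character' or 'number' for a raw data item."""
    if "\n" in raw:
        return "text"                      # constraint 4: spans lines
    quoted = len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\""
    if quoted:
        return "character"                 # constraint 5
    if raw and raw[0] in "0123456789+-.":
        return "number"                    # constraint 3
    return "character"                     # assumption: case not specified


def parse_number(raw):
    """Split a number item into (value, standard deviation digits or None)."""
    match = NUMBER.match(raw)
    if match is None:
        raise ValueError(f"not a recognisable number: {raw!r}")
    value = float(match.group(1))
    esd = int(match.group(2)) if match.group(2) is not None else None
    return value, esd


volume = parse_number("2310(2)")
```

The standard deviation is returned as the digits written in parentheses; scaling it to the last decimal place of the value is left to the caller.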
Appendix 4 shows an example of a CIF file containing two data blocks, 'manuscript' and 'crystal-structure'.
Data is retrieved from a STAR file by locating its data name. This would normally be done by 'parsing' the file and locating the data names given in a request list. Existing software called QUASAR uses this approach to access a STAR file. Data items and data blocks are output by QUASAR in the order requested. The QUASAR output file is also in STAR format. For a given data block the same data item may be requested up to 5 times. The STAR file is always checked for logical integrity. The names of the archive file (i.e. the input STAR file) and output file are specified as the strings 'star_arc' and 'star_out', respectively. These are entered at the start of the request list. In the example request list shown in Appendix 5 these file names are 'qtest.arc' and 'qtest.out'.
Appendices 6A and 6B show the file 'qtest.out', which is output after entering the request list of Appendix 5. The output is itself a STAR file that can also be processed by a request list. Note that requested items missing from the archive file are flagged with '??'.
Appendices 7A and 7B show examples of save frame commands relating to a standard molecular data format.
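The request-list retrieval described above can be sketched in a few lines of Python. This is not the QUASAR program and does not reproduce its request-list syntax, which is given only in the appendices. The function name retrieve, the in-memory dictionary layout and the placement of the '??' flag directly after the data name are all assumptions made for illustration.

```python
def retrieve(archive, requests):
    """Return STAR-formatted text for the requested items.

    archive  : dict mapping block code -> dict of data name -> item text
               (item text is assumed to be already quoted where needed)
    requests : list of (block code, data name) pairs
    """
    # Group the requests by block so each block code is written only once,
    # keeping the order in which blocks and names were first requested.
    grouped = {}
    for block, name in requests:
        grouped.setdefault(block, []).append(name)

    out = []
    for block, names in grouped.items():
        out.append(f"data_{block}")
        items = archive.get(block, {})
        for name in names:
            # Items missing from the archive are flagged with '??'.
            out.append(f"{name} {items.get(name, '??')}")
    return "\n".join(out) + "\n"


archive = {
    "crystal_structure": {
        "_cell_volume": "2310(2)",
        "_chemical_formula": "'C23 H36 O7'",
    }
}
requests = [
    ("crystal_structure", "_chemical_formula"),
    ("crystal_structure", "_hypothetical_missing_item"),
    ("crystal_structure", "_cell_volume"),
]
output = retrieve(archive, requests)
```

Since the output has the same form as the input, it could itself be parsed and queried with a further request list.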
The above-described file formats and the associated method of handling data have the advantages of generality, upwards compatibility and flexibility. The file is machine-independent and portable, so that data items are accessible quite independently of their point of origin. It is fundamental that the file allows for future data to be incorporated without the need to modify existing files.
The STAR file format meets the requirements of a "universal" archival file. It may be used for archiving all types of text and numerical data, in any order. It is particularly suited to electronic transmission purposes.
Upwards compatibility and flexibility are two very desirable properties for any new archive system. These properties are especially important for fields, such as crystallography, where there is a wide diversity of data types, and where the archival requirements may vary from site to site. It is essential that data files written in one laboratory can be read easily in another, independent of the software with which they were generated. It is also important that these files can be easily "viewed" without the need for sophisticated archival software.
Also important for the long term is the flexibility and the eye-readable nature of the STAR format. Because a CIF may contain "local" as well as "global" data items, it is ideal for internal as well as external data communication purposes. Existing program systems, such as XTAL, currently use self-defining binary files internally because these are faster and more compact than character files. As computer technology improves, the value of a flexible, eye-readable and easily editable character format outweighs speed and disc considerations.
If parts of a file are lost, e.g. during electronic communication, the whole file is not corrupted; thus the file format has the advantage of being robust. With a data-base such loss of characters might cause corruption.
Figure imgf000018_0001
Figure imgf000019_0001
Figure imgf000020_0001
Figure imgf000021_0001
APPENDIX 5
star_arc_qtest.arc
star_out_qtest.out
data_manuscript
_manuscript_summary
data_crystal_structure
_chemical_name
_publication_title
_publication_author_name
_publication_author_address
_cell_a
_cell_b
_cell_c
_cell_alpha
_cell_beta
_cell_gamma
_chemical_name
_symmetry_space_group
_symmetry_pos_in_XYZ
_atom_site_label
_atom_site_x/a
_atom_site_y/b
_atom_site_z/c
_atom_site_U_iso
_atom_site_label
_exptl_radiation_wave_length
_exptl_radiation_type
_exptl_crystal_face_distance
_exptl_dummy
_exptl_crystal_face_h
_exptl_crystal_face_k
_exptl_crystal_face_l
_atom_site_label
_atom_site_U_iso
_publication_author_name
data_manuscript
_manuscript_summary
APPENDIX 6A
data_manuscript
_manuscript_summary
;
This is some dummy text to show how a multiple data-block STAR file works !
;
# -----end-of-data-block------
data_crystal_structure
_chemical_name
;
3-(2,5-dihydro-4-hydroxy-5-oxo-3-phenyl-2-furyl)propionic acid
;
_publication_title
;
Structure of WF-3681,
3-(2,5-Dihydro-4-hydroxy-5-oxo-3-phenyl-2-furyl)propionic Acid.
;
loop_
_publication_author_name
_publication_author_address
"O'Connell, Barry"
; Department of Chemistry
University of Kalamazoo
Michigan U.S.A.
;
'Clark, Joan I.'
; University of Washington
Seattle WA 98195
U.S.A.
;
_cell_a 18.757(8)
_cell_b 7.282(2)
_cell_c 17.511(8)
_cell_alpha 90
_cell_beta 91.20(3)
_cell_gamma 90
_chemical_name
;
3-(2,5-dihydro-4-hydroxy-5-oxo-3-phenyl-2-furyl)propionic acid
;
_symmetry_space_group '-C 2yc'
loop_
_symmetry_pos_in_xyz
'x,y,z'
'-x,-y,-z'
'-x,y,1/2-z'
'x,-y,1/2+z'
'1/2+x,1/2+y,z'
'1/2-x,1/2-y,-z'
'1/2-x,1/2+y,1/2-z'
'1/2+x,1/2-y,1/2+z'
APPENDIX 6B
loop_
_atom_site_label
_atom_site_x/a
_atom_site_y/b
_atom_site_z/c
_atom_site_u_iso
_atom_site_label
C1 .6237(1) -.2055(4) -.3119(2) .053 C1
C2 .6022(2) -.2468(6) -.2322(2) .059 C2
O5' .7504(1) .0454(3) .0417(1) .056 O5'
_exptl_radiation_wave_length 1.54179
_exptl_radiation_type ?? # requested item not present
loop_
_exptl_crystal_face_distance
_exptl_dummy # ?? requested item not present
_exptl_crystal_face_h
_exptl_crystal_face_k
_exptl_crystal_face_l
0.012 ?? 0 0 -1
0.012 ?? 0 0 1
0.023 ?? 1 0 0
0.023 ?? -1 0 0
loop_
_atom_site_label
_atom_site_u_iso
C1 .053
C2 .059
O5' .056
loop_
_publication_author_name
"O'Connell, Barry"
'Clark, Joan I.'
# ------ end-of-data-block-----
data_manuscript
_manuscript_summary
;
This is some dummy text to show how a multiple data-block STAR file works !
;
# -----end-of-data-block------
APPENDIX 7A
data_SMD_Example_3
#---------------------------------
_table_of_contents
;
This example illustrates the description of a simple chemical reaction in which one of the reactants and the product are expressed as generic structures.
;
_atom_bond_order_convention simple
save_methyl
loop_
_atom_identity_node
_atom_identity_symbol 1 C 2 C
loop_
_atom_bond_node_1
_atom_bond_node_2
_atom_bond_order 1 2 sin
loop_
_attached_hydrogen_node
_attached_hydrogen_count 1 3
save_ethyl
loop_
_atom_identity_node
_atom_identity_symbol 1 C 2 C 3 C
loop_
_atom_bond_node_1
_atom_bond_node_2
_atom_bond_order 1 2 sin 1 3 sin
loop_
_attached_hydrogen_node
_attached_hydrogen_count 1 3 2 3
save_R1
loop_
_variable_alternative_number
_variable_identifier_symbol
_variable_node 1 $methyl 1 2 $ethyl 1
save_carboxylic_acid
loop_
_atom_identity_node
_atom_identity_symbol 1 $R1 2 C 3 O 4 O
loop_
_atom_bond_node_1
_atom_bond_node_2
_atom_bond_order 1@1 2 sin 2 3 dou 2 4 sin
loop_
_attached_hydrogen_node
_attached_hydrogen_count 2 0 3 0 4 1
APPENDIX 7B
save_alcohol
loop_
_atom_identity_node
_atom_identity_symbol 1 C 2 C 3 O
loop_
_atom_bond_node_1
_atom_bond_node_2
_atom_bond_order 1 2 sin 2 3
loop_
_attached_hydrogen_node
_attached_hydrogen_count 1 3 2 2 3 1
save_ester
loop_
_atom_identity_node
_atom_identity_symbol 1 $R1 2 C 3 O 4 O 5 C 6 C
loop_
_atom_bond_node_1
_atom_bond_node_2
_atom_bond_order 1@1 2 sin 2 3 dou 2 4 sin 4 5 sin 5 6 sin
loop_
_attached_hydrogen_node
_attached_hydrogen_count 2 0 3 0 4 0 5 2 6 3
stop_
loop_
_reaction_component_number
_reaction_component_symbol
_reaction_component_type
1 $carboxylic_acid reactant
2 $alcohol reactant
3 $ester product
loop_
_reaction_pathway_reactant
_reaction_pathway_product
1.1 .1
1.2 .2
1.3 .3
1.4, 2.3 .4
2.1 .6
2.2 .5

Claims

1. A method of structuring or storing data within a file comprising the following steps:
(i) arranging the file into a plurality of data blocks each preceded by a respective data block code; and
(ii) arranging the data within each block into a plurality of data items each preceded by a respective data name; wherein the data block codes are taken from a first predetermined set, may occur in any order, and have a first common feature, and wherein the data names are taken from a second predetermined set, may occur in any order, and have a second common feature, the first and second common features being readily distinguishable.
2. A method of structuring or storing data within a file according to claim 1, wherein the file is readable as text in addition to being machine readable.
3. A method of structuring or storing data within a file according to claim 2, wherein each text line contains up to a pre-set maximum number of visible ASCII characters.
4. A method of structuring or storing data within a file according to any preceding claim, wherein each data item is directly preceded by the respective data name.
5. A method of structuring or storing data within a file according to any of claims 1 to 3, wherein a plurality of data names in a group are followed by a like plurality of data items repeated a desired number of times.
6. A method of structuring or storing data within a file according to any preceding claim, wherein the first common feature is the text string 'data_', and the members of the first set are of the form 'data_blockcode' where 'blockcode' is a unique block code in each case.
7. A method of structuring or storing data within a file according to any preceding claim, wherein the second common feature is an underline '_', and the members of the second set are of the form '_name' where 'name' is a respective data name.
8. A method of structuring or storing data within a file according to any preceding claim, wherein the data handled is crystallographic data.
9. Means for structuring or storing data within a file comprising:
(i) means for arranging the file into a plurality of data blocks each preceded by a respective data block code; and
(ii) means for arranging the data within each block into a plurality of data items each preceded by a respective data name; wherein the data block codes are taken from a first predetermined set, may occur in any order, and have a first common feature, and wherein the data names are taken from a second predetermined set, may occur in any order, and have a second common feature, the first and second common features being readily distinguishable.
10. A data file comprising a plurality of data blocks, each preceded by a respective data block code, and, within each block, a plurality of data items each preceded by a respective data name, wherein the data block codes are taken from a first predetermined set, may occur in any order, and have a first common feature, and wherein the data names are taken from a second predetermined set, may occur in any order, and have a second common feature, the first and second common features being readily distinguishable.
11. A data file according to claim 10 which is readable as text in addition to being machine readable.
12. A data file according to claim 10 or 11, wherein the file relates to crystallographic data.
13. A method of retrieving data from a data file according to any of claims 10 to 12, comprising listing the requested data items and outputting the requested data items in the order requested, the output file having the same format as the accessed data file.
PCT/GB1991/000666 1990-04-26 1991-04-26 A method of structuring or storing data within a file WO1991016682A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB9009447A GB2243467B (en) 1990-04-26 1990-04-26 Handling data
GB9009447.5 1990-04-26

Publications (1)

Publication Number Publication Date
WO1991016682A1 true WO1991016682A1 (en) 1991-10-31

Family

ID=10675075

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB1991/000666 WO1991016682A1 (en) 1990-04-26 1991-04-26 A method of structuring or storing data within a file

Country Status (5)

Country Link
EP (1) EP0526516A1 (en)
JP (1) JPH05509183A (en)
AU (1) AU7763591A (en)
GB (1) GB2243467B (en)
WO (1) WO1991016682A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1994027232A1 (en) 1993-05-12 1994-11-24 Apple Computer, Inc. Storage manager for computer system
WO1995010091A1 (en) * 1993-10-04 1995-04-13 Robert Dixon Method and apparatus for data storage and retrieval

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10177508A (en) * 1996-12-18 1998-06-30 G & G Pharma Kk Data storage structure for computer

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4878167A (en) * 1986-06-30 1989-10-31 International Business Machines Corporation Method for managing reuse of hard log space by mapping log data during state changes and discarding the log data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Compsac 87, IEEE Proceedings, Computer Software & Applications Conference, 7-9 October 1987, Tokyo, JP, M.M. Blattner et al.: "Data structure and format conversion using syntactive inference", pages 416-421 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1994027232A1 (en) 1993-05-12 1994-11-24 Apple Computer, Inc. Storage manager for computer system
US5857207A (en) * 1993-05-12 1999-01-05 Apple Computer, Inc. Storage manager for computer system
US5870764A (en) * 1993-05-12 1999-02-09 Apple Computer, Inc. Method of managing a data structure for concurrent serial and parallel revision of a work
WO1995010091A1 (en) * 1993-10-04 1995-04-13 Robert Dixon Method and apparatus for data storage and retrieval
AU695765B2 (en) * 1993-10-04 1998-08-20 Robert Dixon Method and apparatus for data storage and retrieval
US5799308A (en) * 1993-10-04 1998-08-25 Dixon; Robert Method and apparatus for data storage and retrieval

Also Published As

Publication number Publication date
EP0526516A1 (en) 1993-02-10
GB2243467B (en) 1994-03-09
AU7763591A (en) 1991-11-11
JPH05509183A (en) 1993-12-16
GB2243467A (en) 1991-10-30
GB9009447D0 (en) 1990-06-20

Similar Documents

Publication Publication Date Title
US7293006B2 (en) Computer program for storing electronic files and associated attachments in a single searchable database
US6226630B1 (en) Method and apparatus for filtering incoming information using a search engine and stored queries defining user folders
US8484236B1 (en) Method and/or system for processing data streams
US9754017B2 (en) Using anchor points in document identification
JPS61193266A (en) Information retrieving system
US20080250425A1 (en) Systems and methods for interfacing multiple types of object identifiers and object identifier readers to multiple types of applications
Harten et al. The FITS tables extension
EP1590749B1 (en) Method and system for mapping xml to n-dimensional data structure
Sayers et al. Building customized data pipelines using the entrez programming utilities (eUtils)
KR20060088473A (en) Mechanisms for transferring raw data from one data structure to another representing the same item
WO1991016682A1 (en) A method of structuring or storing data within a file
JP5420317B2 (en) Conversion parameter generation system and conversion program
WO2002059726A2 (en) Method of performing a search of a numerical document object model
Hulse The ALADDIN atomic physics database system
Hamelers et al. A full text collection of COVID-19 preprints in Europe PMC using JATS XML
Tollefson Importing and Creating Data
JPH11353316A (en) Abbreviated word supplementing device
LaPointe Jr GDP: Generalized Document Processing
JPH10301940A (en) Information processor and its method
JPS5820073B2 (en) Thesaurus construction method
Cutler et al. Input, Output, and the Web
Hall et al. Genesis of the Crystallographic Information File
Blumer The Burrows-Wheeler Transform with applications to bioinformatics
Bose et al. Interventions for post‐transplant anaemia in kidney transplant recipients
Vaghela et al. Unicode Based Multilingual Catalogue Module: A New Feature of SOUL

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AT AU BB BG BR CA CH DE DK ES FI GB HU JP KP KR LK LU MC MG MW NL NO PL RO SD SE SU US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE BF BJ CF CG CH CM DE DK ES FR GA GB GR IT LU ML MR NL SE SN TD TG

WWE Wipo information: entry into national phase

Ref document number: 1991908220

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1991908220

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: CA

WWW Wipo information: withdrawn in national office

Ref document number: 1991908220

Country of ref document: EP