AUTOMATED FILE PRUNING
TECHNICAL FIELD
The invention concerns a scheme for automated file pruning and an automated file pruning facility for use in computer systems and networks.
BACKGROUND OF THE INVENTION
A typical computer system comprises a central processing unit, memory, peripheral devices such as data input/output devices and storage media such as floppy disks or hard disks. The devices communicate with each other via a computer operating system. Computer operating systems include several different operational modules. One such module might be a file manager. Client data stored on a storage medium is organized as a file system having an associated format, and the file manager is employed to maintain the available file systems and control access to them. The file manager provides a set of system calls in the form of Application Programmer Interfaces (APIs) to allow clients access to the file system and files, to carry out operations such as file creation, deletion, opening for read or write, and so on.
The file system stores, organizes and describes client data. The client data is stored in files. The files are arranged in each file system in a particular format. Each file system maintains its organization and some sort of description information, herein referred to as metadata, in format specific structures. Examples of such formats are HFS, MS-DOS FAT, ProDOS, High Sierra, ISO 9660, NTFS, and so on. The term format encompasses both physical disk format and network server access protocols.
To read and write data from and to the file system, the file manager must be able to recognize the format of the file system.
It is a well known disadvantage of current file systems that as the number of files increases the handling becomes more and more difficult. Fast access or retrieval and a clear overview of the whole file system are crucial for its daily use.
Despite the growing capacity of hard disks, the memory of a computer system remains in short supply, especially in light of the increasing use of applications that require larger amounts of storage, such as multimedia applications. The presence of too many files not only consumes memory capacity unnecessarily but - because of the time it takes to find and access the stored files - substantially worsens system performance in terms of response time.
Each computer user has a huge variety of different kinds of files. In current file systems all these files are treated the same. If the computer system is used in a commercial environment, there are usually so-called record retention rules which define the different kinds of documents, where and how they have to be stored in memory, and - most important - how long the respective documents have to be archived. Current file systems do not support a user in adhering to such record retention rules.
Pruning facilities exist for removing files which have exceeded a common age, but these are applied uniformly over the entire file system. For example, in a UNIX system, optional, periodically running software can be programmed to remove files in a list of folders (or directories) which are older than a set number of days. It is a disadvantage of such a global approach that all files are treated the same, no matter how important they are. Furthermore, systematic removal of files as described above is not a mandatory part of the operating system but a local customized option: with the increased use of unadministered personal workstations or laptops, the growth of files and the subsequent disk memory requirements can be substantial.
It is an object of the present invention to provide a scheme which makes efficient use of the limited memory resources of a computer system.
It is an object of the present invention to provide a scheme which allows archiving of files according to predefined rules, such as record retention rules, for example.
SUMMARY OF THE INVENTION
The objectives of the present invention have been accomplished by the method, computer program products, computer program elements, and system as claimed.
Advantages of the present invention are addressed in connection with the detailed description or are apparent from the description.
DESCRIPTION OF THE DRAWINGS
The invention is described in detail below with reference to the following schematic drawings. FIG. 1 is an illustration of an exemplary embodiment of the present invention.
DESCRIPTION OF PREFERRED EMBODIMENTS
In the following, the basic concept of the present invention is described. Before addressing different embodiments, the relevant terms and expressions are defined and explained.
The expression "element" is herein used to describe an element of a database. Examples of elements are: files (e.g., computer or machine readable files), partial information to be added to an existing file, an image from an image repository, or a multimedia object (e.g., an avi or mpeg movie).
The word "database" is herein used to describe any collection, library, or repository of information, such as a file database, or an image repository. Examples of databases are a DOS or Windows file directory. A database system is the system which houses the database. A database system and/or a database can be distributed.
An element content is herein referred to as data. An element might for example comprise client data, data generated by an application program, computer code for execution by a processor, and so forth.
The elements are arranged in each database system in a particular format and each database system maintains its organization and some sort of description information, herein referred to as metadata, in format specific structures. The metadata may contain the following exemplary information: file attributes, file name, date of creation, date of last revision, users with access permission, file size, file type (e.g., Lotus WordPro), etc. The metadata can be kept in the same memory as the elements' data or they can be kept in a separate memory, e.g., a cache memory. The metadata are like a card catalog that is found in the library, with the library being analogous to the database system.
In general, a database system enables the user to store, manage, share and reuse information about the elements which are stored in it. The repository enables the user to store more than just the elements' data. For example, the metadata stored in the database system may comprise information about the development of applications; including descriptions of data, programs and system objects. It may also include information about relationships among data, programs and system objects; as well as the semantics and use of the information.
The basic idea of the present scheme is to provide an automated file pruning facility which automatically removes elements (e.g. files) no longer needed from a database system. There are three main components: one for element (file) generation, one for element (file) representation, and one for element (file) disposal. These three components are described in connection with a file database comprising files.
File Generation: During file generation, e.g. when creating a file or adding a file to a database, a field is added to the file's metadata (e.g. to the file descriptor). This new field, which specifies the lifetime of the file, has to be initially set at the time a new file is generated or before the file is stored in the file database.
When creating a file 21 (steps 14, 12, and 10 in Figure 1) or adding a file 21 to a database 20 which comprises a memory with limited storage capacity, an individual lifetime is defined in the metadata (step 12) of this element. Then, the element's data and its metadata are stored in the memory. Note that an individual lifetime is assigned to each element.
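By way of illustration only, the extension of the metadata by a lifetime field might be sketched as follows. The names FileMetadata and store_file, and the use of an in-memory dictionary as a stand-in for the database 20, are assumptions made for this sketch and are not prescribed by the scheme.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple


@dataclass
class FileMetadata:
    """Hypothetical file descriptor extended by a lifetime field."""
    name: str
    created: datetime
    lifetime: Optional[timedelta]  # None marks the file as "permanent"

    def expiry(self) -> Optional[datetime]:
        """Return the moment the lifetime runs out, or None if permanent."""
        if self.lifetime is None:
            return None
        return self.created + self.lifetime


def store_file(database: Dict[str, Tuple[FileMetadata, bytes]],
               meta: FileMetadata, data: bytes) -> None:
    """Store the element's data together with its metadata."""
    database[meta.name] = (meta, data)
```

In this sketch each file carries its own lifetime, so that two files created on the same day may expire at different times.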
File Representation or Marking: When a file 21 is first created the user has to select a lifetime (step 12). This can be done by asking the user to accept a default file lifetime, to specify a specific file lifetime, or to declare the file as "permanent", as illustrated in box 16. Likewise, the user might also be prompted by the system to categorize the file 21 using a predefined categorization scheme (box 11). If the file 21 is categorized in a group of "legal documents", the default lifetime used for all legal documents (e.g. 20 years from date of creation) is assigned to the respective file 21. If the user categorizes the file 21 into a directory with private documents, a shorter period might be assigned, just to give an example.
Another alternative which is addressed later is illustrated in box 17 of Figure 1.
The lifetime can be defined by a user or an application program. Information concerning record retention rules might be provided to the user prior to defining the lifetime of a particular file.
File Disposal: When the file lifetime limit is reached, the file 19 will be erased, as indicated on the right hand side of Figure 1. This could be done by code scanning file descriptions in search of files which have reached their end-of-life (step 13). This code can be run automatically at a specified frequency. An optional feature could be employed that asks the user to approve the list of files to be erased.
The metadata are examined from time to time to determine whether the lifetime of an element has expired, such that the element can be removed from said memory if its lifetime has expired. Likewise, a user can be prompted if the lifetime of an element is about to expire, or after it has expired but before the element is deleted by the system. In this case, the user has the option to choose whether the element is to be removed from said memory, or whether to change the element's lifetime.
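The disposal step with optional user approval might be sketched as follows. This is a minimal sketch under stated assumptions: the functions find_expired and prune, the dictionary layout of the descriptors, and the approve callback standing in for the user prompt are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional, Tuple

# Assumed layout: file name -> (date of creation, lifetime or None for permanent)
Descriptors = Dict[str, Tuple[datetime, Optional[timedelta]]]


def find_expired(descriptors: Descriptors, now: datetime) -> List[str]:
    """Scan the descriptors for files which have reached their end-of-life."""
    return [name for name, (created, lifetime) in descriptors.items()
            if lifetime is not None and created + lifetime <= now]


def prune(descriptors: Descriptors, now: datetime,
          approve: Callable[[List[str]], List[str]] = lambda names: names
          ) -> List[str]:
    """Remove expired files; 'approve' returns the subset the user accepts."""
    expired = find_expired(descriptors, now)
    # Only names that are actually expired may be removed.
    approved = [name for name in approve(list(expired)) if name in expired]
    for name in approved:
        del descriptors[name]
    return approved
```

The default approve callback accepts the whole list, which corresponds to fully automatic pruning; a callback returning an empty list leaves all files in place.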
Implementation: In the following a first embodiment is given. This embodiment is based on an object-oriented framework.
In a file database system using an object-oriented approach where each file is represented as an object class, the normal class "file" can be extended (e.g., through the inheritance mechanism in object-oriented programming) to a new class. This new class can be called "limited lifetime" class for example. The "limited lifetime" class comprises all those files which the user would like to have automatically removed at a later date. When a file of type "limited lifetime" is generated, the user would be asked to accept a default lifetime or to specify a particular file lifetime. Together, the current date and the lifetime specify a kill time for the file, at which time it can be deleted. When the file is created, it is registered with the database or file manager as a "limited lifetime" type. The database or file manager will periodically scan its list of files with the type of "limited lifetime" and remove any whose kill time has expired.
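The object-oriented embodiment might be sketched as follows. The class names File, LimitedLifetimeFile and FileManager, the 30-day default lifetime, and the method names register and scan are assumptions chosen for this sketch only.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


class File:
    """Stand-in for the normal class "file"."""

    def __init__(self, name: str) -> None:
        self.name = name


class LimitedLifetimeFile(File):
    """Extension of "file" whose instances carry a kill time."""

    DEFAULT_LIFETIME = timedelta(days=30)  # assumed default, not from the text

    def __init__(self, name: str, created: datetime,
                 lifetime: Optional[timedelta] = None) -> None:
        super().__init__(name)
        if lifetime is None:
            lifetime = self.DEFAULT_LIFETIME
        # Current date plus lifetime gives the kill time.
        self.kill_time = created + lifetime


class FileManager:
    """Hypothetical file manager with which files are registered."""

    def __init__(self) -> None:
        self.files: Dict[str, File] = {}

    def register(self, file: File) -> None:
        self.files[file.name] = file

    def scan(self, now: datetime) -> List[str]:
        """Remove "limited lifetime" files whose kill time has expired."""
        expired = [name for name, f in self.files.items()
                   if isinstance(f, LimitedLifetimeFile) and f.kill_time <= now]
        for name in expired:
            del self.files[name]
        return expired
```

Files of the normal class are never touched by scan, so only those files which the user has deliberately created as "limited lifetime" files are subject to automatic removal.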
Another embodiment is described in connection with Figure 1. This Figure shows a schematic representation of a file system 20 with several files. It furthermore shows a sequence of steps 15 for adding a new file 21, and a process for pruning (step 13) of the file system 20. According to the present embodiment, there are three steps that are carried out if a new file 21 is needed (step 14). A lifetime has to be selected (step 12), according to the present invention. Three options (box 11, box 16, and box 17) are offered for selection of the lifetime.
As indicated in box 11, there is a pool of file containers (e.g. represented by icons on a display). Each container has a preassigned lifetime. There might be a legal container that has a lifetime of 20 years, a temporary container with a lifetime of 10 days, and an accounting container with a lifetime of 10 years, for example. If the user wants to create a new file 21 that contains legal information, or that belongs in the legal category, he selects the respective legal container during step 12. The container can be selected by clicking on it (if a graphical user interface is employed), by using hot keys, or the like. By doing so the legal container's preassigned lifetime is assigned to the new file 21.
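The container approach of box 11 might be sketched as follows, using the three exemplary containers named above. The table CONTAINER_LIFETIMES, the function create_in_container, and the approximation of one year as 365 days are assumptions of this sketch.

```python
from datetime import datetime, timedelta
from typing import Any, Dict

DAYS_PER_YEAR = 365  # approximation assumed for this sketch

# Each container has a preassigned lifetime.
CONTAINER_LIFETIMES: Dict[str, timedelta] = {
    "legal": timedelta(days=20 * DAYS_PER_YEAR),
    "temporary": timedelta(days=10),
    "accounting": timedelta(days=10 * DAYS_PER_YEAR),
}


def create_in_container(container: str, name: str,
                        created: datetime) -> Dict[str, Any]:
    """Create file metadata that inherits the container's lifetime."""
    lifetime = CONTAINER_LIFETIMES[container]  # KeyError for unknown containers
    return {"name": name, "container": container,
            "created": created, "lifetime": lifetime}
```

The user selects only the category; the lifetime itself is never entered by hand.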
A different approach for selection of the lifetime is illustrated in box 16. The user may select a file, a file icon, or use an application program to create a new file 21. In this context he is presented with a dialogue to accept a default lifetime, specify a specific file lifetime, or to declare this file as permanent. This can be done by using appropriate icons, hot keys, or other well known means.
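The outcome of the dialogue of box 16 might be mapped to a lifetime as sketched below. The function resolve_choice, the choice strings, and the one-year default lifetime are hypothetical; a lifetime of None is used here to represent a permanent file.

```python
from datetime import timedelta
from typing import Optional

DEFAULT_LIFETIME = timedelta(days=365)  # assumed default, not from the text


def resolve_choice(choice: str,
                   specific: Optional[timedelta] = None) -> Optional[timedelta]:
    """Translate the user's dialogue choice into a lifetime (None = permanent)."""
    if choice == "default":
        return DEFAULT_LIFETIME
    if choice == "specific":
        if specific is None or specific <= timedelta(0):
            raise ValueError("a positive lifetime must be specified")
        return specific
    if choice == "permanent":
        return None
    raise ValueError("unknown choice: " + choice)
```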
The processes depicted in boxes 11 and 16 are well suited for systems which help the user to observe record retention rules. The approach of box 11 is particularly well suited because the user does not need to worry about record retention rules at all. He just needs to put his newly created files in the right category. The rest is done by the system.
Another alternative is illustrated in box 17 where the system call to create a file 21 is extended to include the specification of the desired lifetime for the file 21. This can be done through the application programming interface (API).
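An extended create call of the kind indicated in box 17 might look as sketched below. The class FileSystem is a hypothetical in-memory stand-in and does not reproduce the API of any existing operating system; the lifetime parameter is the only extension over an ordinary create call.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Optional


class FileSystem:
    """Hypothetical in-memory file system with an extended create call."""

    def __init__(self) -> None:
        self.data: Dict[str, bytearray] = {}
        self.metadata: Dict[str, Dict[str, Any]] = {}

    def create(self, name: str, lifetime: Optional[timedelta] = None,
               now: Optional[datetime] = None) -> bytearray:
        """Create a file and record its desired lifetime (None = permanent)."""
        if name in self.data:
            raise FileExistsError(name)
        self.data[name] = bytearray()
        self.metadata[name] = {
            "created": now if now is not None else datetime.now(),
            "lifetime": lifetime,
        }
        return self.data[name]
```

An application program would then pass the lifetime directly, for instance fs.create("report.txt", lifetime=timedelta(days=90)), without any interaction with the user.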
Once the lifetime is selected, the actual file is created (step 10) and the lifetime is stored in the file's metadata. It is obvious that the sequence of steps can be changed. It is conceivable, for example, that a file 21 is created if needed by using application software (e.g. text processing software). When saving the newly created file to disk, or when adding it to the file system 20, the user may be asked to select a lifetime.
From time to time, either on a regular basis or at random intervals, the file system is automatically pruned. This pruning process is indicated by box 13. If during the pruning process a file 19 is discovered whose lifetime has expired, then the respective file is discarded. As an optional safety net, the user might be prompted prior to this. Instead of prompting the user, all the files that are about to be deleted can be put in a waste basket, where they remain for a certain period of time. This gives the user an opportunity to intervene. If he does not remove the files from the waste basket, then the files are finally deleted.
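The waste basket variant might be sketched as a two-stage pruning pass. The function names prune_with_basket and restore, the seven-day grace period, and the representation of files by their expiry dates are assumptions of this sketch.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

GRACE_PERIOD = timedelta(days=7)  # assumed time a file remains in the basket


def prune_with_basket(files: Dict[str, Optional[datetime]],
                      basket: Dict[str, datetime],
                      now: datetime) -> Tuple[List[str], List[str]]:
    """files maps name -> expiry (None = permanent); basket maps name -> time moved."""
    # Stage 2: finally delete entries that stayed in the basket long enough.
    deleted = [n for n, moved in basket.items() if moved + GRACE_PERIOD <= now]
    for name in deleted:
        del basket[name]
    # Stage 1: move newly expired files into the basket.
    expired = [n for n, expiry in files.items()
               if expiry is not None and expiry <= now]
    for name in expired:
        del files[name]
        basket[name] = now
    return expired, deleted


def restore(files: Dict[str, Optional[datetime]], basket: Dict[str, datetime],
            name: str, new_expiry: Optional[datetime]) -> None:
    """User intervention: take a file out of the basket with a new expiry."""
    del basket[name]
    files[name] = new_expiry
```

Final deletion is handled before the move so that a file expiring in the current pass always receives the full grace period.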
A system in accordance with the present invention can also offer a feature which allows the user to act upon an element (e.g. a file) that is stored in a database (e.g. a file system). For this purpose the metadata of the respective element is fetched and displayed. Then, the user can act upon the element. Typical actions are: accessing, retrieving, opening, displaying, moving, or copying the element. The user can also change the element's lifetime, or he can link the lifetime with the lifetime of another element. The linking can be done by a drag-and-drop operation where the element is moved into a container with a desired preassigned lifetime, for example.
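The linking of one element's lifetime to that of another element might be sketched as follows. The class Element, its link_to method, and the rule that a linked element reports the lifetime of the element it is linked to are assumptions; the text does not specify how a link is resolved.

```python
from datetime import timedelta
from typing import Optional


class Element:
    """Hypothetical element whose lifetime may be linked to another element."""

    def __init__(self, name: str, lifetime: Optional[timedelta] = None) -> None:
        self.name = name
        self.own_lifetime = lifetime
        self.linked_to: Optional["Element"] = None

    def link_to(self, other: "Element") -> None:
        """Link this element's lifetime to 'other', refusing circular links."""
        node: Optional["Element"] = other
        while node is not None:
            if node is self:
                raise ValueError("link would create a cycle")
            node = node.linked_to
        self.linked_to = other

    @property
    def lifetime(self) -> Optional[timedelta]:
        """Follow the link, if any; otherwise use the element's own lifetime."""
        if self.linked_to is not None:
            return self.linked_to.lifetime
        return self.own_lifetime
```

With this arrangement a later change of the container's lifetime is automatically reflected in every element linked to it.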
The present invention can be realized in hardware, software, or a combination of hardware and software. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software could be a general purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein. The present invention can also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which - when loaded in a computer system - is able to carry out these methods.
The present invention can also be used in systems with parallel data sharing access to files residing on network attached shared disks.
A special viewer can be used to review the lifetime of the files in a database. The viewer might offer a tool that organizes the files on a screen or within a window by their lifetime, for example. The viewer can be used to act upon the files and/or their lifetimes. Let us assume that one company decides to take legal action against another company. In such a situation all the documents that are of relevance are to be produced. It is important to prevent these documents from being deleted. All relevant documents can thus be displayed by the viewer and the user can act upon them. He might for example assign a new lifetime to each such document to ensure that they are kept for another five years.
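The two viewer functions mentioned above, namely ordering files by lifetime and extending the lifetime of selected documents, might be sketched as follows. The function names view_by_expiry and keep_until, and the representation of each file by its expiry date, are assumptions of this sketch.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional, Tuple


def view_by_expiry(files: Dict[str, Optional[datetime]]
                   ) -> List[Tuple[str, Optional[datetime]]]:
    """List files soonest-to-expire first; permanent files (None) come last."""
    return sorted(files.items(),
                  key=lambda item: (item[1] is None, item[1] or datetime.max))


def keep_until(files: Dict[str, Optional[datetime]], names: Iterable[str],
               earliest_expiry: datetime) -> None:
    """Ensure the named files do not expire before 'earliest_expiry'."""
    for name in names:
        current = files[name]
        if current is not None:  # permanent files need no extension
            files[name] = max(current, earliest_expiry)
```

In the litigation example, the user would select the relevant documents in the viewer and call keep_until with a date five years ahead; documents which already expire later than that date keep their existing lifetime.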