WO1999030230A1 - Naturally parallel computing system and method - Google Patents

Naturally parallel computing system and method Download PDF

Info

Publication number
WO1999030230A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
execution
nodes
execution graph
queue
Prior art date
Application number
PCT/US1998/026436
Other languages
French (fr)
Inventor
Gustav O'keiff
Original Assignee
Cacheon, L.L.C.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cacheon, L.L.C. filed Critical Cacheon, L.L.C.
Priority to EP98962086A priority Critical patent/EP1058878A1/en
Priority to AU17248/99A priority patent/AU1724899A/en
Publication of WO1999030230A1 publication Critical patent/WO1999030230A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/30Creation or generation of source code
    • G06F8/31Programming languages or programming paradigms
    • G06F8/311Functional or applicative languages; Rewrite languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/448Execution paradigms, e.g. implementations of programming paradigms
    • G06F9/4494Execution paradigms, e.g. implementations of programming paradigms data driven

Definitions

  • the present invention relates to a method and system for executing naturally parallel programs on one or more processors.
  • Parallel computation can be defined as the simultaneous execution of multiple program instructions. Simultaneous execution is accomplished through the use of one or multiple processors. Since single processor computers can only execute one program's instruction at a time, to simulate simultaneous execution the operating system of the computer must take turns executing instructions from each active program. This process is called multi-tasking. When a single program is broken into multiple components that can be simultaneously executed, the process is called multi-threading, multi-tasking, or preemption. A program with multiple threads requires a programmer to use special utilities that direct the operating system how and when to execute these components.
  • Massively parallel computing involves computers with many processors.
  • the programs developed for these computers are generally customized for the explicit computer and tailored to the communication protocols of the computers. Programs written for these massively parallel computers do not port well to computers with one or few processors.
  • the system and method of this invention achieves parallel computation over many programs without the use of multi-tasking or multi-threading.
  • programs written to run on the system disclosed by this invention are highly portable.
  • This invention is purely an execution method for parallel computation and is independent of system data and how it is used.
  • program functionality depends on the existence of many operation types for operating on data.
  • This computing system allows programs to be developed where functionality is dependent on the shape of the program as well as on the operations used.
  • This invention is a system for executing naturally parallel programs on at least one processor.
  • Naturally parallel programs are programs that may be divided into nodes, each node having a finite set of instructions and data.
  • the system comprises a loading means for processing source code and network messages to create an execution graph that resides in a memory means.
  • the execution graph comprises at least one node, each node having an associated address.
  • a loading means processes each node to create an execution graph.
  • the execution graph is an arbitrary network and may be a hierarchical representation of the order in which the nodes are to be executed. All nodes at the same level in a vertical grouping are considered parallel events and may be executed in any order.
  • a queuing means takes nodes that are ready for execution from the execution graph and places the addresses of these nodes on a queue.
  • One or more execution means take addresses off the queue and execute the node that is stored at that address.
  • FIG. 1 is a block diagram illustrating the components of the invention.
  • FIG. 2 is a diagram illustrating the execution graph.
  • FIG. 3 is a block diagram illustrating the executable queue.
  • FIG. 4 is a block diagram illustrating the loading means.
  • FIG. 5 is a block diagram illustrating the communication to external applications.
  • FIG. 6 is a block diagram illustrating same queue execution.
  • FIG. 7 is a block diagram illustrating round robin execution.
  • FIG. 8 is a block diagram illustrating fixed cell execution.
  • this invention is a system for executing naturally parallel programs on at least one processor.
  • Naturally parallel programs are programs that may be divided into nodes, each node being a finite set of instructions and data.
  • the system 100 comprises a loading means 130 that processes source code files 120 and network messages 125 into nodes, as for example N1-N5.
  • the nodes are placed in an execution graph 140.
  • the execution graph resides in memory means 105 that contains an execution space 135.
  • the execution space 135 may be virtual memory and may contain utilities 141 to add, delete or modify nodes, an execution graph 140, and a list of free nodes 142.
  • the execution graph 140 contains at least one node, each node having a finite set of instructions and data and wherein each node has an associated address 165.
  • the execution graph 140 is any arbitrary arrangement of linked nodes that creates a network. Note that the arbitrary arrangement of linked nodes in an execution graph 140 only needs to be meaningful.
  • the arbitrary network may include, inter alia, a hierarchical representation of the order in which the nodes are to be executed, a neural net, cells in a numerical method of finite element analysis, or any arbitrary structure that achieves the programmer's goals.
  • a queuing means 150 queues at least one address associated with one of the at least one node responsive to the execution graph 140 thereby creating a queue 160 of addresses 165 to nodes which may be executed in parallel.
  • One or more execution means 170 read addresses from the queue 160, de-reference the address 165 and execute the node in the memory means 105 stored at that address.
  • a network is a collection of nodes or objects that are linked together. Nodes and objects are used interchangeably in this document.
  • a node is a finite set of instructions and data. Nodes may be executable entities, information, or any combination of the two. Nodes may be Windows, buttons, sliders, mathematical expressions, etc.
  • When a node executes it performs an action, such as adding two numbers together or flashing a message.
  • when a node finishes executing, it causes all nodes linked to it to be executed. Neither data nor messages need to be passed to the linked nodes for them to execute, and unlike conventional object oriented programming, explicit method calls are not necessary to execute an object.
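The trigger-on-completion behavior described above can be sketched in Python. This is purely illustrative: the patent's nodes hold binary instructions and the class and function names here are assumptions, but the key property is preserved, namely that a finished node triggers its linked nodes without passing data, messages, or method calls.

```python
from collections import deque

class Node:
    def __init__(self, name, action, links=()):
        self.name = name
        self.action = action      # the node's "finite set of instructions"
        self.links = list(links)  # nodes triggered when this node finishes

def run(start):
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        node.action()             # perform the node's action
        order.append(node.name)
        queue.extend(node.links)  # linked nodes are queued; nothing is passed
    return order

c = Node("C", lambda: None)
b = Node("B", lambda: None, links=[c])
a = Node("A", lambda: None, links=[b])
print(run(a))  # -> ['A', 'B', 'C']
```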
  • execution graph 140 provides a high- level execution framework.
  • the execution graph 140 is a network of nodes representing an arbitrary network. Note that a set of nodes need not be executed in any particular order.
  • the execution graph provides the medium of node execution and the medium of node organization. Nodes are further defined under the queuing means, below, but as stated earlier, a node is a finite set of instructions and data.
  • the execution graph 140 can be linear, hierarchical, or any conceivable shape that achieves the necessary computational results.
  • the use of an execution graph makes a program's functionality dependent on the type of nodes being executed as well as the shape of the graph executing the nodes.
  • an arbitrary network may be a hierarchical representation of the order in which the nodes are to be executed.
  • the execution graph 140 may also represent a neural net, or simulate a set of complex interacting components such as mechanical gears.
  • the nodes in an execution graph 140 may also represent cells in a numerical method of finite element analysis. In other words, the execution graph may be any structure that achieves the programmer's goals.
  • executable nodes are linked together to form an execution graph 200 which is stored in memory means 105.
  • the queuing means 150 places the addresses of nodes ready for execution on the queue 160.
  • the address of node 201 is first placed into the execution queue 160 for execution.
  • the event that triggers queuing could be the press of a button, the arrival of a message, the opening of a window, etc.
  • nodes 203, 205, and 207 are queued for execution.
  • node 209 and nodes 211 and 213 execute.
  • Node 216 is called a tie node because it will execute only after both nodes 209 and 215 have executed.
  • when tie node 216 executes, node 217 is executed and the program represented by execution graph 200 is complete.
  • a method for controlling the dependent execution of a node is by using a special object type called a 'tie'.
  • the tie is linked to the nodes that must complete execution. Each linked node is represented by a bit within the tie node. After a linked node executes it queues a message indicating its predefined bit number and the address of the tie. This in turn causes the tie node to be queued for execution. When the tie executes, it sets the corresponding bit of the node that triggered it and then tests whether all bits have been set. If so, the dependent node is queued for execution and the bits are cleared.
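A minimal sketch of the tie mechanism: one bit per linked predecessor, with the dependent node queued only once every bit is set. The class and field names are assumptions for illustration; only the bitmask behavior comes from the text.

```python
class Tie:
    def __init__(self, n_predecessors, dependent):
        self.expected = (1 << n_predecessors) - 1  # one bit per linked node
        self.bits = 0
        self.dependent = dependent  # node queued once all bits are set

    def signal(self, bit_number, queue):
        # a finished predecessor queues a message carrying its bit number
        self.bits |= 1 << bit_number
        if self.bits == self.expected:
            queue.append(self.dependent)  # dependent node is now ready
            self.bits = 0                 # bits are cleared for reuse

queue = []
tie = Tie(2, "node-217")
tie.signal(0, queue)           # first predecessor done: not yet triggered
assert queue == []
tie.signal(1, queue)           # second done: dependent node is queued
assert queue == ["node-217"]
```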
  • Nodes in any portion of the graph can be triggered for execution. There are no “do-while”, “for”, “go-to” or “switch” statements. As explained in the section on the queuing means, the flow of execution depends on the connectivity between nodes and the properties of the node being executed.
  • a node's executable logic consists of a finite sequence of binary instructions.
  • the execution means 170 calls the address of the first instruction in the node.
  • the sequence executes up to the last instruction, which must be a return.
  • the sequence of binary instructions may contain jump instructions that advance the execution means' 170 instruction pointer to an instruction within the sequence, but there can never be a jump backward instruction in the sequence. The inclusion of a backward jump could cause execution of the sequence to be endless, and control would never be returned to the execution means' main program, preventing the execution means from fetching and executing another node.
  • the assembler used by the execution means 170 to produce a node's executable logic does not allow a backward jump to occur.
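The no-backward-jump rule can be approximated with a small validator. This is a hedged sketch, not the patent's assembler: the instruction encoding is hypothetical, and only the forward-jump constraint (which guarantees every node's logic terminates and returns control) comes from the text.

```python
def validate(instrs):
    """instrs: list of ('op', arg) tuples; a 'jmp' arg is a target index."""
    for i, (op, arg) in enumerate(instrs):
        if op == "jmp" and arg <= i:
            return False  # backward (or self) jump: sequence could loop forever
    return True

# Forward jumps are fine; the sequence still reaches its final return.
assert validate([("add", None), ("jmp", 3), ("sub", None), ("ret", None)])
# A backward jump is rejected at load time.
assert not validate([("add", None), ("jmp", 0), ("ret", None)])
```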
  • the system's programming language is used to describe the properties of an object.
  • the language uses keywords to specify object types and datums that are to receive values.
  • the first statement defines an object of type 'form' (equivalent to a Window) and the second statement places an object of type 'button' on the form: Form: Example rgn:10 10 500 400 brushcolor1: 0 0 255 brushcolor2: 0 255 0 style:sys;
  • the operator ">>" tells the loading means 130 to link the button to the form (the button becomes a child of the form).
  • the first word is the name of a predefined class type in the system (also nodes) and the keywords following it are the names of datums or member methods that process the arguments following the keywords.
  • the keywords can be defined in any order.
  • Because nodes have a well-defined location in the execution graph they can be easily modified, replaced or deleted while the application they are a part of continues to execute. This capability is known as surgical modification.
  • Surgical modification is important for doing remote computing.
  • a user may step into the execution graph residing on a different computer (provided they have access) test logic using an interactive debugger and make corrections as necessary. Once corrections have been made, these fixes can be propagated to all computers in a network without inconveniencing the user.
  • a network message 125 is sent to the loading means 130 containing the desired graph.
  • the message includes a query identifying the node to be affected, the operation to occur (add, delete or modify) and a node or a piece of an execution graph. If the node being modified is pending execution, the system will wait for it to complete before making the modification.
  • Surgical modification is a feature allowing a user to fix problems with an application without having to ship whole executables or dynamic link libraries. To ensure security, any time a surgical modification is performed, the system registers the identity of the system that caused the change to occur.
  • the queuing means 150 queues addresses 165 associated with the nodes on the execution graph 140 that are ready for execution. As shown in FIG. 3, in the preferred embodiment the queue 160 may be a FIFO (First In First Out) queue and a link object.
  • the FIFO queue may be an array of 65536 addresses.
  • the address of the node is placed into the bottom of the queue and an "end" index pointer 368 is incremented.
  • the execution means 170 reads an address of a node from the queue 160.
  • the "start" index-pointer 365 is then incremented (incrementing an index pointer past 0xffff returns the value to zero).
  • the address of the node is de-referenced and a call is made to transfer execution to the first executable instruction in the node 370.
  • the object's logic is executed at 375. If the node is not executable it should not have been queued; but if it has been, the first instruction should be a <return>.
  • the last portion of the logic queues the address of the node's link at 380. This proceeds until there are no more addresses in the queue, in which case the system stops execution.
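The queue mechanics of FIG. 3 can be modeled directly: a fixed array of 65536 addresses with "start" and "end" index pointers that wrap past 0xffff back to zero. Python here stands in for the system's assembler; the variable names are illustrative.

```python
QSIZE = 65536
queue = [None] * QSIZE  # array of 65536 addresses
start = end = 0         # the "start" (365) and "end" (368) index pointers

def enqueue(addr):
    global end
    queue[end] = addr
    end = (end + 1) & 0xFFFF    # incrementing past 0xffff returns to zero

def dequeue():
    global start
    addr = queue[start]
    start = (start + 1) & 0xFFFF
    return addr

enqueue(0x1000)
enqueue(0x2000)
assert dequeue() == 0x1000      # first in, first out
assert dequeue() == 0x2000
```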
  • a special node is queued when the system starts up.
  • the system determines which objects they apply to (buttons, sliders, fields, etc.).
  • a link is an executable node. When a link executes it can either queue the address of the nodes on the queue or execute the node directly. The latter method reduces overhead.
  • Because nodes are persistent within the system, they are immediately available for execution at all times. Objects can be executed as quickly as they can be swapped into memory. The overhead associated with creating a task, allocating a process id, allocating system resources, opening an executable file, mapping it into virtual memory and performing dynamic run time linking is gone.
  • the following code is an example of code that may be used to read an executable graph from a file, put it in memory means 105, and to place the nodes ready for execution on the queue.
  • the numbers in brackets "[]" to the left of the code are reference numerals that are referred to in the explanation of the code, below.
  • Memory is allocated at step [1010].
  • the executable graph is loaded into memory at step [1015].
  • the addresses of the nodes ready for execution are queued at step [1017].
  • the address for the beginning of memory is moved to the register ebp at step [1019].
  • Step [1020] starts the beginning of a loop named read_queue.
  • the index for the start of the queue is put into register ebx at step [1021].
  • the address of the next node on the queue is retrieved at step [1023].
  • the address of the node to be executed is computed by adding the address of the node to the address of the beginning of memory at step [1025]. Note all addresses of the nodes may be dereferenced by adding the address of the node on the queue to the beginning of memory.
  • the index to the queue is incremented at step [1027].
  • the node is executed at step [1029].
  • the read_queue loop is repeated at step [1031].
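The steps [1010]-[1031] above can be simulated in Python (the original is assembly; register names like ebp and ebx become ordinary variables here, and the base address is an assumed placeholder). Each queued address is an offset that is dereferenced by adding the beginning of memory before the node is called.

```python
memory_base = 0x40000000            # assumed value returned by the allocator [1010]
queue = [0x100, 0x250, 0x3A0]       # node offsets queued for execution [1017]
executed = []

qstart = 0                          # queue start index (kept in ebx) [1021]
while qstart < len(queue):          # the read_queue loop [1020]
    offset = queue[qstart]          # next node address from the queue [1023]
    absolute = memory_base + offset # add the beginning of memory [1025]
    qstart += 1                     # increment the queue index [1027]
    executed.append(absolute)       # stands in for calling the node [1029]

assert executed == [0x40000100, 0x40000250, 0x400003A0]
```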
  • Nodes are objects instantiated from classes that are themselves objects.
  • Objects can be functions, external applications, controls, forms, mathematical expressions, etc.
  • Data can flow into a node in the form of buffered messages or arguments passed to a function.
  • Data can also be referenced as operands in mathematical expressions. The way data is handled by a node or propagates through the system is independent of how the logic executes.
  • There are two types of nodes: executable nodes that contain instructions, and link nodes that contain links to executable nodes.
  • executable nodes may have the following structure:
  • header defining the node's length and other properties
  • queue[qend] address_of_link; If a node is linked to another it must have an associated link.
  • the structure of a link is as follows:
  • the code sequence below is the link instructions.
  • the address of the link node is stored in register "ebx".
  • the offset to the list of addresses (LINK_ADDR) is added to the register at step [2010].
  • An address from the list is put into register "ecx" and if zero then the end of the list has been reached and control returns to the main program, step [2016]. If it is not zero, the node is executed at step [2020].
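The link traversal at steps [2010]-[2020] amounts to walking a zero-terminated list of node addresses, executing each one until the sentinel. A Python rendering of that logic (function names are illustrative):

```python
def run_link(link_addrs, execute):
    """Walk a zero-terminated address list, executing each node."""
    for addr in link_addrs:
        if addr == 0:        # zero marks the end of the list [2016]
            return           # control returns to the main program
        execute(addr)        # otherwise the node is executed [2020]

ran = []
run_link([0x10, 0x20, 0, 0x30], ran.append)
assert ran == [0x10, 0x20]   # nothing past the zero sentinel runs
```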
  • the memory means 105 may be system executable memory that can reside on
  • the memory means 105 may be virtual memory and manages objects or nodes of up to 2 gigabytes within a flat addressable space of 2 gigabytes per system.
  • the nodes in an execution graph can be linked to nodes in any other execution graph, creating an executable virtual space equal in size to all the space managed by all system-enabled computers that are network connected. This creates a fine-grained global execution space in which any node can discretely trigger any other node.
  • the execution graph provides the method of organization as well as execution. This organizational approach is referred to as an object centric environment vs. a file centric environment.
  • the object representing the document is queued and executed by selecting the object.
  • This invention is a complete computing system. It unifies elements of computation, communication, and data management. It executes its own logic, has its own memory manager, has its own scaleable quality of service protocol and uses a
  • the system does not use a classic assembly or compilation process to load objects.
  • Objects arrive into the system via network messages 125 or source code stored in files 120.
  • the definition of an object is descriptive, not binary.
  • An object's description is processed by the loading means 130 and a binary object with data and instructions is produced and linked into the execution graph 140. Objects combine data and binary executable instructions in the same entity.
  • An object's binary instructions are produced by a compilation process that is part of the loading means
  • the compilation process is table driven making use of a "generic" assembly language.
  • the assembly language is defined using an execution graph 140 that defines its syntax and semantic rules. Additionally, the table includes the processor's native hex instruction codes that are produced as a consequence of the compilation process. The native hex codes are processor dependent and can be updated for any type of processor. This makes the execution graph 140 completely portable and platform independent.
  • register names and operations have been abstracted. Doing this allows one to develop efficient code without having to worry about portability issues.
  • the system makes special use of Pentium registers edi, esi and ebp.
  • Other registers including eax, ebx, ecx are general purpose.
  • the use of a generic assembler insulates the programmer from these special use registers.
  • General use registers are labeled
  • the generic assembler emulates the additional registers using cache memory.
  • the goal of the generic assembler is to allow a developer to write efficient low level code that is portable across incompatible processors.
  • Viruses occur when illegal sequences of code invade the system.
  • the code sequences can be designed to destroy or modify information or simply hang the system.
  • Objects being loaded into the system do not have binary executable logic associated with them.
  • the executable portion of the object is generated at the time of load. This tactic makes it difficult to introduce illegal code sequences into the system.
  • the system responds by marking the object as non-executable.
  • the system is constantly verifying the integrity of objects in pages of memory. If a page of memory or an object is corrupted the memory manager flags the page or object so that it will not be used.
  • Before a node can be executed it must be loaded into the execution graph by the loading means 130. As shown in FIG. 4, nodes can be loaded from source files 120 or as messages received over the network 125.
  • the first step in loading nodes is to determine where they will reside in the execution graph.
  • a query 422 precedes a node definition. The query references an existing node within the network.
  • an editing operation may also be specified. There are three editing operations: add, remove and modify. If no operation is specified the system assumes the node is being added.
  • the loading means 130 parses 424 the node to determine its type, size and instantiation values.
  • the loading means 130 allocates memory 426 in the memory means 105 for the node.
  • When the system starts up, it allocates memory in the memory means 105 for its page frame buffer, opens the file containing the execution graph and loads the graph into memory means 105. Next all external utilities and objects are registered and linked into the system. In other words, the run time addresses of all external executables are stored in the execution graph 140 as another node.
  • This technique allows a developer to write executable logic in a language other than the system language.
  • the addresses of external applications 540 and 550 are added to the execution graph 140 in the execution space 135.
  • the run time addresses are stored in the execution graph as another node. External applications may include Windows DCOM and Unix CORBA applications.
  • the queuing means 150 then queues the addresses of all nodes that are to be executed when the system starts. Finally control branches to the main program of the execution means 170.
  • This invention supports more than one execution means 170. Therefore, all nodes whose addresses are on the queue may be executed in parallel because the nodes are independent of the number of processors available to execute them or the number of computers over which they can be partitioned and executed on.
  • the queue provides a metric for the instantaneous computing load required of any computer at any instant in time.
  • the number of addresses in the queue multiplied by the execution time for the object represents an execution backlog.
  • this metric provides the information on how and when to partition a program and distribute its execution to achieve optimal load balancing.
  • the metric also allows the system to determine when a computer has reached a critical backlog.
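The backlog metric described above is a simple product: queue length times per-node execution time. A numeric sketch (the threshold value is an assumption, chosen only to illustrate the critical-backlog test):

```python
def backlog(queue_len, avg_node_time_ms):
    """Instantaneous execution backlog: queued addresses x time per node."""
    return queue_len * avg_node_time_ms

CRITICAL_MS = 500                       # assumed critical-backlog threshold
assert backlog(120, 3) == 360           # below threshold: keep work local
assert backlog(300, 3) > CRITICAL_MS    # critical: partition and distribute
```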
  • the same queue execution method is most suitable for machines with no more than four execution means, because each execution means must share memory with other processors and each processor takes turns retrieving a node address from the instruction queue. As shown in FIG. 6, the same queue execution method is implemented by one queue being accessed by multiple execution means.
  • Initially queue 510 contains nodes N1-N3.
  • Processor 515 executes node N1 and when finished node N4 is placed at the end of queue 510.
  • Processor 520 executes node N2 and when finished places node N5 at the end of queue 510. All nodes in the queue 510 are considered parallel events and can be executed in any order. This ensures that nodes already in memory can be executed while other nodes are being swapped into memory. This allows the implementation of an optimal page swapping algorithm.
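Same-queue execution can be sketched with a couple of worker threads sharing one queue, standing in for the execution means in FIG. 6. Because queued nodes are independent parallel events, the execution order is deliberately left unspecified; only the set of executed nodes is guaranteed. (This Python rendering is an illustration, not the patent's shared-memory implementation.)

```python
import queue as q
import threading

work = q.Queue()
done = []
lock = threading.Lock()

def worker():
    # each execution means takes turns pulling a node address off the queue
    while True:
        try:
            node, spawns = work.get_nowait()
        except q.Empty:
            return
        with lock:
            done.append(node)          # stands in for executing the node
        for s in spawns:               # a finished node queues its links
            work.put((s, []))

work.put(("N1", ["N4"]))               # N1 spawns N4 when it finishes
work.put(("N2", ["N5"]))               # N2 spawns N5
work.put(("N3", []))
threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert sorted(done) == ["N1", "N2", "N3", "N4", "N5"]
```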
  • Round Robin is most suitable for systems with more than four execution means 170. This scheme also assumes that all execution means have access to all memory.
  • the organization of round robin is a matrix of processors such as 710, 720, 730, 740 and 750. Each cell in the matrix contains a processor and an execution queue.
  • matrix 710 contains processor 711 and queue 712.
  • the processor 711 reads node addresses from its own queue 712 and executes them but queues addresses into adjacent queues. For example any nodes to be placed on the queue by processor 711 will be placed in the queue of matrix 740. This mechanism provides a natural means of achieving load balancing across the execution space.
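The round-robin arrangement can be modeled deterministically: each cell executes from its own queue but places spawned work on an adjacent cell's queue, spreading load across the matrix. The 4-cell ring topology below is an assumption for illustration; the patent describes a matrix of processor/queue cells.

```python
from collections import deque

cells = [deque() for _ in range(4)]   # 4 processor/queue cells (assumed ring)
neighbor = lambda i: (i + 1) % 4      # the "adjacent" cell for queuing

cells[0].append(("N1", ["N4"]))       # N1 starts on cell 0 and spawns N4
executed = []
for _ in range(8):                    # a few scheduling rounds
    for i, cq in enumerate(cells):
        if cq:
            node, spawns = cq.popleft()   # read from the cell's own queue
            executed.append((node, i))
            for s in spawns:              # but queue new work to the neighbor
                cells[neighbor(i)].append((s, []))

assert executed == [("N1", 0), ("N4", 1)]  # N4 ran on the adjacent cell
```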
  • matrices are 810, 820, 830, 840, 850,
  • Each processor reads from, executes and queues addresses to its own queue.
  • the execution space is partitioned across the processor space.
  • this system allows direct communication between nodes within its execution graph.
  • the execution graph in the execution space is a collection of linked nodes, which are objects. Objects can be controls, mathematical expressions, functions and external applications.
  • the system communicates with other system installations by sending messages to nodes. This allows any application running on a system to communicate directly with nodes within another system's execution space. For example, if a form has two-hundred fields on it that display results from two-hundred remote computations, each field would be linked to a discrete node within a discrete body of logic on the remote computer(s) and vice versa.
  • This technique of linking nodes between applications is also applied when linking external applications and objects.
  • the system does so by sending messages to the node that represents the external object.
  • All external applications used by the system must be represented by a node in the system.
  • the node identifies the name of the object and its location in the
  • Refrs are used in class definitions to create conceptual objects whose data can be distributed across a network, or they can exist as standalone objects that tie together portions of the execution graph 140.
  • Refrs are used in a program's logic to reference objects and datums. Datums are fields within an object, arguments passed to a function or standalone variables. Refrs act like the glue within the execution graph 140, allowing whole groups of objects to be triggered when the value they reference changes. When a portion of the execution graph 140 is distributed, the partitioning occurs at refr nodes. When the graph portion has been re-distributed, references are automatically updated with the IP addresses of the remote computer and the virtual address of the corresponding remote refr.
  • This system allows one to register external applications.
  • An enabled application makes a call to a register utility for permanent registration or to a logon utility for temporary registration. These utilities are part of a link library that must be included during the development of the application.
  • When an application registers itself with the system, two shared memory buffers are created. One buffer is used for sending messages and the other for receiving them. Buffers appear as data structures within C programs and objects within C++ programs. Applications use a collection of utilities or methods to connect, request, receive and send messages. The preferred embodiment of this system also uses these buffers.
  • the buffers are designed to allow the application and the system to read and write simultaneously without the use of semaphores. There is no need to copy a message from one address space to another as is classically done in intra-process communications. This technique allows out-of-process calls to execute almost as fast as in-process calls.
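One common way to get semaphore-free simultaneous reading and writing is a single-producer/single-consumer ring buffer, where the sender only advances the write index and the receiver only advances the read index. The sketch below illustrates that general technique; it is an assumption that the patent's shared buffers work this way, and the class and method names are invented for illustration.

```python
class MessageBuffer:
    """SPSC ring buffer: one side sends, the other receives, no locks."""
    def __init__(self, size=8):
        self.slots = [None] * size
        self.size = size
        self.w = 0   # write index: advanced only by the sending side
        self.r = 0   # read index: advanced only by the receiving side

    def send(self, msg):
        if (self.w + 1) % self.size == self.r:
            return False                      # buffer full
        self.slots[self.w] = msg
        self.w = (self.w + 1) % self.size
        return True

    def receive(self):
        if self.r == self.w:
            return None                       # buffer empty
        msg = self.slots[self.r]
        self.r = (self.r + 1) % self.size
        return msg

buf = MessageBuffer()
buf.send("query:field-17")
assert buf.receive() == "query:field-17"
assert buf.receive() is None                  # nothing left to read
```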

Abstract

This invention is a system for executing naturally parallel programs on at least one processor. The system comprises a loading means (130) for processing source code (120) and network messages (125) to create an execution graph (140) that resides in a memory means (105). The execution graph (140) comprises at least one node, each node having an associated address (165). A loading means (130) processes each node to create an execution graph (140). The execution graph (140) is an arbitrary arrangement of linked nodes that creates a network. All nodes at the same level in a vertical grouping are considered parallel events and may be executed in any order. A queuing means (150) takes nodes that are ready for execution from the execution graph (140) and places the addresses (165) of these nodes on a queue (160). One or more execution means (170) take addresses (165) off the queue (160) and execute the node that is stored at that address (165).

Description

NATURALLY PARALLEL COMPUTING SYSTEM AND METHOD
The present patent application claims priority to U.S. Provisional Patent Application Serial No. 60/069,433, entitled "C20S COMPUTING ENGINE," filed on December 12, 1997, and U.S. Provisional Application Serial No. 60/069,428 entitled "UNIFIED COMMUNICATION & COMPUTATION SYSTEM," filed on December 12, 1997.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method and system for executing naturally parallel programs on one or more processors.
2. Description of the Prior Art
Parallel computation can be defined as the simultaneous execution of multiple program instructions. Simultaneous execution is accomplished through the use of one or multiple processors. Since single processor computers can only execute one program's instruction at a time, to simulate simultaneous execution the operating system of the computer must take turns executing instructions from each active program. This process is called multi-tasking. When a single program is broken into multiple components that can be simultaneously executed, the process is called multi-threading, multi-tasking, or preemption. A program with multiple threads requires a programmer to use special utilities that direct the operating system how and when to execute these components.
Massively parallel computing involves computers with many processors. The programs developed for these computers are generally customized for the explicit computer and tailored to the communication protocols of the computers. Programs written for these massively parallel computers do not port well to computers with one or few processors.
The system and method of this invention achieves parallel computation over many programs without the use of multi-tasking or multi-threading. In addition, programs written to run on the system disclosed by this invention are highly portable.
Other means of parallel execution involve Data-Flow techniques, Petri Nets and Finite State Machines. These other techniques are designed around specific control strategies and/or involve the use of data as a fundamental element in their
design. This invention is purely an execution method for parallel computation and is independent of system data and how it is used. In all existing computational systems with the exception of neural nets, program functionality depends on the existence of many operation types for operating on data. This computing system allows programs to be developed where functionality is dependent on the shape of the program as well as on the operations used.
SUMMARY OF THE INVENTION
This invention is a system for executing naturally parallel programs on at least one processor. Naturally parallel programs are programs that may be divided into nodes, each node having a finite set of instructions and data. The system comprises a loading means for processing source code and network messages to create an execution graph that resides in a memory means. The execution graph comprises at least one node, each node having an associated address. A loading means processes each node to create an execution graph. The execution graph is an arbitrary network and may be a hierarchical representation of the ordering in which the nodes are to be executed. All nodes at the same level in a vertical grouping are considered parallel events and may be executed in any order. A queuing means takes nodes that are ready for execution from the execution graph and places the addresses of these nodes on a queue. One or more execution means take addresses off the queue and execute the node that is stored at that address.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram illustrating the components of the invention.
FIG. 2 is a diagram illustrating the execution graph.
FIG. 3 is a block diagram illustrating the executable queue.
FIG. 4 is a block diagram illustrating the loading means.
FIG. 5 is a block diagram illustrating the communication to external applications.
FIG. 6 is a block diagram illustrating same queue execution.
FIG. 7 is a block diagram illustrating round robin execution.
FIG. 8 is a block diagram illustrating fixed cell execution.
DETAILED DESCRIPTION OF THE INVENTION
As shown in FIG. 1, this invention is a system for executing naturally parallel programs on at least one processor. Naturally parallel programs are programs that may be divided into nodes, each node being a finite set of instructions and data. The system 100 comprises a loading means 130 that processes source code files 120 and network messages 125 into nodes, as for example N1-N5. The nodes are placed in an execution graph 140. The execution graph resides in memory means 105 that contains an execution space 135. In the preferred embodiment, the execution space 135 may be virtual memory and may contain utilities 141 to add, delete or modify nodes, an execution graph 140, and a list of free nodes 142. The execution graph 140 contains at least one node, each node having a finite set of instructions and data and wherein each node has an associated address 165. The execution graph 140 is any arbitrary arrangement of linked nodes that creates a network. Note that the arbitrary arrangement of linked nodes in an execution graph 140 need only be meaningful.
The arbitrary network may include, inter alia, a hierarchical representation of the ordering in which the nodes are to be executed, a neural net, cells in a numerical method of finite element analysis, or any arbitrary structure that achieves the programmer's goals. A queuing means 150 queues at least one address associated with one of the at least one node responsive to the execution graph 140, thereby creating a queue 160 of addresses 165 to nodes which may be executed in parallel. One or more execution means 170 read addresses from the queue 160, de-reference the address 165 and execute the node in the memory means 105 stored at that address.
This system executes programs consisting of networks. A network is a collection of nodes or objects that are linked together. Nodes and objects are used interchangeably in this document. A node is a finite set of instructions and data. Nodes may be executable entities, information, or any combination of the two. Nodes may be Windows, buttons, sliders, mathematical expressions, etc. When a node executes, it performs an action, such as adding two numbers together or flashing a message. When a node finishes executing it causes all nodes linked to it to be executed. Neither data nor messages need to be passed to the linked nodes for them to execute, and unlike conventional object oriented programming, explicit method calls are not necessary to execute an object.
In languages like Java or C++, there is no executable infrastructure. The order in which logic is executed is defined within the walls of a method or function. A program's logic must call the explicit method of an object when it is to be executed.
In contrast, in this system, the execution graph 140, explained below, provides a high-level execution framework.
1. Execution Graph
This system executes naturally parallel programs through the use of an event-driven asynchronous execution graph 140. The execution graph 140 is a network of nodes representing an arbitrary network. Note that a set of nodes need not be executed in any particular order. The execution graph provides the medium of node execution and the medium of node organization. Nodes are further defined under the queuing means, below, but as stated earlier, a node is a finite set of instructions and data.
The execution graph 140 can be linear, hierarchical, or any conceivable shape that achieves the necessary computational results. The use of an execution graph makes a program's functionality dependent on the type of nodes being executed as well as the shape of the graph executing the nodes. For example, an arbitrary network may be a hierarchical representation of the ordering in which the nodes are to be executed. The execution graph 140 may also represent a neural net, or simulate a set of complex interacting components such as mechanical gears. As another example, the nodes in an execution graph 140 may represent cells in a numerical method of finite element analysis. In other words, the execution graph may be any structure that achieves the programmer's goals.
Referring to FIG. 2, an arbitrary network may be a hierarchical representation of the ordering in which the nodes are to be executed. If the execution graph 140 represents a hierarchy, then the execution of nodes in the execution graph proceeds from left to right. All nodes at the same level in a vertical grouping are considered parallel events and may be executed in any order. Because the nodes in a vertical grouping may be executed in any order, the nodes that are already in memory are executed first. This makes processing efficient because one of the biggest bottlenecks in computation is reading and writing objects from the disk.
As shown in FIG. 2, in a hierarchical representation of an arbitrary network, executable nodes are linked together to form an execution graph 200 which is stored in memory means 105. To set up the nodes on the execution graph 200 for execution, the queuing means 150 places the addresses of nodes ready for execution on the queue 160. For example, the address of node 201 is first placed into the execution queue 160. The queuing process could be the press of a button, the arrival of a message, the opening of a window, etc. After node 201 executes, nodes 203, 205, and 207 are queued for execution. After these nodes execute, node 209 and nodes 211 and 213 execute. Node 216 is called a tie node because it will execute after the execution of node 209 and node 215. After tie node 216 executes, node 217 is executed and the program represented by execution graph 200 is complete.
As shown by node 216, in the preferred embodiment a method for controlling the dependent execution of a node is by using a special object type called a 'tie'. The tie is linked to the nodes that must complete execution. Each linked node is represented by a bit within the tie node. After a linked node executes it queues a message indicating its predefined bit number and the address of the tie. This in turn causes the tie node to be queued for execution. When the tie executes, it sets the corresponding bit of the node that triggered it and then tests to see if all bits have been set. If so, the dependent node is queued for execution and the bits are cleared.
Nodes in any portion of the graph can be triggered for execution. There are no "do-while", "for", "go-to" or "switch" statements. As explained in the section on the queuing means, the flow of execution depends on the connectivity between nodes and the properties of the node being executed.
2. Nodes
2.1 Node Reuse
An important consequence of this system is the reduction in program size and complexity. Unlike conventional applications that must include all objects and all links to all objects required for them to perform their designed functionality, objects added to this system are immediately available for use by any and all other objects.
2.2 Node's Executable Logic
A node's executable logic consists of a finite sequence of binary instructions. To execute the node, the execution means 170 calls the address of the first instruction in the sequence. The sequence executes up to the last instruction, which must be a return such as 0xc3 (Pentium hex value). The sequence of binary instructions may contain jump instructions that advance the execution means' 170 instruction pointer to an instruction within the sequence, but there can never be a jump backward instruction in the sequence. The inclusion of a backward jump could cause execution of the sequence to be endless, and control would never be returned to the execution means' main program, preventing the execution means from fetching and executing another node.
Therefore, the assembler used by the execution means 170 to produce a node's executable logic does not allow a backward jump to occur.
2.3 Loops
In conventional languages, a loop is accomplished using a backward jump instruction. If a programmer makes a mistake and fails to write a correct terminating condition, the loop will execute forever. If this happens the user must force the operating system to end the program. For example, in Windows this is accomplished by simultaneously pressing the keys <cntl>, <alt> and <del>.
In this system, the simplest of loops is accomplished by linking the node to itself so that when it finishes executing it queues its own address. Addresses are always added to the end of the queue. This simple technique allows the system to be executing many infinite loops without 'hanging' the system.
2.4 Programming Language
The system's programming language is used to describe the properties of an object. With the exception of writing mathematical expressions, the language uses keywords to specify object types and datums that are to receive values. In the example below, the first statement defines an object of type 'form' (equivalent to a Window) and the second statement places an object of type 'button' on the form:
Form: Example rgn:10 10 500 400 brushcolor1: 0 0 255 brushcolor2: 0 255 0 style:sys;
>> button: * title:Press Me rgn: 230 190 40 20 style:push;
The form's name is "Example". It is positioned at x=10, y=10 and is 500 by 400 pixels in size. It has a color gradient that goes from blue (RGB value 0 0 255) to green (RGB value 0 255 0), and it has a system button in the upper right hand corner allowing it to be closed. The button does not have a name, but it has the title "Press Me". It is located in the center of the form at x=230, y=190 with a size of 40 x 20 pixels. It is a push button (vs. a toggle button).
The operator ">>" tells the loading means 130 to link the button to the form (the button becomes a child of the form).
The first word is the name of a predefined class type in the system (also nodes) and the keywords following it are the names of datums or member methods that process the arguments following the keywords. The keywords can be defined in any order.
Mathematical expressions are written the way one would normally write them (A = B + C;). If the loading means 130 does not detect the use of the colon immediately following the first word, it assumes the object is of type math.
3. Surgical Modification of Executing Logic
Because nodes have a well-defined location in the execution graph they can be easily modified, replaced or deleted while the application they are a part of continues to execute. This capability is known as surgical modification. It allows the
performance of an executing program to be modified without having to inconvenience the user by shutting down and restarting the program.
Surgical modification is important for doing remote computing. A user may step into the execution graph residing on a different computer (provided they have access), test logic using an interactive debugger, and make corrections as necessary. Once corrections have been made, these fixes can be propagated to all computers in a network without inconveniencing the user.
To surgically modify an execution graph a network message 125 is sent to the loading means 130 containing the desired graph. The message includes a query identifying the node to be affected, the operation to occur (add, delete or modify), and a node or a piece of an execution graph. If the node being modified is pending execution, the system will wait for it to complete before making the modification.
Surgical modification is a feature allowing a user to fix problems with an application without having to ship whole executables or dynamic link libraries. To ensure security, any time a surgical modification is performed, the system registers the identity of the system that caused the change to occur.
4. Queuing Means
The queuing means 150 queues addresses 165 associated with the nodes on the execution graph 140 that are ready for execution. As shown in FIG. 3, in the preferred embodiment the queue 160 may be a FIFO (First In First Out) queue and a link object.
In the preferred embodiment, the FIFO queue may be an array of 65536 addresses.
To execute a node the address of the node is placed into the bottom of the queue and an "end" index pointer 368 is incremented. The execution means 170 reads an address of a node from the queue 160. The "start" index-pointer 365 is then incremented (incrementing an index pointer past 0xffff returns the value to zero). The address of the node is de-referenced and a call is made to transfer execution to the first executable instruction in the node 370. The object's logic is executed at 375. If the node is not executable it should not have been queued, but if it had been, the first instruction should be a <return>.
If there are nodes linked to the one executing, the last portion of the logic queues the address of the node's link at 380. This proceeds until there are no more addresses in the queue, in which case the system stops execution.
In the preferred embodiment, to keep the system alive in an interactive environment, a special node is queued when the system starts up. This special node
contains logic required to read messages from the Operating System. As messages are
received, the system determines which objects they apply to (buttons, sliders, fields,
windows, etc.) and adds the object's address to the execution queue. When there are no more messages, the node queues itself for later execution. A link is an executable node. When a link executes it can either queue the addresses of the linked nodes or execute the nodes directly. The latter method reduces overhead.
Because nodes are persistent within the system, they are immediately available for execution at all times. Objects can be executed as quickly as they can be swapped into memory. The overhead associated with creating a task, allocating a process id, allocating system resources, opening an executable file, mapping it into virtual memory and performing dynamic run time linking is gone.
The following code is an example of code that may be used to read an executable graph from a file, put it in memory means 105, and to place the nodes ready for execution on the queue. The numbers in brackets "[]" to the left of the code are reference numerals that are referred to in the explanation of the code, below.
//***
// Name: SAMPLE QUEUING MEANS
//***
int PASCAL WinMain( HINSTANCE crnt, HINSTANCE prev, LPSTR cmnd, int show )
{
    ui2 qend;            // unsigned 2-byte integer
    ui2 qstart;          // unsigned 2-byte integer
    ui4 queue[65536];    // 65536 element array

    //--Setup
    [1010]  allocate_memory();
    [1015]  load_graph_into_memory();
    [1017]  queue_first_address();

    // Start of memory   - ebp
    // Address of object - ebx
    _asm{
    [1019]      mov ebp, memory              // move memory address to ebp
    [1020]  read_queue:
    [1021]      mov ebx, DWORD PTR [qstart]  // get qstart index
    [1023]      mov ebx, queue[ebx]          // get address of next object from queue
    [1025]      add ebx, ebp                 // compute address to current object
    [1027]      inc qstart                   // increment start of queue pointer index
    [1029]      call ebx                     // execute the object
    [1031]      jmp read_queue               // do it again
    }
    return (0);
}
Memory is allocated at step [1010]. The executable graph is loaded into memory at step [1015]. The addresses of the nodes ready for execution are queued at step [1017]. The address for the beginning of memory is moved to the register ebp at step [1019]. Step [1020] starts the beginning of a loop named read_queue. The index for the start of the queue is put into register ebx at step [1021]. The address of the next node on the queue is retrieved at step [1023]. The address of the node to be executed is computed by adding the address of the node to the address of the beginning of memory at step [1025]. Note all addresses of the nodes may be dereferenced by adding the address of the node on the queue to the beginning of memory. The index to the queue is incremented at step [1027]. The node is executed at step [1029]. The read_queue loop is repeated at step [1031].
Nodes are objects instantiated from classes that are themselves objects.
Objects can be functions, external applications, controls, forms, mathematical expressions, etc. Data can flow into a node in the form of buffered messages or arguments passed to a function. Data can also be referenced as operands in mathematical expressions. The way data is handled by a node or propagates through the system is independent of how the logic executes.
There are two types of nodes, executable nodes that contain instructions and link nodes that contain links to executable nodes. In the preferred embodiment executable nodes may have the following structure:
1) six byte far jump to beginning of action logic;
2) header defining the node's length and other properties;
3) binary instructions to be executed; and
4) binary instructions that add the address of the node's link into the execution queue.
When the "call ebx" instruction at step [1029] above is executed, control branches to the jump command which increments the processor's instruction pointer to the first instruction in the node. The last instruction in the node is the address of the link. This address is placed into the queue. The C language representation of queuing an instruction is:
++qend;
queue[qend] = address_of_link;

If a node is linked to another it must have an associated link. In the preferred embodiment, the structure of a link is as follows:
1) six byte far jump to beginning of link action;
2) header defining the link's length and other properties;
3) list of addresses of linked nodes; and
4) binary instructions that comprise the link's action.
The code sequence below shows the link's instructions. First, the address of the link node is stored in register "ebx". The offset to the list of addresses (LINK_ADDR) is added to the register at step [2010]. An address from the list is put into register "ecx"; if it is zero, the end of the list has been reached and control returns to the main program, step [2016]. If it is not zero, the node is executed at step [2020].
[2010]      add ebx, LINK_ADDR        // move to the first address
[2012]  anuderone:
[2014]      mov ecx, DWORD PTR [ebx]  // put address of node into ecx
[2016]      jecxz returnx             // if zero return
[2018]      add ecx, ebp              // move to object
[2020]      call ecx                  // execute object
[2022]      add ebx, 4                // move to next address
[2024]      jmp anuderone
[2026]  returnx:
5. Memory Means
The memory means 105 may be system executable memory that can reside on one computer or span many computers in a network. Programs or applications do not exist as separate executable files that are loaded and dynamically linked prior to being executed. Instead, all applications exist as a collection of linked objects or nodes, and any object or node that is placed on the queue is immediately executable. In the preferred embodiment, the memory means 105 may be virtual memory that manages objects of up to 2 gigabytes and a flat addressable space of 2 gigabytes per system. However, the nodes in an execution graph can be linked to nodes in any other execution graph, creating an executable virtual space equal in size to all the space managed by all system enabled computers that are network connected. This creates a fine-grained global execution space in which any node can discretely trigger any other node.
Unlike conventional systems that use files and directories, in this system, all information, regardless of its content or use, must reside in objects linked into the execution graph. The execution graph provides the method of organization as well as execution. This organizational approach is referred to as an object centric environment vs. a file centric environment.
In the object centric environment there are no applications to start. If a user wants to edit a document, for example, the object representing the document is queued and executed by selecting the object.
This invention is a complete computing system. It unifies elements of computation, communication, and data management. It executes its own logic, has its own memory manager, has its own scalable quality of service protocol and uses a virtual processor/assembler to generate platform dependent binary executable objects. The system does not use a classic assembly or compilation process to load objects. Objects arrive into the system via network messages 125 or source code stored in files 120. The definition of an object is descriptive, not binary. An object's description is processed by the loading means 130 and a binary object with data and instructions is produced and linked into the execution graph 140. Objects combine data and binary executable instructions in the same entity. An object's binary instructions are produced by a compilation process that is part of the loading means
130. The compilation process is table driven making use of a "generic" assembly language. The assembly language is defined using an execution graph 140 that defines its syntax and semantic rules. Additionally, the table includes the processor's native hex instruction codes that are produced as a consequence of the compilation process. The native hex codes are processor dependent and can be updated for any type of processor. This makes the execution graph 140 completely portable and platform independent.
To make the assembly language portable over different environments, register names and operations have been abstracted. Doing this allows one to develop efficient code without having to worry about portability issues. For example the system makes special use of Pentium registers edi, esi and ebp. Other registers including eax, ebx, ecx are general purpose. The use of a generic assembler insulates the programmer from these special use registers. General use registers are labeled
regl ... reglO. Even though the Pentium does not support this many registers, the generic assembler emulates the additional registers using cache memory. The goal of the generic assembler is to allow a developer to write efficient low level code that is portable across incompatible processors.
Another advantage to the generic assembler is the prevention of virus attack. Viruses occur when illegal sequences of code invade the system. The code sequences can be designed to destroy or modify information or simply hang the system. When a program is loaded into this system's environment, it is converted into persistent native executable objects. This means that programs are loaded only once. After a program is loaded it resides in the environment until it is deleted, modified or replaced. Programs become a part of the system and require no process loading procedure to execute them.
Objects being loaded into the system do not have binary executable logic associated with them. The executable portion of the object is generated at the time of load. This tactic makes it difficult to introduce illegal code sequences into the system.
If an illegal sequence does find its way into the execution graph 140 and causes a problem that can be detected by the system or identified by a user, the system responds by marking the object as non-executable.
Another approach to detecting illegal code sequences is for the loading
means to scan the executable portions of the objects in its execution graph. The binary codes in the executable stream can be compared to valid codes in the execution graph. If illegal instructions are found the object can be marked nonexecutable.
Finally, the system is constantly verifying the integrity of objects in pages of memory. If a page of memory or an object is corrupted the memory manager flags the page or object so that it will not be used.
6. Loading Means
Before a node can be executed it must be loaded into the execution graph by the loading means 130. As shown in FIG. 4, nodes can be loaded from source files 120 or as messages received over the network 125. The first step in loading nodes is to determine where they will reside in the execution graph. In the preferred embodiment, a query 422 precedes a node definition. The query references an existing node within the network. Next an editing operation must be specified. There are three editing operations: add, remove and modify. If no operation is specified the system assumes the node is being added.
Once the location and editing operation has been determined, the loading means 130 parses 424 the node to determine its type, size and instantiation values. The loading means 130 allocates memory 426 in the memory means 105 for the node,
loads any data values into the node and invokes the math grammar parser or assembler
428 to generate the executable portion of the node. Finally the node is linked into the execution graph 140 kept in the execution space 430 where it is immediately available for execution. This process continues until all objects have been processed.
7. Execution Means
When the system starts up, it allocates memory in the memory means 105 for its page frame buffer, opens the file containing the execution graph and loads the graph into memory means 105. Next all external utilities and objects are registered and linked into the system. In other words, the run time addresses of all external executables are stored in the execution graph 140 as another node. This technique allows a developer to write executable logic in a language other than the system language. As shown in FIG. 5, the addresses of external applications 540 and 550 are added to the execution graph 140 in the execution space 135. In other words, the run time addresses are stored in the execution graph as another node. External applications may be Windows DCOM and Unix CORBA. The queuing means 150 then queues the addresses of all nodes that are to be executed when the system starts. Finally control branches to the main program of the execution means 170.
This invention supports more than one execution means 170. Therefore, all nodes whose addresses are on the queue may be executed in parallel because the nodes are independent of the number of processors available to execute them or the number of computers over which they can be partitioned and executed on.
Note that in this system programs no longer exist as bound executables but rather as a collection of objects that are part of the system and that any other program may make use of. In other words, programs developed for the system become a permanent part of the system that extends the system's functionality. After objects are loaded into the execution graph 140, they reside in the execution space 135 permanently until they are deleted. Objects do not require re-loading each time they are to be executed. The result of this system is a unified computing environment where there is no distinction between the operating system, executable programs and data files.
7.1 Load Balancing Metric
Additionally, the queue provides a metric for the instantaneous computing load required of any computer at any instant in time. The number of addresses in the queue multiplied by the execution time for the object represents an execution backlog. In a distributed computing environment where a program's functionality is spread across several network computers, this metric provides information on how and when to partition a program and distribute its execution to achieve optimal load balancing. The metric also allows the system to determine when a computer has reached a critical backlog.
7.2 Multiple Processor Topologies
Depending on the number of processors, three basic configurations for parallel computing may be used. These are: same queue execution; round robin; and fixed cell.
The same queue execution method is most suitable for machines with no more than four execution means. This is because each execution means must share memory with the other processors and each processor takes turns retrieving a node address from the instruction queue. As shown in FIG. 6, the same queue execution method is implemented by one queue being accessed by multiple execution means 170 or processors. Initially queue 510 contains nodes N1-N3. Processor 515 executes node N1 and when finished node N4 is placed at the end of queue 510. Processor 520 executes node N2 and when finished places node N5 at the end of queue 510. All nodes in the queue 510 are considered parallel events and can be executed in any order. This ensures that nodes already in memory can be executed while other nodes are being swapped into memory. This allows the implementation of an optimal page swapping algorithm.
Round robin is most suitable for systems with more than four execution means 170. This scheme also assumes that all execution means have access to all memory. As shown in FIG. 7, the round robin organization is a matrix of processors such as 710, 720, 730, 740 and 750. Each cell in the matrix contains a processor and an execution queue. For example, cell 710 contains processor 711 and queue 712. The processor 711 reads node addresses from its own queue 712 and executes them, but queues addresses into adjacent queues. For example, any nodes to be queued by processor 711 will be placed in the queue of cell 740. This mechanism provides a natural means of achieving load balancing across the execution space.
The fixed cell scheme is most suitable for massively parallel computers in which global memory sharing is not required. Like round robin, processors and queues exist in a matrix. As shown in FIG. 8, the cells are 810, 820, 830, 840, 850, 860, and 870. Each processor reads from, executes and queues addresses to its own queue. The execution space is partitioned across the processor space. When a node is to trigger a node outside of its cell, it must send a signal to the processor cell containing the node along with any information required to complete execution.
8. Between Nodes
Unlike client-server applications, which use ports, or DCOM and CORBA objects, which use registries, this system allows direct communication between nodes within its execution graph.
As explained earlier, the execution graph in the execution space is a collection of linked nodes, which are objects. Objects can be controls, mathematical expressions, functions, and external applications. The system communicates with other system installations by sending messages to nodes. This allows any application running on a system to communicate directly with nodes within another system's execution space. For example, if a form has two hundred fields that display results from two hundred remote computations, each field would be linked to a discrete node within a discrete body of logic on the remote computer(s), and vice versa. This tight coupling allows logic to be precisely partitioned and placed exactly where it needs to be for parallel distributed execution to occur most efficiently.
This technique of linking nodes between applications is also applied when linking external applications and objects. When one external object signals another, the system does so by sending messages to the node that represents the external object. All external applications used by the system must be represented by a node in the system. The node identifies the name of the object and its location in the
operating system's directory. Messages are sent to the external object using a shared memory buffer: the message is placed in shared memory, and an operating system message is then generated, causing the external object to activate and read the message from its buffer.
To communicate between nodes, the preferred embodiment implements a node type called a 'refr'. Refrs are used in class definitions to create conceptual objects whose data can be distributed across a network, or they can exist as standalone objects that tie together portions of the execution graph 140.
Refrs are used in a program's logic to reference objects and datums. Datums are fields within an object, arguments passed to a function, or standalone variables. Refrs act as the glue within the execution graph 140, allowing whole groups of objects to be triggered when the value they reference changes. When a portion of the execution graph 140 is distributed, the partitioning occurs at refr nodes. When the graph portion has been redistributed, references are automatically updated with the IP address of the remote computer and the virtual address of the corresponding remote refr.
9. External Applications
This system allows one to register external applications. An enabled application makes a call to a register utility for permanent registration or to a logon utility for temporary registration. These utilities are part of a link library that must be included during the development of the application.
When an application registers itself with the system, two shared memory buffers are created. One buffer is used for sending messages and the other for receiving them. Buffers appear as data structures within C programs and objects within C++ programs. Applications use a collection of utilities or methods to connect, request, receive and send messages. The preferred embodiment of this system also uses these buffers.
The buffers are designed to allow the application and the system to read and write simultaneously without the use of semaphores. There is no need to copy a message from one address space to another, as is classically done in inter-process communication. This technique allows out-of-process calls to execute almost as fast as in-process calls.
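One well-known structure with this property is a single-producer/single-consumer ring buffer, where the writer advances only the head index and the reader advances only the tail index, so neither side needs a semaphore. The sketch below illustrates the idea under that assumption; the patent does not disclose its buffer layout, and a real shared-memory implementation would additionally depend on memory-ordering guarantees that plain Python does not model.

```python
class MessageBuffer:
    """Illustrative single-producer/single-consumer ring buffer:
    `head` is written only by the sender, `tail` only by the receiver."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = 0   # advanced only by the producer
        self.tail = 0   # advanced only by the consumer

    def send(self, msg):
        """Producer side: returns False when the buffer is full."""
        if self.head - self.tail == self.capacity:
            return False
        self.slots[self.head % self.capacity] = msg
        self.head += 1  # publish the slot after writing it
        return True

    def receive(self):
        """Consumer side: returns None when the buffer is empty."""
        if self.tail == self.head:
            return None
        msg = self.slots[self.tail % self.capacity]
        self.tail += 1  # release the slot back to the producer
        return msg
```

Because each index has exactly one writer, the two sides can run concurrently without locking, matching the semaphore-free behavior described above.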
The above-described embodiments are given as illustrative examples only. It
will be readily appreciated that many deviations may be made from the specific embodiments disclosed in this specification without departing from the scope of the invention. Accordingly, the scope of the invention is to be determined by the claims below rather than being limited to the specifically described embodiments above.

Claims

1. A system for executing naturally parallel programs wherein each program is a set of nodes, comprising:
(A) memory means for storing at least one node, each node having a finite set of information and wherein each node has an associated address;
(B) loading means for processing each node to create an execution graph, such execution graph being an arbitrary network;
(C) queuing means for queuing at least one address associated with one of the at least one node responsive to the execution graph, thereby creating a queue of nodes which may be executed in parallel; and
(D) at least one execution means for executing a node associated with an address queued on the queue.
2. The system of claim 1, wherein the execution graph is a hierarchical representation of the order in which the nodes are to be executed.
3. The system of claim 1, wherein the execution graph represents a neural network.
4. The system of claim 1, wherein the execution graph represents a simulation of a set of complex interacting components.
5. The system of claim 1, wherein the execution graph represents cells in a numerical method of finite element analysis.
6. The system of claim 1, wherein the finite set of information in each node includes instructions and data.
7. The system of claim 1, wherein the means for queuing creates a first in first out queue.
8. The system of claim 7, wherein the queue is an array of 65536 addresses.
9. The system of claim 1, wherein the memory means comprises an execution space.
10. The system of claim 9, wherein the execution space spans one physical device.
11. The system of claim 9, wherein the execution space spans more than one physical device.
12. The system of claim 1, wherein the execution means translates the instructions to generic assembly language.
13. The system of claim 1, wherein the loading means performs the steps of:
(A) determining placement of the node in the execution graph;
(B) parsing the node to determine its type, size and instantiation values;
(C) allocating memory for the node;
(D) loading data values into the node;
(E) invoking the parser to generate executable instructions; and
(F) linking the node into the execution graph.
14. The system of claim 13 wherein the loading means includes an editing means for modifying the execution graph.
15. The system of claim 1, wherein there are multiple executing means executing in a round robin topology.
16. The system of claim 1, wherein there are multiple executing means executing in a fixed cell scheme.
17. The system of claim 1, wherein there are multiple executing means executing on the same queue.
18. A method for executing naturally parallel programs wherein each program is a set of nodes, comprising:
(A) storing at least one node, each node having a finite set of information and wherein each node has an associated address;
(B) loading each node to create an execution graph, such execution graph being an arbitrary network;
(C) queuing at least one address associated with one of the at least one node responsive to the execution graph, thereby creating a queue of nodes which may be executed in parallel; and
(D) executing at least one node whose address is queued on the queue.
19. The method of claim 18, wherein the ordering of the nodes on the execution graph is a hierarchical representation of the order in which the nodes are to be executed.
20. The method of claim 18, wherein the ordering of the nodes on the execution graph represents a neural network.
21. The method of claim 18, wherein the ordering of the nodes on the execution graph represents a simulation of a set of complex interacting components.
22. The method of claim 18, wherein the ordering of the nodes on the execution graph represents cells in a numerical method of finite element analysis.
23. The method of claim 18, wherein the finite set of information in each node includes instructions and data.
24. The method of claim 18, wherein the means for queuing creates a first in first out queue.
25. The method of claim 18, wherein the queue is an array of 65536 addresses.
26. The method of claim 18, wherein the memory means comprises an execution space.
27. The method of claim 26, wherein the execution space spans one physical device.
28. The method of claim 26, wherein the execution space spans more than one physical device.
29. The method of claim 18, wherein the execution means translates the instructions to generic assembly language.
30. The method of claim 18, wherein the loading means performs the steps of:
(A) determining placement of the node in the execution graph;
(B) parsing the node to determine its type, size and instantiation values;
(C) allocating memory for the node;
(D) loading data values into the node;
(E) invoking the parser to generate executable instructions; and
(F) linking the node into the execution graph.
31. The method of claim 18, wherein the loading means includes an editing means for modifying the execution graph.
32. The method of claim 18, wherein the nodes are executed in a round robin topology.
33. The method of claim 18, wherein the nodes are executed in a fixed cell scheme.
PCT/US1998/026436 1997-12-12 1998-12-11 Naturally parallel computing system and method WO1999030230A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP98962086A EP1058878A1 (en) 1997-12-12 1998-12-11 Naturally parallel computing system and method
AU17248/99A AU1724899A (en) 1997-12-12 1998-12-11 Naturally parallel computing system and method

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US6943397P 1997-12-12 1997-12-12
US6942897P 1997-12-12 1997-12-12
US60/069,428 1997-12-12
US60/069,433 1997-12-12

Publications (1)

Publication Number Publication Date
WO1999030230A1 true WO1999030230A1 (en) 1999-06-17

Family

ID=26750051

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/026436 WO1999030230A1 (en) 1997-12-12 1998-12-11 Naturally parallel computing system and method

Country Status (3)

Country Link
EP (1) EP1058878A1 (en)
AU (1) AU1724899A (en)
WO (1) WO1999030230A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4930102A (en) * 1983-04-29 1990-05-29 The Regents Of The University Of California Dynamic activity-creating data-driven computer architecture
US4972314A (en) * 1985-05-20 1990-11-20 Hughes Aircraft Company Data flow signal processor method and apparatus
US5043873A (en) * 1986-09-05 1991-08-27 Hitachi, Ltd. Method of parallel processing for avoiding competition control problems and data up dating problems common in shared memory systems
US5438680A (en) * 1988-04-29 1995-08-01 Intellectual Properties And Technology, Inc. Method and apparatus for enhancing concurrency in a parallel digital computer
US5465372A (en) * 1992-01-06 1995-11-07 Bar Ilan University Dataflow computer for following data dependent path processes
US5483657A (en) * 1992-02-26 1996-01-09 Sharp Kabushiki Kaisha Method of controlling execution of a data flow program and apparatus therefor
US5675757A (en) * 1988-07-22 1997-10-07 Davidson; George S. Direct match data flow memory for data driven computing

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1152331A2 (en) * 2000-03-16 2001-11-07 Square Co., Ltd. Parallel task processing system and method
EP1152331A3 (en) * 2000-03-16 2005-03-02 Kabushiki Kaisha Square Enix (also trading as Square Enix Co., Ltd.) Parallel task processing system and method
GB2425868B (en) * 2004-10-18 2007-07-04 Manthatron Ip Ltd Logic-based Computing Device and Method
US7822592B2 (en) * 2004-10-18 2010-10-26 Manthatron-Ip Limited Acting on a subject system
US7844959B2 (en) 2006-09-29 2010-11-30 Microsoft Corporation Runtime optimization of distributed execution graph
US8201142B2 (en) 2006-09-29 2012-06-12 Microsoft Corporation Description language for structured graphs

Also Published As

Publication number Publication date
EP1058878A1 (en) 2000-12-13
AU1724899A (en) 1999-06-28

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
NENP Non-entry into the national phase

Ref country code: KR

WWE Wipo information: entry into national phase

Ref document number: 1998962086

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP Wipo information: published in national office

Ref document number: 1998962086

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1998962086

Country of ref document: EP