US20100141665A1 - System and Method for Photorealistic Imaging Workload Distribution - Google Patents


Info

Publication number
US20100141665A1
Authority
US
United States
Prior art keywords
server
rendering
load balancing
processed
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US12/329,586
Other versions
US9270783B2
Inventor
Joaquin Madruga
Barry L. Minor
Mark R. Nutter
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US12/329,586 (patent US9270783B2)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Assignors: MADRUGA, JOAQUIN; MINOR, BARRY L.; NUTTER, MARK R.
Priority to JP2011539018A (patent JP5462882B2)
Priority to PCT/EP2009/066257 (publication WO2010063769A2)
Priority to CN200980148614.1A (patent CN102239678B)
Publication of US20100141665A1
Priority to US15/049,102 (patent US9501809B2)
Application granted
Publication of US9270783B2
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00: General purpose image data processing
    • G06T1/20: Processor architectures; Processor configuration, e.g. pipelining
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1001: Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004: Server selection for load balancing
    • H04L67/1023: Server selection for load balancing based on a hash applied to IP addresses or costs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00: 3D [Three Dimensional] image rendering
    • G06T15/005: General purpose rendering architectures
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1001: Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/50: Network services
    • H04L67/75: Indicating network or usage conditions on the user display
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00: Indexing scheme for image generation or computer graphics
    • G06T2210/52: Parallel processing
    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G: ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2352/00: Parallel handling of streams of display data

Definitions

  • The present invention relates generally to the field of computer networking and parallel processing and, more particularly, to a system and method for improved photorealistic imaging workload distribution.
  • Modern electronic computing systems, such as microprocessor systems, are often configured to divide a computationally intensive task into discrete sub-tasks.
  • Some systems employ cache-aware task decomposition to improve performance on distributed applications.
  • As the gap between fast local caches and large, slower memories widens, caching becomes even more important.
  • Typical modern systems therefore attempt to distribute work across multiple processing elements (PEs) so as to improve cache hit rates and reduce data stall times.
  • Ray tracing, a photorealistic imaging technique, can have very high spatial and temporal locality.
  • A cache-aware task distribution for ray tracing applications can therefore lead to high performance gains.
  • In a work-stealing scheme, each PE grabs new tiles after it has processed its prior allotment. Because the PEs grab tiles from a general pool, however, the tiles are less likely to have high spatial locality. Thus, in a work-stealing system, the PEs regularly flush their caches with new scene data and are therefore cold for the next frame, completely failing to take advantage of the task's spatial locality.
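The cache-locality contrast above can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the tile model and function names are invented for the example.

```python
def pool_assignment(num_tiles, num_pes):
    """Work-stealing style: tiles are handed out round-robin from a general
    pool, so a PE's tiles end up scattered across the frame."""
    assignment = {pe: [] for pe in range(num_pes)}
    for tile in range(num_tiles):
        assignment[tile % num_pes].append(tile)
    return assignment

def cache_aware_assignment(num_tiles, num_pes):
    """Cache-aware style: each PE owns one contiguous run of tiles, so the
    scene data it cached this frame is likely reused on the next frame."""
    per_pe = num_tiles // num_pes
    return {pe: list(range(pe * per_pe, (pe + 1) * per_pe))
            for pe in range(num_pes)}
```

With 8 tiles and 2 PEs, the pool scheme gives PE 0 tiles 0, 2, 4, 6 (scattered), while the cache-aware scheme gives it tiles 0 through 3 (one contiguous, spatially local region).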
  • A graphics client receives a frame, the frame comprising scene model data.
  • A server load balancing factor is set based on the scene model data.
  • A prospective rendering factor is set based on the scene model data.
  • The frame is partitioned into a plurality of server bands based on the server load balancing factor and the prospective rendering factor.
  • The server bands are distributed to a plurality of compute servers. Processed server bands are received from the compute servers.
  • A processed frame is assembled based on the received processed server bands. The processed frame is transmitted for display to a user as an image.
  • A system comprises a graphics client.
  • The graphics client is configured to: receive a frame, the frame comprising scene model data; set a server load balancing factor based on the scene model data; set a prospective rendering factor based on the scene model data; partition the frame into a plurality of server bands based on the server load balancing factor and the prospective rendering factor; distribute the plurality of server bands to a plurality of compute servers; receive processed server bands from the plurality of compute servers; assemble a processed frame based on the received processed server bands; and transmit the processed frame for display to a user as an image.
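The client-side flow (partition by per-server factors, then reassemble) can be sketched as below, assuming a frame reduces to a list of scan lines and the two factors reduce to a single normalized weight per compute server; that reduction is a simplification for illustration, not the patent's formulation.

```python
def partition_into_bands(frame_rows, server_weights):
    """Split the frame's rows into one band per server, sized in
    proportion to that server's weight."""
    total = sum(server_weights)
    bands, start = [], 0
    for i, w in enumerate(server_weights):
        if i == len(server_weights) - 1:
            end = len(frame_rows)  # last server takes the remainder
        else:
            end = start + round(len(frame_rows) * w / total)
        bands.append(frame_rows[start:end])
        start = end
    return bands

def assemble_frame(processed_bands):
    """Reassemble processed bands, in order, into a processed frame."""
    frame = []
    for band in processed_bands:
        frame.extend(band)
    return frame
```

Every row is assigned to exactly one band, so reassembling the processed bands in order reconstructs a full frame.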
  • FIG. 1 illustrates a block diagram showing an improved photorealistic imaging system in accordance with a preferred embodiment
  • FIG. 2 illustrates a block diagram showing an improved graphics client in accordance with a preferred embodiment
  • FIG. 3 illustrates a block diagram showing an improved compute server in accordance with a preferred embodiment
  • FIG. 4 illustrates a high-level flow diagram depicting logical operational steps of an improved photorealistic imaging workload distribution method, which can be implemented in accordance with a preferred embodiment
  • FIG. 5 illustrates a high-level flow diagram depicting logical operational steps of an improved photorealistic imaging workload distribution method, which can be implemented in accordance with a preferred embodiment
  • FIG. 6 illustrates a block diagram showing an exemplary computer system that can be configured to incorporate one or more preferred embodiments.
  • The present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
  • The computer-usable or computer-readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium.
  • More specific examples of the computer-readable medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a transmission medium such as those supporting the Internet or an intranet, or a magnetic storage device.
  • A computer-usable or computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
  • A computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • The computer-usable medium may include a propagated data signal with the computer-usable program code embodied therewith, either in baseband or as part of a carrier wave.
  • The computer usable program code may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc.
  • Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
  • The program code may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • The remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus.
  • The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
  • I/O devices can be coupled to the system either directly or through intervening I/O controllers.
  • Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
  • FIG. 1 is a high-level block diagram illustrating certain components of a system 100 for improved photorealistic imaging workload distribution, in accordance with a preferred embodiment of the present invention.
  • System 100 comprises a graphics client 110.
  • Graphics client 110 is a graphics client module or device, as described in more detail in conjunction with FIG. 2, below. Graphics client 110 couples to display 120.
  • Display 120 is an otherwise conventional display, configured to display digitized graphical images to a user.
  • Graphics client 110 also couples to a user interface 130.
  • User interface 130 is an otherwise conventional user interface, configured to send information to, and receive information from, a user 132.
  • Graphics client 110 receives user input from user interface 130.
  • In one embodiment, user input comprises a plurality of image frames, each frame comprising scene model data, the scene model data describing objects arranged in an image.
  • User input also comprises camera movement commands describing perspective (or “eye”) movement from one image frame to another.
  • Graphics client 110 also couples to network 140.
  • Network 140 is an otherwise conventional network.
  • In one embodiment, network 140 is a gigabit Ethernet network.
  • In an alternate embodiment, network 140 is an InfiniBand network.
  • Network 140 couples to a plurality of compute servers 150 .
  • Each compute server 150 is a compute server as described in more detail in conjunction with FIG. 3 , below.
  • Graphics client 110 couples to the compute servers 150 through network 140.
  • In an alternate embodiment, graphics client 110 couples to one or more compute servers 150 through a direct link 152.
  • In one embodiment, link 152 is a direct physical link.
  • In an alternate embodiment, link 152 is a virtual link, such as a virtual private network (VPN) link, for example.
  • In one embodiment, system 100 operates as follows.
  • User 132, through user interface 130, directs graphics client 110 to display a series of images on display 120.
  • Graphics client 110 receives the series of images as a series of digitized image “frames,” for example, by retrieving the series of frames from a storage on graphics client 110 or from user interface 130.
  • Generally, each frame comprises scene model data describing elements arranged in a scene.
  • Graphics client 110 partitions the frame into a plurality of server bands, each server band associated with a particular compute server 150, based on a server load balancing factor and a prospective rendering factor.
  • Graphics client 110 distributes the server bands to the compute servers 150.
  • Each compute server 150 (comprising a plurality of processing elements (PEs)) divides the received server bands (received as “raw display bands”) into PE blocks, each PE block associated with a particular PE, based on a PE load balancing factor.
  • In one embodiment, the compute servers 150 divide the server bands into PE blocks based on the PE load balancing factor and prospective rendering information received from the graphics client 110.
  • The compute servers 150 distribute the PE blocks to their PEs.
  • The PEs process the PE blocks, rendering the raw frame data and performing the computationally intensive work of turning the raw frame data into a form suitable for the target display 120.
  • Rendering can include ray tracing, ambient occlusion, and other techniques.
  • The PEs return the processed PE blocks to their parent compute server 150, which assembles the processed PE blocks into a processed display band.
  • In one embodiment, the compute servers 150 compress the processed display bands for transmission to graphics client 110. In some embodiments, one or more compute servers 150 transmit the processed display bands without additional compression. Each compute server 150 determines the time each of its PEs took to render its PE block and the total rendering time for the entire raw display band.
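The patent does not name a compression scheme. As one hedged illustration, a processed display band (treated here simply as raw bytes) could be compressed with zlib before transmission and decompressed on the graphics client:

```python
import zlib

def compress_band(band_bytes):
    """Compress a processed display band; a fast compression level suits
    real-time transmission better than maximum ratio."""
    return zlib.compress(band_bytes, level=1)

def decompress_band(payload):
    """Recover the processed display band on the graphics client."""
    return zlib.decompress(payload)
```

Rendered image bands typically contain enough redundancy that even fast lossless compression shrinks the payload, trading a little CPU time on the server for network bandwidth.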
  • The compute servers 150 adjust their PE load balancing factor based on the individual rendering times for each PE. In one embodiment, each compute server 150 also reports its total rendering time to graphics client 110.
  • Graphics client 110 receives the processed display bands and assembles the bands into a processed frame. Graphics client 110 transmits the processed frame to display 120 for display to the user. In one embodiment, graphics client 110 modifies the load balancing factor based on reported rendering times received from the compute servers 150.
  • Generally, graphics client 110 distributes unprocessed server bands to compute servers 150 based in part on the relative load between the servers and in part on prospective rendering information received from the user.
  • The compute servers 150 divide the unprocessed server bands into PE blocks based on the relative load between the PEs and the prospective rendering information.
  • The PEs process the blocks, which the compute servers 150 combine into processed bands and return to the graphics client 110.
  • Graphics client 110 assembles the received processed bands into a form suitable for display to a user. Both the compute servers 150 and graphics client 110 use rendering times to adjust load balancing factors dynamically.
  • Thus, system 100 can dynamically distribute the workload among the elements performing computationally intensive tasks. As the frame data changes, certain portions of the frame become more computationally intensive than others, and the system can respond by reapportioning the tasks so as to keep the response times roughly equivalent. As one skilled in the art will understand, roughly equivalent response times indicate a balanced load and help to reduce idle time for the PEs/servers.
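One plausible reading of this rendering-time feedback, sketched below, is an inverse-proportional reweighting: servers that finished sooner receive a larger share of the next frame. The specific formula is an illustrative assumption, not the patent's literal load balancing factor.

```python
def rebalance(render_times):
    """Given last frame's per-server render times, return new normalized
    band weights that assign more rows to the servers that finished sooner."""
    inv = [1.0 / t for t in render_times]
    total = sum(inv)
    return [x / total for x in inv]
```

For example, if server A took twice as long as server B, this rule gives A one third of the next frame and B two thirds, nudging their response times toward equality.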
  • FIG. 2 is a block diagram illustrating an exemplary graphics client 200 in accordance with one embodiment of the present invention.
  • Client 200 includes a control processing unit (PU) 202.
  • Control PU 202 is an otherwise conventional processing unit, configured as described herein.
  • In one embodiment, client 200 is a PlayStation 3™ (PS3).
  • In an alternate embodiment, client 200 is an x86 machine.
  • In an alternate embodiment, client 200 is a thin client.
  • Client 200 also includes load balancing module 204.
  • Generally, control PU 202 and load balancing module 204 partition a graphics image frame into a plurality of bands based on a server load balancing factor and a prospective rendering factor.
  • Load balancing module 204 is configured to set and modify a server load balancing factor based on server response times and user input.
  • In one embodiment, user input comprises manual server load balancing settings.
  • Load balancing module 204 divides the frame into bands comprising the frame data, and system 200 transmits the divided frame data to the compute servers for rendering.
  • In an alternate embodiment, client 200 transmits coordinate information demarcating the boundaries of each band in the frame.
  • In one embodiment, the coordinate information comprises coordinates referring to a cached (and commonly accessible) frame.
  • Load balancing module 204 is also configured to set and modify a prospective rendering factor based on scene model data, user input, and server response times.
  • In one embodiment, user input comprises camera motion information.
  • Generally, camera motion information comprises a perspective, or camera “eye,” and a movement vector indicating the speed and direction of a change in perspective.
  • In one embodiment, client 200 accepts user input including camera motion information and is therefore aware of the direction and speed of the eye's motion.
  • In an alternate embodiment, client 200 accepts user input including tracking information for a human user's eye movement, substituting the human user's eye movement for a camera eye movement.
  • Thus, load balancing module 204 can adjust the server band partitioning in advance, based on the expected change in computational load across the frame.
  • For example, in a scene dominated by a computationally expensive object such as a disco ball, load balancing module 204 could divide the frame into three bands: one band comprising one-half of the disco ball, and two bands each comprising the entire background and one-quarter of the disco ball.
  • In one embodiment, the camera eye movement information includes the direction and velocity of the camera or human eye change, as a “tracking vector.”
  • In an alternate embodiment, the camera eye movement information includes a target scene object, upon which the camera eye is focused, and the target scene object's relative distance from the current perspective point. That is, if the system is aware of a specific object that is the focus of the user's attention, a “target scene object,” the system can predict that the scene will shift to move that specific object toward the center or near-center of the viewing window. If, for example, the target scene object is located upward and rightward of the current perspective, the camera eye, and therefore the scene, will likely next shift upward and rightward, and the load balancing module can optimize the server band partitioning for that tracking vector.
  • Generally, load balancing module 204 uses the camera eye movement information and the scene model data to adjust the server band partitioning in advance, which tends to equalize the computational load across the compute servers.
  • In one embodiment, load balancing module 204 uses the tracking vector, target scene object, and relative distance to determine the magnitude of the server band partitioning adjustments.
  • Generally, the magnitude of the server band partitioning adjustments is a measure of the “aggressiveness” of a server band partitioning.
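A hedged sketch of such a prospective adjustment: shrink the band on the leading edge of the camera motion (where new geometry is about to enter the scene) and grow the trailing band, scaled by the eye's speed and an aggressiveness factor. The row-based band model and the specific scaling rule are illustrative assumptions, not the patent's formula.

```python
def prospective_adjust(band_sizes, vy, aggressiveness=0.1):
    """Move rows from the leading-edge band to the trailing-edge band,
    scaled by the vertical eye speed |vy| and an aggressiveness factor.
    A positive vy means the camera eye is moving toward the top band."""
    sizes = list(band_sizes)
    shift = int(abs(vy) * aggressiveness)
    if shift == 0 or len(sizes) < 2:
        return sizes
    if vy > 0:          # moving up: the top band will receive new geometry
        lead, trail = 0, len(sizes) - 1
    else:               # moving down: the bottom band is the leading edge
        lead, trail = len(sizes) - 1, 0
    shift = min(shift, sizes[lead] - 1)  # keep every band non-empty
    sizes[lead] -= shift
    sizes[trail] += shift
    return sizes
```

Faster eye motion produces a larger shift, matching the document's point that a quickly moving eye warrants a more aggressive rebalance; the total row count is preserved.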
  • Client 200 distributes the server bands to their assigned compute servers. Client 200 receives processed display bands from the compute servers in return. In one embodiment, client 200 determines the response time for each compute server. In an alternate embodiment, client 200 receives reported response times from each compute server.
  • Client 200 also includes cache 206 .
  • Cache 206 is an otherwise conventional cache. Generally, client 200 stores processed and unprocessed frames, and other information, in cache 206 .
  • Client 200 also includes decompressor 208 .
  • Generally, client 200 receives compressed processed server bands from the compute servers.
  • Decompressor 208 is configured to decompress compressed processed server bands.
  • Client 200 also includes display interface 210 , user interface 212 , and network interface 214 .
  • Display interface 210 is an otherwise conventional display interface, configured to interface with a display, such as display 120 of FIG. 1 , for example.
  • User interface 212 is an otherwise conventional user interface, configured, for example, as user interface 130 of FIG. 1 .
  • Network interface 214 is an otherwise conventional network interface, configured to interface with a network, such as network 140 of FIG. 1 , for example.
  • Client 200 is a graphics client, such as graphics client 110 of FIG. 1, for example. Accordingly, client 200 transmits raw server bands to compute servers for rendering and receives processed display bands for display.
  • FIG. 3 is a block diagram illustrating an exemplary compute server 300 in accordance with one embodiment of the present invention.
  • Server 300 includes a control processing unit (PU) 302.
  • Control PU 302 is an otherwise conventional processing unit, configured to operate as described below.
  • Server 300 also includes a plurality of processing elements (PEs) 310.
  • Each PE 310 is an otherwise conventional PE, configured with a local store 312.
  • Each PE 310 receives a PE block for rendering, renders the PE block, and returns a rendered PE block to the control PU 302.
  • Server 300 also includes load balancing module 304.
  • Generally, control PU 302 and load balancing module 304 partition a received raw display band into a plurality of PE blocks based on a PE load balancing factor.
  • Load balancing module 304 is configured to set and modify a PE load balancing factor based on PE response times.
  • In one embodiment, the PE load balancing factor includes a prospective rendering factor, and load balancing module 304 is configured to modify the PE load balancing factor based on PE response times and user input.
  • Load balancing module 304 divides the received raw display band into PE blocks comprising the frame data, and control PU 302 transmits the divided frame data to the PEs for rendering.
  • In an alternate embodiment, control PU 302 transmits coordinate information demarcating the boundaries of each PE block.
  • In one embodiment, the coordinate information comprises coordinates referring to a cached (and commonly accessible) frame.
  • Server 300 distributes the PE blocks to their assigned PEs.
  • The PEs 310 render their received PE blocks and return rendered PE blocks to control PU 302.
  • In one embodiment, each PE 310 stores a rendered PE block in cache 306 and indicates to control PU 302 that the PE has completed rendering its PE block.
  • Server 300 also includes cache 306.
  • Cache 306 is an otherwise conventional cache.
  • Generally, server 300 stores processed and unprocessed bands, PE blocks, and other information, in cache 306.
  • Server 300 also includes compressor 308 .
  • Generally, the graphics client receives compressed processed server bands from the compute servers.
  • Compressor 308 is configured to compress processed display bands for transmission to the graphics client.
  • Network interface 314 is an otherwise conventional network interface, configured to interface with a network, such as network 140 of FIG. 1, for example.
  • Generally, server 300 receives raw display bands from a graphics client.
  • Control PU 302 and load balancing module 304 divide the received display band into PE blocks based on a PE load balancing factor.
  • The PEs 310 render their assigned blocks and control PU 302 assembles the rendered PE blocks into a processed display band.
  • Compressor 308 compresses the processed display band, and server 300 transmits the processed display band to the graphics client.
  • Control PU 302 adjusts the PE load balancing factor based on the rendering times for each PE 310. In one embodiment, control PU 302 also determines a total rendering time for the entire display band and reports the total rendering time to the graphics client. Thus, generally, server 300 can modify the PE load balancing factor to adapt to changing loads on the PEs.
  • As such, server 300 can balance the rendering load between the PEs, which in turn helps improve (minimize) response time.
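The compute server's PE-level rebalancing and total-time report can be sketched as follows. The inverse-time block sizing is an illustrative assumption; reporting the slowest PE's time as the band total reflects the fact that, with PEs rendering in parallel, the band is done only when its slowest PE finishes.

```python
def resize_pe_blocks(band_rows, pe_times):
    """Resize each PE's block (in rows) in inverse proportion to that PE's
    last render time, so slower PEs receive less work next time."""
    inv = [1.0 / t for t in pe_times]
    total = sum(inv)
    sizes, assigned = [], 0
    for i, x in enumerate(inv):
        if i == len(inv) - 1:
            n = band_rows - assigned  # last PE takes the remainder
        else:
            n = round(band_rows * x / total)
        sizes.append(n)
        assigned += n
    return sizes

def band_render_time(pe_times):
    """Total render time for the band: the PEs run in parallel, so the
    band completes when the slowest PE completes."""
    return max(pe_times)
```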
  • The operation of the graphics client and the compute server is described in additional detail below. More particularly, the operation of an exemplary graphics client is described with respect to FIG. 4, and the operation of an exemplary compute server is described with respect to FIG. 5.
  • FIG. 4 illustrates one embodiment of a method for photorealistic imaging workload distribution. Specifically, FIG. 4 illustrates a high-level flow chart 400 that depicts logical operational steps performed by, for example, system 200 of FIG. 2 , which may be implemented in accordance with a preferred embodiment. Generally, control PU 202 performs the steps of the method, unless indicated otherwise.
  • System 200 receives a digital graphic image frame comprising scene model data for display.
  • For example, system 200 can receive a frame from a user or other input.
  • System 200 also receives user input.
  • In one embodiment, user input includes camera movement information.
  • System 200 sets or modifies a server load balancing factor based on the received frame.
  • System 200 sets or modifies a prospective rendering factor based on received user input and scene model data.
  • System 200 partitions the frame into server bands based on the server load balancing factor and the prospective rendering factor.
  • As described above, system 200 is aware of the direction and speed of the camera eye's motion. As such, system 200 can pre-adjust the server workload without having to rely exclusively on reactive adjustments. For example, if the user “looks” up or down (moving the camera eye vertically), system 200 can decrease the size of the regions of the compute server on the leading edge to account for the new model geometry that is about to be introduced into the scene.
  • Additionally, system 200 can adjust how aggressively to rebalance the workload based on the speed of the eye motion. If the camera eye is moving more quickly, system 200 can adjust the workload more aggressively. If the camera eye is moving more slowly, system 200 can adjust the workload less aggressively.
  • Similarly, system 200 can tailor workload rebalancing according to the type of eye movement demonstrated by the user input. That is, certain types of eye movement respond best to different adjustment patterns. For example, zooming in, or moving along the eye vector, leads to less of an imbalance across compute servers. As such, system 200 can adjust the workload less aggressively in response to a rapid zoom function, for example, than in response to a rapid pan function.
  • In one embodiment, system 200 partitions the frame into horizontal server bands. In an alternate embodiment, system 200 partitions the frame into vertical server bands. In another alternate embodiment, system 200 partitions the frame into horizontal or vertical server bands, depending on which alignment yields the more effective (load balancing) partitioning.
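A hedged sketch of choosing between horizontal and vertical banding: estimate each band's cost under both orientations from a per-pixel cost map, and keep the orientation whose bands are more evenly loaded. The cost-map input and the min/max spread metric are illustrative assumptions, not the patent's criterion.

```python
def band_costs(cost_map, num_bands, horizontal=True):
    """Sum estimated per-pixel costs over equal-height bands; transposing
    the map turns vertical banding into the same row-wise computation."""
    rows = cost_map if horizontal else list(zip(*cost_map))
    per = len(rows) // num_bands
    return [sum(sum(r) for r in rows[i * per:(i + 1) * per])
            for i in range(num_bands)]

def pick_orientation(cost_map, num_bands):
    """Prefer the orientation with the smaller spread between the cheapest
    and most expensive band, i.e. the better-balanced partitioning."""
    h = band_costs(cost_map, num_bands, horizontal=True)
    v = band_costs(cost_map, num_bands, horizontal=False)
    return "horizontal" if max(h) - min(h) <= max(v) - min(v) else "vertical"
```

For instance, a scene whose expensive geometry is concentrated in the left half balances better under horizontal bands, while one whose cost is concentrated in the top half balances better under vertical bands.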
  • System 200 distributes the server bands to compute servers.
  • Subsequently, system 200 receives compressed processed display bands from the compute servers.
  • System 200 decompresses the received compressed processed display bands.
  • System 200 assembles a processed frame based on the processed display bands.
  • System 200 stores the processed frame.
  • System 200 displays an image based on the processed frame.
  • For example, system 200 transmits the processed frame to a display module for display.
  • System 200 receives reported rendering times from the compute servers.
  • System 200 modifies the server load balancing factor based on the reported rendering times. The process then returns to block 405, wherein the graphics client receives a frame for processing.
  • FIG. 5 illustrates one embodiment of a method for photorealistic imaging workload distribution.
  • FIG. 5 illustrates a high-level flow chart 500 that depicts logical operational steps performed by, for example, system 300 of FIG. 3 , which may be implemented in accordance with a preferred embodiment.
  • compute PU 302 performs the steps of the method, unless indicated otherwise.
  • a compute server receives a raw display band from a graphics client.
  • system 300 of FIG. 3 receives a raw display band from a graphics client 200 of FIG. 2 .
  • system 300 partitions the raw display band into PE blocks based on a PE load balancing factor.
  • the raw display band includes camera movement information and system 300 partitions the raw display band into PE blocks based on a PE load balancing factor and the camera movement information.
  • system 300 partitions the raw display band in a similar fashion to system 200, as described with respect to block 425, above. Accordingly, system 300 can dynamically partition the raw display band to account for prospective changes in the composition of the frame image, helping to maintain load balance between the PEs.
  • system 300 distributes the PE blocks to the processing elements.
  • control PU 302 distributes the PE blocks to one or more PEs 310 .
  • each PE renders its received PE block.
  • the PEs 310 render their received PE blocks.
  • control PU 302 receives the rendered PE blocks from the PEs 310 .
  • control PU 302 receives a notification from the PEs 310 that the rendered blocks are available in cache 306 .
  • system 300 combines the rendered PE blocks into a processed display band.
  • system 300 compresses the processed display band for transmission to the graphics client.
  • compressor 308 compresses the processed display band for transmission to the graphics client.
  • system 300 transmits the compressed display band to the graphics client.
  • system 300 determines a render time for each PE. For example, control PU 302 determines a render time for each PE 310 .
  • system 300 reports the rendering time to the graphics client. In one embodiment, system 300 calculates the total rendering time for the processed display band, based on the slowest PE, and reports the total rendering time to the graphics client. In an alternate embodiment, system 300 reports the rendering time for each PE to the graphics client.
  • system 300 adjusts the PE load balancing factor based on the rendering time for each PE. As described above, system 300 can set the PE load balancing factor to divide the workload among the PEs such that each PE takes approximately the same amount of time to complete its rendering task.
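A minimal sketch of this per-PE rebalance, assuming render time scales roughly linearly with block size; the function names are illustrative. New block sizes are made proportional to each PE's observed throughput, so every PE is predicted to finish in approximately the same time, and the band's total render time is that of the slowest PE.

```python
def rebalance_pe_blocks(block_sizes, render_times):
    """Return new block sizes (approximately the same total) proportional
    to each PE's throughput, so predicted render times equalize."""
    rates = [s / t for s, t in zip(block_sizes, render_times)]  # pixels/sec
    total = sum(block_sizes)
    return [round(total * r / sum(rates)) for r in rates]

def band_render_time(render_times):
    """The display band is complete only when the slowest PE finishes."""
    return max(render_times)
```

For example, if one PE is three times slower than its peer on equal blocks, the rebalance gives the faster PE three times the pixels, so both are expected to finish together on the next frame.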
  • the disclosed embodiments provide numerous advantages over other methods and systems. For example, the disclosed embodiments improve balanced workload distribution over current approaches, especially work-stealing systems. Because the disclosed embodiments better distribute the computational workload, work-stealing is unnecessary, and the computational units can retain relevant cache data without also incurring the penalties inherent in re-tasking a processing element under common work-stealing schema.
  • the disclosed embodiments improve the balance of photorealistic imaging workload distribution, especially in ray tracing applications.
  • the rendering system spends less time stalled for data.
  • the disclosed embodiments offer methods that maintain focus of a computational unit on a particular region, even as that region is expanded or reduced to maintain relative workload. As such, any particular computational unit is more likely to retain useful frame data in its cache, which improves cache hit rates. Moreover, the improved cache hit rates overcome the slightly increased intra-frame stalls, improving the overall rendering time.
  • the disclosed embodiments provide a system and method that dynamically adjusts the workload based on prospective rendering tasking. As such, the disclosed embodiments can reduce the performance impact of a rapidly moving camera eye by anticipating changes in the computational intensity of regions in the scene. Other technical advantages will be apparent to one of ordinary skill in the relevant arts.
  • FIG. 6 is a block diagram illustrating an exemplary computer system employable to practice one or more of the embodiments described herein.
  • FIG. 6 illustrates a computer system 600 .
  • Computer system 600 includes computer 602 .
  • Computer 602 is an otherwise conventional computer and includes at least one processor 610 .
  • Processor 610 is an otherwise conventional computer processor and can comprise a single-core or dual-core central processing unit (PU), synergistic PU, attached PU, or other suitable processor.
  • Bus 612 is an otherwise conventional system bus. As illustrated, the various components of computer 602 couple to bus 612 .
  • computer 602 also includes memory 620, which couples to processor 610 through bus 612.
  • Memory 620 is an otherwise conventional computer main memory, and can comprise, for example, random access memory (RAM).
  • memory 620 stores applications 622, an operating system 624, and access functions 626.
  • applications 622 are otherwise conventional software program applications, and can comprise any number of typical programs, as well as computer programs incorporating one or more embodiments of the present invention.
  • Operating system 624 is an otherwise conventional operating system, and can include, for example, Unix, AIX, Linux, Microsoft Windows™, MacOS™, and other suitable operating systems.
  • Access functions 626 are otherwise conventional access functions, including networking functions, and can be included in operating system 624.
  • Computer 602 also includes storage 630 .
  • storage 630 is an otherwise conventional device and/or devices for storing data.
  • storage 630 can comprise a hard disk 632, flash or other non-volatile memory 634, and/or optical storage devices 636.
  • flash or other non-volatile memory 634 can be any type of non-volatile memory
  • optical storage devices 636 can also be employed.
  • I/O interface 640 also couples to bus 612 .
  • I/O interface 640 is an otherwise conventional interface. As illustrated, I/O interface 640 couples to devices external to computer 602 . In particular, I/O interface 640 couples to user input device 642 and display device 644 .
  • Input device 642 is an otherwise conventional input device and can include, for example, mice, keyboards, numeric keypads, touch sensitive screens, microphones, webcams, and other suitable input devices.
  • Display device 644 is an otherwise conventional display device and can include, for example, monitors, LCD displays, GUI screens, text screens, touch sensitive screens, Braille displays, and other suitable display devices.
  • a network adapter 650 also couples to bus 612 .
  • Network adapter 650 is an otherwise conventional network adapter, and can comprise, for example, a wireless, Ethernet, LAN, WAN, or other suitable adapter. As illustrated, network adapter 650 can couple computer 602 to other computers and devices 652 . Other computers and devices 652 are otherwise conventional computers and devices typically employed in a networking environment. One skilled in the art will understand that there are many other networking configurations suitable for computer 602 and computer system 600 .
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.

Abstract

A graphics client receives a frame, the frame comprising scene model data. A server load balancing factor is set based on the scene model data. A prospective rendering factor is set based on the scene model data. The frame is partitioned into a plurality of server bands based on the server load balancing factor and the prospective rendering factor. The server bands are distributed to a plurality of compute servers. Processed server bands are received from the compute servers. A processed frame is assembled based on the received processed server bands. The processed frame is transmitted for display to a user as an image.

Description

    TECHNICAL FIELD
  • The present invention relates generally to the field of computer networking and parallel processing and, more particularly, to a system and method for improved photorealistic imaging workload distribution.
  • BACKGROUND OF THE INVENTION
  • Modern electronic computing systems, such as microprocessor systems, are often configured to divide a computationally-intensive task into discrete sub-tasks. For heterogeneous systems, some systems employ cache-aware task decomposition to improve performance on distributed applications. As technology advances, the gap between fast local caches and large slower memory widens, and caching becomes even more important. Generally, typical modern systems attempt to distribute work across multiple processing elements (PEs) so as to improve cache hit rates and reduce data stall times.
  • For example, ray tracing, a photorealistic imaging technique, is a computationally expensive algorithm that usually does not have fixed data access patterns. However, ray tracing tasks can nevertheless have a very high spatial and temporal locality. As such, a cache aware task distribution for ray tracing applications can lead to high performance gains.
  • But typical ray tracing approaches cannot be configured to take full advantage of cache aware task distribution. For example, current ray tracers decompose the rendering problem by breaking up an image into tiles. Typical ray tracers either expressly distribute these tiles among computational units or greedily reserve the tiles for access by the PEs through work stealing.
  • Both of these approaches suffer from significant disadvantages. In typical express distribution systems, the additional workload required to manage the distribution of tiles inhibits performance. In some cases, this additional workload can mitigate any gains achieved through managed distribution.
  • In typical work-stealing systems, each PE grabs new tiles after it has processed its prior allotment. But since the PEs grab the tiles from a general pool, the tiles are less likely to have a high spatial locality. Thus, in a work-stealing system, the PEs regularly flush their caches with new scene data and are therefore cold for the next frame, completely failing to take any advantage of the task's spatial locality.
  • BRIEF SUMMARY
  • The following summary is provided to facilitate an understanding of some of the innovative features unique to the embodiments disclosed and is not intended to be a full description. A full appreciation of the various aspects of the embodiments can be gained by taking into consideration the entire specification, claims, drawings, and abstract as a whole.
  • A graphics client receives a frame, the frame comprising scene model data. A server load balancing factor is set based on the scene model data. A prospective rendering factor is set based on the scene model data. The frame is partitioned into a plurality of server bands based on the server load balancing factor and the prospective rendering factor. The server bands are distributed to a plurality of compute servers. Processed server bands are received from the compute servers. A processed frame is assembled based on the received processed server bands. The processed frame is transmitted for display to a user as an image.
  • In an alternate embodiment, a system comprises a graphics client. The graphics client is configured to receive a frame, the frame comprising scene model data; set a server load balancing factor based on the scene model data; set a prospective rendering factor based on the scene model data; partition the frame into a plurality of server bands based on the server load balancing factor and the prospective rendering factor; distribute the plurality of server bands to a plurality of compute servers; receive processed server bands from the plurality of compute servers; assemble a processed frame based on the received processed server bands; and transmit the processed frame for display to a user as an image.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The accompanying figures, in which like reference numerals refer to identical or functionally-similar elements throughout the separate views and which are incorporated in and form a part of the specification, further illustrate the embodiments and, together with the detailed description, serve to explain the embodiments disclosed herein.
  • FIG. 1 illustrates a block diagram showing an improved photorealistic imaging system in accordance with a preferred embodiment;
  • FIG. 2 illustrates a block diagram showing an improved graphics client in accordance with a preferred embodiment;
  • FIG. 3 illustrates a block diagram showing an improved compute server in accordance with a preferred embodiment;
  • FIG. 4 illustrates a high-level flow diagram depicting logical operational steps of an improved photorealistic imaging workload distribution method, which can be implemented in accordance with a preferred embodiment;
  • FIG. 5 illustrates a high-level flow diagram depicting logical operational steps of an improved photorealistic imaging workload distribution method, which can be implemented in accordance with a preferred embodiment; and
  • FIG. 6 illustrates a block diagram showing an exemplary computer system that can be configured to incorporate one or more preferred embodiments.
  • DETAILED DESCRIPTION
  • The particular values and configurations discussed in these non-limiting examples can be varied and are cited merely to illustrate at least one embodiment and are not intended to limit the scope of the invention.
  • In the following discussion, numerous specific details are set forth to provide a thorough understanding of the present invention. Those skilled in the art will appreciate that the present invention may be practiced without such specific details. In other instances, well-known elements have been illustrated in schematic or block diagram form in order not to obscure the present invention in unnecessary detail. Additionally, for the most part, details concerning network communications, electro-magnetic signaling techniques, user interface or input/output techniques, and the like, have been omitted inasmuch as such details are not considered necessary to obtain a complete understanding of the present invention, and are considered to be within the understanding of persons of ordinary skill in the relevant art.
  • As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
  • Any combination of one or more computer usable or computer readable medium(s) may be utilized. The computer-usable or computer-readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CDROM), an optical storage device, a transmission media such as those supporting the Internet or an intranet, or a magnetic storage device. Note that the computer-usable or computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory. In the context of this document, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer-usable medium may include a propagated data signal with the computer-usable program code embodied therewith, either in baseband or as part of a carrier wave. The computer usable program code may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc.
  • Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • The present invention is described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
  • Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
  • Referring now to the drawings, FIG. 1 is a high-level block diagram illustrating certain components of a system 100 for improved photorealistic imaging workload distribution, in accordance with a preferred embodiment of the present invention. System 100 comprises a graphics client 110.
  • Graphics client 110 is a graphics client module or device, as described in more detail in conjunction with FIG. 2, below. Graphics client 110 couples to display 120. Display 120 is an otherwise conventional display, configured to display digitized graphical images to a user.
  • Graphics client 110 also couples to a user interface 130. User interface 130 is an otherwise conventional user interface, configured to send information to, and receive information from, a user 132. In one embodiment, graphics client 110 receives user input from user interface 130. In one embodiment, user input comprises a plurality of image frames, each frame comprising scene model data, the scene model data describing objects arranged in an image. In one embodiment, user input also comprises camera movement commands describing perspective (or “eye”) movement from one image frame to another.
  • In the illustrated embodiment, graphics client 110 also couples to network 140. Network 140 is an otherwise conventional network. In one embodiment, network 140 is a gigabit Ethernet network. In an alternate embodiment, network 140 is an Infiniband network.
  • Network 140 couples to a plurality of compute servers 150. Each compute server 150 is a compute server as described in more detail in conjunction with FIG. 3, below. In the illustrated embodiment, graphics client 110 couples to the compute servers 150 through network 140.
  • In an alternate embodiment, graphics client 110 couples to one or more compute servers 150 through a direct link 152. In one embodiment, link 152 is a direct physical link. In an alternate embodiment, link 152 is a virtual link, such as a virtual private network (VPN) link, for example.
  • Generally, in an exemplary operation, described in more detail below, system 100 operates as follows. User 132, through user interface 130, directs graphics client 110 to display a series of images on display 120. Graphics client 110 receives the series of images as a series of digitized image “frames,” for example, by retrieving the series of frames from a storage on graphics client 110 or from user interface 130. Generally, each frame comprises scene model data describing elements arranged in a scene.
  • For each frame, graphics client 110 partitions the frame into a plurality of server bands, each server band associated with a particular compute server 150, based on a server load balancing factor and a prospective rendering factor. Graphics client 110 distributes the server bands to the compute servers 150. Each compute server 150 (comprising a plurality of processing elements (PEs)) divides the received server bands (received as “raw display bands”) into PE blocks, each PE block associated with a particular PE, based on a PE load balancing factor. In some embodiments, the compute servers 150 divide the server bands into PE blocks based on the PE load balancing factor and prospective rendering information received from the graphics client 110. The compute servers 150 distribute the PE blocks to their PEs.
  • The PEs process the PE blocks, rendering the raw frame data and performing the computationally intensive work of turning the raw frame data into a form suitable for the target display 120. In photorealistic imaging processing, rendering can include ray tracing, ambient occlusion, and other techniques. The PEs return the processed PE blocks to their parent compute server 150, which assembles the processed PE blocks into a processed display band.
  • In some embodiments, the compute servers 150 compress the processed display bands for transmission to graphics client 110. In some embodiments, one or more compute servers 150 transmit the processed display bands without additional compression. Each compute server 150 determines the time each of its PEs took to render its PE block and the total rendering time for the entire raw display band.
  • The compute servers 150 adjust their PE load balancing factor based on the individual rendering times for each PE. In one embodiment, each compute server 150 also reports its total rendering time to graphics client 110.
  • Graphics client 110 receives the processed display bands and assembles the bands into a processed frame. Graphics client 110 transmits the processed frame to display 120 for display to the user. In one embodiment, graphics client 110 modifies the load balancing factor based on reported rendering times received from the compute servers 150.
  • Thus, as described generally above and in more detail below, graphics client 110 distributes unprocessed server bands to compute servers 150 based in part on the relative load between the servers and in part on prospective rendering information received from the user. The compute servers 150 divide the unprocessed server bands into PE blocks based on the relative load between the PE blocks and the prospective rendering information. The PEs process the blocks, which the compute servers 150 combine into processed bands and return to the graphics client 110. Graphics client 110 assembles the received processed bands into a form suitable for display to a user. Both the compute servers 150 and graphics client 110 use rendering times to adjust load balancing factors dynamically.
  • As such, system 100 can dynamically distribute the workload among the elements performing computationally intensive tasks. As the frame data changes, certain portions of the frame become more computationally intensive than others, and the system can respond by reapportioning the tasks so as to keep the response times roughly equivalent. As one skilled in the art will understand, roughly equivalent response times indicate a balanced load and help to reduce idle time for the PEs/servers.
  • FIG. 2 is a block diagram illustrating an exemplary graphics client 200 in accordance with one embodiment of the present invention. In particular, client 200 includes control processing unit (PU) 202. Control PU 202 is an otherwise conventional processing unit, configured as described herein. In one embodiment, client 200 is a PlayStation3™ (PS3). In an alternate embodiment, client 200 is an x86 machine. In an alternate embodiment, client 200 is a thin client.
  • Client 200 also includes load balancing module 204. Generally, control PU 202 and load balancing module 204 partition a graphics image frame into a plurality of bands based on a server load balancing factor and a prospective rendering factor. In particular, in one embodiment, load balancing module 204 is configured to set and modify a server load balancing factor based on server response times and user input. In one embodiment, user input comprises manual server load balancing settings.
  • In one embodiment, load balancing module 204 divides the frame into bands comprising the frame data, and system 200 transmits the divided frame data to the compute servers for rendering. In an alternate embodiment, client 200 transmits coordinate information demarcating the boundaries of each band in the frame. In one embodiment, the coordinate information comprises coordinates referring to a cached (and commonly accessible) frame.
  • Load balancing module 204 is also configured to set and modify a prospective rendering factor based on scene model data, user input, and server response times. In one embodiment, user input comprises camera motion information. In one embodiment, camera motion information comprises a perspective, or camera “eye”, and a movement vector indicating the speed and direction of a change in perspective.
  • For example, in one embodiment, client 200 accepts user input including camera motion information and is therefore aware of the direction and speed of the eye's motion. In an alternate embodiment, client 200 accepts user input including tracking information for a human user's eye movement, substituting the human user's eye movement for a camera eye movement. As such, load balancing module 204 can adjust the server band partitioning in advance, based on the expected change in computational load across the frame.
  • That is, one skilled in the art will understand that certain parts of the frame are more computationally intensive than other parts. For example, a frame segment consisting of only a solid, single-color background is much less computationally intensive than a frame segment containing a disco ball reflecting light from multiple sources. Thus, for example, load balancing module 204 could divide the frame into three bands, one band comprising one-half of the disco ball, and two bands each comprising the entire background and one-quarter of the disco ball.
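The cost-weighted banding described above can be sketched as follows, assuming a per-row cost estimate is available (e.g., high for rows covering the disco ball, low for background-only rows). The function and the cost model are hypothetical, for illustration only.

```python
def cost_weighted_bands(row_costs, n_bands):
    """Split row indices into n_bands contiguous bands of roughly equal
    total estimated cost, rather than equal row count."""
    target = sum(row_costs) / n_bands
    bands, current, acc = [], [], 0.0
    for i, cost in enumerate(row_costs):
        current.append(i)
        acc += cost
        # Close a band once its accumulated cost reaches the target,
        # leaving the remainder for the final band.
        if acc >= target and len(bands) < n_bands - 1:
            bands.append(current)
            current, acc = [], 0.0
    bands.append(current)
    return bands
```

With this scheme, a narrow strip of expensive rows (the disco ball) can form a band of its own, while wide stretches of cheap background rows are grouped together, as in the three-band example above.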
  • Further, when the camera eye changes, the scene elements in the frame (e.g., the disco ball) occupy more or less of the frame, in a different location of the frame. In one embodiment, the camera eye movement information includes the direction and velocity of the camera or human eye change, as a “tracking vector.” In an alternate embodiment, the camera eye movement information includes a target scene object, upon which the camera eye is focused, and the target scene object's relative distance from the current perspective point. That is, if the system is aware of a specific object that is the focus of the user's attention, a “target scene object,” the system can predict that the scene will shift to move that specific object toward the center or near-center of the viewing window. If, for example, the target scene object is located upward and rightward of the current perspective, the camera eye, and therefore the scene, will likely next shift upward and rightward, and the load balancing module can optimize the server band partitioning for that tracking vector.
  • As such, in one embodiment, load balancing module 204 uses the camera eye movement information and the scene model data to adjust the server band partitioning in advance, which tends to equalize the computational load across the compute servers. In one embodiment, load balancing module 204 uses the tracking vector, target scene object, and relative distance to determine the magnitude of the server band partitioning adjustments. In one embodiment, the magnitude of the server band partitioning adjustments is a measure of the “aggressiveness” of a server band partitioning.
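One way to illustrate such a prospective adjustment, under the assumption of horizontal bands and a tracking vector with a vertical component: shift each interior band boundary along the predicted scene motion, scaled by the aggressiveness factor. The function name and the clamping scheme are assumptions, not the disclosed implementation.

```python
def prospective_boundaries(boundaries, tracking_dy, aggressiveness, height):
    """Pre-shift interior band boundaries (row indices) by the predicted
    vertical scene motion, scaled by the 0..1 aggressiveness factor, so
    bands are positioned for where the costly content is expected to be."""
    shift = round(tracking_dy * aggressiveness)
    # Keep every boundary strictly inside the frame.
    return [min(max(b + shift, 1), height - 1) for b in boundaries]
```

A fast pan (high aggressiveness) thus moves the boundaries most of the way toward their predicted positions, while a slow zoom leaves them nearly unchanged.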
  • Generally, having partitioned the frame into server bands, client 200 distributes the server bands to their assigned compute servers. Client 200 receives processed display bands from the compute servers in return. In one embodiment, client 200 determines the response time for each compute server. In an alternate embodiment, client 200 receives reported response times from each compute server.
  • Client 200 also includes cache 206. Cache 206 is an otherwise conventional cache. Generally, client 200 stores processed and unprocessed frames, and other information, in cache 206.
  • Client 200 also includes decompressor 208. In one embodiment, client 200 receives compressed processed server bands from the compute servers. As such, decompressor 208 is configured to decompress compressed processed server bands.
  • Client 200 also includes display interface 210, user interface 212, and network interface 214. Display interface 210 is an otherwise conventional display interface, configured to interface with a display, such as display 120 of FIG. 1, for example. User interface 212 is an otherwise conventional user interface, configured, for example, as user interface 130 of FIG. 1. Network interface 214 is an otherwise conventional network interface, configured to interface with a network, such as network 140 of FIG. 1, for example.
  • As described above, client 200 is a graphics client, such as graphics client 110 of FIG. 1, for example. Accordingly, client 200 transmits raw server bands to compute servers for rendering and receives processed display bands for display. FIG. 3 illustrates an exemplary compute server in accordance with one embodiment of the present invention.
  • In particular, FIG. 3 is a block diagram illustrating an exemplary compute server 300 in accordance with one embodiment of the present invention. Server 300 includes a control processing unit (PU) 302. As illustrated, control PU 302 is an otherwise conventional processing unit, configured to operate as described below.
  • Server 300 also includes a plurality of processing elements (PEs) 310. Generally, each PE 310 is an otherwise conventional PE, configured with a local store 312. As described in more detail below, each PE 310 receives a PE block for rendering, renders the PE block, and returns a rendered PE block to the control PU 302.
  • Server 300 also includes load balancing module 304. Generally, control PU 302 and load balancing module 304 partition a received raw display band into a plurality of PE blocks based on a PE load balancing factor. In particular, in one embodiment, load balancing module 304 is configured to set and modify a PE load balancing factor based on PE response times. In an alternate embodiment, the PE load balancing factor includes a prospective rendering factor, and load balancing module 304 is configured to modify the PE load balancing factor based on PE response times and user input.
  • In one embodiment, load balancing module 304 divides the received raw display band into PE blocks comprising the frame data and control PU 302 transmits the divided frame data to the PEs for rendering. In an alternate embodiment, control PU 302 transmits coordinate information demarcating the boundaries of each PE block. In one embodiment, the coordinate information comprises coordinates referring to a cached (and commonly accessible) frame.
  • Generally, having partitioned the raw display bands into PE blocks, server 300 distributes the PE blocks to their assigned PEs. The PEs 310 render their received PE blocks and return rendered PE blocks to control PU 302. In one embodiment, each PE 310 stores a rendered PE block in cache 306 and indicates to control PU 302 that the PE has completed rendering its PE block.
  • As such, server 300 also includes cache 306. Cache 306 is an otherwise conventional cache. Generally, server 300 stores processed and unprocessed bands, PE blocks, and other information, in cache 306.
  • Server 300 also includes compressor 308. In one embodiment, the graphics client receives compressed processed server bands from the compute servers. As such, compressor 308 is configured to compress processed display bands for transmission to the graphics client.
  • Server 300 also includes network interface 314. Network interface 314 is an otherwise conventional network interface, configured to interface with a network, such as network 140 of FIG. 1, for example.
  • Generally, server 300 receives raw display bands from a graphics client. Control PU 302 and load balancing module 304 divide the received display band into PE blocks based on a PE load balancing factor. The PEs 310 render their assigned blocks and control PU 302 assembles the rendered PE blocks into a processed display band. Compressor 308 compresses the processed display band and server 300 transmits the processed display band to the graphics client.
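The server-side pipeline summarized above can be sketched end to end. The uniform partitioning, the identity "render" stand-in, and the use of zlib compression are assumptions for illustration; a real PE would run a ray tracer, and the load balancing module would weight the blocks by PE response times.

```python
import zlib

def process_band(raw_band: bytes, n_pes: int) -> bytes:
    """Sketch of server 300's band pipeline: partition, render, assemble,
    compress (hypothetical helper; the patent does not prescribe a codec)."""
    # 1. Partition the raw band into PE blocks (uniform factor shown here).
    size = len(raw_band) // n_pes
    blocks = [raw_band[i * size:(i + 1) * size] for i in range(n_pes - 1)]
    blocks.append(raw_band[(n_pes - 1) * size:])  # last PE takes the remainder
    # 2. "Render" each block -- identity stand-in for the PEs' ray tracing.
    rendered = [bytes(b) for b in blocks]
    # 3. Assemble the processed display band and compress it for transmission.
    return zlib.compress(b"".join(rendered))
```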
  • In one embodiment, control PU 302 adjusts the PE load balancing factor based on the rendering times for each PE 310. In one embodiment, control PU 302 also determines a total rendering time for the entire display band and reports the total rendering time to the graphics client. Thus, generally, server 300 can modify the PE load balancing factor to adapt to changing loads on the PEs.
  • Thus, server 300 can balance the rendering load between the PEs, which in turn helps improve (minimize) response time. The operations of the graphics client and the compute server are described in additional detail below. More particularly, the operation of an exemplary graphics client is described with respect to FIG. 4, and the operation of an exemplary compute server is described with respect to FIG. 5.
  • FIG. 4 illustrates one embodiment of a method for photorealistic imaging workload distribution. Specifically, FIG. 4 illustrates a high-level flow chart 400 that depicts logical operational steps performed by, for example, system 200 of FIG. 2, which may be implemented in accordance with a preferred embodiment. Generally, control PU 202 performs the steps of the method, unless indicated otherwise.
  • As indicated at block 405, the process begins, wherein system 200 receives a digital graphic image frame comprising scene model data for display. For example, system 200 can receive a frame from a user or other input. Next, as illustrated at block 410, system 200 receives user input. As described above, in one embodiment, user input includes camera movement information.
  • Next, as illustrated at block 415, system 200 sets or modifies a server load balancing factor based on the received frame. Next, as illustrated at block 420, system 200 sets or modifies a prospective rendering factor based on received user input and scene model data. Next, as illustrated at block 425, system 200 partitions the frame into server bands based on the server load balancing factor and the prospective rendering factor.
  • Based on the user input and the prospective rendering factor, system 200 is aware of the direction and speed of the camera eye's motion. As such, system 200 can pre-adjust the server workload without having to rely exclusively on reactive adjustments. For example, if the user “looks” up or down (moving the camera eye vertically), system 200 can decrease the size of the region assigned to the compute server on the leading edge to account for the new model geometry that is about to be introduced into the scene.
  • Moreover, system 200 can adjust how aggressively to rebalance the workload based on the speed of the eye motion. If the camera eye is moving more quickly, system 200 can adjust the workload more aggressively. If the camera eye is moving more slowly, system 200 can adjust the workload less aggressively.
  • Additionally, system 200 can tailor workload rebalancing according to the type of eye movement demonstrated by the user input. That is, certain types of eye movement respond best to different adjustment patterns. For example, zooming in or moving along the eye vector leads to less of an imbalance across compute servers. As such, system 200 can adjust the workload less aggressively in response to a rapid zoom function, for example, than in response to a rapid pan function.
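The speed- and movement-type-dependent scaling described in the two paragraphs above can be sketched as follows; the linear mapping, the 0.05 slope, and the 0.5 zoom discount are invented constants, since the patent only states that faster motion, and panning more than zooming, warrant more aggressive rebalancing.

```python
def rebalance_aggressiveness(camera_speed, movement="pan", max_agg=0.5):
    """Map camera-eye speed and movement type to a rebalancing
    aggressiveness in [0, max_agg] (hypothetical scaling)."""
    # Faster eye motion -> more aggressive pre-adjustment, capped at max_agg.
    base = min(max_agg, 0.05 * camera_speed)
    # Zooming along the eye vector causes less imbalance across compute
    # servers than panning, so damp the adjustment for zoom-type movement.
    return 0.5 * base if movement == "zoom" else base
```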
  • In one embodiment, system 200 partitions the frame into horizontal server bands. In an alternate embodiment, system 200 partitions the frame into vertical server bands. In an alternate embodiment, system 200 partitions the frame into horizontal or vertical server bands, depending on which alignment yields the more effective (load balancing) partitioning.
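One way to realize such a band partitioning is sketched below; the proportional rounding scheme and the helper name are assumptions. For horizontal server bands, `extent` is the frame height; for vertical bands, the frame width.

```python
def partition_into_bands(extent, weights):
    """Split one frame dimension into contiguous server bands proportional
    to per-server load-balancing weights (illustrative sketch)."""
    total = sum(weights)
    bands, start = [], 0
    for i, w in enumerate(weights):
        # Give the last server whatever remains so the bands tile exactly.
        end = extent if i == len(weights) - 1 else start + round(extent * w / total)
        bands.append((start, end))
        start = end
    return bands
```

For example, three servers with weights `[1, 2, 1]` on a 1080-row frame receive bands of 270, 540, and 270 rows; a server with a heavier weight takes a proportionally larger band.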
  • Next, as illustrated at block 430, system 200 distributes the server bands to compute servers. Next, as illustrated at block 435, system 200 receives compressed processed display bands from the compute servers. Next, as illustrated at block 440, system 200 decompresses the received compressed processed display bands.
  • Next, as illustrated at block 445, system 200 assembles a processed frame based on the processed display bands. Next, as illustrated at block 450, system 200 stores the processed frame. Next, as illustrated at block 455, system 200 displays an image based on the processed frame. As described above, in one embodiment, system 200 transmits the processed frame to a display module for display.
  • Next, as illustrated at block 460, system 200 receives reported rendering times from the compute servers. Next, as illustrated at block 465, system 200 modifies the server load balancing based on the reported rendering times. The process returns to block 405, wherein the graphics client receives a frame for processing.
  • FIG. 5 illustrates one embodiment of a method for photorealistic imaging workload distribution. Specifically, FIG. 5 illustrates a high-level flow chart 500 that depicts logical operational steps performed by, for example, system 300 of FIG. 3, which may be implemented in accordance with a preferred embodiment. Generally, control PU 302 performs the steps of the method, unless indicated otherwise.
  • As illustrated at block 505, the process begins, wherein a compute server receives a raw display band from a graphics client. For example, system 300 of FIG. 3 receives a raw display band from graphics client 200 of FIG. 2. Next, as illustrated at block 510, system 300 partitions the raw display band into PE blocks based on a PE load balancing factor.
  • In one embodiment, the raw display band includes camera movement information and system 300 partitions the raw display band into PE blocks based on a PE load balancing factor and the camera movement information. In one embodiment, system 300 partitions the raw display band in a similar fashion as does system 200 as described with respect to block 425, above. Accordingly, system 300 can dynamically partition the raw display band to account for prospective changes in the composition of the frame image, helping to maintain load balance between the PEs.
  • Next, as illustrated at block 515, system 300 distributes the PE blocks to the processing elements. For example, control PU 302 distributes the PE blocks to one or more PEs 310. Next, as illustrated at block 520, each PE renders its received PE block. For example, the PEs 310 render their received PE blocks.
  • Next, as illustrated at block 525, control PU 302 receives the rendered PE blocks from the PEs 310. As described above, in one embodiment, control PU 302 receives a notification from the PEs 310 that the rendered blocks are available in cache 306. Next, as illustrated at block 530, system 300 combines the rendered PE blocks into a processed display band.
  • Next, as illustrated at block 535, system 300 compresses the processed display band for transmission to the graphics client. For example, compressor 308 compresses the processed display band for transmission to the graphics client. Next, as illustrated at block 540, system 300 transmits the compressed display band to the graphics client.
  • Next, as illustrated at block 545, system 300 determines a render time for each PE. For example, control PU 302 determines a render time for each PE 310. Next, as illustrated at block 550, system 300 reports the rendering time to the graphics client. In one embodiment, system 300 calculates the total rendering time for the processed display band, based on the slowest PE, and reports the total rendering time to the graphics client. In an alternate embodiment, system 300 reports the rendering time for each PE to the graphics client.
  • Next, as illustrated at block 555, system 300 adjusts the PE load balancing factor based on the rendering time for each PE. As described above, system 300 can set the PE load balancing factor to divide the workload among the PEs such that each PE takes approximately the same amount of time to complete its rendering task.
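A sketch of this equal-time rebalancing follows. The damped proportional update is an assumed scheme, since the patent does not fix a formula; it only requires that each PE converge toward taking approximately the same time.

```python
def rebalance_pe_weights(weights, render_times, damping=0.5):
    """Update per-PE load-balancing weights from measured render times so
    that, over successive frames, each PE's render time converges toward
    the mean (hypothetical helper; names are illustrative).
    A PE that finished faster than average gets proportionally more work."""
    avg = sum(render_times) / len(render_times)
    # Damping < 1 avoids overshooting when render times are noisy.
    adjusted = [w * (1 + damping * (avg - t) / avg)
                for w, t in zip(weights, render_times)]
    total = sum(adjusted)
    return [w / total for w in adjusted]  # renormalize to sum to 1
```

For instance, two PEs with equal weights finishing in 1 and 3 time units would be rebalanced toward roughly a 0.625/0.375 split, shifting work to the faster PE on the next frame.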
  • Accordingly, the disclosed embodiments provide numerous advantages over other methods and systems. For example, the disclosed embodiments improve balanced workload distribution over current approaches, especially work-stealing systems. Because the disclosed embodiments better distribute the computational workload, work-stealing is unnecessary, and the computational units can retain relevant cache data without incurring the penalties inherent in re-tasking a processing element under common work-stealing schemes.
  • More specifically, the disclosed embodiments improve the balance of photorealistic imaging workload distribution, especially in ray tracing applications. By actively managing the computationally intensive regions of a frame, rather than stalling the computational units while they wait for the next frame, the rendering system spends less time stalled for data.
  • Further, the disclosed embodiments offer methods that maintain focus of a computational unit on a particular region, even as that region is expanded or reduced to maintain relative workload. As such, any particular computational unit is more likely to retain useful frame data in its cache, which improves cache hit rates. Moreover, the improved cache hit rates overcome the slightly increased intra-frame stalls, improving the overall rendering time.
  • Additionally, the disclosed embodiments provide a system and method that dynamically adjusts the workload based on prospective rendering tasking. As such, the disclosed embodiments can reduce the performance impact of a rapidly moving camera eye by anticipating changes in the computational intensity of regions in the scene. Other technical advantages will be apparent to one of ordinary skill in the relevant arts.
  • As described above, one or more embodiments described herein may be practiced or otherwise embodied in a computer system. Generally, the term “computer,” as used herein, refers to any automated computing machinery. The term “computer” therefore includes not only general purpose computers such as laptops, personal computers, minicomputers, and mainframes, but also devices such as personal digital assistants (PDAs), network enabled handheld devices, internet or network enabled mobile telephones, and other suitable devices. FIG. 6 is a block diagram providing details illustrating an exemplary computer system employable to practice one or more of the embodiments described herein.
  • Specifically, FIG. 6 illustrates a computer system 600. Computer system 600 includes computer 602. Computer 602 is an otherwise conventional computer and includes at least one processor 610. Processor 610 is an otherwise conventional computer processor and can comprise a single-core or dual-core central processing unit (PU), a synergistic PU, an attached PU, or other suitable processor.
  • Processor 610 couples to system bus 612. Bus 612 is an otherwise conventional system bus. As illustrated, the various components of computer 602 couple to bus 612. For example, computer 602 also includes memory 620, which couples to processor 610 through bus 612. Memory 620 is an otherwise conventional computer main memory, and can comprise, for example, random access memory (RAM). Generally, memory 620 stores applications 622, an operating system 624, and access functions 626.
  • Generally, applications 622 are otherwise conventional software program applications, and can comprise any number of typical programs, as well as computer programs incorporating one or more embodiments of the present invention. Operating system 624 is an otherwise conventional operating system, and can include, for example, Unix, AIX, Linux, Microsoft Windows™, MacOS™, and other suitable operating systems. Access functions 626 are otherwise conventional access functions, including networking functions, and can be included in operating system 624.
  • Computer 602 also includes storage 630. Generally, storage 630 is an otherwise conventional device and/or devices for storing data. As illustrated, storage 630 can comprise a hard disk 632, flash or other non-volatile memory 634, and/or optical storage devices 636. One skilled in the art will understand that other storage media can also be employed.
  • An I/O interface 640 also couples to bus 612. I/O interface 640 is an otherwise conventional interface. As illustrated, I/O interface 640 couples to devices external to computer 602. In particular, I/O interface 640 couples to user input device 642 and display device 644. Input device 642 is an otherwise conventional input device and can include, for example, mice, keyboards, numeric keypads, touch sensitive screens, microphones, webcams, and other suitable input devices. Display device 644 is an otherwise conventional display device and can include, for example, monitors, LCD displays, GUI screens, text screens, touch sensitive screens, Braille displays, and other suitable display devices.
  • A network adapter 650 also couples to bus 612. Network adapter 650 is an otherwise conventional network adapter, and can comprise, for example, a wireless, Ethernet, LAN, WAN, or other suitable adapter. As illustrated, network adapter 650 can couple computer 602 to other computers and devices 652. Other computers and devices 652 are otherwise conventional computers and devices typically employed in a networking environment. One skilled in the art will understand that there are many other networking configurations suitable for computer 602 and computer system 600.
  • The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
  • One skilled in the art will appreciate that variations of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into many other different systems or applications. Additionally, various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art, which are also intended to be encompassed by the following claims.

Claims (25)

1. A method, comprising:
receiving, by a graphics client, a frame, the frame comprising scene model data;
setting a server load balancing factor based on the scene model data;
setting a prospective rendering factor based on the scene model data;
partitioning the frame into a plurality of server bands based on the server load balancing factor and the prospective rendering factor;
distributing the plurality of server bands to a plurality of compute servers;
receiving processed server bands from the plurality of compute servers;
assembling a processed frame based on the received processed server bands; and
transmitting the processed frame for display to a user as an image.
2. The method of claim 1, further comprising:
receiving user input; and
wherein setting the prospective rendering factor further comprises setting the prospective rendering factor based on the scene model data and received user input.
3. The method of claim 1, wherein partitioning the frame further comprises selecting between horizontal server bands and vertical server bands.
4. The method of claim 1, further comprising:
receiving reported rendering times from at least one of the plurality of servers; and
wherein setting the server load balancing factor further comprises setting the server load balancing factor based on the scene model data and the reported rendering times.
5. The method of claim 1, wherein assembling a processed frame band further comprises decompressing the received processed server bands.
6. A computer program product for processing a digitized graphic frame, the computer program product stored on a computer usable medium having computer usable program code embodied therewith, the computer useable program code comprising:
computer usable program code configured to receive a frame, the frame comprising scene model data;
computer usable program code configured to set a server load balancing factor based on the scene model data;
computer usable program code configured to set a prospective rendering factor based on the scene model data;
computer usable program code configured to partition the frame into a plurality of server bands based on the server load balancing factor and the prospective rendering factor;
computer usable program code configured to distribute the plurality of server bands to a plurality of compute servers;
computer usable program code configured to receive processed server bands from the plurality of compute servers;
computer usable program code configured to assemble a processed frame based on the received processed server bands; and
computer usable program code configured to transmit the processed frame for display to a user as an image.
7. The computer program product of claim 6, further comprising:
computer usable program code configured to receive user input; and
wherein setting the prospective rendering factor further comprises setting the prospective rendering factor based on the scene model data and received user input.
8. The computer program product of claim 6, wherein partitioning the frame further comprises selecting between horizontal server bands and vertical server bands.
9. The computer program product of claim 6, further comprising:
computer usable program code configured to receive reported rendering times from at least one of the plurality of servers; and
wherein setting the server load balancing factor further comprises setting the server load balancing factor based on the scene model data and the reported rendering times.
10. The computer program product of claim 6, wherein assembling a processed frame band further comprises decompressing the received processed server bands.
11. A method, comprising:
receiving, by a compute server, a raw display band, the raw display band comprising scene model data;
the compute server comprising a plurality of processing elements (PEs);
partitioning the raw display band into a plurality of PE blocks based on a PE load balancing factor;
distributing the plurality of PE blocks to the plurality of PEs;
rendering, by each PE, the PE blocks, to generate rendered PE blocks;
combining, by the compute server, the rendered PE blocks, to generate a processed display band;
determining, by the compute server, a rendering time for each PE;
modifying the PE load balancing factor based on the determined rendering times; and
transmitting the processed display band to a graphics client.
12. The method of claim 11, wherein transmitting comprises compressing the processed display band.
13. The method of claim 11, further comprising reporting a rendering time to the graphics client based on the determined rendering times.
14. The method of claim 11, further comprising:
wherein the raw display band further comprises prospective rendering input; and
wherein partitioning the raw display band comprises partitioning based on the PE load balancing factor and the prospective rendering input.
15. The method of claim 11, wherein modifying the PE load balancing factor further comprises modifying the PE load balancing factor based on the determined rendering times and received prospective rendering input.
16. A computer program product for processing a digitized graphic frame, the computer program product stored on a computer usable medium having computer usable program code embodied therewith, the computer useable program code comprising:
computer usable program code configured to receive a raw display band, the raw display band comprising scene model data;
computer usable program code configured to partition the raw display band into a plurality of PE blocks based on a PE load balancing factor;
computer usable program code configured to distribute the plurality of PE blocks to a plurality of PEs;
computer usable program code configured to render, by each PE, the PE blocks, to generate rendered PE blocks;
computer usable program code configured to combine the rendered PE blocks, to generate a processed display band;
computer usable program code configured to determine a rendering time for each PE;
computer usable program code configured to modify the PE load balancing factor based on the determined rendering times; and
computer usable program code configured to transmit the processed display band to a graphics client.
17. The computer program product of claim 16, wherein transmitting comprises compressing the processed display band.
18. The computer program product of claim 16, further comprising computer usable program code configured to report a rendering time to the graphics client based on the determined rendering times.
19. The computer program product of claim 16, further comprising:
wherein the raw display band further comprises prospective rendering input; and
wherein partitioning the raw display band comprises partitioning based on the PE load balancing factor and the prospective rendering input.
20. The computer program product of claim 16, wherein modifying the PE load balancing factor further comprises modifying the PE load balancing factor based on the determined rendering times and received prospective rendering input.
21. A system comprising a graphics client, the graphics client configured to:
receive a frame, the frame comprising scene model data;
set a server load balancing factor based on the scene model data;
set a prospective rendering factor based on the scene model data;
partition the frame into a plurality of server bands based on the server load balancing factor and the prospective rendering factor;
distribute the plurality of server bands to a plurality of compute servers;
receive processed server bands from the plurality of compute servers;
assemble a processed frame based on the received processed server bands; and
transmit the processed frame for display to a user as an image.
22. The system of claim 21, further comprising:
wherein the graphics client is further configured to receive user input; and
wherein setting the prospective rendering factor further comprises setting the prospective rendering factor based on the scene model data and received user input.
23. The system of claim 21, further comprising:
wherein the graphics client is further configured to receive reported rendering times from at least one of the plurality of servers; and
wherein setting the server load balancing factor further comprises setting the server load balancing factor based on the scene model data and the reported rendering times.
24. The system of claim 21, further comprising:
a plurality of compute servers, each compute server coupled to the graphics client and comprising a plurality of processing elements (PEs), and each compute server configured to:
receive a raw display band from the graphics client, the raw display band comprising scene model data;
partition the raw display band into a plurality of PE blocks based on a PE load balancing factor; and
distribute the plurality of PE blocks to the plurality of PEs;
wherein each PE is configured to render the PE blocks, to generate rendered PE blocks; and
wherein each compute server is further configured to:
combine the rendered PE blocks rendered by that compute server's PEs, to generate a processed display band;
determine a rendering time for each of that compute server's PEs;
modify the PE load balancing factor based on the determined rendering times; and
transmit the processed display band to the graphics client.
25. The system of claim 24, further comprising:
wherein the raw display band further comprises prospective rendering input; and
wherein partitioning the raw display band comprises partitioning based on the PE load balancing factor and the prospective rendering input.
US12/329,586 2008-12-06 2008-12-06 System and method for photorealistic imaging workload distribution Active 2034-09-18 US9270783B2 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US12/329,586 US9270783B2 (en) 2008-12-06 2008-12-06 System and method for photorealistic imaging workload distribution
JP2011539018A JP5462882B2 (en) 2008-12-06 2009-12-02 System and method for distributing the processing load of realistic image formation
PCT/EP2009/066257 WO2010063769A2 (en) 2008-12-06 2009-12-02 System and method for photorealistic imaging workload distribution
CN200980148614.1A CN102239678B (en) 2008-12-06 2009-12-02 System and method for photorealistic imaging workload distribution
US15/049,102 US9501809B2 (en) 2008-12-06 2016-02-21 System and method for photorealistic imaging workload distribution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12/329,586 US9270783B2 (en) 2008-12-06 2008-12-06 System and method for photorealistic imaging workload distribution

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/049,102 Continuation US9501809B2 (en) 2008-12-06 2016-02-21 System and method for photorealistic imaging workload distribution

Publications (2)

Publication Number Publication Date
US20100141665A1 true US20100141665A1 (en) 2010-06-10
US9270783B2 US9270783B2 (en) 2016-02-23

Family

ID=42170461

Family Applications (2)

Application Number Title Priority Date Filing Date
US12/329,586 Active 2034-09-18 US9270783B2 (en) 2008-12-06 2008-12-06 System and method for photorealistic imaging workload distribution
US15/049,102 Active US9501809B2 (en) 2008-12-06 2016-02-21 System and method for photorealistic imaging workload distribution

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/049,102 Active US9501809B2 (en) 2008-12-06 2016-02-21 System and method for photorealistic imaging workload distribution

Country Status (4)

Country Link
US (2) US9270783B2 (en)
JP (1) JP5462882B2 (en)
CN (1) CN102239678B (en)
WO (1) WO2010063769A2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130236126A1 (en) * 2012-03-08 2013-09-12 Samsung Electronics Co., Ltd. Image processing apparatus and method for processing image thereof
US20150261571A1 (en) * 2014-03-12 2015-09-17 Live Planet Llc Systems and methods for scalable asynchronous computing framework
US9965890B1 (en) * 2013-07-19 2018-05-08 Outward, Inc. Generating video content
US11461959B2 (en) * 2017-04-24 2022-10-04 Intel Corporation Positional only shading pipeline (POSH) geometry data processing with coarse Z buffer

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5977023B2 (en) * 2011-11-07 2016-08-24 株式会社スクウェア・エニックス・ホールディングス Drawing system, program, and recording medium
WO2013097210A1 (en) * 2011-12-31 2013-07-04 华为技术有限公司 Online rendering method and offline rendering method and relevant device based on cloud application
CN102664937B * 2012-04-09 2016-02-03 威盛电子股份有限公司 Cloud computing graphics server and cloud computing graphics service method
GB2534225B (en) * 2015-01-19 2017-02-22 Imagination Tech Ltd Rendering views of a scene in a graphics processing unit
KR102648256B1 (en) * 2017-03-30 2024-03-14 매직 립, 인코포레이티드 Centralized Rendering
CN109426473B (en) * 2017-08-25 2023-07-28 微软技术许可有限责任公司 Wireless programmable media processing system

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6028608A (en) * 1997-05-09 2000-02-22 Jenkins; Barry System and method of perception-based image generation and encoding
US6057847A (en) * 1996-12-20 2000-05-02 Jenkins; Barry System and method of image generation and encoding using primitive reprojection
US6192388B1 (en) * 1996-06-20 2001-02-20 Avid Technology, Inc. Detecting available computers to participate in computationally complex distributed processing problem
US20040003022A1 (en) * 2002-06-27 2004-01-01 International Business Machines Corporation Method and system for using modulo arithmetic to distribute processing over multiple processors
US6753878B1 (en) * 1999-03-08 2004-06-22 Hewlett-Packard Development Company, L.P. Parallel pipelined merge engines
US6816905B1 (en) * 2000-11-10 2004-11-09 Galactic Computing Corporation Bvi/Bc Method and system for providing dynamic hosted service management across disparate accounts/sites
US7075541B2 (en) * 2003-08-18 2006-07-11 Nvidia Corporation Adaptive load balancing in a multi-processor graphics processing system
US20070016560A1 (en) * 2005-07-15 2007-01-18 International Business Machines Corporation Method and apparatus for providing load diffusion in data stream correlations
US7200219B1 (en) * 1999-02-10 2007-04-03 Avaya Technology Corp. Dynamically allocating server resources to competing classes of work based upon achievement of service goals
US20070101336A1 (en) * 2005-11-03 2007-05-03 International Business Machines Corporation Method and apparatus for scheduling jobs on a network
US20080021987A1 (en) * 2006-07-21 2008-01-24 Sony Computer Entertainment Inc. Sub-task processor distribution scheduling
US20080114942A1 (en) * 2006-11-13 2008-05-15 Jeffrey Douglas Brown Dynamic Data Cache Invalidate with Data Dependent Expiration
US7916147B2 (en) * 2002-03-01 2011-03-29 T5 Labs Ltd. Centralised interactive graphical application server

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4352990B2 (en) 2004-05-17 2009-10-28 日本ビクター株式会社 3D image generation system
US20080079714A1 (en) 2006-09-28 2008-04-03 Shearer Robert A Workload Distribution Through Frame Division in a Ray Tracing Image Processing System
KR20080057483A (en) * 2006-12-20 2008-06-25 삼성전자주식회사 Server, client, load balancing system, and load balancing method thereof
JP4422161B2 (en) 2007-03-09 2010-02-24 ザイオソフト株式会社 Image processing apparatus and image processing program


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130236126A1 (en) * 2012-03-08 2013-09-12 Samsung Electronics Co., Ltd. Image processing apparatus and method for processing image thereof
US9530176B2 (en) * 2012-03-08 2016-12-27 Samsung Electronics Co., Ltd. Image processing apparatus and method for processing image thereof
US9965890B1 (en) * 2013-07-19 2018-05-08 Outward, Inc. Generating video content
US10430992B1 (en) 2013-07-19 2019-10-01 Outward, Inc. Generating video content
US10839592B1 (en) 2013-07-19 2020-11-17 Outward, Inc. Generating video content
US11250614B1 (en) 2013-07-19 2022-02-15 Outward, Inc. Generating video content
US11704862B1 (en) 2013-07-19 2023-07-18 Outward, Inc. Generating video content
US20150261571A1 (en) * 2014-03-12 2015-09-17 Live Planet Llc Systems and methods for scalable asynchronous computing framework
US9417911B2 (en) * 2014-03-12 2016-08-16 Live Planet Llc Systems and methods for scalable asynchronous computing framework
US9672066B2 (en) 2014-03-12 2017-06-06 Live Planet Llc Systems and methods for mass distribution of 3-dimensional reconstruction over network
US10042672B2 (en) 2014-03-12 2018-08-07 Live Planet Llc Systems and methods for reconstructing 3-dimensional model based on vertices
US11461959B2 (en) * 2017-04-24 2022-10-04 Intel Corporation Positional only shading pipeline (POSH) geometry data processing with coarse Z buffer

Also Published As

Publication number Publication date
JP2012511200A (en) 2012-05-17
US20160171643A1 (en) 2016-06-16
CN102239678B (en) 2014-04-09
JP5462882B2 (en) 2014-04-02
WO2010063769A2 (en) 2010-06-10
CN102239678A (en) 2011-11-09
WO2010063769A3 (en) 2010-11-18
US9501809B2 (en) 2016-11-22
US9270783B2 (en) 2016-02-23

Similar Documents

Publication Publication Date Title
US9501809B2 (en) System and method for photorealistic imaging workload distribution
CN112020858B (en) Asynchronous temporal and spatial warping with determination of regions of interest
US10062181B1 (en) Method and apparatus for rasterizing and encoding vector graphics
EP2245536B1 (en) Methods and systems for remoting three dimensional graphics
US9922007B1 (en) Split browser architecture capable of determining whether to combine or split content layers based on the encoding of content within each layer
CN111567052A (en) Scalable FOV + for issuing VR360 video to remote end user
US20140074911A1 (en) Method and apparatus for managing multi-session
US9479570B2 (en) System and method for processing load balancing of graphic streams
WO2011163388A1 (en) Remote server environment
US20160162597A1 (en) Intelligent browser-based display tiling
US20160330260A1 (en) Ultra-Low Latency Remote Application Access
US9372725B2 (en) Dynamically adjusting wait periods according to system performance
EP3391190B1 (en) Pipelining pre-composition data
EP3628096A1 (en) System and method for dynamic transparent scaling of content display
JP2008289030A (en) Picture drawing transfer system
US20120005587A1 (en) Performing Remoting Operations For Different Regions Of A Display Surface At Different Rates
US11528516B2 (en) Distributed transcoding method and distributed transcoding system
KR20190063568A (en) Optimization method for time reduction of distributed transcoding and system thereof
KR20190121280A (en) Electronic device supporting for Live Streaming Service of Virtual Contents based on Tiled Encoding image
US11704813B2 (en) Visual search method, visual search device and electrical device
US11074885B2 (en) Facilitating efficient detection of patterns in graphics display streams prior to their display at computing devices
US11102327B1 (en) Method, device, and computer program product for acquiring visual content
Jeon et al. Mondrian: On-Device High-Performance Video Analytics with Compressive Packed Inference
KR101464619B1 (en) Frame buffer direct access control method for VDI client
Liu Principle and Practice of Distributing Low and High Resolution Display Content from One Computer to Many Computers in Stand-alone or Display Wall Configurations

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MADRUGA, JOAQUIN;MINOR, BARRY L.;NUTTER, MARK R.;REEL/FRAME:021953/0439

Effective date: 20081205


STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8