A METHOD AND APPARATUS OF USING A NEURAL NETWORK TO TRAIN A NEURAL NETWORK
FIELD OF THE INVENTION
The present invention relates generally to image compression. More particularly, the present invention relates to training a neural network.
BACKGROUND OF THE INVENTION
Wavelet transforms are widely used in analysis, where they are known as "multiresolution analysis", and in image and audio compression, where they are used as a pyramid coding method for lossy compression. The wavelets used are generally from a very small set of analytically designed wavelets, such as Daubechies wavelets, or quadrature mirror filters ("QMF"). For some applications, designing specific wavelets with special coding properties would be beneficial.
Presently, there are no methods to construct a neural network which performs the finite or discrete Radon transform on a desired geometry while satisfying the connectedness of the desired neural network.
SUMMARY OF THE INVENTION
A method and apparatus of training a neural network are described. The method and apparatus include creating a model for a desired function as a multi-dimensional function, determining if the created model fits a simple finite geometry model, and generating a Radon transform to fit the simple finite geometry model. The desired function is fed through the Radon transform to generate weights. A multilayer perceptron of the neural network is trained using the weights.
BRIEF DESCRIPTION OF THE DRAWINGS
Features and advantages of the present invention will be apparent to one skilled in the art in light of the following detailed description, in which:
Figure 1 is a diagram of one embodiment of a multilayer perceptron;
Figures 2a and 2b are illustrations of a unit square and a torus;
Figure 3a illustrates one embodiment of geodesics on a sphere;
Figure 3b is an illustration of a leaf showing one embodiment of the overlapping segments of the geodesic of a half-sphere;
Figure 4 is an illustration of one embodiment of the mapping of half-sphere geodesics to a plane in a continuum;
Figure 5 is an illustration of one embodiment of building dimension;
Figure 6 is a block diagram of one embodiment of a computer system;
Figure 7 is a flow diagram of one embodiment of a method of designing a set of wavelet bases;
Figure 8 is a block diagram of one embodiment of a system for designing a set of wavelet bases;
Figure 9 is a flow diagram of one embodiment of a method of compressing images;
Figure 10 is a block diagram of one embodiment of a system for compressing images;
Figure 11 is a flow diagram of one embodiment of a method of reconstructing audio/video/image data from higher moment data;
Figure 12 is a block diagram of one embodiment of a system for reconstructing audio/video/image data from higher moment data;
Figure 13 is a flow diagram of one embodiment of a method of using a neural network to train a neural network; and
Figure 14 is a block diagram of one embodiment of a system for using a neural network to train a neural network.
DETAILED DESCRIPTION
A method and an apparatus of creating a wavelet basis are described. In the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.
Some portions of the detailed descriptions that follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
The present invention also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.
Wavelet transforms convert a signal into a series of wavelets. Wavelets are convenient for data transformation because they are finite in nature and contain frequency information. Since most actual waves have a finite duration and their frequencies change abruptly, wavelet transforms are better approximations for actual waveforms than other transforms, such as the Fourier transform. Signals processed by wavelet transforms are stored more efficiently than those processed by Fourier transforms.
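By way of illustration only, the following sketch performs a one-level discrete wavelet transform on a finite signal and reconstructs it exactly. The PyWavelets package and the Daubechies-2 wavelet are assumptions made for the example and form no part of the claimed method.

    # Hypothetical illustration of a discrete wavelet transform; the
    # PyWavelets package ("pywt") is an assumed, off-the-shelf choice.
    import numpy as np
    import pywt

    signal = np.sin(np.linspace(0.0, 8.0 * np.pi, 256))  # a finite test waveform

    # One analysis step: a coarse approximation plus localized detail.
    approx, detail = pywt.dwt(signal, 'db2')              # Daubechies-2 wavelet

    # The transform is invertible, so the representation loses nothing.
    restored = pywt.idwt(approx, detail, 'db2')
    assert np.allclose(signal, restored[:signal.size])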
Imagery may be created by inverting one or more layers of function in neural networks. Such a reversal of visual system processing may take place in stages or all at once. Finite Radon transforms on certain geometries are used to accomplish the reversal of the visual system processing. A dual system is created to certain feed forward network models of visual processing, and its application to visual processing and to non-image processing applications is shown.
In image signal processing, an underlying assumption is that any model of the visual system must deal with recall of imagery as well as comparison, classification and identification. Images are recalled in some form such as mental imagery, dream sequences, and other forms of recall, more or less vivid, depending on the individual. A basic postulate of visual imagery is that imagery comes from the creation, or recreation, of visual signals in the areas of the brain which process incoming images.
One approach to modeling visual systems is to assume that the processes of the visual system would have to be inverted in order to produce imagery within the visual system as a form of recall. Both the inversion process and the estimation of the visual system may be examined by looking at the inversion of the Radon transform. This is because the forward transformations, which in many cases occur in the visual system, resemble Radon transforms. Thus, the process of extracting information from the visual stream is modeled using the Radon transform and its dual backprojection.
In Radon transforms, instead of assigning a value to each point on a plane, each line on a plane is assigned a value by adding up the values of all the points along the line (i.e., taking the integral of the points along the line). To obtain the backprojection of the Radon transform, the value of a particular point on a plane is calculated using the values of all the lines that go through the particular point.
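This definition may be made concrete with an off-the-shelf continuous-domain Radon transform. The scikit-image calls below are a stand-in assumption; the transforms described herein are finite and combinatorial rather than continuous.

    # Hypothetical illustration: one value per line (the line integral),
    # and filtered backprojection to recover each point from its lines.
    import numpy as np
    from skimage.transform import radon, iradon

    image = np.zeros((64, 64))
    image[24:40, 24:40] = 1.0                        # a simple test object

    angles = np.linspace(0.0, 180.0, 64, endpoint=False)
    sinogram = radon(image, theta=angles)            # value assigned to each line
    reconstruction = iradon(sinogram, theta=angles)  # recovers points from lines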
A neural network ("neural net") is a collection of nodes and weighted connections between the nodes. The nodes are configured in layers. At each node, all of the inputs into the node are summed, a non-linear function is applied to the sum of the inputs, and the result is transmitted to the next layer of the neural network. A neural network may be used to build a Radon transform.
Multilayer perceptrons ("MLP") are a frequent tool in artificial neural networks. MLPs have been used to model some brain functions. It has been shown that an MLP with one hidden layer may be used to model any continuous function. Thus, given a layer that models a function of sufficient dimension, the ability to form Radon transform inverses implies the ability to form all continuous functions.
An MLP with one hidden layer is capable of representing a continuous function. It can be shown that the function it represents is the function that results from backprojecting whatever function is represented at the hidden layer. In order to build a neural network (MLP) for a particular function, a training method is used (usually backpropagation) to set the weights at the hidden layer. If the function is discrete, then it should be set to the Radon transform of the desired function, with a sharpening filter imposed to get rid of the blurring from the average. If there is no average, then there is no blurring and no sharpening is needed.
In the context of human vision, input is put through a different kind of neural network, particularly one that performs a finite or discrete Radon transform. If this network is set to create the Radon transform of the desired function, then it can be used to set the weights needed by the MLP. So this neural network (afferent on the hidden layer of the MLP) trains the MLP. This is quicker than backpropagation, and unlike traditional techniques such as backpropagation, it allows the calculation of additional weights to add neurons to the MLP hidden layer.
Figure 1 is a diagram of a multilayer perceptron. Multilayer perceptron 100 includes an input 101, a hidden layer 102, an afferent link to dual network 103 and output 104. An input 101 is received by MLP 100 and processed through hidden layer 102. Hidden layer 102 includes nodes 102a-f. The nodes 102a-f are shown for illustration purposes. A hidden layer 102 may include greater or fewer nodes depending on the design of the MLP.
Neural networks of arbitrary complexity may be constructed using discrete and finite Radon transforms. A discrete and finite Radon transform involves taking the values of line segments instead of lines. Thus, the values of all the line segments on a plane are taken for the discrete Radon transform, and the backprojection of the Radon transform is accomplished using line segments through a particular point.
Generally, a backprojection is not the inverse of a Radon transform because there is some blurring. Thus, typically a filter is used to make the inverse sharper. However, if the function is transformed to a new function on points so that the backprojection of a Radon transform is the Radon transform's inverse, there is no blurring. The transformation of the function that causes the backprojection to be the inverse is a wavelet transformation because it satisfies "the wavelet condition" (that the average value of the function is zero).
The central equation for constructing the neural networks, the Gindikin or Bolker equation, involves backprojecting the Radon transform and subtracting a global (to the point in question) average function. The nature of the average function to be subtracted is dependent on the transform geometry, and can be varied by varying the interconnect structure of the neural network.
The transform is dual to the network. Thus, the transform may be weighted to a desired template function.
Hidden layer 102 is represented as a Radon backprojection. Thus, input 101 is the stored sum of the values of line segments going through the point. At hidden layer 102, a function representing a Radon transform is performed on the input 101.
Thus, if the input 101 is represented by x, the output 104 is represented by o = σ(Σ_i x_i w_ij), where σ is the Radon transform. As illustrated, hidden layer 102 receives input 101 and afferent inputs 103a-f. Afferent inputs 103a-f being transmitted to hidden layer 102 represent the back propagation of the MLP 100. Thus, if MLP 100 represents a Radon transform, afferent inputs 103a-f are the inversions of the Radon transform. The back propagation is used to adjust the weights of the function at hidden layer 102 so that the inversion 103a-f is equal to the input 101.
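A minimal sketch of the hidden-layer arithmetic follows; the sizes, random weights, and tanh non-linearity are illustrative assumptions only.

    # Each hidden node j computes o_j = sigma(sum_i x_i * w_ij); the output
    # is a weighted sum over the hidden nodes, as in MLP 100 of Figure 1.
    import numpy as np

    def forward(x, w_hidden, w_out):
        hidden = np.tanh(w_hidden @ x)   # sum the inputs, apply the non-linearity
        return w_out @ hidden            # weighted sum producing output 104

    rng = np.random.default_rng(0)
    x = rng.normal(size=16)              # input 101
    w_hidden = rng.normal(size=(6, 16))  # six hidden nodes, as nodes 102a-f
    w_out = rng.normal(size=6)
    print(forward(x, w_hidden, w_out))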
The sum of the inputs received at each of nodes 102a-f is processed by applying a function, such as a Radon transform backprojection.
The afferent inputs are received through afferent link 103 to a dual network (not shown). The afferent inputs are inversions of the Radon transforms. The results of hidden layer 102 processing are summed using a weighting to produce output 104.
After the network is prepared, the wavelet prototype is fed through the network and its back propagation. The wavelet prototype is generally a function which is close to the desired shape, if that is known, although it is arbitrary.
The output is then used to modify the input function by subtracting the output from the input function to obtain a difference and moving the input function in the opposite direction from the difference. The process converges to zero difference between the input and the output, which satisfies the wavelet condition. The resulting function is then a "mother wavelet" from which a wavelet basis local to that point may be formed.
In constructing discrete Radon transform inverses, the inverse process on different geometries and for different structures is examined. One inverse, based on cosets of Zp², has the form

f(x) = a(f) + Σ_{g∈Gx} ( f̂(g) − a(f) )    (Equation 1)

where Zp is the ring with p elements, with addition being addition modulo p, and multiplication likewise. This is standard notation. The superscript 2 indicates that this is the ring of ordered pairs of two members of this ring, with addition and multiplication done componentwise. It is the ring of pairs (a, b) where a and b are in the ring Zp. This is known to one skilled in the art. In Equation 1, the sum (Σ) is taken over the incidence set Gx of lines of the group which intersect x, and the average, represented by a(f), is taken over the whole group. See F. Matus and J. Flusser, "Image Representations via a Finite Radon Transform," IEEE Trans. PAMI, vol. 15, no. 10, 1993, pp. 996-1006. The form implies that, for a function with zero average, the backprojection is the inverse. If the cosets of Zp² are plotted, the plot is essentially a discretization of the closed geodesics on a torus.
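The following sketch checks Equation 1 on Zp² for a small prime p. Each line value is taken as the average along the line so that the inversion holds without a sharpening factor; the prime, seed, and test function are arbitrary assumptions.

    # A hedged check of Equation 1 on Zp x Zp (after Matus and Flusser):
    # backprojecting the line averages and subtracting the global average
    # a(f) over the p + 1 lines recovers the function exactly.
    import numpy as np

    p = 7                                             # a small prime modulus
    f = np.random.default_rng(1).normal(size=(p, p))  # arbitrary function on Zp^2

    def lines_through(x, y):
        """The p + 1 lines of Zp^2 through (x, y), each as a list of points."""
        out = [[(t, (y + m * (t - x)) % p) for t in range(p)] for m in range(p)]
        out.append([(x, t) for t in range(p)])        # the vertical line
        return out

    a = f.mean()                                      # the global average a(f)
    for x in range(p):
        for y in range(p):
            radon_vals = [np.mean([f[u, v] for (u, v) in g])
                          for g in lines_through(x, y)]
            recovered = a + sum(r - a for r in radon_vals)   # Equation 1
            assert np.isclose(recovered, f[x, y])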
Figures 2a and 2b are illustrations of a unit square and a torus. Unit square 200 includes sides 205a-d. As seen in Figure 2b, torus 210 is formed by joining opposite sides of a unit square 200 such as, for example, sides 205a and 205c. This operation is an isometry, so that the size and shape of a volume element does not change from that for R². Consequently, the geodesics on the torus map to straight lines in R², and they pack tightly, forming uniform approximations to the averages on that surface. For example, geodesics 201b and 202b on torus 210 map to straight lines 201a and 202a in unit square 200.
While the same is not true for a half-sphere, an interesting version of the inversion formula may be formed for the half-sphere which will lead to finding a reasonable formula for the human visual system.
Figure 3a illustrates one embodiment of geodesics on a sphere. Sphere 300 includes geodesics 301a and 301b, for example. On the sphere, geodesics are "great circles", meaning that for S^n, any copy of S^(n−1) shares the same center and radius as the sphere itself. An antipodal map, which takes x to its opposing point on the other side of the sphere, may be denoted by A(x). However, an invertible transform on a sphere, using integration along geodesics, cannot be obtained because the geodesics through point x are identical to those through A(x).
If the transform is restricted to functions for which f(x) = f(A(x)), then the transform is essentially restricted to a half-sphere transform. By equating the points x and A(x), another essential property is found. Given two points on the half-sphere, there is exactly one geodesic which passes through them.
An inversion over geodesics on the sphere may be obtained as follows. Assuming that for each point x0 a half-sphere may be used, k + 1 geodesics 301a,b through x0 are chosen, each divided into k sections including sections 302a,b, for example. On each section 302a,b of geodesic g_i, a representative x_ij is chosen. A discrete Radon transform is calculated by taking the average of the k sections using the formula

f̂(g_i) = (1/k) Σ_{j=1}^{k} f(x_ij)    (Equation 2).
To keep notation simple, by rearranging the indices for example, the sample on each geodesic containing x0 is indexed x_i1, and the sample picked for this segment on all geodesics is x0. Constraints on segmenting the geodesics insure that this is reasonable as k gets large.
The backprojection at point x0 is defined to be the sum of the values on the geodesics through x0,

Sf(x0) = Σ_{i=0}^{k} f̂(g_i)    (Equation 3).
The sum may be rearranged to be

Sf(x0) = (1/k) Σ_{i=0}^{k} Σ_{j=1}^{k} f(x_ij) = (1/k) ( Σ_{i=0}^{k} Σ_{j=2}^{k} f(x_ij) + f(x0) ) + (1/k) Σ_{i=1}^{k} f(x0)    (Equation 4).
In Equation 4, the first term in the expression contains one copy of each of the L samples taken. Denoting the average value over all the samples as f̄, since x0 is chosen to be the sample for each segment in which it falls, the equation is

f(x0) = f̄ + Σ_{i=0}^{k} ( f̂(g_i) − f̄ )    (Equation 5).

With some adjustments in the samples taken, as the size of the sample set grows, f̄ approaches the average value of the function over the half-sphere and f̂(g_i) approaches the usual definition of the Radon transform. Matus and Flusser found the same expression in the case of the group Zp², where their analysis performs the double fibration alluded to by Bolker. See E. D. Bolker, "The Finite Radon Transform," Integral Geometry, AMS Contemporary Mathematics, vol. 63, 1984, pp. 27-50; F. Matus and J. Flusser, "Image Representations via a Finite Radon Transform," IEEE Trans. PAMI, vol. 15, no. 10, 1993, pp. 996-1006.
Equation 5 is a limit value for the formula given by Bolker in the case of sets in which there are certain block design constraints. The constraints are satisfied above by noting that, given two points on the half-sphere, there is exactly one geodesic passing through them, and by the use of the index L, guaranteeing that there are equal numbers of geodesics through each point in the discretization formula. Specifically, using Bolker's notation, α = k + 1 and β = 1, so that the formula reads

f(x) = (1/(α − β)) SRf(x) − (βL/(α − β)) u(f) = (1/k) SRf(x) − (L/k) u(f)    (Equation 6).
In Vvedenskaya and Gindikin's formula, the term β does not show specifically because it is arranged by geometry. See N. D. Vvedenskaya and S. G. Gindikin, "Discrete Radon Transform and Image Reconstruction," Mathematical Problems in Tomography, AMS, 1990, pp. 141-188. The term β does allow, however, the building of interesting biologically plausible structures.
In order to create a scenario resembling the finite transforms encountered in brain processing, a set of discrete transforms needs to be woven together into a sheet. This is done by using the formula for the half-sphere (Equation 5) and acknowledging the finiteness of each geodesic set.
First, segments of geodesics are taken on the half-sphere. If a pattern of finite segments is allowed, then one possible arrangement is to allow that each segment is incident on points only along the geodesic on which it lies, that each segment is incident on the same number of points, and that there is a segment centered at each sample point.
If the number of samples in each segment is k, and there is a sample centered at each x_ij, then there are k segments incident on the trial sample point x0. These k segments comprise k² samples, counting repetition, so that an "average" over these segments would require a factor of 1/k². The rest of the analysis proceeds as with the half-sphere analysis, except that there is a different average value calculation and a different wavelet condition. The average is replaced with a weighted average.
Each local set along a geodesic on the half-sphere will be referred to as a segment, and it will be assumed that each segment contains k samples. Furthermore, it is assumed that the segments are centered at samples spaced one sample apart, so that along a given geodesic, the segment centered at x_j contains x0 if, and only if, the distance between the two samples is less than (k + 1)/2.
For each distance 0 < d < (k + 1)/2, there are two segments whose centers are this far from x0, so that there are a total of k − 1 segments which overlap x0 but are not centered there. Thus, there are k segments which contain x0 along any geodesic.
Because each segment contains k samples, there are a total of k² values summed by summing up the segments along one geodesic overlapping x0. Each set of segments along one geodesic covering the point x0 will be referred to as a "leaf".
Figure 3b is an illustration of a leaf showing one embodiment of the overlapping segments of the geodesic of a half-sphere. Point 310 represents a point x0 on a half-sphere. Segments 312, 314, 316, 318 and 320 overlap along a geodesic covering point x0. Segments 312-320 form a leaf.
Proceeding with the previous construction, without making adjustments for the number of overlaps or samples, the Radon transform for a segment of length k centered at sample x_a will be defined as

f̂(g_a) = Σ_{x_j ∈ g_a} f(x_j)    (Equation 7)
and the backprojection or adjoint transform will be defined as

Sf(x0) = Σ_{g_a ∋ x0} f̂(g_a)    (Equation 8)
to be the sum over the set of all segments of all leaves at x0. Written in terms of samples, for each leaf, the sum is

s_f(g_i) = Σ_j (k − d_j) f(x_j)    (Equation 9)

where d_j is the distance from sample x_j to x0.
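The (k − d) weighting of Equation 9 may be checked numerically by modeling one closed geodesic as a ring of samples; the ring size and segment length below are arbitrary assumptions.

    # A hedged check of Equation 9: summing the k segments whose windows
    # contain x0 weights a sample at distance d by (k - d).
    import numpy as np

    N, k = 32, 7                       # ring size and odd segment length
    half = (k - 1) // 2
    f = np.random.default_rng(2).normal(size=N)

    def segment_sum(center):
        return sum(f[(center + j) % N] for j in range(-half, half + 1))

    x0 = 5
    # The leaf at x0: the k segments whose centers lie within (k - 1)/2 of x0.
    leaf = sum(segment_sum((x0 + c) % N) for c in range(-half, half + 1))

    weighted = sum((k - abs(d)) * f[(x0 + d) % N] for d in range(-(k - 1), k))
    assert np.isclose(leaf, weighted)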
As before, assuming k + 1 geodesics on the half-sphere intersecting x0, the equation becomes

Sf(x0) = Σ_{i=0}^{k} Σ_{j=1}^{k} (k − d_j) f(x_ij)    (Equation 10).
The sum, as before, is manipulated to expose the inverse formula:

Sf(x0) = ( Σ_{i=0}^{k} Σ_{x_ij ≠ x0} (k − d_j) f(x_ij) + k f(x0) ) + k² f(x0)    (Equation 11).

The term inside the parentheses in Equation 11 has (k + 1)(k² − k) + k = k³ weighted samples, indicating that if the Radon transform were defined with a factor accounting for the k² samples occurring on each leaf, and the average were defined to be the sum on the right, with a weighting factor of 1/k³ to account for the samples on each leaf, the inverse formula would be

f(x0) = u(f) + Σ_{i=0}^{k} ( f̂(g_i) − u(f) )    (Equation 12)
The weighted average u(f) needs to be expressed as a function of the Radon transform of f, not f itself. See Bolker. If the incidence structure of the points and segments is uniform, this is no problem because then every point ends up with k segments incident on it, and the weighting formula may be defined on the Radon transform segments by defining a distance d between the segment and x0 to be the distance from x0 to the center of the segment.
For the spherical model, this leads to an average over all segments, weighted by distance and divided by a factor of k² for the overlap. The same exercise may be done using different packing formulas, which amount to specifying the connectivity between points in the model of the visual system.
Figure 4 is an illustration of one embodiment of the mapping of half-sphere geodesics to a plane in a continuum. Half-sphere 400 has geodesics 401a,b,c. The geodesics 401a,b,c are mapped 411a,b,c to plane 420 including a grid 430. The shape shown for the grid 430 is a function of the kind of connectivity the continuum has.
By using orthographic projection, a domain of points is obtained at which oriented filters are represented by Radon set transforms. Consequently, the form for an inversion of a uniformly packed domain of finite segment Radon transforms has been found. As with the other cases examined, if a functional input exists which adheres to the wavelet condition, modified, in this case, to accommodate the weighted average rather than a uniform measure, the inverse can be obtained directly from the backprojection.
Partial Backprojections
A specific example will be used to illustrate the use of partial backprojections. On the surface of a half-sphere with the same geodesics through point x0, a large number of objects are desired to be formed by taking pairs of geodesics through point x0. In neural terms, correlations are forming, specifically junctions and end stopping cells of a particular variety.
The correlations may be made more like end stopping cells by taking half arcs joined at the point x0. Since the two cases are conceptually identical, the latter formulation will be taken. The correlation may be built from the structures generated by a grid of half-spheres. The constructs are parameterized as follows. At each point x, sets are parameterized to be g(θ, φ, x), where θ is the angle of the first half-geodesic, and φ is the angle from the first to the second. The Radon transform from the set of points to the set of g(θ, φ, x) may be denoted by

f̂(g) = Σ_j f(x_θ,j) + f(x0) + Σ_j f(x_θ+φ,j)    (Equation 13)
which is nothing more than the sum up one arc to x0, plus the sum up the other arc.
Movement between representations is possible, using the formulation given for the discrete Radon transform above, by noting that if two such structures are taken, with θ₂ = θ₁ + π/2, and summed, another Radon transform may be defined on pairs of geodesics. This duplicates the value f(x0).
This representation is correct in the following way. A set of geodesics through points is assumed at the start. The values on these geodesics are given by the Radon transform in the usual sense. If sets of these structures, characterized by fixed angle θ, are added, a different average value formula is obtained, but the backprojection is of the same general form. Consequently, the result of the transformation may be inverted in a single step.
Generalizing this model, sets of geodesics may be taken from the set of leaves in G_x0, the set of all segments intersecting x0. Because any set of these segments contains copies of x0, and because, by rotational symmetry, all rotations of such sets may be taken as a basis at each point, the same construct may be generated in forming an inverse. Such constructs are referred to as "partial backprojections."
Partial backprojections are important for two reasons. The first reason is that there are important examples of these sets that correspond to human visual system constructs. The set just mentioned, for example, corresponds well to the angular cells among the hypercomplex cells, in that they respond to angles.
Thus, it is shown that, with some adjustments for a thresholding process that occurs in forming such unions (e.g., throwing out high frequency wavelets), the output of such cells is reversible, and can be reversed in one backprojection step. This is an interesting point since feedback afferent on the early stages of the human visual system comes from stages that are separated by more than one step.
Also, for the visual system, in a space in which the input functions do not have an average value equal to zero, the entire algorithmic formula comes into play. Supposing a localized Radon transform of a color space, when inverting the color space, the backprojection may be adjusted or the backprojection may be filtered to render no dispersion in the point spread function. The net effect is that edge information has been extracted at the expense of level set information, and the level set information has been replaced with a new value. This is identical to a gray world assumption in the retinex or similar algorithms.
The second reason that this transformation is important is because, being a grouping of the elements of the Radon transform (i.e., lines) into sets in an incidence structure, it represents another geometry for the Radon transform, which may be defined as the sum of the line values in the sets. This is just the definition given to the partial backprojection. Consequently, the sets that define a partial backprojection have been used to form a new Radon transform.
Bolker has shown that if these sets are spreads, then the transform so generated will be constant on the range of the Radon transform of the points. Bolker uses this in local form to build a cohomology-like sequence of transforms.
There is nothing preventing the taking of arbitrary sets of geodesics except tractability, however, and the one chosen is particularly practical. Because the sets chosen give response to a correlation of multiple (e.g., two in this example) orientations, they are defined by a pair of lines, and therefore have the dimension of a plane.
Figure 5 is an illustration of one embodiment of building dimension. Two points 501, 502 form a line 511. Two lines 511, 512 form a plane 521. Two planes 521, 522 form a volume 531.
Evidently, this transformation has resulted in increasing the dimension by one. This is evident from the fact that two angles and a two-dimensional position must be specified for each new segment set.
It may be noted that none of the derivation done for samples is affected by what dimension of sphere is being worked on, although one could add a factor of k for each dimension to satisfy intuition (the geodesics on S^n are copies of S^(n−1) sharing the property of common center with S^n). Consequently, Radon transform sequences may be built which build geometries of arbitrary dimension in this fashion.
Figure 6 shows a diagrammatic representation of a machine in the exemplary form of a computer system 600 within which a set of instructions, for causing the machine to perform any one of the methodologies discussed above, may be executed. In alternative embodiments, the machine may comprise a network router, a network switch, a network bridge, a Personal Digital Assistant (PDA), a cellular telephone, a web appliance or any machine capable of executing a sequence of instructions that specify actions to be taken by that machine.
The computer system 600 includes a processor 602, a main memory 604 and a static memory 606, which communicate with each other via a bus 608. The computer system 600 may further include a video display unit 610 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)). The computer system 600 also includes an alphanumeric input device 612 (e.g., a keyboard), a cursor control device 614 (e.g., a mouse), a disk drive unit 616, a signal generation device 620 (e.g., a speaker) and a network interface device 622.
The disk drive unit 616 includes a computer-readable medium 624 on which is stored a set of instructions (i.e., software) 626 embodying any one, or all, of the methodologies described above. The software 626 is also shown to reside, completely or at least partially, within the main memory 604 and/or within the processor 602. The software 626 may further be transmitted or received via the network interface device 622. For the purposes of this specification, the term "computer-readable medium" shall be taken to include any medium that is capable of storing or encoding a sequence of instructions for execution by the computer and that causes the computer to perform any one of the methodologies of the present invention. The term "computer-readable medium" shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic disks, and carrier wave signals.
Figure 7 is a flow diagram of one embodiment of a method of designing a set of wavelet bases to fit a particular problem.
At processing block 701, a neural network of arbitrary complexity is constructed using a discrete and finite Radon transform. The central equation for doing the Radon transform may include the Gindikin equation or the Bolker equation referenced above.
Construction of the neural network will include backprojecting the Radon transform to a point and subtracting a global average function of the point. The global average function is dependent on the transform geometry and may be varied by varying the interconnect structure of the neural network, as described above.
The Radon transform may be weighted to a desired template function. The neural network may be built to have a particular geometry so that, given a particular point, the size and shape of the line segments chosen for the Radon transform and its backprojection form the particular geometry. The geometry may be any geometry such as, for example, spherical or hyperbolic, etc.
The Radon transform is dual to the network because the neural network performs the Radon transform and inverts the Radon transform.
At processing block 702, an input wavelet prototype designed to fit a particular problem is fed through the neural network and its back propagation to produce an output. The wavelet prototype may be a function which is close to the desired shape, if that is known. The wavelet is used to train the neural network to be specific to a certain set of images.
At processing block 703, the input function of the neural network is modified using the output. The input function is modified by subtracting the difference between the output of the neural net, which is the inverse of the Radon transform, and the input of the neural network, which is the original data, such as, for example, an image. The difference between the output and the input is used by the neural network to modify the input function by moving the input function in the opposite direction from the difference between the output and the input. This is then a "mother wavelet" from which a wavelet basis local to that point may be formed.
This process converges to zero difference between input and output, which satisfies the wavelet condition. Thus, the neural network will produce wavelets that are capable of processing the images with little loss. The training of the neural network may continue until the difference between the output and the input reaches a predetermined value, which may be an error value for the neural network. Once the predetermined value is reached, the training will cease so that the neural network is not overtrained.
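A hedged sketch of this convergence loop follows, reusing the Zp² line geometry described above as the backprojection network; the step size and iteration count are illustrative assumptions.

    # The prototype is pushed through the backprojection network, and the
    # input is moved against the input/output difference until the wavelet
    # (zero average) condition holds and the difference vanishes.
    import numpy as np

    p = 7
    proto = np.random.default_rng(3).normal(size=(p, p)) + 0.5  # nonzero mean

    def lines_through(x, y):
        out = [[(t, (y + m * (t - x)) % p) for t in range(p)] for m in range(p)]
        out.append([(x, t) for t in range(p)])
        return out

    def backproject(f):
        """Sum of line averages through each point: f plus a constant blur."""
        out = np.empty_like(f)
        for x in range(p):
            for y in range(p):
                out[x, y] = sum(np.mean([f[u, v] for (u, v) in g])
                                for g in lines_through(x, y))
        return out

    f, eta = proto.copy(), 0.1
    for _ in range(30):
        f -= eta * (backproject(f) - f)   # move against the difference
    assert abs(f.mean()) < 1e-6           # the wavelet condition is satisfied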
Tlus method of construct ng wavelets is optimized for massively paradel unage processmg and distribution It optimizes around the unage or template bemg processed, and does not require ti at the exa ;t < hαracteπstics ot the template be known αnalyticady Tlie method of constructing wavelets also works tor any dimension, and can work on data that comes from experunent. when a template is not known, by using the template as α block design
The method is adaptive in parallel, and could be used to generate wavelet bases tuned to very specific templates, such as, for example, to measure differences. The method also allows wavelets to be built for image analysis functions specifically, and allows "picture-centric" wavelet bases. Picture-centric wavelet bases include wavelet bases that are specific to a certain type of image. For example, the wavelet bases may be constructed for images of houses, which have a large number of parallel and horizontal lines. The wavelet basis may also be constructed to be an edge detector, as described above.
The method of constructing wavelets generalizes to many dimensions, and may be used to compress multi-dimensional data. The method, with another dimension, may be appropriate to spatio-temporal data such as, for example, video. The method of constructing wavelets models the human visual system, and could be important to computer vision tasks.
Figure 8 is a block diagram of one embodiment of system 800 for designing a set of wavelet bases. System 800 includes a designing module 810 coupled to a feeder 820. The designing module 810 designs an input wavelet prototype to fit a particular problem, and this input wavelet prototype is fed through a neural network 840 and its back propagation to produce an output. The wavelet prototype may be a function which is close to the desired shape, if that is known. The wavelet is used to train the neural network 840 to be specific to a certain set of images.
The neural network 840 is of arbitrary complexity and is constructed by a neural network constructor 830 using a discrete and finite Radon transform. The central equation for doing the Radon transform may include the Gindikin equation or the Bolker equation as referred to above.
Construction of the neural network 840 will include backprojecting the Radon transform to a point and subtracting a global average function of the point. The global average function is dependent on the transform geometry and may be varied by varying the interconnect structure of the neural network 840, as described above.
The Radon transform may be weighted to a desired template function. The neural network 840 may be built to have a particular geometry so that, given a particular point, the size and shape of the line segments chosen for the Radon transform and its backprojection form the particular geometry. The geometry may be any geometry such as, for example, spherical or hyperbolic, etc.
The Radon transform is dual to the network 840 because the neural network 840 performs the Radon transform and inverts the Radon transform.
The neural network 840 is also coupled to a modifier module 850 that modifies an input function of the neural network 840 using the output. The input function is modified by subtracting the difference between the output of the neural network 840, which is the inverse of the Radon transform, and the input of the neural network 840, which is the original data, such as, for example, an image. The difference between the output and the input is used by the neural network 840 to modify the input function by moving the input function in the opposite direction from the difference between the output and the input. This is then a "mother wavelet" from which a wavelet basis local to that point may be formed.
This process converges to zero difference between input and output, which satisfies the wavelet condition. Thus, the neural network 840 will produce wavelets that are capable of processing the images with little loss. The training of the neural network 840 may continue until the difference between the output and the input reaches a predetermined value, which may be an error value for the neural network 840. Once the predetermined value is reached, the training will cease so that the neural network 840 is not overtrained.
Figure 9 is a flow diagram of one embodiment of a method of compressing images.
At processing block 901, a neural network having a specific geometry is constructed using a discrete and finite Radon transform. The construction of the neural network is based on an analysis of the geometry of the desired network. The specific geometry chosen may depend on the simplicity of the encoding, the simplicity of the decoding, the natural geometry suggested by the subject matter to be compressed, and/or the natural geometries suggested by the network architecture.
At processing block 902, the data to be compressed is fed through the network to produce a transformed data stream. Data is passed through a neural network that produces the Radon transform of the data. Passing it through the MLP stage produces the backprojection of the Radon transform. If the Radon transform is designated R, and the backprojection designated R*, then the whole system performs R*R on an input. Output is compared to input, and weights are set for the input stage such that output minus input equals zero. The resulting input stage is a wavelet transform. Data passed through the input process is wavelet transformed by this stage; that constitutes the "transformed stream." By training the input stage so that there is no blurring from the neural net, the input stage becomes a wavelet transform. Passing data through this stage results in a transformed (by the wavelet transform) stream.
At processing block 903, the transformed data stream is thresholded. Thresholding the data stream may include thresholding the data based on predetermined criteria. The predetermined criteria may include quality of features to be preserved, such as, for example, outlines, or a criterion such as desired compression ratio. The thresholding process may also include removing components of the data stream above a predetermined maximum frequency. Thus, frequencies that, for example, would normally not be seen by the human eye may be removed to reduce the amount of data to be compressed.
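By way of example, thresholding a transformed stream might look as follows; the synthetic coefficients and the threshold value are assumptions. The soft variant anticipates the graduated thresholding discussed below.

    # Hard thresholding drops small coefficients outright; soft thresholding
    # shrinks the survivors, giving a graduated rather than binary cut.
    import numpy as np

    coeffs = np.random.default_rng(4).laplace(scale=1.0, size=1024)
    t = 1.5                                              # assumed threshold

    hard = np.where(np.abs(coeffs) > t, coeffs, 0.0)
    soft = np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

    print(f"kept {np.count_nonzero(hard)} of {coeffs.size} coefficients")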
At processing block 904, a fixed input signal is fed back through the neural network to generate a decoding calculation of an average value. The average value will be used to invert the Radon transform to recover the transformed data. Referring back to Figure 1, the feedback connections eliminate the average, which causes blurring. This is a function of the geometry of the network. A signal may be input that is fixed and constant over the network inputs. This produces the blur part of the output. If the blur part of the output is fed back to the weights on the network, this can be used to tune the weights to make the output and input match, tuning the network.
At processing block 905, the thresholded data stream is entropy encoded to compress the data stream. The thresholded data stream may be divided into a plurality of data streams if compressed data is to be stored in a distributed mode. In alternative embodiments, the compressed stream may also be zero-tree encoded or bitplane encoded. This produces the compressed stream. Whether the transformed stream should be thresholded and/or zero-tree or bitplane encoded depends on the geometric design of the Radon transform. The inverse is the inverse of the entropy and bitplane encoding plus the neural net expressing R*R. To decompress, the entropy and bitplane or zero-tree encoding is inverted (standard) and the result is passed through R*R, which produces the original, decoded data.
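As an illustration of this step, the sketch below quantizes a thresholded stream and packs it with zlib, which stands in for an entropy coder; the quantization step is an assumption.

    # A mostly-zero thresholded stream compresses well under entropy coding;
    # zlib is a convenient stand-in, not the claimed coder.
    import numpy as np
    import zlib

    coeffs = np.random.default_rng(5).laplace(scale=1.0, size=4096)
    coeffs[np.abs(coeffs) < 1.5] = 0.0               # thresholded stream

    quantized = np.clip(np.round(coeffs * 16), -127, 127).astype(np.int8)
    packed = zlib.compress(quantized.tobytes(), level=9)
    print(f"{quantized.nbytes} bytes -> {len(packed)} bytes")

    # The coding is losslessly invertible back to the quantized stream.
    restored = np.frombuffer(zlib.decompress(packed), dtype=np.int8)
    assert np.array_equal(restored, quantized)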
In the method of compression described, the wavelet used to transform data is designed by the shape of the oriented filters and the geometry of the neural network. Thus, the wavelets may be generated to fit extraordinary forms of compression demands, or specific material.
The method of compression also provides a method of cleaning the data while compressing it. In one embodiment, this is accomplished by using threshold functions which are soft (i.e., graduated rather than binary) for compression geometries that have multiple resolutions.
Since the geometry of the input, the geometry of the output, the configuration of the oriented filters, and the dimension of the compression are explicit, one embodiment of the method of compression allows extra control over compression optimization. By using partial backprojections, this embodiment allows storage of the compressed data in a form which could be used for texture detection, and for some curvature and three-dimensional information, without decompressing. The partial backprojections may be done by the use of correlation, such as the correlation of neighboring data, and allow image compression which is compatible with feature detection and query by content.
The method of compression allows a very general, but very analytical, method for designing image compression. The method allows image compression which minimizes concentration of activity on a network, training of specialized wavelet compression methods for special data, and the creation of compression methods consistent with image querying.
Figure 10 is a block diagram of one embodiment of system 1000 for compressing images. System 1000 includes a data repository 1010 coupled to a feeder 1020. Data repository 1010 contains data to be fed through a neural network 1040 by the feeder 1020. The neural network 1040 has a specific geometry and is constructed by a neural network constructor 1030 by using a finite and discrete Radon transform.
The construction of the neural network 1040 is based on an analysis of the geometry of the desired network. The specific geometry chosen may depend on the simplicity of the encoding, the simplicity of the decoding, the natural geometry suggested by the subject matter to be compressed, and/or the natural geometries suggested by the network architecture.
The data is fed through the neural network 1040, and the neural network 1040 produces a transformed data stream. The transformed data stream moves through a thresholder 1050 which thresholds the data stream. Thresholding the data stream may include thresholding the data based on predetermined criteria. The predetermined criteria may include quality of features to be preserved, such as, for example, outlines, or a criterion such as desired compression ratio. The thresholding process may also include removing components of the data stream above a predetermined maximum frequency. Thus, frequencies that, for example, would normally not be seen by the human eye may be removed to reduce the amount of data to be compressed.
A fixed input signal feeder 1060 feeds a fixed input signal through the neural network 1040 to generate a decoding calculation of an average value. The average value will be used to invert the Radon transform to recover the transformed data. Referring back to Figure 1, the feedback connections eliminate the average, which causes blurring. This is a function only of the geometry of the network. A signal may be input that is fixed and constant over the network inputs. This produces the blur part of the output. If the blur part of the output is fed back to the weights on the network, this can be used to tune the weights to make the output and input match to tune the network.
An entropy encoder 1070 is coupled to the thresholder 1050, and the entropy encoder 1070 encodes the thresholded data stream coming out of the thresholder 1050. This compresses the data stream. The thresholded data stream may be divided into a plurality of data streams if compressed data is to be stored in a distributed mode.
Figure 11 is a flow diagram of one embodiment of a method of reconstructing audio/video/image data from higher moment data.
At processing block 1101, a finite Radon transform is performed on the higher moment data. At processing block 1102, an average function is generated to allow inversion of the Radon transform in one step. The average function may be calculated based only on the geometry and used for multiple reconstructions. At processing block 1103, the Radon transform at each point is correlated. When a Radon transform of a dimension one higher than the original transform is created by taking correlations at each point of the transformed data, a partial backprojection is produced.

At processing block 1104, a resultant set of duplications is calculated using the correlation process in order to generate a new average function. At block 1105, the sum is taken of the partial backprojections of the Radon transform at each point. The new average function for each point is subtracted from the sum of the partial backprojections at that point at block 1106. The inverse to each step is representative of the Gindikin formula.
In one embodiment, the general form for discrete Radon transforms is explicitly given in new cases, specifically for the case in which balanced resolved block designs are not present. The solution is not a relaxation method. The solution is consistent with moments generated in image analysis. Also, the solution takes geometry into account, significantly generalizing the moment method of describing image data.
The method disclosed, when executed in parallel, is potentially faster, since it requires only a single step. Also, the average function may be calculated based only on the geometry and used for multiple reconstructions. The solution can also model many different experimental designs and correlation statistics. In addition, the method can be trained for geometries with no closed form by backprojecting a constant function.
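The calibration by backprojecting a constant function may be sketched as follows, reusing the Zp² line geometry purely as an example incidence structure; once computed from the geometry, the blur term serves every reconstruction.

    # Backprojecting the constant function 1 exposes the geometry's blur
    # term; subtracting it (scaled by the data average) inverts in one step.
    import numpy as np

    p = 7

    def lines_through(x, y):
        out = [[(t, (y + m * (t - x)) % p) for t in range(p)] for m in range(p)]
        out.append([(x, t) for t in range(p)])
        return out

    def backproject(f):
        out = np.empty_like(f)
        for x in range(p):
            for y in range(p):
                out[x, y] = sum(np.mean([f[u, v] for (u, v) in g])
                                for g in lines_through(x, y))
        return out

    blur = backproject(np.ones((p, p))) - 1.0      # geometry-only average term
    f = np.random.default_rng(6).normal(size=(p, p))
    recovered = backproject(f) - blur * f.mean()   # reuse for any data
    assert np.allclose(recovered, f)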
Figure 12 is a block diagram of one embodiment of system 1200 for reconstructing audio/video/image data from higher moment data. System 1200 includes a higher moment data repository 1210 coupled to a Radon transform module 1220. The higher moment data repository contains higher moment data that is used by the Radon transform module 1220 to perform a finite Radon transform 1230.
An average function generator 1240 generates an average function to allow inversion of the Radon transform 1230 in one step 1250. The average function may be calculated based only on the geometry and used for multiple reconstructions.
A correlation module 1260 correlates the Radon transform 1230 at each point. When a Radon transform 1230 of a dimension one higher than the original transform is created by taking correlations at each point of the transformed data, a partial backprojection is produced.
A calculator 1270 coupled to the correlation module 1260 calculates a resultant set of duplications using the correlation process to generate a new average function.
A summing module 1280 sums partial backprojections of the Radon transform 1230 at each point. A subtracting module 1290 is coupled to both the calculator 1270 and the summing module 1280. The subtracting module 1290 subtracts the new average function for each point from the sum of the partial backprojections at that point. The inverse to each step is representative of the Gindikin formula.
In one embodiment, the general form for discrete Radon transforms is explicitly given in new cases, specifically for the case in which balanced resolved block designs are not present. The solution is not a relaxation method. The solution is consistent with moments generated in image analysis. Also, the solution takes geometry into account, significantly generalizing the moment method of describing image data.
Figure 13 is a flow diagram of one embodiment of a method of using a neural network to train a neural network.
At processing block 1301, a model for a desired function is created as a multi-dimensional function. A decision is made as to whether to model it in a single stage or not. In one embodiment, at processing block 1302, to determine whether to model the function as a single stage or not, it is determined if the created model fits a simple finite geometry model or not. There is always a geometry that will fit a particular application. If that geometry is better expressed as being of higher dimension than 2, then the model will use multiple stages.
At processing block 1303, a Radon transform is generated to fit the simple finite geometry model. The desired function is fed through the Radon transform to generate weights at processing block 1304. These weights are then used to train a multilayer perceptron of the neural network, as seen at processing block 1305.
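A single-pass rendering of blocks 1301 through 1305 may be sketched as follows; the Zp² geometry serves as an assumed simple finite geometry model, and the correction term is that geometry's average function.

    # The Radon transform of the desired function is computed once and
    # installed directly as hidden-layer weights; backprojection minus the
    # average then reproduces the desired function without iterative training.
    import numpy as np

    p = 7
    target = np.random.default_rng(7).normal(size=(p, p))  # desired function

    def all_lines():
        lines = [[(t, (m * t + b) % p) for t in range(p)]
                 for m in range(p) for b in range(p)]
        lines += [[(b, t) for t in range(p)] for b in range(p)]  # verticals
        return lines

    lines = all_lines()
    weights = [np.mean([target[u, v] for (u, v) in g]) for g in lines]  # one pass

    def mlp_output(x, y):
        s = sum(w for g, w in zip(lines, weights) if (x, y) in g)
        return s - p * target.mean()   # backprojection minus the average

    for x in range(p):
        for y in range(p):
            assert np.isclose(mlp_output(x, y), target[x, y])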
In this method, the constructive proof is used to program neural networks for more than simple model problems. Now, neural networks can be created which can model arbitrary functions with simple inversion formulas, making their programming easier.
This method allows single pass training of a neural network once the geometry of the training network is specified. It also allows the interpolation of neurons in the hidden layer to add specificity. This is not currently done with backpropagation. In addition, it allows simplification of neural network functionality by analytic techniques from geometry and combinatorics.
Furthermore, the present invention presents a new, possibly simpler way to program neural networks. This may allow more networks to be built with the same geometry in less time, by giving different input specifications to the training network. It also presents a way to add nodes to networks without rebuilding or retraining the network. Currently, if the size of a multilayer perceptron is misestimated, the process requires going through the entire training cycle again. With this method of training, only angular projections are added. These can be calculated to interpolate the existing neurons.
Figure 14 is a block diagram of one embodiment of system 1400 for using a neural network to train a neural network. System 1400 includes a model generator 1410 coupled to a decision module 1420. The model generator 1410 creates a model for a desired function as a multi-dimensional function. In order to determine whether to model the function as a single stage or not, the decision module 1420 determines if the created model fits a simple geometry model or not. There is always a geometry that will fit a particular application. If that geometry is better expressed as being of higher dimension than 2, then the model will use multiple stages.
A Radon transform generator 1430 generates a Radon transform 1450 to fit the simple geometry model, and a feeder 1440 feeds the desired function through the Radon transform 1450 to generate weights. A training module 1460 trains a multilayer perceptron of the neural network using the weights.
The specific arrangements and methods herein are merely illustrative of the principles of this invention. Numerous modifications in form and detail may be made by those skilled in the art without departing from the true spirit and scope of the invention.