WO2002041246A2 - Video signal processing computer, cellular chip and method - Google Patents

Video signal processing computer, cellular chip and method

Info

Publication number
WO2002041246A2
Authority
WO
WIPO (PCT)
Prior art keywords
output
input
stripe
local
video signals
Prior art date
Application number
PCT/HU2001/000113
Other languages
French (fr)
Other versions
WO2002041246A3 (en
Inventor
Ákos ZARÁNDY
Tamás ROSKA
Original Assignee
Magyar Tudományos Akadémia Számítástechnikai És Automatizálási Kutató Intézet
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Magyar Tudományos Akadémia Számítástechnikai És Automatizálási Kutató Intézet
Priority to AU2002218423A (AU2002218423A1)
Publication of WO2002041246A2
Publication of WO2002041246A3

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/14 Picture signal circuitry for video frequency region

Definitions

  • the invention relates to a cellular network based video signal processing computer, a cellular chip and a method for converting one or more input video signals into one or more output video signals, where the conversion, i.e. the video signal processing, is carried out in real time without digitalization.
  • a programmable analog computer is disclosed, which is based on a special type of cellular networks, the cellular neural network, called CNN-UM (CNN Universal Machine).
  • the CNN-UM comprises a cell matrix consisting of locally interconnected programmable analog/logic cells. Each programmable cell of the cell matrix comprises an analog core and local analog memory elements connected to the core via switches. Furthermore, the CNN-UM comprises a control unit for controlling the analog cores and the switches.
  • “locally interconnected” means that an analog core can only communicate directly with surrounding analog cores located within a predetermined distance.
  • the CNN-UM comprising the above cell matrix could already be suitable for real time processing of video signals without digitalizing.
  • a video signal processing computer containing such an analog/logic cell matrix is only capable of processing video frames represented by one or more input video signals, if each pixel in the frame corresponds to a respective cell in the cell matrix.
  • There are several hundred thousand pixels in one video frame, but due to the limitations of silicon technology, one silicon chip can only comprise up to some tens of thousands of analog cells. Therefore, based on this known solution, it is not possible to design a cellular chip suitable for processing video signals in real time without digitalizing.
  • since a cellular chip cannot have as many programmable cells as the number of pixels in a whole frame, a cellular chip is to be designed that processes the frames in a segmented manner. Since the frames are received line by line, it is advisable to perform a horizontal segmentation of the frames. According to the invention, the video frames are segmented into horizontal stripes, each extending over (i.e. covering) a number of video lines, and these stripes are then processed one by one. According to the invention, each of the programmable cells of the cellular chip is assigned to a relevant pixel in a given stripe.
  • the video frames are processed in overlapping stripes. Due to the overlaps, the video frame stripes to be processed - because the input video signals are received on an ongoing basis in real time processing - are alternately read in separate memory areas, and a separate image processing operation is carried out for each stripe, using the stored values of the stripes.
  • these memory areas are formed as input memory banks comprising local analog memory elements in each programmable cell. The result of the image processing operation is loaded alternately in separate memory areas, hereinafter output memory banks, in each programmable cell, and then the one or more output video signals can be generated by an alternating scheduled read-out of these output memory banks.
  • the programmable cells in the cellular chip comprise a core being characteristic to the relevant cellular network, and in the given case they also comprise other processing units, for example those described in connection with the CNN-UM, as well as the input and output memory banks.
  • the part of the programmable cell without the inventive input and output memory banks will be called a local processor.
  • the design of the local processors is irrelevant from the aspect of the invention, and the solution offered by the invention can be used in any appropriate cellular network. Principally, the only difference between the cellular networks is in the analog cores of the local processors.
  • the invention is a video signal processing computer for converting one or more input video signals into one or more output video signals, the computer comprising a cellular network consisting of a plurality of programmable cells, each of the programmable cells comprising a local processor core and local analog memory elements attached thereto via controllable internal switches, wherein the video signal processing computer comprises a control unit for controlling the local processor cores and the internal switches, characterized in that each of the programmable cells is assigned to a respective pixel of a horizontal frame stripe extending over a number of video lines, wherein the stripe comprises a useful area and an overlapping area, the local analog memory elements comprise input memory elements connected via controllable input switches to conductor lines carrying the input video signals, and output memory elements connected via controllable output switches to conductor lines carrying the output video signals, wherein stripes located one below and overlapping one another in the overlapping areas are processed successively in such a way that by means of the control of the input switches, input analog values associated with the pixels of the stripe are
  • the invention is a cellular chip for a video signal processing computer converting one or more input video signals into one or more output video signals
  • the cellular chip comprising a cellular network formed by a plurality of programmable cells, each of the programmable cells comprising a local processor core, local analog memory elements and an internal bus connecting the local analog memory elements with the local processor core, wherein the local processor core and the local analog memory elements are connected to the internal bus via controllable internal switches, characterized in that the programmable cells forming the cellular network are arranged in a stripe consisting of several lines, which stripe comprises a useful area and an overlapping area, the programmable cells in the useful area comprise at least two input memory banks and at least two output memory banks, the programmable cells in the overlapping area comprise at least two input memory banks and at least one output memory bank, wherein each input memory bank comprises input memory elements in a number corresponding to that of the input video signals, the input memory elements storing sampled values of the input video signals, and each output
  • the invention is a method for converting one or more input video signals into one or more output video signals, wherein a cellular network comprising a plurality of programmable cells is used for the conversion, and each of the programmable cells comprises a local processor core and local memory elements, characterized in that the input video signals are processed in stripes one below and overlapping one another and extending over a number of video lines, the stripe comprising pixels in a useful area not overlapping the neighboring stripe and pixels in an overlapping area overlapping with the neighboring stripe, wherein the cellular network comprises one programmable cell for each of the pixels in the stripe, and each of the programmable cells is assigned to the same respective pixel for each stripe, the programmable cells comprising local input and output memory elements alternately assigned to the successive stripes, wherein input analog values associated with each pixel in a given stripe and obtained by a scheduled sampling of the input video signals are stored in the local input memory elements assigned to the given stripe of the programmable cell assigned to the pixel
  • the invention is a method for converting one or more input video signals into one or more output video signals, wherein the video signals represent frames consisting of video lines, characterized in that the one or more input video signals are processed in stripes one below and overlapping one another and extending over a number of video lines, the stripe comprising pixels in a useful area not overlapping the neighboring stripe and pixels in an overlapping area overlapping the neighboring stripe, wherein by a scheduled sampling of the one or more input video signals, input analog values associated with the pixels of a first stripe are stored in a first memory unit assigned to the stripe, and after storing the input analog values of an entire stripe, an image processing operation is carried out, the image processing operation resulting output analog values associated with the pixels, the output analog values are loaded into a second memory unit assigned to the stripe, and then the output analog values associated with the pixels in the useful area of the stripe are read out in a scheduled way from the second memory unit, thereby generating a part of one or more output video signals.
  • Fig. 1 is a schematic diagram of a video signal processing computer according to the invention
  • Fig. 2 is a schematic stripe configuration with a symmetric boundary condition
  • Fig. 3 is a schematic stripe configuration with an asymmetric boundary condition
  • Fig. 4 is a schematic stripe configuration with a pre-calculated asymmetric boundary condition
  • Fig. 5 is a timing diagram of a video signal processing computer designed with two input memory banks for processing stripes with a symmetric boundary condition
  • Fig. 6 is a timing diagram of a video signal processing computer designed with three input memory banks for processing stripes with a symmetric boundary condition
  • Fig. 7 is a timing diagram of a video signal processing computer designed with two input memory banks for processing stripes with an asymmetric boundary condition
  • Fig. 8 is a timing diagram of a video signal processing computer designed with three input memory banks for processing stripes with an asymmetric boundary condition
  • Fig. 9 is a schematic diagram of a preferred embodiment of a programmable cell according to the invention.
  • Fig. 10 is a diagram of the I/O module of the programmable cell shown in Fig. 9
  • Fig. 11 is an example showing the locations of sampling and result read-out points at a given moment within a stripe
  • Fig. 12 is a schematic diagram of a video signal processing computer implemented with a single cellular chip according to the invention.
  • Fig. 13 is a schematic diagram of a video signal processing computer implemented with two cellular chips according to the invention.
  • Figs. 14 and 15 are schematic diagrams of video signal processing computers suitable for "inter frame" image processing
  • Fig. 16 is a schematic diagram of the digital control unit in the video signal processing computer as shown in Figs. 12 to 15.
  • the video signal processing computer has been designed for real time processing without digitalizing the interlaced or progressive scan video signals, but of course it is also suitable for processing a video signal according to any current or future standard.
  • Fig. 1 shows the video signal processing computer 10 comprising the cellular chip according to the invention, assuming N line- and frame-synchronized input video signals $Y_{in,1} \ldots Y_{in,N}$ and M line- and frame-synchronized output video signals $Y_{out,1} \ldots Y_{out,M}$.
  • input and output video signals are primarily understood as monochromatic luminance signals obtained from standard video signals by removing the synchronization signal and separating the composite video signals into RGB or YUV components.
  • the input video signals $Y_{in,1} \ldots Y_{in,N}$ are generated for example by a video input module 72 to be described below, and from the output video signals $Y_{out,1} \ldots Y_{out,M}$ the standard output video signals are generated for example by a video output module 73 to be described below.
  • the programmable cells of the cellular network which is implemented preferably in the form of a cellular chip, are assigned one by one to the pixels in the stripe, therefore the design of the selected frame stripe determines the design of the cellular chip. Therefore, the description below of preferred stripe designs also serves as a description of cellular chip designs.
  • the preferred stripe designs disclosed in the specification have a full video line width (PAL: 768 pixels; NTSC: 640 pixels), consequently they include or cover full video lines.
  • Fig. 2 depicts the design of a stripe 20 with an inventive symmetric boundary condition (SBC), which stripe comprises a useful area 21, an upper overlapping area 22 arranged above the useful area 21, and a lower overlapping area 23 arranged below the useful area 21.
  • the useful area 21 contains an 'h' number of video lines, and the overlapping areas 22 and 23 comprise an 'o' number of video lines each.
  • the analog values in the upper overlapping area 22 and in the lower overlapping area 23 are always processed, and hence when applying an appropriate overlapping, the problem caused by the boundary areas can be eliminated.
  • Useful output information is only generated in the 'h' number of lines of the useful area 21, while the 'o' number of lines in each of the upper and lower overlapping areas 22 and 23 are only processed in order to avoid boundary effects. Therefore, the result captured in these lines is not read out from the cellular chip.
  • Length $L_s$ of the stripe 20 in the figure corresponds to the width of the video frame, and width $W_s$ of the stripe 20 is the sum of the number of lines in the stripe 20, i.e. in the current case o+h+o.
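The stripe geometry described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sbc_stripes`, the zero-based line numbering and the example values (576 lines, h = 32, o = 4) are hypothetical and do not come from the specification.

```python
def sbc_stripes(frame_lines, h, o):
    """List the stripes of a frame under a symmetric boundary condition.

    Each entry is (first_loaded, last_loaded, first_useful, last_useful),
    using zero-based video line numbers. Loaded lines that fall outside
    0..frame_lines-1 do not exist in the frame and would have to be padded.
    """
    stripes = []
    for top in range(0, frame_lines, h):
        last_useful = min(top + h, frame_lines) - 1
        # A stripe always spans o + h + o lines: upper overlap, useful area, lower overlap.
        stripes.append((top - o, top + h + o - 1, top, last_useful))
    return stripes


if __name__ == "__main__":
    for stripe in sbc_stripes(576, 32, 4)[:3]:
        print(stripe)
```

Each stripe spans o+h+o lines, and consecutive stripes share 2·o lines, which is why only the 'h' useful lines of each stripe contribute to the output.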
  • the stripe 20 depicted in Fig. 3 has an asymmetric boundary condition (ABC).
  • in addition to the useful area 21, it is the 'o' number of lines in the lower overlapping area 23 which are processed, because the upper overlapping area has already been processed in the previous step.
  • the image processing problems caused by the boundary areas can be avoided.
  • Stripe 20 shown in Fig. 4 has a pre-calculated asymmetric boundary condition (PABC).
  • the number of lines in the overlapping area 23 may even be one half of that in the ABC case.
  • the lower boundary line of the lower overlapping area 23 represents a single accumulator line 25 for each input video signal.
  • the first line in the boundary area 26 is sampled in the accumulators in the accumulator lines 25, and then the next lines in the boundary area 26 are added in a weighted way to the first one. Consequently, in the PABC system, $o_1 + o_2$ lines are involved in the calculation of the boundary condition.
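The weighted accumulation into a single accumulator line can be illustrated as follows. This is a minimal sketch: the function name `accumulate_boundary` and the weight values are hypothetical, since the specification does not state how the weights are chosen, and the actual chip performs this step on analog values.

```python
def accumulate_boundary(lines, weights):
    """Fold several boundary lines into one accumulator line.

    The first line is sampled into the accumulator (scaled by its weight);
    every further line is added pixel by pixel with its own weight.
    """
    if len(lines) != len(weights) or not lines:
        raise ValueError("need one weight per boundary line")
    acc = [weights[0] * value for value in lines[0]]
    for line, weight in zip(lines[1:], weights[1:]):
        for i, value in enumerate(line):
            acc[i] += weight * value
    return acc


if __name__ == "__main__":
    # Two boundary lines of two pixels each; weights are illustrative only.
    print(accumulate_boundary([[1, 2], [3, 4]], [1.0, 0.5]))
```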
  • Processing time $t_p$ is the maximum time available to the video signal processing computer to process a stripe of a video frame.
  • the optimal stripe design can be unambiguously determined. However, in the case of smaller series, permanent costs - for example the costs of the design work - are more dominant. Therefore, preference can be given to a cellular chip design which consists of the same type of programmable cells, i.e. a structure with a higher number of programmable cells could be optimal.
  • Fig. 5 depicts the timings of a preferred embodiment having a stripe design with a symmetric boundary condition. The programmable cells of the cellular chip in the video signal processing computer receive the analog values associated with pixels of the one or more input video signals in two input memory banks $MB_A$ and $MB_B$, alternately for each stripe. The analog values generated as a result of the image processing operation are loaded into two output memory banks $MB_\alpha$ and $MB_\beta$, from which the one or more output video signals can be generated by a scheduled read-out, wherein the output memory banks $MB_\alpha$ and $MB_\beta$ are read out alternately stripe by stripe.
  • Fig. 5 depicts a time diagram in which time runs from top to bottom contrary to usual diagrams.
  • the passing of time is indicated by consecutively received input video lines consisting of one or more video signals, which are depicted by horizontal dashed lines, and by consecutively sent output video lines consisting of one or more video signals and indicated by horizontal dashed lines.
  • the time difference between two video lines, i.e. line period $t_l$, is 64 µs in the case of a PAL system, and 63.55 µs in the case of an NTSC system.
  • It can be counted in Fig. 5 how many lines the useful area 21 of the stripe configuration and its upper and lower overlapping areas 22 and 23 consist of.
  • the figure is to be treated as depicting a general embodiment, where the useful area 21 always consists of an 'h' number of lines, and the overlapping areas 22 and 23 always comprise 'o' number of lines each.
  • thin arrows t ⁇ , t * and t o depict time ranges in which the memory banks play an external I/O role
  • thick arrows $t_p$ show the time ranges when, in the framework of the image processing operation, the memory banks communicate with the local processors in the cells.
  • the loading of the input memory banks associated with the relevant stripe and located in the programmable cells is always followed by a processing step, the duration of which is indicated by the thick arrow $t_p$.
  • the relevant input memory banks are connected to the local processors. During this time, the information stored in the input memory banks is read out and processed.
  • the input memory banks are returned to data acquisition status, and they are again loaded with video lines from the beginning to the end of the stripe. Consequently, when a stripe appears repeatedly within one column, this means that the same input memory banks are repeatedly loaded.
  • the operation of the embodiment shown in Fig. 5 is the following.
  • the first input video line is loaded into the input memory banks $MB_A$ of the programmable cells associated with the first line of the useful area 21 in the first stripe, to make sure that the output video line obtained by processing the input video line can appear in the output image. This is because it is always the result generated in the useful area 21 which is read out, as explained above. If a video line were not loaded into the useful area 21, this line would not appear on the output.
  • the upper overlapping area 22 above the first video line can be loaded with a constant value or with copies of the first video line. The same can be done when processing the bottom of the frame, in the stripe in the area below the last video line.
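The two padding options mentioned above (a constant value, or copies of the nearest frame line) can be sketched as follows. The function `load_stripe`, its parameters and the list-of-lists frame representation are assumptions made for illustration; in the actual device the values are analog samples stored in the cells, not Python lists.

```python
def load_stripe(frame, top, h, o, pad="replicate", fill=0.0):
    """Collect the o + h + o lines of the stripe whose useful area starts at `top`.

    Lines above the first or below the last frame line do not exist, so they
    are filled either with copies of the nearest frame line ("replicate")
    or with a constant value ("constant").
    """
    last = len(frame) - 1
    width = len(frame[0])
    rows = []
    for r in range(top - o, top + h + o):
        if 0 <= r <= last:
            rows.append(list(frame[r]))
        elif pad == "replicate":
            rows.append(list(frame[0] if r < 0 else frame[last]))
        else:
            rows.append([fill] * width)
    return rows


if __name__ == "__main__":
    frame = [[1], [2], [3], [4]]          # four one-pixel video lines
    print(load_stripe(frame, 0, 2, 1))    # top of the frame, replicated padding
    print(load_stripe(frame, 2, 2, 1, pad="constant"))  # bottom, constant padding
```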
  • the video lines represented by the one or more input video signals are sampled in the programmable cells either by both types of the input memory banks $MB_A$, $MB_B$ simultaneously or by only one type of them.
  • the video lines stored in the other type of input memory banks are processed by the video signal processing computer. Because of the errors appearing at the lower and upper edges, only the 'h' number of lines in the middle of the total processed image, i.e. the lines corresponding to the pixels in the useful area 21, can be used as an output, because these inner pixels are not deteriorated by errors.
  • the loading of the input memory banks $MB_B$ associated with the useful area of the next stripe must be started immediately.
  • the lower overlapping area 23 of the first stripe must also be loaded into the input memory banks $MB_A$ of the programmable cells in the overlapping area, i.e. for a time both types of input memory banks are sampling.
  • when the input memory banks $MB_A$ associated with the first stripe are full, they are separated, in a way described later, from the conductor lines carrying the input video signals and connected to the inner bus in the cellular chip, and hence the processing can be started.
  • the loading period of a useful area of a stripe is $h \cdot t_l$.
  • this entire $h \cdot t_l$ period, while the data is collected in the useful area of the other memory bank, may not be devoted to processing, because the processing may only commence when the lower overlapping area of the stripe is read in, and the processing is to be stopped when the sampling of the upper overlapping area of the next stripe associated with the input memory banks is started. Therefore, the time that can be dedicated to processing is the following:
  • the local processors can only process the data in one type of input memory banks at a time. This means that no more time is available for processing the data of a stripe than the total cycle period divided by the number of memory banks. If a 'k' number of input memory banks are assumed in the programmable cells, the total cycle period is $k \cdot h \cdot t_l$, because this is the period after which the system returns to the same phase within one frame; therefore the time that can be devoted to processing may not exceed $k \cdot h \cdot t_l / k = h \cdot t_l$.
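The timing constraints in this passage can be turned into a small calculation. The sketch below is not quoted from the specification: the window of ((k-1)·h - 2·o) line periods is an assumption derived from the description (processing starts once the lower overlapping area is read in and stops when the same banks begin sampling the upper overlapping area of their next stripe), and the function names and example values are hypothetical.

```python
def sbc_processing_time(h, o, k=2, t_line=64e-6):
    """Estimate the processing time per stripe, in seconds, for a symmetric boundary condition.

    h: lines in the useful area, o: lines per overlapping area,
    k: input memory banks per cell, t_line: line period (64 us for PAL).
    """
    # Assumed window between the end of loading this stripe and the moment the
    # same banks start sampling their next stripe, in line periods.
    window_lines = (k - 1) * h - 2 * o
    if window_lines <= 0:
        raise ValueError("overlap too large for this h and k")
    # The local processors serve one bank type at a time, so the share per
    # stripe can never exceed h line periods.
    return min(window_lines, h) * t_line


def utilization(h, o, k=2):
    """Fraction of the useful-area loading period that is spent processing."""
    return sbc_processing_time(h, o, k, t_line=1.0) / h


if __name__ == "__main__":
    print(sbc_processing_time(32, 4, k=2))  # two banks, PAL line period
    print(utilization(32, 4, k=2), utilization(32, 4, k=3))
```

Under these assumptions a third input memory bank removes the overlap penalty as long as h is at least 2·o, which matches the motivation given for the three-bank embodiment.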
  • the utilization factor of the local processors can be calculated in a way that the processing time is divided by the number of lines in the useful area:
  • the output video lines can be read out with the following delay:
  • This delay t d is indicated in the figure by an arrow with a dashed line.
  • the read-out time of the video lines of one stripe from the output memory banks of the programmable cells in useful area 21 is
  • Fig. 6 depicts an embodiment where in the programmable cells there are three input memory banks $MB_A$, $MB_B$, $MB_C$ for receiving the input video signals. The operation of this embodiment is very similar to that discussed above. Again, each video line is to be sampled into the input memory banks of programmable cells associated with a useful area 21 of a stripe.
  • Symbols 'x' in Fig. 6 show in a given instant the programmable cell positions in which the input/output memory banks are currently busy in sampling the input video signals and generating the output video signals.
  • input memory banks MB A and output memory banks MB ⁇ are connected to the local processors for the processing operation.
  • the output video lines can be read out with the following delay:
  • the number of input memory banks per programmable cell can be further increased. Assuming that a 'k' number of input memory banks are used per programmable cell, the processing time can be calculated on the basis of the following formula:
  • the timing of the configurations with asymmetric and pre-calculated asymmetric boundary condition is principally the same.
  • the image processing operation can be started in both cases after loading an 'o' or '$o_1$' number of lines, respectively, in the lower overlapping area 23.
  • a time equal to the loading of $o_2$ lines must elapse for generating the pre-calculated boundary condition.
  • This means that the calculations using this boundary condition can only be started $o_2 \cdot t_l$ after the start of image processing. All this of course also indicates that these calculations must fit into the $t_p - o_2 \cdot t_l$ time frame, which must always be positive. In such cases, consequently, the following condition must be satisfied:
  • the input memory banks associated with a stripe, of the programmable cells in the useful area of the stripe, are loaded with data for a period $h \cdot t_l$
  • the restriction relating to local processors is in this case the following:
  • the utilization factor of the local processors is the following:
  • the output video lines may be read out with a delay of
  • the processing time can be calculated on the basis of the following formula:
  • the restriction in association with the local processors in this case is the following:
  • the utilization factor of the processors is the following:
  • the number of memory banks per programmable cell can be increased further. Assuming that a 'k' number of memory banks are applied per programmable cell, the processing time can be calculated on the basis of the following formula:
  • the utilization factor of the processors in the general case is: h - t, h ( 3)
  • Fig. 9 shows the internal structure of a preferred embodiment of a programmable cell 30 in the cellular chip according to the invention.
  • the cellular chip according to the invention can be implemented in any suitable cellular network.
  • the programmable cells are located in a standard square grid and they are locally connected to each other.
  • the cellular chip according to the invention is designed as a stripe, where each of the programmable cells 30 implemented on the cellular chip is assigned to a respective pixel of the stripe 20.
  • the programmable cell 30 comprises a local processor 31.
  • the local processor 31 comprises a programmable analog forward convolution local processor core 32.
  • the latter in the given case may also be a reverse convolution core for example, depending on the application.
  • the local processor core 32 as indicated by arrows 33, is connected to the other local processor cores 32 in the cellular chip, and together with them it is suitable to carry out for example convolution operations.
  • the local processor 31 may also comprise in accordance with the requirements of the given application a local analog output unit (LAOU) 34 known from the CNN-UM technology, which LAOU 34 is an analog arithmetic unit, and by which in the course of the image processing operations for example various operations can be carried out using analog values stored in an 'a' number of local analog memory elements (LAM) 36 in the programmable cell 30.
  • the local processor 31 may also comprise a local logic unit (LLU) 35 also known from the CNN-UM technology, which LLU 35 is designed for carrying out logic operations using logic values in a 'b' number of local logic memories (LLM) 37 in the programmable cell 30.
  • the application of the LAOU 34 could be necessary when processing grayscale images, and the application of the LLU 35 could be advantageous when processing black and white images.
  • the LAOU 34 and/or the LLU 35, as well as the LAMs 36 and/or LLMs 37 can be omitted, i.e. these elements are not necessarily parts of the known local processor 31.
  • the elements of the local processor 31 are linked to each other via an internal bus 40, which is used for forwarding analog and logic values.
  • the elements are connected via controllable internal switches 41 to the internal bus 40, which internal switches 41 are preferably controlled globally, i.e. in an identical way in all programmable cells 30 of the cellular chip.
  • a global conductor line 55 can be used to establish connection with the inner bus 40, which enables the loading of the programmable cells 30 with initial values.
  • the programmable cells 30 also comprise an I/O module 50, which comprises local analog memory elements.
  • the local analog memory elements in the I/O module 50 are arranged in memory banks, and these memory banks perform different functions at different times, as described earlier.
  • the external and internal access to, as well as the scheduling of, the local analog memory elements in the memory banks are designed in a way that they enable a continuous receiving of one or more input video signals $Y_{in,1} \ldots Y_{in,N}$ and a continuous sending of one or more output video signals $Y_{out,1} \ldots Y_{out,M}$.
  • all the LAMs of one and the same input memory bank and output memory bank are connected to the internal bus 40 in the programmable cells. These LAMs are preferably shown memory mapped. In such a way, when switching to a different memory bank, the addresses of the mapped LAMs will not change, which simplifies the programming of the cellular chip.
  • the I/O modules 50 in the cellular chip operate synchronously with one another, i.e. at a given moment the same memory banks are activated for example to processing or sampling in each programmable cell 30.
  • in the I/O module 50 there are two types of memory banks as described above.
  • the embodiment depicted in Fig. 10 comprises three input memory banks $MB_A$, $MB_B$ and $MB_C$, and two output memory banks $MB_\alpha$, $MB_\beta$.
  • the number of local analog memory elements in the input memory banks $MB_A$, $MB_B$ and $MB_C$, hereinafter called the input memory elements, is identical with that of the input video signals. Due to the image processing of overlapping stripes, at least two input memory banks are required, but three or more input memory banks may also be applied.
  • there are the same number of input memory banks in each programmable cell 30 of the cellular chip.
  • the number of local analog memory elements in the output memory banks, hereinafter called the output memory elements, is identical with that of the output video signals.
  • in the programmable cells 30 corresponding to the pixels in the useful area 21 of the stripe, there are at least, and preferably, two output memory banks, and in the programmable cells of the overlapping areas 22 and 23 there is at least, and preferably, one. The reason for this is that the LAMs in the output memory banks are used for temporary storage during the calculation, and then the final results are transferred here after performing the calculation. However, these final results are only of relevance in the useful area 21, and not in the overlapping areas 22 and 23.
  • the read-out of the output video signals begins from the output memory banks, while the other output memory banks are connected to the local processors 31.
  • the final result generated in the overlapping areas 22 and 23 can be discarded, and hence the LAMs located in the output memory banks of these areas can immediately take part in processing the next frame stripe.
  • the input memory banks in the cellular chip together are called the first memory unit, and the output memory banks in the cellular chip are called the second memory unit.
  • the same input memory bank and output memory bank in each programmable cell 30, for example in Fig. 10 the input memory bank MB A and the output memory bank MB ⁇ are connected to the internal bus 40 of the local processor 31.
  • the input memory bank connected to the internal bus 40 results in the appearance of an N number of read-write LAMs on the internal bus 40, which contain the last read stripe at the time of starting the image processing.
  • the local analog memory elements in the input memory banks can also be used as temporary memories once the algorithm no longer needs the frame stripe in its original form.
  • the relevant output memory bank with its M number of LAMs becomes visible on the internal bus 40. In these LAMs, there is no useful data at the time of starting the calculation.
  • each local processor 31 has altogether a+N+M number of LAMs, including the LAMs 36, available for performing the calculations.
  • the input and output memory banks which are currently not connected to the local processors 31 perform the sampling of the input video signals Yin,1 ... Yin,N and the generating of the output video signals Yout,1 ... Yout,M, respectively.
  • An input memory bank which is associated with a frame stripe, and which is located in a programmable cell 30 assigned to a given pixel of the frame stripe to be read in, is only connected at the moment of sampling via controllable input switches 53 to conductor lines 51 carrying the input video signals Y iri
  • Fig. 11 provides an example of the location of sampling and result readout points at one moment within one stripe. The figure refers to the symmetric boundary condition arrangement shown in Fig.
  • the input memory bank MB C of the programmable cell 30 in position 60 samples the upper overlapping area 22.
  • From the output memory bank MB ⁇ of the programmable cell 30 in position 61, the result is read out, and the input memory bank MB B of the programmable cell 30 in position 62 samples the useful area 21.
  • the input memory banks MB A and the output memory banks MBp are connected to the local processors 31.
  • the video signal processing computer described above can be implemented preferably by using the stripe based cellular chip according to the invention. Through the use of such cellular chips, a high capacity video signal processing computer can be built with a minimum number of elements.
  • a simple video signal processing computer as shown in Fig. 12, has a single cellular chip 70 of a stripe design according to the invention. In this case, in the stripe structured cellular chip 70, the number of programmable cells in one row is identical with the horizontal resolution of the input and output video signals.
  • the standard line and frame synchronized incoming video signals are received by a known video input module 72, which removes the synchronization signal from them, separates any composite video signals into components, and only sends the monochromatic luminance signals to the cellular chip 70.
  • This N number of signals will be the input video signals Yin,1 ... Yin,N for the cellular chip 70.
  • the video input module 72 can also carry out level shifts, if necessary.
  • a video output module 73 known per se and shown in the figure supplies synchronization signals for the luminance signals read out from the cellular chip 70, i.e. for the output video signals Yout,1 ... Yout,M described above, and sends these through a standard video signal output amplifier stage.
  • the video signal processing computer comprises a digital control unit 71 , which carries out the digital control and synchronizing of the whole system.
  • the frame stripe can be segmented vertically into two or more sections overlapping each other in a lateral direction, where the processing of each such section can be implemented by a separate stripe structured cellular chip.
  • Fig. 13 depicts the schematic structure of a video signal processing computer having two cellular chips 70a, 70b, which video signal processing computer is designed for processing frames divided into two sections. The vertical overlap between the sections must be at least the same size as the horizontal overlap between two stripes which are one below the other. By applying such an overlap, communication (data transfer) between the cellular chips 70a and 70b can be avoided.
  • the video signal processing computers described above are only capable of so-called 'intra frame' processing. With this type of image processing, it is not possible to implement numerous basic so-called 'inter frame' image processing functions, e.g. motion detection, noise filtering by averaging in time etc. If, however, an analog memory large enough to store an entire frame, for example an ARAM 74 shown in Fig. 14, is introduced into the system, the implementation of these functions also becomes possible by means of the video signal processing computer according to the invention.
  • an ARAM as described in the publication "A 0.5 µm CMOS CNN Analog Random Access Memory Chip for Massive Image Processing" by R. Carmone, S. Espejo, R. Dominguez-Castro, A. Rodriguez-Vazquez, T.
  • the sensors of certain cameras, for example long wave IR cameras and image intensified cameras, have an offset error which changes in space but is constant in time.
  • the video signal is digitized in the known video signal processing computer systems, and then after the compensation of the offset error, an analog signal is generated again. If the ARAM 74 shown in Fig. 14 is replaced by a non-volatile memory, the system will be able to compensate the offset error.
  • other image improving and processing functions necessary for these noisy cameras can also be carried out by the same video signal processing computer.
  • the design of the digital control unit 71 depicted in Figs. 12 to 15 is shown in Fig. 16.
  • the control unit 71 comprises a digital processor 80, which is preferably a DSP, a microprocessor or a microcontroller, a memory 81 , a serial port handling unit 82 and bi-directional digital ports 83a, 83b, 83c.
  • the elements of the control unit 71 are connected to each other via an internal digital bus 84.
  • the memory 81 also comprises a RAM, and a boot unit, preferably a boot EPROM or a flash memory.
  • the control unit 71 can be programmed, tuned or diagnosed via an external serial bus 85.
  • These functions can be implemented preferably from a PC or a notebook connected to the external serial bus 85.
  • the bi-directional digital port 83a serves for communication with the cellular chip 70 and in the given case with the ARAM 74
  • the bi-directional digital port 83b serves for the communication with the video input module 72
  • the bi-directional digital port 83c serves for the communication with the video output module 73.
  • the speed of the digital ports 83a, 83b and 83c must be matched to the speed of the cellular chip 70, the video input module 72 and the video output module 73, and is usually approx. 15 to 50 MHz.
  • the control unit 71 may also be formed as part of the cellular chip 70.
  • the control unit 71 issues the commands preferably in a coded form to the cellular chip 70.
  • the cellular chip 70 preferably comprises a decoder unit (not shown) for transforming commands from the control unit 71 into signals for conductor lines that operate the local processor cores 32, the internal switches 41 , the input switches 53 and the output switches 54. If the control unit 71 is designed on the cellular chip 70, the decoder unit may be omitted.
  • the internal switches 41 , the input switches 53 and the output switches 54 are preferably controlled CMOS switches.
  • the control unit 71 executes a program stored in the memory 81.
  • This program usually includes a larger cycle, which has to be carried out in each frame stripe. Consequently, the period of such a cycle is equal to the following: the number of lines in the cellular chip 70, multiplied by the line period of the video line.
  • the video signal processing computer is able to receive complete video lines, and collect, process and issue them on a continuous basis.
  • the video signal processing computer can be made suitable for processing video signals coming from different modality cameras (black and white, color RGB, long wave IR, image intensified etc.), compiling them in an intelligent way, and the output may also be of multi-channel type, for example color RGB.
  • a stripe structured video signal processing computer can be built.
  • the SBC, ABC and PABC cellular chip arrangements described as preferred embodiments eliminate the problems arising from the lack of overlaps on the one hand, and on the other they can be built cost efficiently on a silicon basis.
  • the joint 'inter frame' processing of consecutive frames as described above is also possible with stripe structured cellular chips, which thus also allow image processing tasks in time, e.g. motion detection, to be carried out.
  • the video signal processing computer based on the cellular chip according to the invention may also be of a PCMCIA card size, and its computational performance could be several hundred times higher than that of the current highest capacity digital signal processing systems (e.g. TMS320C6x).
  • the video signal processing computer, cellular chip and method according to the invention may be applied according to the description above not only in the cellular network used in the described embodiments, but in any suitable cellular network, for example in the convolution neural network described in the introduction, in a network having general analog processor cores or in a cellular network implemented with a two-dimensional resistive grid.
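The period of the control cycle repeated for each frame stripe, stated above as the number of lines in the cellular chip multiplied by the line period of the video line, can be illustrated by a minimal Python sketch. The function name and the 32-line chip are hypothetical choices made only for the illustration; the 64 microsecond value is the nominal PAL line period.

```python
# Minimal sketch of the stripe cycle period stated in the description:
# period = (number of lines in the chip) * (line period of the video signal).

PAL_LINE_PERIOD_US = 64.0  # nominal PAL line period in microseconds


def stripe_cycle_period_us(chip_lines: int,
                           line_period_us: float = PAL_LINE_PERIOD_US) -> float:
    """Return the period of one per-stripe control cycle in microseconds."""
    if chip_lines <= 0:
        raise ValueError("chip_lines must be positive")
    return chip_lines * line_period_us


# Hypothetical chip with 32 lines (an assumption, not a value from the text).
example_period_us = stripe_cycle_period_us(32)
print(f"cycle period: {example_period_us} us")
```

The control program executed by the control unit 71 would have to complete its per-stripe cycle within this period in order to keep up with the incoming video lines.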

Abstract

A video signal processing computer, a cellular chip and methods are disclosed for converting one or more input video signals into one or more output video signals. A cellular network consisting of a plurality of programmable cells (30) is used for the conversion, wherein each of the programmable cells (30) is assigned to a respective pixel of a horizontal frame stripe extending over a number of video lines, wherein the stripe comprises a useful area and an overlapping area. Input memory elements are connected via controllable input switches (53) to conductor lines (51) carrying input video signals, and output memory elements are connected via controllable output switches (54) to conductor lines (52) carrying output video signals. The stripes located one below and overlapping one another in the overlapping areas are processed successively by reading in, processing and outputting analog values associated with the pixels.

Description

VIDEO SIGNAL PROCESSING COMPUTER, CELLULAR CHIP AND METHOD
TECHNICAL FIELD
The invention relates to a cellular network based video signal processing computer, a cellular chip and a method for converting one or more input video signals into one or more output video signals, where the conversion, i.e. the video signal processing, is carried out in real time without digitalization.
BACKGROUND ART
There are several known devices for video signal processing, in which a cellular network is used. For example, in EP 0 488 003 A2 an apparatus is disclosed for processing video signals, preferably television signals. In this apparatus, the cellular network operates with delayed digitized samples of video signals and primarily aims at correcting signal errors. However, this known apparatus is not suitable for processing the video signals in real time without digitalization, and thereby its field of application is substantially limited.
In US 5,355,528 a programmable analog computer is disclosed, which is based on a special type of cellular network, the cellular neural network, called CNN-UM (CNN Universal Machine). The CNN-UM comprises a cell matrix consisting of locally interconnected programmable analog/logic cells. Each programmable cell of the cell matrix comprises an analog core and local analog memory elements connected to the core via switches. Furthermore, the CNN-UM comprises a control unit for controlling the analog cores and the switches. Throughout the specification, "locally interconnected" means that an analog core can only communicate directly with surrounding analog cores located within a predetermined distance.
The CNN-UM comprising the above cell matrix could already be suitable for real time processing of video signals without digitalizing. However, a video signal processing computer containing such an analog/logic cell matrix is only capable of processing video frames represented by one or more input video signals if each pixel in the frame corresponds to a respective cell in the cell matrix. There are several hundred thousand pixels in one video frame, but due to the limitations of the silicon technology, one silicon chip can only comprise up to a few tens of thousands of analog cells. Therefore, based on this known solution, it is not possible to design a cellular chip suitable for processing video signals in real time without digitalizing.
DISCLOSURE OF INVENTION
It is an object of the invention to provide a cellular chip based video signal processing computer, which - due to its large computational capacity - enables the processing of video signals in real time without digitalizing, and yet consists of relatively few components, thereby enabling a low cost implementation.
It is another object of the invention to provide a cellular chip, which is suitable for processing video signals in real time without digitalizing, the chip consisting of a number of programmable cells which is acceptable from the aspect of implementation, thereby enabling a low cost implementation on one or a few silicon chips.
It is a further object of the invention to provide a method for processing video signals, which enables a high capacity real time processing in a simple way, with relatively low hardware requirements and without digitalizing the video signals.
We have found that since a cellular chip cannot have as many programmable cells as the number of pixels in a whole frame, a cellular chip is to be designed that processes the frames in a segmented manner. Since the frames are received line by line, it is advisable to perform a horizontal segmentation of the frames. According to the invention, the video frames are segmented into horizontal stripes extending over, i.e. covering, a number of video lines, and these stripes are then processed one by one. According to the invention, each of the programmable cells of the cellular chip is assigned to a relevant pixel in a given stripe.
However, in processing video frames stripe by stripe, the problem of boundary areas arises. This is because most image processing operations rely not only on the given pixel, but also on the area in its close vicinity. Therefore, missing or insufficient overlaps are known to lead to defects in the processed image. According to the invention, the video frames are processed in overlapping stripes. Due to the overlaps, and because the input video signals are received on an ongoing basis in real time processing, the video frame stripes to be processed are alternately read into separate memory areas, and a separate image processing operation is carried out for each stripe, using the stored values of the stripes. According to the invention, these memory areas are formed as input memory banks comprising local analog memory elements in each programmable cell. The result of the image processing operation is loaded alternately into separate memory areas, hereinafter output memory banks, in each programmable cell, and then the one or more output video signals can be generated by an alternating scheduled read-out of these output memory banks.
According to the invention, the programmable cells in the cellular chip comprise a core characteristic of the relevant cellular network, and in the given case they also comprise other processing units, for example those described in connection with the CNN-UM, as well as the input and output memory banks. Throughout the specification, the part of the programmable cell without the inventive input and output memory banks will be called a local processor. The design of the local processors is irrelevant from the aspect of the invention, and the solution offered by the invention can be used in any appropriate cellular network. Principally, the only difference between the cellular networks is in the analog cores of the local processors. Some examples of the possible cellular networks are given below:
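The alternating use of the input memory banks can be illustrated by a minimal Python sketch. It is a simplification under stated assumptions: a symmetric overlap, a useful stripe height `h` and an overlap of `o` lines on each side (illustrative parameters), and the hypothetical rule that stripe k is stored in bank k modulo the number of banks. The function name is not taken from the specification.

```python
# Sketch: which stripes (and which input memory banks) must sample a given
# video line when the frame is cut into overlapping horizontal stripes.
# Assumption: stripe k covers lines [k*h - o, (k+1)*h + o) and uses bank k % num_banks.

def banks_sampling_line(y, h, o, num_lines, num_banks=2):
    """Return a list of (stripe_index, bank_index) pairs that sample line y."""
    num_stripes = -(-num_lines // h)  # ceiling division
    hits = []
    for k in range(num_stripes):
        top = k * h - o            # first line of stripe k (may be negative)
        bottom = (k + 1) * h + o   # one past the last line of stripe k
        if top <= y < bottom:
            hits.append((k, k % num_banks))
    return hits


# Illustrative values only: a 12-line frame, 4 useful lines, 1 overlap line.
for line in range(12):
    print(line, banks_sampling_line(line, h=4, o=1, num_lines=12))
```

In this sketch a line lying in an overlap is sampled into two different banks at the same time, one for the stripe being completed and one for the stripe being started, which is why a single memory area per cell would not suffice.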
- A cellular network having cores as in the CNN-UM described above.
- Convolution neural network with analog feedforward convolution cores. This is an analog two-layer feedforward neural network formed as a square grid, which computes convolution in an analog way on the two-dimensional data set loaded into an input layer, and the calculated result of the convolution is displayed in an output layer.
- A cellular network with general analog processor cores. This type of network has been described for example by Piotr Dudek and Peter J. Hicks under the title "A CMOS general-purpose sampled-data analog microprocessor", at ISCAS 2000 - IEEE Symposium on Circuits and Systems, held in Geneva on 28-31 May 2000.
- 2D resistive grid based processing cellular network.
According to a first aspect, the invention is a video signal processing computer for converting one or more input video signals into one or more output video signals, the computer comprising a cellular network consisting of a plurality of programmable cells, each of the programmable cells comprising a local processor core and local analog memory elements attached thereto via controllable internal switches, wherein the video signal processing computer comprises a control unit for controlling the local processor cores and the internal switches, characterized in that each of the programmable cells is assigned to a respective pixel of a horizontal frame stripe extending over a number of video lines, wherein the stripe comprises a useful area and an overlapping area, the local analog memory elements comprise input memory elements connected via controllable input switches to conductor lines carrying the input video signals, and output memory elements connected via controllable output switches to conductor lines carrying the output video signals, wherein stripes located one below and overlapping one another in the overlapping areas are processed successively in such a way that by means of the control of the input switches, input analog values associated with the pixels of the stripe are read in by sampling into the input memory elements of the respective programmable cells, by means of the control of the internal switches, after reading in an entire stripe, an image processing operation is performed by the cellular network on the input analog values of the stripe, the image processing operation resulting in output analog values associated with the pixels, and the output analog values are loaded into the output memory elements of the respective programmable cells, and then by means of the control of the output switches, by performing a scheduled read-out of the output analog values associated with the pixels in the useful area, a part of the output video signals is generated.
According to a second aspect, the invention is a cellular chip for a video signal processing computer converting one or more input video signals into one or more output video signals, the cellular chip comprising a cellular network formed by a plurality of programmable cells, each of the programmable cells comprising a local processor core, local analog memory elements and an internal bus connecting the local analog memory elements with the local processor core, wherein the local processor core and the local analog memory elements are connected to the internal bus via controllable internal switches, characterized in that the programmable cells forming the cellular network are arranged in a stripe consisting of several lines, which stripe comprises a useful area and an overlapping area, the programmable cells in the useful area comprise at least two input memory banks and at least two output memory banks, the programmable cells in the overlapping area comprise at least two input memory banks and at least one output memory bank, wherein each input memory bank comprises input memory elements in a number corresponding to that of the input video signals, the input memory elements storing sampled values of the input video signals, and each output memory bank comprises output memory elements in a number corresponding to that of the output video signals, the output memory elements storing calculated sample values of the output video signals, and wherein the input memory elements are connected via controllable input switches to conductor lines of the input video signals, and the output memory elements are connected via controllable output switches to conductor lines of the output video signals.
According to a third aspect, the invention is a method for converting one or more input video signals into one or more output video signals, wherein a cellular network comprising a plurality of programmable cells is used for the conversion, and each of the programmable cells comprises a local processor core and local memory elements, characterized in that the input video signals are processed in stripes one below and overlapping one another and extending over a number of video lines, the stripe comprising pixels in a useful area not overlapping the neighboring stripe and pixels in an overlapping area overlapping with the neighboring stripe, wherein the cellular network comprises one programmable cell for each of the pixels in the stripe, and each of the programmable cells is assigned to the same respective pixel for each stripe, the programmable cells comprising local input and output memory elements alternately assigned to the successive stripes, wherein input analog values associated with each pixel in a given stripe and obtained by a scheduled sampling of the input video signals are stored in the local input memory elements assigned to the given stripe of the programmable cell assigned to the pixel in the given stripe, after storing the input analog values of an entire stripe, an image processing operation is carried out by the cellular network on the input analog values stored in the local input memory elements assigned to the given stripe, the image processing operation resulting in output analog values associated with the pixels, and in each respective programmable cell, the output analog values are loaded into the local output memory elements which are assigned to the stripe, and then by a scheduled read-out of the output analog values from the programmable cells assigned to the pixels in the useful area of the given stripe, the part of the output video signals associated with the given stripe is generated.
According to a further aspect, the invention is a method for converting one or more input video signals into one or more output video signals, wherein the video signals represent frames consisting of video lines, characterized in that the one or more input video signals are processed in stripes one below and overlapping one another and extending over a number of video lines, the stripe comprising pixels in a useful area not overlapping the neighboring stripe and pixels in an overlapping area overlapping the neighboring stripe, wherein by a scheduled sampling of the one or more input video signals, input analog values associated with the pixels of a first stripe are stored in a first memory unit assigned to the stripe, and after storing the input analog values of an entire stripe, an image processing operation is carried out, the image processing operation resulting in output analog values associated with the pixels, the output analog values are loaded into a second memory unit assigned to the stripe, and then the output analog values associated with the pixels in the useful area of the stripe are read out in a scheduled way from the second memory unit, thereby generating a part of one or more output video signals.
BRIEF DESCRIPTION OF DRAWINGS
The invention will hereinafter be described on the basis of preferred embodiments depicted by the drawings, where
Fig. 1 is a schematic diagram of a video signal processing computer according to the invention,
Fig. 2 is a schematic stripe configuration with a symmetric boundary condition,
Fig. 3 is a schematic stripe configuration with an asymmetric boundary condition,
Fig. 4 is a schematic stripe configuration with a pre-calculated asymmetric boundary condition,
Fig. 5 is a timing diagram of a video signal processing computer designed with two input memory banks for processing stripes with a symmetric boundary condition,
Fig. 6 is a timing diagram of a video signal processing computer designed with three input memory banks for processing stripes with a symmetric boundary condition,
Fig. 7 is a timing diagram of a video signal processing computer designed with two input memory banks for processing stripes with an asymmetric boundary condition,
Fig. 8 is a timing diagram of a video signal processing computer designed with three input memory banks for processing stripes with an asymmetric boundary condition,
Fig. 9 is a schematic diagram of a preferred embodiment of a programmable cell according to the invention,
Fig. 10 is a diagram of the I/O module of the programmable cell shown in Fig. 9,
Fig. 11 is an example showing the locations of sampling and result read-out points at a given moment within a stripe,
Fig. 12 is a schematic diagram of a video signal processing computer implemented with a single cellular chip according to the invention,
Fig. 13 is a schematic diagram of a video signal processing computer implemented with two cellular chips according to the invention,
Figs. 14 and 15 are schematic diagrams of video signal processing computers suitable for "inter frame" image processing, and
Fig. 16 is a schematic diagram of the digital control unit in the video signal processing computer as shown in Figs. 12 to 15.
Elements performing identical functions are indicated with identical reference signs in the figures.
The video signal processing computer according to the invention has been designed for real time processing without digitalizing the interlaced or progressive scan video signals, but of course it is also suitable for processing a video signal according to any current or future standard. Assuming an N number of line and frame synchronized input video signals Yin,1 ... Yin,N as well as an M number of line and frame synchronized output video signals Yout,1 ... Yout,M, the video signal processing computer 10 comprising the cellular chip according to the invention is shown in Fig. 1. In the description, input and output video signals are primarily understood as monochromatic luminance signals obtained from standard video signals by removing the synchronization signal and separating the composite video signals into RGB or YUV components. From the standard input video signals, the input video signals Yin,1 ... Yin,N are generated for example by a video input module 72 to be described below, and from the output video signals Yout,1 ... Yout,M the standard output video signals are generated for example by a video output module 73 to be described below.
In the video signal processing computer 10 according to the invention, the programmable cells of the cellular network, which is implemented preferably in the form of a cellular chip, are assigned one by one to the pixels in the stripe, therefore the design of the selected frame stripe determines the design of the cellular chip. Therefore, the description below of preferred stripe designs also serves as a description of cellular chip designs. The preferred stripe designs disclosed in the specification have a full video line width (PAL: 768 pixels; NTSC: 640 pixels), consequently they include or cover full video lines. However, according to the description below, some other cellular chip designs are also possible, in which the stripes are segmented vertically into several overlapping sections, and each section is processed by a separate cellular chip. The advantage of this structure is that a single cellular chip does not have to cover the entire video line, but only a part thereof, which results in the fact that each cellular chip can be designed on a smaller silicon surface, leading to a more robust production technology and a better yield.
Fig. 2 depicts the design of a stripe 20 with an inventive symmetric boundary condition (SBC) , which stripe comprises a useful area 21 , an upper overlapping area 22 arranged above the useful area 21 , and a lower overlapping area 23 arranged below the useful area 21. The useful area 21 contains an 'h' number of video lines, and the overlapping areas 22 and 23 comprise an 'o' number of video lines each. In the course of processing the video frames stripe by stripe, in addition to the sampled analog values associated with the pixels in the useful area 21 , the analog values in the upper overlapping area 22 and in the lower overlapping area 23 are always processed, and hence when applying an appropriate overlapping, the problem caused by the boundary areas can be eliminated. Useful output information is only generated in the 'h' number of lines of the useful area 21 , while the 'o' number of lines in each of the upper and lower overlapping areas 22 and 23 are only processed in order to avoid boundary effects. Therefore, the result captured in these lines is not read out from the cellular chip. Length Ls of the stripe 20 in the figure corresponds to the width of the video frame, and width Ws of the stripe 20 is the sum of the number of lines in the stripe 20, i.e. in the current case o+h+o.
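The effect of the symmetric overlap can be checked numerically with a short Python sketch. It is an illustration under assumptions, not the analog implementation: a 3x3 mean filter with zero padding stands in for the image processing operation, frames are plain lists of rows, and the function names are hypothetical. Each stripe contains o + h + o lines, and only the h useful lines of the result are kept.

```python
# Sketch: stripe-wise processing with a symmetric boundary condition (SBC).
# A 3x3 mean filter (zero padded) is used as a stand-in neighborhood operation.

def box3x3(img):
    """3x3 mean filter with zero padding; img is a list of equally long rows."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += img[rr][cc]
            out[r][c] = total / 9.0
    return out


def process_in_stripes(frame, h, o, op=box3x3):
    """Process the frame in stripes of o + h + o lines, keeping the h useful lines."""
    rows, cols = len(frame), len(frame[0])
    result = []
    for top in range(0, rows, h):
        # Lines falling outside the frame are treated as zero lines.
        stripe = [list(frame[y]) if 0 <= y < rows else [0.0] * cols
                  for y in range(top - o, top + h + o)]
        processed = op(stripe)
        result.extend(processed[o:o + h])  # discard the overlapping areas
    return result[:rows]


# Illustrative 8 x 5 test frame; the values are arbitrary.
frame = [[float(r * 5 + c) for c in range(5)] for r in range(8)]
print(process_in_stripes(frame, h=4, o=1) == box3x3(frame))
```

With an overlap at least as large as the radius of the neighborhood operation (one line for a 3x3 operation), the stripe-wise result coincides with whole-frame processing, whereas with o = 0 the lines at the stripe borders differ.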
The stripe 20 depicted in Fig. 3 has an asymmetric boundary condition (ABC). In this case, in addition to the useful area 21 , it is the 'o' number of lines in the lower overlapping area 23 which are processed, because the upper overlapping area has already been processed in the previous step. However, in such a design it can be necessary to store the intermediate results or final results of the lowest line in the useful area 21 of the previously processed stripe 20. This is because these stored lines will be the upper boundary lines of the useful area 21 of the next stripe 20. With such a design, the image processing problems caused by the boundary areas can be avoided. In the ABC case, only the programmable cell rows corresponding to the lines in the useful area 21 and the programmable cell rows corresponding to the lines in the lower overlapping area 23 must be designed, thereby making the stripe 20 and the stripe structured cellular chip even narrower, and saving substantial silicon surface. Of course, for the upper boundary line, a storing area 24 comprising memory lines is to be created, which storing area occupies roughly the same space on the silicon chip as a single programmable cell row.
Stripe 20 shown in Fig. 4 has a pre-calculated asymmetric boundary condition (PABC). This can be used efficiently in algorithms where the values of distant pixels only influence the final result via large neighborhood diffusions, and in the other image processing steps it suffices to take into consideration a few lines in the immediate vicinity. This is a frequent type of algorithm in the correction/fusion of live video image flows. Since it is sufficient to include the average of distant neighbors in the diffusion, in this case an o2 number of lines in a boundary area 26 below the o1 number of lines in the lower overlapping area 23 are added in a weighted way depending on their distances, and when calculating the diffusion, the line so obtained is used as the lower boundary line of the lower overlapping area 23. In this case, in addition to the lines in the useful area 21, again only the o1 number of lines in the lower overlapping area 23 is to be created on the silicon chip, but here the number of lines in the overlapping area 23 may even be one half of that in the ABC case. The lower boundary line of the lower overlapping area 23 represents a single accumulator line 25 for each input video signal. The first line in the boundary area 26 is sampled in the accumulators in the accumulator lines 25, and then the next lines in the boundary area 26 are added in a weighted way to the first one. Consequently, in the PABC system, o1+o2 number of lines are involved in the calculation of the boundary condition.
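The accumulation of the boundary area 26 into a single accumulator line can be sketched in Python as follows. The specification only states that the lines are added in a weighted way depending on their distances; the geometric weights (controlled by the hypothetical parameter `decay`), the normalization and the function name are assumptions made for the illustration.

```python
# Sketch: folding the o2 lines of the boundary area into one accumulator line.
# Assumed weighting: line i (i = 0 is the nearest line) gets weight decay**i,
# and the result is normalized so that the weights sum to one.

def accumulate_boundary_line(boundary_lines, decay=0.5):
    """Return the weighted average of the boundary lines as one line."""
    if not boundary_lines:
        raise ValueError("at least one boundary line is required")
    weights = [decay ** i for i in range(len(boundary_lines))]
    norm = sum(weights)
    # The first line is sampled into the accumulators ...
    acc = [weights[0] * v for v in boundary_lines[0]]
    # ... and the following lines are added to it in a weighted way.
    for w, line in zip(weights[1:], boundary_lines[1:]):
        acc = [a + w * v for a, v in zip(acc, line)]
    return [a / norm for a in acc]


# Illustrative values: o2 = 2 boundary lines of two pixels each.
print(accumulate_boundary_line([[4.0, 0.0], [0.0, 4.0]]))
```

The resulting line would then serve as the lower boundary line of the lower overlapping area 23 when the diffusion is calculated, so that only a single accumulator line per input video signal has to be provided instead of o2 further cell rows.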
The timings of the cellular chips associated with the above stripe designs will be discussed below. The timing considerations are always based on the requirement that the video signal processing computer should be able to receive, process and then issue every single video line of the video frames. As discussed later, in designing the stripe configurations, there are several free parameters to be selected according to the special requirements of the target application. These free parameters (hereinafter primary parameters) are the following:
- processing time tp
- total overlap,
- N number of input video signals,
- M number of output video signals.
Processing time tp is the maximum time available to the video signal processing computer to process a stripe of a video frame.
These parameters can be deduced from the relevant application, but even on the basis of this information the solution is not yet unambiguous. According to the discussion to follow, by increasing the number of programmable cell rows in the useful area, the number of local analog memory elements per one programmable cell can be reduced, and vice versa. In the case of a large production volume, from the designs offering an identical functionality, of course the most appropriate solution is the one where the silicon surface requirement is lower. To make a decision, the following secondary parameters based on the given silicon technology are required:
- the total silicon surface requirement of a programmable cell without the local analog memory elements;
- the silicon surface requirement of a local analog memory element.
If all the parameters are known, the optimal stripe design can be unambiguously determined. However, in the case of smaller series, permanent costs - for example the costs of the design work - are more dominant. Therefore, preference can be given to a cellular chip design which consists of the same type of programmable cells, i.e. a structure with a higher number of programmable cells could be optimal.
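The trade-off between the number of programmable cell rows and the number of local analog memory elements per cell can be sketched as a simple area estimate for one column of a stripe with a symmetric boundary condition. This is a rough model under stated assumptions, not the design procedure of the invention: it assumes 'k' input memory banks of N elements each in every cell, two output memory banks of M elements in the useful area and one in the overlapping areas, and 'a' further local analog memory elements per cell. The function name and the area units are hypothetical.

```python
def sbc_column_area(h, o, k, n_in, m_out, a_local, cell_area, lam_area):
    """Estimated silicon area of one column of a symmetric (SBC) stripe.

    cell_area: area of a programmable cell without its analog memory elements.
    lam_area:  area of a single local analog memory element.
    """
    # Memory elements per cell: local ones, k input banks, output banks.
    useful_lams = a_local + k * n_in + 2 * m_out    # two output banks
    overlap_lams = a_local + k * n_in + 1 * m_out   # one output bank
    useful = h * (cell_area + useful_lams * lam_area)
    overlap = 2 * o * (cell_area + overlap_lams * lam_area)
    return useful + overlap
```

For illustrative values (cell area 10, memory element area 1, N = M = 1, a = 0, o = 8), a two-bank design with h = 24 gives 544 area units per column and a three-bank design with h = 12 gives 404, so in this hypothetical technology the extra memory bank would pay off.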
The various stripe designs and the timings of the corresponding cellular chip should be examined by taking into account a continuous receiving and sending of the video lines making up the video frames. The receiving and sending can also be examined separately, therefore the process is described first from the receiving side. Fig. 5 depicts the timings of a preferred embodiment having a stripe design with a symmetric boundary condition, where the programmable cells of the cellular chip in the video signal processing computer receive the analog values associated with pixels of the one or more input video signals in two input memory banks MBA and MBB, alternately for each stripe, and the analog values generated as a result of the image processing operation are loaded into two output memory banks MBα and MBβ, from which the one or more output video signals can be generated by a scheduled read-out, wherein the output memory banks MBα and MBβ are read out alternately stripe by stripe. These memory banks contain local analog memory elements, which are described in more detail later on.
In line with the characteristic image updating process of video systems, Fig. 5 depicts a time diagram in which time runs from top to bottom, contrary to usual diagrams. The passing of time is indicated by consecutively received input video lines consisting of one or more video signals, which are depicted by horizontal dashed lines, and by consecutively sent output video lines consisting of one or more video signals and indicated by horizontal dashed lines. The time difference between two video lines, i.e. the line period t_l, is 64 μs in the case of a PAL system, and 63.55 μs in the case of an NTSC system.
Of course, not all video lines of a video frame are depicted in Fig. 5, but only the timings of the basic operation are shown. The columns in the figure indicate the I/O operations of each memory bank in the programmable cells. The stripes indicated in the columns of the input memory banks imply that the input memory bank of the column is assigned to the relevant stripe, and the useful areas shown in the columns of the output memory banks imply that the result of processing of the relevant stripe is read out from the output memory banks corresponding to the column, of the programmable cells in the useful area of the stripe. Consequently, when a stripe in a column associated with an input memory bank covers an input video line, this means that the relevant video line is sampled by the relevant input memory bank in the programmable cells associated with the video line's position within the stripe, and when in a column associated with an output memory bank the useful area covers an output video line, this means that the relevant video line is read out from the relevant output memory bank in the programmable cells associated with the output video line's position within the stripe. Furthermore, it can also be followed horizontally, which programmable cell's input memory bank within the stripe receives the input analog values, and, respectively, which programmable cell's output memory bank within the stripe the output analog values are read out from. When two input memory banks are indicated on the same input video line, this means that in the programmable cells associated with the video line's position within the stripes, both input memory banks sample the video line, i.e. this video line is stored twice within the cellular chip.
It can be counted in Fig. 5, how many lines the useful area 21 of the stripe configuration and its upper and lower overlapping areas 22 and 23 consist of. However, the figure is to be treated as depicting a general embodiment, where the useful area 21 always consists of an 'h' number of lines, and the overlapping areas 22 and 23 always comprise 'o' number of lines each. In the figure, thin arrows t_i, t* and t_o depict time ranges in which the memory banks play an external I/O role, while thick arrows tp show the time ranges when in the framework of the image processing operation, the memory banks communicate with the local processors in the cells.
The loading of the input memory banks associated with the relevant stripe and located in the programmable cells is always followed by a processing step, the duration of which is indicated by the thick arrow tp. In the course of processing, the relevant input memory banks are connected to the local processors. During this time, the information stored in the input memory banks is read out and processed. Next, the input memory banks are returned to data acquisition status, and they are again loaded with video lines from the beginning to the end of the stripe. Consequently, when a stripe appears repeatedly within one column, this means that the same input memory banks are repeatedly loaded.
The operation of the embodiment shown in Fig. 5 is the following. The first input video line is loaded into the input memory banks MBA of the programmable cells associated with the first line of the useful area 21 in the first stripe, to make sure that the output video line obtained by processing the input video line can appear in the output image. This is because it is always the result generated in the useful area 21 , which is read out as explained above. If a video line is not loaded into the useful area 21 , this line would not appear on the output. In the first stripe, the upper overlapping area 22 above the first video line can be loaded with a constant value or with copies of the first video line. The same can be done when processing the bottom of the frame, in the stripe in the area below the last video line.
The video lines represented by the one or more input video signals are sampled in the programmable cells either by both types of the input memory banks MBA, MBB on a simultaneous basis or only by one type of those. During the time when only one type of input memory banks is sampling, the video lines stored in the other type of input memory banks are processed by the video signal processing computer. Because of the errors appearing at the lower and upper edges, only the 'h' number of lines in the middle of the total processed image, i.e. the lines corresponding to the pixels in the useful area 21, can be used as an output, because these inner pixels are not deteriorated by errors. Therefore, when the input memory banks MBA associated with the useful area of the first stripe in the programmable cells of the useful area are full, the loading of the input memory banks MBB associated with the useful area of the next stripe must be started immediately. Of course, simultaneously, the lower overlapping area 23 of the first stripe must also be loaded into the input memory banks MBA of the programmable cells in the overlapping area, i.e. for a time both types of input memory banks are sampling. When the input memory banks MBA associated with the first stripe are full, the input memory banks MBA are separated in a way described later from the conductor lines carrying the input video signals, and they are connected to the internal bus in the cellular chip, and hence the processing can be started.
By assuming an 'h' number of lines in the useful area 21 and an 'o' number of lines in the overlapping areas 22 and 23 each, the result is that the data acquisition period of one type of input memory banks is t_i = (h+2o)·t_l. When processing the first and last stripes of a frame, data are only collected for the period t* < t_i, because in this case an incomplete stripe is read in. For the image processing operation carried out on the values stored in one type of input memory banks, a time period is available while these given input memory banks do not collect data. Since in this preferred embodiment one video line is always loaded into one and only one useful area, once a useful area is full, the other type of input memory banks receive the next 'h' number of lines. Consequently, the loading period of a useful area of a stripe is h·t_l. However, this entire h·t_l period, while the data is collected in the useful area of the other memory bank, may not be devoted to processing, because the processing may only commence when the lower overlapping area of the stripe is read in, and the processing is to be stopped when the sampling of the upper overlapping area of the next stripe associated with the input memory banks is started. Therefore, the time that can be dedicated to processing is the following:

$$t_p = (h - 2o)\,t_l \qquad (1)$$
The local processors can only process the data in one type of input memory banks at a time. This means that no more time is available for processing the data of a stripe than the total cycle period divided by the number of memory banks. If a 'k' number of input memory banks are assumed in the programmable cells, the total cycle period is k·h·t_l, because this is the period after which the system returns to the same phase within one frame. Therefore the time that can be devoted to processing may not exceed k·h·t_l/k = h·t_l:
$$t_p \le h\,t_l \;\Rightarrow\; h - 2o \le h \;\Rightarrow\; o \ge 0 \qquad (2)$$
It can be seen that this condition is always met in this case. The utilization factor of the local processors can be calculated by dividing the processing time by the loading period h·t_l of the useful area:

$$e = \frac{t_p}{h\,t_l} = \frac{h - 2o}{h} \qquad (3)$$
It can be seen that a 100 % local processor utilization factor cannot be achieved in such a design, except in the case when a system without overlap is made, where o = 0. Now let us examine the width of the arrangement in a typical case. Let the necessary overlap be 8 lines, and let the minimum required processing time be 480 μs, which rounds up to 8 whole line periods. Now o = 8, tp = 8·t_l. Substituting into equation (1) gives h = 24 useful lines, and the width of the cellular chip is h + 2o = 40. This is a relatively large number, because if the intention is to build the computer on a single chip, it should include 768 × 40 = 30,720 elementary programmable cells, which is roughly the limit of feasibility. At the same time it can be seen that the utilization factor of local processors is low, only 1/3.
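The arithmetic of this example can be checked with a short sketch. It assumes that the processing time is expressed in whole line periods and that equations (1) and (3) apply as stated; the function name and the dictionary keys are illustrative only.

```python
from fractions import Fraction


def sbc_two_bank_design(o, tp_lines, columns):
    """Size a symmetric-boundary stripe with two input memory banks.

    o:        lines in each overlapping area
    tp_lines: required processing time in line periods
    columns:  programmable cells per row (horizontal resolution)
    """
    h = tp_lines + 2 * o            # solved from tp = (h - 2o) * t_l
    width = h + 2 * o               # rows of programmable cells on the chip
    return {
        "h": h,
        "width": width,
        "cells": columns * width,
        "utilization": Fraction(tp_lines, h),   # e = tp / (h * t_l)
    }
```

Calling `sbc_two_bank_design(8, 8, 768)` reproduces the figures of the text: h = 24, a width of 40 rows, 30,720 cells and a utilization factor of 1/3.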
From the output memory banks, the output video lines can be read out with the following delay:
$$t_d = (h + o)\,t_l + t_p \;\Rightarrow\; t_d = (2h - o)\,t_l \qquad (4)$$
This delay td is indicated in the figure by an arrow with a dashed line. The read-out time of the video lines of one stripe from the output memory banks of the programmable cells in useful area 21 is
$$t_o = h\,t_l \qquad (5)$$
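To give a feeling for the magnitudes involved, the following sketch converts equations (1), (4) and (5) into microseconds for the two-bank symmetric case. The constant and function names are assumptions for illustration; the line periods are the PAL and NTSC values quoted above.

```python
PAL_LINE_PERIOD_US = 64.0
NTSC_LINE_PERIOD_US = 63.55


def sbc_two_bank_timing(h, o, line_period_us):
    """Timing of a two-bank symmetric stripe in microseconds."""
    tp_lines = h - 2 * o              # equation (1)
    td_lines = (h + o) + tp_lines     # equation (4), equal to 2h - o
    to_lines = h                      # equation (5)
    return {
        "tp_us": tp_lines * line_period_us,
        "td_us": td_lines * line_period_us,
        "to_us": to_lines * line_period_us,
    }
```

For the example values h = 24 and o = 8 with the PAL line period, the sketch gives a processing time of 512 μs, an output delay of 2560 μs (40 line periods) and a read-out time of 1536 μs per stripe.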
By increasing the number of input memory banks per programmable cell, the utilization factor of local processors can be increased, i.e. fewer local processors can carry out the same task. Fig. 6 depicts an embodiment where in the programmable cells there are three input memory banks MBA, MBB, MBC for receiving the input video signals. The operation of this embodiment is very similar to that discussed above. Again, each video line is to be sampled into the input memory banks of programmable cells associated with a useful area 21 of a stripe.
Symbols 'x' in Fig. 6 show in a given instant the programmable cell positions in which the input/output memory banks are currently busy in sampling the input video signals and generating the output video signals. In this instant, input memory banks MBA and output memory banks MBβ are connected to the local processors for the processing operation.
Since there are three input memory banks MBA, MBB, MBC for each programmable cell in this case, data are collected for a period h·t_l from the useful area 21 of a given stripe, and then the input memory banks associated with the stripes are free for a period 2h·t_l. If the time devoted to loading the lines in the overlapping areas is subtracted from the time 2h·t_l, the processing time is obtained:

$$t_p = (2h - 2o)\,t_l \qquad (6)$$

The restriction related to local processors is the following in this case:

$$t_p \le h\,t_l \;\Rightarrow\; 2h - 2o \le h \;\Rightarrow\; h \le 2o \qquad (7)$$
The same is reflected by the utilization factor of local processors:
$$e = \frac{t_p}{h\,t_l} = \frac{2h - 2o}{h} \qquad (8)$$
From the output memory banks, the output video lines can be read out with the following delay:
$$t_d = (h + o)\,t_l + t_p \;\Rightarrow\; t_d = (3h - o)\,t_l \qquad (9)$$

If the condition (7) is not satisfied, the utilization factor is higher than 1, which is obviously impossible. It can be seen that in the case of h = 2o, the utilization factor of the local processors is 100 %.

By substituting the previous parameters (o = 8, tp = 8·t_l), the number of useful lines falls back to one half, i.e. to h = 12, while the width of the cellular chip is h+2o = 28, i.e. the number of local processors is substantially reduced. The utilization factor of the processors is tp/(h·t_l) = 2/3.
The number of input memory banks per programmable cell can be further increased. Assuming that a 'k' number of input memory banks are used per programmable cell, the processing time can be calculated on the basis of the following formula:
$$t_p = \bigl((k - 1)h - 2o\bigr)\,t_l \qquad (10)$$
In such a case the output condition is:
$$t_p \le h\,t_l \;\Rightarrow\; (k - 1)h - 2o \le h \;\Rightarrow\; (k - 2)h \le 2o \qquad (11)$$

The utilization factor of the processors is:

$$e = \frac{t_p}{h\,t_l} = \frac{(k - 1)h - 2o}{h} \qquad (12)$$
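Formulas (10) to (12) can be explored for different numbers of input memory banks with the following sketch. It assumes that the required processing time is given in whole line periods and picks the smallest 'h' for which formula (10) provides at least that much time; the function name and the returned keys are hypothetical.

```python
from fractions import Fraction


def sbc_design(o, tp_lines, k):
    """Smallest symmetric-boundary stripe for k input memory banks."""
    if k < 2:
        raise ValueError("at least two input memory banks are required")
    # Smallest integer h with (k - 1) * h - 2o >= tp_lines (ceiling division).
    h = -(-(tp_lines + 2 * o) // (k - 1))
    available = (k - 1) * h - 2 * o             # formula (10), in line periods
    return {
        "h": h,
        "width": h + 2 * o,
        "tp_lines": available,
        "condition_11": (k - 2) * h <= 2 * o,   # formula (11)
        "utilization": Fraction(available, h),  # formula (12)
    }
```

For o = 8 and a processing time of 8 line periods, the sketch yields h = 24, 12 and 8 for k = 2, 3 and 4, with utilization factors of 1/3, 2/3 and 1 respectively. The values for k = 2 and k = 3 agree with the examples in the text; the k = 4 value is only what the formulas predict.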
The timing of the configurations with asymmetric and pre-calculated asymmetric boundary condition is principally the same. The image processing operation can be started in both cases after loading an 'o' or 'o1' number of lines, respectively, in the lower overlapping area 23. In the PABC case, however, after loading the 'o1' number of lines, a time equal to that of loading o2 lines must elapse for generating the pre-calculated boundary condition. This means that the calculations using this boundary condition can only be started o2·t_l time after the starting of image processing. All this of course also indicates that these calculations must fit into the tp − o2·t_l time frame, which must always be positive. In such cases, consequently, the following condition must be satisfied:

$$t_p > o_2\,t_l \qquad (13)$$
Examining first the case of two input memory banks as depicted in Fig. 7, the difference between the cases discussed above is that there is an overlapping area 23 at the bottom only, i.e. the number of overlapping lines is reduced to one half. The data acquisition time associated with one stripe is
$$t_i = (h + o)\,t_l$$
The input memory banks associated with a stripe, of the programmable cells in the useful area of the stripe, are loaded with data for a period h·t_l, and then they are free for a period h·t_l. If the period devoted to reading in the overlapping lines is subtracted from this h·t_l time, the processing time is obtained:

$$t_p = (h - o)\,t_l \qquad (14)$$
The restriction relating to local processors is in this case the following:
$$t_p \le h\,t_l \;\Rightarrow\; h - o \le h \;\Rightarrow\; o \ge 0 \qquad (15)$$

The utilization factor of the local processors is the following:

$$e = \frac{t_p}{h\,t_l} = \frac{h - o}{h} \qquad (16)$$
Similarly to the case of symmetric boundary condition with two input memory banks, again a 100 % utilization factor can be obtained only if no overlap is applied. Substituting the previous parameters in the ABC case (o = 8, tp = 8·t_l), the number of useful lines in this case is h = 16, while the full width of the cellular chip will be h+o = 24. Thereby the number of local processors is reduced further. The utilization factor of the local processors is 50 %.
Considering the restriction of condition (13) in the PABC case, 'o' is replaced by 'o1' in (14), (15) and (16). According to the example, in the case of the parameters o1 = 4 and o2 = 4, the width of the useful area is reduced to h = 12 and the total width of the cellular chip is only h+o1 = 16, while the utilization factor of local processors increases to 2/3.
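The two-bank ABC and PABC examples can be reproduced with one small sketch, in which the PABC case simply passes o1 as the overlap and additionally checks condition (13). The function name and its parameters are illustrative assumptions; the storing area 24 and the accumulator line 25 are not counted in the width.

```python
from fractions import Fraction


def asymmetric_two_bank_design(overlap, tp_lines, o2=0):
    """Size a two-bank stripe with an asymmetric boundary condition.

    overlap:  'o' in the ABC case, 'o1' in the PABC case
    tp_lines: required processing time in line periods
    o2:       lines of the pre-calculated boundary area (0 for plain ABC)
    """
    if o2 and tp_lines <= o2:
        raise ValueError("condition (13) violated: tp must exceed o2 * t_l")
    h = tp_lines + overlap                     # solved from formula (14)
    return {
        "h": h,
        "width": h + overlap,
        "utilization": Fraction(tp_lines, h),  # formula (16)
    }
```

`asymmetric_two_bank_design(8, 8)` gives h = 16, a width of 24 and a utilization factor of 1/2, while `asymmetric_two_bank_design(4, 8, o2=4)` gives h = 12, a width of 16 and a utilization factor of 2/3, as stated above.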
From the output memory banks of the programmable cells associated with the stripe, the output video lines may be read out with a delay of
$$t_d = (h + o)\,t_l + t_p \;\Rightarrow\; t_d = 2h\,t_l \qquad (17)$$
When applying three input memory banks per cell, the number of local processors can be reduced also in the asymmetric case, as explained below in connection with Fig. 8. Similarly to the previous considerations, the processing time can be calculated on the basis of the following formula:
$$t_p = (2h - o)\,t_l \qquad (18)$$
The restriction in association with the local processors in this case is the following:
$$t_p \le h\,t_l \;\Rightarrow\; 2h - o \le h \;\Rightarrow\; h \le o \qquad (19)$$

The utilization factor of the processors is the following:

$$e = \frac{t_p}{h\,t_l} = \frac{2h - o}{h} \qquad (20)$$

The loading time of one type of input memory banks associated with the relevant stripe and located in the programmable cells is t_i = (h+o)·t_l. The delay of the output image is td = (h+o)·t_l+tp, i.e. td = 3h·t_l.
It can be seen that if the condition (19) were not satisfied, the utilization factor would be higher than 100 %. Substituting the previous parameters (o = 8, tp = 8·t_l) in the ABC case, the number of useful lines falls to h = 8, i.e. to two-thirds of the 12 lines needed with three input memory banks in the symmetric case and to one half of the 16 lines needed with two input memory banks in the ABC case, while the width of the cellular chip is h+o = 16 in the asymmetric case, and in the pre-calculated asymmetric case it is o2 less. The utilization factor of processors in this case is 100 %.
Since the ABC case already utilizes the processors at 100 %, the formula (19) in the PABC case would not allow the number of useful lines to fall below 8 either. Therefore, in this case the necessary silicon surface can only be reduced by reducing the number of overlapping lines. In the case of o2 = 4, the total width of the cellular chip is 12.
The number of memory banks per programmable cell can be increased further. Assuming that a 'k' number of memory banks are applied per programmable cell, the processing time can be calculated on the basis of the following formula:
$$t_p = \bigl((k - 1)h - o\bigr)\,t_l \qquad (21)$$
In such a case the output condition is:
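$$t_p \le h\,t_l \;\Rightarrow\; (k - 1)h - o \le h \;\Rightarrow\; (k - 2)h \le o \qquad (22)$$

The utilization factor of the processors in the general case is:

$$e = \frac{t_p}{h\,t_l} = \frac{(k - 1)h - o}{h} \qquad (23)$$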
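The general asymmetric formulas (21) to (23), together with the output delay td = (h+o)·t_l+tp, can be evaluated with the sketch below. The names are hypothetical, and applying the delay formula to more than three input memory banks is an assumption that the text does not state explicitly.

```python
from fractions import Fraction


def abc_design(o, tp_lines, k):
    """Smallest asymmetric-boundary stripe for k input memory banks."""
    if k < 2:
        raise ValueError("at least two input memory banks are required")
    # Smallest integer h with (k - 1) * h - o >= tp_lines (ceiling division).
    h = -(-(tp_lines + o) // (k - 1))
    available = (k - 1) * h - o                  # formula (21), in line periods
    return {
        "h": h,
        "width": h + o,
        "tp_lines": available,
        "condition_22": (k - 2) * h <= o,        # formula (22)
        "utilization": Fraction(available, h),   # formula (23)
        "delay_lines": (h + o) + available,      # td in line periods
    }
```

For o = 8 and a processing time of 8 line periods, two banks give h = 16 with a delay of 32 line periods (2h), and three banks give h = 8 with a delay of 24 line periods (3h) and a utilization factor of 100 %, matching the examples in the text.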
Fig. 9 shows the internal structure of a preferred embodiment of a programmable cell 30 in the cellular chip according to the invention. As mentioned in the introduction, the cellular chip according to the invention can be implemented in any suitable cellular network. In association with Figs. 9 and 10, an implementation based on a cellular network according to the CNN-UM will be described below as an example. In this cellular network, the programmable cells are located in a standard square grid and they are locally connected to each other. The cellular chip according to the invention is designed as a stripe, where each of the programmable cells 30 implemented on the cellular chip is assigned to a respective pixel of the stripe 20.
The programmable cell 30 comprises a local processor 31. As it is known from the CNN-UM technology, the local processor 31 comprises a programmable analog forward convolution local processor core 32. The latter in the given case may also be a reverse convolution core for example, depending on the application. The local processor core 32, as indicated by arrows 33, is connected to the other local processor cores 32 in the cellular chip, and together with them it is suitable to carry out for example convolution operations. In the embodiment shown in the figure, the local processor 31 may also comprise, in accordance with the requirements of the given application, a local analog output unit (LAOU) 34 known from the CNN-UM technology, which LAOU 34 is an analog arithmetic unit, and by which in the course of the image processing operations for example various operations can be carried out using analog values stored in an 'a' number of local analog memory elements (LAM) 36 in the programmable cell 30. According to the application requirements, the local processor 31 may also comprise a local logic unit (LLU) 35 also known from the CNN-UM technology, which LLU 35 is designed for carrying out logic operations using logic values in a 'b' number of local logic memories (LLM) 37 in the programmable cell 30. The application of the LAOU 34 could be necessary when processing grayscale images, and the application of the LLU 35 could be advantageous when processing black and white images. Subject to the desired image processing operations, the LAOU 34 and/or the LLU 35, as well as the LAMs 36 and/or LLMs 37 can be omitted, i.e. these elements are not necessarily parts of the known local processor 31. The elements of the local processor 31 are linked to each other via an internal bus 40, which is used for forwarding analog and logic values.
The elements are connected via controllable internal switches 41 to the internal bus 40, which internal switches 41 are preferably controlled globally, i.e. in an identical way in all programmable cells 30 of the cellular chip. By means of a known controllable cell switch 56, a global conductor line 55 can be used to establish connection with the internal bus 40, which enables the loading of the programmable cells 30 with initial values.
According to the invention, the programmable cells 30 also comprise an I/O module 50, which comprises local analog memory elements. The local analog memory elements in the I/O module 50 are arranged in memory banks, and these memory banks perform different functions at different times, as described earlier. The external and internal access to as well as the scheduling of the local analog memory elements in the memory banks are designed in a way that they enable a continuous receiving of one or more input video signals Yin,1 ... Yin,N and a continuous sending of one or more output video signals Yout,1 ... Yout,M.
During the image processing operation, all the LAMs of one and the same input memory bank and output memory bank are connected to the internal bus 40 in the programmable cells. These LAMs are preferably memory mapped. In such a way, when switching to a different memory bank, the addresses of the mapped LAMs will not change, which simplifies the programming of the cellular chip.
The I/O modules 50 in the cellular chip operate synchronously with one another, i.e. at a given moment the same memory banks are activated, for example for processing or sampling, in each programmable cell 30. In the I/O module 50, there are two types of memory banks as described above. The embodiment depicted in Fig. 10 comprises three input memory banks MBA, MBB and MBC, and two output memory banks MBα, MBβ. The number of local analog memory elements in the input memory banks MBA, MBB and MBC, hereinafter called the input memory elements, is identical with that of the input video signals. Due to the image processing of overlapping stripes, at least two input memory banks are required, but three or more input memory banks may also be applied. There are the same number of input memory banks in each programmable cell 30 of the cellular chip. The number of local analog memory elements in the output memory banks, hereinafter called the output memory elements, is identical with that of the output video signals. In the programmable cells 30 corresponding to the pixels in the useful area 21 of the stripe, there are at least and preferably two output memory banks, and in the overlapping areas 22 and 23 there is at least and preferably one. The reason for this is that the LAMs in the output memory banks are used for temporary storage during the calculation, and then the final results are transferred here after performing the calculation. However, these final results are only of relevance in the useful area 21, and not in the overlapping areas 22 and 23. In the useful area 21, once the calculation is completed, the read-out of the output video signals begins from the output memory banks, while the other output memory banks are connected to the local processors 31.
At the same time, the final result generated in the overlapping areas 22 and 23 can be discarded, and hence the LAMs located in the output memory banks of these areas can immediately take part in processing the next frame stripe. Of course, it is possible to have the same number of output memory banks, preferably two, in each programmable cell 30. This enables a more flexible cellular chip design, because when programming the cellular chip, the size of the useful area 21 and that of the overlapping areas 22 and 23 can be modified in a simple way.
In the description and in the claims, the input memory banks in the cellular chip together are called the first memory unit, and the output memory banks in the cellular chip are called the second memory unit.
As described above, during image processing, the same input memory bank and output memory bank in each programmable cell 30, for example in Fig. 10 the input memory bank MBA and the output memory bank MBβ are connected to the internal bus 40 of the local processor 31. The input memory bank connected to the internal bus 40 results in the appearance of an N number of read-write LAMs on the internal bus 40, which contain the last read stripe at the time of starting the image processing. The local analog memory elements in the input memory banks can also be used as temporary memories once the algorithm no longer needs the frame stripe in its original form. At the time of processing, the relevant output memory bank with its M number of LAMs becomes visible on the internal bus 40. In these LAMs, there is no useful data at the time of starting the calculation. Once the calculation is completed, the calculated final result is to be transferred here. For calculating the final result, of course these LAMs may also be used as temporary memories. Consequently, in the embodiment shown, each local processor 31 has altogether a+N+M number of LAMs, including the LAMs 36, available for performing the calculations.
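The bookkeeping of local analog memory elements described here can be summarized in a small sketch. The class and field names are hypothetical; the sketch only counts elements and does not model the analog circuitry. It distinguishes the LAMs visible on the internal bus during processing (a+N+M) from the LAMs physically present in the cell.

```python
from dataclasses import dataclass


@dataclass
class CellMemory:
    """Counts of local analog memory elements (LAMs) in one programmable cell."""
    a: int              # LAMs 36 inside the local processor
    n_inputs: int       # N input video signals, one LAM each per input bank
    m_outputs: int      # M output video signals, one LAM each per output bank
    input_banks: int    # e.g. 3 for MBA, MBB, MBC
    output_banks: int   # e.g. 2 in the useful area, 1 in the overlapping areas

    def visible_lams(self):
        # One input bank and one output bank are on the internal bus at a time.
        return self.a + self.n_inputs + self.m_outputs

    def physical_lams(self):
        return (self.a
                + self.input_banks * self.n_inputs
                + self.output_banks * self.m_outputs)
```

For example, a cell with a = 4, N = 2, M = 1, three input banks and two output banks exposes 7 LAMs to its local processor at any time, while 12 LAMs are implemented in the cell.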
The input and output memory banks, respectively, which are currently not connected to the local processors 31, perform the sampling of the input video signals Yin,1 ... Yin,N and the generating of output video signals Yout,1 ... Yout,M, respectively. An input memory bank, which is associated with a frame stripe, and which is located in a programmable cell 30 assigned to a given pixel of the frame stripe to be read in, is only connected at the moment of sampling via controllable input switches 53 to conductor lines 51 carrying the input video signals Yin,1 ... Yin,N. At this moment, the analog values of the pixel of the line- and frame-synchronized input video signals Yin,1 ... Yin,N are sampled into the LAMs of the input memory bank. The result obtained by the image processing operation is read out from those output memory banks which are associated with the stripe, and which are in the programmable cells 30 corresponding to the pixels in the useful area 21. The read-out is carried out in a scheduled succession in such a way that the M number of LAMs of the relevant output memory bank are connected by means of controllable output switches 54 at the moment of read-out to conductor lines 52 carrying the output video signals Yout,1 ... Yout,M.
The output video signals follow the input video signals with a delay of td = (h+o)·t_l+tp. This is the same delay all the time, because it takes this much time until the first stripe is processed by the video signal processing computer, and until output can be started. Since this time is an integer multiple of the line period, the input and output video signals are always line-synchronized. As a result, the sampling, as well as the read-out of the result generated by the image processing operations, is always carried out in the same column of the stripe at any moment. Fig. 11 provides an example of the location of sampling and result read-out points at one moment within one stripe. The figure refers to the symmetric boundary condition arrangement shown in Fig. 6, and the programmable cells 30 active from the aspect of sampling/read-out are shown in the horizontal positions marked with an 'x'. The input memory bank MBC of the programmable cell 30 in position 60 samples the upper overlapping area 22. From the output memory bank MBα of the programmable cell 30 in position 61 the result is read out, and the input memory bank MBB of the programmable cell in position 62 samples the useful area 21. At this time, the input memory banks MBA and the output memory banks MBβ are connected to the local processors 31.
It can be seen that in controlling the cellular chip according to the invention, it is to be specified at each sampling moment which programmable cells are assigned to an I/O operation, and this represents the specifying of a single column and several lines in the stripe. Furthermore, the input memory banks into which the sampling values are to be loaded in the assigned programmable cells, and output memory banks used for reading out the result of the image processing operation should be specified as well.
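The input side of this control task can be sketched as a mapping from an input video line to the cell rows and input memory banks that sample it, for a stripe with a symmetric boundary condition. The sketch is an illustration under assumptions: stripes are numbered from the top of the frame, the banks are assigned to consecutive stripes in round-robin order, the end of the frame is not handled, and the function name is hypothetical. The column follows from the pixel position within the line and is not modeled.

```python
def sampling_targets(line, h, o, k):
    """Return (stripe, bank, row) triples that sample a given input line.

    line: index of the input video line, counted from 0 at the top of the frame
    h, o: lines in the useful area and in each overlapping area
    k:    number of input memory banks per programmable cell
    Row 0 is the first row of the upper overlapping area of a stripe.
    """
    targets = []
    first = max(0, (line - o) // h)   # earliest stripe still covering the line
    last = (line + o) // h            # latest stripe already covering the line
    for stripe in range(first, last + 1):
        row = line - (stripe * h - o)
        bank = "MB" + chr(ord("A") + stripe % k)   # assumed round-robin order
        targets.append((stripe, bank, row))
    return targets
```

With h = 24, o = 8 and two banks, line 0 is sampled only by bank MBA in row 8, i.e. in the first row of the useful area of the first stripe. With h = 12, o = 8 and three banks, line 30 is sampled by three banks at once: in the lower overlapping area of stripe 1, in the useful area of stripe 2 and in the upper overlapping area of stripe 3.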
For comparing the SBC, ABC and PABC stripe configurations, it is assumed that in the various arrangements, the number of cells in one row is identical with the horizontal resolution of the input video signals. Consequently, in the different designs, it is not the number of columns which is different, but only the number of rows, and the number of LAMs in the I/O modules 50 of each programmable cell 30 changes. Therefore, it is enough to compare the complexity of the columns of the arrangements. Of course, special elements like the additional circuits and memory lines of the ABC as well as the analog accumulator lines required for calculating the PABC should not be overlooked. The comparison of arrangements according to the discussion above is given in Table 1, assuming two output memory banks in the useful area, and one output memory bank in the overlapping areas.
Table 1: comparison of the SBC, ABC and PABC stripe configurations (table image not reproduced in the source text)
The video signal processing computer described above can be implemented preferably by using the stripe based cellular chip according to the invention. Through the use of such cellular chips, a high capacity video signal processing computer can be built with a minimum number of elements. A simple video signal processing computer, as shown in Fig. 12, has a single cellular chip 70 of a stripe design according to the invention. In this case, in the stripe structured cellular chip 70, the number of programmable cells in one row is identical with the horizontal resolution of the input and output video signals. The standard line and frame synchronized incoming video signals are received by a known video input module 72, which removes the synchronization signal from them, separates any composite video signals into their components, and only sends the monochromatic luminance signals to the cellular chip 70. This N number of signals will be the input video signals Yin,1 ... Yin,N for the cellular chip 70. The video input module 72 can also carry out level shifts, if necessary. A video output module 73 known per se and shown in the figure supplies synchronization signals for the luminance signals read out from the cellular chip 70, i.e. for the output video signals Yout,1 ... Yout,M described above, and sends these through a standard video signal output amplifier stage. Furthermore, the video signal processing computer comprises a digital control unit 71, which carries out the digital control and synchronizing of the whole system.
If it is impossible or not advisable to design all the necessary programmable cells on a single silicon chip, the frame stripe can be segmented vertically into two or more sections overlapping each other in a lateral direction, where the processing of each such section can be implemented by a separate stripe structured cellular chip. Fig. 13 depicts the schematic structure of a video signal processing computer having two cellular chips 70a, 70b, which video signal processing computer is designed for processing frames divided into two sections. The vertical overlap between the sections must be at least the same size as the horizontal overlap between two stripes which are one below the other. By applying such an overlap, communication (data transfer) between the cellular chips 70a and 70b can be avoided. The operation of a video signal processing computer consisting of several cellular chips is fully identical with that of the system consisting of a single cellular chip. The only difference is that the first cellular chip samples and processes the first part of each video line, the second cellular chip processes the second part, and so on.
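The segmentation of a video line among several chips can be illustrated with a short sketch. It is not taken from the application: the function name, the 640-pixel line width and the 4-pixel margin are hypothetical, and it assumes that each chip extends its own share of the line by a fixed margin into each neighbouring section, so that no data has to be exchanged between chips.

```python
def split_line_into_sections(line_width, num_chips, margin):
    """Return one (start, end) column range per chip, end exclusive.

    Assumption of this sketch: every chip owns an equal share of the
    line and additionally samples `margin` columns of each neighbouring
    section, so adjacent sections overlap by 2 * margin columns.
    """
    if line_width % num_chips != 0:
        raise ValueError("this sketch needs line_width divisible by num_chips")
    share = line_width // num_chips
    sections = []
    for chip in range(num_chips):
        start = max(0, chip * share - margin)                # clamp at left edge
        end = min(line_width, (chip + 1) * share + margin)   # clamp at right edge
        sections.append((start, end))
    return sections


# Hypothetical example: a 640-pixel line shared by two chips (70a, 70b).
print(split_line_into_sections(640, 2, 4))
```

In this example the first chip samples columns 0 to 323 and the second samples columns 316 to 639, so both see the eight columns around the seam.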
The video signal processing computers described above are only capable of so-called 'intra frame' processing. With this type of image processing, numerous basic so-called 'inter frame' image processing functions cannot be implemented, e.g. motion detection or noise filtering by averaging in time. If, however, an analog memory large enough to store an entire frame, for example an ARAM 74 shown in Fig. 14, is introduced into the system, these functions can also be implemented by means of the video signal processing computer according to the invention. For example, an ARAM as described in the publication "A 0.5 μm CMOS CNN Analog Random Access Memory Chip for Massive Image Processing" by R. Carmona, S. Espejo, R. Dominguez-Castro, A. Rodriguez-Vazquez, T. Roska, T. Kozek and L. O. Chua, Proceedings of IEEE Int. Workshop on Cellular Neural Networks and Their Applications, (CNNA '98), pages 271-276, can be applied as ARAM 74. As shown in Fig. 14, the ARAM 74 introduces a delay equal to a full frame in the path of the video signals. As a result, not only the current stripe but also the stripe in the same location of the previous frame can be loaded into the stripe structured cellular chip 70, stored in the local analog memory elements of the programmable cells 30 and used in the image processing operation. According to the layout shown in Fig. 15, by means of the ARAM 74, a feedback may also be established between the one or more input video signals and the one or more output video signals, and consequently, calculations in time requiring feedback can also be implemented.
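The two uses of the frame memory can be sketched in software. This is only a digital illustration of the data flow; the application performs these operations on analog values inside the programmable cells. The class and function names, the threshold and the weighting factor are hypothetical.

```python
class FrameDelay:
    """Stands in for the full-frame memory: returns the frame stored last time."""

    def __init__(self):
        self._stored = None

    def push(self, frame):
        previous = self._stored
        self._stored = [row[:] for row in frame]  # keep a private copy
        return previous


def motion_mask(current, previous, threshold):
    """Delay path (Fig. 14 style): mark pixels whose value changed noticeably."""
    return [[abs(c - p) > threshold for c, p in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(current, previous)]


def recursive_average(current, previous_output, alpha):
    """Feedback path (Fig. 15 style): blend the input with the fed-back output."""
    return [[alpha * c + (1.0 - alpha) * p for c, p in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(current, previous_output)]


delay = FrameDelay()
frame0 = [[0.0, 0.0], [0.5, 0.5]]
frame1 = [[0.0, 1.0], [0.5, 0.5]]
delay.push(frame0)                  # nothing stored yet, returns None
previous = delay.push(frame1)       # returns frame0
print(motion_mask(frame1, previous, 0.1))
print(recursive_average(frame1, previous, 0.5))
```

The delay path compares each pixel with the same pixel one frame earlier, while the feedback path reuses the previous result as an input, which is what averaging in time requires.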
The sensors of certain cameras, for example long wave IR cameras and image intensified cameras, have an offset error which changes in space but is constant in time. To eliminate these offset errors, known video signal processing computer systems digitize the video signal and, after compensating the offset error, generate an analog signal again. If the ARAM 74 shown in Fig. 14 is replaced by a non-volatile memory, the system will be able to compensate for the offset error. Of course, at the same time, other image improving and processing functions necessary for these noisy cameras can also be carried out by the same video signal processing computer.
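Because the offset varies from pixel to pixel but not from frame to frame, one stored offset value per pixel is sufficient. The following sketch shows the arithmetic only; how the offset map is obtained is not specified in the text, so the calibration step (averaging frames taken with a uniform zero-level input) and all names are assumptions.

```python
def estimate_offset_map(reference_frames):
    """Assumed calibration: average frames captured with a uniform zero input."""
    count = len(reference_frames)
    rows = len(reference_frames[0])
    cols = len(reference_frames[0][0])
    return [[sum(frame[r][c] for frame in reference_frames) / count
             for c in range(cols)]
            for r in range(rows)]


def compensate_offset(frame, offset_map):
    """Subtract the stored per-pixel offset from every incoming frame."""
    return [[pixel - offset for pixel, offset in zip(frame_row, offset_row)]
            for frame_row, offset_row in zip(frame, offset_map)]


# Hypothetical 1 x 2 sensor whose pixels are offset by +0.25 and -0.5.
offsets = estimate_offset_map([[[0.25, -0.5]], [[0.25, -0.5]]])
print(compensate_offset([[1.25, 0.5]], offsets))
```

The offset map plays the role of the non-volatile memory contents: it is written once and then read for every frame.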
The design of the digital control unit 71 depicted in Figs. 12 to 15 is shown in Fig. 16. The control unit 71 comprises a digital processor 80, which is preferably a DSP, a microprocessor or a microcontroller, a memory 81, a serial port handling unit 82 and bi-directional digital ports 83a, 83b, 83c. The elements of the control unit 71 are connected to each other via an internal digital bus 84.
The memory 81 comprises a RAM and a boot unit, preferably a boot EPROM or a flash memory. By means of the serial port handling unit 82, the control unit 71 can be programmed, tuned or diagnosed via an external serial bus 85. These functions can be implemented preferably from a PC or a notebook connected to the external serial bus 85. The bi-directional digital port 83a serves for communication with the cellular chip 70 and, where applicable, with the ARAM 74, the bi-directional digital port 83b serves for communication with the video input module 72 and the bi-directional digital port 83c serves for communication with the video output module 73. The speed of the digital ports 83a, 83b and 83c must be matched to the speed of the cellular chip 70, the video input module 72 and the video output module 73, and is usually approx. 15 to 50 MHz. Where appropriate, the control unit 71 may also be formed as part of the cellular chip 70.
The control unit 71 issues the commands to the cellular chip 70 preferably in a coded form. In this case, the cellular chip 70 preferably comprises a decoder unit (not shown) for transforming commands from the control unit 71 into signals for conductor lines that operate the local processor cores 32, the internal switches 41, the input switches 53 and the output switches 54. If the control unit 71 is designed on the cellular chip 70, the decoder unit may be omitted. The internal switches 41, the input switches 53 and the output switches 54 are preferably controlled CMOS switches.
The control unit 71 executes a program stored in the memory 81. This program usually includes a larger cycle, which has to be carried out in each frame stripe. Consequently, the period of such a cycle is equal to the following: the number of lines in the cellular chip 70, multiplied by the line period of the video line.
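As a worked example of this relationship, the sketch below multiplies a line count by a line period. The 20-line chip is a hypothetical figure; 64 µs is the line period of a 625-line, 25 frames per second video signal.

```python
def stripe_cycle_period_us(lines_in_chip, line_period_us):
    """Period of the per-stripe control cycle, in microseconds."""
    return lines_in_chip * line_period_us


# Hypothetical chip with 20 lines, fed with a 64 microsecond line period.
print(stripe_cycle_period_us(20, 64.0))
```

With these numbers the control program has 1280 µs to complete all operations belonging to one stripe.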
As described above, the video signal processing computer according to the invention is able to receive complete video lines, and collect, process and issue them on a continuous basis. The video signal processing computer can be made suitable for processing video signals coming from different modality cameras (black and white, color RGB, long wave IR, image intensified etc.), compiling them in an intelligent way, and the output may also be of multi-channel type, for example color RGB.
By using one or more cellular chips according to the invention, a stripe structured video signal processing computer can be built. The SBC, ABC and PABC cellular chip arrangements described as preferred embodiments eliminate the problems arising from the lack of overlaps on the one hand, and on the other they can be built cost efficiently on a silicon basis. The joint 'inter frame' processing of consecutive frames as described above is also possible by stripe structured cellular chips, which also allow to carry out image processing tasks in time, e.g. motion detection.
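The way overlapping stripes tile a frame can be sketched as follows. The example assumes an arrangement with both an upper and a lower overlapping area and clamps the ranges at the frame borders; the function name and the 12-line frame are hypothetical, and the handling of the frame borders is an assumption of this sketch.

```python
def stripe_schedule(frame_lines, useful_lines, upper_overlap, lower_overlap):
    """Return (read_range, useful_range) pairs of line ranges, end exclusive.

    read_range   : lines sampled into the cells for this stripe
    useful_range : lines whose results are read out as output video
    """
    schedule = []
    for top in range(0, frame_lines, useful_lines):
        useful = (top, min(top + useful_lines, frame_lines))
        read = (max(0, top - upper_overlap),                  # upper overlap
                min(frame_lines, useful[1] + lower_overlap))  # lower overlap
        schedule.append((read, useful))
    return schedule


# Hypothetical 12-line frame, 4 useful lines per stripe, 1-line overlaps.
for read, useful in stripe_schedule(12, 4, 1, 1):
    print("read lines", read, "output lines", useful)
```

Every line of the frame appears in exactly one useful range, while the overlapping lines are read twice so that each stripe has valid neighbours at its upper and lower edges.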
The video signal processing computer based on the cellular chip according to the invention may also be of a PCMCIA size, and its computational output could be several hundred times higher than that of the current highest capacity digital signal processing systems (e.g. TMS320C6x).
The video signal processing computer, cellular chip and method according to the invention may be applied according to the description above not only in the cellular network used in the described embodiments, but in any suitable cellular network, for example in the convolution neural network described in the introduction, in a network having general analog processor cores or in a cellular network implemented with a two-dimensional resistive grid.
For those skilled in the art it is obvious that the embodiments described above are only to be considered as examples, and various modifications and variants can be designed within the scope of the invention defined by the following claims.

Claims

1. A video signal processing computer for converting one or more input video signals into one or more output video signals, the computer comprising a cellular network consisting of a plurality of programmable cells, each of the programmable cells comprising a local processor core and local analog memory elements attached thereto via controllable internal switches, wherein the video signal processing computer comprises a control unit for controlling the local processor cores and the internal switches, c h a r a c t e r i z e d in that each of the programmable cells (30) is assigned to a respective pixel of a horizontal frame stripe (20) extending over a number of video lines, wherein the stripe (20) comprises a useful area (21) and an overlapping area (22, 23), the local analog memory elements comprise input memory elements connected via controllable input switches (53) to conductor lines (51) carrying the input video signals, and output memory elements connected via controllable output switches (54) to conductor lines (52) carrying the output video signals, wherein stripes (20) located one below and overlapping one another in the overlapping areas (22, 23) are processed successively in such a way that by means of the control of the input switches (53), input analog values associated with the pixels of the stripe (20) are read in by sampling into the input memory elements of the respective programmable cells (30), by means of the control of the internal switches (41), after reading in an entire stripe (20), an image processing operation is performed by the cellular network on the input analog values of the stripe (20), the image processing operation resulting output analog values associated with the pixels, and the output analog values are loaded into the output memory elements of the respective programmable cells (30), and then by means of the control of the output switches (54), by performing a scheduled read-out of the output analog values associated with the pixels in the useful area (21), a part of the output video signals is generated.
2. The video signal processing computer according to claim 1, characterized in that the input memory elements and the output memory elements are organized in memory banks, wherein the programmable cells (30) corresponding to the pixels in the useful area (21) comprise at least two input memory banks and at least two output memory banks, and the programmable cells (30) corresponding to the pixels in the overlapping area (22, 23) comprise at least two input memory banks and at least one output memory bank, wherein each input memory bank has input memory elements in a number corresponding to that of the input video signals and wherein each output memory bank has output memory elements in a number corresponding to that of the output video signals, and wherein the programmable cells (30) comprise an internal bus (40) to which the local processor cores (32) and the memory elements in the memory banks are connected via the internal switches (41).
3. The video signal processing computer according to claim 1, characterized in that the stripe (20) comprises a useful area (21) extending over a number of the video lines and a lower overlapping area (23) below the useful area (21), the lower overlapping area (23) extending over one or more video lines.
4. The video signal processing computer according to claim 3, characterized in that the stripe (20) comprises an upper overlapping area (22) above the useful area (21), the upper overlapping area (22) extending over one or more video lines.
5. The video signal processing computer according to claim 3, characterized in that the stripe (20) comprises a storing area (24), which extends over the video line above the useful area (21) and includes one or more lines of memory elements.
6. The video signal processing computer according to claim 3, characterized in that the stripe (20) comprises accumulator lines (25) in a number corresponding to that of the input video signals, the accumulator lines (25) being located below the lower overlapping area (23), wherein the accumulator lines (25) store analog values obtained by weighted addition of one or more lines of a boundary area (26) below the lower overlapping area (23).
7. The video signal processing computer according to any of claims 3 to 6, characterized in that the stripe (20) covers entire video lines of a frame.
8. The video signal processing computer according to any of claims 3 to 6, characterized in that the stripe (20) is segmented into overlapping sections located side by side, and that the video signal processing computer (10) comprises cellular networks in a number corresponding to that of the sections of the stripe (20), wherein each cellular network processes a respective section.
9. The video signal processing computer according to claim 2, characterized in that the programmable cells (30) comprise further local analog memory elements (36) used in the image processing operation as well as a local analog output unit (34) for performing arithmetic operations, which further local analog memory elements (36) and the local analog output unit (34) are connected to the internal bus (40) via the internal switches (41).
10. The video signal processing computer according to claim 2, characterized in that the programmable cells (30) comprise local logic memory elements (37) used in the image processing operation as well as a local logic unit (35) for carrying out logic operations, wherein the local logic memory elements (37) and the local logic unit (35) are connected to the internal bus (40) via the internal switches (41).
11. The video signal processing computer according to claim 1, characterized by comprising a video input module (72) for pre-processing standard video signals to generate monochromatic luminance signals as input video signals for the cellular network, and a video output module (73) for generating standard processed video signals by post-processing the output video signals generated as monochromatic luminance signals, wherein the video input module (72) and the video output module (73) are controlled by the control unit (71).
12. The video signal processing computer according to claim 11, characterized by comprising an analog memory controlled by the control unit (71), the analog memory storing analog values associated with the pixels of the frame preceding the frame actually processed, and inputting these analog values to the cellular network for image processing.
13. The video signal processing computer according to claim 11, characterized by comprising an analog memory controlled by the control unit (71), the analog memory feeding back at least one part of the one or more output video signals as input video signals to the cellular network.
14. A cellular chip for a video signal processing computer converting one or more input video signals into one or more output video signals, the cellular chip comprising a cellular network formed by a plurality of programmable cells, each of the programmable cells comprising a local processor core, local analog memory elements and an internal bus connecting the local analog memory elements with the local processor core, wherein the local processor core and the local analog memory elements are connected to the internal bus via controllable internal switches, c h a r a c t e r i z e d in that the programmable cells (30) forming the cellular network are arranged in a stripe (20) consisting of several lines, which stripe (20) comprises a useful area (21) and an overlapping area (22, 23), the programmable cells (30) in the useful area (21) comprise at least two input memory banks and at least two output memory banks, the programmable cells (30) in the overlapping area (22, 23) comprise at least two input memory banks and at least one output memory bank, wherein each input memory bank comprises input memory elements in a number corresponding to that of the input video signals, the input memory elements storing sampled values of the input video signals, and each output memory bank comprises output memory elements in a number corresponding to that of the output video signals, the output memory elements storing calculated sample values of the output video signals, and wherein the input memory elements are connected via controllable input switches (53) to conductor lines (51) of the input video signals, and the output memory elements are connected via controllable output switches (54) to conductor lines (52) of the output video signals.
15. The cellular chip according to claim 14, characterized in that the local processor cores (32), the internal switches (41), the input switches (53) and the output switches (54) are controlled by a digital control unit (71).
16. The cellular chip according to claim 15, characterized in that the programmable cells (30) comprise further local analog memory elements (36) and a local analog output unit (34) for carrying out arithmetic operations with analog values stored in the local analog memory elements (36), the further local analog memory elements (36) and the local analog output unit (34) being connected to the internal bus (40) via internal switches (41) controlled by the control unit (71).
17. The cellular chip according to claim 15, characterized in that the programmable cells (30) comprise local logic memory elements (37) and a local logic unit (35) for carrying out logic operations with logical values stored in the local logic memory elements (37), the local logic memory elements (37) and the local logic unit (35) being connected to the internal bus (40) via internal switches (41) controlled by the control unit (71).
18. The cellular chip according to claim 14, characterized in that the stripe (20) comprises a useful area (21) extending over several rows of the programmable cells and a lower overlapping area (23) below the useful area (21), the lower overlapping area (23) extending over one or more rows of the programmable cells.
19. The cellular chip according to claim 18, characterized in that the stripe (20) comprises an upper overlapping area (22) extending over one or more rows of the programmable cells above the useful area (21).
20. The cellular chip according to claim 18, characterized by having a storing area (24) comprising one or more memory lines above the useful area (21).
21. The cellular chip according to claim 18, characterized in that the stripe (20) comprises one or more accumulator lines (25) below the lower overlapping area (23).
22. A method for converting one or more input video signals into one or more output video signals, wherein a cellular network comprising a plurality of programmable cells is used for the conversion, and each of the programmable cells comprises a local processor core and local memory elements, c h a r a c t e r i z e d in that the input video signals are processed in stripes one below and overlapping one another and extending over a number of video lines, the stripe comprising pixels in a useful area not overlapping the neighboring stripe and pixels in an overlapping area overlapping with the neighboring stripe, wherein the cellular network comprises one programmable cell for each of the pixels in the stripe, and each of the programmable cells is assigned to the same respective pixel for each stripe, the programmable cells comprising local input and output memory elements alternately assigned to the successive stripes, wherein input analog values associated with each pixel in a given stripe and obtained by a scheduled sampling of the input video signals are stored in the local input memory elements assigned to the given stripe of the programmable cell assigned to the pixel in the given stripe, after storing the input analog values of an entire stripe, an image processing operation is carried out by the cellular network on the input analog values stored in the local input memory elements assigned to the given stripe, the image processing operation resulting output analog values associated with the pixels, and in each respective programmable cell, the output analog values are loaded into the local output memory elements which are assigned to the stripe, and then by a scheduled read-out of the output analog values from the programmable cells assigned to the pixels in the useful area of the given stripe, the given stripe associated part of the output video signals are generated.
23. The method according to claim 22, characterized in that the cellular network comprises in each of the programmable cells at least two input memory banks assigned alternately to the stripes one below the other and consisting of local input memory elements, and in the programmable cells corresponding to the pixels in the useful area at least two output memory banks consisting of local output memory elements and alternately assigned to stripes one below the other, wherein the input analog values of a given stripe are stored in the input memory banks assigned to the given stripe of the programmable cells assigned to the respective pixels in the given stripe, and the output analog values are stored in the output memory banks assigned to the given stripe of the programmable cells assigned to the respective pixels.
24. The method according to claim 23, characterized in that the local input memory elements in the input memory banks are connected via controllable input switches to the conductor lines carrying the input video signals, and the scheduled sampling is carried out by controlling the input switches.
25. The method according to claim 23, characterized in that the local input memory elements in the input memory banks and the local output memory elements in the output memory banks are connected via controllable internal switches with the local processor cores, and the image processing operation, the supply of the input analog values and the loading into the output memory banks are carried out by controlling the internal switches.
26. The method according to claim 23, characterized in that the local output memory elements in the output memory banks are connected via controllable output switches to the conductor lines carrying the output video signals and the scheduled read-out is carried out by controlling the output switches.
27. The method according to claim 23, characterized in that in the programmable cells corresponding to the pixels in the overlapping area, there is at least one output memory bank comprising local output memory elements.
28. A method for converting one or more input video signals into one or more output video signals, wherein the video signals represent frames consisting of video lines, c h a r a c t e r i z e d in that the one or more input video signals are processed in stripes one below and overlapping one another and extending over a number of video lines, the stripe comprising pixels in a useful area not overlapping the neighboring stripe and pixels in an overlapping area overlapping the neighboring stripe, wherein by a scheduled sampling of the one or more input video signals, input analog values associated with the pixels of a first stripe are stored in a first memory unit assigned to the stripe, and after storing the input analog values of an entire stripe, an image processing operation is carried out, the image processing operation resulting output analog values associated with the pixels, the output analog values are loaded into a second memory unit assigned to the stripe, and then the output analog values associated with the pixels in the useful area of the stripe are read out in a scheduled way from the second memory unit, thereby generating a part of one or more output video signals.
29. The method according to claim 28, characterized in that a cellular network is used for the image processing operation, the cellular network comprising programmable cells in a number corresponding to that of the pixels in the stripe, wherein each of the programmable cells is assigned to the same respective pixel for each stripe.
30. The method according to claim 29, characterized in that the first memory unit comprises at least two input memory banks in the programmable cells, the input memory banks consisting of local input memory elements and being alternately assigned to the stripes one below the other, the second memory unit comprises at least two output memory banks in the programmable cells corresponding to the pixels in the useful area, the output memory banks consisting of local output memory elements and alternately assigned to the stripes one below the other, wherein the input analog values associated with a pixel in a given stripe are stored in the input memory bank, which is assigned to the given stripe, of the programmable cell assigned to the pixel in the given stripe, and the output analog values are stored in the output memory bank, which is assigned to the given stripe, of the programmable cell assigned to the pixel in the given stripe.
PCT/HU2001/000113 2000-11-15 2001-11-15 Video signal processing computer, cellular chip and method WO2002041246A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002218423A AU2002218423A1 (en) 2000-11-15 2001-11-15 Video signal processing computer, cellular chip and method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
HUP0004500 2000-11-15
HU0004500A HUP0004500A2 (en) 2000-11-15 2000-11-15 Computer for processing video signal, cellular chip and method for converting video signals

Publications (2)

Publication Number Publication Date
WO2002041246A2 true WO2002041246A2 (en) 2002-05-23
WO2002041246A3 WO2002041246A3 (en) 2002-07-25

Family

ID=89978763

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/HU2001/000113 WO2002041246A2 (en) 2000-11-15 2001-11-15 Video signal processing computer, cellular chip and method

Country Status (3)

Country Link
AU (1) AU2002218423A1 (en)
HU (1) HUP0004500A2 (en)
WO (1) WO2002041246A2 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5355528A (en) * 1992-10-13 1994-10-11 The Regents Of The University Of California Reprogrammable CNN and supercomputer

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CARMONA R ET AL: "A 0.5 μm CMOS CNN analog random access memory chip for massive image processing" , CELLULAR NEURAL NETWORKS AND THEIR APPLICATIONS PROCEEDINGS, 1998 FIFTH IEEE INTERNATIONAL WORKSHOP ON LONDON, UK 14-17 APRIL 1998, NEW YORK, NY, USA,IEEE, US, PAGE(S) 271-276 XP010287725 ISBN: 0-7803-4867-2 cited in the application the whole document *
EL-SHAFEI A A H ET AL: "A time-multiplexing simulator for cellular neural network" , CELLULAR NEURAL NETWORKS AND THEIR APPLICATIONS PROCEEDINGS, 1998 FIFTH IEEE INTERNATIONAL WORKSHOP ON LONDON, UK 14-17 APRIL 1998, NEW YORK, NY, USA,IEEE, US, PAGE(S) 224-229 XP010287735 ISBN: 0-7803-4867-2 *
PINEDA DE GYVEZ J ET AL: "Large-image CNN hardware processing using a time multiplexing scheme" , CELLULAR NEURAL NETWORKS AND THEIR APPLICATIONS, 1996. CNNA-96. PROCEEDINGS., 1996 FOURTH IEEE INTERNATIONAL WORKSHOP ON SEVILLE, SPAIN 24-26 JUNE 1996, NEW YORK, NY, USA,IEEE, US, PAGE(S) 405-410 XP010210282 ISBN: 0-7803-3261-X *
RADVANYI A G ET AL: "A CNN solution for depth estimation from binocular stereo imagery" , CELLULAR NEURAL NETWORKS AND THEIR APPLICATIONS PROCEEDINGS, 1998 FIFTH IEEE INTERNATIONAL WORKSHOP ON LONDON, UK 14-17 APRIL 1998, NEW YORK, NY, USA,IEEE, US, PAGE(S) 218-223 XP010287690 ISBN: 0-7803-4867-2 paragraph [0004] *
SLOT K ET AL: "Cellular neural network based VLSI architecture for image processing" , CELLULAR NEURAL NETWORKS AND THEIR APPLICATIONS, 1996. CNNA-96. PROCEEDINGS., 1996 FOURTH IEEE INTERNATIONAL WORKSHOP ON SEVILLE, SPAIN 24-26 JUNE 1996, NEW YORK, NY, USA,IEEE, US, PAGE(S) 249-254 XP010210256 ISBN: 0-7803-3261-X *

Also Published As

Publication number Publication date
HU0004500D0 (en) 2001-02-28
HUP0004500A2 (en) 2002-06-29
WO2002041246A3 (en) 2002-07-25
AU2002218423A1 (en) 2002-05-27

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PH PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP