CN103902574A - Real-time data loading method and device based on data flow technology - Google Patents


Publication number
CN103902574A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201210578948.6A
Other languages
Chinese (zh)
Inventor
郭向红
尤新霞
庞哲翀
郭翔宇
孙颖飞
王波
张大亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Group Inner Mongolia Co Ltd
Original Assignee
China Mobile Group Inner Mongolia Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Group Inner Mongolia Co Ltd
Priority to CN201210578948.6A
Publication of CN103902574A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Abstract

The invention discloses a real-time data loading method based on data flow technology. The method comprises: setting a data stream operator set; selecting several operators from the operator set according to service requirements to generate a real-time query processing expression for the data stream; setting up a real-time loading network for the data stream according to the real-time query processing expression; driving and controlling the data stream entering the real-time loading network and dynamically allocating buffer space for the data stream; and outputting the result data stream and loading it into a real-time data warehouse. The invention further discloses a real-time data loading device based on data flow technology. The method and device achieve real-time transformation and continuous loading of data, and allow the real-time loading network to be built flexibly.

Description

Real-time data loading method and device based on data flow technology
Technical field
The present invention relates to the field of data management, and in particular to a real-time data loading method and device based on data flow technology for use in a real-time data warehouse.
Background
The real-time data warehouse is a new branch of data warehouse technology that emerged to cope with fast-changing markets and the needs of real-time enterprise management decisions. It extends the capabilities of the traditional data warehouse: after passing through the reporting, analysis, and prediction stages, data warehouse development has entered the real-time decision stage. To satisfy business applications that require real-time data and active decision-making, a real-time data warehouse must capture changes in the data sources as they occur and make decisions according to preset rules, which places new requirements on the timely loading of source data. Real-time data loading must shorten the data extraction cycle from the usual once a month, once a week, or once a day to the point where each new piece of data is loaded and integrated into the data warehouse as soon as it appears in the data source. This greatly increases the loading frequency and challenges the loading capability of traditional data warehouses.
Data warehouses currently support several loading approaches, for example: loading based on database scripts; loading based on extraction, transformation, and loading (ETL) technology; loading based on enterprise application integration (EAI) technology; and loading based on change data capture (CDC) technology.
A database script is a set of statements used to create database objects; when a database is processed, the script is usually saved as a file with a suffix such as .sql. One existing use of database scripts is patent application No. 200710000916.7, entitled "System and method for implementing database script generalization", whose method improves development efficiency and saves maintenance time. The shortcomings of database scripts are that loading is not continuous, latency is relatively high, and data consistency is hard to guarantee.
ETL technology extracts data from distributed, heterogeneous data sources, such as relational data and flat data files, into a temporary intermediate layer, then cleans, transforms, and integrates it, and finally loads it into a data warehouse or data mart, where it becomes the basis for online analytical processing and data mining. The benefit of ETL is that it is an ideal solution for large-scale data loading: it supports multiple data sources, offers powerful data transformation, and handles heterogeneous data well. As the main data extraction technology of traditional data warehouses, ETL underlies many patents, for example: patent application No. 200610041433.7, entitled "A method for providing billing data using ETL technology"; patent No. 200910203276.9, entitled "A method and device for implementing ETL scheduling"; and patent No. 200910137527.8, entitled "A method and system for implementing ETL scheduling". The main goal of these schemes is either to adapt ETL to a specific field, such as billing data, or to improve the execution efficiency of the ETL scheduling flow. However, this class of technology has the following defects: loading is not continuous, latency is relatively high, the data source is assumed not to change while an ETL task runs, and data that is being loaded cannot be used for query and analysis.
EAI technology integrates heterogeneous applications built on different platforms and with different schemes. By building an underlying structure, EAI connects the heterogeneous systems, applications, and data sources across the whole enterprise, enabling seamless sharing and data exchange among enterprise resource planning (ERP) systems, customer relationship management (CRM) systems, supply chain management (SCM) systems, databases, data warehouses, and other important internal systems. As a supplement to and extension of ETL, EAI is also applied in many fields. Existing schemes using EAI include patent application No. 200810110883.6, entitled "EAI submodule of a digital logistics management system", which mainly uses an EAI submodule to implement a unified logistics information data warehouse and to speed up the communication and sharing of logistics information. However, EAI is usually limited by data scale: although it can distribute data continuously between source and target systems, it supports only basic data transformation, lacks flexibility in building the data loading network, and requires loading logic to be developed specifically for each analysis platform.
CDC is a technique for capturing changed data. It achieves efficient real-time data integration through modules such as a change capture agent, a change data service, and a change distribution mechanism. As a new loading approach that has emerged in recent years, CDC is receiving growing attention and use. Existing schemes using CDC include patent application No. 200910018202.8, entitled "Change data extraction method based on Oracle CDC technology"; the main goal of such schemes is to ensure that information flows directly between source and target along the most efficient and fastest path. However, CDC has relatively complex module composition and processing logic, so developing it requires substantial manpower and funding, and it too guarantees only the most basic data transformation.
The characteristics of the above loading approaches are shown in Table 1 below:
Attribute            Script          ETL             EAI          CDC
Data volume          Medium          Very high       Low          High
Frequency            Intermittent    Intermittent    Continuous   Continuous
Latency              Medium to high  Medium to high  Low          Low
Data consistency     None            None            Guaranteed   Guaranteed
Transformation       Moderate        Advanced        Basic        Basic
Processing overhead  High            High            Medium       Low
Table 1
Table 1 shows that, despite their advantages and uses, these loading approaches either load data only periodically or in batches, or do not allow the loading network to be built flexibly.
Summary of the invention
In view of this, the main purpose of the present invention is to provide a real-time data loading method and device based on data flow technology that achieve real-time transformation and continuous loading of data and allow the real-time loading network to be built flexibly.
To achieve the above purpose, the technical scheme of the present invention is implemented as follows:
The invention provides a real-time data loading method based on data flow technology, in which a data stream operator set is set. The method further comprises:
selecting several operators from the operator set according to service requirements to generate a real-time query processing expression for the data stream; setting up a real-time loading network for the data stream according to the real-time query processing expression; driving and controlling the data stream entering the real-time loading network and dynamically allocating buffer space for the data stream; and outputting the result data stream and loading it into the real-time data warehouse.
The operator set comprises basic operators and composite operators.
A basic operator is an operator that is essential for data cleaning and transformation; a composite operator is a combination of basic operators.
Setting up the real-time loading network of the data stream according to the real-time query processing expression comprises:
building a network input node for each data source in the real-time query processing expression;
converting each composite operator and existing operator in the real-time query processing expression into a semantically equivalent expression of basic operators;
building a basic operator node in the network for each basic operator in the converted expression;
building a unique network output node at the end of the network;
adding edges between the nodes in logical processing order.
The method further comprises:
naming each input node after the data content of its data source.
Further, the method comprises:
naming each basic operator node jointly by the name of the basic operator, its processing logic, and the names of the data streams that the node processes.
Driving and controlling the data stream entering the real-time loading network and dynamically allocating buffer space for the data stream comprises:
setting the storage structure of each tuple in the data stream and initializing the buffer space of the data stream; driving and controlling the data in the real-time loading network; and dynamically allocating buffer space for the data stream.
Driving and controlling the data in the real-time loading network comprises:
setting the node at the start of each edge as the producer of the stream buffer space of that edge, which writes its operation results to the corresponding position in the stream buffer space; and setting the node at the end of each edge as the consumer of the stream buffer space, which takes tuples out of the stream buffer space, operates on them to form result tuples, and passes each result tuple on as the input of subsequent operations.
Dynamically allocating buffer space for the data stream comprises:
requesting buffer space for each new tuple when it is produced in the real-time loading network, and releasing the buffer space occupied by a tuple immediately when that tuple is deleted.
The invention also provides a real-time data loading device based on data flow technology. The device comprises a setting module and an operation module, wherein:
the setting module is configured to set the data stream operator set; to select several operators from the operator set according to service requirements and generate the real-time query processing expression of the data stream; and to set up the real-time loading network of the data stream according to the real-time query processing expression;
the operation module is configured to drive and control the data stream entering the real-time loading network set up by the setting module, to dynamically allocate buffer space for the data stream, and to output the result data stream and load it into the real-time data warehouse.
Further, the setting module comprises an operator set setting module, a real-time query processing expression setting module, and a real-time loading network setting module, wherein:
the operator set setting module is configured to set the data stream operator set;
the real-time query processing expression setting module is configured to select, according to service requirements, several operators from the operator set set by the operator set setting module and generate the real-time query processing expression of the data stream;
the real-time loading network setting module is configured to set up the real-time loading network of the data stream according to the real-time query processing expression set by the real-time query processing expression setting module.
In the real-time data loading method and device based on data flow technology provided by the invention, a data stream operator set is set; several operators are selected from the operator set according to service requirements to generate the real-time query processing expression of the data stream; the real-time loading network of the data stream is set up according to the real-time query processing expression; the data stream entering the real-time loading network is driven and controlled, and buffer space is dynamically allocated for it; and the result data stream is output and loaded into the real-time data warehouse. The data stream operator set implements the transformation and filtering of data streams. Because data stream operations are closed, the real-time loading network built on data flow technology can process the data stream, and a suitable network can be set up flexibly for different business requirements, so the setup method is generally applicable.
In addition, because the invention drives and controls the data stream entering the real-time loading network and dynamically allocates buffer space for it, it significantly reduces the loading latency caused by data backlog, completes data transformation and filtering efficiently, and ensures continuous loading. In other words, the invention handles large data volumes, loads continuously with low latency, and more effectively improves the freshness of the data in the real-time data warehouse.
Brief description of the drawings
Fig. 1 is a flowchart of the real-time data loading method based on data flow technology according to the invention;
Fig. 2 is a schematic diagram of the storage structure of a tuple according to the invention;
Fig. 3 is a schematic diagram of the structure of the stream buffer space according to the invention;
Fig. 4 is a schematic diagram of the structure of the real-time loading network in an embodiment of the invention;
Fig. 5 is a schematic diagram of the structure of the real-time data loading device based on data flow technology according to the invention.
Detailed description
The invention uses the real-time processing capability of data streams to improve the freshness of the data loaded into the data warehouse and thereby improve the enterprise's ability to respond to highly time-sensitive data. Its basic idea is as follows:
A data stream operator set is set; several operators are selected from the operator set according to service requirements to generate the real-time query processing expression of the data stream; the real-time loading network of the data stream is set up according to the real-time query processing expression; the data stream entering the real-time loading network is driven and controlled, and buffer space is dynamically allocated for it; and the result data stream is output and loaded into the real-time data warehouse.
The invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of the real-time data loading method based on data flow technology according to the invention. As shown in Fig. 1, the method proceeds as follows:
Step 101: set the data stream operator set.
Specifically, a complete data stream operator set is built so that every operation that may need to be performed on a data stream can be expressed. A data stream is a sequence of tuples that arrive following a given schema, for example a call list. In the invention, a data stream S is expressed as a pair S = {Schema, Tuples}, where Schema = <A1, A2, ..., An, T> and Tuples = <V1, V2, ..., Vn, VT>. Schema is the schema of the data stream S: a sequence consisting of the attributes A1, ..., An and T, where T is a special timestamp attribute. Where no ambiguity arises, the name S is also used to denote the schema of the stream directly. Tuples is the sequence of tuples that arrive in the stream at each time VT, and Vi (1 ≤ i ≤ n) is the value of a tuple on attribute Ai.
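The stream model above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the class names `Schema` and `StreamTuple`, the method `get`, and the sample attribute values are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Schema:
    """Schema of a stream: ordinary attributes A1..An plus the timestamp attribute T."""
    attributes: Tuple[str, ...]      # A1..An
    timestamp: str = "T"             # name of the special timestamp attribute


@dataclass(frozen=True)
class StreamTuple:
    """One arrival in the stream: values V1..Vn plus the arrival time VT."""
    values: Tuple[Any, ...]
    vt: float

    def get(self, schema: Schema, name: str) -> Any:
        """Return the value of this tuple on attribute `name`."""
        if name == schema.timestamp:
            return self.vt
        return self.values[schema.attributes.index(name)]


# Hypothetical call-list stream with three ordinary attributes.
calllist = Schema(("IMEI", "Duration", "Eventtype"))
t = StreamTuple(("860000000000001", 42, "voice"), vt=8.5)
event_type = t.get(calllist, "Eventtype")
arrival = t.get(calllist, "T")
```

Keeping the schema separate from the tuples mirrors the pair S = {Schema, Tuples}: every tuple of a stream is interpreted against the same attribute sequence.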
In the invention, the operators in the data stream operator set fall into two kinds: basic operators and composite operators. Basic operators are the operators essential for data cleaning and transformation, and they include existing operators. Composite operators are encapsulations built on basic operators that make complex semantics easier to handle; every composite operator can be converted into a combination of basic operators. Table 2 gives the definitions, functions, and corresponding practical examples of several basic operators:
[Table 2 appears as an image in the original publication and is not reproduced here.]
Table 2
Of course, the basic operators of the invention include operators of the prior art.
On the basis of these basic operators, composite operators can be formed by combining basic operators in order according to common business requirements. This is possible because the basic operators are closed: the result of an operation on data streams is itself a data stream, so a new operation can be applied to that result. Such new operations do not increase expressive power, because they can be expressed as compositions of basic operators; however, defining frequently used compositions as operators in their own right simplifies the expressions written by applications. Table 3 lists several common composite operators:
[Table 3 appears as an image in the original publication and is not reproduced here.]
Table 3
Step 102: according to service requirements, select several operators from the operator set and generate the real-time query processing expression of the data stream.
Here, suitable operators are chosen from the data stream operator set to meet the real-time business requirement, and the resulting real-time query processing expression expresses the cleaning and transformation semantics to be applied to the data.
Specifically, first, the processing logic of each node is derived from the definition of its basic operator. For a composite operator defined by the user, the processing logic of that composite operator must be loaded into the corresponding node. An operator of the prior art can first be converted into basic operators or user-defined composite operators, and its operation logic is then determined from that composition.
Second, for the real-time business requirement, appropriate operators are chosen from the basic operator set and the encapsulated composite operator set to express the data cleaning and transformation logic, producing the real-time query processing expression based on data stream operators.
Step 103: set up the real-time loading network of the data stream according to the real-time query processing expression.
The purpose of setting up the real-time loading network is to clean and transform the data stream: a network that processes the data stream is set up according to the real-time query processing expression. Because data stream operations are closed, the result of every operation is still a data stream and can serve as the input of another operation. The step comprises the following sub-steps:
Step 1031: for each data source in the real-time query processing expression, build a network input node and name it after the data content of the data source;
Step 1032: convert each composite operator and existing operator in the real-time query processing expression into a semantically equivalent expression of basic operators;
At this point, all operators in the real-time query processing expression are basic operators;
Step 1033: for each basic operator in the expression converted in step 1032, build a basic operator node in the network, named jointly by the name of the basic operator, its processing logic, and the names of the streams it processes;
Step 1034: build a unique network output node at the end of the network;
Step 1035: add edges between the nodes in logical processing order.
Specifically, for all nodes in the real-time loading network, if operator A takes the output of operator B as its input, add a directed edge from the node of operator B to the node of operator A. In the finished network, if operator A is an n-ary operation, the node of A has in-degree n and out-degree 1.
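Steps 1031 to 1035 can be sketched as a small graph builder. This is an illustrative sketch only: the nested-tuple encoding of the expression, the `Network` class, and the node-naming format are assumptions made here, and the expression is assumed to contain basic operators only (that is, step 1032 has already been applied).

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# An expression is a source name, or (operator_name, processing_logic, *operand_exprs).
Expr = Union[str, tuple]


@dataclass
class Network:
    nodes: List[str] = field(default_factory=list)
    edges: List[Tuple[str, str]] = field(default_factory=list)  # (producer, consumer)

    def add_node(self, name: str) -> str:
        if name not in self.nodes:
            self.nodes.append(name)
        return name


def build(expr: Expr, net: Network) -> str:
    """Add nodes and edges for `expr`; return the node that produces its result."""
    if isinstance(expr, str):                       # step 1031: input node per source
        return net.add_node(expr)
    op, logic, *operands = expr
    inputs = [build(e, net) for e in operands]
    # Step 1033: name = operator name + processing logic + processed stream names.
    name = f"{op}[{logic}]({', '.join(inputs)})"
    net.add_node(name)
    for src in inputs:                              # step 1035: edge operand -> operator
        net.edges.append((src, name))
    return name


def build_network(expr: Expr, output_name: str) -> Network:
    net = Network()
    last = build(expr, net)
    net.add_node(output_name)                       # step 1034: unique output node
    net.edges.append((last, output_name))
    return net


def in_degree(net: Network, node: str) -> int:
    return sum(1 for _, dst in net.edges if dst == node)


def out_degree(net: Network, node: str) -> int:
    return sum(1 for src, _ in net.edges if src == node)


expr = ("project", "IMEI,Balance",
        ("select", "Calllist.IMEI=Account.IMEI",
         ("product", "8:00-9:00", "Calllist", "Account")))
net = build_network(expr, "Communication")
product_node = net.nodes[2]   # nodes are added sources first, then operators bottom-up
```

In this example the binary product node ends up with in-degree 2 and out-degree 1, matching the rule stated for an n-ary operator.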
Step 104: drive and control the data stream entering the real-time loading network, and dynamically allocate buffer space for the data stream.
Here, as data entering the real-time loading network flows through each node, it is processed on that node by the operation logic of the node's operator; the result flows to the next node in the network and finally reaches the outlet. The step comprises the following sub-steps:
Step 1041: set the storage structure of each tuple in the data stream.
Specifically, as the basic unit of a data stream, each tuple describes a sequence of values that follows the stream schema at a particular time. The timestamp T is used very frequently, so it is placed at the head of the list. The attribute pointer points to the attribute list, through which the tuple's values on all attributes can be found. The storage structure of a tuple is shown in Fig. 2.
Step 1042: initialize the buffer space of the data stream. The buffer space of a data stream lies on the directed edge between two operation nodes. As shown in Fig. 3, the buffer space holding tuple M through tuple N sits between two operation nodes that act as its producer and consumer, respectively.
Step 1043: drive and control the data in the real-time loading network.
Specifically, the node at the start of each edge is set as the producer of the stream buffer space of that edge and is responsible for writing its operation results to the appropriate position in the stream buffer space. The node at the end of the edge is the consumer of the stream buffer space and is responsible for taking tuples out of the stream buffer space, operating on them to form result tuples, and passing these results on as the input of subsequent operations.
Step 1044: dynamically allocate buffer space for the data stream.
To make the fullest use of all free space in the real-time loading network, the stream buffer space of each edge is allocated dynamically according to actual need: whenever a new tuple is produced in the network, buffer space is requested for it, and whenever a tuple is deleted, the buffer space it occupied is released at once.
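The producer/consumer relationship and the allocate-on-create, release-on-delete policy can be sketched as follows. This is a hedged illustration: `StreamBuffer`, `run_node`, and the dictionary tuple layout are assumptions introduced here, and a Python `deque` stands in for the stream buffer space (it grows when a tuple is appended and shrinks when one is removed).

```python
from collections import deque


class StreamBuffer:
    """Buffer attached to one edge: the start node produces, the end node consumes."""

    def __init__(self):
        self._tuples = deque()           # holds only the tuples that are currently live

    def put(self, tup):
        """Called by the producer node; space is taken only when a tuple is created."""
        self._tuples.append(tup)

    def take(self):
        """Called by the consumer node; the slot is released as soon as the tuple leaves."""
        return self._tuples.popleft() if self._tuples else None

    def __len__(self):
        return len(self._tuples)


def run_node(operator, upstream, downstream):
    """Consume every pending tuple, apply the operator, and produce the results."""
    while (tup := upstream.take()) is not None:
        result = operator(tup)
        if result is not None:           # None means the operator filtered the tuple out
            downstream.put(result)


# Hypothetical selection node that keeps calls of at least 10 seconds.
up, down = StreamBuffer(), StreamBuffer()
for duration in (30, 5, 90):
    up.put({"IMEI": "a", "Duration": duration})
run_node(lambda t: t if t["Duration"] >= 10 else None, up, down)
```

After `run_node` returns, the upstream buffer is empty and the downstream buffer holds only the two tuples that passed the filter, so no space is held for tuples that no longer exist.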
Step 105: output the result data stream and load it into the real-time data warehouse.
Specifically, the data in the buffer space of the output node of the real-time loading network is the data stream after transformation and filtering by the network, and it can be loaded into the data warehouse continuously. Because real-time data warehouses differ in version and in the loading methods they support, loading is handled according to the actual data warehouse.
A concrete communication service example illustrates the real-time data loading method. Suppose there are three input data streams: the call list Calllist, the account information Account, and the location information Location, with the following schemas:
Calllist(IMEI, Timestamp, Duration, Eventtype, Callid, Item);
Account(IMEI, Area, Balance, Opentime);
Location(IMEI, Dest_CI, Source_CI, Callid, Area).
In practice, the call information of the same mobile user within a specified period in each stream, for example from 8:00 to 9:00, needs to be aggregated into Communication in order to learn each user's call type, balance, and call destination. The schema of Communication is:
Communication(IMEI, Eventtype, Balance, Dest_CI).
If the International Mobile Equipment Identity (IMEI) attribute of the same user has the same value in the three input stream schemas, natural join and projection can be used to achieve this. The steps are as follows:
Step 1: set the data stream operator set.
The operators involved in this service are the basic projection operator Π_L(S) and the existing natural join operator (its symbol appears as a formula image in the original publication).
The natural join operator can be converted into a composition of the basic operator σ_{S1[S1∩S2]=S2[S1∩S2]} and the basic operator S1<t1, t2> × S2<t3, t4>.
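As an illustration of this conversion, the sketch below composes a natural join over two windowed streams from a windowed Cartesian product, a selection on the shared attributes, and a projection. It is a simplified sketch under stated assumptions: streams are finite Python lists of dictionaries, the function names and the `l.`/`r.` prefixes are introduced here, and the timestamp T is dropped from the result for brevity.

```python
def window(stream, start, end):
    """S<start, end>: tuples whose timestamp T lies in [start, end]."""
    return [t for t in stream if start <= t["T"] <= end]


def product(s1, s2):
    """Cartesian product; attributes are prefixed so both sides stay distinguishable."""
    return [{**{f"l.{k}": v for k, v in a.items()},
             **{f"r.{k}": v for k, v in b.items()}} for a in s1 for b in s2]


def select(stream, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in stream if predicate(t)]


def project(stream, mapping):
    """Projection: keep the attributes in `mapping` (output name -> input name)."""
    return [{out: t[src] for out, src in mapping.items()} for t in stream]


def natural_join(s1, w1, s2, w2):
    """Natural join expressed only through the basic operators above."""
    left, right = window(s1, *w1), window(s2, *w2)
    if not left or not right:
        return []
    # Shared attributes of the two schemas (the timestamp is handled separately).
    common = [k for k in left[0] if k in right[0] and k != "T"]
    pairs = select(product(left, right),
                   lambda t: all(t[f"l.{k}"] == t[f"r.{k}"] for k in common))
    mapping = {k: f"l.{k}" for k in left[0] if k != "T"}
    mapping.update({k: f"r.{k}" for k in right[0] if k not in left[0]})
    return project(pairs, mapping)


# Hypothetical sample data; timestamps are hours of the day.
calllist = [{"IMEI": "a", "Eventtype": "voice", "T": 8.2},
            {"IMEI": "b", "Eventtype": "sms", "T": 9.5}]
account = [{"IMEI": "a", "Balance": 12.5, "T": 8.4},
           {"IMEI": "c", "Balance": 1.0, "T": 8.1}]
joined = natural_join(calllist, (8, 9), account, (8, 9))
```

Because each helper returns a stream of the same shape it accepts, the helpers can be nested freely, which is the closure property that the conversion relies on.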
Step 2: determine the real-time query processing expression required for the service from the basic operators and the existing operators.
The real-time query processing expression for this business requirement is:
[The first part of the expression appears as a formula image in the original publication and is not reproduced here.]
Π_{IMEI, Dest_CI}(Location<8:00, 9:00>)).
Step 3: set up the real-time loading network of the data stream.
For the data sources Calllist, Account, and Location in the real-time query processing expression, data input nodes are created, shown as the Calllist, Account, and Location nodes in Fig. 4. The composite operators and existing operators in the expression are converted into basic operators, and a network processing node is created for each occurrence of a basic operator (the dark nodes in Fig. 4). Edges are added in logical processing order to connect each node with its predecessor and successor, and a network output node, the Communication node, is created at the end of the network. The result is the real-time loading network shown in Fig. 4.
Step 4: the data input nodes Calllist, Account, and Location of the real-time loading network receive the source data streams, and the data flows through the operator nodes in sequence. The non-blocking nodes are the σ_{Calllist.IMEI=Account.IMEI} node, the Π_{IMEI, Eventtype, Balance, Dest_CI} node, and the σ_{Calllist.IMEI=Location.IMEI} node. Each of these nodes processes every tuple in its upstream buffer immediately, without delayed storage, and writes the result to its downstream buffer. The blocking nodes are the Calllist × Account node and a second node whose expression appears as a formula image in the original publication and is not reproduced here. These nodes buffer the data that satisfies the time constraint; once both inputs of a blocking node have buffered data, the node processes it, writes the join result to its downstream buffer, and removes expired tuples from its upstream buffers.
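The difference between the two kinds of node can be sketched as follows. This is only an illustration of the behaviour described above: the class `BlockingJoinNode`, the function `non_blocking_select`, and the `max_age` expiry rule are assumptions introduced here, since the source states only that blocking nodes buffer tuples satisfying a time constraint and drop expired ones.

```python
def non_blocking_select(tup, predicate):
    """A non-blocking node handles each tuple at once and keeps no state."""
    return tup if predicate(tup) else None


class BlockingJoinNode:
    """A blocking node buffers both inputs, emits joins, and evicts expired tuples."""

    def __init__(self, key, max_age):
        self.key, self.max_age = key, max_age
        self.left, self.right = [], []

    def _evict(self, now):
        # Remove tuples that are older than the assumed time constraint.
        self.left = [t for t in self.left if now - t["T"] <= self.max_age]
        self.right = [t for t in self.right if now - t["T"] <= self.max_age]

    def on_tuple(self, side, tup):
        """Store `tup`, drop expired tuples, and return the joins it completes."""
        self._evict(tup["T"])
        own, other = (self.left, self.right) if side == "left" else (self.right, self.left)
        own.append(tup)
        out = []
        for o in other:
            if o[self.key] == tup[self.key]:
                l, r = (tup, o) if side == "left" else (o, tup)
                out.append({**r, **l, "T": max(l["T"], r["T"])})
        return out


node = BlockingJoinNode("IMEI", max_age=1.0)
r1 = node.on_tuple("left", {"IMEI": "a", "Eventtype": "voice", "T": 8.0})
r2 = node.on_tuple("right", {"IMEI": "a", "Balance": 12.5, "T": 8.5})
r3 = node.on_tuple("right", {"IMEI": "a", "Balance": 9.0, "T": 9.6})
kept = non_blocking_select({"IMEI": "a", "Duration": 30}, lambda t: t["Duration"] >= 10)
```

The first call emits nothing because only one side has data; the second emits one joined tuple; by the third call the earlier tuples have expired, so nothing is emitted and the left buffer is empty.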
Step 5: the content of the buffer of the network output node, the Communication node, is the data after transformation and filtering by the real-time loading network, and it can be loaded into the real-time data warehouse continuously.
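The continuous loading of the output buffer can be sketched with an in-memory SQLite database standing in for the real-time data warehouse. This is an assumption made for illustration only: the source does not name a warehouse product, and the table name `communication`, the column types, the batch size, and the sample rows are all hypothetical.

```python
import sqlite3


def load_continuously(output_buffer, conn, batch_size=2):
    """Drain the output node's buffer into the warehouse table in small batches."""
    loaded = 0
    batch = []
    while output_buffer:
        batch.append(output_buffer.pop(0))
        if len(batch) == batch_size or not output_buffer:
            conn.executemany(
                "INSERT INTO communication VALUES (?, ?, ?, ?)", batch)
            conn.commit()                # each batch becomes visible to queries at once
            loaded += len(batch)
            batch = []
    return loaded


conn = sqlite3.connect(":memory:")       # stand-in for the real-time data warehouse
conn.execute(
    "CREATE TABLE communication (imei TEXT, eventtype TEXT, balance REAL, dest_ci TEXT)")
buffer = [("a", "voice", 12.5, "CI-7"), ("b", "sms", 3.0, "CI-9"), ("a", "voice", 11.0, "CI-2")]
loaded = load_continuously(buffer, conn)
row_count = conn.execute("SELECT COUNT(*) FROM communication").fetchone()[0]
```

Small batches keep the delay between a tuple reaching the output node and becoming queryable short; a real warehouse would use whatever loading interface it supports.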
The present invention also provides a real-time data loading device based on data flow technology. As shown in Figure 5, the device comprises a setting module and an operation module, wherein:
The setting module is configured to set a data stream operator set; to select, according to business needs, several operators from the operator set to generate a real-time query processing expression for the data stream; and to set up a real-time loading network for the data stream according to the real-time query processing expression;
The operation module is configured to drive and control the data streams entering the real-time loading network set up by the setting module, and to dynamically allocate cache space for the data streams; and to output the result data stream and load it into the real-time data warehouse.
The setting module further comprises an operator set setting module, a real-time query processing expression setting module and a real-time loading network setting module, wherein:
The operator set setting module is configured to set the data stream operator set;
The real-time query processing expression setting module is configured to select, according to business needs, several operators from the operator set provided by the operator set setting module to generate the real-time query processing expression for the data stream;
The real-time loading network setting module is configured to set up the real-time loading network for the data stream according to the real-time query processing expression provided by the real-time query processing expression setting module.
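How a real-time loading network setting module might turn an expression into a network can be sketched as follows. This is a hedged illustration in Python, not the device itself: `LoadingNetwork`, `build_network` and the node labels are hypothetical, and the sketch assumes that composite operators have already been rewritten into basic operators listed in logical processing order.

```python
from dataclasses import dataclass, field


@dataclass
class LoadingNetwork:
    nodes: list = field(default_factory=list)  # node names, in creation order
    edges: list = field(default_factory=list)  # (producer, consumer) pairs


def build_network(sources, operators, output_name="Output"):
    """Build a network from data source names and (name, inputs) pairs of
    basic operators that are already in logical processing order."""
    net = LoadingNetwork()
    net.nodes.extend(sources)                   # one input node per data source
    for name, inputs in operators:
        net.nodes.append(name)                  # one node per basic operator
        net.edges.extend((src, name) for src in inputs)
    net.nodes.append(output_name)               # single output node at the end
    last = operators[-1][0] if operators else sources[-1]
    net.edges.append((last, output_name))
    return net


net = build_network(
    sources=["Calllist", "Account"],
    operators=[
        ("Calllist x Account", ["Calllist", "Account"]),
        ("select IMEI match", ["Calllist x Account"]),
        ("project IMEI, Balance", ["select IMEI match"]),
    ],
    output_name="Communication",
)
for edge in net.edges:
    print(edge)
```

Each edge printed here would correspond to one stream buffer that the operation module manages at run time.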
The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection.

Claims (10)

1. A real-time data loading method based on data flow technology, characterized in that a data stream operator set is provided, and the method further comprises:
Selecting several operators from the operator set according to business needs to generate a real-time query processing expression for the data stream; setting up a real-time loading network for the data stream according to the real-time query processing expression; driving and controlling the data streams entering the real-time loading network, and dynamically allocating cache space for the data streams; and outputting the result data stream and loading it into a real-time data warehouse.
2. The real-time data loading method based on data flow technology according to claim 1, characterized in that the operator set comprises basic operators and composite operators;
Wherein the basic operators are the operators essential for data cleansing and conversion, and the composite operators are combinations of basic operators.
3. The real-time data loading method based on data flow technology according to claim 1 or 2, characterized in that setting up the real-time loading network for the data stream according to the real-time query processing expression comprises:
Building a network input node for each data source in the real-time query processing expression;
Converting the composite operators and operators in the real-time query processing expression into an expression of basic operators that is semantically equivalent to them;
Building a basic operator node in the network for each basic operator in the converted basic operator expression;
Building a unique network output node at the end of the network;
Setting edges between the nodes according to the logical processing order.
4. The real-time data loading method based on data flow technology according to claim 3, characterized in that the method further comprises:
Naming each input node after the data content of its data source.
5. The real-time data loading method based on data flow technology according to claim 3, characterized in that the method further comprises:
Using the name of the basic operator, the processing logic of the basic operator, and the name of the data stream processed by the basic operator node together as the name of the basic operator node.
6. The real-time data loading method based on data flow technology according to claim 1 or 2, characterized in that driving and controlling the data streams entering the real-time loading network, and dynamically allocating cache space for the data streams, comprises:
Setting the storage structure of each tuple in the data stream, and initializing the cache space of the data stream; driving and controlling the data in the real-time loading network; and dynamically allocating cache space for the data stream.
7. The real-time data loading method based on data flow technology according to claim 6, characterized in that driving and controlling the data in the real-time loading network comprises:
Setting the node at the start point of each edge as the producer of the stream cache space corresponding to that edge, which writes operation results to the corresponding position of the stream cache space; and setting the node at the end point of each edge as the consumer of the stream cache space, which extracts tuples from the stream cache space, performs its operation to form result tuples, and passes the result tuples on as input to subsequent operations.
8. The real-time data loading method based on data flow technology according to claim 6, characterized in that dynamically allocating cache space for the data stream comprises:
When a new tuple is produced in the real-time loading network, requesting cache space for the new tuple; when a tuple is deleted, immediately releasing the cache space occupied by the deleted tuple.
9. A real-time data loading device based on data flow technology, characterized in that the device comprises a setting module and an operation module, wherein:
The setting module is configured to set a data stream operator set; to select several operators from the operator set according to business needs to generate a real-time query processing expression for the data stream; and to set up a real-time loading network for the data stream according to the real-time query processing expression;
The operation module is configured to drive and control the data streams entering the real-time loading network set up by the setting module, and to dynamically allocate cache space for the data streams; and to output the result data stream and load it into a real-time data warehouse.
10. The real-time data loading device based on data flow technology according to claim 9, characterized in that the setting module further comprises an operator set setting module, a real-time query processing expression setting module and a real-time loading network setting module, wherein:
The operator set setting module is configured to set the data stream operator set;
The real-time query processing expression setting module is configured to select, according to business needs, several operators from the operator set provided by the operator set setting module to generate the real-time query processing expression for the data stream;
The real-time loading network setting module is configured to set up the real-time loading network for the data stream according to the real-time query processing expression provided by the real-time query processing expression setting module.
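The producer and consumer roles of claim 7 and the per-tuple allocation of claim 8 can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the claimed method: `StreamBuffer` and `run_edge_chain` are hypothetical names, the network is simplified to a linear chain of non-blocking nodes, and a `deque` that grows and shrinks with its contents stands in for explicit per-tuple allocation and release.

```python
from collections import deque


class StreamBuffer:
    """Cache space for one edge: the start node produces, the end node consumes."""

    def __init__(self):
        self._tuples = deque()  # holds only the tuples currently in flight

    def produce(self, tup):
        self._tuples.append(tup)  # the buffer grows when a new tuple is produced

    def consume(self):
        # Removing a tuple shrinks the buffer again; returns None when empty.
        return self._tuples.popleft() if self._tuples else None

    def __len__(self):
        return len(self._tuples)


def run_edge_chain(source_tuples, operators):
    """Push tuples through a linear chain of nodes, with one buffer per edge."""
    buffers = [StreamBuffer() for _ in range(len(operators) + 1)]
    for tup in source_tuples:
        buffers[0].produce(tup)
        for i, op in enumerate(operators):
            item = buffers[i].consume()      # node i is the consumer of edge i
            if item is None:
                break
            result = op(item)
            if result is None:               # the tuple was filtered out
                break
            buffers[i + 1].produce(result)   # node i is the producer of edge i + 1
    out = []
    while len(buffers[-1]):
        out.append(buffers[-1].consume())
    return out


rows = [{"IMEI": "A1", "Balance": 5}, {"IMEI": "B2", "Balance": 0}]
ops = [
    lambda t: t if t["Balance"] > 0 else None,  # selection
    lambda t: {"IMEI": t["IMEI"]},              # projection
]
print(run_edge_chain(rows, ops))  # [{'IMEI': 'A1'}]
```

Because each node consumes a tuple as soon as it is produced, no intermediate buffer in this sketch ever holds more than one tuple.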
CN201210578948.6A 2012-12-27 2012-12-27 Real-time data loading method and device based on data flow technology Pending CN103902574A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210578948.6A CN103902574A (en) 2012-12-27 2012-12-27 Real-time data loading method and device based on data flow technology

Publications (1)

Publication Number Publication Date
CN103902574A true CN103902574A (en) 2014-07-02

Family

ID=50993902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210578948.6A Pending CN103902574A (en) 2012-12-27 2012-12-27 Real-time data loading method and device based on data flow technology

Country Status (1)

Country Link
CN (1) CN103902574A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140702
