CN102915229B - A kind of distributed computing method and system - Google Patents

Info

Publication number
CN102915229B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110219900.1A
Other languages
Chinese (zh)
Other versions
CN102915229A (en
Inventor
沈雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shiji Guangsu Information Technology Co Ltd
Original Assignee
Shenzhen Shiji Guangsu Information Technology Co Ltd
Application filed by Shenzhen Shiji Guangsu Information Technology Co Ltd
Priority to CN201110219900.1A
Publication of CN102915229A
Application granted
Publication of CN102915229B

Abstract

The invention provides a distributed computing method comprising the following steps: building a unified programming interface across multiple distributed computing platforms, based on the functions common to the different platforms; programming against the unified programming interface according to the user's specific application requirements to build a distributed application program; and calling a distribution script to submit the distributed application program to a distributed computing platform and start the pending task, so that the distributed application program is executed on that platform. With this method, a distributed application program can run on different distributed computing platforms with little or no modification, which improves portability between different distributed computing platforms. A distributed computing system is also provided.

Description

A kind of distributed computing method and system
[technical field]
The present invention relates to distributed computing, and in particular to a distributed computing method and system.
[background technology]
A distributed computing platform is an underlying service platform that supports the distributed execution of application programs on top of it. At present, each distributed computing platform has its own set of programming interfaces and its own method of submitting jobs. An application program written for one distributed computing platform cannot run on another platform without substantial modification, and in some cases cannot be ported at all. Developers must learn a different application development method and different job submission steps for each distributed computing platform, so migrating between platforms is very difficult. In short, with traditional distributed computing methods, portability between different distributed computing platforms is poor.
[summary of the invention]
In view of this, it is necessary to provide a distributed computing method that improves portability between different distributed computing platforms.
A distributed computing method, comprising the following steps:
building a unified programming interface across multiple distributed computing platforms, based on the functions common to the different distributed computing platforms;
programming against the unified programming interface according to the user's application requirements to build a distributed application program;
calling a distribution script to submit the distributed application program to a distributed computing platform, starting the pending task, and executing the distributed application program on the distributed computing platform.
In a preferred embodiment, the unified programming interface comprises a context interface, an interface for parsing input data and generating output data, a file access interface, and an input/output byte stream interface.
In a preferred embodiment, the step of calling the distribution script to submit the distributed application program to the distributed computing platform comprises:
obtaining the input file list and generating a pending file list;
splitting the pending file list according to the number of mappers in the distributed computing platform;
submitting the split pending file lists to the distributed computing platform.
In a preferred embodiment, the step of calling the distribution script to submit the distributed application program to the distributed computing platform further comprises:
collecting the input configuration information from command-line parameters;
generating, according to the configuration information, a wrapper script for the mapper and a wrapper script for the reducer in the distributed computing platform;
submitting the wrapper scripts to the distributed computing platform.
In a preferred embodiment, the split pending file lists record the file paths of the pending files;
the step of executing the distributed application program on the distributed computing platform further comprises:
obtaining the split pending file lists through the distributed computing platform, processing the pending files according to their file paths, and outputting the result.
In addition, it is also necessary to provide a distributed computing system that improves portability between different distributed computing platforms.
A distributed computing system, comprising:
a platform encapsulation module, for building a unified programming interface across multiple distributed computing platforms, based on the functions common to the different distributed computing platforms;
an application encapsulation module, for programming against the unified programming interface according to the user's specific application requirements to build a distributed application program;
an execution encapsulation module, for calling a distribution script to submit the distributed application program to a distributed computing platform, starting the pending task, and executing the distributed application program on the distributed computing platform.
In a preferred embodiment, the unified programming interface comprises a context interface, an interface for parsing input data and generating output data, a file access interface, and an input/output byte stream interface.
In a preferred embodiment, the execution encapsulation module comprises:
a file list generation module, for obtaining the input file list and generating a pending file list;
a splitting module, for splitting the pending file list according to the number of mappers in the distributed computing platform and submitting the split pending file lists to the distributed computing platform.
In a preferred embodiment, the execution encapsulation module further comprises:
a configuration information collection module, for collecting the input configuration information from command-line parameters;
a wrapper script generation module, for generating, according to the configuration information, a wrapper script for the mapper and a wrapper script for the reducer in the distributed computing platform, and submitting the wrapper scripts to the distributed computing platform.
In a preferred embodiment, the split pending file lists record the file paths of the pending files;
the execution encapsulation module further comprises:
a processing module, for obtaining the split pending file lists through the distributed computing platform, processing the pending files according to their file paths, and outputting the result.
With the above distributed computing method and system, a unified programming interface is built for the distributed computing platforms: the basic functions common to the platforms (usually also the most frequently used and most important ones) are added to the unified programming interface, and a concrete implementation of the interface is encapsulated for each distributed computing platform, so that the unified programming interface isolates the implementation details of the different platforms. When programming against the unified interface, developers need not be concerned with the many details and dialects of each distributed computing platform. When the generated distributed application program is executed, the distribution script isolates the differences between platforms in how tasks are submitted. The generated distributed application program can therefore run on different distributed computing platforms with little or no modification, which improves portability between different distributed computing platforms.
[brief description of the drawings]
Fig. 1 is a flow chart of the distributed computing method in one embodiment;
Fig. 2 is a framework diagram of the platform-related functions implemented through the unified programming interface of Fig. 1;
Fig. 3 is a flow chart of how the ISolver interface carries out its functions in one embodiment;
Fig. 4 is a flow chart of calling the distribution script to submit the distributed application program to the distributed computing platform in one embodiment;
Fig. 5 is a flow chart of calling the distribution script to submit the distributed application program to the distributed computing platform in another embodiment;
Fig. 6 is a structural diagram of the distributed computing system in one embodiment;
Fig. 7 is a structural diagram of the execution encapsulation module in one embodiment.
[detailed description of the invention]
In one embodiment, as shown in Fig. 1, a distributed computing method comprises the following steps:
Step S102: based on the functions common to the different distributed computing platforms, build a unified programming interface across multiple distributed computing platforms.
Step S104: according to the user's application requirements, program against the unified programming interface to build a distributed application program.
Step S106: call the distribution script to submit the distributed application program to the distributed computing platform, start the pending task, and execute the distributed application program on the distributed computing platform.
Different distributed computing platforms share certain commonalities and can implement some common functions, which are also the most frequently used and most important basic functions. For example, most distributed computing platforms that support MapReduce (a computation model for large-scale data processing) have the following in common: they have a distributed file system, and each file system has a corresponding access interface; a MapReduce program processes input data and either outputs key-value pairs or writes results directly to the distributed file system; configuration parameters of a task can be passed in from outside; statistics and status information related to the task are available; and when a task is submitted, a mapper (the user application that implements the Map step in MapReduce) and a reducer (the user application that implements the Reduce step in MapReduce) must be provided, and the number of mappers and reducers that run in parallel can be specified.
Based on these commonalities, a unified programming interface is built across multiple distributed computing platforms, so that the interface provides the basic functions common to each platform (usually also the most frequently used and most important ones). As shown in Fig. 2, the left side represents the underlying distributed computing platform and distributed file system. For some distributed computing platforms the distributed file system is part of the platform itself; Fig. 2 shows only a division into functional modules, not a concrete system architecture, and this point is not repeated below.
As shown in Fig. 2, the right side shows the unified programming interface of the framework, which comprises the IContext interface, the ISolver interface, the I/OStream interface and the IFile interface. Among them:
The IContext interface is the context interface; the HadoopContext class and the NativeContext class in Fig. 2 are both concrete implementations of it. The functions defined by the IContext interface are: initialization and de-initialization functions, for initializing and destroying its own data structures; an update-task-status function, for reporting the execution status of the current task to the distributed file system; an update-counter function, for collecting task statistics such as how many records have been processed; an output collection function, for collecting the output key-value pairs; a read-configuration function, for reading run-time configuration information; a global configuration function, for obtaining the overall configuration of the task, such as the login machine, user name and password; functions for reading and setting the current file path, which specify the path of the input file currently being processed; and a function for opening other input/output streams, which opens an input/output stream at a user-specified path.
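Some of the IContext functions listed above could be expressed as an abstract base class. The following is a minimal Python sketch, not the framework's actual code: the patent gives no signatures, so every method name and parameter here is an assumption, and this `NativeContext` is only a toy in-memory stand-in for the class of the same name in Fig. 2.

```python
from abc import ABC, abstractmethod


class IContext(ABC):
    """Hypothetical rendering of the context interface; method names are assumed."""

    @abstractmethod
    def update_status(self, message: str) -> None: ...

    @abstractmethod
    def increment_counter(self, name: str, amount: int = 1) -> None: ...

    @abstractmethod
    def collect(self, key: str, value: str) -> None: ...

    @abstractmethod
    def get_config(self, name: str, default: str = "") -> str: ...


class NativeContext(IContext):
    """Toy local implementation that keeps everything in memory."""

    def __init__(self, config=None):
        self.config = dict(config or {})
        self.counters = {}
        self.outputs = []
        self.status = ""

    def update_status(self, message):
        self.status = message

    def increment_counter(self, name, amount=1):
        self.counters[name] = self.counters.get(name, 0) + amount

    def collect(self, key, value):
        self.outputs.append((key, value))

    def get_config(self, name, default=""):
        return self.config.get(name, default)


ctx = NativeContext({"job_name": "word_count"})
ctx.collect("hello", "1")
ctx.increment_counter("records")
print(ctx.get_config("job_name"), ctx.counters, ctx.outputs)
```

Application code written against `IContext` alone would not need to change when a different implementation, such as one backed by Hadoop, is substituted.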
The ISolver interface is the interface for parsing input data and generating output data. It provides three functions: A. parsing the input byte stream; B. forcibly parsing the remaining byte stream; C. generating output. For example, LineSolver is a concrete implementation of the ISolver interface that converts the input byte stream into strings line by line (that is, lines of text). Each line of text is an object with business meaning parsed out of the byte stream and is stored in the ISolver instance. The input byte stream is passed to the ISolver interface, which uses function A to parse it and split off lines of text. When no more input data can be read from the file, function B of the ISolver interface is triggered to turn the remaining bytes into a line of text. Each time a line of text is produced, function C of the ISolver interface is triggered to generate output from the current line; this output can be emitted through the output collection function of the IContext interface or through its function for opening other input/output streams. Note that the ISolver interface can process data of any format as the business requires and is not restricted to lines of text.
As shown in Fig. 3, the flow by which the ISolver interface carries out its functions is as follows:
Step S302: read the specified input file.
Step S304: determine whether the end of the file has been reached; if so, go to step S306, otherwise go to step S312.
Step S306: call function B of the ISolver interface.
Step S308: determine whether a new business object has been generated; if so, go to step S310, otherwise end. A business object is the object to be processed for the specific business, for example a line of text.
Step S310: call function C of the ISolver interface.
Step S312: call function A of the ISolver interface.
Step S314: determine whether a new business object has been generated; if so, go to step S316, otherwise go to step S318.
Step S316: call function C of the ISolver interface and return to step S302.
Step S318: the buffer is full; report an error.
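The flow of Fig. 3 can be sketched in Python as a small read loop around a line-oriented solver. This is an illustrative reading of the steps, not the patented implementation: the method names `parse`, `flush` and `generate` (standing for functions A, B and C), the `max_buffer` limit and the chunk size are all assumptions. In particular, the sketch raises the S318 error only when no line was produced and the buffered bytes exceed the assumed limit; otherwise it simply keeps reading.

```python
import io


class LineSolver:
    """Splits a byte stream into lines of text (one business object per line)."""

    def __init__(self, max_buffer=1 << 20):
        self.buffer = b""
        self.max_buffer = max_buffer  # assumed limit; the patent gives no size

    def parse(self, chunk):
        """Function A: consume input bytes, return the complete lines found."""
        self.buffer += chunk
        *lines, self.buffer = self.buffer.split(b"\n")
        return [line.decode("utf-8") for line in lines]

    def flush(self):
        """Function B: force the remaining bytes into one final line, if any."""
        rest, self.buffer = self.buffer, b""
        return [rest.decode("utf-8")] if rest else []

    def generate(self, line, outputs):
        """Function C: produce output for one line (here: just collect it)."""
        outputs.append(line)


def run_solver(stream, solver, outputs, chunk_size=4096):
    while True:
        chunk = stream.read(chunk_size)          # S302
        if not chunk:                            # S304: end of file
            for line in solver.flush():          # S306, S308
                solver.generate(line, outputs)   # S310
            return outputs
        lines = solver.parse(chunk)              # S312
        for line in lines:                       # S314
            solver.generate(line, outputs)       # S316
        if not lines and len(solver.buffer) > solver.max_buffer:
            raise BufferError("buffer full")     # S318


result = run_solver(io.BytesIO(b"alpha\nbeta\ngamma"), LineSolver(), [], chunk_size=4)
print(result)
```

The small `chunk_size` in the usage line forces lines to straddle chunk boundaries, which is the case function A's buffering exists to handle.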
The IFile interface is the file access interface. The I/OStream interface is the input/output byte stream interface and comprises the IInputStream interface (input byte stream interface) and the IOutputStream interface (output byte stream interface). As shown in Fig. 2, because the unified programming interface isolates the concrete implementation of each distributed computing platform, users only need to interact with these programming interfaces and need not be concerned with the details of the underlying distributed computing platform. When a distributed application needs to be ported from one platform to another, it is only necessary to select the implementation module that corresponds to the distributed computing platform on which the distributed application program will run. The unified programming interface built above therefore isolates the differences between distributed computing platforms well: distributed application programs developed with these interfaces can run on different platforms with little or no modification, which improves portability between different distributed computing platforms.
As shown in Fig. 2, the middle part shows the concrete implementations of the programming interfaces, written for specific types of distributed computing platform. Different distributed computing platforms differ in how files are operated on and in the run-time context information they provide; the middle part of Fig. 2 is the concrete implementation of file operations and of the program's run-time context information. For example, the HadoopFile class encapsulates operations on files in HDFS (the distributed file system of the Hadoop platform), the XFSFile class encapsulates operations on files in XFS (the distributed file system of the TBorg platform), the NativeFile class encapsulates operations on local files, the HadoopContext class encapsulates the run-time context information of Hadoop programs, and the NativeContext class encapsulates the run-time context information of a local virtual distributed environment.
Note that the middle part of Fig. 2 shows only the encapsulation for the Hadoop distributed computing platform in one embodiment. Other distributed computing platforms have their own encapsulations, built on the same principle, which are not described further here. When a user develops a platform-independent application with the framework of this embodiment for a particular distributed computing platform, the corresponding implementation modules for file operations and run-time context (such as HadoopFile and HadoopContext) must be used. These implementation modules can be provided by the framework of this embodiment; if the framework does not provide them, the user must implement them through interfaces such as IFile and IContext.
In one embodiment, the generated distributed application program relies on the distribution script to be submitted to the distributed computing platform. The distribution script is provided by the scheme of this embodiment. It offers the user a unified, platform-independent set of task submission functions and submits the distributed application program and data to the specific distributed computing platform. The submission commands of different distributed computing platforms differ widely; the distribution script accepts a set of options in a uniform format and translates them into the commands of the specific distributed computing platform for execution, thereby isolating the differences between platforms at task submission time. The task submission functions defined by the distribution script are all basic, commonly used and important; they are equivalent to the intersection of the task submission functions of the individual distributed computing platforms.
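The translation step described above can be illustrated with a small Python function that maps one uniform set of options onto platform-specific command lines. This is a sketch under stated assumptions: the function name and its parameters are invented here, the `hadoop` branch uses Hadoop Streaming options with a placeholder jar path, and the `native` branch targets a purely hypothetical `local_runner` program. The patent does not say which commands the real distribution script emits.

```python
def build_submit_command(platform, mapper, reducer, inputs, output_dir, num_reducers):
    """Return an argv list for the chosen platform (illustrative only)."""
    if platform == "hadoop":
        # Hadoop Streaming options; the jar path is a placeholder.
        cmd = ["hadoop", "jar", "hadoop-streaming.jar"]
        for path in inputs:
            cmd += ["-input", path]
        cmd += ["-output", output_dir, "-mapper", mapper]
        if reducer:
            cmd += ["-reducer", reducer, "-numReduceTasks", str(num_reducers)]
        return cmd
    if platform == "native":
        # Hypothetical local runner, invented for this sketch.
        return ["local_runner", "--mapper", mapper, "--reducer", reducer or "",
                "--output", output_dir, *inputs]
    raise ValueError(f"unsupported platform: {platform}")


print(build_submit_command("hadoop", "wc_map", "wc_reduce", ["lists/part-0"], "out", 3))
print(build_submit_command("native", "wc_map", "wc_reduce", ["lists/part-0"], "out", 3))
```

The caller supplies the same arguments in both cases; only the returned command differs, which is the isolation the paragraph describes.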
When the distributed computing platform supports the MapReduce paradigm, the main options supported when calling the distribution script include the list of input files supplied by the user, the output directory, the mapper executable and its options, and the reducer executable and its options. These options apply generally across MapReduce distributed computing platforms. For example, the syntax for calling the distribution script (this is only an example; the distribution script is not limited to the following form, as long as it provides the same functions) is:
homework.py [OPTION] INPUTFILES... OUTPUT_DIR
Here, homework.py is the name of the distribution script; INPUTFILES gives the paths of the input files, of which several groups may be given, separated by spaces, and wildcards may be used; OUTPUT_DIR gives the path where the output is stored; [OPTION] can include the following options:
-m <mapper>,<num_of_key_fields>,<number of mappers>
Here, <mapper> specifies the program path of the mapper; <num_of_key_fields> is the number of fields occupied by the key part of the key-value pairs output by the mapper; <number of mappers> is the number of mappers that execute the task. The -m parameter is required.
-r <reducer>,<number of reducers>
Here, <reducer> specifies the program path of the reducer; <number of reducers> is the number of reducers that execute the task. The -r parameter is optional.
-n <job-name>
Here, <job-name> is the identifying name of the task, which the user can choose freely.
-o <gz|bz2>
Here, <gz|bz2> is the default compression format of the output data; if it is omitted, the output is not compressed.
-a <...>
Other parameters, through which the user can manually pass in configuration information specific to the distributed computing platform where necessary.
For example, the call homework.py -m wc_map,1,10 -r wc_reduce,3 -o bz2 -n word_count input1/a* input2/b* output_dir starts a pending task named word_count, submits the executables wc_map and wc_reduce to the cluster, processes the files beginning with a under the input1 directory and the files beginning with b under the input2 directory, stores the result in output_dir, and compresses it with bz2. The distributed computation uses 10 mappers and 3 reducers.
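The option syntax above can be parsed with Python's standard `argparse` module. The sketch below is one possible reading, not the actual homework.py: the dictionary keys are chosen for illustration, the `-a` value is treated as a single opaque string because the patent does not specify its format, and the last positional argument is assumed to be OUTPUT_DIR.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="homework.py")
    parser.add_argument("-m", dest="mapper", required=True)
    parser.add_argument("-r", dest="reducer")
    parser.add_argument("-n", dest="job_name")
    parser.add_argument("-o", dest="compress", choices=["gz", "bz2"])
    parser.add_argument("-a", dest="extra")
    parser.add_argument("paths", nargs="+")
    args = parser.parse_args(argv)
    if len(args.paths) < 2:
        parser.error("need at least one input path and an output directory")

    # -m carries three comma-separated fields, -r carries two.
    mapper, key_fields, num_mappers = args.mapper.split(",")
    job = {
        "mapper": mapper,
        "num_of_key_fields": int(key_fields),
        "num_mappers": int(num_mappers),
        "job_name": args.job_name,
        "compress": args.compress,
        "inputs": args.paths[:-1],
        "output_dir": args.paths[-1],
    }
    if args.reducer:
        reducer, num_reducers = args.reducer.split(",")
        job.update(reducer=reducer, num_reducers=int(num_reducers))
    return job


job = parse_args(["-m", "wc_map,1,10", "-r", "wc_reduce,3", "-o", "bz2",
                  "-n", "word_count", "input1/a*", "input2/b*", "output_dir"])
print(job)
```

The wildcards are passed through unexpanded here, on the assumption that they refer to paths on the distributed file system and must be resolved by the platform rather than by the local shell.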
As shown in Fig. 4, in one embodiment, the step of calling the distribution script to submit the distributed application program to the distributed computing platform comprises the following process:
Step S402: obtain the input file list and generate the pending file list.
Different distributed computing platforms require different commands to obtain the pending file list. The pending file list records the full path of each pending file and is saved as text in a local temporary file.
Step S404: split the pending file list according to the number of mappers in the distributed computing platform.
The number of mappers can be specified by the user. Every file in the pending file list generated in step S402 will be passed to a mapper as input; because the user can specify several mappers that process in parallel, the pending file list must be split so that the workload of each mapper is as even as possible.
In this embodiment, the list is split by file count. For example, if the pending file list contains 10 pending files and the user has specified 3 mappers, the list can be divided into (3, 3, 4): two mappers each process 3 pending files and one mapper processes 4. The split pending file lists correspond one-to-one with the mappers, so the number of split lists equals the number of mappers. Each split pending file list is a temporary file that records the paths of the pending files for one mapper.
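Splitting by file count can be sketched as follows. The function name is hypothetical, and giving the extra files to the last lists is simply one choice that reproduces the (3, 3, 4) example; the patent only requires that the workload be as even as possible.

```python
def split_file_list(paths, num_mappers):
    """Split paths into num_mappers consecutive lists of near-equal size."""
    base, extra = divmod(len(paths), num_mappers)
    parts, start = [], 0
    for i in range(num_mappers):
        # The last `extra` lists get one more file, matching the (3, 3, 4) example.
        size = base + (1 if i >= num_mappers - extra else 0)
        parts.append(paths[start:start + size])
        start += size
    return parts


files = [f"/data/file{i}" for i in range(10)]
parts = split_file_list(files, 3)
print([len(p) for p in parts])
```

Each resulting list would then be written to its own temporary file, one path per line, before being uploaded.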
Step S406: submit the split pending file lists to the distributed computing platform.
The split pending file lists are uploaded to a temporary path on the distributed computing platform so that the mappers can read them.
As shown in Fig. 5, in one embodiment, the step of calling the distribution script to submit the distributed application program to the distributed computing platform further comprises the following process:
Step S502: collect the input configuration information from command-line parameters.
The distribution script is called and collects the configuration information entered by the user from the command-line parameters. This configuration information includes the number of mappers specified by the user as described above, the directory of the input file list, the output directory, some settings related to the distributed computing platform, user-defined settings, and so on.
Step S504: according to the configuration information, generate a wrapper script for the mapper and a wrapper script for the reducer in the distributed computing platform.
According to the collected configuration information, one wrapper script is generated for the mapper and one for the reducer, and both are saved in local temporary files. The wrapper scripts pass the configuration information entered by the user to the mapper and reducer in the form of environment variables. For example, under the Linux operating system, the generated wrapper script inserts the configuration information into the environment variable list of the mapper and reducer. Because application programs on all mainstream operating systems (for example Windows, Linux and Mac operating systems) support environment variables, passing the user's configuration information through environment variables is more general.
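A wrapper script of the kind described for Linux could be generated as shown below. This is a minimal sketch assuming a POSIX shell on the worker nodes; the function name and the variable names `JOB_NAME` and `NUM_MAPPERS` are invented for illustration and do not come from the patent.

```python
import shlex


def make_wrapper_script(program, config):
    """Build a shell script that exports config as environment variables,
    then replaces itself with the real mapper or reducer program."""
    lines = ["#!/bin/sh"]
    for name, value in sorted(config.items()):
        lines.append(f"export {name}={shlex.quote(str(value))}")
    lines.append(f'exec {shlex.quote(program)} "$@"')
    return "\n".join(lines) + "\n"


script = make_wrapper_script("./wc_map", {"JOB_NAME": "word_count", "NUM_MAPPERS": 10})
print(script)
```

Inside the mapper, the values would then be available through the ordinary environment lookup of whatever language it is written in, for example `os.environ["JOB_NAME"]` in Python.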
Step S506: submit the wrapper scripts to the distributed computing platform.
The wrapper scripts generated in step S504 are submitted to the temporary path on the distributed computing platform together with the other files needed to run the pending task. These other files include the local files the user has specified for upload (which can be specified through parameters of the distribution script) and the low-level files related to the distributed computing platform.
Note that the flows of Fig. 4 and Fig. 5 can be carried out at the same time, or one can be carried out after the other has finished.
In one embodiment, the split pending file lists record the file paths of the pending files. In this embodiment, the step of executing the distributed application program on the distributed computing platform is specifically: obtaining the split pending file lists through the distributed computing platform, processing the pending files according to their file paths, and outputting the result.
Specifically, after the split pending file lists and the generated wrapper scripts have been submitted to the distributed computing platform, the distributed computing platform is notified to start the pending task. The split pending file lists are multiple text files, each line of which records the path of one file that a mapper needs to process. The input that a mapper on the distributed computing platform receives is therefore text, line by line, where each line is the path of a file assigned to that mapper, and the reducer receives the mapper's output data. In this way the input of every mapper is always a sequence of file paths, one per line; after obtaining a file path, the mapper reads the file content at that path. Because different distributed computing platforms pass input data in very different ways, this design unifies the way input data is supplied, in contrast to traditional distributed computing methods, which pass file content directly to the mapper. At the same time, the framework places no restriction on the actual format of the pending file content, which preserves maximum flexibility in data processing and thereby achieves cross-platform generality and usability.
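The mapper-side behaviour described above, where each input line is a path rather than data, can be sketched as follows. The function name `run_mapper` and the `process_file` callback are hypothetical, and the sketch opens local files directly, whereas a real mapper would go through the platform's IFile implementation.

```python
import os
import tempfile


def run_mapper(path_lines, process_file):
    """Treat each input line as a file path and process the file it names."""
    for line in path_lines:
        path = line.strip()
        if not path:
            continue
        with open(path, "rb") as handle:  # a real mapper would go through IFile
            process_file(path, handle)


# Local demonstration with one temporary file standing in for a pending file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "part-0.txt")
    with open(sample, "w") as f:
        f.write("a b\nc\n")
    counts = {}
    run_mapper([sample + "\n"],
               lambda p, h: counts.update({os.path.basename(p): len(h.read().split())}))
print(counts)
```

Because the mapper only ever receives paths, the callback is free to interpret the file content in any format, which is the flexibility the paragraph refers to.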
After the pending files have been processed by the distributed computing platform, the result is output. Statistics, error messages and the like can also be output after the task has finished.
In one embodiment, as shown in Fig. 6, a distributed computing system comprises a platform encapsulation module 102, an application encapsulation module 104 and an execution encapsulation module 106, wherein:
The platform encapsulation module 102 builds a unified programming interface across multiple distributed computing platforms, based on the functions common to the different distributed computing platforms, and interacts with the application encapsulation module 104 through the unified programming interface.
In one embodiment, as shown in Fig. 2, a unified programming interface is built across multiple distributed computing platforms based on their commonalities, so that the interface provides the basic functions common to each platform (usually also the most frequently used and most important ones). The programming interface comprises a context interface, an interface for parsing input data and generating output data, a file access interface, and an input/output byte stream interface. For a detailed description of the programming interface, see above; it is not repeated here.
The application encapsulation module 104 programs against the unified programming interface according to the user's application requirements to build a distributed application program.
In this embodiment, the application encapsulation module 104 calls the platform encapsulation module 102 to carry out the specific data processing business according to the user's specific application requirements, programs against the unified programming interface, and builds a complete distributed application program.
The execution encapsulation module 106 calls the distribution script to submit the distributed application program to the distributed computing platform, starts the pending task, and executes the distributed application program on the distributed computing platform.
In one embodiment, the generated distributed application program relies on the distribution script to be submitted to the distributed computing platform. The distribution script is provided by the scheme of this embodiment. It offers the user a unified, platform-independent set of task submission functions and submits the distributed application program and data to the specific distributed computing platform. The submission commands of different distributed computing platforms differ widely; the distribution script accepts a set of options in a uniform format and translates them into the commands of the specific distributed computing platform for execution, thereby isolating the differences between platforms at task submission time. The task submission functions defined by the distribution script are all basic, commonly used and important; they are equivalent to the intersection of the task submission functions of the individual distributed computing platforms.
When the distributed computing platform supports the MapReduce paradigm, the main options supported when calling the distribution script include the list of input files supplied by the user, the output directory, the mapper executable and its options, and the reducer executable and its options. These options apply generally across MapReduce distributed computing platforms.
As shown in Fig. 7, in one embodiment, the execution package module 106 comprises a file list generation module 116, a cutting module 126, a configuration information collection module 136, an encapsulation script generation module 146, and a processing module 156, wherein:
The file list generation module 116 is configured to obtain the input file list and generate the pending file list.
Different distributed computing platforms require different commands to obtain the pending file list. The pending file list records the full path of each pending file and is saved as text in a local temporary file.
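A minimal sketch of writing such a pending file list is shown below. It assumes the paths have already been obtained by some platform-specific command (that step is not shown), and it uses local absolute paths as a stand-in for full paths on the platform; the function name `write_pending_list` is hypothetical.

```python
import os
import tempfile
from typing import Iterable


def write_pending_list(paths: Iterable[str]) -> str:
    """Write one full path per line to a local temporary text file.

    Returns the name of the temporary file. The caller is responsible
    for deleting it once the list has been submitted.
    """
    fd, list_path = tempfile.mkstemp(prefix="pending_", suffix=".txt", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        for path in paths:
            # Assumption: local absolute paths stand in for platform paths.
            handle.write(os.path.abspath(path) + "\n")
    return list_path
```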
The cutting module 126 is configured to split (cut) the pending file list according to the number of mappers of the distributed computing platform and to submit the split pending file lists to the distributed computing platform.
The number of mappers can be specified by the user. Each file in the generated pending file list is to be delivered to a mapper as input. Because the user can specify multiple mappers for parallel processing, the pending file list must be split so that the workload of the mappers is as even as possible.
In this embodiment, splitting is done by file count. For example, if the pending file list contains 10 pending files and the user has specified 3 mappers, the list can be split as (3, 3, 4): two mappers each process 3 pending files and one mapper processes 4. The split pending file lists correspond one-to-one with the mappers, so the number of split lists equals the number of mappers. Each split pending file list is a temporary file that records the paths of the pending files for one mapper. The split pending file lists are uploaded to a temporary path on the distributed computing platform so that the mappers can read them.
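Splitting by file count can be sketched as follows. The function name `split_by_count` is hypothetical, and placing the larger chunks last is an assumption chosen so that 10 files over 3 mappers reproduces the (3, 3, 4) split from the example; the patent does not say which mapper receives the extra file.

```python
from typing import List, Sequence


def split_by_count(paths: Sequence[str], num_mappers: int) -> List[List[str]]:
    """Split a file list into num_mappers chunks whose sizes differ by at most 1."""
    if num_mappers <= 0:
        raise ValueError("num_mappers must be positive")
    base, extra = divmod(len(paths), num_mappers)
    # The first (num_mappers - extra) chunks get `base` files,
    # the last `extra` chunks get one more.
    sizes = [base] * (num_mappers - extra) + [base + 1] * extra
    chunks: List[List[str]] = []
    start = 0
    for size in sizes:
        chunks.append(list(paths[start:start + size]))
        start += size
    return chunks
```

Each returned chunk would then be written to its own temporary file and uploaded, one per mapper.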
The configuration information collection module 136 is configured to collect the input configuration information through command-line parameters.
When the distribution script is called, it collects the configuration information entered by the user through command-line parameters. This configuration information includes the user-specified number of mappers mentioned above, the directory of the input file list, the output directory, some settings related to the distributed computing platform, user-defined settings, and so on.
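One way to collect this configuration is with Python's standard `argparse` module, as in the sketch below. All option names (`--input-list`, `--output-dir`, `--num-mappers`, `--platform-opt`, `--define`) are hypothetical; the patent names the kinds of information collected but not the actual parameters.

```python
import argparse
from typing import Dict, List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the (hypothetical) unified submission options."""
    parser = argparse.ArgumentParser(description="Unified task submission (sketch)")
    parser.add_argument("--input-list", required=True,
                        help="directory or file holding the input file list")
    parser.add_argument("--output-dir", required=True, help="output directory")
    parser.add_argument("--num-mappers", type=int, default=1,
                        help="number of mappers specified by the user")
    parser.add_argument("--platform-opt", action="append", default=[],
                        metavar="KEY=VALUE", help="platform-related setting")
    parser.add_argument("--define", action="append", default=[],
                        metavar="KEY=VALUE", help="user-defined setting")
    return parser


def parse_pairs(items: List[str]) -> Dict[str, str]:
    """Turn ['k=v', ...] into {'k': 'v', ...}; raises ValueError if '=' is missing."""
    return dict(item.split("=", 1) for item in items)


def collect_config(argv: List[str]) -> dict:
    """Parse command-line parameters into a plain configuration dictionary."""
    args = build_parser().parse_args(argv)
    return {
        "input_list": args.input_list,
        "output_dir": args.output_dir,
        "num_mappers": args.num_mappers,
        "platform": parse_pairs(args.platform_opt),
        "user": parse_pairs(args.define),
    }
```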
The encapsulation script generation module 146 is configured to generate, according to the configuration information, an encapsulation script for the mapper and an encapsulation script for the reducer on the distributed computing platform, and to submit the encapsulation scripts to the distributed computing platform.
According to the collected configuration information, one encapsulation script is generated for the mapper and one for the reducer, and both are saved in local temporary files. These scripts pass the configuration information entered by the user to the mapper and the reducer in the form of environment variables. Because applications on mainstream operating systems (for example Windows, Linux, and Mac) all support environment variables, passing the user's configuration information through environment variables is the more general approach. The generated encapsulation scripts, together with the other files needed to run the pending task, are submitted to the temporary path on the distributed computing platform. Those other files include local files that the user has designated for upload (which can be specified through a parameter of the distribution script) and the underlying files related to the distributed computing platform.
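Generating such an encapsulation script might look like the sketch below. It assumes a POSIX shell on the worker nodes, which the patent does not specify (it mentions Windows as well), and the function name `make_wrapper` and the variable names in the usage are hypothetical.

```python
import re
import shlex
from typing import Dict

# Conservative pattern for environment variable names.
_ENV_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def make_wrapper(command: str, config: Dict[str, str]) -> str:
    """Return the text of a shell script that exports config and runs command.

    `command` is inserted as-is, so it must already be a valid shell command
    line for the mapper or reducer executable and its options.
    """
    lines = ["#!/bin/sh"]
    for key in sorted(config):
        if not _ENV_NAME.fullmatch(key):
            raise ValueError(f"invalid environment variable name: {key!r}")
        # shlex.quote protects values containing spaces or shell metacharacters.
        lines.append(f"export {key}={shlex.quote(config[key])}")
    lines.append(f"exec {command}")
    return "\n".join(lines) + "\n"
```

The function would be called once for the mapper command and once for the reducer command, and each result saved to a local temporary file before submission.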
The processing module 156 is configured to obtain the split pending file lists through the distributed computing platform, process the pending files according to their file paths, and output the results.
Specifically, after the split pending file lists and the generated encapsulation scripts have been submitted to the distributed computing platform, the distributed computing platform is notified to start the pending task. The split pending file lists are text files in which each line records a file path that a mapper needs to process. The mapper on the distributed computing platform therefore receives its input as lines of text, each line being a file path assigned to that mapper, and the reducer receives the mapper's output data. In this way, the mapper's input is restricted to file paths, one per line; after obtaining a file path, the mapper reads the file content at that path. Because the ways in which different distributed computing platforms deliver input data differ greatly, this design unifies the form of the input data, in contrast to conventional distributed computing methods, which pass file content directly to the mapper. At the same time, the framework places no restriction on the actual format of the pending file content, which preserves as much flexibility in data processing as possible and thus achieves cross-platform generality and usability.
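The mapper-side convention described above, where the input is one file path per line and the mapper opens each file itself, can be sketched as follows. The function `run_mapper` and its `handle` callback are hypothetical names; the sketch also assumes the paths are readable as ordinary local files, whereas a real platform would go through its own file access interface.

```python
from typing import Callable, Iterable, TextIO


def run_mapper(lines: Iterable[str],
               handle: Callable[[str, bytes], str],
               out: TextIO) -> int:
    """Read file paths line by line, process each file, and write the results.

    `handle` receives the path and the raw file content, so the framework
    imposes no format on the content. Returns the number of files processed.
    """
    processed = 0
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines in the list
        with open(path, "rb") as source:
            content = source.read()
        out.write(handle(path, content) + "\n")
        processed += 1
    return processed
```

In a streaming setup, `lines` would typically be `sys.stdin` and `out` would be `sys.stdout`, so that the reducer receives whatever `handle` emits.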
After the pending files have been processed by the distributed computing platform, the results are output. Once the task has finished, statistical information, error messages, and the like may also be output.
It should be noted that the distributed computing method and system provided by the present invention are particularly suitable for distributed computing platforms that support the MapReduce paradigm; for distributed computing platforms based on other paradigms, similar principles can also be applied. The above distributed computing method and system improve portability between different distributed computing platforms and achieve platform-independent distributed computing.
The embodiments described above express only several implementations of the present invention. Their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the patent. It should be pointed out that a person of ordinary skill in the art can make a number of variations and improvements without departing from the concept of the invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.

Claims (8)

1. A distributed computing method, comprising the following steps:
building, based on the general functions of different distributed computing platforms, a unified programming interface across multiple distributed computing platforms;
programming against the unified programming interface according to the user's application requirements to build a distributed application program;
calling a distribution script to submit the distributed application program to a distributed computing platform, starting a pending task, and executing the distributed application program on the distributed computing platform;
wherein the step of calling the distribution script to submit the distributed application program to the distributed computing platform comprises:
obtaining an input file list and generating a pending file list;
splitting the pending file list according to the number of mappers in the distributed computing platform;
submitting the split pending file lists to the distributed computing platform.
2. The distributed computing method according to claim 1, characterized in that the unified programming interface comprises a context interface, an interface for parsing input data and generating output data, a file access interface, and an input/output byte stream interface.
3. The distributed computing method according to claim 1, characterized in that the step of calling the distribution script to submit the distributed application program to the distributed computing platform further comprises:
collecting input configuration information through command-line parameters;
generating, according to the configuration information, an encapsulation script for the mapper and an encapsulation script for the reducer in the distributed computing platform;
submitting the encapsulation scripts to the distributed computing platform.
4. The distributed computing method according to claim 3, characterized in that the split pending file lists record the file paths of the pending files;
the step of executing the distributed application program on the distributed computing platform further comprises:
obtaining the split pending file lists through the distributed computing platform, processing the pending files according to their file paths, and outputting the results.
5. A distributed computing system, characterized by comprising:
a platform package module, configured to build, based on the general functions of multiple distributed computing platforms, a unified programming interface across the multiple distributed computing platforms;
an application package module, configured to program against the unified programming interface according to the user's application requirements to build a distributed application program;
an execution package module, configured to call a distribution script to submit the distributed application program to a distributed computing platform, start a pending task, and execute the distributed application program on the distributed computing platform;
wherein the execution package module comprises:
a file list generation module, configured to obtain an input file list and generate a pending file list;
a cutting module, configured to split the pending file list according to the number of mappers of the distributed computing platform and to submit the split pending file lists to the distributed computing platform.
6. The distributed computing system according to claim 5, characterized in that the unified programming interface comprises a context interface, an interface for parsing input data and generating output data, a file access interface, and an input/output byte stream interface.
7. The distributed computing system according to claim 5, characterized in that the execution package module further comprises:
a configuration information collection module, configured to collect input configuration information through command-line parameters;
an encapsulation script generation module, configured to generate, according to the configuration information, an encapsulation script for the mapper and an encapsulation script for the reducer in the distributed computing platform, and to submit the encapsulation scripts to the distributed computing platform.
8. The distributed computing system according to claim 7, characterized in that the split pending file lists record the file paths of the pending files;
the execution package module further comprises:
a processing module, configured to obtain the split pending file lists through the distributed computing platform, process the pending files according to their file paths, and output the results.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110219900.1A CN102915229B (en) 2011-08-02 2011-08-02 A kind of distributed computing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110219900.1A CN102915229B (en) 2011-08-02 2011-08-02 A kind of distributed computing method and system

Publications (2)

Publication Number Publication Date
CN102915229A CN102915229A (en) 2013-02-06
CN102915229B true CN102915229B (en) 2016-05-04

Family

ID=47613605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110219900.1A Active CN102915229B (en) 2011-08-02 2011-08-02 A kind of distributed computing method and system

Country Status (1)

Country Link
CN (1) CN102915229B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105843813A (en) * 2015-01-14 2016-08-10 中国移动通信集团重庆有限公司 Method and device for processing big data
CN107797798A (en) * 2016-08-29 2018-03-13 美的智慧家居科技有限公司 WiFi chip development approach and device based on home appliance
CN106406985B (en) * 2016-09-21 2019-10-11 北京百度网讯科技有限公司 Distributed computing framework and distributed computing method
CN107038066B (en) * 2017-05-09 2020-06-16 吉林大学 Job computing system based on Web
CN107368300B (en) * 2017-06-26 2020-09-08 北京天元创新科技有限公司 MapReduce-based data summarization system and method
CN107291954B (en) * 2017-07-28 2020-07-31 南京邮电大学 OCL parallel query method based on MapReduce
CN107526706B (en) * 2017-08-04 2021-07-13 北京奇虎科技有限公司 Data processing method and device in distributed computing platform
CN108256118B (en) * 2018-02-13 2023-09-22 腾讯科技(深圳)有限公司 Data processing method, device, system, computing equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7496920B1 (en) * 2000-08-24 2009-02-24 Symantec Operating Corporation Dynamic computing environment using remotely allocable resources
CN101977242A (en) * 2010-11-16 2011-02-16 西安电子科技大学 Layered distributed cloud computing architecture and service delivery method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
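Research on technologies related to network distributed computing using Java; Niu Tie; China Master's Theses Full-text Database, Information Science and Technology; 20051215 (No. 8); main text, page 18 paragraph 1 to page 21 paragraph 1, and Fig. 3-1 *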

Also Published As

Publication number Publication date
CN102915229A (en) 2013-02-06

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: SHENZHEN SHIJI LIGHT SPEED INFORMATION TECHNOLOGY

Free format text: FORMER OWNER: TENGXUN SCI-TECH (SHENZHEN) CO., LTD.

Effective date: 20131016

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 518044 SHENZHEN, GUANGDONG PROVINCE TO: 518057 SHENZHEN, GUANGDONG PROVINCE

TA01 Transfer of patent application right

Effective date of registration: 20131016

Address after: A Tencent Building in Shenzhen Nanshan District City, Guangdong streets in Guangdong province science and technology 518057 16

Applicant after: Shenzhen Shiji Guangsu Information Technology Co., Ltd.

Address before: Shenzhen Futian District City, Guangdong province 518044 Zhenxing Road, SEG Science Park 2 East Room 403

Applicant before: Tencent Technology (Shenzhen) Co., Ltd.

C14 Grant of patent or utility model
GR01 Patent grant