CN102214098A - Dynamic webpage data acquisition method based on WebKit browser engine - Google Patents

Publication number
CN102214098A
CN102214098A CN2011101618008A CN201110161800A CN102214098A CN 102214098 A CN102214098 A CN 102214098A CN 2011101618008 A CN2011101618008 A CN 2011101618008A CN 201110161800 A CN201110161800 A CN 201110161800A CN 102214098 A CN102214098 A CN 102214098A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011101618008A
Other languages
Chinese (zh)
Inventor
李飞燕
陈曦
杨艾琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
National Sun Yat Sen University
Original Assignee
National Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2011-06-15
Filing date
2011-06-15
Publication date
2011-10-12

Abstract

The invention discloses a dynamic webpage data acquisition method based on the WebKit browser engine. The method comprises the following steps: sending an HTTP request to a server, receiving the original page data and building a Document Object Model (DOM) tree, wherein sending the HTTP request, receiving the original page data, parsing the JavaScript (js) and building the DOM tree are implemented by the underlying WebKit layer; maintaining a configuration file for each website, wherein the configuration file contains js code that triggers the corresponding events and is passed as a string to the js execution interface provided by WebKit, and WebKit updates the DOM tree in response to those events; and calling the I/O interface of WebKit to convert the DOM tree into HTML and output it as a string. The configuration files meet the requirement for extensibility; asynchronous parallel processing between browser and server is realized, the load on the server is reduced and the user experience is improved.

Description

A dynamic webpage data acquisition method based on the WebKit browser engine
Technical field
The present invention relates to the field of computer information technology, and in particular to a dynamic webpage data acquisition method based on the WebKit browser engine.
Background art
With the rise of Web 2.0, AJAX (Asynchronous JavaScript and XML) became widely popular. Asynchronous interaction between client and server both reduces the load on the server and gives users a better experience. However, the large number of dynamic web pages produced with this technology poses a new problem for network data acquisition: conventional Web data acquisition tools built for static pages, such as web crawlers, capture far less content than the page actually presents. Because the useful information in many dynamic pages cannot be obtained, work that takes network data as its main object of processing cannot proceed smoothly, which seriously hampers fields such as Web content monitoring and network data mining.
How to improve conventional Web data acquisition systems so that they can parse dynamic pages has therefore become a research focus in information acquisition. Experts and scholars in the Internet field have studied this problem extensively and proposed constructive ideas and solutions. There are currently two main approaches to dynamic page acquisition. The first uses the interface of an open-source browser (such as Firefox) and collects the browser's output by means of a plug-in. The second uses an existing script engine (such as SpiderMonkey or Rhino), binds the relevant DOM (Document Object Model) objects as the acquisition task requires, and collects the output. Existing research still has two problems. First, it mainly aims at general methods for large-scale crawlers to crawl dynamic pages, and works poorly for targeted acquisition, such as collecting data from a particular forum or product information from a commercial website. Second, most schemes are complex to implement and are ill-suited to small-scale, on-demand data acquisition.
For these reasons, this document extends a simple crawler for forum-like structured data and proposes a scheme for acquiring dynamic page data based on the WebKit browser engine. Using the Qt framework (a cross-platform C++ graphical user interface library) gives the program good reliability and platform independence; separating the interface from the configuration files makes the program easy to extend; and a wait-timeout mechanism designed for complex network environments greatly improves the program's robustness.
Summary of the invention
An embodiment of the invention provides a dynamic webpage data acquisition method based on the WebKit browser engine, comprising:
sending an HTTP request to the server, receiving the original page data and building a DOM tree, wherein sending the HTTP request, receiving the original page data, parsing the JavaScript (js) and building the DOM tree are implemented by the underlying WebKit layer;
maintaining a configuration file for each website, wherein the configuration file contains js code that triggers the corresponding events, the js code is passed as a string to the js execution interface provided by WebKit, and WebKit responds to the events and updates the DOM tree;
calling the I/O interface of WebKit to convert the DOM tree into HTML and output it as a string.
The method uses three threads to receive data, specifically:
Thread one performs normal data reception and listens for the loadFinished signal; if reception completes normally, it terminates thread two and sets the reception flag to the success state;
Thread two is a timer thread that monitors the reception time; if the predetermined reception time is exceeded, reception is deemed to have timed out, and thread two stops the normal receiving thread and sets the reception flag to the failure state;
Thread three exposes an external interface that reports the state of the reception flag, so that subsequent steps can act on this flag.
Triggering webpage events to update the DOM tree comprises:
Thread one runs the js code in the configuration file to simulate the operations that trigger events, and waits in a loop for data from the server; when the data has been received, it wakes thread two and exposes a reception-status interface; thread two waits for thread one's data and then completes the DOM update.
This document presents a scheme for acquiring dynamic page data based on the WebKit browser engine and explains its overall structure and key workflows in detail. Tests on several forums and merchandise sales websites confirmed that the method is feasible and efficient. The wait-timeout mechanism strengthens the program's robustness so that it can cope with fairly complex network environments, and the configuration files meet the requirement for extensibility. Its strength is that it enables more dynamic and more responsive Web applications and asynchronous parallel processing between browser and server, which both lightens the server's load and improves the user experience. The scheme is a useful practical reference for on-demand, small- and medium-scale acquisition of dynamic page data.
Description of drawings
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing them are introduced briefly below. The drawings described below are obviously only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the Web application model in the prior art;
Fig. 2 is a schematic diagram of the AJAX application model in an embodiment of the invention;
Fig. 3 is a schematic diagram of the WebKit browser kernel in an embodiment of the invention;
Fig. 4 is a diagram of the relationship between the QWebView, QWebPage and QWebFrame classes in an embodiment of the invention;
Fig. 5 is a schematic diagram of the architecture for collecting forum-like structured data in an embodiment of the invention;
Fig. 6 is a schematic diagram of the dynamic page acquisition module in an embodiment of the invention;
Fig. 7 is a schematic diagram of the QWebPage class interfaces in an embodiment of the invention;
Fig. 8 is a flow chart of data reception in an embodiment of the invention;
Fig. 9 is a flow chart of triggering webpage events in an embodiment of the invention.
Detailed description of the embodiments
The technical solutions of the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
2.1 AJAX definition and key technologies
AJAX stands for Asynchronous JavaScript and XML and was first proposed in 2005 by the well-known user experience expert Jesse James Garrett. AJAX is not a new technology but a combination of Web technologies that were already in wide use, such as XML, CSS, DOM, XMLHttpRequest and JavaScript. Its strength is that it enables more dynamic and more responsive Web applications and asynchronous parallel processing between browser and server, which both lightens the server's load and improves the user experience. Standard AJAX comprises:
(1) standards-based presentation using XHTML and CSS
(2) dynamic display and interaction using the DOM
(3) data exchange and processing using XML and XSLT
(4) asynchronous data retrieval using XMLHttpRequest
(5) JavaScript to bind everything together and process the data
2.2 Differences between the AJAX model and the traditional Web application model
Unlike a traditional Web application, an AJAX application is not built from static pages. It consists of a smaller number of pages, each of which is a small AJAX component developed in JavaScript. These components use the XMLHttpRequest object to communicate with the server asynchronously and, once the required data has been obtained from the server, update the page content through the DOM API.
In the traditional Web application model, the typical interaction is that the client browser sends an HTTP request to the Web server, and the Web server processes the request and returns the result to the browser as an HTML page. The user must wait while the HTTP request is sent and the Web server processes it, and even if only a small part of the page changes, the Web server returns a complete Web page, wasting a great deal of time and bandwidth. The traditional Web application model is shown in Fig. 1.
The AJAX model differs from the traditional Web application model in that the browser and the Web server communicate asynchronously. An intermediate layer, the AJAX engine, is added between client and server to handle client requests, making the interaction asynchronous. Not every user action is submitted to the server: some data validation and processing is done by the AJAX engine, which submits a request to the server only when new data must be read from it. The AJAX engine fetches the required data in the background; the whole page does not need to be reloaded, and only the relevant part is updated. This greatly reduces the volume of data transferred and shortens the response time, which both lightens the server's load and improves the user experience. The AJAX application model is shown in Fig. 2.
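The difference between the two models can be illustrated with a small sketch. It is not taken from the patent: the page layout, the `price` field and all function names are hypothetical, and no real network traffic is involved. It only compares the size of a full-page response with the size of the fragment an AJAX engine would request, and shows the fragment being merged into an existing page state.

```python
import json

# Hypothetical page: a large static layout plus one small dynamic value.
LAYOUT = "<html><body>" + "<div>static navigation</div>" * 50 + "{price}</body></html>"


def full_page(price: int) -> str:
    """Traditional model: every request returns the complete HTML page."""
    return LAYOUT.format(price=f"<span id='price'>{price}</span>")


def ajax_fragment(price: int) -> str:
    """AJAX model: the server returns only the data that changed."""
    return json.dumps({"price": price})


def apply_fragment(dom: dict, payload: str) -> dict:
    """Stand-in for the AJAX engine updating part of the page state."""
    updated = dict(dom)
    updated.update(json.loads(payload))
    return updated


dom = {"price": 10, "title": "item"}
new_dom = apply_fragment(dom, ajax_fragment(12))
bytes_saved = len(full_page(12)) - len(ajax_fragment(12))
```

This is also why a crawler that only downloads the initial HTML misses content: the data arriving through `ajax_fragment`-style responses never appears in the page source it fetched.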
2.3 WebKit browser kernel
The predecessor of WebKit is KHTML from the KDE project. WebKit is an open-source Web browser engine, in simple terms the kernel of a browser. Apple's Safari, Google's Chrome, Nokia S60 and the default browser of Android phones all use WebKit as their kernel; together with Gecko and Trident, it is one of the three mainstream browser kernels today. The engine is efficient and stable, has good compatibility, and its source code is clearly structured and easy to maintain. In terms of code structure, WebKit consists of three main parts, as shown in Fig. 3.
The core part is WebCore, which implements the document model, including CSS, DOM and rendering; JavaScriptCore provides JavaScript support. The WebKit part abstracts the concepts that map directly onto a browser, such as WebView, WebPage and WebFrame. An application does not need to operate on WebCore and JavaScriptCore directly; it interacts with them through the API provided by the WebKit module.
2.4 The Qt development framework and its support for WebKit
Qt is a well-known cross-platform C++ application development framework. Since Qt 4.5 integrated WebKit, its rich general-purpose interfaces have readily blurred the distinction between applications and Web content. Its WebKit support consists mainly of the classes listed in Table 1.
Table 1: Classes in Qt that support WebKit
[Table 1 appears as an image in the original: Figure BSA00000518581200071]
The most important of these classes are QWebView, QWebPage and QWebFrame; their relationship is shown in Fig. 4.
The QWebView class can contain QWebPage and QWebFrame objects. QWebView creates a QWebPage object and through it a visible, editable web page. QWebFrame is a child object of QWebPage; each QWebPage object has at least one QWebFrame, called the main frame, which can be obtained with the QWebPage::mainFrame() method. Calling the page() method of a QWebFrame returns the QWebPage object that contains it.
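The containment relationship described above can be modelled in a few lines. This is a plain-Python sketch, not Qt code and not a binding to QtWebKit: the classes `WebView`, `WebPage` and `WebFrame` and their methods are hypothetical stand-ins that only mirror the ownership rules stated in the text (a view owns a page, a page always has a main frame, a frame can return its page).

```python
class WebFrame:
    """Stand-in for a frame; it remembers the page that owns it."""

    def __init__(self, page, name):
        self._page = page
        self.name = name

    def page(self):
        # Mirrors the frame-to-page lookup described in the text.
        return self._page


class WebPage:
    """Stand-in for a page; it always owns at least one frame."""

    def __init__(self):
        self._main_frame = WebFrame(self, "main")
        self._child_frames = []

    def main_frame(self):
        return self._main_frame

    def add_child_frame(self, name):
        frame = WebFrame(self, name)
        self._child_frames.append(frame)
        return frame

    def frames(self):
        return [self._main_frame] + list(self._child_frames)


class WebView:
    """Stand-in for the visible widget; it creates its page on construction."""

    def __init__(self):
        self._page = WebPage()

    def page(self):
        return self._page
```

In this model, `view.page().main_frame().page()` returns the same object as `view.page()`, which is the round trip the paragraph above describes.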
2.5 Overall framework for collecting forum-like structured data
Forums and commercial websites generally organize their data in two layers, an arrangement referred to here as a forum-like structure. In a typical forum-like structure the first layer is a list page, and the second-layer pages contain the main body of data to be collected. Data of this kind has a relatively fixed structure, is large in volume and fairly concentrated, and has high research value. Systems that collect such data currently mostly adopt the architecture shown in Fig. 5.
The task scheduling module maintains the crawl strategy and distributes tasks. The list-page URL extraction module extracts the URLs in a list page according to per-site templates and maintains the crawl queue. The page fetching module accesses the server over HTTP according to the crawl strategy and collects pages; because this module accesses the network, it must handle complicated network anomalies such as timeouts and network interruptions. The data storage module maintains the storage system and performs a large number of I/O operations on the database or file system.
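As an illustration of the list-page URL extraction module, the sketch below pulls second-layer URLs out of a list page with a per-site template and feeds them into a crawl queue. The patent does not say what form the templates take; treating a template as a regular expression is an assumption, and the host name, the `TEMPLATES` table and `extract_detail_urls` are hypothetical.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Assumption: one regular expression per site identifies links to detail pages.
TEMPLATES = {
    "forum.example.com": r'<a class="topic" href="([^"]+)"',
}


def extract_detail_urls(site, base_url, list_page_html):
    """Return the detail-page URLs found in a list page, in order, without duplicates."""
    pattern = TEMPLATES[site]
    seen = set()
    urls = []
    for href in re.findall(pattern, list_page_html):
        absolute = urljoin(base_url, href)  # resolve relative links
        if absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls


list_page = (
    '<a class="topic" href="/thread/1">A</a>'
    '<a class="topic" href="/thread/2">B</a>'
    '<a class="topic" href="/thread/1">A again</a>'
    '<a href="/about">About</a>'
)
crawl_queue = deque(
    extract_detail_urls("forum.example.com", "http://forum.example.com/board/1", list_page)
)
```

The page fetching module would then pop URLs from `crawl_queue` one at a time; a production extractor would more likely use an HTML parser than a regular expression.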
2.6 Overall design
The dynamic page acquisition module extends and improves the conventional static page fetching module. Its key steps are shown in Fig. 6.
The whole flow is divided into three major steps:
1. Send an HTTP request to the server, receive the original page data and build the DOM tree. In this step, sending the request, receiving the data, parsing the js and building the DOM tree are all implemented by the underlying WebKit layer.
2. Maintain a configuration file for each website. The configuration file contains the js code that triggers the corresponding events; this code is passed as a string to the js execution interface provided by WebKit, and WebKit responds to the events and updates the DOM tree.
3. Call the I/O interface of WebKit to convert the DOM tree into HTML and output it as a string.
The module mainly uses the interfaces provided by QWebPage; the QWebPage class can be treated as a black box, as shown in Fig. 7.
The HTTP request interface sends the request, and the underlying QWebPage implementation handles operations such as receiving data. The parameter-setting interface sets the proxy, whether images are loaded automatically, whether js may be executed, and so on. The JS call interface reads in local js code, which the underlying QWebPage implementation parses and executes. The DOM export interface exports the DOM as an HTML-format string.
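The three steps and the black-box interfaces can be sketched as follows. This is a stand-in, not the patent's implementation: `StubPage` does not contact a server or execute JavaScript, the JSON layout of the configuration file is an assumption (the patent does not specify a format), and the host names and scripts are hypothetical. In Qt 4's QtWebKit the corresponding calls would presumably be `QWebFrame::load()`, `QWebFrame::evaluateJavaScript()` and `QWebFrame::toHtml()`, but the patent does not name them.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Assumed configuration format: host name -> list of js snippets to run after loading.
CONFIG_TEXT = """
{
  "shop.example.com": ["document.getElementById('more').click();"],
  "forum.example.com": ["window.scrollTo(0, document.body.scrollHeight);"]
}
"""


class StubPage:
    """Stand-in for the QWebPage black box; it does not run real JavaScript."""

    def __init__(self):
        self.dom = None
        self.executed = []

    def load(self, url):
        # Step 1: a real engine would send the request and build the DOM tree.
        self.dom = ET.fromstring(
            "<html><body><ul id='items'><li>first</li></ul></body></html>"
        )

    def evaluate_javascript(self, script):
        # Step 2: a real engine would execute the script and update the DOM.
        # The stub records the script and appends one item to mimic that update.
        self.executed.append(script)
        items = self.dom.find(".//ul")
        ET.SubElement(items, "li").text = f"loaded by script {len(self.executed)}"

    def to_html(self):
        # Step 3: serialize the current DOM tree as an HTML string.
        return ET.tostring(self.dom, encoding="unicode", method="html")


def collect(url, config, page):
    """Run the three-step flow for one URL and return the resulting HTML string."""
    scripts = config.get(urlparse(url).hostname, [])
    page.load(url)
    for script in scripts:
        page.evaluate_javascript(script)  # js is passed to the engine as a string
    return page.to_html()


config = json.loads(CONFIG_TEXT)
html = collect("http://shop.example.com/item/1", config, StubPage())
```

Keeping the site-specific scripts in the configuration file rather than in the program is what lets a new website be supported without changing code.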
2.7 Data receiving flow
Network conditions are relatively complex: server timeouts, network interruptions and similar situations disrupt normal data acquisition, so abnormal conditions must be handled during the data-receiving phase. WebKit, however, does not provide an interface that meets this need. Three threads are therefore used to receive data. The detailed flow is shown in Fig. 8.
Thread one performs normal data reception and listens for the loadFinished signal; if reception completes normally, it terminates thread two and sets the reception flag to the success state.
Thread two is a timer thread that monitors the reception time; if the predetermined reception time is exceeded, reception is deemed to have timed out, and thread two stops the normal receiving thread and sets the reception flag to the failure state.
Thread three exposes an external interface that reports the state of the reception flag, so that subsequent steps can act on this flag.
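The wait-timeout mechanism can be sketched with Python's `threading` module. This is an approximation under stated assumptions, not the patent's Qt code: the `Receiver` class and its method names are hypothetical, the `fetch` callable stands in for page loading, and because Python threads cannot be forcibly terminated, "stopping" the other thread is modelled by letting the first thread to settle the flag win and ignoring any later result. The calling thread plays the role of thread three by reading the flag through `status()`.

```python
import threading
import time

PENDING, SUCCESS, FAILED = "pending", "success", "failed"


class Receiver:
    """Receive data with a timeout; the reception flag is settled exactly once."""

    def __init__(self, fetch, timeout_s):
        self._fetch = fetch
        self._timeout_s = timeout_s
        self._lock = threading.Lock()
        self._done = threading.Event()  # set once the outcome is decided
        self._status = PENDING
        self.data = None

    def _settle(self, status, data=None):
        # Only the first caller changes the flag; later calls are ignored.
        with self._lock:
            if self._status == PENDING:
                self._status = status
                self.data = data
                self._done.set()

    def _receive(self):
        # Thread one: normal reception (the analogue of seeing loadFinished).
        self._settle(SUCCESS, self._fetch())

    def _watchdog(self):
        # Thread two: timer; declares failure if nothing arrived in time.
        if not self._done.wait(self._timeout_s):
            self._settle(FAILED)

    def start(self):
        threading.Thread(target=self._receive, daemon=True).start()
        threading.Thread(target=self._watchdog, daemon=True).start()

    def status(self):
        # Interface for later steps (the role of thread three).
        with self._lock:
            return self._status

    def wait(self):
        self._done.wait()
        return self.status()
```

A caller would write `r = Receiver(fetch_page, timeout_s=10); r.start(); r.wait()` and then branch on the returned flag, where `fetch_page` is whatever hypothetical function performs the load.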
2.8 Flow for triggering webpage events to update the DOM
For some web pages, the desired data can only be obtained after certain events on the page are triggered and the page interacts with the server. For example, a user browsing with a browser often has to click a button on the page before the data appears. Automated data acquisition therefore requires the program to simulate the behaviour of a real user, such as clicking the mouse or scrolling the page. Such operations are handled by a configuration file customized for each website, in which js code is written to simulate the event-triggering actions. The program provides a call interface for local js code; the js code is passed to WebKit as a string, and the underlying kernel module triggers the events and updates the DOM. To make interaction with the server robust, this function is implemented with two threads. The detailed flow is shown in Fig. 9.
Thread one runs the js code in the configuration file to simulate the operations that trigger events, and waits in a loop for data from the server. When the data has been received, it wakes thread two and exposes a reception-status interface. Thread two waits for thread one's data and then completes the DOM update.
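The two-thread hand-off can be sketched with a condition variable. As before, this is an illustration rather than the patent's code: `trigger_and_update` and its callable parameters (`run_script`, `poll_server`, `update_dom`) are hypothetical, polling with a bounded number of attempts is an assumption about what "waits in a loop" means, and no JavaScript is actually executed.

```python
import threading
import time


def trigger_and_update(run_script, poll_server, update_dom, scripts,
                       poll_interval_s=0.01, max_polls=100):
    """Run the scripts, poll for server data, then let a second thread update the DOM.

    Returns (received_ok, updated_dom).
    """
    cond = threading.Condition()
    state = {"received": None, "finished": False, "result": None}

    def thread_one():
        for script in scripts:
            run_script(script)  # simulate the event-triggering action
        data = None
        for _ in range(max_polls):  # wait in a loop for the server's data
            data = poll_server()
            if data is not None:
                break
            time.sleep(poll_interval_s)
        with cond:
            state["received"] = data
            state["finished"] = True
            cond.notify_all()  # wake thread two

    def thread_two():
        with cond:
            cond.wait_for(lambda: state["finished"])
            if state["received"] is not None:
                state["result"] = update_dom(state["received"])

    t2 = threading.Thread(target=thread_two)
    t1 = threading.Thread(target=thread_one)
    t2.start()
    t1.start()
    t1.join()
    t2.join()
    return state["received"] is not None, state["result"]
```

The boolean in the return value plays the part of the reception-status interface: a caller can skip DOM export when it is false.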
It should be noted that the information exchange and execution processes among the units of the above apparatus and system are based on the same concept as the method embodiments of the invention; for details, refer to the description of the method embodiments, which is not repeated here.
A person of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
The dynamic webpage data acquisition method based on the WebKit browser engine provided by the embodiments of the invention has been described in detail above. Specific examples are used herein to explain the principle and embodiments of the invention; the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. A person of ordinary skill in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the contents of this description should not be construed as limiting the invention.

Claims (3)

1. A dynamic webpage data acquisition method based on the WebKit browser engine, characterized by comprising:
sending an HTTP request to the server, receiving the original page data and building a DOM tree, wherein sending the HTTP request, receiving the original page data, parsing the js and building the DOM tree are implemented by the underlying WebKit layer;
maintaining a configuration file for each website, wherein the configuration file contains js code that triggers the corresponding events, the js code is passed as a string to the js execution interface provided by WebKit, and WebKit responds to the events and updates the DOM tree;
calling the I/O interface of WebKit to convert the DOM tree into HTML and output it as a string.
2. The dynamic webpage data acquisition method based on the WebKit browser engine according to claim 1, characterized in that the method uses three threads to receive data, specifically:
Thread one performs normal data reception and listens for the loadFinished signal; if reception completes normally, it terminates thread two and sets the reception flag to the success state;
Thread two is a timer thread that monitors the reception time; if the predetermined reception time is exceeded, reception is deemed to have timed out, and thread two stops the normal receiving thread and sets the reception flag to the failure state;
Thread three exposes an external interface that reports the state of the reception flag, so that subsequent steps can act on this flag.
3. The dynamic webpage data acquisition method based on the WebKit browser engine according to claim 2, characterized in that triggering webpage events to update the DOM tree comprises:
Thread one runs the js code in the configuration file to simulate the operations that trigger events, and waits in a loop for data from the server; when the data has been received, it wakes thread two and exposes a reception-status interface; thread two waits for thread one's data and then completes the DOM update.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011101618008A CN102214098A (en) 2011-06-15 2011-06-15 Dynamic webpage data acquisition method based on WebKit browser engine

Publications (1)

Publication Number Publication Date
CN102214098A true CN102214098A (en) 2011-10-12

Family

ID=44745419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011101618008A Pending CN102214098A (en) 2011-06-15 2011-06-15 Dynamic webpage data acquisition method based on WebKit browser engine

Country Status (1)

Country Link
CN (1) CN102214098A (en)

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103139260A (en) * 2011-11-30 2013-06-05 国际商业机器公司 Method and system for reusing hypertext markup language (HTML) content
CN103365919A (en) * 2012-04-09 2013-10-23 纽海信息技术(上海)有限公司 Webpage analysis container and method
WO2013159745A1 (en) * 2012-04-28 2013-10-31 广州市动景计算机科技有限公司 Webpage browsing method, webapp framework, method and device for executing javascript and mobile terminal
CN103973805A (en) * 2014-05-20 2014-08-06 浪潮电子信息产业股份有限公司 Interaction method of dynamic web page and server
CN105204922A (en) * 2014-06-30 2015-12-30 金电联行(北京)信息技术有限公司 Collecting method of client terminal of data collecting platform
CN105512193A (en) * 2015-11-26 2016-04-20 上海携程商务有限公司 Data acquisition system and method based on browser expansion
CN105630512A (en) * 2016-02-17 2016-06-01 北京高绎信息技术有限公司 Method and system for implementing mobile device data tracking through software development toolkit
CN105630473A (en) * 2014-11-03 2016-06-01 中国科学院声学研究所 JavaScript event extension method supporting asynchronous call
CN105989134A (en) * 2015-02-26 2016-10-05 小米科技有限责任公司 Webpage recording method and device
CN106649567A (en) * 2016-11-15 2017-05-10 杭州安恒信息技术有限公司 Web crawler system based on browser kernel
CN106951450A (en) * 2017-02-22 2017-07-14 北京麒麟合盛网络技术有限公司 A kind of webpage information acquisition method, device and computing device
CN106997374A (en) * 2017-01-05 2017-08-01 深圳大宇无限科技有限公司 Deep linking acquisition methods and device
CN107025124A (en) * 2015-06-24 2017-08-08 上海中信信息发展股份有限公司 Web technologies develop the system architecture of desktop
CN107766509A (en) * 2017-10-23 2018-03-06 北京京东尚科信息技术有限公司 A kind of method and apparatus of webpage static backup
CN108255802A (en) * 2016-12-29 2018-07-06 北京国双科技有限公司 Generic text Analytical framework and the method and apparatus based on framework parsing text
CN109542437A (en) * 2018-11-16 2019-03-29 北京科罗菲特科技有限公司 A kind of HMI development approach based on Linux built-in browser
CN109582353A (en) * 2017-09-26 2019-04-05 北京国双科技有限公司 The method and device of embedding data acquisition code
CN109800369A (en) * 2018-12-14 2019-05-24 平安普惠企业管理有限公司 Hybrid app page loading method, device and computer equipment
CN110032493A (en) * 2019-03-13 2019-07-19 平安城市建设科技(深圳)有限公司 Monitoring method, device, terminal and the readable storage medium storing program for executing of the page
CN111125597A (en) * 2019-12-18 2020-05-08 百度在线网络技术(北京)有限公司 Webpage loading method, browser, electronic equipment and storage medium
CN111198998A (en) * 2019-12-31 2020-05-26 北京指掌易科技有限公司 Network page loading method, device and system based on Ajax request
CN111523074A (en) * 2020-04-26 2020-08-11 成都思维世纪科技有限责任公司 Acquisition system for dynamic page sensitive data of front-end rendering website
CN113742550A (en) * 2021-08-20 2021-12-03 广州市易工品科技有限公司 Data acquisition method, device and system based on browser
CN114428635A (en) * 2022-04-06 2022-05-03 杭州未名信科科技有限公司 Data acquisition method and device, electronic equipment and storage medium
CN117032656A (en) * 2023-10-09 2023-11-10 北京优锘科技股份有限公司 WebAssemble-based front-end multithreading encoding and decoding method, medium and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1605987A (en) * 2004-11-17 2005-04-13 中兴通讯股份有限公司 Method for realizing real time threads state monitoring in multiple thread system
CN1700177A (en) * 2005-06-24 2005-11-23 中国人民解放军国防科学技术大学 Method for constructing Web server based on soft flow construction and server thereof
CN101089856A (en) * 2007-07-20 2007-12-19 李沫南 Method for abstracting network data and web reptile system
US7536389B1 (en) * 2005-02-22 2009-05-19 Yahoo! Inc. Techniques for crawling dynamic web content

Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9507759B2 (en) 2011-11-30 2016-11-29 International Business Machines Corporation Method and system for reusing HTML content
CN103139260B (en) * 2011-11-30 2015-09-30 国际商业机器公司 For reusing the method and system of HTML content
US10318616B2 (en) 2011-11-30 2019-06-11 International Business Machines Corporation Method and system for reusing HTML content
CN103139260A (en) * 2011-11-30 2013-06-05 国际商业机器公司 Method and system for reusing hypertext markup language (HTML) content
CN103365919A (en) * 2012-04-09 2013-10-23 纽海信息技术(上海)有限公司 Webpage analysis container and method
CN103365919B (en) * 2012-04-09 2018-07-31 北京京东尚科信息技术有限公司 Web analysis container and method
WO2013159745A1 (en) * 2012-04-28 2013-10-31 广州市动景计算机科技有限公司 Webpage browsing method, webapp framework, method and device for executing javascript and mobile terminal
US10185704B2 (en) 2012-04-28 2019-01-22 Guangzhou Ucweb Computer Technology Co., Ltd. Webpage browsing method, webapp framework, method and device for executing javascript and mobile terminal
RU2604326C2 (en) * 2012-04-28 2016-12-10 Гуанчжоу Юсивэб Компьютер Тэкнолоджи Ко., Лтд Webpage browsing method, webapp framework, method and device for executing javascript and mobile terminal
CN103973805A (en) * 2014-05-20 2014-08-06 浪潮电子信息产业股份有限公司 Interaction method of dynamic web page and server
CN105204922B (en) * 2014-06-30 2018-12-07 金电联行(北京)信息技术有限公司 A kind of data acquisition platform client acquisition method
CN105204922A (en) * 2014-06-30 2015-12-30 金电联行(北京)信息技术有限公司 Collecting method of client terminal of data collecting platform
CN105630473B (en) * 2014-11-03 2019-01-22 中国科学院声学研究所 Support the JavaScript event extended method of asynchronous call
CN105630473A (en) * 2014-11-03 2016-06-01 中国科学院声学研究所 JavaScript event extension method supporting asynchronous call
CN105989134A (en) * 2015-02-26 2016-10-05 小米科技有限责任公司 Webpage recording method and device
CN107025124A (en) * 2015-06-24 2017-08-08 上海中信信息发展股份有限公司 Web technologies develop the system architecture of desktop
CN105512193A (en) * 2015-11-26 2016-04-20 上海携程商务有限公司 Data acquisition system and method based on browser expansion
CN105630512A (en) * 2016-02-17 2016-06-01 北京高绎信息技术有限公司 Method and system for implementing mobile device data tracking through software development toolkit
WO2017140227A1 (en) * 2016-02-17 2017-08-24 北京高绎信息技术有限公司 Method and system for realizing mobile device data tracking using software development kit
CN106649567A (en) * 2016-11-15 2017-05-10 杭州安恒信息技术有限公司 Web crawler system based on browser kernel
CN108255802B (en) * 2016-12-29 2021-08-24 北京国双科技有限公司 Universal text parsing architecture and method and device for parsing text based on architecture
CN108255802A (en) * 2016-12-29 2018-07-06 北京国双科技有限公司 Generic text Analytical framework and the method and apparatus based on framework parsing text
CN106997374A (en) * 2017-01-05 2017-08-01 深圳大宇无限科技有限公司 Deep linking acquisition methods and device
CN106951450A (en) * 2017-02-22 2017-07-14 北京麒麟合盛网络技术有限公司 A kind of webpage information acquisition method, device and computing device
CN106951450B (en) * 2017-02-22 2020-04-07 麒麟合盛网络技术股份有限公司 Webpage information acquisition method and device and computing equipment
CN109582353A (en) * 2017-09-26 2019-04-05 北京国双科技有限公司 The method and device of embedding data acquisition code
CN107766509A (en) * 2017-10-23 2018-03-06 北京京东尚科信息技术有限公司 A kind of method and apparatus of webpage static backup
CN109542437A (en) * 2018-11-16 2019-03-29 北京科罗菲特科技有限公司 A kind of HMI development approach based on Linux built-in browser
CN109800369A (en) * 2018-12-14 2019-05-24 平安普惠企业管理有限公司 Hybrid app page loading method, device and computer equipment
CN110032493A (en) * 2019-03-13 2019-07-19 平安城市建设科技(深圳)有限公司 Monitoring method, device, terminal and the readable storage medium storing program for executing of the page
CN111125597A (en) * 2019-12-18 2020-05-08 百度在线网络技术(北京)有限公司 Webpage loading method, browser, electronic equipment and storage medium
CN111125597B (en) * 2019-12-18 2023-10-27 百度在线网络技术(北京)有限公司 Webpage loading method, browser, electronic equipment and storage medium
CN111198998A (en) * 2019-12-31 2020-05-26 北京指掌易科技有限公司 Network page loading method, device and system based on Ajax request
CN111198998B (en) * 2019-12-31 2023-08-08 北京指掌易科技有限公司 Method, device and system for loading network page based on Ajax request
CN111523074A (en) * 2020-04-26 2020-08-11 成都思维世纪科技有限责任公司 Acquisition system for dynamic page sensitive data of front-end rendering website
CN113742550A (en) * 2021-08-20 2021-12-03 广州市易工品科技有限公司 Data acquisition method, device and system based on browser
CN113742550B (en) * 2021-08-20 2024-04-19 广州市易工品科技有限公司 Browser-based data acquisition method, device and system
CN114428635A (en) * 2022-04-06 2022-05-03 杭州未名信科科技有限公司 Data acquisition method and device, electronic equipment and storage medium
CN117032656A (en) * 2023-10-09 2023-11-10 北京优锘科技股份有限公司 WebAssemble-based front-end multithreading encoding and decoding method, medium and device
CN117032656B (en) * 2023-10-09 2024-02-02 北京优锘科技股份有限公司 WebAssemble-based front-end multithreading encoding and decoding method, medium and device

Similar Documents

Publication Publication Date Title
CN102214098A (en) Dynamic webpage data acquisition method based on WebKit browser engine
US10216855B2 (en) Mobilizing an existing web application
CN101127038B (en) System and method for downloading website static web page
CN101122921B (en) Method forming tree-shaped display structure based on ajax and html
CN102693280B (en) Webpage browsing method, WebApp framework, method and device for executing JavaScript, and mobile terminal
CN102184266B (en) Method for automatically generating dynamic wireless application protocol (WAP) website for separation of page from data
CN103268361B (en) Extracting method, the device and system of URL are hidden in webpage
CN103412890A (en) Webpage loading method and device
CN103034724B (en) Browser is carried out input the method and device that data are recovered
CN105243159A (en) Visual script editor-based distributed web crawler system
CN102194003A (en) Web page popup window method and device
CN102591647A (en) Converting desktop applications to web applications
CN101799753B (en) Method and device for realizing tree structure
CN103034568A (en) Method and device for recovering input data of browser
CA2911670A1 (en) System and method for identifying web elements present on a web-page
CN104049991A (en) Method and system for converting network applications into mobile applications
CN102520966B (en) Method for prompting codes and device
CN103577599A (en) Method and device for storing local data through mobile terminal
CN103019538A (en) Method and system for implementing application interface in terminal
CN103645908A (en) Full life circle development achievement system of internetware
CN106897347A (en) A kind of web page display method, Action Events recording method and device
CN105528369B (en) Webpage code-transferring method, device and server
CN102830974A (en) Visual auxiliary development tool for rapid generation of JAVA codes
CN103853717A (en) Web crawler
CN101876998A (en) Method and system for editing data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20111012