CN102214098A - Dynamic webpage data acquisition method based on WebKit browser engine - Google Patents
- Publication number: CN102214098A
- Application number: CN2011101618008A
- Authority: CN (China)
- Prior art keywords: webkit, thread, data, DOM tree, server
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a dynamic webpage data acquisition method based on the WebKit browser engine. The method comprises the following steps: sending an HTTP request to a server, receiving the original webpage data, and constructing a Document Object Model (DOM) tree, where sending the HTTP request, receiving the original webpage data, parsing the JS, and constructing the DOM tree are all performed by the WebKit lower layer; maintaining a configuration file for each website, where the configuration file contains JS code that triggers the relevant events and is passed as a string to the JS execution interface provided by WebKit, and WebKit updates the DOM tree in response to those events; and calling the I/O interface of WebKit to convert the DOM tree to HTML and output it as a string. The method meets the requirement for extensibility through configuration files, realizes asynchronous parallel processing between browser and server, relieves the burden on the server, and enhances the user experience.
Description
Technical field
The present invention relates to the field of computer information technology, and in particular to a dynamic webpage data acquisition method based on the WebKit browser engine.
Background art
With the rise of Web 2.0, AJAX (Asynchronous JavaScript and XML) became widely popular. Asynchronous interaction between client and server both reduces the load on the server and gives users a better experience. However, the large number of dynamic web pages produced with this technique poses a new problem for collecting network data: traditional Web data acquisition tools built for static pages, such as web crawlers, capture far less content than the page actually presents. The useful information in many dynamic pages cannot be obtained, so work that takes network data as its main object of study cannot proceed smoothly, which has seriously hampered fields such as Web content monitoring and Web data mining.
How to improve traditional Web data acquisition systems so that they can parse dynamic pages has therefore become a research focus in information acquisition. Researchers in the Internet field have made many useful attempts at this problem and have proposed constructive ideas and solutions. There are currently two main approaches to collecting dynamic pages. The first uses the interface of an open-source browser (such as Firefox) and collects the browser's output by means of a plug-in. The second uses an existing script engine (such as SpiderMonkey or Rhino), binds the relevant DOM (Document Object Model) objects as the acquisition task requires, and collects the output. Current research still has shortcomings, however. First, it mainly aims at general methods for large-scale web crawlers to crawl dynamic pages, and it serves targeted acquisition tasks poorly (for example, collecting data from a particular forum or product information from an e-commerce site). Second, most schemes are complex to implement and are ill-suited to small-scale, on-demand data acquisition.
For these reasons, this document extends a simple crawler for forum-like structured data and proposes a scheme for collecting dynamic page data based on the WebKit browser engine. Using the Qt framework (a cross-platform C++ graphical user interface library) gives the program good reliability and platform independence; separating the interface from the configuration files makes the program easy to extend; and a wait-timeout mechanism designed for complex network conditions makes the program considerably more robust.
Summary of the invention
This document extends a simple crawler for forum-like structured data and proposes a scheme for collecting dynamic page data based on the WebKit browser engine. Using the Qt framework gives the program good reliability and platform independence; separating the interface from the configuration files makes the program easy to extend; and a wait-timeout mechanism designed for complex network conditions makes the program considerably more robust.
An embodiment of the invention provides a dynamic webpage data acquisition method based on the WebKit browser engine, comprising:
Sending an HTTP request to the server, receiving the original page data, and building the DOM tree, where sending the HTTP request, receiving the original page data, parsing the JS, and building the DOM tree are performed by the WebKit lower layer;
Maintaining a configuration file for each website, where the configuration file contains JS code that triggers the relevant events and is passed as a string to the JS execution interface provided by WebKit, and WebKit updates the DOM tree in response to those events;
Calling the I/O interface of WebKit to convert the DOM tree to HTML and output it as a string.
The method uses three threads to receive data, specifically:
Thread one handles normal data reception and listens for the loadFinished signal. If the data is received normally, it terminates thread two and sets the reception flag to the success state;
Thread two is a timer thread that monitors the reception time. If the predetermined reception time is exceeded, it treats the request as timed out, stops the normal receiving thread, and sets the reception flag to the failure state;
Thread three exposes an external interface that reports the state of the reception flag, so that subsequent steps can act on it.
Triggering a webpage event to update the DOM tree comprises:
Thread one runs the JS code from the configuration file to simulate triggering the event, then waits in a loop for data from the server. Once the data has been received, it wakes thread two and exposes a reception-status interface to callers. Thread two waits for thread one's data and then completes the DOM update.
This document presents a scheme for collecting dynamic page data based on the WebKit browser engine and explains its overall structure and key workflows in detail. Tests on several forums and e-commerce sites verified that the method is feasible and efficient. The wait-timeout mechanism makes the program more robust and able to cope with fairly complex network environments, and the configuration files meet the need for extensibility. Its strength lies in building more dynamic and more responsive Web applications and in achieving asynchronous parallel processing between browser and server, which both eases the load on the server and improves the user experience. The scheme is a useful practical reference for on-demand, small- to medium-scale collection of dynamic page data.
Description of drawings
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the Web application model of the prior art;
Fig. 2 is a schematic diagram of the AJAX application model in an embodiment of the invention;
Fig. 3 is a schematic diagram of the WebKit browser engine in an embodiment of the invention;
Fig. 4 is a diagram of the relationships among the QWebView, QWebPage, and QWebFrame classes in an embodiment of the invention;
Fig. 5 is a schematic diagram of the architecture for collecting forum-like structured data in an embodiment of the invention;
Fig. 6 is a schematic diagram of the dynamic page acquisition module in an embodiment of the invention;
Fig. 7 is a schematic diagram of the interfaces of the QWebPage class in an embodiment of the invention;
Fig. 8 is a flowchart of data reception in an embodiment of the invention;
Fig. 9 is a flowchart of triggering webpage events in an embodiment of the invention.
Embodiment
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that those of ordinary skill in the art obtain from these embodiments without creative effort fall within the scope of protection of the invention.
2.1 AJAX definition and key technologies
AJAX is short for Asynchronous JavaScript and XML and was first proposed in 2005 by the well-known user experience expert Jesse James Garrett. AJAX is not a new technology but a combination of Web technologies already in wide use, such as XML, CSS, DOM, XMLHttpRequest, and JavaScript. Its strength lies in building more dynamic and more responsive Web applications and in achieving asynchronous parallel processing between browser and server, which both eases the load on the server and improves the user experience. Standard AJAX comprises:
(1) standards-based presentation using XHTML and CSS;
(2) dynamic display and interaction using the DOM;
(3) data interchange and manipulation using XML and XSLT;
(4) asynchronous data retrieval using XMLHttpRequest;
(5) JavaScript to bind everything together and process the data.
2.2 Differences between the AJAX model and the traditional Web application model
Unlike a traditional Web application, an AJAX application is not built from static pages. It consists of a smaller number of pages, each of which is a small AJAX component written in JavaScript. These components use the XMLHttpRequest object to communicate with the server asynchronously and, once they have obtained the data they need, use the DOM API to update the page content.
In the traditional Web application model, the client browser typically sends an HTTP request to the Web server, which processes the request and returns the result to the browser as an HTML page. The user must wait while the request is sent and processed, and even if only a small part of the page changes, the Web server returns a complete page, wasting time and bandwidth. The traditional Web application model is shown in Fig. 1.
AJAX differs from the traditional model in that the browser and the Web server communicate asynchronously. An intermediate layer, the AJAX engine, is added between client and server to handle client requests and make the interaction asynchronous. Not every user action is submitted to the server: some data validation and processing is done by the AJAX engine, which sends a request to the server only when new data must be read from it. The AJAX engine fetches the required data in the background and updates only the affected part of the page instead of reloading the whole page. This greatly reduces the volume of data transferred and shortens response time, which both eases the load on the server and improves the user experience. The AJAX application model is shown in Fig. 2.
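The saving from partial updates can be made concrete with a small sketch. It is illustrative only: the `POSTS` data and the functions `render_full_page`, `render_fragment`, and `apply_fragment` are hypothetical names, the fragment is encoded as JSON rather than XML for brevity, and a plain dictionary stands in for the DOM.

```python
import json

# Hypothetical forum data: twenty posts already shown on the page.
POSTS = [{"id": i, "title": f"Post {i}"} for i in range(1, 21)]


def render_full_page(posts):
    """Traditional model: the server returns the whole page on every change."""
    rows = "".join(f"<li id='p{p['id']}'>{p['title']}</li>" for p in posts)
    return f"<html><head><title>Forum</title></head><body><ul>{rows}</ul></body></html>"


def render_fragment(post):
    """AJAX model: the server returns only the data that changed."""
    return json.dumps(post)


def apply_fragment(dom, fragment_json):
    """Client side: update only the affected entry of the (toy) DOM."""
    post = json.loads(fragment_json)
    dom[post["id"]] = post["title"]
    return dom
```

Comparing `len(render_fragment(new_post))` with `len(render_full_page(POSTS + [new_post]))` shows how much less data the asynchronous model has to move for a single new post.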
2.3 The WebKit browser engine
WebKit, which descends from the KDE project's KHTML, is an open-source Web browser engine, in other words the core of a browser. Apple's Safari, Google's Chrome, the Nokia S60 browser, and the default browser on Android phones all use WebKit as their engine, which makes it one of the three mainstream browser engines alongside Gecko and Trident. The engine is efficient and stable, has good compatibility, and its source code is clearly structured and easy to maintain. In terms of code structure, WebKit consists of three main parts, as shown in Fig. 3.
The core part is WebCore, which implements the document model, including CSS, DOM, and rendering. JavaScriptCore provides JavaScript support. The WebKit part abstracts and implements the concepts that correspond directly to a browser, such as WebView, WebPage, and WebFrame. An application does not need to manipulate WebCore or JavaScriptCore directly; it interacts with them through the API that the WebKit module provides.
2.4 The Qt development framework and its support for WebKit
Qt is a well-known cross-platform C++ application development framework. Since WebKit was integrated in Qt 4.5, its rich set of general-purpose interfaces has blurred the line between applications and Web content. Its support for WebKit consists mainly of the classes listed in Table 1.
Table 1. Classes in Qt that support WebKit
The most important of these classes are QWebView, QWebPage, and QWebFrame. Their relationships are shown in Fig. 4.
A QWebView contains QWebPage and QWebFrame objects. QWebView creates a QWebPage object and through it a web page that can be viewed and edited. QWebFrame is a child object of QWebPage: every QWebPage object has at least one QWebFrame, called the main frame, which can be obtained through the QWebPage::mainFrame() method. Conversely, calling the page() method of a QWebFrame returns the QWebPage object that contains it.
2.5 Overall framework for collecting forum-like structured data
Forums and e-commerce sites generally organize their data in two layers, an arrangement referred to here as a forum-like structure. In a typical forum-like structure the first layer is a list page, and the second-layer pages carry the bulk of the data to be collected. Data of this kind has a relatively fixed structure and is large in volume and fairly concentrated, so it has high research value. Current systems for crawling such data generally adopt the architecture shown in Fig. 5.
The task scheduling module maintains the crawling strategy and distributes tasks. The list-page URL extraction module extracts URLs from list pages according to per-site templates and maintains the crawl queue. The page capture module accesses servers over HTTP and collects pages according to the crawling strategy; because it needs network access, it must handle network failures such as timeouts and dropped connections. The data storage module maintains the storage system and performs a large number of I/O operations on the database or file system.
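As a rough illustration of the list-page URL extraction module, the sketch below pulls second-layer URLs out of a list page and feeds them to a crawl queue. The template format (a single `link_class` key naming the CSS class of topic links), the function names, and the `forum.example` host are assumptions made for this example; the source does not describe the template format.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class ListPageLinkParser(HTMLParser):
    """Collects href values of <a> tags whose class matches the site template."""

    def __init__(self, link_class):
        super().__init__()
        self.link_class = link_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.link_class in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


def extract_thread_urls(list_page_html, base_url, template):
    """Return absolute URLs of second-layer pages found in a list page."""
    parser = ListPageLinkParser(template["link_class"])
    parser.feed(list_page_html)
    return [urljoin(base_url, href) for href in parser.links]


def enqueue_new(queue, seen, urls):
    """Maintain the crawl queue: append only URLs not seen before."""
    for url in urls:
        if url not in seen:
            seen.add(url)
            queue.append(url)
```

A scheduler could then pop URLs from the `deque` and hand them to the page capture module; keeping the `seen` set separate from the queue avoids re-crawling pages that several list pages link to.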
2.6 Overall design
The dynamic page acquisition module extends and improves on the traditional static page capture module. Its key steps are shown in Fig. 6.
The whole flow is divided into three major steps:
(1) Send an HTTP request to the server, receive the original page data, and build the DOM tree. In this step, sending the request, receiving the data, parsing the JS, and building the DOM tree are all performed by the WebKit lower layer.
(2) Maintain a configuration file for each website. The configuration file contains JS code that triggers the relevant events and is passed as a string to the JS execution interface provided by WebKit. WebKit updates the DOM tree in response to those events.
(3) Call the I/O interface of WebKit to convert the DOM tree to HTML and output it as a string.
The module relies mainly on the interfaces that QWebPage provides. The QWebPage class can be treated as a black box, as shown in Fig. 7.
The HTTP request interface sends the request, and the QWebPage lower layer handles operations such as receiving the data. The parameter-setting interface configures options such as the proxy, whether images are loaded automatically, and whether JS may be executed. The JS call interface reads in local JS code, which the QWebPage lower layer parses and executes. The DOM export interface exports the DOM as an HTML-format string.
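The three steps and the four black-box interfaces can be sketched as follows. This is not Qt code: `StubEngine` is a stand-in written for this example, its method names (`configure`, `load`, `evaluate_js`, `to_html`) are hypothetical, and the JSON layout of the per-site configuration is an assumption. A real implementation would put a QWebPage-based wrapper behind the same four operations.

```python
import json
from urllib.parse import urlparse

# Hypothetical per-site configuration: host name -> JS snippets to run after load.
SITE_CONFIG_JSON = """
{
  "forum.example": {"scripts": ["document.getElementById('more').click();"]}
}
"""


class StubEngine:
    """Stand-in for the browser engine; it only records what it was asked to do."""

    def __init__(self):
        self.settings = {}
        self.nodes = []  # flat list standing in for the DOM tree

    def configure(self, **settings):   # parameter-setting interface
        self.settings.update(settings)

    def load(self, url):               # HTTP request interface
        self.nodes = [f"initial content of {url}"]

    def evaluate_js(self, source):     # JS call interface
        self.nodes.append(f"content added by: {source}")

    def to_html(self):                 # DOM export interface
        body = "".join(f"<p>{node}</p>" for node in self.nodes)
        return f"<html><body>{body}</body></html>"


def acquire(engine, url, site_configs):
    """Run the three steps: load the page, trigger configured events, export HTML."""
    engine.load(url)                                        # step 1
    host = urlparse(url).hostname
    for script in site_configs.get(host, {}).get("scripts", []):
        engine.evaluate_js(script)                          # step 2
    return engine.to_html()                                 # step 3
```

Because `acquire` touches the engine only through these four operations, supporting a new website means adding an entry to the configuration, not changing the code, which is the extensibility argument the text makes.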
2.7 Data reception flow
Network conditions are relatively complex: server timeouts, dropped connections, and similar problems can disrupt normal data acquisition, so abnormal conditions must be handled during the data reception phase. WebKit, however, does not provide an interface that meets this need. Data reception is therefore implemented with three threads. The detailed flow is shown in Fig. 8.
Thread one handles normal data reception and listens for the loadFinished signal. If the data is received normally, it terminates thread two and sets the reception flag to the success state.
Thread two is a timer thread that monitors the reception time. If the predetermined reception time is exceeded, it treats the request as timed out, stops the normal receiving thread, and sets the reception flag to the failure state.
Thread three exposes an external interface that reports the state of the reception flag, so that subsequent steps can act on it.
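A minimal sketch of this wait-timeout mechanism, using Python's standard `threading` module in place of Qt threads and signals, might look like the following. The names `ReceiveStatus` and `receive_with_timeout` are hypothetical, and `load_page` is any callable standing in for the blocking page load. Two simplifications are worth noting: Python cannot forcibly stop a thread, so a late result is simply ignored instead of the receiver being terminated, and the calling thread plays the role of thread three by reading the flag.

```python
import threading
import time


class ReceiveStatus:
    """Thread-safe reception flag; only the first outcome is recorded."""

    PENDING, SUCCESS, FAILED = "pending", "success", "failed"

    def __init__(self):
        self._lock = threading.Lock()
        self._state = self.PENDING
        self._data = None
        self.settled = threading.Event()

    def settle(self, state, data=None):
        with self._lock:
            if self._state != self.PENDING:
                return False           # a result or a timeout already won
            self._state, self._data = state, data
        self.settled.set()
        return True

    def snapshot(self):
        with self._lock:
            return self._state, self._data


def receive_with_timeout(load_page, timeout_s):
    status = ReceiveStatus()

    def receiver():                    # "thread one": normal reception
        try:
            data = load_page()
        except Exception:
            status.settle(ReceiveStatus.FAILED)
        else:
            status.settle(ReceiveStatus.SUCCESS, data)

    def timer():                       # "thread two": timeout watchdog
        # wait() returns False only if the flag was not settled in time.
        if not status.settled.wait(timeout_s):
            status.settle(ReceiveStatus.FAILED)

    threading.Thread(target=receiver, daemon=True).start()
    threading.Thread(target=timer, daemon=True).start()
    status.settled.wait()              # "thread three" role: read the flag
    return status.snapshot()
```

Settling the flag under a lock means that when the result and the timeout arrive almost together, exactly one of them decides the outcome, so later steps never see a success state without data.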
2.8 Flow for triggering webpage events to update the DOM
On some pages, the desired data can be obtained only after certain events on the page are triggered and the page interacts with the server. For example, a user browsing a page often has to click a button before the data appears. Automated data acquisition therefore requires the program to simulate the behavior of a real user, such as clicking the mouse or scrolling the page. Such operations can be implemented by customizing a configuration file for each website and writing JS code in it that simulates triggering the event. The program provides an interface for calling local JS code; the JS code is passed to WebKit as a string, and the lower layers of the engine trigger the event and update the DOM. To keep the interaction with the server robust, this function is implemented with two threads. The detailed flow is shown in Fig. 9.
Thread one runs the JS code from the configuration file to simulate triggering the event, then waits in a loop for data from the server. Once the data has been received, it wakes thread two and exposes a reception-status interface to callers. Thread two waits for thread one's data and then completes the DOM update.
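The two-thread hand-off can be sketched with a `threading.Event` as the wake-up signal. The function name `trigger_and_update` and its three callables (`run_script`, `data_ready`, `update_dom`) are hypothetical placeholders for the engine calls, and the `max_wait_s` cap on the polling loop is an assumption added here so that the loop cannot spin forever; the source only says that thread one waits in a loop.

```python
import threading
import time


def trigger_and_update(run_script, data_ready, update_dom,
                       poll_s=0.01, max_wait_s=1.0):
    received = threading.Event()
    outcome = {"received": False, "dom": None}

    def thread_one():
        run_script()                         # simulate the click or scroll
        deadline = time.monotonic() + max_wait_s
        while time.monotonic() < deadline:   # loop until server data arrives
            if data_ready():
                outcome["received"] = True
                break
            time.sleep(poll_s)
        received.set()                       # wake thread two in either case

    def thread_two():
        received.wait()                      # sleep until thread one signals
        if outcome["received"]:
            outcome["dom"] = update_dom()    # finish the DOM update

    t1 = threading.Thread(target=thread_one)
    t2 = threading.Thread(target=thread_two)
    t2.start()
    t1.start()
    t1.join()
    t2.join()
    return outcome
```

The returned dictionary doubles as the reception-status indication: callers can check `outcome["received"]` before using `outcome["dom"]`.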
This document presents a scheme for collecting dynamic page data based on the WebKit browser engine and explains its overall structure and key workflows in detail. Tests on several forums and e-commerce sites verified that the method is feasible and efficient. The wait-timeout mechanism makes the program more robust and able to cope with fairly complex network environments, and the configuration files meet the need for extensibility.
Its strength lies in building more dynamic and more responsive Web applications and in achieving asynchronous parallel processing between browser and server, which both eases the load on the server and improves the user experience. The scheme is a useful practical reference for on-demand, small- to medium-scale collection of dynamic page data.
It should be noted that the information exchange and execution processes among the units of the above apparatus and system are based on the same concept as the method embodiments of the invention. For details, refer to the description of the method embodiments; they are not repeated here.
Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments can be carried out by a program that instructs the relevant hardware. The program can be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
The dynamic webpage data acquisition method based on the WebKit browser engine provided by the embodiments of the invention has been described in detail above. Specific examples have been used to explain the principles and implementation of the invention, and the description of the embodiments is intended only to aid understanding of the method and its core idea. Those of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and scope of application. In summary, this description should not be construed as limiting the invention.
Claims (3)
1. A dynamic webpage data acquisition method based on the WebKit browser engine, characterized in that it comprises:
Sending an HTTP request to the server, receiving the original page data, and building the DOM tree, where sending the HTTP request, receiving the original page data, parsing the JS, and building the DOM tree are performed by the WebKit lower layer;
Maintaining a configuration file for each website, where the configuration file contains JS code that triggers the relevant events and is passed as a string to the JS execution interface provided by WebKit, and WebKit updates the DOM tree in response to those events;
Calling the I/O interface of WebKit to convert the DOM tree to HTML and output it as a string.
2. The dynamic webpage data acquisition method based on the WebKit browser engine according to claim 1, characterized in that the method uses three threads to receive data, specifically:
Thread one handles normal data reception and listens for the loadFinished signal. If the data is received normally, it terminates thread two and sets the reception flag to the success state;
Thread two is a timer thread that monitors the reception time. If the predetermined reception time is exceeded, it treats the request as timed out, stops the normal receiving thread, and sets the reception flag to the failure state;
Thread three exposes an external interface that reports the state of the reception flag, so that subsequent steps can act on it.
3. The dynamic webpage data acquisition method based on the WebKit browser engine according to claim 2, characterized in that triggering a webpage event to update the DOM tree comprises:
Thread one runs the JS code from the configuration file to simulate triggering the event, then waits in a loop for data from the server. Once the data has been received, it wakes thread two and exposes a reception-status interface to callers. Thread two waits for thread one's data and then completes the DOM update.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011101618008A CN102214098A (en) | 2011-06-15 | 2011-06-15 | Dynamic webpage data acquisition method based on WebKit browser engine |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011101618008A CN102214098A (en) | 2011-06-15 | 2011-06-15 | Dynamic webpage data acquisition method based on WebKit browser engine |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102214098A (en) | 2011-10-12 |
Family
ID=44745419
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011101618008A Pending CN102214098A (en) | 2011-06-15 | 2011-06-15 | Dynamic webpage data acquisition method based on WebKit browser engine |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102214098A (en) |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103139260A (en) * | 2011-11-30 | 2013-06-05 | 国际商业机器公司 | Method and system for reusing hypertext markup language (HTML) content |
CN103365919A (en) * | 2012-04-09 | 2013-10-23 | 纽海信息技术(上海)有限公司 | Webpage analysis container and method |
WO2013159745A1 (en) * | 2012-04-28 | 2013-10-31 | 广州市动景计算机科技有限公司 | Webpage browsing method, webapp framework, method and device for executing javascript and mobile terminal |
CN103973805A (en) * | 2014-05-20 | 2014-08-06 | 浪潮电子信息产业股份有限公司 | Interaction method of dynamic web page and server |
CN105204922A (en) * | 2014-06-30 | 2015-12-30 | 金电联行(北京)信息技术有限公司 | Collecting method of client terminal of data collecting platform |
CN105512193A (en) * | 2015-11-26 | 2016-04-20 | 上海携程商务有限公司 | Data acquisition system and method based on browser expansion |
CN105630512A (en) * | 2016-02-17 | 2016-06-01 | 北京高绎信息技术有限公司 | Method and system for implementing mobile device data tracking through software development toolkit |
CN105630473A (en) * | 2014-11-03 | 2016-06-01 | 中国科学院声学研究所 | JavaScript event extension method supporting asynchronous call |
CN105989134A (en) * | 2015-02-26 | 2016-10-05 | 小米科技有限责任公司 | Webpage recording method and device |
CN106649567A (en) * | 2016-11-15 | 2017-05-10 | 杭州安恒信息技术有限公司 | Web crawler system based on browser kernel |
CN106951450A (en) * | 2017-02-22 | 2017-07-14 | 北京麒麟合盛网络技术有限公司 | A kind of webpage information acquisition method, device and computing device |
CN106997374A (en) * | 2017-01-05 | 2017-08-01 | 深圳大宇无限科技有限公司 | Deep linking acquisition methods and device |
CN107025124A (en) * | 2015-06-24 | 2017-08-08 | 上海中信信息发展股份有限公司 | Web technologies develop the system architecture of desktop |
CN107766509A (en) * | 2017-10-23 | 2018-03-06 | 北京京东尚科信息技术有限公司 | A kind of method and apparatus of webpage static backup |
CN108255802A (en) * | 2016-12-29 | 2018-07-06 | 北京国双科技有限公司 | Generic text Analytical framework and the method and apparatus based on framework parsing text |
CN109542437A (en) * | 2018-11-16 | 2019-03-29 | 北京科罗菲特科技有限公司 | A kind of HMI development approach based on Linux built-in browser |
CN109582353A (en) * | 2017-09-26 | 2019-04-05 | 北京国双科技有限公司 | The method and device of embedding data acquisition code |
CN109800369A (en) * | 2018-12-14 | 2019-05-24 | 平安普惠企业管理有限公司 | Hybrid app page loading method, device and computer equipment |
CN110032493A (en) * | 2019-03-13 | 2019-07-19 | 平安城市建设科技(深圳)有限公司 | Monitoring method, device, terminal and the readable storage medium storing program for executing of the page |
CN111125597A (en) * | 2019-12-18 | 2020-05-08 | 百度在线网络技术(北京)有限公司 | Webpage loading method, browser, electronic equipment and storage medium |
CN111198998A (en) * | 2019-12-31 | 2020-05-26 | 北京指掌易科技有限公司 | Network page loading method, device and system based on Ajax request |
CN111523074A (en) * | 2020-04-26 | 2020-08-11 | 成都思维世纪科技有限责任公司 | Acquisition system for dynamic page sensitive data of front-end rendering website |
CN113742550A (en) * | 2021-08-20 | 2021-12-03 | 广州市易工品科技有限公司 | Data acquisition method, device and system based on browser |
CN114428635A (en) * | 2022-04-06 | 2022-05-03 | 杭州未名信科科技有限公司 | Data acquisition method and device, electronic equipment and storage medium |
CN117032656A (en) * | 2023-10-09 | 2023-11-10 | 北京优锘科技股份有限公司 | WebAssemble-based front-end multithreading encoding and decoding method, medium and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1605987A (en) * | 2004-11-17 | 2005-04-13 | 中兴通讯股份有限公司 | Method for realizing real time threads state monitoring in multiple thread system |
CN1700177A (en) * | 2005-06-24 | 2005-11-23 | 中国人民解放军国防科学技术大学 | Method for constructing Web server based on soft flow construction and server thereof |
CN101089856A (en) * | 2007-07-20 | 2007-12-19 | 李沫南 | Method for extracting network data and web crawler system |
US7536389B1 (en) * | 2005-02-22 | 2009-05-19 | Yahoo! Inc. | Techniques for crawling dynamic web content |
2011-06-15: CN application CN2011101618008A (patent/CN102214098A/en), status: active, Pending
Cited By (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9507759B2 (en) | 2011-11-30 | 2016-11-29 | International Business Machines Corporation | Method and system for reusing HTML content |
CN103139260B (en) * | 2011-11-30 | 2015-09-30 | 国际商业机器公司 | Method and system for reusing HTML content |
US10318616B2 (en) | 2011-11-30 | 2019-06-11 | International Business Machines Corporation | Method and system for reusing HTML content |
CN103139260A (en) * | 2011-11-30 | 2013-06-05 | 国际商业机器公司 | Method and system for reusing hypertext markup language (HTML) content |
CN103365919A (en) * | 2012-04-09 | 2013-10-23 | 纽海信息技术(上海)有限公司 | Webpage analysis container and method |
CN103365919B (en) * | 2012-04-09 | 2018-07-31 | 北京京东尚科信息技术有限公司 | Webpage analysis container and method |
WO2013159745A1 (en) * | 2012-04-28 | 2013-10-31 | 广州市动景计算机科技有限公司 | Webpage browsing method, webapp framework, method and device for executing javascript and mobile terminal |
US10185704B2 (en) | 2012-04-28 | 2019-01-22 | Guangzhou Ucweb Computer Technology Co., Ltd. | Webpage browsing method, webapp framework, method and device for executing javascript and mobile terminal |
RU2604326C2 (en) * | 2012-04-28 | 2016-12-10 | Гуанчжоу Юсивэб Компьютер Тэкнолоджи Ко., Лтд | Webpage browsing method, webapp framework, method and device for executing javascript and mobile terminal |
CN103973805A (en) * | 2014-05-20 | 2014-08-06 | 浪潮电子信息产业股份有限公司 | Interaction method of dynamic web page and server |
CN105204922B (en) * | 2014-06-30 | 2018-12-07 | 金电联行(北京)信息技术有限公司 | Data acquisition platform client acquisition method |
CN105204922A (en) * | 2014-06-30 | 2015-12-30 | 金电联行(北京)信息技术有限公司 | Collecting method of client terminal of data collecting platform |
CN105630473B (en) * | 2014-11-03 | 2019-01-22 | 中国科学院声学研究所 | JavaScript event extension method supporting asynchronous call |
CN105630473A (en) * | 2014-11-03 | 2016-06-01 | 中国科学院声学研究所 | JavaScript event extension method supporting asynchronous call |
CN105989134A (en) * | 2015-02-26 | 2016-10-05 | 小米科技有限责任公司 | Webpage recording method and device |
CN107025124A (en) * | 2015-06-24 | 2017-08-08 | 上海中信信息发展股份有限公司 | System architecture for desktop development using Web technologies |
CN105512193A (en) * | 2015-11-26 | 2016-04-20 | 上海携程商务有限公司 | Data acquisition system and method based on browser expansion |
CN105630512A (en) * | 2016-02-17 | 2016-06-01 | 北京高绎信息技术有限公司 | Method and system for implementing mobile device data tracking through software development toolkit |
WO2017140227A1 (en) * | 2016-02-17 | 2017-08-24 | 北京高绎信息技术有限公司 | Method and system for realizing mobile device data tracking using software development kit |
CN106649567A (en) * | 2016-11-15 | 2017-05-10 | 杭州安恒信息技术有限公司 | Web crawler system based on browser kernel |
CN108255802B (en) * | 2016-12-29 | 2021-08-24 | 北京国双科技有限公司 | Universal text parsing architecture and method and device for parsing text based on architecture |
CN108255802A (en) * | 2016-12-29 | 2018-07-06 | 北京国双科技有限公司 | Universal text parsing architecture and method and device for parsing text based on architecture |
CN106997374A (en) * | 2017-01-05 | 2017-08-01 | 深圳大宇无限科技有限公司 | Deep link acquisition method and device |
CN106951450A (en) * | 2017-02-22 | 2017-07-14 | 北京麒麟合盛网络技术有限公司 | Webpage information acquisition method, device and computing device |
CN106951450B (en) * | 2017-02-22 | 2020-04-07 | 麒麟合盛网络技术股份有限公司 | Webpage information acquisition method and device and computing equipment |
CN109582353A (en) * | 2017-09-26 | 2019-04-05 | 北京国双科技有限公司 | Method and device for embedding data acquisition code |
CN107766509A (en) * | 2017-10-23 | 2018-03-06 | 北京京东尚科信息技术有限公司 | Method and apparatus for static backup of webpages |
CN109542437A (en) * | 2018-11-16 | 2019-03-29 | 北京科罗菲特科技有限公司 | HMI development method based on Linux built-in browser |
CN109800369A (en) * | 2018-12-14 | 2019-05-24 | 平安普惠企业管理有限公司 | Hybrid app page loading method, device and computer equipment |
CN110032493A (en) * | 2019-03-13 | 2019-07-19 | 平安城市建设科技(深圳)有限公司 | Page monitoring method, device, terminal and readable storage medium |
CN111125597A (en) * | 2019-12-18 | 2020-05-08 | 百度在线网络技术(北京)有限公司 | Webpage loading method, browser, electronic equipment and storage medium |
CN111125597B (en) * | 2019-12-18 | 2023-10-27 | 百度在线网络技术(北京)有限公司 | Webpage loading method, browser, electronic equipment and storage medium |
CN111198998A (en) * | 2019-12-31 | 2020-05-26 | 北京指掌易科技有限公司 | Network page loading method, device and system based on Ajax request |
CN111198998B (en) * | 2019-12-31 | 2023-08-08 | 北京指掌易科技有限公司 | Method, device and system for loading network page based on Ajax request |
CN111523074A (en) * | 2020-04-26 | 2020-08-11 | 成都思维世纪科技有限责任公司 | Acquisition system for dynamic page sensitive data of front-end rendering website |
CN113742550A (en) * | 2021-08-20 | 2021-12-03 | 广州市易工品科技有限公司 | Data acquisition method, device and system based on browser |
CN113742550B (en) * | 2021-08-20 | 2024-04-19 | 广州市易工品科技有限公司 | Browser-based data acquisition method, device and system |
CN114428635A (en) * | 2022-04-06 | 2022-05-03 | 杭州未名信科科技有限公司 | Data acquisition method and device, electronic equipment and storage medium |
CN117032656A (en) * | 2023-10-09 | 2023-11-10 | 北京优锘科技股份有限公司 | WebAssemble-based front-end multithreading encoding and decoding method, medium and device |
CN117032656B (en) * | 2023-10-09 | 2024-02-02 | 北京优锘科技股份有限公司 | WebAssemble-based front-end multithreading encoding and decoding method, medium and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102214098A (en) | | Dynamic webpage data acquisition method based on WebKit browser engine |
US10216855B2 (en) | | Mobilizing an existing web application |
CN101127038B (en) | | System and method for downloading website static web page |
CN101122921B (en) | | Method for forming tree-shaped display structure based on Ajax and HTML |
CN102693280B (en) | | Webpage browsing method, WebApp framework, method and device for executing JavaScript, and mobile terminal |
CN102184266B (en) | | Method for automatically generating dynamic wireless application protocol (WAP) website for separation of page from data |
CN103268361B (en) | | Method, device and system for extracting hidden URLs in webpages |
CN103412890A (en) | | Webpage loading method and device |
CN103034724B (en) | | Method and device for recovering browser input data |
CN105243159A (en) | | Visual script editor-based distributed web crawler system |
CN102194003A (en) | | Web page popup window method and device |
CN102591647A (en) | | Converting desktop applications to web applications |
CN101799753B (en) | | Method and device for realizing tree structure |
CN103034568A (en) | | Method and device for recovering input data of browser |
CA2911670A1 (en) | | System and method for identifying web elements present on a web-page |
CN104049991A (en) | | Method and system for converting network applications into mobile applications |
CN102520966B (en) | | Code prompting method and device |
CN103577599A (en) | | Method and device for storing local data through mobile terminal |
CN103019538A (en) | | Method and system for implementing application interface in terminal |
CN103645908A (en) | | Full life cycle development implementation system for internetware |
CN106897347A (en) | | Web page display method, action event recording method and device |
CN105528369B (en) | | Webpage transcoding method, device and server |
CN102830974A (en) | | Visual auxiliary development tool for rapid generation of JAVA codes |
CN103853717A (en) | | Web crawler |
CN101876998A (en) | | Method and system for editing data |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C12 | Rejection of a patent application after its publication | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20111012 |