CN105162676B - WeChat data acquisition method and system - Google Patents

Info

Publication number
CN105162676B
Authority
CN
China
Prior art keywords
wechat
data
android
account
public
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510363826.9A
Other languages
Chinese (zh)
Other versions
CN105162676A (en)
Inventor
沙灜
包秀国
程工
陈学敏
贺敏
梁棋
马宏远
王卿
庞琳
李雄
刘玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
National Computer Network and Information Security Management Center
Original Assignee
Institute of Information Engineering of CAS
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS and National Computer Network and Information Security Management Center
Priority to CN201510363826.9A
Publication of CN105162676A
Application granted
Publication of CN105162676B
Anticipated expiration

Abstract

The invention discloses a WeChat data acquisition method and system, implemented by combining an Android platform application testing component with a browser testing component. The Android platform testing component simulates user behavior to operate the client, including logging in, viewing, and swiping. The web page address of a WeChat public account's history messages obtained in this way, combined with active collection, yields the account's complete message history. By combining the two components and monitoring changes to browser DOM elements, login to the web version of WeChat can be automated, and WeChat data can be acquired comprehensively, promptly, and effectively.

Description

WeChat data acquisition method and system
Technical field
The present invention relates to the field of social network data collection and to a WeChat data acquisition method and system, and specifically to a WeChat data acquisition method and system based on combining an Android platform application testing component with a browser testing component.
Background
According to statistics, China has more than 900 million mobile phone users. By the end of December 2011, Chinese internet users numbered 356 million and smartphone internet users 190 million. With the arrival of the 3G era and the spread of smartphones, the number of users going online with smartphones is clearly on course to exceed the number going online with computers. Combined with fast 3G/4G networks, the mobile phone is high-speed, multimedia-capable, and personalized, and has become an interactive tool that people carry with them and that makes communication easy.
Under Web 2.0 technologies, the aggregating nature of the network has greatly strengthened information dissemination and promoted the emergence of a new media ecosystem. Media information has also shifted from traditional platforms to new media. Publishers of media information exploit networks and mobile phones by every conceivable means, and media information such as SMS messages and mobile microblogs is everywhere. The dissemination of media information based on mobile instant messaging (IM) is still at an early stage of development, but it has attracted the attention of many media outlets, organizations, and individuals.
On January 21, 2011, Tencent officially launched WeChat, built on the QQ user base. This mobile chat application quickly sends voice messages, video, pictures, and text over the network and supports multi-person group chat, letting users contact friends in ways similar to SMS and MMS but richer in form. WeChat can be described as a third kind of social network, between mobile QQ and microblogs, and it is changing the way people socialize. After more than 40 version upgrades, WeChat has itself formed a three-dimensional communication matrix: the X axis is voice, text, pictures, and video; the Y axis is the phone address book, the smartphone client, QQ, microblog, and mailbox; the Z axis is LBS positioning, Drift Bottle, Shake, and QR code recognition. This crisscrossing, three-dimensional social chain covers needs at many levels of work and life. In this three-dimensional space, every communication chain intersects fully and every platform interoperates and shares, which other IM tools cannot match.
Current data collection techniques for the WeChat platform mainly include: (1) Manually cracking the client, including cracking the communication protocol. This can obtain data quickly, but it requires strong reverse engineering skills; moreover, WeChat's security mechanisms are upgraded with each version, so a crack may stop working. Cracking is difficult and costly. (2) Web protocol simulation. The communication protocol of the web version of WeChat is analyzed by packet capture and similar means, and data is collected by simulating the protocol. However, the protocol may change as WeChat versions are upgraded, causing the simulation to fail, so from a long-term practical standpoint the maintenance cost is high. In addition, this method cannot avoid manual QR code scanning for login, which in large-scale data collection requires frequent extra manual effort.
Traditional techniques for acquiring social network data mainly target data collection from PC-based social media, whereas WeChat provides only mobile clients and a web mode that includes basic chat functions (the Mac OS X WeChat client wraps the web mode and offers only basic chat functions). Information such as a user's history messages and profile, and messages over a longer time span, can therefore only be obtained from the WeChat mobile client.
In summary, because of WeChat's security mechanisms and the closed nature of its ecosystem, cracking the WeChat client or its communication protocol is too costly, and the results of a crack are hard to keep effective across version upgrades.
Summary of the invention
To realize data acquisition for the WeChat platform while ensuring that the data is relatively complete and timely, and to avoid the manual operation required by a purely web-based acquisition approach, the present invention proposes a WeChat data acquisition method and system, implemented by combining an Android platform application testing component with a browser testing component.
It should be noted that the Android platform application testing component was originally intended for automated functional testing of Android apps. It locates and operates Android controls through the interfaces that Android exposes, because an app can be parsed into a tree composed of different controls. The browser testing component was originally intended for automated functional testing or stress testing of desktop browsers.
To achieve the above objective, the present invention adopts the following technical solution:
A WeChat data acquisition method, realized by combining an Android platform application testing component with a browser testing component to acquire WeChat non-instant data and instant data separately.
Because a single app can operate only one interface at a time, WeChat non-instant data and instant data are acquired separately.
Acquiring WeChat non-instant data comprises the following steps:
1-1) Obtain a target public account (the WeChat public account whose data is to be acquired) through a task allocation and scheduling mechanism, and follow it.
1-2) Use the Android platform application testing component to simulate user behavior and operate the WeChat client, enter the public account's profile interface, and obtain the account's profile information.
1-3) From the public account's profile interface, enter the history message interface, tap forward, choose to send to a friend, select any friend at random, and enter the forward confirmation interface; extract the history message page address by locating the Android control on the forward confirmation interface.
1-4) Use the browser testing component to open the history message page address extracted in step 1-3), and obtain the corresponding page message data after the page loads.
1-5) Analyze the above history message page address and the corresponding page message data, splice link requests, and obtain all history messages of the WeChat public account.
The detailed process of step 1-5) is as follows: the acquired history message page address and page message data are analyzed to derive the pattern of the various parameters; based on this pattern, the next request is spliced in the required format and sent to the server; each subsequent request is then spliced from the data returned by the previous one. This simulates the data requests of the AJAX communication process and thereby obtains all history messages of the WeChat public account.
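The request-splicing loop of step 1-5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the query parameter names `from_id` and `count`, the response keys `messages`, `id`, and `more`, and the `fetch_page` callable are all hypothetical, since the source does not disclose the actual request format. The `more` flag mirrors the stop condition described in the embodiment.

```python
from typing import Any, Callable, Dict, Iterator
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_next_url(seed_url: str, last_message_id: str) -> str:
    """Splice the next request URL from the seed URL and the last message id.

    The parameter names "from_id" and "count" are hypothetical placeholders.
    """
    parts = urlsplit(seed_url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["from_id"] = last_message_id
    query["count"] = "20"
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_all_history(
    seed_url: str,
    fetch_page: Callable[[str], Dict[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Yield every history message by repeatedly splicing follow-up requests.

    fetch_page is any callable that requests a URL and returns a dict with a
    "messages" list and a boolean "more" flag (assumed response shape).
    """
    url = seed_url
    while True:
        page = fetch_page(url)
        messages = page.get("messages", [])
        yield from messages
        # Stop when the server reports no further data or returns nothing.
        if not messages or not page.get("more"):
            break
        url = build_next_url(seed_url, str(messages[-1]["id"]))
```

Passing the fetch function in as a parameter keeps the loop independent of how the request is actually issued, whether through the browser testing component or a plain HTTP client.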
The page message data includes the message ID, the exact publication time, the URL of the cover image, the source address of the message, the message content, and other data.
Acquiring WeChat instant data comprises the following steps:
2-1) Use the browser testing component to open the web version of WeChat, and obtain and download the QR code.
2-2) Transfer the QR code downloaded in step 2-1) to an Android emulator or an Android device, then use the Android platform application testing component to open and log in to the WeChat client; the client obtains and selects the previously transferred QR code from the photo album, scans it automatically, and logs in to the web version of WeChat via the confirm button.
2-3) Use the browser testing component to monitor the DOM element tree nodes of the web WeChat page, and quickly parse them to obtain WeChat instant data.
Further, in step 2-2), the QR code downloaded in step 2-1) is transferred to the Android emulator or Android device using the adb tool provided by Android.
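The adb transfer can be illustrated with a short sketch. It assumes that `adb` is on the PATH; the remote path and the optional device serial are hypothetical example values, and the source does not say where on the device the QR code image is stored.

```python
import subprocess
from typing import List, Optional


def build_adb_push_command(
    local_path: str,
    remote_path: str,
    serial: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `adb push`, optionally targeting one device."""
    command = ["adb"]
    if serial:
        # `-s` selects a specific emulator or device by its serial number.
        command += ["-s", serial]
    command += ["push", local_path, remote_path]
    return command


def push_qr_code(
    local_path: str,
    remote_path: str = "/sdcard/Pictures/login_qr.png",  # hypothetical location
    serial: Optional[str] = None,
) -> None:
    """Copy the downloaded QR code image to the emulator or device."""
    subprocess.run(
        build_adb_push_command(local_path, remote_path, serial),
        check=True,  # raise CalledProcessError if adb reports a failure
    )
```

Separating command construction from execution makes the command easy to inspect without a connected device. Depending on the Android version, the pushed image may only appear in the photo album after the media library has rescanned the file.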
The present invention also provides a WeChat data acquisition system, comprising an Android platform application testing component, a browser testing component, and a data acquisition module;
The browser testing component is used to open the web version of WeChat, obtain and download the QR code, and monitor page DOM tree nodes to acquire instant messages;
The Android platform application testing component is used to simulate user behavior to scan the QR code in the Android app and confirm the login, thereby logging in to the web version of WeChat; and to simulate user behavior to enter the public account's history message page, obtain the history message page address by parsing the control properties of the forwarding function, and, using this address as a seed together with the history message page elements, splice new data request links and send new data requests to the server, then splice further data requests from the returned data, repeating this cycle to obtain all of the account's history message data;
The data acquisition module is used to combine the Android platform testing component with the browser testing component to acquire WeChat instant data and non-instant data.
Further, the above system also comprises a data storage module for storing the acquired WeChat instant data and non-instant data.
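One possible way to picture how these modules fit together is the sketch below. All class and method names are hypothetical and chosen only for illustration; the source describes the modules' responsibilities but not their interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class AndroidTestComponent(Protocol):
    """Drives the Android WeChat client by simulating user behavior."""

    def get_history_url(self, account_id: str) -> str: ...
    def scan_login_qr(self, qr_path: str) -> None: ...


class BrowserTestComponent(Protocol):
    """Drives the web version of WeChat and history message pages."""

    def download_login_qr(self) -> str: ...
    def fetch_history(self, seed_url: str) -> List[dict]: ...
    def read_new_messages(self) -> List[dict]: ...


@dataclass
class DataStore:
    """Data storage module: keeps non-instant and instant data."""

    history: Dict[str, List[dict]] = field(default_factory=dict)
    instant: List[dict] = field(default_factory=list)


@dataclass
class DataAcquisitionModule:
    """Combines the two testing components, as the system description states."""

    android: AndroidTestComponent
    browser: BrowserTestComponent
    store: DataStore = field(default_factory=DataStore)

    def collect_history(self, account_id: str) -> List[dict]:
        # Non-instant data: the Android side yields the seed URL,
        # the browser side pages through the history messages.
        seed_url = self.android.get_history_url(account_id)
        messages = self.browser.fetch_history(seed_url)
        self.store.history[account_id] = messages
        return messages

    def login_web(self) -> None:
        # The browser downloads the QR code; the Android client scans it.
        self.android.scan_login_qr(self.browser.download_login_qr())

    def collect_instant(self) -> List[dict]:
        # Instant data: read whatever the DOM monitor has found since last call.
        new_messages = self.browser.read_new_messages()
        self.store.instant.extend(new_messages)
        return new_messages
```

Typing the two components as protocols lets any automation backend be plugged in, provided it offers these operations.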
The beneficial effects of the present invention are as follows:
With the Android platform testing component, user behavior can be simulated to operate the client, including logging in, viewing, and swiping. The web page address of a WeChat public account's history messages obtained in this way, combined with active collection, yields the account's complete message history. The Android platform testing component sidesteps cracking the client, which reduces development cost and avoids the risk that a crack stops working after a version upgrade; combined with active collection, it can significantly speed up message acquisition. In addition, this approach can obtain information such as the profile of a WeChat public account.
1) Under an Android emulator, instant messages cannot be obtained directly from interface changes; the latest history message page URL must be obtained and the latest messages fetched through it. This requires more interface taps and switches, and to keep large-scale data acquisition timely a table of history message page URLs may need to be maintained, which consumes too many resources. With the browser testing component, user behavior can be simulated to operate the web page, and the DOM tree is located and operated via XPath to obtain data. Later maintenance needs only minor logic changes that follow the user operations to adapt to a new version, which avoids the excessive maintenance cost that could arise when a WeChat version upgrade invalidates the underlying communication protocol.
2) By combining the Android platform testing component with the browser testing component and monitoring changes to browser DOM elements, WeChat instant messages are obtained quickly. This avoids the tedious steps of obtaining messages in the Android mobile client on an emulator alone: in the mobile client, instant messages cannot be obtained directly through official Android or third-party tools, and the history message web page address must be refreshed instead, which is far more cumbersome than reading the data directly from page elements in web mode. In addition, this approach effectively avoids manual QR code scanning when logging in to the web version of WeChat, which in large-scale data acquisition greatly saves human resources and extra hardware resources.
In summary, the WeChat data acquisition method and system based on combining an Android platform application testing component with a browser testing component can automate login to the web version of WeChat and acquire WeChat data comprehensively, promptly, and effectively.
Brief description of the drawings
Fig. 1 is a framework diagram of the WeChat data acquisition system in an embodiment of the present invention.
Embodiments
The present invention covers the acquisition of instant messages and non-instant messages, as follows:
(1) Acquisition of WeChat non-instant messages based on the Android platform application testing component. This can be divided into four steps:
a) Obtain and follow the target account: obtain the target account through the task allocation and scheduling mechanism and determine whether it has been followed, because the number of history messages that can be obtained from an account that is not followed is limited. Specifically, enter the public account interface, tap the upper right corner to enter public account search, and search by keyword or public account ID, for example the keyword "bad joke"; in the resulting list of search results, tap one result, such as "bad joke is selected"; check whether the word "Follow" is shown, and if so, tap it to follow.
b) Obtain the WeChat public account's profile information: use the Android platform application testing component to simulate human operation of the WeChat client, enter the public account's profile interface, and obtain the account's profile information. Specifically, tap a public account in the list of followed public accounts, tap the button in the upper right corner to enter the profile interface, and obtain its public profile, such as the function introduction, type, and official verification information, by parsing the properties of each control.
c) Obtain the history message address: enter the history message interface from the public account's profile interface. Because this interface is actually implemented on WebKit, the kernel of the default Android browser, the messages cannot be obtained directly; instead, the history message page address is extracted through buttons such as send or share. Specifically, in the public account's profile interface, tap "History messages" to enter the history message interface, tap the button in the upper right corner, tap the share-with-friend button in the pop-up, and select any friend; a share confirmation box then pops up. The Android platform testing component can parse the control properties of this confirmation box to obtain the public account's history message address.
d) Analyze and splice the history message page address obtained in step c) together with the acquired page message data, and obtain all history messages of the WeChat public account by web data collection. Specifically, send a request to the previously obtained history message address to obtain the first 20 records of the history messages; analyze the last record to derive the parameters of the next history message request and splice them into the same request format; keep sending new data requests to the server in this way until the more field in a returned batch of data is false.
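Extracting the address from the confirmation box's control properties, as in step c), might look like the sketch below. It assumes the Android testing component can export the current screen as an XML control tree in the style of a UI Automator dump, with `node` elements carrying `text` and `content-desc` attributes. The source does not name the tool, so this format, and the assumption that the address appears in one of those two attributes, are illustrative only.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Matches an http(s) URL up to the next whitespace, quote, or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_history_url(ui_dump_xml: str) -> Optional[str]:
    """Return the first URL found in the text attributes of the control tree."""
    root = ET.fromstring(ui_dump_xml)
    for node in root.iter("node"):
        for attribute in ("text", "content-desc"):
            match = URL_PATTERN.search(node.get(attribute, ""))
            if match:
                return match.group(0)
    return None
```

The function returns `None` when no control on the screen exposes a URL, which lets the caller retry the share operation or report the failure.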
(2) Acquisition of WeChat instant messages based on combining the Android platform application testing component with the browser testing component. This can be divided into three steps:
a) Open the web version of WeChat and download the QR code: given the interface switching and the complexity of message acquisition in the Android WeChat client, obtaining instant messages through the web version of WeChat is faster and more convenient. The only way to log in to the web version is by scanning a QR code, so the browser testing component opens the web version of WeChat and obtains and downloads the QR code.
b) Transfer and scan the QR code: the QR code downloaded in step a) is transferred to the Android emulator or Android device with the push command of the adb tool provided officially by Android. The Android platform application testing component then opens and logs in to the WeChat client and uses its Scan function to obtain and select the previously transferred QR code from the photo album; the client scans the QR code automatically, and the web version of WeChat is logged in via the confirm button.
c) Obtain instant messages: after the previous step confirms the login to the web version of WeChat, the browser testing component continues to monitor the DOM elements. Assuming n records already exist, it repeatedly looks for the element of the (n+1)-th record and, if one is found, quickly parses it to obtain the instant message.
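The "n records, look for record n+1" check in step c) can be sketched independently of any particular browser driver. In the sketch below the class name and the snapshot-based interface are hypothetical; in practice each snapshot would be the list of message elements that the browser testing component currently finds on the page, for example through an XPath query.

```python
from typing import List, Sequence


class NewRecordWatcher:
    """Tracks how many message records have been seen and reports new ones."""

    def __init__(self, initial_count: int = 0) -> None:
        # n: the number of records already known.
        self.seen = initial_count

    def poll(self, records: Sequence[str]) -> List[str]:
        """Return the records beyond the first n in the current DOM snapshot."""
        new_records = list(records[self.seen:])
        # Never move the counter backwards if the page shows fewer records.
        self.seen = max(self.seen, len(records))
        return new_records
```

A caller would invoke `poll` in a loop with a short sleep between snapshots and parse each returned record as an instant message.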

Claims (6)

1. A WeChat data acquisition method, realized by combining an Android platform application testing component with a browser testing component to acquire WeChat non-instant data and instant data separately, wherein:
acquiring WeChat non-instant data comprises the following steps:
1-1) obtaining a target public account through a task allocation and scheduling mechanism, and following it;
1-2) using the Android platform application testing component to simulate user behavior and operate the WeChat client, entering the public account's profile interface, and obtaining the account's profile information;
1-3) entering the history message interface from the public account's profile interface, tapping forward, choosing to send to a friend, selecting any friend at random, entering the forward confirmation interface, and extracting the history message page address by locating the Android control on the forward confirmation interface;
1-4) using the browser testing component to open the history message page address extracted in step 1-3), and obtaining the corresponding page message data after the page loads;
1-5) analyzing the above history message page address and the corresponding page message data, splicing link requests, and obtaining all history messages of the WeChat public account;
acquiring WeChat instant data comprises the following steps:
2-1) using the browser testing component to open the web version of WeChat, and obtaining and downloading the QR code;
2-2) transferring the QR code downloaded in step 2-1) to an Android emulator or an Android device, then using the Android platform application testing component to open and log in to the WeChat client, the client obtaining and selecting the previously transferred QR code from the photo album, scanning it automatically, and logging in to the web version of WeChat via the confirm button;
2-3) using the browser testing component to monitor the DOM element tree nodes of the web WeChat page, and quickly parsing them to obtain WeChat instant data.
2. The WeChat data acquisition method of claim 1, characterized in that the detailed process of step 1-5) is: analyzing the acquired history message page address and page message data to derive the pattern of the various parameters; splicing the next request in the required format based on this pattern and sending it to the server; and splicing each subsequent request from the returned data, thereby simulating the data requests of the AJAX communication process and obtaining all history messages of the WeChat public account.
3. The WeChat data acquisition method of claim 1, characterized in that the page message data includes the message ID, the exact publication time, the URL of the cover image, the source address of the message, and the message content.
4. The WeChat data acquisition method of claim 1, characterized in that in step 2-2) the QR code downloaded in step 2-1) is transferred to the Android emulator or Android device using the adb tool provided by Android.
5. A WeChat data acquisition system, comprising an Android platform application testing component, a browser testing component, and a data acquisition module;
the browser testing component is used to open the web version of WeChat, obtain and download the QR code, and monitor page DOM tree nodes to acquire instant messages;
the Android platform application testing component is used to simulate user behavior to scan the QR code in the Android app and confirm the login, thereby logging in to the web version of WeChat; and
to simulate user behavior to enter the public account's history message page, obtain the history message page address by parsing the control properties of the forwarding function, and, using this address as a seed together with the history message page elements, splice new data request links and send new data requests to the server, then splice further data requests from the returned data, repeating this cycle to obtain all of the account's history message data;
the data acquisition module is used to combine the Android platform testing component with the browser testing component to acquire WeChat instant data and non-instant data.
6. The WeChat data acquisition system of claim 5, characterized in that it further comprises a data storage module for storing the acquired WeChat instant data and non-instant data.
CN201510363826.9A 2015-04-03 2015-06-26 WeChat data acquisition method and system Active CN105162676B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510363826.9A CN105162676B (en) 2015-04-03 2015-06-26 WeChat data acquisition method and system

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201510156229 2015-04-03
CN2015101562299 2015-04-03
CN201510363826.9A CN105162676B (en) 2015-04-03 2015-06-26 WeChat data acquisition method and system

Publications (2)

Publication Number Publication Date
CN105162676A CN105162676A (en) 2015-12-16
CN105162676B true CN105162676B (en) 2017-08-11

Family

ID=54803436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510363826.9A Active CN105162676B (en) 2015-04-03 2015-06-26 WeChat data acquisition method and system

Country Status (1)

Country Link
CN (1) CN105162676B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105429865A (en) * 2015-12-31 2016-03-23 深圳中泓在线股份有限公司 WeChat public number data collection method and device based on browser
CN107644021A (en) * 2016-07-20 2018-01-30 北大方正集团有限公司 Information collecting method and information collecting device
CN106302407B (en) * 2016-08-02 2019-05-17 四川秘无痕信息安全技术有限责任公司 A method of monitoring wechat circle of friends sends data
CN110958282A (en) * 2018-09-27 2020-04-03 长沙博为软件技术股份有限公司 Method for realizing artificial message pushing simulation of webpage version WeChat based on network communication technology
CN110188257B (en) * 2019-04-16 2021-12-31 国家计算机网络与信息安全管理中心 Mobile application data acquisition method and device
CN110505072B (en) * 2019-09-27 2021-05-25 连尚(新昌)网络科技有限公司 Method, terminal device and computer readable medium for backing up chat records
CN113722631B (en) * 2020-05-20 2023-11-21 中国移动通信集团河北有限公司 Page synthesis method and device
CN113037925B (en) * 2021-04-15 2023-04-07 维沃移动通信有限公司 Information processing method, information processing apparatus, electronic device, and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7200804B1 (en) * 1998-12-08 2007-04-03 Yodlee.Com, Inc. Method and apparatus for providing automation to an internet navigation application
CN101221572A (en) * 2008-01-25 2008-07-16 吴坤达 Web page data processing system
CN102014078A (en) * 2010-09-28 2011-04-13 苏州阔地网络科技有限公司 Method for realizing instant messaging based on flash on webpage

Also Published As

Publication number Publication date
CN105162676A (en) 2015-12-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant