CN102110140A - Network-based method for analyzing opinion information in discrete text - Google Patents

Publication number
CN102110140A
Authority
CN
China
Prior art keywords
information
text
network
discrete text
discrete
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201110030156
Other languages
Chinese (zh)
Inventor
赵峰
李生红
陈秀真
李海燕
黄慧琼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN 201110030156
Publication of CN102110140A
Legal status: Pending

Abstract

The invention relates to a network-based system for analyzing public opinion information in discrete text, belonging to the field of network information security. The system comprises the following modules: a discrete text information acquisition module, which collects network information in a preset analysis cycle; a discrete text information tracking and restoration module, which restores ellipsis and remote anaphora in the original content to obtain text with a relatively complete structure and semantic information; a semantic information mining and feature extraction module, which mines semantic information and extracts features from the text using latent semantic indexing; an opinion information clustering module, which clusters information by combining a niche genetic algorithm with the K-Means method; a hot opinion event discovery module, which mines hot opinions from the topics and events obtained; and a background information processing and data support center, which analyzes data and provides the network-specific vocabulary, new network words, existing class information and existing hot topics. The invention addresses the problem that analysis of existing network opinion information suffers because the text structure is incomplete, ellipsis and remote anaphora are frequent, and new network words are numerous, and it improves the accuracy of opinion and hot event discovery through an efficient clustering method.

Description

Network-based method for analyzing public opinion information in discrete text
Technical field
The present invention relates to network information analysis, and specifically to a network-based method for analyzing public opinion information in discrete text.
Background art
With the development of Internet technology and rising living standards, the network has become the most important platform for obtaining information and for everyday communication. According to the 26th Statistical Report on Internet Development in China issued by CNNIC, the number of Chinese Internet users has reached 420 million, and 62% of the users added in the first half of the year are mobile Internet users. These figures show the current scale and prospects of the Internet in China, and they also show that communication is shifting from traditional channels to computer networks and mobile networks. Speech on the Internet reflects users' views in real time and can strongly influence public opinion and its trend; in serious cases it can even trigger social incidents. The documents formed by such speech lack the complete article structure of traditional documents, contain many content omissions and remote references, and include many new network words. It is therefore necessary to study them and to develop a corresponding public opinion analysis system.
Chinese patent CN200810147645.2 (titled "A method for collecting network public opinion viewpoints") computes hot-word frequencies and their changes, uses the verbs and nouns in key sentences as feature values, clusters the key sentences by the cosine similarity between their feature vectors to obtain several viewpoint topic sentence sets, and finally computes the sentiment orientation of each viewpoint topic sentence by combining a weighted sentiment dictionary with manual judgment. This method extracts hot words and clusters key sentences statistically, with the word as the unit, and it is feasible for text that has a complete article structure.
However, we have found that in the current network environment the structure of public opinion text has changed. In particular, with the sharp growth in mobile phone users and the development of network technology, communication platforms such as microblogs have emerged, and more and more information comes from users who join topic discussions by mobile phone. Such public opinion information no longer has a reasonably complete article structure of a certain length and organization. The objects to be processed are brief, contain many ellipses, and take the form of structurally incomplete discrete text, and the omitted terms and remote references in them must be handled. Moreover, on today's Internet platforms, new words and network terms with special meanings matter a great deal for reflecting users' opinions. Statistical methods alone cannot capture the semantics of these words, so the accuracy of topic and event clustering suffers. In addition, besides the many topic documents on the Internet, the comments on those documents also carry users' views and are an important part of the tendency of network public opinion.
Summary of the invention
In view of the characteristics of existing network public opinion information described above, the present invention proposes a network-based method for analyzing public opinion information in discrete text. The collected network information undergoes discrete text tracking and restoration, which effectively reconstructs the omitted content and remote references in the network text stream. On this basis, latent semantic indexing is used for semantic information mining and feature selection. Finally, the public opinion information is analyzed.
The invention is achieved by the following technical solution. The network-based method for analyzing public opinion information in discrete text comprises discrete text information acquisition, discrete text information processing, and a corresponding database, and includes the following steps:
a. The discrete text information acquisition module first collects network information according to the set analysis cycle and saves it to a local database;
b. Next, the discrete text information tracking and restoration module restores the omitted parts and the remotely referenced parts of the original content;
c. On the basis of step b, the semantic information mining and feature extraction module uses latent semantic indexing to mine semantics and extract features from the text;
d. The data obtained in step c enters the public opinion information clustering module, which clusters the information by combining a niche genetic algorithm with the K-Means method; at the same time, the network information is clustered into topics and events under the guidance of the classification information from the background information processing and data support center;
e. Finally, the hot public opinion event discovery module mines hot public opinion from the topics and events obtained by clustering, and the final result is handed to the system administrator for follow-up processing as required.
The discrete text information acquisition (step a) obtains the information stream from the network and saves it to the local database in HTML format. Because network information streams usually contain pictures, audio, and often a large number of advertisement images, the locally saved HTML must be denoised to remove pictures, audio, advertisements and similar content so that only text remains. Web page denoising proceeds as follows. First, the data in the HTML file is normalized: crossed element tags such as <abc><def></abc></def> are paired and restored to the complete form <abc><def></def></abc><def></def>. Next, the HTML page is stored in a tree-shaped linked structure, so that after processing each HTML page corresponds to one HTML tree. Finally, current dynamic web technology generally stores page content in a database and uses templates for layout, taking the content out of the database and placing it into the template for display; such templates mainly divide the page with table elements and use a single table to display the main text. Our approach is therefore to merge the text within each table element according to the HTML tree generated above and to take the text of the table with the largest amount of information as the main text. The corresponding text, including the title, body and replies, is extracted in this way to obtain the discrete text information. A document index is also built during denoising to store important information such as the UserID and the time and number of participants in the discussion.
The discrete text information tracking and restoration (step b) first uses the network-specific vocabulary provided by the background information processing and data support center to identify, by the maximum matching principle, the parts of the discrete text that need attention, such as content omissions and remote references. Concrete forms of content omission and remote reference include comments that only briefly cite someone else's view (for example "support the original poster") without clearly stating the commenter's own view, and comments given as remote hyperlinks. The omitted or remotely referenced original content is then located through the hierarchy of the HTML tree built by the discrete text information acquisition module, or by visiting the remote hyperlink, and the omitted and remotely referencing parts are replaced with the located original content. This module also removes special symbols from the discrete text. After this processing the discrete text has a more complete article structure. The network-specific vocabulary here covers the unconventional expressions that arise in the language environment of network discrete text; the background information processing and data support center discovers and adds new network-specific terms, which gradually accumulate together with the existing ones.
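As an illustration of the maximum matching step described above, the following is a minimal Python sketch of forward maximum matching against a vocabulary. The function name `find_reference_terms` and the sample vocabulary entries are hypothetical and are not taken from the patent, which does not specify the matching direction or the vocabulary contents.

```python
def find_reference_terms(text, vocabulary):
    """Forward maximum matching: at each position, take the longest
    vocabulary entry that matches, then continue after it."""
    max_len = max((len(term) for term in vocabulary), default=0)
    hits = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest possible candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in vocabulary:
                match = candidate
                break
        if match:
            hits.append((i, match))   # record position and matched term
            i += len(match)
        else:
            i += 1
    return hits

# Hypothetical network-specific vocabulary (assumption for illustration).
vocabulary = {"support OP", "support", "upstairs", "+1"}
hits = find_reference_terms("support OP, and agree with upstairs", vocabulary)
```

Because the longest entry is tried first, "support OP" is reported as one term rather than as the shorter entry "support".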
The semantic information mining and feature extraction (step c) segments the restored discrete text, that is, the public opinion documents, with the ICTCLAS word segmentation system of the Chinese Academy of Sciences and computes weights with TF-IDF to obtain a word-document matrix. Because this matrix usually has a high dimension, and because latent semantic indexing can reduce the dimension effectively while preserving semantics, latent semantic indexing is applied to the word-document matrix to reduce the amount of computation. This finds the relations between words and concepts and between concepts and public opinion documents, and features are extracted on that basis. The result is a reduced-dimension concept-public opinion document matrix for public opinion analysis, which is passed as input to the next module for information clustering. During segmentation, the background information processing and data support center supplies a user dictionary that accounts for new network words, to improve segmentation accuracy. The concept-public opinion document matrix is obtained with latent semantic indexing as follows: for the matrix $A_{M \times n}$ obtained by segmentation and weight calculation, whose rows represent words and whose columns represent documents, perform singular value decomposition:
$$A_{M \times n} = U_0 \Sigma_0 V_0^{T}$$
$U_0$ reflects the word-concept relation, $V_0$ reflects the concept-document relation, and the elements of the diagonal matrix $\Sigma_0$ are arranged in descending order. The first $m$ elements of $\Sigma_0$ are then kept, and the first $m$ columns of $U_0$ and $V_0$ are taken, to form the matrices $\Sigma$, $U$ and $V$ respectively. This finally gives the approximation $A' = U \Sigma V^{T}$ of $A_{M \times n}$. This $A'$ is the concept-public opinion document matrix; it preserves the semantic information of the original $A_{M \times n}$ to the greatest extent while reducing the dimension of the feature space to $m$.
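The truncation described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `lsi_reduce` and the toy matrix are hypothetical, and returning the product of the truncated singular values and right singular vectors as the m-dimensional document representation is a common LSI convention added here as an assumption.

```python
import numpy as np

def lsi_reduce(A, m):
    """Rank-m approximation of a word-document matrix via SVD."""
    # numpy returns the singular values already sorted in descending order.
    U0, s0, V0t = np.linalg.svd(A, full_matrices=False)
    U = U0[:, :m]              # first m columns of U0 (word-concept)
    S = np.diag(s0[:m])        # first m singular values
    Vt = V0t[:m, :]            # first m columns of V0, transposed
    A_approx = U @ S @ Vt      # A' = U S V^T, same shape as A
    docs_in_concept_space = S @ Vt   # m x n document representation (assumption)
    return A_approx, docs_in_concept_space

# Toy 4-word x 3-document matrix (made-up weights, rank 2 by construction).
A = np.array([[2.0, 0.0, 1.0],
              [1.0, 0.0, 0.5],
              [0.0, 3.0, 0.0],
              [0.0, 1.0, 0.0]])
A_approx, concepts = lsi_reduce(A, m=2)
```

Note that the approximation keeps the shape of the original matrix; the reduction to m dimensions shows up in the rank of the approximation and in the m-row document representation.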
The public opinion information clustering (step d) uses a method combining a niche genetic algorithm with K-Means to gather public opinion information with the same subject or topic category into one class. A genetic algorithm simulates biological evolution; its steps are: set the initial population, compute individual fitness, selection, crossover, mutation, form the next population, and check whether the stopping criterion is met. The basic principle of a genetic algorithm is natural selection and survival of the fittest, but biological evolution involves a degree of cooperation as well as competition, and the niche genetic algorithm builds on exactly this idea. In the present scheme the population is divided into several niches. Within each niche, the average intra-class similarity of the documents influences selection and thereby the fitness of the individuals in the niche, while crossover and mutation are carried out over the whole population. In each evolutionary iteration K-Means clustering is performed to compute individual fitness and average intra-class similarity. The initial K-Means centers in the initial population are chosen at random; after each subsequent evolution the K individuals with the highest fitness are chosen as initial centers.
The hot public opinion event discovery (step e) analyzes the clustering result from the clustering module together with the document index saved during discrete text denoising to obtain the current hot topics and hot events. Hot public opinion events within a given period are mined from the number of public opinion documents in each class of the clustering result and the number of discussion participants in the document index. After each periodic update and collection of HTML pages, the hot public opinion events found are compared with the existing hot topics provided by the background information processing and data support center, and the result is handed to the system administrator.
The background information processing and data support center stores the analysis results obtained by the above process and compares them with previously stored results to discover and add new network-specific terms. It builds the user dictionary for the word segmentation system, the network-specific vocabulary for discrete text information tracking and restoration, the classification information after clustering, and the hot public opinion event information found so far.
Compared with the prior art, the invention has the following beneficial effects: 1) in response to the growth of mobile network users and to discrete text phenomena such as public opinion text that lacks a complete article structure and contains many remote references and content omissions, it tracks and restores public opinion information and improves the accuracy of public opinion analysis; 2) it adds a back-end support module to existing public opinion processing, which effectively extracts the network-specific vocabulary and the new network word dictionary of network discrete text and further improves the analysis; 3) it provides a clustering method combining a niche genetic algorithm with K-Means, achieving efficient public opinion information clustering.
Description of drawings
Fig. 1 is a workflow diagram of the method of the invention.
Fig. 2 is a schematic diagram of the discrete text tracking and restoration module of the system of the invention.
Fig. 3 is a flow chart of the public opinion information clustering module of the system of the invention.
Detailed description
Embodiments of the invention are described in detail below with reference to the drawings. This embodiment is implemented on the premise of the technical solution of the invention and gives a detailed implementation and concrete operating procedure, but the scope of protection of the invention is not limited to the following embodiment.
As shown in Fig. 1, the method of the invention comprises a discrete text information acquisition module, a discrete text information tracking and restoration module, a semantic information mining and feature extraction module, a public opinion information clustering module, a hot public opinion event discovery module, and a background information processing and data support center, together with databases holding the user dictionary for the word segmentation system, the network-specific vocabulary for discrete text information tracking and restoration, the classification information after clustering, and the hot topics found so far. The processing flow is:
1) Discrete text information acquisition
The information stream is obtained from the network and saved to the local database in HTML format. The saved HTML is then normalized and denoised to remove pictures, audio, advertisements, navigation bars and the like, and the corresponding text, including the title, body and replies, is extracted to obtain the discrete text information. A document index is also built during denoising to store important information such as the UserID and the time and number of participants in the discussion.
Web page denoising proceeds as follows. HTML describes the document structure with tags, and all statements can be nested, so for convenience, crossed tags of the form <abc><def></abc></def> are first paired and restored to the form <abc><def></def></abc><def></def>. The framework of an HTML page file can then be expressed simply as:
<HTML>
<HEAD>
<TITLE></TITLE> // title
<SCRIPT></SCRIPT> // script
</HEAD>
<BODY></BODY> // content, note
</HTML>
Next, the HTML page in the above form is stored in a tree-shaped linked structure, so that after processing each HTML page corresponds to one HTML tree. Finally, current dynamic web technology generally stores page content in a database and uses templates for layout, taking the content out of the database and placing it into the template for display; such templates mainly divide the page with table elements and use a single table to display the main text. Our approach is therefore to merge the text within each table element according to the HTML tree generated above and to take the text of the table with the largest amount of information as the main text. This yields the discrete text made up of the title, body and replies.
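The table-based extraction just described can be sketched with Python's standard `html.parser` module. This is a minimal sketch, not the patent's implementation: the class and function names are hypothetical, text is attributed to the innermost open table only, and "largest amount of information" is approximated by character count, which is an assumption because the text does not define the measure.

```python
from html.parser import HTMLParser

class TableTextCollector(HTMLParser):
    """Collects the visible text found inside each <table> element."""
    def __init__(self):
        super().__init__()
        self.stack = []       # indices of the tables currently open
        self.tables = []      # text fragments collected per table
        self.skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "table":
            self.tables.append([])
            self.stack.append(len(self.tables) - 1)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "table" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack and not self.skip_depth:
            self.tables[self.stack[-1]].append(text)   # innermost table only

def main_text(html):
    """Merge the text of each table and return the longest merged text."""
    parser = TableTextCollector()
    parser.feed(html)
    parser.close()
    merged = [" ".join(parts) for parts in parser.tables]
    return max(merged, key=len, default="")

page = """<html><body>
<table><tr><td>Home | Login</td></tr></table>
<table><tr><td>Title: price rise</td><td>This is the main post text.</td></tr></table>
<script>var ad = 1;</script>
</body></html>"""
body = main_text(page)
```

Images and audio carry no text nodes, so they drop out without special handling; a production system would also need the tag-pairing normalization described earlier, which this sketch omits.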
2) Discrete text information tracking and restoration
As shown in Fig. 2, the network-specific vocabulary provided by the background information processing and data support center is used to identify, by the maximum matching principle, the parts of the discrete text that need attention, such as content omissions and remote references. The omitted or remotely referenced original content is then located through the hierarchy of the HTML tree built by the discrete text information acquisition module, or by visiting the remote hyperlink, and the omitted and remotely referencing parts are replaced with the located original content. This module also removes special symbols from the discrete text. After this processing the discrete text has a more complete article structure.
This module mainly handles replies and remote links. References that appear in the text, such as "support the original poster" (supplied by the network-specific vocabulary) and URL links, are replaced with the content they refer to. The main text extracted in 1) is the content of one subtree of the HTML tree: the root node of the subtree represents the content of the main post, direct replies to the main post are child nodes of the root, and a reply to a reply is a child node of that reply. References such as "original poster" and "the post above" can therefore be resolved by finding the corresponding node directly and substituting its content, while a remote URL is visited and its text content is extracted recursively and substituted.
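The substitution over the reply tree can be sketched as follows. This is a minimal illustration under stated assumptions: the `Post` class, the reference phrases, and the bracket notation for substituted content are hypothetical, a reference to the post above is resolved to the parent node, and fetching remote URLs is left out.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    text: str
    replies: list = field(default_factory=list)

# Hypothetical reference phrases; in the described system they would come
# from the network-specific vocabulary.
ROOT_REFS = ("original poster",)
PARENT_REFS = ("the post above",)

def restore(post, root_text=None, parent_text=None):
    """Return the restored texts of a thread in depth-first order."""
    text = post.text
    if root_text is None:
        root_text = text                      # this post is the thread root
    else:
        for ref in ROOT_REFS:                 # reference to the main post
            text = text.replace(ref, f"[{root_text}]")
        for ref in PARENT_REFS:               # reference to the parent reply
            text = text.replace(ref, f"[{parent_text}]")
    restored = [text]
    for reply in post.replies:
        restored.extend(restore(reply, root_text, text))
    return restored

thread = Post("Fuel prices rose today", [
    Post("agree with original poster", [Post("same as the post above")]),
])
restored = restore(thread)
```

Passing the already restored parent text down the recursion means a reply to a reply inherits the full chain of substituted content.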
3) Semantic information mining and feature extraction
The concrete steps are: first, segment the restored discrete text with the ICTCLAS word segmentation system developed by the Institute of Computing Technology, Chinese Academy of Sciences; then compute term weights with TF-IDF to obtain the word-document matrix $A_{M \times n}$; finally perform singular value decomposition on $A_{M \times n}$:
$$A_{M \times n} = U_0 \Sigma_0 V_0^{T}$$
Keep the first $m$ elements of $\Sigma_0$ and take the first $m$ columns of $U_0$ and $V_0$ to form the matrices $\Sigma$, $U$ and $V$ respectively, which gives the concept-public opinion document matrix $A'$: $A' = U \Sigma V^{T}$.
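The TF-IDF weighting that produces the word-document matrix can be sketched in plain Python. This is a minimal sketch under assumptions: the documents are taken as already segmented into word lists (the segmenter itself is not called here), and the weight used is term frequency times log(N / document frequency), one common TF-IDF variant; the patent does not state which variant it uses. The function name `tfidf_matrix` is hypothetical.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Build a word-document matrix: rows are words, columns are documents.
    Each document must be a non-empty list of already segmented words."""
    vocab = sorted({w for doc in docs for w in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(w for doc in docs for w in set(doc))
    counts = [Counter(doc) for doc in docs]
    matrix = [
        [(counts[j][w] / len(docs[j])) * math.log(n_docs / df[w])
         for j in range(n_docs)]
        for w in vocab
    ]
    return vocab, matrix

# Toy pre-segmented documents (made-up data).
docs = [["price", "rise", "price"], ["price", "fall"], ["rain", "fall"]]
vocab, matrix = tfidf_matrix(docs)
```

The resulting nested list can be passed to a singular value decomposition routine as the word-document matrix.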
4) Public opinion information clustering
As shown in Fig. 3, a method combining a niche genetic algorithm with K-Means is used to gather public opinion information with the same subject or topic category into one class. The concrete steps are:
S1: randomly select texts as initial cluster centers to form the initial population, and compute the initial fitness of the individuals in the population with an initial K-Means clustering;
S2: perform selection, crossover and mutation in the population;
S3: cluster each text in the population with the K-Means algorithm;
S4: compute individual fitness;
S5: for each individual, if its fitness is higher than that of its parent, it replaces the parent and enters the next cycle;
S6: if the termination condition is met, go to S7; otherwise go to S2;
S7: choose the individuals with high fitness as initial cluster centers and cluster with K-Means.
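Steps S1 to S7 can be sketched in plain Python as below. This is a simplified illustration under several assumptions that the text does not fix: an individual is encoded as a tuple of K document indices used as K-Means seeds, fitness is the mean cosine similarity of documents to their cluster center, selection is a two-way tournament inside each niche, the crossover partner is drawn from the whole population, and the termination condition is a fixed number of generations. All names and parameter values are hypothetical.

```python
import math
import random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def kmeans(docs, seeds, iters=10):
    """K-Means with cosine similarity; returns (labels, mean in-cluster similarity)."""
    centers = [docs[i][:] for i in seeds]
    labels = [0] * len(docs)
    for _ in range(iters):
        labels = [max(range(len(centers)), key=lambda c: cosine(d, centers[c]))
                  for d in docs]
        for c in range(len(centers)):
            members = [d for d, l in zip(docs, labels) if l == c]
            if members:   # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    fitness = sum(cosine(d, centers[l]) for d, l in zip(docs, labels)) / len(docs)
    return labels, fitness

def evolve(docs, k, pop_size=6, niches=2, generations=5, seed=0):
    rng = random.Random(seed)
    fit = lambda ind: kmeans(docs, ind)[1]             # S3 + S4 for one individual
    population = [tuple(rng.sample(range(len(docs)), k))
                  for _ in range(pop_size)]            # S1: random seed documents
    size = pop_size // niches
    for _ in range(generations):                       # S6: fixed generation budget
        for start in range(0, size * niches, size):
            # S2: selection is confined to the niche (tournament of two).
            pair = rng.sample(range(start, start + size), 2)
            parent_idx = max(pair, key=lambda i: fit(population[i]))
            parent = population[parent_idx]
            mate = rng.choice(population)              # crossover over the whole population
            genes = list(dict.fromkeys(parent + mate)) # union of both seed sets
            child = rng.sample(genes, k)
            if rng.random() < 0.2:                     # mutation
                new_gene = rng.randrange(len(docs))
                if new_gene not in child:
                    child[rng.randrange(k)] = new_gene
            child = tuple(child)
            if fit(child) > fit(parent):               # S5: child replaces weaker parent
                population[parent_idx] = child
    best = max(population, key=fit)
    return kmeans(docs, best)                          # S7: final clustering

# Toy 2-dimensional document vectors forming two obvious groups (made-up data).
docs = [[1, 0], [0.9, 0.1], [0.8, 0.2], [0, 1], [0.1, 0.9], [0.2, 0.8]]
labels, fitness = evolve(docs, k=2)
```

Because a child only replaces its parent when it is fitter, the best fitness in the population never decreases from one generation to the next.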
5) Hot public opinion event discovery
The clustering result from the clustering module is analyzed together with the document index saved during discrete text denoising to obtain the current hot topics and hot events. Hot public opinion events within a given period are mined on the basis of the number of public opinion documents in each class of the clustering result and the number of discussion participants in the document index. After each periodic update and collection of HTML pages, the hot public opinion events found are compared with the existing hot topics provided by the background information processing and data support center, and the result is handed to the system administrator.
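One way to combine the two signals named above, documents per class and discussion participants, is sketched below. The scoring rule (document count plus number of distinct users inside the time window) is an assumption made for illustration; the text names the inputs but gives no formula. The function name `hot_events` and the sample index entries are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def hot_events(labels, index, since, top_n=3):
    """Rank clusters by activity after `since`.
    labels[i] is the cluster of document i; index[i] = (user_id, timestamp)."""
    doc_count = defaultdict(int)
    users = defaultdict(set)
    for label, (user_id, ts) in zip(labels, index):
        if ts >= since:                    # only documents inside the time window
            doc_count[label] += 1
            users[label].add(user_id)
    # Assumed score: number of documents plus number of distinct participants.
    scored = [(doc_count[c] + len(users[c]), c) for c in doc_count]
    return [c for _, c in sorted(scored, reverse=True)[:top_n]]

old, new = datetime(2010, 6, 1), datetime(2011, 1, 20)
labels = [0, 0, 0, 1, 1, 2]
index = [("u1", new), ("u2", new), ("u1", new),
         ("u3", new), ("u3", new), ("u4", old)]
hot = hot_events(labels, index, since=datetime(2011, 1, 1), top_n=2)
```

Cluster 2 is dropped because its only document falls outside the window; the ranked list could then be compared with previously stored hot topics.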
6) Background information processing and data support center
This center stores the analysis results obtained by the above process and compares them with previously stored results to discover and add new network-specific terms. It builds the user dictionary for the word segmentation system, the network-specific vocabulary for discrete text information tracking and restoration, the classification information after clustering, and the hot public opinion event information found so far.
This embodiment gives the system analysis flow and the concrete processing, including the information storage structures used. Real-time analysis results, including information clustering results, hot topics and hot events, are obtained for the periodically updated online public opinion information, and the back-end data used in the analysis is stored and updated at the same time. Maintaining this back-end data requires the participation of the system administrator to ensure the reliability of the next round of analysis.

Claims (8)

1. the public feelings information analytical approach of discrete text Network Based comprises discrete text information acquisition, discrete text information processing, and corresponding database, it is characterized in that: comprise the steps:
A. the discrete text information acquisition module is at first gathered the network information by the analytical cycle of setting, and is saved in local data base;
B. next, discrete text tracking of information and restoration module are restored raw content omission part and the long-range part that refers to;
C. on step b basis, semantic information is excavated with characteristic extracting module and is utilized potential semantic indexing technology that text message is carried out semanteme excavation and feature extraction;
D. the data that obtained by step c enter public feelings information cluster module, by the combine cluster of the information of carrying out of niche genetic algorithm and K-Means method; Simultaneously, support the data-guiding classification information at center that the network information is carried out topic and incident cluster by background information processing and data;
E. by focus public sentiment incident discovery module topic and incident that cluster obtains are carried out the excavation of focus public sentiment at last, obtain final result, hand over to the system manager.
2. the public feelings information analytical approach of discrete text Network Based according to claim 1, it is characterized in that, at step a, earlier obtain information flow from network, be saved in local data base with html format, the information of the html format that this locality is preserved is carried out denoising then, simultaneously, set up document index in the denoising process, time, the number of preserving UserID information and participating in discussion.
3. the public feelings information analytical approach of discrete text Network Based according to claim 1, it is characterized in that, at step b, at first in discrete text determining content with repertorie with maximum match principle according to the network-specific that background information is handled and data support center provides omits, the long-range part that refers to, on this basis according to the hierarchical structure of the html that forms in discrete text information acquisition module tree or visit long-range hyperlink and realize to the abridged raw content, effective location of the long-range raw content that refers to is omitted part to content at last and is long-rangely referred to raw content that the part utilization oriented and carry out content and replace; Simultaneously, the special symbol in the removal discrete text.
4. the public feelings information analytical approach of discrete text Network Based according to claim 1, it is characterized in that, at step c, to the discrete text after restoring is the public sentiment document, carry out participle with the ICTCLAS of Chinese Academy of Sciences Words partition system, carry out weight calculation with TF-IDF, obtain word-document matrix, adopt potential semantic indexing technology that word-document matrix is carried out dimension-reduction treatment then, find out speech and notion, the relation of notion and public sentiment document, and carry out feature extraction based on this, obtain being used for next step and carry out notion-public sentiment document matrix that dimension that the information cluster uses is lowered.
5. the public feelings information analytical approach of discrete text Network Based according to claim 1, it is characterized in that, in steps d, niche genetic algorithm is that population is divided into several microhabitats, similarity is selected to exert an influence to heredity in the average class of each microhabitat internal condition document, and then influencing the interior individual fitness of microhabitat, intersection and mutation operation then carry out in whole population; In each evolution iteration, carry out cluster with K-Means, to calculate average similarity in individual fitness and the class, K-Means cluster initial center in the initial population is selected at random, the back of evolving each time later on selects the big K individuality of adaptive value as initial center, will have with the method for K-Means that the public feelings information of same subject information or topic classification is poly-to arrive same class.
6. the public feelings information analytical approach of discrete text Network Based according to claim 1, it is characterized in that, at step e, the document index information of preserving in cluster result that obtains by the cluster module and the discrete text denoising process, public feelings information number of files and the discussion number in the document index according to every class in the cluster result are excavated focus public sentiment incident within a certain period of time, and each regular update is gathered focus public sentiment incident of finding behind the html page and the existing focus that background information processing and data support center provide compare analysis, the result hands over to the system manager.
7. The network-based method for analyzing opinion information in discrete text according to claim 1, characterized in that the background information processing and data support center
8. The network-based method for analyzing opinion information in discrete text according to claim 2, characterized in that the denoising steps are as follows: first, the data in the HTML files are uniformly standardized, and tag pairs whose elements appear interleaved are restored to their complete form; next, each HTML page is stored in a tree-shaped linked structure, so that after processing each HTML page corresponds to one HTML tree; finally, based on the HTML tree thus generated, the text in the table elements is merged and the text of the table carrying the largest amount of information is taken as the main text, from which the corresponding text information, including the title, body and replies, is extracted to obtain the discrete text information.
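The final denoising step of claim 8 (merging the text of each table and keeping the table with the most information) can be sketched with the Python standard library. This is a simplified illustration: it tracks open tables with a stack rather than building the full HTML tree described in the claim, it assumes "amount of information" means character count, and the names `TableTextExtractor` and `main_text` are hypothetical.

```python
from html.parser import HTMLParser


class TableTextExtractor(HTMLParser):
    """Collect the merged text of every <table> element in a page."""

    def __init__(self):
        super().__init__()
        self.open_tables = []  # text buffers of tables that are still open
        self.tables = []       # merged text of every closed table
        self.skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "table":
            self.open_tables.append([])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "table" and self.open_tables:
            self.tables.append(" ".join(self.open_tables.pop()))

    def handle_data(self, data):
        text = data.strip()
        if text and self.open_tables and not self.skip_depth:
            self.open_tables[-1].append(text)  # text goes to the innermost table


def main_text(html):
    """Return the text of the table with the most characters, or ""."""
    parser = TableTextExtractor()
    parser.feed(html)
    parser.close()
    while parser.open_tables:  # flush tables the page never closed
        parser.tables.append(" ".join(parser.open_tables.pop()))
    return max(parser.tables, key=len, default="")


if __name__ == "__main__":
    page = ("<table><tr><td>Home</td></tr></table>"
            "<table><tr><td>Title</td></tr><tr><td>Body of the post</td></tr></table>")
    print(main_text(page))
```

Splitting the extracted text into title, body and replies would need site-specific rules and is left out of the sketch.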
CN 201110030156 2011-01-26 2011-01-26 Network-based method for analyzing opinion information in discrete text Pending CN102110140A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110030156 CN102110140A (en) 2011-01-26 2011-01-26 Network-based method for analyzing opinion information in discrete text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110030156 CN102110140A (en) 2011-01-26 2011-01-26 Network-based method for analyzing opinion information in discrete text

Publications (1)

Publication Number Publication Date
CN102110140A true CN102110140A (en) 2011-06-29

Family

ID=44174301

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110030156 Pending CN102110140A (en) 2011-01-26 2011-01-26 Network-based method for analyzing opinion information in discrete text

Country Status (1)

Country Link
CN (1) CN102110140A (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102662998A (en) * 2012-03-14 2012-09-12 华侨大学 Text semantic theme extracting method based on Baidu Encyclopedia
CN102708096A (en) * 2012-05-29 2012-10-03 代松 Network intelligence public sentiment monitoring system based on semantics and work method thereof
WO2013086931A1 (en) * 2011-12-13 2013-06-20 International Business Machines Corporation Event mining in social networks
CN103970730A (en) * 2014-04-29 2014-08-06 河海大学 Method for extracting multiple subject terms from single Chinese text
CN104714800A (en) * 2013-12-16 2015-06-17 国际商业机器公司 Method and system for constructing concepts according to task specification
CN104812036A (en) * 2015-05-15 2015-07-29 桂林电子科技大学 Sleep scheduling method and system for energy acquisition sensor network
CN104881734A (en) * 2015-05-11 2015-09-02 广东小天才科技有限公司 Method, device and system for guiding product improvement based on gray release
CN104965823A (en) * 2015-07-30 2015-10-07 成都鼎智汇科技有限公司 Big data based opinion extraction method
CN105005578A (en) * 2015-05-21 2015-10-28 中国电子科技集团公司第十研究所 Multimedia target information visual analysis system
CN105068991A (en) * 2015-07-30 2015-11-18 成都鼎智汇科技有限公司 Big data based public sentiment discovery method
CN105183765A (en) * 2015-07-30 2015-12-23 成都鼎智汇科技有限公司 Big data-based topic extraction method
CN105426391A (en) * 2015-10-27 2016-03-23 张贝贝 Method of acquiring diffusion pattern of network hot topic
CN105512245A (en) * 2015-11-30 2016-04-20 青岛智能产业技术研究院 Enterprise figure building method based on regression model
CN105786781A (en) * 2016-03-14 2016-07-20 裴克铭管理咨询(上海)有限公司 Job description text similarity calculation method based on topic model
CN105975642A (en) * 2016-07-15 2016-09-28 合肥指南针电子科技有限责任公司 Public opinion monitoring method based on network big data
CN106161377A (en) * 2015-04-13 2016-11-23 中国科学院软件研究所 A kind of social networks access control method based on user characteristics
CN106202070A (en) * 2015-04-29 2016-12-07 中国电信股份有限公司 File storage processing method and system
CN103324665B (en) * 2013-05-14 2017-05-03 亿赞普(北京)科技有限公司 Hot spot information extraction method and device based on micro-blog
CN107562822A (en) * 2017-08-18 2018-01-09 武汉红茶数据技术有限公司 A kind of public sentiment event method for digging and system
CN109508374A (en) * 2018-11-19 2019-03-22 云南电网有限责任公司信息中心 Text data Novel semi-supervised based on genetic algorithm
CN109635107A (en) * 2018-11-19 2019-04-16 北京亚鸿世纪科技发展有限公司 The method and device of semantic intellectual analysis and the event scenarios reduction of multi-data source
CN109740042A (en) * 2018-11-27 2019-05-10 平安科技(深圳)有限公司 Monitoring method, device and the storage medium of public opinion information, computer equipment
CN109902176A (en) * 2019-02-26 2019-06-18 北京微步在线科技有限公司 A kind of computer instruction storage medium of data correlation expanding method and non-transitory
CN113076335A (en) * 2021-04-02 2021-07-06 西安交通大学 Network cause detection method, system, equipment and storage medium
CN115687960A (en) * 2022-12-30 2023-02-03 中国人民解放军61660部队 Text clustering method for open source security information
CN115858787A (en) * 2022-12-12 2023-03-28 交通运输部公路科学研究所 Hot spot extraction and mining method based on problem appeal information in road transportation
CN116911288A (en) * 2023-09-11 2023-10-20 戎行技术有限公司 Discrete text recognition method based on natural language processing technology

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6446061B1 (en) * 1998-07-31 2002-09-03 International Business Machines Corporation Taxonomy generation for document collections
CN101819573A (en) * 2009-09-15 2010-09-01 电子科技大学 Self-adaptive network public opinion identification method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
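Computer Knowledge and Technology (《电脑知识与技术》) 20100930 将昀昕 Niche genetic algorithm based on adaptive K-means clustering (基于自适应K均值聚类的小生境遗传算法) pp. 7675-7677 1-6, 8 Vol. 6, No. 27 *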

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2509874A (en) * 2011-12-13 2014-07-16 Ibm Event mining in social networks
US8914371B2 (en) 2011-12-13 2014-12-16 International Business Machines Corporation Event mining in social networks
WO2013086931A1 (en) * 2011-12-13 2013-06-20 International Business Machines Corporation Event mining in social networks
CN102662998A (en) * 2012-03-14 2012-09-12 华侨大学 Text semantic theme extracting method based on Baidu Encyclopedia
CN102662998B (en) * 2012-03-14 2015-07-15 华侨大学 Text semantic theme extracting method based on Baidu Encyclopedia
CN102708096B (en) * 2012-05-29 2014-10-15 代松 Network intelligence public sentiment monitoring system based on semantics and work method thereof
CN102708096A (en) * 2012-05-29 2012-10-03 代松 Network intelligence public sentiment monitoring system based on semantics and work method thereof
CN103324665B (en) * 2013-05-14 2017-05-03 亿赞普(北京)科技有限公司 Hot spot information extraction method and device based on micro-blog
US10162852B2 (en) 2013-12-16 2018-12-25 International Business Machines Corporation Constructing concepts from a task specification
CN104714800A (en) * 2013-12-16 2015-06-17 国际商业机器公司 Method and system for constructing concepts according to task specification
CN104714800B (en) * 2013-12-16 2018-06-29 国际商业机器公司 For according to mission statement come the method and system of structure concept
CN103970730A (en) * 2014-04-29 2014-08-06 河海大学 Method for extracting multiple subject terms from single Chinese text
CN106161377B (en) * 2015-04-13 2019-03-29 中国科学院软件研究所 A kind of social networks access control method based on user characteristics
CN106161377A (en) * 2015-04-13 2016-11-23 中国科学院软件研究所 A kind of social networks access control method based on user characteristics
CN106202070A (en) * 2015-04-29 2016-12-07 中国电信股份有限公司 File storage processing method and system
CN104881734A (en) * 2015-05-11 2015-09-02 广东小天才科技有限公司 Method, device and system for guiding product improvement based on gray release
CN104812036A (en) * 2015-05-15 2015-07-29 桂林电子科技大学 Sleep scheduling method and system for energy acquisition sensor network
CN104812036B (en) * 2015-05-15 2018-04-17 桂林电子科技大学 A kind of dormancy dispatching method and system of energy harvesting sensor network
CN105005578A (en) * 2015-05-21 2015-10-28 中国电子科技集团公司第十研究所 Multimedia target information visual analysis system
CN105183765A (en) * 2015-07-30 2015-12-23 成都鼎智汇科技有限公司 Big data-based topic extraction method
CN105068991A (en) * 2015-07-30 2015-11-18 成都鼎智汇科技有限公司 Big data based public sentiment discovery method
CN104965823A (en) * 2015-07-30 2015-10-07 成都鼎智汇科技有限公司 Big data based opinion extraction method
CN105426391A (en) * 2015-10-27 2016-03-23 张贝贝 Method of acquiring diffusion pattern of network hot topic
CN105512245A (en) * 2015-11-30 2016-04-20 青岛智能产业技术研究院 Enterprise figure building method based on regression model
CN105512245B (en) * 2015-11-30 2018-08-21 青岛智能产业技术研究院 A method of enterprise's portrait is established based on regression model
CN105786781A (en) * 2016-03-14 2016-07-20 裴克铭管理咨询(上海)有限公司 Job description text similarity calculation method based on topic model
CN105975642A (en) * 2016-07-15 2016-09-28 合肥指南针电子科技有限责任公司 Public opinion monitoring method based on network big data
CN107562822A (en) * 2017-08-18 2018-01-09 武汉红茶数据技术有限公司 A kind of public sentiment event method for digging and system
CN109508374A (en) * 2018-11-19 2019-03-22 云南电网有限责任公司信息中心 Text data Novel semi-supervised based on genetic algorithm
CN109635107A (en) * 2018-11-19 2019-04-16 北京亚鸿世纪科技发展有限公司 The method and device of semantic intellectual analysis and the event scenarios reduction of multi-data source
CN109508374B (en) * 2018-11-19 2021-12-21 云南电网有限责任公司信息中心 Text data semi-supervised clustering method based on genetic algorithm
CN109740042A (en) * 2018-11-27 2019-05-10 平安科技(深圳)有限公司 Monitoring method, device and the storage medium of public opinion information, computer equipment
CN109902176A (en) * 2019-02-26 2019-06-18 北京微步在线科技有限公司 A kind of computer instruction storage medium of data correlation expanding method and non-transitory
CN113076335A (en) * 2021-04-02 2021-07-06 西安交通大学 Network cause detection method, system, equipment and storage medium
CN115858787A (en) * 2022-12-12 2023-03-28 交通运输部公路科学研究所 Hot spot extraction and mining method based on problem appeal information in road transportation
CN115687960A (en) * 2022-12-30 2023-02-03 中国人民解放军61660部队 Text clustering method for open source security information
CN116911288A (en) * 2023-09-11 2023-10-20 戎行技术有限公司 Discrete text recognition method based on natural language processing technology
CN116911288B (en) * 2023-09-11 2023-12-12 戎行技术有限公司 Discrete text recognition method based on natural language processing technology

Similar Documents

Publication Publication Date Title
CN102110140A (en) Network-based method for analyzing opinion information in discrete text
Venugopalan et al. Exploring sentiment analysis on twitter data
KR101005337B1 (en) System for extraction and analysis of opinion in web documents and method thereof
US20160147866A1 (en) Processing user profiles
CN113704451B (en) Power user appeal screening method and system, electronic device and storage medium
CN100595760C (en) Method for gaining oral vocabulary entry, device and input method system thereof
Ding et al. Automatic hashtag recommendation for microblogs using topic-specific translation model
US8510308B1 (en) Extracting semantic classes and instances from text
CN102298638A (en) Method and system for extracting news webpage contents by clustering webpage labels
CN103699525A (en) Method and device for automatically generating abstract on basis of multi-dimensional characteristics of text
CN104182389A (en) Semantic-based big data analysis business intelligence service system
Lee Unsupervised and supervised learning to evaluate event relatedness based on content mining from social-media streams
CN102043808A (en) Method and equipment for extracting bilingual terms using webpage structure
CN103699626A (en) Method and system for analysing individual emotion tendency of microblog user
CN102567534B (en) Interactive product user generated content intercepting system and intercepting method for the same
CN102682120A (en) Method,device and system for acquiring essential article commented on network
CN109086355A (en) Hot spot association relationship analysis method and system based on theme of news word
CN104281565A (en) Semantic dictionary constructing method and device
CN106649308B (en) Word segmentation and word library updating method and system
Zhu et al. Real-time personalized twitter search based on semantic expansion and quality model
CN104346382A (en) Text analysis system and method employing language query
Kumar et al. A knowledge induced graph-theoretical model for extract and abstract single document summarization
Kim et al. Customer preference analysis based on SNS data
Chowdhury et al. Crime monitoring from newspaper data based on sentiment analysis
Khatoon Real-time twitter data analysis of Saudi telecom companies for enhanced customer relationship management

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20110629