CN101944099A - Method for automatically classifying text documents using an ontology - Google Patents
- Publication number
- CN101944099A
- Authority
- CN
- China
- Prior art keywords
- word
- meaning
- notion
- text document
- expression
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
Description
Recall | Accuracy |
---|---|
96.2% | 83.9% |

S Arts (%) | S Sports (%) | S Games (%) | Average similarity (%) |
---|---|---|---|
78.5 | 75.0 | 83.7 | 79.1 |
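Assuming the average column is the unweighted arithmetic mean of the three per-category scores (the source does not say how it was computed), the reported value agrees with the per-category figures:

```latex
\bar{S} = \frac{S_{\text{Arts}} + S_{\text{Sports}} + S_{\text{Games}}}{3}
        = \frac{78.5 + 75.0 + 83.7}{3}
        = \frac{237.2}{3} \approx 79.07 \approx 79.1\,\%
```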
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010102101070A CN101944099B (en) | 2010-06-24 | 2010-06-24 | Method for automatically classifying text documents using an ontology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101944099A (en) | 2011-01-12 |
CN101944099B (en) | 2012-05-30 |
Family ID: 43436091
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2010102101070A (Expired - Fee Related), CN101944099B (en) | Method for automatically classifying text documents using an ontology | 2010-06-24 | 2010-06-24 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101944099B (en) |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102521242A (en) * | 2011-11-14 | 2012-06-27 | 江苏联著实业有限公司 | Automatic classification system based on OWL (Ontology of Web Language) ontology analysis |
CN102708104A (en) * | 2011-03-28 | 2012-10-03 | 日电(中国)有限公司 | Method and equipment for sorting document |
CN103034922A (en) * | 2011-09-30 | 2013-04-10 | 国际商业机器公司 | Refinement and calibration method and system for improving classification of information assets |
CN103123685A (en) * | 2011-11-18 | 2013-05-29 | 江南大学 | Text mode recognition method |
CN103218362A (en) * | 2012-01-19 | 2013-07-24 | 中兴通讯股份有限公司 | Method and system for constructing domain ontology |
CN103392177A (en) * | 2011-02-25 | 2013-11-13 | 英派尔科技开发有限公司 | Ontology expansion |
CN103970888A (en) * | 2014-05-21 | 2014-08-06 | 山东省科学院情报研究所 | Document classifying method based on network measure index |
CN104102651A (en) * | 2013-04-07 | 2014-10-15 | 华东师范大学 | Semantic-based self-adaption text classification method under cloud computing environment |
CN104182463A (en) * | 2014-07-21 | 2014-12-03 | 安徽华贞信息科技有限公司 | Semantic-based text classification method |
WO2015043077A1 (en) * | 2013-09-29 | 2015-04-02 | 北大方正集团有限公司 | Semantic information acquisition method, keyword expansion method thereof, and search method and system |
CN105117397A (en) * | 2015-06-18 | 2015-12-02 | 浙江大学 | Method for searching semantic association of medical documents based on ontology |
CN105205090A (en) * | 2015-05-29 | 2015-12-30 | 湖南大学 | Web page text classification algorithm research based on web page link analysis and support vector machine |
CN105354184A (en) * | 2015-10-28 | 2016-02-24 | 甘肃智呈网络科技有限公司 | Method for using optimized vector space model to automatically classify document |
CN105893606A (en) * | 2016-04-25 | 2016-08-24 | 深圳市永兴元科技有限公司 | Text classifying method and device |
CN107066448A (en) * | 2017-04-23 | 2017-08-18 | 四川用联信息技术有限公司 | New small-world network model realizes the extracting method of text feature |
CN108009248A (en) * | 2017-11-30 | 2018-05-08 | 国信优易数据有限公司 | A kind of data classification method and system |
CN108197109A (en) * | 2017-12-29 | 2018-06-22 | 北京百分点信息科技有限公司 | A kind of multilingual analysis method and device based on natural language processing |
WO2018153265A1 (en) * | 2017-02-23 | 2018-08-30 | 腾讯科技(深圳)有限公司 | Keyword extraction method, computer device, and storage medium |
WO2018161516A1 (en) * | 2017-03-07 | 2018-09-13 | 京东方科技集团股份有限公司 | Method and device for automatic discovery of medical knowledge |
CN109271513A (en) * | 2018-09-07 | 2019-01-25 | 华南师范大学 | A kind of file classification method, computer-readable storage media and system |
CN110348497A (en) * | 2019-06-28 | 2019-10-18 | 西安理工大学 | A kind of document representation method based on the building of WT-GloVe term vector |
CN112632968A (en) * | 2020-12-18 | 2021-04-09 | 万兴科技(湖南)有限公司 | PDF directory identification method, electronic device and computer readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005316699A (en) * | 2004-04-28 | 2005-11-10 | Hitachi Ltd | Content disclosure system, content disclosure method and content disclosure program |
US20080059448A1 (en) * | 2006-09-06 | 2008-03-06 | Walter Chang | System and Method of Determining and Recommending a Document Control Policy for a Document |
CN101169780A (en) * | 2006-10-25 | 2008-04-30 | 华为技术有限公司 | Semantic ontology retrieval system and method |
CN101639837A (en) * | 2008-07-29 | 2010-02-03 | 日电(中国)有限公司 | Method and system for automatically classifying objects |
- 2010-06-24: Application CN2010102101070A filed in CN (CN101944099B/en); status: not active, Expired - Fee Related
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103392177A (en) * | 2011-02-25 | 2013-11-13 | 英派尔科技开发有限公司 | Ontology expansion |
CN103392177B (en) * | 2011-02-25 | 2018-01-05 | 英派尔科技开发有限公司 | Ontology expansion |
CN102708104B (en) * | 2011-03-28 | 2015-03-11 | 日电(中国)有限公司 | Method and equipment for sorting document |
CN102708104A (en) * | 2011-03-28 | 2012-10-03 | 日电(中国)有限公司 | Method and equipment for sorting document |
CN103034922A (en) * | 2011-09-30 | 2013-04-10 | 国际商业机器公司 | Refinement and calibration method and system for improving classification of information assets |
CN103034922B (en) * | 2011-09-30 | 2017-05-03 | 国际商业机器公司 | Refinement and calibration method and system for improving classification of information assets |
CN102521242A (en) * | 2011-11-14 | 2012-06-27 | 江苏联著实业有限公司 | Automatic classification system based on OWL (Ontology of Web Language) ontology analysis |
CN103123685A (en) * | 2011-11-18 | 2013-05-29 | 江南大学 | Text mode recognition method |
CN103123685B (en) * | 2011-11-18 | 2016-03-02 | 江南大学 | Text mode recognition method |
CN103218362A (en) * | 2012-01-19 | 2013-07-24 | 中兴通讯股份有限公司 | Method and system for constructing domain ontology |
CN104102651B (en) * | 2013-04-07 | 2017-07-25 | 华东师范大学 | Based on semantic adaptive file classification method under cloud computing environment |
CN104102651A (en) * | 2013-04-07 | 2014-10-15 | 华东师范大学 | Semantic-based self-adaption text classification method under cloud computing environment |
WO2015043077A1 (en) * | 2013-09-29 | 2015-04-02 | 北大方正集团有限公司 | Semantic information acquisition method, keyword expansion method thereof, and search method and system |
CN104516902A (en) * | 2013-09-29 | 2015-04-15 | 北大方正集团有限公司 | Semantic information acquisition method and corresponding keyword extension method and search method |
US10268758B2 (en) | 2013-09-29 | 2019-04-23 | Peking University Founder Group Co. Ltd. | Method and system of acquiring semantic information, keyword expansion and keyword search thereof |
JP2016532173A (en) * | 2013-09-29 | 2016-10-13 | ペキン ユニバーシティ ファウンダー グループ カンパニー,リミティド | Semantic information, keyword expansion and related keyword search method and system |
CN103970888B (en) * | 2014-05-21 | 2017-02-15 | 山东省科学院情报研究所 | Document classifying method based on network measure index |
CN103970888A (en) * | 2014-05-21 | 2014-08-06 | 山东省科学院情报研究所 | Document classifying method based on network measure index |
CN104182463A (en) * | 2014-07-21 | 2014-12-03 | 安徽华贞信息科技有限公司 | Semantic-based text classification method |
CN105205090A (en) * | 2015-05-29 | 2015-12-30 | 湖南大学 | Web page text classification algorithm research based on web page link analysis and support vector machine |
CN105117397B (en) * | 2015-06-18 | 2018-08-28 | 浙江大学 | A kind of medical files semantic association search method based on ontology |
CN105117397A (en) * | 2015-06-18 | 2015-12-02 | 浙江大学 | Method for searching semantic association of medical documents based on ontology |
CN105354184A (en) * | 2015-10-28 | 2016-02-24 | 甘肃智呈网络科技有限公司 | Method for using optimized vector space model to automatically classify document |
CN105354184B (en) * | 2015-10-28 | 2018-04-20 | 甘肃智呈网络科技有限公司 | A kind of vector space model using optimization realizes the method that document is classified automatically |
CN105893606A (en) * | 2016-04-25 | 2016-08-24 | 深圳市永兴元科技有限公司 | Text classifying method and device |
WO2018153265A1 (en) * | 2017-02-23 | 2018-08-30 | 腾讯科技(深圳)有限公司 | Keyword extraction method, computer device, and storage medium |
US10963637B2 (en) | 2017-02-23 | 2021-03-30 | Tencent Technology (Shenzhen) Company Ltd | Keyword extraction method, computer equipment and storage medium |
WO2018161516A1 (en) * | 2017-03-07 | 2018-09-13 | 京东方科技集团股份有限公司 | Method and device for automatic discovery of medical knowledge |
US11455546B2 (en) | 2017-03-07 | 2022-09-27 | Beijing Boe Technology Development Co., Ltd. | Method and apparatus for automatically discovering medical knowledge |
CN107066448A (en) * | 2017-04-23 | 2017-08-18 | 四川用联信息技术有限公司 | New small-world network model realizes the extracting method of text feature |
CN108009248A (en) * | 2017-11-30 | 2018-05-08 | 国信优易数据有限公司 | A kind of data classification method and system |
CN108197109B (en) * | 2017-12-29 | 2021-04-23 | 北京百分点科技集团股份有限公司 | Multi-language analysis method and device based on natural language processing |
CN108197109A (en) * | 2017-12-29 | 2018-06-22 | 北京百分点信息科技有限公司 | A kind of multilingual analysis method and device based on natural language processing |
CN109271513A (en) * | 2018-09-07 | 2019-01-25 | 华南师范大学 | A kind of file classification method, computer-readable storage media and system |
CN109271513B (en) * | 2018-09-07 | 2021-10-22 | 华南师范大学 | Text classification method, computer readable storage medium and system |
CN110348497B (en) * | 2019-06-28 | 2021-09-10 | 西安理工大学 | Text representation method constructed based on WT-GloVe word vector |
CN110348497A (en) * | 2019-06-28 | 2019-10-18 | 西安理工大学 | A kind of document representation method based on the building of WT-GloVe term vector |
CN112632968A (en) * | 2020-12-18 | 2021-04-09 | 万兴科技(湖南)有限公司 | PDF directory identification method, electronic device and computer readable storage medium |
CN112632968B (en) * | 2020-12-18 | 2024-02-13 | 万兴科技(湖南)有限公司 | PDF catalog identification method, electronic equipment and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN101944099B (en) | 2012-05-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101944099B (en) | Method for automatically classifying text documents using an ontology | |
Chaovalit et al. | Movie review mining: A comparison between supervised and unsupervised classification approaches | |
CN104573046B (en) | A kind of comment and analysis method and system based on term vector | |
EP2041669B1 (en) | Text categorization using external knowledge | |
CN102799647B (en) | Method and device for webpage reduplication deletion | |
CN106599054B (en) | Method and system for classifying and pushing questions | |
CN101493819B (en) | Method for optimizing detection of search engine cheat | |
CN105488024A (en) | Webpage topic sentence extraction method and apparatus | |
CN103577462B (en) | A kind of Document Classification Method and device | |
CN105653706A (en) | Multilayer quotation recommendation method based on literature content mapping knowledge domain | |
CN103744956B (en) | A kind of diversified expanding method of key word | |
CN103838833A (en) | Full-text retrieval system based on semantic analysis of relevant words | |
CN102411563A (en) | Method, device and system for identifying target words | |
CN103235812B (en) | Method and system for identifying multiple query intents | |
CN104484380A (en) | Personalized search method and personalized search device | |
CN105183784A (en) | Content based junk webpage detecting method and detecting apparatus thereof | |
CN104778276A (en) | Multi-index combining and sequencing algorithm based on improved TF-IDF (term frequency-inverse document frequency) | |
Chifu et al. | Word sense discrimination in information retrieval: A spectral clustering-based approach | |
CN105512333A (en) | Product comment theme searching method based on emotional tendency | |
CN102789452A (en) | Similar content extraction method | |
CN100458797C (en) | Process for ordering network advertisement | |
Geng et al. | Evaluating web content quality via multi-scale features | |
CN105956010A (en) | Distributed information retrieval set selection method based on distributed representation and local ordering | |
Jedrzejewski et al. | Opinion mining and social networks: A promising match | |
CN104572915A (en) | User event relevance calculation method based on content environment enhancement |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
C53 | Correction of patent of invention or patent application | |
CB03 | Change of inventor or designer information | Inventors after: Fang Jun, Guo Lei, Yang Ning. Inventors before: Guo Lei, Fang Jun |
COR | Change of bibliographic data | Free format text: CORRECT: INVENTOR; FROM: GUO LEI FANG JUN TO: FANG JUN GUO LEI YANG NING |
ASS | Succession or assignment of patent right | Owner name: NORTHWESTERN POLYTECHNICAL UNIVERSITY; Owner name: JIANGSU T.Y. ENVIRONMENTAL ENERGY CO., LTD. Free format text: FORMER OWNER: NORTHWESTERN POLYTECHNICAL UNIVERSITY. Effective date: 20140814 |
C41 | Transfer of patent application or patent right or utility model | |
COR | Change of bibliographic data | Free format text: CORRECT: ADDRESS; FROM: 710072 XI'AN, SHAANXI PROVINCE TO: 226600 NANTONG, JIANGSU PROVINCE |
TR01 | Transfer of patent right | Effective date of registration: 20140814. Address after: 226600 No. 268 Huanghai Avenue (West), Hai'an, Jiangsu Province. Patentees after: JIANGSU TIANYING ENVIRONMENTAL PROTECTION ENERGY Co.,Ltd.; Northwestern Polytechnical University. Address before: 710072 No. 127 Youyi West Road, Xi'an, Shaanxi. Patentee before: Northwestern Polytechnical University |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20120530 |
CF01 | Termination of patent right due to non-payment of annual fee | |