CN103377255A - Creation method and device for article index - Google Patents

Info

Publication number
CN103377255A
Authority
CN
China
Prior art keywords
index
directory entry
module
article
sorting tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012101309808A
Other languages
Chinese (zh)
Inventor
孔峰
苏勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Original Assignee
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Founder Group Co Ltd, Beijing Founder Electronics Co Ltd
Priority to CN2012101309808A
Publication of CN103377255A
Pending

Abstract

The invention provides a method for creating an article index, comprising: acquiring index identifiers from the text of a document to form an index entry set; creating a sort tree from the index entry set; and traversing the sort tree to create the article index. The invention also provides a device for creating an article index, comprising a set module, a sort tree module and an article index module, wherein the set module is used for acquiring index identifiers from the text of a document to form an index entry set; the sort tree module is used for creating a sort tree from the index entry set; and the article index module is used for traversing the sort tree to create the article index. With this method and device, an article index can be created efficiently.

Description

Creation method and device for article index
Technical field
The present invention relates to the field of publishing, and in particular to a method and a device for creating an article index.
Background
An index is an important part of books, magazines and papers, and its role is becoming increasingly important. As publishing and typesetting methods keep evolving and the pace of society quickens, how to efficiently produce an index that meets requirements for a document is a problem that urgently needs to be solved in the typesetting process.
Summary of the invention
The present invention aims to provide a method and a device for creating an article index, so as to solve the problem of producing an index efficiently.
An embodiment of the present invention provides a method for creating an article index, comprising: acquiring index identifiers from the text of a document to form an index entry set; creating a sort tree from the index entry set; and traversing the sort tree to create the article index.
An embodiment of the present invention provides a device for creating an article index, comprising: a set module, used for acquiring index identifiers from the text of a document to form an index entry set; a sort tree module, used for creating a sort tree from the index entry set; and an article index module, used for traversing the sort tree to create the article index.
Because the method and device of the above embodiments use a sort tree to process the index, the index can be processed quickly and efficiently.
Description of drawings
The accompanying drawings described herein provide a further understanding of the present invention and form a part of this application. The illustrative embodiments of the present invention and their description are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 shows a flowchart of the method for creating an article index according to an embodiment of the present invention;
Fig. 2 shows a sort tree according to an embodiment of the present invention;
Fig. 3 shows the flow for generating the sort tree according to an embodiment of the present invention;
Fig. 4 shows the flow for creating child nodes from an index entry according to an embodiment of the present invention;
Fig. 5 shows an automatically created sort tree according to an embodiment of the present invention;
Fig. 6 shows the automatically extracted index according to an embodiment of the present invention;
Fig. 7 shows the sort tree created after editing according to an embodiment of the present invention;
Fig. 8 shows the index updated after editing according to an embodiment of the present invention;
Fig. 9 shows a schematic diagram of the device for creating an article index according to an embodiment of the present invention.
Embodiments
The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows a flowchart of the method for creating an article index according to an embodiment of the present invention, comprising:
Step S10: acquiring index identifiers from the text of a document to form an index entry set;
Step S20: creating a sort tree from the index entry set;
Step S30: traversing the sort tree to create the article index.
Because this method uses a sort tree to process the index, the article index is created efficiently.
Preferably, step S10 comprises: traversing the text content of all pages of the document; judging whether the current character in the text content of the current page is an index identifier; and if so, acquiring the index identifier, forming one index entry from each index identifier, and assembling the index entries into the index entry set.
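Step S10 can be sketched in Python as follows. The patent does not say how an index identifier is encoded in the text, so the `{{idx:...}}` marker syntax, the `IndexEntry` type and the function name are assumptions made purely for illustration, and a regular-expression scan stands in for the character-by-character check described above.

```python
import re
from dataclasses import dataclass

# Hypothetical marker syntax, assumed for illustration: {{idx:flower|plum blossom}}
MARKER = re.compile(r"\{\{idx:([^}]*)\}\}")


@dataclass(frozen=True)
class IndexEntry:
    levels: tuple  # index term content, one item per level (at most 4)
    page: int      # page on which the identifier was found


def collect_index_entries(pages):
    """Scan every page and turn each index identifier into one index entry."""
    entries = []
    for page_number, text in enumerate(pages, start=1):
        for match in MARKER.finditer(text):
            levels = tuple(part.strip() for part in match.group(1).split("|"))
            entries.append(IndexEntry(levels[:4], page_number))
    return entries


pages = ["No markers here.", "Roses {{idx:flower|rose}} and {{idx:animal}}."]
entries = collect_index_entries(pages)
```

Here `entries` holds two index entries, both recorded on page 2; together they play the role of the index entry set passed on to step S20.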
Preferably, step S20 comprises: creating a root node; determining the subordination relations among the index entries; and inserting the index entries into the sort tree built on the root node according to preset hierarchy rules governing the subordination relations.
Preferably, inserting the index entries into the sort tree built on the root node according to the preset hierarchy rules comprises: if a first index entry among the index entries is subordinate to a second index entry among the index entries, setting the first index entry as a child node of the node corresponding to the second index entry.
Fig. 2 shows a sort tree according to an embodiment of the present invention. The first layer of child nodes holds the index categories, the second layer holds the group headings, and below these are the index entry child nodes. An index has at most 4 levels, and each node can contain index term content, page number data, or "see" cross-reference information. There are four index categories: Chinese, symbol, numeral and English. Chinese can in turn be sorted by stroke or by pinyin, and the symbol category is the default setting. The display order of the index can be changed by adjusting the priority of the index categories. The sort rule used in this embodiment is, from high to low priority: Chinese (pinyin), English, symbol.
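The layered structure of Fig. 2 might be modeled as below. This is a minimal sketch: the `Node` fields, the lowercase category names and the `order_categories` helper are illustrative assumptions, not names taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    content: str                                   # category, heading, or index term content
    pages: list = field(default_factory=list)      # page number data
    see: list = field(default_factory=list)        # "see" cross-references
    children: list = field(default_factory=list)   # next layer down


# Sort rule of the embodiment, highest priority first.
SORT_RULE = ["chinese", "english", "symbol"]


def order_categories(root, rule=SORT_RULE):
    """Reorder the first-layer (category) nodes by their priority in the rule."""
    root.children.sort(key=lambda node: rule.index(node.content))


# root -> category -> group heading -> level-1 term -> level-2 term
root = Node("root", children=[
    Node("symbol"),
    Node("english", children=[
        Node("F", children=[Node("flower", children=[Node("peony", pages=[7])])]),
    ]),
])
order_categories(root)
```

After `order_categories(root)` the English category precedes the symbol category, which is how a change of priority changes the display order without touching the entries themselves.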
Fig. 3 shows the flow for generating the sort tree of Fig. 2, comprising:
1. Create the root node of the sort tree;
2. Obtain the sort rule. By default it contains at least the symbol index category and its group heading; at this point further index category and group heading nodes can be added according to the sort rule;
3. According to the sort rule, build the mapping table of index categories, which indicates the category that each character type belongs to in this sort;
4. Acquire the index identifiers from the text of the document to form the index entry set;
5. According to the sort rule, create index-level child nodes of the sort tree from the index entry set;
6. Let n be the number of index entries in the set, and set the current index entry to i = 1;
7. Obtain the type of the first character of the current entry, and use the index category mapping table to find the index category and group heading child node it belongs to;
8. Place the current index entry at the appropriate child node position in the index tree according to the sort rule;
9. Obtain the next index entry;
10. If a next index entry exists, set it as the current index entry and return to step 7;
11. If not, the loop over the index entries ends, and the sort tree is complete.
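The flow of Fig. 3 can be sketched as follows. The sketch rests on several assumptions that are not in the patent: entries are `(levels, page)` tuples, English entries are grouped under their uppercase initial, every other category uses a single heading, the character ranges are rough, and plain code-point order stands in for pinyin or stroke order.

```python
CHAR_TYPES = ("chinese", "english", "numeral", "symbol")


def char_type(ch):
    """Classify one character (assumed ranges; CJK basic block only)."""
    if ch.isascii() and ch.isalpha():
        return "english"
    if ch.isdigit():
        return "numeral"
    if "\u4e00" <= ch <= "\u9fff":
        return "chinese"
    return "symbol"


def build_mapping(rule):
    """Step 3: map each character type to the category used in this sort."""
    return {t: (t if t in rule else "symbol") for t in CHAR_TYPES}


def heading_for(category, first_level):
    """Assumed heading scheme: English entries are grouped by initial letter."""
    return first_level[0].upper() if category == "english" else category


def build_sort_tree(entries, rule):
    mapping = build_mapping(rule)                     # step 3
    tree = {category: {} for category in rule}        # steps 1-2
    tree.setdefault("symbol", {})                     # symbol category exists by default
    for levels, page in entries:                      # steps 6 and 9-11
        category = mapping[char_type(levels[0][0])]   # step 7
        bucket = tree[category].setdefault(heading_for(category, levels[0]), [])
        bucket.append((levels, page))                 # step 8
        bucket.sort(key=lambda entry: [s.lower() for s in entry[0]])
    return tree


entries = [(("flower", "plum blossom"), 5), (("Plant", "Tree peony"), 6), (("Animal",), 8)]
by_language = build_sort_tree(entries, ["chinese", "english", "symbol"])
symbol_only = build_sort_tree(entries, ["symbol"])
```

With the three-category rule the entries land under the English headings A, F and P; with the symbol-only rule the mapping table sends every character type to the symbol category, so the same entries end up in one group.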
Fig. 4 shows the flow for creating child nodes from an index entry according to an embodiment of the present invention, comprising:
1. Obtain the first character of the first-level index term content of the index entry, and obtain its character type. Character types include Chinese, English, numeral, symbol, etc.; an index entry has at most 4 levels of index terms;
2. According to the character type, look up in the index category mapping table the index category and group heading type that the index entry belongs to, and obtain the start node for searching for the index entry content. If the type is the symbol class, the start node is the index category node; otherwise it is the group heading node;
3. Starting from the start node, traverse the tree to search for the index entry. If it is found, return the node object, which is the last-level node of the index entry; if it is not found, return the parent node object of the insertion position, the position to insert at, and the index level to insert at;
4. Judge whether the index entry was found in the sort tree;
5. If the index entry was not found in the sort tree, add nodes for the index term level to be inserted and all subsequent levels to the index tree, set the content of each node to the index term content of the corresponding level, and then go to step 6;
6. If it was found, or once step 5 is complete, add the page number information or the "see" cross-reference information of this index entry, in order, to the last index-level child node.
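Steps 3 to 6 of Fig. 4 amount to a find-or-insert walk. A minimal sketch, assuming children are kept in case-insensitive alphabetical order so that a binary search (`bisect_left`) can return either the matching child or the insertion position; the `Node`, `find` and `add_entry` names are illustrative:

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    content: str
    pages: list = field(default_factory=list)
    children: list = field(default_factory=list)


def find(start, levels):
    """Step 3: walk down from the start node.

    Returns (node, None) when every level is found, otherwise
    (parent, (position, level)) describing where to insert.
    """
    node = start
    for level, content in enumerate(levels):
        keys = [child.content.lower() for child in node.children]
        pos = bisect_left(keys, content.lower())
        if pos == len(keys) or keys[pos] != content.lower():
            return node, (pos, level)
        node = node.children[pos]
    return node, None


def add_entry(start, levels, page):
    node, missing = find(start, levels)          # steps 3-4
    if missing is not None:                      # step 5: create the missing levels
        pos, level = missing
        for content in levels[level:]:
            child = Node(content)
            node.children.insert(pos, child)
            node, pos = child, 0
    node.pages.append(page)                      # step 6: record the page number
    return node


heading_f = Node("F")
add_entry(heading_f, ("flower", "plum blossom"), 5)
add_entry(heading_f, ("flower", "peony"), 7)
add_entry(heading_f, ("flower", "peony"), 9)
```

The second call finds "flower" and only creates "peony"; the third call finds both levels and merely appends page 9 to the existing node.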
Fig. 5 shows an automatically created sort tree according to an embodiment of the present invention.
The text contains the following 5 index identifiers:
First-level content   Second-level content   Type                  Page range
flower                plum blossom           Current page number   5
Plant                 Tree peony             Current page number   6
flower                peony                  Current page number   7
Animal                (none)                 Current page number   8
Plant                 Plum blossom           Current page number   8
Read the first index entry.
The first-level index term content "flower" begins with the letter f, which is of English type, so under this sort rule it is assigned to the English index category, and the child nodes under the English group heading F are searched for "flower". It is not found in the tree, so a "flower" child node is added under the English group heading F. Because second-level index term content exists, a child node for the second-level index term content "plum blossom" is added as well, and the index value is written into that child node: page number 5.
Each index entry is added to the sort tree in turn, which completes the sort tree; the resulting index is shown in Fig. 6.
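The worked example can be replayed with a few lines of Python. This is a sketch under stated assumptions: all five entries are treated as English type, group headings are uppercase initials, siblings are listed in case-insensitive alphabetical order, and the nested-dict layout is illustrative. It is not claimed to reproduce Fig. 5 or Fig. 6 exactly.

```python
ENTRIES = [
    (("flower", "plum blossom"), 5),
    (("Plant", "Tree peony"), 6),
    (("flower", "peony"), 7),
    (("Animal",), 8),
    (("Plant", "Plum blossom"), 8),
]


def new_node():
    return {"pages": [], "children": {}}


def build(entries):
    """Insert every entry under category 'English' and its initial-letter heading."""
    root = new_node()
    for levels, page in entries:
        path = ("English", levels[0][0].upper()) + levels
        node = root
        for content in path:
            node = node["children"].setdefault(content, new_node())
        node["pages"].append(page)   # the index value goes on the last-level node
    return root


def dump(node, depth=0):
    """List the tree as indented lines, siblings in case-insensitive order."""
    lines = []
    for content in sorted(node["children"], key=str.lower):
        child = node["children"][content]
        pages = " " + ", ".join(map(str, child["pages"])) if child["pages"] else ""
        lines.append("  " * depth + content + pages)
        lines.extend(dump(child, depth + 1))
    return lines


outline = dump(build(ENTRIES))
```

`outline` lists the category, then the headings A, F and P, with "flower" holding "peony 7" and "plum blossom 5" and "Plant" holding "Plum blossom 8" and "Tree peony 6".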
Preferably, traversing the sort tree to create the article index comprises: traversing the sort tree depth-first; adding the current node to an index paragraph information array as one item of index paragraph information; and building the article index from the index paragraph information array.
Preferably, building the article index from the index paragraph information array comprises: building one paragraph of the article index from each item of index paragraph information in the array, where the typeset content of a paragraph includes the index term type, the index term content, the page number data and the cross-reference information.
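Step S30 might look like the sketch below: a pre-order depth-first traversal fills the index paragraph information array, and each item is then rendered as one paragraph. The `ParagraphInfo` fields, the use of tree depth as a stand-in for the index term type, and the indentation-based rendering are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    content: str
    pages: list = field(default_factory=list)
    see: list = field(default_factory=list)
    children: list = field(default_factory=list)


@dataclass
class ParagraphInfo:
    depth: int      # stands in for the index term type (category, heading, level 1..4)
    content: str
    pages: list
    see: list


def collect(node, depth=0, out=None):
    """Pre-order depth-first traversal; each visited node becomes one array item."""
    if out is None:
        out = []
    for child in node.children:
        out.append(ParagraphInfo(depth, child.content, child.pages, child.see))
        collect(child, depth + 1, out)
    return out


def render(infos):
    """Build one paragraph (here: one text line) per array item."""
    lines = []
    for info in infos:
        tail = ", ".join(str(p) for p in info.pages)
        if info.see:
            tail = (tail + "; " if tail else "") + "see " + ", ".join(info.see)
        lines.append("  " * info.depth + info.content + (" " + tail if tail else ""))
    return "\n".join(lines)


root = Node("root", children=[Node("English", children=[Node("F", children=[
    Node("flower", children=[Node("peony", pages=[7]), Node("plum blossom", pages=[5])]),
])])])
infos = collect(root)
text = render(infos)
```

Because the traversal is pre-order, a parent's paragraph always precedes the paragraphs of its children, so the array is already in reading order when it is rendered.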
The sort rule can be modified. In this embodiment the sort rule is changed so that, from high to low priority, it contains only: symbol. The root node and the index category nodes of the index are then created again. Fig. 7 shows the sort tree created after editing according to an embodiment of the present invention, and Fig. 8 shows the index updated after editing. The embodiment of Figs. 7 and 8 has only the symbol index category. The first index entry is read: the first-level index term content "flower" begins with the letter f, which is of English type, but under this sort rule it is assigned to the symbol index category, and the child nodes under the symbol index category are searched for "flower". It is not found in the tree, so a "flower" child node is added under the symbol index category. Because second-level index term content exists, a child node for the second-level index term content "plum blossom" is added as well, and the index value is written into it: page number 5. Each index entry is added to the sort tree in turn, giving the rebuilt sort tree shown in Fig. 7. A depth-first traversal then yields the processed index paragraph information array, which provides the paragraph information used to create the article index after the edit. The article index content after editing the sort rule is shown in Fig. 8.
Fig. 9 shows a schematic diagram of the device for creating an article index according to an embodiment of the present invention, comprising:
A set module 10, used for acquiring index identifiers from the text of a document to form an index entry set;
A sort tree module 20, used for creating a sort tree from the index entry set;
An article index module 30, used for traversing the sort tree to create the article index.
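The three modules form a simple pipeline. The sketch below only illustrates that wiring: `make_device` and the `toy_*` stand-ins are hypothetical, deliberately trivial (a `#word` marker, a two-layer dict, plain text lines) and not the modules of the embodiment.

```python
def make_device(set_module, sort_tree_module, article_index_module):
    """Wire the three modules of Fig. 9 into one callable."""
    def create_article_index(pages):
        entries = set_module(pages)              # module 10
        tree = sort_tree_module(entries)         # module 20
        return article_index_module(tree)        # module 30
    return create_article_index


# Trivial stand-ins so the wiring can be run end to end.
def toy_set_module(pages):
    return [(word.lstrip("#"), number)
            for number, text in enumerate(pages, start=1)
            for word in text.split()
            if word.startswith("#") and len(word) > 1]


def toy_sort_tree_module(entries):
    tree = {}
    for term, page in entries:
        tree.setdefault(term[0].upper(), {}).setdefault(term, []).append(page)
    return tree


def toy_article_index_module(tree):
    lines = []
    for heading in sorted(tree):
        lines.append(heading)
        for term in sorted(tree[heading]):
            lines.append(f"  {term} {', '.join(map(str, tree[heading][term]))}")
    return lines


device = make_device(toy_set_module, toy_sort_tree_module, toy_article_index_module)
index = device(["#peony and #fern", "more #peony"])
```

Keeping the modules as separate callables means the sort tree module can be swapped or re-run with a different sort rule without changing how entries are collected or how the index is rendered.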
This device creates the article index efficiently.
Preferably, the set module comprises: a traversal module, used for traversing the text content of all pages of the document; a judging module, used for judging whether the current character in the text content of the current page is an index identifier; and an acquisition module, used for, if so, acquiring the index identifier, forming one index entry from each index identifier, and assembling the index entries into the index entry set.
Preferably, the sort tree module comprises: a root node module, used for creating a root node; a relation module, used for determining the subordination relations among the index entries; and an insertion module, used for inserting the index entries into the sort tree built on the root node according to preset hierarchy rules governing the subordination relations.
Preferably, the article index module comprises: a depth traversal module, used for traversing the sort tree depth-first; an adding module, used for adding the current node to an index paragraph information array as one item of index paragraph information; and a creation module, used for building the article index from the index paragraph information array.
As can be seen from the above description, the present invention can change the sort rule, quickly rebuild the sort tree and obtain an article index that meets requirements, so as to adapt to the layout needs of the document and process the index quickly and efficiently.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they can each be made into a separate integrated circuit module; or several of the modules or steps can be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A method for creating an article index, characterized by comprising:
Acquiring index identifiers from the text of a document to form an index entry set;
Creating a sort tree from the index entry set;
Traversing the sort tree to create the article index.
2. The method according to claim 1, characterized in that acquiring index identifiers from the text of a document to form an index entry set comprises:
Traversing the text content of all pages of the document;
Judging whether the current character in the text content of the current page is an index identifier;
If so, acquiring the index identifier, forming one index entry from each index identifier, and assembling the index entries into the index entry set.
3. The method according to claim 1, characterized in that creating a sort tree from the index entry set comprises:
Creating a root node;
Determining the subordination relations among the index entries;
Inserting the index entries into the sort tree built on the root node according to preset hierarchy rules governing the subordination relations.
4. The method according to claim 3, characterized in that inserting the index entries into the sort tree built on the root node according to the preset hierarchy rules governing the subordination relations comprises:
If a first index entry among the index entries is subordinate to a second index entry among the index entries, setting the first index entry as a child node of the node corresponding to the second index entry.
5. The method according to claim 1, characterized in that traversing the sort tree to create the article index comprises:
Traversing the sort tree depth-first;
Adding the current node to an index paragraph information array as one item of index paragraph information;
Building the article index from the index paragraph information array.
6. The method according to claim 5, characterized in that building the article index from the index paragraph information array comprises:
Building one paragraph of the article index from each item of index paragraph information in the index paragraph information array.
7. A device for creating an article index, characterized by comprising:
A set module, used for acquiring index identifiers from the text of a document to form an index entry set;
A sort tree module, used for creating a sort tree from the index entry set;
An article index module, used for traversing the sort tree to create the article index.
8. The device according to claim 7, characterized in that the set module comprises:
A traversal module, used for traversing the text content of all pages of the document;
A judging module, used for judging whether the current character in the text content of the current page is an index identifier;
An acquisition module, used for, if so, acquiring the index identifier, forming one index entry from each index identifier, and assembling the index entries into the index entry set.
9. The device according to claim 7, characterized in that the sort tree module comprises:
A root node module, used for creating a root node;
A relation module, used for determining the subordination relations among the index entries;
An insertion module, used for inserting the index entries into the sort tree built on the root node according to preset hierarchy rules governing the subordination relations.
10. The device according to claim 7, characterized in that the article index module comprises:
A depth traversal module, used for traversing the sort tree depth-first;
An adding module, used for adding the current node to an index paragraph information array as one item of index paragraph information;
A creation module, used for building the article index from the index paragraph information array.
CN2012101309808A 2012-04-27 2012-04-27 Creation method and device for article index Pending CN103377255A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012101309808A CN103377255A (en) 2012-04-27 2012-04-27 Creation method and device for article index

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012101309808A CN103377255A (en) 2012-04-27 2012-04-27 Creation method and device for article index

Publications (1)

Publication Number Publication Date
CN103377255A true CN103377255A (en) 2013-10-30

Family

ID=49462381

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012101309808A Pending CN103377255A (en) 2012-04-27 2012-04-27 Creation method and device for article index

Country Status (1)

Country Link
CN (1) CN103377255A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1341895A (en) * 2000-09-05 2002-03-27 英业达股份有限公司 Method for implementing quick classified browsing on web page by utilizing directory tree
US20030042319A1 (en) * 2001-08-31 2003-03-06 Xerox Corporation Automatic and semi-automatic index generation for raster documents
CN1967567A (en) * 2005-11-18 2007-05-23 三星电子株式会社 Image forming apparatus that automatically creates an index and a method thereof
US20090144605A1 (en) * 2007-12-03 2009-06-04 Microsoft Corporation Page classifier engine
CN101685455A (en) * 2008-09-28 2010-03-31 华为技术有限公司 Method and system of data retrieval

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
何静: "图书内容索引的计算机编制", 《情报理论与实践》 *
康艳: "图书内容索引编制系统(BIS)设计探讨", 《中国索引》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106656861A (en) * 2016-12-15 2017-05-10 咪咕数字传媒有限公司 Electronic book pushing method and device
CN106656861B (en) * 2016-12-15 2019-03-01 咪咕数字传媒有限公司 A kind of e-book method for pushing and device
CN110019988A (en) * 2019-03-08 2019-07-16 阿里巴巴集团控股有限公司 A kind of band structure topological characteristic parameter is clapped method and apparatus
CN110019988B (en) * 2019-03-08 2023-07-18 创新先进技术有限公司 Topological characteristic parameter flattening method and device with structure

Similar Documents

Publication Publication Date Title
CN105159949B (en) A kind of Chinese address segmenting method and system
CN101446962B (en) Data conversion method, device thereof and data processing system
CN101770446B (en) Method and system for identifying form in layout file
CN104200369B (en) Method and device for determining commodity distribution range
Ehrl Minimum comparable areas for the period 1872-2010: an aggregation of Brazilian municipalities♦
CN104199860B (en) Dataset fragmentation method based on two-dimensional geographic position information
CN103123624B (en) Determine method and device, searching method and the device of centre word
CN102819836A (en) Method and system for image segmentation
CN103164388B (en) In a kind of layout files structured message obtain method and device
CN102681994A (en) Webpage information extracting method and system
CN100447793C (en) Method for extracting page query interface based on character of vision
CN102314497A (en) Method and equipment for identifying body contents of markup language files
CN102486769A (en) Document directory processing method and device
CN101458708A (en) Searching result clustering method and device
CN105786921B (en) A kind of the data module method for transformation and device of non-structured document
CN104077385A (en) Classification and retrieval method of files
CN104063365A (en) Method for inserting object in PDF document
CN108228657B (en) Method and device for realizing keyword retrieval
CN106326193A (en) Footnote identification method and footnote and footnote citation association method in fixed-layout document
CN103902700A (en) Tree structure data processing method
CN104572785A (en) Method and device for establishing index in distributed form
CN103186560A (en) Data sorting method and related device
CN103377255A (en) Creation method and device for article index
CN104281648A (en) Search-result multi-dimensional navigating method on basis of dimension label
CN106874255A (en) Method and device for rule matching

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20131030
