CN102609546A - Method and system for excavating information of academic journal paper authors - Google Patents


Publication number
CN102609546A
Authority
CN
China
Prior art keywords
author
paper
research direction
information
scientific
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012100726457A
Other languages
Chinese (zh)
Other versions
CN102609546B (en)
Inventor
朝乐门
张勇
邢春晓
孙一钢
朱先忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Library
Tsinghua University
Original Assignee
National Library
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Library, Tsinghua University filed Critical National Library
Priority to CN201210072645.7A priority Critical patent/CN102609546B/en
Publication of CN102609546A publication Critical patent/CN102609546A/en
Application granted granted Critical
Publication of CN102609546B publication Critical patent/CN102609546B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a method and a system for mining author information from academic journal papers. The method comprises the following steps: first, selecting a target subject field and establishing an OWL (Web Ontology Language) domain ontology; second, extracting author information from the academic journal papers of the target subject field; third, converting the format of the extracted author information, storing it in an author information database, and calculating a unique author ID (identifier); finally, using this information to obtain an author-paper association matrix, an academic growth route map of each author, an author collaboration network graph, the academic collaboration distances between authors, a hot research direction map, and an author academic reputation map. The invention changes the data source used for author information mining and introduces OWL domain ontology technology into the calculation of the academic collaboration distances between authors and of the hot research directions, which improves the effect of semantic calculation.

Description

A method and system for mining author information from academic journal papers
Technical field
The present invention relates to the field of knowledge engineering, and in particular to a method and system for mining author information from academic journal papers.
Background
Author information in an academic journal paper refers to the basic information, such as the author's name, sex, year of birth, place of origin, professional title, and research direction, provided in an academic paper formally published in a journal. It generally appears in a footnote on the first page of the paper or in an endnote at the end of the paper, as shown in Fig. 1. Compared with books, author information in academic journal papers is brief in content, fixed in format, and standardized in wording.
Analysis of the quantitative relationship between authors and documents is an information analysis method that aims to describe the scientific productivity of authors by revealing the relationship between authors and the number of documents. The most representative result in this area is Lotka's law: the relationship between the number of authors and the number of papers follows an inverse-square law, that is, $F(x) = C/x^2$, where $x$ is the number of papers, $F(x)$ is the proportion of all authors who have written $x$ papers, and $C$ is a constant. Building on Lotka's law, scholars such as Fei Laqi have proposed two factors that affect the Lotka distribution: first, the era or environment in which a researcher works directly affects the result; second, the number of authors in the statistical sample is related to the result. The advantage of analyzing the quantitative relationship between authors and documents is that it reveals the relationship between author frequency and the number of papers well; the disadvantage is that it does not analyze other author information, such as year of birth, place of origin, professional title, and research direction.
Price used the distribution of the number of collaborations per author to study the collaboration problem and derived the following equation:
$$\sum_{m=1}^{I} n(x) = N$$
where $n(x)$ is the number of authors who have written $x$ papers; $I = n_{\max}$ is the number of papers by the most productive author in the field; $N$ is the total number of authors; and $m = 0.749\,(n_{\max})^{0.5}$. Building on Price's research, scholars have proposed formulas for the degree of collaboration and the collaboration rate, as follows:
[Formula images BDA0000144552190000021 and BDA0000144552190000022: degree of collaboration and collaboration rate]
Although each of the above methods has its own advantages and disadvantages, and each has been applied successfully under different conditions, none of them can meet the special requirements of mining author profile information from academic papers. First, the content of the author profile information in academic journal papers is distinctive. Second, its position in the paper is distinctive. Third, its format is distinctive. Finally, its wording is distinctive.
Summary of the invention
In view of the above problems in the prior art, the present invention provides a method and system for mining author information from academic journal papers.
The invention provides a method for mining author information from academic journal papers, comprising:
Step 1: select a target subject field and establish an OWL domain ontology;
Step 2: extract author information from the academic journal papers in the target subject field;
Step 3: convert the format of the extracted author information, store it in the author information database, and calculate a unique author ID;
Step 4: calculate the author-paper association matrix from the author IDs and paper IDs;
Step 5: from the author-paper association matrix, the research direction, and time, calculate the author's cumulative absolute number of published papers in the same research direction and generate the author's academic growth route map;
Step 6: obtain the author collaboration network graph from the author-paper association matrix;
Step 7: calculate the academic collaboration distance between authors from the author collaboration network graph;
Step 8: generate the hot research direction map from the OWL domain ontology, the author IDs, the research directions, and their hotness;
Step 9: generate the author academic reputation map from the author IDs and the author collaboration network graph.
In one example, in step 1, the OWL domain ontology includes the inheritance, equivalence, and set-operation relations between domain terms.
In one example, in step 2, the author information includes the author's name, sex, year of birth, place of origin, professional title, research direction, paper title, journal title, publication time, and affiliation; in step 3, the unique author ID is composed of the author's name, year of birth, sex, place of origin, affiliation name, and a random code.
In one example, in step 4, the author-paper association matrix is $S_{m \times n} = (s_{ij})_{m \times n}$, where $i$ and $j$ are the paper ID and the author ID respectively, $m$ and $n$ are the number of paper records and the number of authors respectively, and $s_{ij}$ is the author weight. The author weight is calculated as follows:
$$S(i,j) = \begin{cases} 0, & n = 0 \\ \dfrac{1}{n}, & n > 0 \end{cases}$$
where $S(i,j)$ is the author weight of the $i$-th author in the $j$-th paper, and $n$ is the rank of the $i$-th author in the author order of the $j$-th paper, $n = 1, 2, 3, \ldots, N$.
In one example, in step 5, the cumulative absolute number of published papers $y$ of the $i$-th author in research direction $z$ is calculated as follows:
$$y = f_{Subj}(x, z, i) = \sum_{j=0}^{N} S(i, j, z)$$
where $N$ is the total number of papers published by the $i$-th author in research direction $z$, and $S(i,j,z)$ is the author weight of the $i$-th author in the $j$-th paper. Two research directions are judged to be the same research direction if an inheritance, equivalence, or set-operation relation exists between them.
In one example, in step 6, the author collaboration network graph comprises an author set and a paper set; authors are the nodes and papers are the ties. The weight between two nodes is calculated as follows:
$$D(i,j,k) = |S(i,k) - S(j,k)|$$
where $D(i,j,k)$ is the difference between the weights of the $i$-th author and the $j$-th author in the $k$-th paper, and $S(i,k)$ and $S(j,k)$ are the weights of the $i$-th author and the $j$-th author in the $k$-th paper, respectively.
In one example, in step 7, the academic collaboration distance between authors is calculated as follows:
$$L(i,j) = \sum_{k=0}^{N} \left( k \times S(k, k+1) \right)$$
where $L(i,j)$ is the academic collaboration distance between the authors corresponding to node $i$ and node $j$, $k$ is an intermediate node on the shortest path between node $i$ and node $j$ in the author collaboration network graph, and $N$ is the number of intermediate nodes.
In one example, the hot research direction map is generated according to the following formula:
$$H(i) = \pi \times \left( \sum_{k=0}^{n} H(k) \times D(i,k) \right)^2$$
where $n$ is the number of authors engaged in the sub-class research directions of the $i$-th research direction; $H(k)$ is the hotness of the $k$-th sub-class research direction; $D(i,k)$ is the number of intermediate nodes on the shortest path between research direction $i$ and research direction $k$; $H(0)$ is the number of authors in the $i$-th research direction; and $D(i,0) = 1$. The research directions corresponding to the leaf nodes of the OWL ontology are the sub-class research directions.
In one example, the author academic reputation map is a directed graph with first authors as propagator nodes and co-authors as receiver nodes. The author's academic reputation is calculated as follows:
$$I(i) = \sum_{k=0}^{n} \left( I(k) \times D(i,k) \right)$$
where $I(i)$ is the reputation of the $i$-th author; $n$ is the number of co-authors of the $i$-th author; $k$ is the $k$-th co-author of the $i$-th author; $D(i,k)$ is the distance between the $i$-th author and the $k$-th author; $I(0)$ is the number of authors who collaborate directly with the $i$-th author; and $D(0,k) = 1$.
The invention also provides a system for carrying out the above method, comprising an ETL module, a domain ontology, a unique identification module, an author-paper association matrix calculation module, an author academic growth route map generation module, an author collaboration network graph generation module, an academic collaboration distance generation module, a hot research direction map generation module, and an author academic reputation map generation module;
The ETL module extracts author information from the academic journal papers in the target subject field, converts the format of the extracted author information, and stores it in the author information database;
The domain ontology is an OWL domain ontology established for the selected target subject field;
The unique identification module calculates the unique author ID;
The author-paper association matrix calculation module calculates the author-paper association matrix from the author IDs and paper IDs;
The author academic growth route map generation module calculates, from the author-paper association matrix, the research direction, and time, the author's cumulative absolute number of published papers in the same research direction, and generates the author's academic growth route map;
The author collaboration network graph generation module obtains the author collaboration network graph from the author-paper association matrix;
The academic collaboration distance generation module calculates the academic collaboration distance between authors from the author collaboration network graph;
The hot research direction map generation module generates the hot research direction map from the OWL domain ontology, the author IDs, the research directions, and their hotness;
The author academic reputation map generation module generates the author academic reputation map from the author IDs and the author collaboration network graph.
In summary, the main advantages of this method are: 1) it overcomes the insufficient attention paid to author profile information by traditional bibliometric and informetric methods, proposes a method for mining the author profile information in academic papers, and changes the data source for author information mining; 2) it introduces OWL domain ontology technology into the calculation of the academic collaboration distance between authors and of the hot research directions, improving the effect of semantic calculation; 3) based on author profile information, it proposes methods for calculating the author's unique identification code, the scholar's growth route, the academic collaboration distance, and the hot research directions, broadening the research perspective of author information mining. Therefore, compared with the aforementioned bibliometric and informetric methods, this method better meets the needs of mining author information from academic papers.
Description of drawings
The present invention is described in further detail below with reference to the accompanying drawings, in which:
Fig. 1 is a schematic diagram of the author profile information in an academic paper according to the invention;
Fig. 2 is a schematic diagram of the basic steps of academic paper author information mining according to the invention;
Fig. 3 is the E-R diagram of the academic paper author information mining system according to the invention;
Fig. 4 is a schematic diagram of the "author-paper association matrix" according to the invention;
Fig. 5 is a schematic diagram of the "author academic growth route map" according to the invention;
Fig. 6 is a schematic diagram of the "author collaboration network graph" according to the invention;
Fig. 7 is a schematic diagram of the "author academic collaboration distance matrix" according to the invention;
Fig. 8 is a schematic diagram of the "hot research direction map" according to the invention;
Fig. 9 is a schematic diagram of the "author academic reputation map" according to the invention;
Fig. 10 is a schematic diagram of the "academic paper author information mining system" according to the invention.
Embodiment
The method proposed by the present invention for mining author profile information from academic journal papers is shown in Fig. 2 and comprises the following steps:
Step (1): select a specific subject field according to demand and use OWL to establish a domain ontology. When building the domain ontology, the terms corresponding to the research directions of the field, and the relations among them, must be considered. The formal representation of the domain ontology must indicate the inheritance, equivalence, and disjointness relations between classes (or properties); the membership relations between properties and classes; the correspondence between classes and instances; the transitive, symmetric, functional, and inverse-functional characteristics of properties; and the set-operation relations between classes.
Step (2): extract author profile information from the academic journal papers of the specific field, including the author's name, sex, year of birth, place of origin, professional title, research direction, paper title, journal title, publication time, and affiliation. Different items may be extracted from different positions. The author's name, sex, year of birth, place of origin, professional title, and research direction are extracted from the author profile section of the paper; the paper title, journal title, publication time, and affiliation are extracted from their respective positions.
Step (3): convert the format of the extracted author profile information and store it in the author information database. One or more information tables are designed to hold the author information. The extracted author name, sex, place of origin, professional title, research direction, paper title, journal title, and affiliation are converted to string type; the extracted year of birth and publication time are converted to date type. After format conversion, the author information is placed in the corresponding information tables.
Step (4): calculate the author's unique identification code, which identifies the same author and distinguishes different authors. A function is applied to the name, year of birth, sex, place of origin, professional title, research direction, and affiliation to obtain each author's unique identification code, which is stored in the author information table.
Step (5): with paper IDs as rows and author IDs as columns, calculate the "author-paper association matrix", that is, $S_{m \times n} = (s_{ij})_{m \times n}$, where $i$ and $j$ are the paper ID and the author ID respectively, $m$ and $n$ are the number of paper records and the number of authors respectively, and $s_{ij}$ is the "author weight". The "author weight" $s_{ij}$ is determined by the author's rank in the author order of the corresponding paper. In what follows, unless stated otherwise, an author mentioned in a calculation means the author ID.
Step (6): from the "author-paper association matrix", with time on the x axis and the "cumulative absolute number of published papers" of the $i$-th author in research direction $z$ on the y axis, use the function $y = f_{Subj}(x, z, i)$ to generate the "author academic growth route map". The "cumulative absolute number of published papers" in research direction $z$ is determined by the number of published papers and the author's rank in each paper. Whether two research directions are the same is judged as follows: first, the research direction at the time of publication is read from the database and mapped to the OWL domain ontology; second, it is determined whether an inheritance relation (<rdfs:subClassOf>), an equivalence relation (<owl:equivalentClass>), a set-operation relation (<owl:disjointWith>, <owl:unionOf>, <owl:intersectionOf>, <owl:complementOf>), or an instance relation (<rdf:Description>, <rdf:type>) exists between the research directions; finally, if any of these relations exists, the directions are treated as the same research direction, and otherwise as different research directions.
Step (7): generate the "author collaboration network graph". The "author collaboration network graph" is a weighted graph with authors as actor nodes and papers as ties. It therefore comprises two groups of information: the author set $N = \{n_1, n_2, \ldots, n_N\}$, where $N$ is the number of authors, and the paper set $L = \{l_1, l_2, \ldots, l_n\}$, where $L$ is the number of papers. The weight of each tie in the graph is the absolute value of the difference between the weights, in the paper represented by the tie, of the authors represented by its two nodes.
Step (8): calculate the "author academic collaboration distance" between authors. Based on the "author collaboration network graph", calculate the academic collaboration distance values between authors and generate the "author academic collaboration distance matrix". The academic collaboration distance between two authors is determined by the number of nodes on the shortest path connecting them and the weights on its edges.
Step (9): generate the "hot research direction map". Based on the OWL domain ontology, with researchers as nodes and research directions as ties, generate the "hot research direction map". The "hotness of a research direction" is determined by two variables: the number of authors engaged in the research direction and its sub-class research directions, and the distance between each sub-class research direction and the research direction represented by the node. Whether two research directions are the same, or one is a sub-class of the other, is judged by mapping the research directions to the OWL domain ontology and checking whether an <rdfs:subClassOf> or <owl:equivalentClass> relation exists between them in the ontology.
Step (10): calculate the author's academic reputation. With the first author as the propagator node and the other co-authors of the same paper as receiver nodes, generate the "author academic reputation map". The author's academic reputation value is determined by the number of authors who collaborate directly with the author and by the reputation of each co-author.
Specific embodiments of the invention are described in further detail below with reference to the drawings and examples. The following examples illustrate the invention but do not limit its scope.
As shown in Fig. 2, mining academic paper author information requires the support of OWL domain ontology technology, so the domain ontology must be prepared before the author information is analyzed. When building the OWL domain ontology, the tags <rdfs:subClassOf>, <owl:equivalentClass>, and <owl:disjointWith> identify the inheritance, equivalence, and disjointness relations between classes, respectively; the tags <rdfs:subPropertyOf>, <owl:equivalentProperty>, and <owl:inverseOf> represent the inheritance, equivalence, and inverse relations between properties, respectively; the tags <rdfs:domain> and <rdfs:range> represent the relations between properties and classes; the tags <rdf:Description> and <rdf:type> represent the relations between classes and instances; the tags owl:TransitiveProperty, owl:SymmetricProperty, owl:FunctionalProperty, and owl:InverseFunctionalProperty represent the transitive, symmetric, functional, and inverse-functional characteristics of properties, respectively; and the tags <owl:unionOf>, <owl:intersectionOf>, and <owl:complementOf> represent set-operation relations.
As shown in Fig. 3, the extracted and converted author name, sex, year of birth, place of origin, professional title, research direction, paper title, journal title, publication time, and affiliation are stored in ten relation tables: the author table, the paper table, the author-paper mapping table, the professional title table, the author-title mapping table, the department table, the author-department mapping table, the research direction table, the author-research direction mapping table, and the journal table. The schemas of these ten tables are: author (author ID, author name, date of birth, place of origin); paper (paper ID, paper title, journal ID, date of issue); author-paper mapping (author ID, paper ID, author rank); professional title (title ID, title name); author-title mapping (title ID, author ID, paper ID); department (department ID, department name, city, postcode); author-department mapping (author ID, department ID, paper ID); research direction (research direction ID, research direction name, paper ID, author ID, ontology URI); author-research direction mapping (research direction ID, author ID, paper ID); journal (journal title, ISBN, founding date).
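As a rough illustration of how three of these tables might be laid out, the following sketch uses Python's built-in sqlite3 module. The English table and column names, the column types, and the sample row are assumptions made for illustration; the patent gives only the attribute lists above.

```python
import sqlite3

def create_schema(conn):
    """Create three of the ten relation tables (names and types are illustrative)."""
    conn.executescript("""
        CREATE TABLE author (
            author_id   TEXT PRIMARY KEY,
            name        TEXT NOT NULL,
            birth_date  TEXT,
            birthplace  TEXT
        );
        CREATE TABLE paper (
            paper_id    TEXT PRIMARY KEY,
            title       TEXT NOT NULL,
            journal_id  TEXT,
            issue_date  TEXT
        );
        CREATE TABLE author_paper (
            author_id   TEXT REFERENCES author(author_id),
            paper_id    TEXT REFERENCES paper(paper_id),
            author_rank INTEGER NOT NULL,
            PRIMARY KEY (author_id, paper_id)
        );
    """)

# Made-up sample data, only to show the tables joining together.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO author VALUES ('A1', 'Zhang', '1970-01-01', 'Beijing')")
conn.execute("INSERT INTO paper VALUES ('P1', 'Ontology-based mining', 'J1', '2011-05-01')")
conn.execute("INSERT INTO author_paper VALUES ('A1', 'P1', 1)")
rows = conn.execute(
    "SELECT a.name, p.title, ap.author_rank FROM author_paper ap "
    "JOIN author a ON a.author_id = ap.author_id "
    "JOIN paper p ON p.paper_id = ap.paper_id"
).fetchall()
```

The author rank lives in the mapping table because it is a property of an author within one paper, which is what the author weight calculation later relies on.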
The author's unique identification code is determined by the character strings of the name, year of birth, sex, place of origin, and affiliation name. The specific formula is as follows:
AID(i) = StrConn(NameStr(N(i)), BirthStr(Y(i)), SexStr(S(i)), AffStr(A(i)), Ram(i)), where AID(i) is the unique identification code of the i-th author; N(i), Y(i), S(i), and A(i) denote the i-th author's name, year of birth, sex, place of origin, and affiliation name; the functions NameStr(), BirthStr(), SexStr(), and AffStr() are hash functions of the author's name, date of birth, sex, and affiliation, respectively; and Ram() is a five-character random code used to distinguish authors with the same name in the same affiliation.
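The formula above can be sketched as follows. This is only one possible reading: the patent does not name a hash function, a code length, or a separator, so the use of SHA-256 truncated to eight hex characters, the "-" separator, and a five-digit numeric random code are all assumptions, and the helper names are hypothetical.

```python
import hashlib
import secrets

def _short_hash(value, length=8):
    """Stand-in for NameStr/BirthStr/SexStr/AffStr: a truncated SHA-256 hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]

def author_id(name, birth_year, sex, affiliation, random_code=None):
    """Concatenate the hashed fields and a random code, in the order of StrConn(...)."""
    if random_code is None:
        # Stand-in for Ram(i): five random decimal digits (an assumed format).
        random_code = f"{secrets.randbelow(100000):05d}"
    parts = [
        _short_hash(name),
        _short_hash(str(birth_year)),
        _short_hash(sex),
        _short_hash(affiliation),
        random_code,
    ]
    return "-".join(parts)

# Hypothetical author; the random code is fixed here so the result is repeatable.
example_id = author_id("Li Wei", 1975, "M", "Tsinghua University", random_code="00042")
```

Because the random code is drawn once per author, it has to be stored with the author record; recomputing the ID without it would not reproduce the same value.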
As shown in Fig. 4, with paper IDs as rows and author IDs as columns, calculate the "author-paper association matrix", that is, $S_{m \times n} = (s_{ij})_{m \times n}$, where $i$ and $j$ are the paper ID and the author ID respectively, $m$ and $n$ are the number of paper records and the number of authors respectively, and $s_{ij}$ is the "author weight". The "author weight" $s_{ij}$ is determined by the author's rank in the author order of the corresponding paper. The specific formula for the "author weight" is as follows:
$$S(i,j) = \begin{cases} 0, & n = 0 \\ \dfrac{1}{n}, & n > 0 \end{cases}$$
where $S(i,j)$ is the author weight of the $i$-th author in the $j$-th paper, and $n$ is the rank of the $i$-th author in the author order of the $j$-th paper, $n = 1, 2, 3, \ldots, N$.
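A minimal sketch of the weight formula and the resulting matrix is given below. The function names, the dictionary-based input format, and the sample ranks are assumptions for illustration; a rank of 0 is used to mean "not an author of this paper", matching the $n = 0$ case.

```python
def author_weight(rank):
    """Author weight: 0 when rank is 0 (not an author), otherwise 1/rank."""
    return 0.0 if rank == 0 else 1.0 / rank

def association_matrix(paper_ids, author_ids, ranks):
    """Build the matrix with papers as rows and authors as columns.

    ranks maps (paper_id, author_id) to the author's position in that paper;
    missing pairs are treated as rank 0.
    """
    return [
        [author_weight(ranks.get((p, a), 0)) for a in author_ids]
        for p in paper_ids
    ]

# Made-up example: two papers, three authors.
ranks = {("P1", "A1"): 1, ("P1", "A2"): 2, ("P2", "A2"): 1, ("P2", "A3"): 3}
matrix = association_matrix(["P1", "P2"], ["A1", "A2", "A3"], ranks)
```

Here the first row is [1.0, 0.5, 0.0]: A1 is first author of P1, A2 is second author, and A3 did not contribute.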
As shown in Fig. 5, the "author academic growth route map" is a two-dimensional curve with time on the x axis and the "cumulative absolute number of published papers" of the $i$-th author in research direction $z$ on the y axis. The function $y = f_{Subj}(x, z, i)$ is used to generate the "author academic growth route map" automatically. The cumulative absolute number of published papers $y$ of the $i$-th author in research direction $z$ is calculated as follows:
$$y = f_{Subj}(x, z, i) = \sum_{j=0}^{N} S(i, j, z)$$
where $N$ is the total number of papers published by the $i$-th author in research direction $z$, and $S(i,j,z)$ is the "author weight" of the $i$-th author in the $j$-th paper. Whether two research directions are the same is judged as follows: first, the research direction at the time of publication is read from the database and mapped to the OWL domain ontology; second, it is determined whether an inheritance relation (<rdfs:subClassOf>), an equivalence relation (<owl:equivalentClass>), a set-operation relation (<owl:disjointWith>, <owl:unionOf>, <owl:intersectionOf>, <owl:complementOf>), or an instance relation (<rdf:Description>, <rdf:type>) exists between the research directions; finally, if any of these relations exists, the directions are treated as the same research direction, and otherwise as different research directions.
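The cumulative count can be sketched as a running sum of author weights over time. In this sketch the ontology lookup is replaced by a plain set of related direction pairs, which is an assumption standing in for the OWL relations listed above; the function names, record format, and sample data are likewise hypothetical.

```python
from collections import defaultdict

def same_direction(a, b, related_pairs):
    """Treat two directions as the same if identical or listed as related."""
    return a == b or frozenset((a, b)) in related_pairs

def growth_curve(records, direction, related_pairs):
    """Cumulative weighted paper count per year for one author.

    records is an iterable of (year, research_direction, author_rank).
    Returns a list of (year, cumulative_value) sorted by year.
    """
    per_year = defaultdict(float)
    for year, rec_direction, rank in records:
        if rank > 0 and same_direction(rec_direction, direction, related_pairs):
            per_year[year] += 1.0 / rank  # author weight of this paper
    curve, total = [], 0.0
    for year in sorted(per_year):
        total += per_year[year]
        curve.append((year, total))
    return curve

# Made-up example: "text mining" is assumed to be related to "data mining".
related = {frozenset(("data mining", "text mining"))}
records = [
    (2008, "data mining", 1),
    (2009, "text mining", 2),
    (2009, "databases", 1),
    (2010, "data mining", 4),
]
curve = growth_curve(records, "data mining", related)
```

For this input the curve is [(2008, 1.0), (2009, 1.5), (2010, 1.75)]; the 2009 "databases" paper is ignored because it is not related to the chosen direction.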
As shown in Fig. 6, the "author collaboration network graph" is a weighted graph with authors as actor nodes and papers as ties. It comprises two groups of information: the author set $N = \{n_1, n_2, \ldots, n_N\}$, where $N$ is the number of authors, and the paper set $L = \{l_1, l_2, \ldots, l_n\}$, where $L$ is the number of papers. The weight in this weighted graph is the absolute value of the difference between the weights, in the paper represented by the tie, of the authors represented by the two nodes. It is calculated as follows:
$$D(i,j,k) = |S(i,k) - S(j,k)|$$
where $D(i,j,k)$ is the difference between the weights of the $i$-th author and the $j$-th author in the $k$-th paper, and $S(i,k)$ and $S(j,k)$ are the weights of the $i$-th author and the $j$-th author in the $k$-th paper, respectively.
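The tie weights can be sketched by enumerating every pair of co-authors of each paper and applying the formula above. The function name, the nested-dictionary input, and the sample paper are assumptions for illustration.

```python
from itertools import combinations

def edge_weights(paper_ranks):
    """Compute D(i, j, k) for every pair of co-authors i, j of every paper k.

    paper_ranks maps paper_id -> {author_id: rank in that paper}.
    Returns a dict keyed by (author_a, author_b, paper_id).
    """
    edges = {}
    for paper, ranks in paper_ranks.items():
        for a, b in combinations(sorted(ranks), 2):
            # Author weight is 1/rank, so D is |1/rank_a - 1/rank_b|.
            edges[(a, b, paper)] = abs(1.0 / ranks[a] - 1.0 / ranks[b])
    return edges

# Made-up example: one paper with three authors at ranks 1, 2 and 4.
edges = edge_weights({"P1": {"A1": 1, "A2": 2, "A3": 4}})
```

In this example the tie between A1 and A2 has weight 0.5, A1 and A3 have 0.75, and A2 and A3 have 0.25, so authors adjacent in the author order end up closer together.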
As shown in Fig. 7, the rows and columns of the "author academic collaboration distance matrix" are author IDs, and the element values are the academic collaboration distance values. The academic collaboration distance between two authors is determined by the number of nodes on the shortest path connecting them and the weights on its edges. The academic collaboration distance between authors is calculated as follows:
$$L(i,j) = \sum_{k=0}^{N} \left( k \times S(k, k+1) \right)$$
where $k$ is an intermediate node on the shortest path between nodes $i$ and $j$, and $N$ is the number of intermediate nodes.
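One way to read this formula is sketched below, under several assumptions that the patent does not state: the nodes of the shortest path are numbered 0 (node $i$) to $N+1$ (node $j$), $S(k, k+1)$ is taken to be the tie weight between the $k$-th and $(k+1)$-th nodes on that path, and the shortest path is found with Dijkstra's algorithm, which is a choice made here because no algorithm is named. The function names and the sample graph are hypothetical.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; graph maps node -> {neighbor: non-negative weight}.

    Returns the list of nodes from source to target, or None if unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in visited:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

def cooperation_distance(graph, i, j):
    """Sum of k * weight(path[k], path[k+1]) along the shortest path (assumed reading)."""
    path = shortest_path(graph, i, j)
    if path is None:
        return None
    return sum(k * graph[path[k]][path[k + 1]] for k in range(len(path) - 1))

# Made-up undirected graph with tie weights.
graph = {
    "A": {"B": 0.5, "C": 1.0},
    "B": {"A": 0.5, "C": 0.25},
    "C": {"A": 1.0, "B": 0.25},
}
```

With this graph the shortest path from A to C goes through B, and the sum is 0 × 0.5 + 1 × 0.25 = 0.25. Under this reading the first tie on the path always contributes zero, so directly connected authors have distance 0.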
As shown in Fig. 8, the "hot research direction map" is based on the OWL domain ontology, with research directions as nodes and the semantic relations between research directions as ties. The "hotness of a research direction" is determined by two variables: the number of authors engaged in the research direction and its sub-class research directions, and the distance between each sub-class research direction and the research direction represented by the node. Once the hotness has been calculated, it is used as the independent variable for the area of each node to generate the hot research direction map. The hotness of a research direction is calculated as follows:
$$H(i) = \pi \times \left( \sum_{k=0}^{n} H(k) \times D(i,k) \right)^2$$
where $n$ is the number of authors engaged in the sub-class research directions of the $i$-th research direction; $H(k)$ is the "hotness of the research direction" of the $k$-th sub-class; $D(i,k)$ is the number of intermediate nodes on the shortest path between research direction $i$ and research direction $k$; $H(0)$ is the number of authors in the $i$-th research direction; and $D(i,0) = 1$. Whether a research direction is a sub-class research direction is judged as follows: first, the research direction at the time of publication is read from the database and mapped to the OWL domain ontology; second, it is determined whether an inheritance relation (<rdfs:subClassOf>) exists between the research directions; third, if an inheritance relation exists, the direction is treated as a sub-class research direction, and otherwise not; then, if a sub-class research direction exists, it is further determined whether that sub-class has still smaller sub-class directions, and so on, until the research directions corresponding to the leaf nodes of the OWL ontology are reached.
As shown in Fig. 9, the "author academic reputation map" is a directed graph with first authors as propagator nodes and the other co-authors as receiver nodes. The "author reputation" is determined by the number of authors who collaborate directly with the author and by the reputation of each co-author. It is calculated as follows:
$$I(i) = \sum_{k=0}^{n} \left( I(k) \times D(i,k) \right)$$
where $I(i)$ is the reputation of the $i$-th author, $n$ is the number of co-authors of the $i$-th author, $k$ is the $k$-th co-author of the $i$-th author, and $D(i,k)$ is the distance between the $i$-th author and the $k$-th author. $I(0)$ is the number of authors who collaborate directly with the $i$-th author, and $D(0,k) = 1$.
As shown in Figure 10, the system of the present invention comprises an ETL module, a domain ontology, a unique identification module, an author-paper association matrix calculation module, an author academic growth roadmap generation module, an author collaboration network graph generation module, an academic collaboration distance generation module, a hot research direction map generation module, and an author academic reputation map generation module:
The data extraction, transformation, and loading (ETL) module extracts author information from the academic journal papers in the target subject field, converts the format of the extracted author information, and stores it in the author information database;
The domain ontology is the OWL domain ontology built for the selected target subject field;
The unique identification module calculates the unique author ID;
The author-paper association matrix calculation module calculates the author-paper association matrix from the author IDs and paper IDs;
The author academic growth roadmap generation module calculates, from the author-paper association matrix, the research direction, and the time, the cumulative absolute number of papers the author has published in the same research direction, and generates the author academic growth roadmap;
The author collaboration network graph generation module derives the author collaboration network graph from the author-paper association matrix;
The academic collaboration distance generation module calculates the academic collaboration distance between authors from the author collaboration network graph;
The hot research direction map generation module generates the hot research direction map from the OWL domain ontology, the author IDs, the research directions, and their hotness;
The author academic reputation map generation module generates the author academic reputation map from the author IDs and the author collaboration network graph.
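The data flow through the matrix, growth roadmap, and collaboration network modules can be sketched in Python as follows. This is a hedged illustration, not the patented implementation. The weight rule (0 for an absent author, otherwise 1 divided by byline rank) and the edge weight |S(i,k) - S(j,k)| follow claims 4 and 6. The cumulative count is an assumption, since its formula appears only as an image: here it is taken to be the sum of the author's weights over papers in one research direction up to a given year, with directions matched by exact label. All function and variable names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def author_weight(rank: int) -> float:
    """Weight rule of claim 4: 0 if the author is absent (rank 0), else 1/rank."""
    return 0.0 if rank == 0 else 1.0 / rank


def association_matrix(bylines: Dict[str, List[str]]) -> Tuple[List[str], List[str], List[List[float]]]:
    """Build a papers-by-authors matrix whose entries are author weights."""
    paper_ids = sorted(bylines)
    author_ids = sorted({name for names in bylines.values() for name in names})
    matrix = []
    for paper in paper_ids:
        names = bylines[paper]
        matrix.append([author_weight(names.index(a) + 1 if a in names else 0)
                       for a in author_ids])
    return paper_ids, author_ids, matrix


def cumulative_output(author: str, direction: str, until_year: int,
                      paper_ids: List[str], author_ids: List[str],
                      matrix: List[List[float]],
                      paper_meta: Dict[str, Tuple[str, int]]) -> float:
    """ASSUMPTION: sum of the author's weights in one direction up to a year."""
    col = author_ids.index(author)
    return sum(row[col] for paper, row in zip(paper_ids, matrix)
               if paper_meta[paper][0] == direction and paper_meta[paper][1] <= until_year)


def collaboration_edges(paper_ids: List[str], author_ids: List[str],
                        matrix: List[List[float]]) -> Dict[Tuple[str, str, str], float]:
    """Edge weight of claim 6 for each co-author pair on each paper."""
    edges = {}
    for paper, row in zip(paper_ids, matrix):
        present = [(a, w) for a, w in zip(author_ids, row) if w > 0]
        for (a, wa), (b, wb) in combinations(present, 2):
            edges[(a, b, paper)] = abs(wa - wb)
    return edges


# Hypothetical input: byline order per paper, plus (direction, year) per paper.
bylines = {"P1": ["zhang", "li", "wang"], "P2": ["li", "zhang"]}
paper_meta = {"P1": ("databases", 2010), "P2": ("databases", 2011)}
paper_ids, author_ids, matrix = association_matrix(bylines)
edges = collaboration_edges(paper_ids, author_ids, matrix)
```

Plotting `cumulative_output` year by year for one author and one direction would give the growth curve, and the `edges` dictionary holds the per-paper ties from which the collaboration network graph is drawn.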
The above is merely a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any person skilled in the art may make suitable changes or variations within the technical scope disclosed by the present invention, and all such changes or variations shall fall within the scope of protection of the present invention.

Claims (10)

1. A method for mining author information from academic journal papers, characterized by comprising:
Step 1, selecting a target subject field and building an OWL domain ontology;
Step 2, extracting author information from the academic journal papers in the target subject field;
Step 3, converting the format of the extracted author information, storing it in an author information database, and calculating a unique author ID;
Step 4, calculating an author-paper association matrix from the author IDs and paper IDs;
Step 5, calculating, from the author-paper association matrix, the research direction, and the time, the cumulative absolute number of papers the author has published in the same research direction, and generating an author academic growth roadmap;
Step 6, deriving an author collaboration network graph from the author-paper association matrix;
Step 7, calculating the academic collaboration distance between authors from the author collaboration network graph;
Step 8, generating a hot research direction map from the OWL domain ontology, the author IDs, and the research directions;
Step 9, generating an author academic reputation map from the author IDs and the author collaboration network graph.
2. the method for claim 1 is characterized in that, in the step 1, the OWL domain body comprises inheritance, identity relation and the set operation relation between the field term.
3. method as claimed in claim 2 is characterized in that, in the step 2, author information comprises author's name, sex, year of birth, native place, academic title, research direction, paper title, periodical title, delivers time and author unit one belongs to; In the step 3, unique author ID comprises author's name, year of birth, sex, native place, unit one belongs to's title and random code.
4. method as claimed in claim 3 is characterized in that, in the step 4, and author and scientific paper incidence matrix S M * n=(s Ij) M * n, wherein i and j are respectively paper ID and author ID, and m and n represent paper record and author's number, s respectively IjRepresent author's weight, the computing formula of author's weight is following:
S ( i , j ) = 0 n = 0 1 n n > 0 , Wherein, S (i j) is the author weight of i author in j piece of writing paper, and n is the rank order of i author in j piece of writing paper, n=1, and 2,3 ..., N.
5. The method of claim 4, characterized in that, in step 5, the cumulative absolute number y of papers published by the i-th author in research direction z is calculated as follows:
[Formula image FDA0000144552180000022]
where N is the total number of papers published by the i-th author in research direction z, and S(i, j, z) is the author weight of the i-th author in the j-th paper; two research directions are judged to be the same research direction if an inheritance relation, an equivalence relation, or a set operation relation exists between them.
6. The method of claim 5, characterized in that, in step 6, the author collaboration network graph comprises an author set and a paper set, the authors being nodes and the papers being ties, and the weight between two nodes is calculated as follows:
$$D(i,j,k) = |S(i,k) - S(j,k)|$$
where D(i, j, k) is the difference between the weights of the i-th author and the j-th author in the k-th paper, and S(i, k) and S(j, k) are the weights of the i-th author and the j-th author in the k-th paper, respectively.
7. The method of claim 6, characterized in that, in step 7, the academic collaboration distance between authors is calculated as follows:
$$L(i,j) = \sum_{k=0}^{N} \left( k \times S(k, k+1) \right)$$
where L(i, j) is the academic collaboration distance between the authors corresponding to node i and node j, k is an intermediate node on the shortest path between node i and node j in the author collaboration network graph, and N is the number of intermediate nodes.
8. The method of claim 7, characterized in that, in step 8, the hot research direction map is generated according to the following formula:
$$H(i) = \pi \times \left( \sum_{k=0}^{n} H(k) \times D(i,k) \right)^{2}$$
where n is the number of authors working in the sub-class research directions of the i-th research direction; H(k) is the hotness of the k-th sub-class research direction; D(i, k) is the number of intermediate nodes on the shortest path between research direction i and research direction k; H(0) is the number of authors in the i-th research direction; D(i, 0) = 1; and the research directions corresponding to the leaf nodes of the OWL ontology are sub-class research directions.
9. The method of claim 7, characterized in that the author academic reputation map is a directed graph in which the first author is the propagator node and the co-authors are recipient nodes, and the author's academic reputation is calculated as follows:
$$I(i) = \sum_{k=0}^{n} \left( I(k) \times D(i,k) \right)$$
where I(i) is the popularity of the i-th author; n is the number of collaborating authors of the i-th author; k denotes the k-th collaborator of the i-th author; D(i, k) is the distance between the i-th author and the k-th author; I(0) is the number of direct collaborators of the i-th author; and D(0, k) = 1.
10. A system for implementing the method of any one of claims 1 to 9, characterized by comprising an ETL module, a domain ontology, a unique identification module, an author-paper association matrix calculation module, an author academic growth roadmap generation module, an author collaboration network graph generation module, an academic collaboration distance generation module, a hot research direction map generation module, and an author academic reputation map generation module;
The ETL module extracts author information from the academic journal papers in the target subject field, converts the format of the extracted author information, and stores it in the author information database;
The domain ontology is the OWL domain ontology built for the selected target subject field;
The unique identification module calculates the unique author ID;
The author-paper association matrix calculation module calculates the author-paper association matrix from the author IDs and paper IDs;
The author academic growth roadmap generation module calculates, from the author-paper association matrix, the research direction, and the time, the cumulative absolute number of papers the author has published in the same research direction, and generates the author academic growth roadmap;
The author collaboration network graph generation module derives the author collaboration network graph from the author-paper association matrix;
The academic collaboration distance generation module calculates the academic collaboration distance between authors from the author collaboration network graph;
The hot research direction map generation module generates the hot research direction map from the OWL domain ontology, the author IDs, the research directions, and their hotness;
The author academic reputation map generation module generates the author academic reputation map from the author IDs and the author collaboration network graph.
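As a non-authoritative illustration of the distance formula recited in claim 7, the following Python sketch evaluates L(i, j) on a small graph. Several points are assumptions, because the claim does not fix them: the shortest path is taken to be the path with the fewest hops (found by breadth-first search), S(k, k+1) is read as the edge weight between the k-th and (k+1)-th nodes on that path, and the path nodes are indexed from 0 (node i) to N+1 (node j). The names `shortest_path`, `collaboration_distance`, and `graph` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_path(graph: Dict[str, Dict[str, float]], src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search; returns the fewest-hop path or None if unreachable."""
    parents: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbour in graph.get(node, {}):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def collaboration_distance(graph: Dict[str, Dict[str, float]], src: str, dst: str) -> Optional[float]:
    """L(i, j) = sum_{k=0..N} k * S(k, k+1) along the path (assumed reading)."""
    path = shortest_path(graph, src, dst)
    if path is None:
        return None
    # The path has N + 2 nodes, so its N + 1 edges are indexed k = 0..N.
    return sum(k * graph[path[k]][path[k + 1]] for k in range(len(path) - 1))


# Hypothetical undirected collaboration graph with symmetric edge weights.
graph = {
    "A": {"B": 0.5},
    "B": {"A": 0.5, "C": 0.25},
    "C": {"B": 0.25, "D": 0.5},
    "D": {"C": 0.5},
}

print(collaboration_distance(graph, "A", "D"))  # 1.25
```

Under this literal reading the k = 0 term always vanishes, so two authors who share a paper directly have distance 0, and each further hop along the path is weighted more heavily.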
CN201210072645.7A 2011-12-08 2012-03-19 Method and system for excavating information of academic journal paper authors Active CN102609546B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210072645.7A CN102609546B (en) 2011-12-08 2012-03-19 Method and system for excavating information of academic journal paper authors

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201110408020.9 2011-12-08
CN201110408020 2011-12-08
CN201210072645.7A CN102609546B (en) 2011-12-08 2012-03-19 Method and system for excavating information of academic journal paper authors

Publications (2)

Publication Number Publication Date
CN102609546A true CN102609546A (en) 2012-07-25
CN102609546B CN102609546B (en) 2014-11-05

Family

ID=46526918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210072645.7A Active CN102609546B (en) 2011-12-08 2012-03-19 Method and system for excavating information of academic journal paper authors

Country Status (1)

Country Link
CN (1) CN102609546B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020302A (en) * 2012-12-31 2013-04-03 中国科学院自动化研究所 Academic core author excavation and related information extraction method and system based on complex network
CN104156437A (en) * 2014-08-13 2014-11-19 中科嘉速(北京)并行软件有限公司 Academic relationship network construction method based on paper author information extraction and relationship weight model
CN105653590A (en) * 2015-12-21 2016-06-08 青岛智能产业技术研究院 Name duplication disambiguation method of Chinese literature authors
CN105701258A (en) * 2016-03-31 2016-06-22 比美特医护在线(北京)科技有限公司 Information processing method and device
CN106227835A (en) * 2016-07-25 2016-12-14 中南大学 Team's research direction method for digging based on two subnetwork figure hierarchical clusterings
CN106886571A (en) * 2017-01-18 2017-06-23 大连理工大学 A kind of Forecasting Methodology of the scientific cooperation sustainability based on social network analysis
CN108510205A (en) * 2018-04-08 2018-09-07 大连理工大学 A kind of author's technical capability evaluation method based on hypergraph
CN108959543A (en) * 2018-07-02 2018-12-07 吉林大学 A kind of scientific cooperation author network partitioning method
CN109376236A (en) * 2018-07-27 2019-02-22 中山大学 A kind of academic paper author's weight analysis method based on clustering
CN109741791A (en) * 2018-12-29 2019-05-10 人和未来生物科技(长沙)有限公司 A kind of author's subject bearing data method for digging and system towards PubMed paper library
CN110704643A (en) * 2019-08-23 2020-01-17 上海科技发展有限公司 Method and device for automatically identifying same author of different documents and storage medium terminal
CN110941662A (en) * 2019-06-24 2020-03-31 上海市研发公共服务平台管理中心 Graphical method, system, storage medium and terminal for scientific research cooperative relationship
CN111183442A (en) * 2017-10-06 2020-05-19 爱思唯尔有限公司 System and method for providing academic and research entity recommendations
CN111488424A (en) * 2020-03-27 2020-08-04 中国科学院计算技术研究所 Method and system for discovering and tracking people in specific academic field
CN111538917A (en) * 2020-04-20 2020-08-14 清华大学 Learner migration route construction method and device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359249B (en) * 2018-09-29 2020-07-10 清华大学 Precise student positioning method and device based on student scientific research result mining

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060106847A1 (en) * 2004-05-04 2006-05-18 Boston Consulting Group, Inc. Method and apparatus for selecting, analyzing, and visualizing related database records as a network
CN101320370A (en) * 2008-05-16 2008-12-10 崔志明 Deep layer web page data source sort management method based on query interface connection drawing
CN101685455A (en) * 2008-09-28 2010-03-31 华为技术有限公司 Method and system of data retrieval

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020302B (en) * 2012-12-31 2016-03-02 中国科学院自动化研究所 Academic Core Authors based on complex network excavates and relevant information abstracting method and system
CN103020302A (en) * 2012-12-31 2013-04-03 中国科学院自动化研究所 Academic core author excavation and related information extraction method and system based on complex network
CN104156437A (en) * 2014-08-13 2014-11-19 中科嘉速(北京)并行软件有限公司 Academic relationship network construction method based on paper author information extraction and relationship weight model
CN105653590B (en) * 2015-12-21 2019-03-26 青岛智能产业技术研究院 A kind of method that Chinese literature author duplication of name disambiguates
CN105653590A (en) * 2015-12-21 2016-06-08 青岛智能产业技术研究院 Name duplication disambiguation method of Chinese literature authors
CN105701258A (en) * 2016-03-31 2016-06-22 比美特医护在线(北京)科技有限公司 Information processing method and device
CN106227835A (en) * 2016-07-25 2016-12-14 中南大学 Team's research direction method for digging based on two subnetwork figure hierarchical clusterings
CN106227835B (en) * 2016-07-25 2018-01-19 中南大学 Team's research direction method for digging based on two subnetwork figure hierarchical clusterings
CN106886571A (en) * 2017-01-18 2017-06-23 大连理工大学 A kind of Forecasting Methodology of the scientific cooperation sustainability based on social network analysis
CN111183442A (en) * 2017-10-06 2020-05-19 爱思唯尔有限公司 System and method for providing academic and research entity recommendations
CN108510205A (en) * 2018-04-08 2018-09-07 大连理工大学 A kind of author's technical capability evaluation method based on hypergraph
CN108510205B (en) * 2018-04-08 2021-07-16 大连理工大学 Author skill evaluation method based on hypergraph
CN108959543A (en) * 2018-07-02 2018-12-07 吉林大学 A kind of scientific cooperation author network partitioning method
CN109376236A (en) * 2018-07-27 2019-02-22 中山大学 A kind of academic paper author's weight analysis method based on clustering
CN109376236B (en) * 2018-07-27 2021-10-26 中山大学 Academic paper author weight analysis method based on cluster analysis
CN109741791A (en) * 2018-12-29 2019-05-10 人和未来生物科技(长沙)有限公司 A kind of author's subject bearing data method for digging and system towards PubMed paper library
CN110941662A (en) * 2019-06-24 2020-03-31 上海市研发公共服务平台管理中心 Graphical method, system, storage medium and terminal for scientific research cooperative relationship
CN110704643A (en) * 2019-08-23 2020-01-17 上海科技发展有限公司 Method and device for automatically identifying same author of different documents and storage medium terminal
CN110704643B (en) * 2019-08-23 2022-07-26 上海科技发展有限公司 Method and device for automatically identifying same author of different documents and storage medium terminal
CN111488424A (en) * 2020-03-27 2020-08-04 中国科学院计算技术研究所 Method and system for discovering and tracking people in specific academic field
CN111538917A (en) * 2020-04-20 2020-08-14 清华大学 Learner migration route construction method and device
CN111538917B (en) * 2020-04-20 2022-08-26 清华大学 Learner migration route construction method and device

Also Published As

Publication number Publication date
CN102609546B (en) 2014-11-05

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant