CN103324509A - Method for installing bioinformatics application programs in high-performance cluster system - Google Patents

Info

Publication number
CN103324509A
Authority
CN
China
Prior art keywords
application program
bioinformatics
mounting platform
class application
current mounting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013102601126A
Other languages
Chinese (zh)
Inventor
姜金良
马少杰
曹振南
李斌
赵明坤
侯雪峰
何沧平
田相桂
杨亮
易成
曹征
苗春葆
胡耀国
范娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Information Industry Beijing Co Ltd
Original Assignee
Dawning Information Industry Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dawning Information Industry Beijing Co Ltd filed Critical Dawning Information Industry Beijing Co Ltd
Priority to CN2013102601126A priority Critical patent/CN103324509A/en
Publication of CN103324509A publication Critical patent/CN103324509A/en
Pending legal-status Critical Current

Abstract

The invention discloses a method for installing bioinformatics application programs in a high-performance cluster system. The method includes the steps: loading environment variables of the bioinformatics application programs; selecting a corresponding math library according to system types and network configuration of a current installation platform; installing the bioinformatics application programs by the aid of the environment variables and the math library. By the method for installing the bioinformatics application programs in the high-performance cluster system, the efficiency of installing the bioinformatics application programs in the high-performance cluster system is improved.

Description

Method for installing bioinformatics application programs in a high-performance cluster system
Technical field
The present invention relates generally to bioinformatics and, more specifically, to a method for installing bioinformatics application programs in a high-performance cluster system.
Background art
Bioinformatics is the science of using computers as tools to collect, process, and make use of biological information. Its research objects are generally large molecules such as proteins and DNA. Because these objects are structurally complex, and because sequencing technology is developing rapidly, the volume of human gene sequence data is growing exponentially. Studying such a large number of genes usually involves very large data-processing and parallel-computing workloads.
Bioinformatics research covers many areas. One is the offline processing of sequencer measurement data, in which laboratory instruments sequence genes and preprocess the results. A DNA sequencer is an advanced instrument for determining DNA (gene) sequences. It is used for paternity testing, individual identification, genetic profiling, paternal-line identification, maternal-line identification, ethnic-group identification, species identification, and the diagnosis of certain diseases. It is indispensable equipment in the life sciences and an important tool for scientific research. DNA sequencers are expensive. A sequencing study consists of reagent preparation, sequencing on the instrument, and finally offline processing of the instrument output, which yields gene sequences that scientists can interpret. On this basis, scientists can assemble and align the measured sequences, perform homology analysis on them, and so on.
Sequence alignment mainly involves reconstructing a complete DNA sequence from overlapping sequence fragments; determining physical and genetic maps from probe data under various experimental conditions; storing, traversing, and comparing DNA sequences in databases; comparing the similarity of two or more sequences; searching databases for related sequences and subsequences; finding recurring nucleotide patterns; and identifying the informative components of protein and DNA sequences.
Molecular docking simulates the interaction between a small-molecule ligand and a receptor macromolecule according to the lock-and-key principle of ligand and receptor. It predicts the binding mode and affinity between the two by computation, and is used for virtual drug screening.
Commonly used programs include abyss, allpathslg, amos, autodock, blast, clustal-omega, clustalw, clustalw-mpi, dock, emboss, exonerate, fasta, fsa, hmmer, mira, mpiblast, mpihmmer, mummer, velvet, wgs, and others.
Bioinformatics application programs are usually installed and deployed by hand. This approach has several shortcomings. Compiling and installing the programs is fairly complicated and requires many parameters to be set manually, so manual installation is tedious, time-consuming, and labor-intensive, and mistakes are easy to make if the operator is unfamiliar with the build procedure. Different hardware platforms and network environments require different parameter settings during installation, and unfamiliarity with the operating system, compiler, math library, hardware system, or network environment can lead to poor execution efficiency or even incorrect results. After a successful installation, the corresponding environment variables must be configured for the convenience of users. Manual configuration is error-prone, and when many kinds of applications are installed, the environment variable settings easily become confused or conflict with one another.
Summary of the invention
In view of the above defects of the prior art, the present invention proposes a method for installing bioinformatics application programs in a high-performance cluster system, which solves the technical problem of how to improve the efficiency of installing bioinformatics application programs in a high-performance cluster system.
The present invention proposes a method for automatically installing bioinformatics application programs on a high-performance computing cluster. The installer performs automated, unattended installation of a variety of bioinformatics application programs, including abyss, allpathslg, amos, autodock, blast, clustal-omega, clustalw, clustalw-mpi, dock, emboss, exonerate, fasta, fsa, hmmer, mira, mpiblast, mpihmmer, mummer, velvet, wgs, and others. Before installing and configuring a bioinformatics application program, the installer first automatically checks the other program environments on which the application depends. During automatic installation and configuration, it adjusts and optimizes configuration parameters according to the network environment of the cluster. After installation, it automatically configures the environment variables and provides an example script for submitting jobs on the cluster system. Throughout the installation, it reports progress dynamically and, if an error occurs, gives a corresponding error message.
According to one aspect of the present invention, a method for installing a bioinformatics application program in a high-performance cluster system is provided, comprising: step S1: loading the environment variables of the bioinformatics application program; step S2: selecting a corresponding math library according to the system type and network configuration of the current installation platform; step S3: installing the bioinformatics application program using the environment variables and the math library.
In the method, before step S2, the method further comprises: checking whether the source program of the bioinformatics application program exists and whether the installation target directory can be created normally, and if so, performing step S2.
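The pre-check described above can be illustrated with a minimal Python sketch. The function name `preflight` and the returned message strings are assumptions made for illustration; the patent does not disclose how its installer implements this check. Note that the sketch creates the target directory as a side effect of testing whether it can be created.

```python
import os


def preflight(source_path, target_dir):
    """Return (ok, reason).

    ok is True only if the source exists and the target directory
    exists or can be created, and is writable.
    """
    if not os.path.exists(source_path):
        return False, f"source not found: {source_path}"
    try:
        # Creating the directory is the simplest way to prove it can be created.
        os.makedirs(target_dir, exist_ok=True)
    except OSError as exc:
        return False, f"cannot create target directory: {exc}"
    if not os.access(target_dir, os.W_OK):
        return False, f"target directory not writable: {target_dir}"
    return True, "ok"
```

An installer built along these lines would exit when `ok` is false and continue to the platform checks otherwise.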
In the method, before step S2, the method further comprises: obtaining the system type and the network configuration of the current installation platform.
In the method, the system type comprises the operating system version of the current installation platform.
In the method, obtaining the system type of the current installation platform comprises: obtaining the operating system version of the current installation platform by inspecting a system file of the current installation platform.
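As a hedged illustration of reading the operating system version from a system file, the sketch below parses the `KEY=value` format of `/etc/os-release`. The choice of that file is an assumption: the patent does not say which system file is inspected, and older Red Hat, CentOS, and SUSE releases use distribution-specific files instead. The function names are hypothetical.

```python
def parse_os_release(text):
    """Parse KEY=value lines (os-release format) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        info[key] = value.strip().strip("\"'")
    return info


def detect_os(path="/etc/os-release"):
    """Return (distribution id, version id), or None if the file is unreadable."""
    try:
        with open(path, encoding="utf-8") as handle:
            info = parse_os_release(handle.read())
    except OSError:
        return None
    return info.get("ID"), info.get("VERSION_ID")
```

Returning `None` rather than raising lets the caller decide whether an unknown platform should abort the installation.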
In the method, the network configuration comprises whether an InfiniBand network card is configured.
In the method, obtaining the system type and the network configuration of the current installation platform comprises: obtaining the operating system version of the current installation platform by inspecting a system file of the current installation platform; checking whether an InfiniBand network card is configured in the current installation platform; and checking whether a driver has been installed for the InfiniBand network card and whether the card operates normally.
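One possible way to perform the driver and status checks on Linux is to read the sysfs tree under `/sys/class/infiniband`, where each registered device exposes a `ports/<n>/state` file containing text such as `4: ACTIVE`. The sketch below assumes that layout and is not taken from the patent; the function name and return structure are hypothetical. It only sees devices whose driver is loaded, so detecting a card that has no driver would need a separate hardware query.

```python
import os


def infiniband_status(sysfs_root="/sys/class/infiniband"):
    """Report whether an InfiniBand driver is loaded and which ports are active."""
    status = {"driver_loaded": False, "active_ports": []}
    if not os.path.isdir(sysfs_root):
        return status
    for device in sorted(os.listdir(sysfs_root)):
        ports_dir = os.path.join(sysfs_root, device, "ports")
        if not os.path.isdir(ports_dir):
            continue
        # A device entry with ports means a driver has registered the card.
        status["driver_loaded"] = True
        for port in sorted(os.listdir(ports_dir)):
            state_file = os.path.join(ports_dir, port, "state")
            try:
                with open(state_file, encoding="ascii") as handle:
                    state = handle.read().strip()
            except OSError:
                continue
            if state.endswith("ACTIVE"):
                status["active_ports"].append(f"{device}/{port}")
    return status
```

The `sysfs_root` parameter exists so the function can be exercised against a temporary directory on machines without InfiniBand hardware.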
In the method, the environment variables comprise general environment variables and specific environment variables. The general environment variables comprise the subroutines used during installation, the location of the source program of the bioinformatics application program, and the installation target path of the bioinformatics application program. The specific environment variables comprise the compiler and MPI.
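The split between general and specific environment variables could be modeled as two small records that are merged into a process environment. This is only a sketch: every variable name below (`INSTALL_HELPERS`, `APP_SOURCE`, `APP_PREFIX`, `MPI_HOME`) is hypothetical, and the patent does not list the names its installer uses.

```python
import os
from dataclasses import dataclass


@dataclass
class GeneralEnv:
    helper_dir: str      # subroutines used by the installer
    source_dir: str      # location of the application source program
    install_prefix: str  # installation target path


@dataclass
class SpecificEnv:
    compiler: str        # compiler command, e.g. "gcc"
    mpi_home: str = ""   # left empty for single-node applications


def build_environment(general, specific, base=None):
    """Merge both groups of settings into a copy of the base environment."""
    env = dict(base or {})
    env["INSTALL_HELPERS"] = general.helper_dir
    env["APP_SOURCE"] = general.source_dir
    env["APP_PREFIX"] = general.install_prefix
    env["CC"] = specific.compiler
    if specific.mpi_home:
        env["MPI_HOME"] = specific.mpi_home
        mpi_bin = os.path.join(specific.mpi_home, "bin")
        old_path = env.get("PATH", "")
        # Put the MPI tools first so they take precedence over system copies.
        env["PATH"] = mpi_bin + (os.pathsep + old_path if old_path else "")
    return env
```

Keeping the MPI settings optional mirrors the distinction between single-node and multi-node applications.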
In the method, the method further comprises: saving the output information generated during installation.
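Saving the installation output might look like the following sketch, which runs each build command as a subprocess and appends its combined standard output and standard error to a log file. The function name `run_and_log` and the log format are assumptions; the patent only states that the output is preserved.

```python
import subprocess


def run_and_log(command, log_path):
    """Run a command, append its output to log_path, and return the exit code."""
    with open(log_path, "a", encoding="utf-8") as log:
        # Record the command itself so the log can be read step by step.
        log.write("$ " + " ".join(command) + "\n")
        log.flush()
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

A nonzero return code signals an abnormal exit, after which the saved log can be inspected for the cause.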
In the method, the method further comprises: generating, for the bioinformatics application program, an example script for submitting jobs on the high-performance cluster system, wherein the content of the example script comprises the way resources are requested and the way the bioinformatics application program is run.
The method provided by the present invention for installing bioinformatics application programs in a high-performance cluster system improves the efficiency of installing bioinformatics application programs in a high-performance cluster system.
Description of drawings
The accompanying drawings provide a further understanding of the present invention and form part of the specification. Together with the embodiments of the present invention, they serve to explain the present invention and do not limit it. In the drawings:
Fig. 1 is a flowchart of a general embodiment of the method for installing bioinformatics application programs in a high-performance cluster system according to the present invention;
Fig. 2 is a flowchart of a specific embodiment of the method for installing bioinformatics application programs in a high-performance cluster system according to the present invention;
Fig. 3 is a flowchart of an example of the method for installing bioinformatics application programs in a high-performance cluster system according to the present invention.
Embodiment
The preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here serve only to describe and explain the present invention and are not intended to limit it.
Fig. 1 is a flowchart of a general embodiment of the method for installing bioinformatics application programs in a high-performance cluster system according to the present invention. In Fig. 1:
Step S100: load the environment variables of the bioinformatics application program. The environment variables may comprise general environment variables and specific environment variables. The general environment variables comprise the subroutines used during installation, the location of the source program of the bioinformatics application program, and the installation target path of the bioinformatics application program. The specific environment variables comprise the compiler and MPI.
Step S102: select a corresponding math library according to the system type and network configuration of the current installation platform. The system type may comprise the operating system version of the current installation platform. In a preferred embodiment, the operating system version of the current installation platform can be obtained by inspecting a system file of the current installation platform. The operating system may be one of the mainstream high-performance cluster operating systems such as Red Hat, SUSE, or CentOS. In addition, the network configuration may comprise whether an InfiniBand network card is configured, and it can further be detected whether a driver is installed for the InfiniBand network card and whether the card operates normally.
Step S104: install the bioinformatics application program using the environment variables and the math library.
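Steps S100 to S104 can be read as a three-stage pipeline. The Python sketch below only models the data flow between the stages and runs no build. The patent does not name concrete math libraries or selection rules, so the library labels, the variable names, and the `/opt/bio` prefix are placeholders invented for illustration.

```python
def load_environment(app):
    """Step S100: return the variables the installer would export (placeholders)."""
    return {"APP_NAME": app, "INSTALL_PREFIX": f"/opt/bio/{app}"}


def select_math_library(os_name, has_infiniband):
    """Step S102: choose a math library label from the platform facts."""
    network = "ib" if has_infiniband else "eth"
    # Placeholder naming scheme; a real installer would map to actual libraries.
    return f"mathlib-{os_name.lower()}-{network}"


def install(app, os_name, has_infiniband):
    """Step S104: combine both inputs into an install plan (no build is run)."""
    env = load_environment(app)
    library = select_math_library(os_name, has_infiniband)
    return {"app": app, "env": env, "math_library": library}
```

Separating selection from installation keeps the platform-dependent decision in one place, which is the point of step S102.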
The method for installing bioinformatics application programs disclosed in this embodiment simplifies the installation procedure and reduces its difficulty. Dependency checks, fault-tolerance checks, and standardized configuration improve the installation success rate and installation quality and avoid operator error as far as possible. Unattended operation greatly improves the efficiency of installing and deploying bioinformatics application programs, saving time and labor.
Fig. 2 is a flowchart of a specific embodiment of the method for installing bioinformatics application programs in a high-performance cluster system according to the present invention. In Fig. 2:
Step S200: load the environment variables of the bioinformatics application program. The environment variables may comprise general environment variables and specific environment variables. The general environment variables comprise the subroutines used during installation, the location of the source program of the bioinformatics application program, and the installation target path of the bioinformatics application program. The specific environment variables comprise the compiler and MPI.
Step S202: check whether the source program of the bioinformatics application program exists and whether the installation target directory can be created normally. If so, proceed to the following steps; if not, exit the installation.
Step S204: obtain the system type and network configuration of the current installation platform. The system type may comprise the operating system version of the current installation platform. In a preferred embodiment, the operating system version of the current installation platform can be obtained by inspecting a system file of the current installation platform. The operating system may be one of the mainstream high-performance cluster operating systems such as Red Hat, SUSE, or CentOS. In addition, the network configuration may comprise whether an InfiniBand network card is configured, and it can further be detected whether a driver is installed for the InfiniBand network card and whether the card operates normally.
Step S206: select a corresponding math library according to the system type and network configuration of the current installation platform.
Step S208: compile and install the bioinformatics application program using the environment variables and the math library.
Step S210: generate, for the bioinformatics application program, an example script for submitting jobs on the high-performance cluster system, wherein the content of the example script comprises the way resources are requested and the way the bioinformatics application program is run. The example file has two parts: how to request computing resources and how to run the application. A high-performance cluster system is generally configured with a job scheduling system, and the prepared script covers how to request computing resources, how to execute commands, and so on. This part does not depend on the application but on the settings of the scheduling system. The present invention uses the most common one, the PBS scheduling system, for which the parameters to be set include "#PBS -l nodes=1:ppn=2", "#PBS -q low", and so on. The way the command is executed may differ according to the application and the network conditions detected earlier. If an InfiniBand network is configured, the Open MPI library is selected as the MPI library, and parameters such as "--mca btl self,openib" need to be added. In use, a simple modification according to the resources actually required is sufficient.
The method for installing bioinformatics application programs disclosed in this embodiment simplifies the installation procedure and reduces its difficulty. Dependency checks, fault-tolerance checks, and standardized configuration improve the installation success rate and installation quality and avoid operator error as far as possible. Unattended operation greatly improves the efficiency of installing and deploying bioinformatics application programs, saving time and labor.
Fig. 3 is a flowchart of an example of the method for installing bioinformatics application programs in a high-performance cluster system according to the present invention. In this example:
First step: load the general environment variables of the installer package. These mainly cover the subroutines used during installation, the location of the application source program, the installation target path, and so on.
Second step: load the environment variables needed to install the application. Most bioinformatics applications are multithreaded programs that can run only on a single server, so the environment variables to be loaded are mainly those of the compiler. A few applications, such as mpiblast and mpihmmer, support multi-node parallelism and, in addition to the compiler, also need the environment variables of the MPI library. After the required environment is loaded, test whether the compiler and other tools can be used normally.
Third step: check whether the source program of the application exists, whether the installation target directory can be created normally, and so on.
Fourth step: check the high-performance cluster environment, including the operating system version, the network, and so on.
The operating system version can be obtained by inspecting system file settings. The operating systems currently supported include mainstream high-performance cluster operating systems such as Red Hat, SUSE, and CentOS.
The network check is needed mainly when installing applications that support multiple nodes, such as mpiblast and mpihmmer. It detects whether a high-speed InfiniBand network is configured so that the target application can use that network, and consists mainly of two checks:
(1) Check whether a high-speed InfiniBand network card is configured in the server.
(2) Check whether a driver has been installed for the InfiniBand network card and whether the card status is normal.
Fifth step: according to the system information obtained, automatically set the variables needed for installation, then compile and install the program.
Sixth step: generate, for the application, an example script for submitting jobs on the cluster system. The example file has two parts: how to request computing resources and how to run the application. A high-performance cluster system is generally configured with a job scheduling system, and the prepared script covers how to request computing resources, how to execute commands, and so on. This part does not depend on the application but on the settings of the scheduling system. The present invention uses the most common one, the PBS scheduling system, for which the parameters to be set include "#PBS -l nodes=1:ppn=2", "#PBS -q low", and so on. The way the command is executed may differ from application to application. For applications that support multiple nodes, such as mpiblast and mpihmmer, if an InfiniBand network is configured, the Open MPI library is selected as the MPI library and the "--mca btl self,openib" parameter needs to be added. Other, single-node applications do not need network parameters to be specified. In use, a simple modification according to the resources actually required is sufficient.
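The structure of such an example script can be sketched as a small generator. The PBS directives and the Open MPI option are the ones quoted in the text; the function name, its parameters, the use of `$PBS_O_WORKDIR`, and the `mpirun -np` launch line are assumptions added for illustration, and application arguments are deliberately left out.

```python
def make_pbs_script(app_command, nodes=1, ppn=2, queue="low",
                    multi_node=False, infiniband=False):
    """Build the text of an example PBS job script."""
    lines = [
        "#!/bin/bash",
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # resource request
        f"#PBS -q {queue}",                  # target queue
        "cd $PBS_O_WORKDIR",                 # run from the submission directory
    ]
    if multi_node:
        launcher = ["mpirun", "-np", str(nodes * ppn)]
        if infiniband:
            # Select the InfiniBand transport in Open MPI, as described in the text.
            launcher += ["--mca", "btl", "self,openib"]
        lines.append(" ".join(launcher + [app_command]))
    else:
        # Single-node applications need no network parameters.
        lines.append(app_command)
    return "\n".join(lines) + "\n"
```

For example, `make_pbs_script("mpiblast", nodes=2, ppn=4, multi_node=True, infiniband=True)` yields a script whose last line launches 8 processes over InfiniBand, while the defaults produce a plain single-node script that the user can adjust to the resources actually needed.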
During installation, the installer package saves the output produced by the installation process. If the installation exits abnormally, the saved file can be inspected to find the cause of the error.
Using the installer package: after the package is decompressed, it contains an install.sh command. Enter the package directory and run: ./install.sh --<name of the application>. The application is then installed automatically.
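The patent does not show the contents of install.sh. As a hedged model of how the `--<name of the application>` argument might be validated, the Python sketch below checks the argument against the list of programs named in this document. The function name and error messages are hypothetical, and the real script is a shell script rather than Python.

```python
SUPPORTED = (
    "abyss", "allpathslg", "amos", "autodock", "blast", "clustal-omega",
    "clustalw", "clustalw-mpi", "dock", "emboss", "exonerate", "fasta",
    "fsa", "hmmer", "mira", "mpiblast", "mpihmmer", "mummer", "velvet", "wgs",
)


def parse_target(argv):
    """Return the application name from arguments shaped like ['--blast']."""
    if len(argv) != 1 or not argv[0].startswith("--"):
        raise ValueError("usage: ./install.sh --<name of the application>")
    name = argv[0][2:]
    if name not in SUPPORTED:
        raise ValueError(f"unsupported application: {name}")
    return name
```

Rejecting unknown names before any other step runs keeps a typo from starting a partial installation.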
The present invention proposes a method for automatically installing bioinformatics application programs on a high-performance computing cluster. Automation greatly simplifies the installation procedure of bioinformatics application programs and reduces its difficulty. Dependency checks, fault-tolerance checks, and standardized configuration improve the installation success rate and installation quality and avoid operator error as far as possible. Unattended operation greatly improves the efficiency of installing and deploying bioinformatics application programs, saving time and labor. The method and program are broadly applicable to the fast, automatic installation and deployment of bioinformatics application programs on high-performance computing clusters of different scales, and are also suitable for rapidly configuring and deploying a high-performance computing program environment on temporary computing resources in dynamically changing environments such as cloud computing.
The above are only preferred embodiments of the present invention and do not limit the present invention. For a person skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A method for installing a bioinformatics application program in a high-performance cluster system, characterized by comprising:
Step S1: loading the environment variables of the bioinformatics application program;
Step S2: selecting a corresponding math library according to the system type and network configuration of the current installation platform;
Step S3: installing the bioinformatics application program using the environment variables and the math library.
2. The method according to claim 1, characterized in that, before step S2, the method further comprises: checking whether the source program of the bioinformatics application program exists and whether the installation target directory can be created normally, and if so, performing step S2.
3. The method according to claim 1 or 2, characterized in that, before step S2, the method further comprises: obtaining the system type and the network configuration of the current installation platform.
4. The method according to claim 3, characterized in that the system type comprises the operating system version of the current installation platform.
5. The method according to claim 4, characterized in that obtaining the system type of the current installation platform comprises: obtaining the operating system version of the current installation platform by inspecting a system file of the current installation platform.
6. The method according to claim 5, characterized in that the network configuration comprises whether an InfiniBand network card is configured.
7. The method according to claim 6, characterized in that obtaining the system type and the network configuration of the current installation platform comprises:
obtaining the operating system version of the current installation platform by inspecting a system file of the current installation platform;
checking whether an InfiniBand network card is configured in the current installation platform; and
checking whether a driver has been installed for the InfiniBand network card and whether the card operates normally.
8. The method according to claim 7, characterized in that the environment variables comprise general environment variables and specific environment variables, the general environment variables comprising the subroutines used during installation, the location of the source program of the bioinformatics application program, and the installation target path of the bioinformatics application program, and the specific environment variables comprising the compiler and MPI.
9. The method according to claim 1, characterized in that the method further comprises: saving the output information generated during installation.
10. The method according to claim 1, characterized in that the method further comprises: generating, for the bioinformatics application program, an example script for submitting jobs on the high-performance cluster system, wherein the content of the example script comprises the way resources are requested and the way the bioinformatics application program is run.
CN2013102601126A 2013-06-26 2013-06-26 Method for installing bioinformatics application programs in high-performance cluster system Pending CN103324509A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013102601126A CN103324509A (en) 2013-06-26 2013-06-26 Method for installing bioinformatics application programs in high-performance cluster system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013102601126A CN103324509A (en) 2013-06-26 2013-06-26 Method for installing bioinformatics application programs in high-performance cluster system

Publications (1)

Publication Number Publication Date
CN103324509A true CN103324509A (en) 2013-09-25

Family

ID=49193276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013102601126A Pending CN103324509A (en) 2013-06-26 2013-06-26 Method for installing bioinformatics application programs in high-performance cluster system

Country Status (1)

Country Link
CN (1) CN103324509A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040060035A1 (en) * 2002-09-24 2004-03-25 Eric Ustaris Automated method and system for building, deploying and installing software resources across multiple computer systems
CN102141924A (en) * 2010-01-29 2011-08-03 迈普通信技术股份有限公司 Batch production method of Linux boards and production server
CN101937351A (en) * 2010-09-15 2011-01-05 深圳市任子行网络技术股份有限公司 Method and system for automatically installing application software

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
严冰等: "《Linux程序设计》", 2 February 2012, 浙江大学出版社 *
孙靖 等: "InfiniBand技术及其在Linux系统中的配置简介", 《HTTP://WWW.IBM.COM/DEVELOPERWORKS/CN/LINUX/L-CN-INFINIBAND/》 *
高俊峰 等: "《国产Linux基础应用》", 31 July 2012 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126976A (en) * 2016-06-15 2016-11-16 北京市计算中心 Biological information analysis system in server
CN106445605A (en) * 2016-09-30 2017-02-22 郑州云海信息技术有限公司 Method for silent installation of ICC compiling environment
CN109960645A (en) * 2017-12-22 2019-07-02 迈普通信技术股份有限公司 Script testing method, device and script test macro
CN109960645B (en) * 2017-12-22 2022-10-18 迈普通信技术股份有限公司 Script test method and device and script test system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20130925