CN104361113B - OLAP query optimization method under a hybrid memory-flash storage model - Google Patents
- Publication number
- CN104361113B (application number CN201410717830.6A)
- Authority
- CN
- China
- Prior art keywords
- flash
- vector
- memory
- packet
- olap
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2237—Vectors, bitmaps or matrices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/068—Hybrid storage device
Abstract
The present invention relates to an OLAP query optimization method under a hybrid memory-flash storage model, comprising: OLAP storage uses a flash-aware storage model that partitions data by access locality between a relatively small DRAM and a relatively large flash store and optimizes storage across the heterogeneous two-level storage; in-memory OLAP uses array storage, with each attribute column stored in contiguous array cells, so that the traditional join operation is reduced to an array index reference (AIR) and OLAP queries are processed with the AIR algorithm; OLAP query processing based on array storage and the AIR algorithm is decomposed into three sequential data access phases; the specified measure values of the measure columns stored in flash are accessed through a selection vector; and, on the basis of a keyword-based bitmap join index, K keyword join bitmaps are stored in an optimized way across the DRAM-flash two-level storage, forming a two-level join bitmap index structure. The present invention improves the performance-to-cost ratio of memory storage, memory and CPU utilization, and data storage efficiency, and can be widely applied in general-purpose OLAP scenarios.
Description
Technical field
The present invention relates to a storage optimization and OLAP (on-line analytical processing) query optimization method in the database field, and in particular to an OLAP query optimization method under a hybrid memory-flash storage model, suited to in-memory database appliance platforms and based on two-level storage composed of DRAM (dynamic random access memory) and flash (flash memory).
Background art
In-memory analytical processing (in-memory OLAP) is an important technology for real-time big data analysis. Backed by large memory and the parallel processing capability of multicore processors, in-memory OLAP delivers excellent real-time analytical performance. However, compared with other storage devices such as flash and disk, memory is still a very expensive storage medium, and its storage energy consumption is an order of magnitude higher than that of flash (DRAM: ~100 mW/GB; NAND flash: 1-10 mW/GB). Because in-memory OLAP operates on big data, the hardware cost of in-memory analytical processing is very high. PCIe flash cards, a high-capacity (hundreds of GB to TB scale) high-speed storage technology, are already widely used in high-performance databases. For example, the Oracle Exadata X3 in-memory database appliance is configured with high-capacity, high-speed flash cards and provides Smart Flash Cache to cache hot data; the cache algorithm is optimized by database logic, and cache optimization policies can be specified per table. On the one hand, high-speed flash cards give expensive memory a cheap second-level storage extension. On the other hand, they are mainly used as an extended database cache on flash, enlarging the capacity of the in-memory buffer, but they are not combined with the storage optimization and query processing optimization techniques of in-memory OLAP, and no flash-aware OLAP optimization has been realized at the level of OLAP algorithms.
Current analytical in-memory databases mainly use DRAM as the primary storage device, with flash serving as backup storage in place of disk or as a disk cache; flash has not yet been incorporated into the design of in-memory OLAP algorithms. How to apply the two-level storage model formed by memory and high-capacity flash to in-memory analytical processing, making it a mainstream platform with high performance and a high performance-to-cost ratio, so that in-memory databases support not only analytical processing under a fully in-memory model but also in-memory analytical processing on DRAM-flash two-level storage, has become a technical problem that urgently needs solving.
Summary of the invention
In view of the above problems, the object of the present invention is to provide an OLAP query optimization method under a hybrid memory-flash storage model. The method is based on DRAM-flash two-level storage and uses cheap, high-capacity flash to improve the performance-to-cost ratio of memory storage. The method also effectively improves data storage efficiency, as well as memory and CPU utilization.
To achieve the above object, the present invention adopts the following technical solution: an OLAP query optimization method under a hybrid memory-flash storage model, comprising the following steps:
1) OLAP storage uses a flash-aware storage model. That is, based on the characteristics of the OLAP star or snowflake schema, in which dimension tables are small and subject to many predicate operations while the fact table consists of foreign keys and measure attributes, data is partitioned by access locality between a relatively small DRAM and a relatively large flash store, and storage is optimized across the heterogeneous two-level storage.
2) In-memory OLAP uses array storage. Each attribute column is stored in contiguous array cells, and a table consists of attribute arrays of equal length. Dimension tables use the array index as the primary key, and a fact table foreign key is the array index of the corresponding record in the dimension table, so a fact table record can locate the corresponding array cell in the dimension table directly from its foreign key value. The traditional join operation is thus reduced to an array index reference (AIR), and OLAP queries are processed with the AIR algorithm.
3) OLAP query processing based on array storage and the AIR algorithm is decomposed into three sequential data access phases: dimension table access, fact table foreign key access, and fact table measure attribute access. The intermediate data structures produced by the three phases are the dimension table filter-group vectors, the selection vector and group vector, and the grouping multidimensional array. The dimension table filter-group vectors, the selection vector, and the group vector are data structures shared by all queries; different queries only need to update the vector contents. The grouping multidimensional array is generated dynamically for each query.
4) On the basis of a keyword-based bitmap join index, the K keyword join bitmaps are stored in an optimized way across the DRAM-flash two-level storage: the n most frequently accessed keywords are selected and their bitmaps are stored in DRAM, while the remaining bitmaps are stored in flash, forming a two-level join bitmap index structure.
In step 1), a memory storage engine is used and the dimension tables reside in DRAM. The foreign keys of the fact table are the columns accessed most frequently during star joins in multidimensional analysis, so they also reside in DRAM. The measure attributes of the fact table are stored in flash, and a positional access API on the measure columns supports random access to them.
In step 3), the dimension table filter-group vectors, the selection vector, the group vector, and the grouping multidimensional array are stored in DRAM; the fact table measure attributes are stored in flash.
In step 2), OLAP query processing with the AIR algorithm comprises the following steps:
1. Decompose the OLAP query into filter and group operations on the dimension tables: the selection and grouping operations in the query are partitioned by dimension table, decomposing the query into a subquery on each dimension table.
2. Generate the dimension table filter-group vectors: each dimension table filters its records according to its own WHERE clause and projects the grouping attributes of the records that satisfy the selection condition. These grouping attributes are dictionary-compressed; the dictionary is stored in an array, and the dictionary code is the dictionary array index. In the filter-group vector, which has the same length as the dimension table, positions whose selection condition is false are set to -1 and all other positions are set to the compressed group code.
3. Create the selection and group vectors through multiple passes over the fact table foreign keys: the corresponding foreign key columns of the fact table are scanned in turn, in order of the selection rate of the dimension table filter-group vectors from low to high. During each column scan, the foreign key value is mapped to the position it designates in the corresponding dimension table filter-group vector, and when the vector value is non-negative the current fact table record position is inserted into the selection vector. The scan of the next foreign key column uses random access at the record positions in the current selection vector and updates the selection vector through the corresponding dimension table filter-group vector, deleting fact table record positions that fail the subsequent foreign key mapping condition. After all foreign key columns have been scanned, the selection vector holds the positions of the fact table records that satisfy all join conditions. When the overall selection rate is high, the group vector is updated together with the selection vector, and the grouping multidimensional array index is computed incrementally from the current group index in each dimension table filter-group vector. When the overall selection rate is very low, the selection vector is generated first, and the group vector is then generated in one pass by randomly accessing each foreign key column at the positions in the selection vector.
4. Access the specified measure values of the measure columns stored in flash through the selection vector: the measure attribute values stored in flash are accessed at the positions recorded in the selection vector, and the random accesses to flash driven by the selection vector are executed in parallel on multiple cores.
5. Aggregate the measure values into the grouping multidimensional array through the group vector: according to the group index recorded in the group vector, each measure attribute value returned from flash is "pushed" into the grouping multidimensional array cell indicated by that index for aggregation, completing the OLAP query processing.
6. Map each dimension of the grouping multidimensional array back to the group dictionary to obtain the grouping attribute values, and output the query result.
In step 2 above, the filter-group vector is a dynamically generated additional column of the dimension table. It takes the place of the dimension table in the join with the fact table and supplies the codes of the grouping attributes of the joined records on that dimension. When a dimension table has no grouping attribute, the filter-group vector reduces to a bitmap used for join filtering on the fact table foreign key.
In step 4), a DRAM-flash two-level bitmap join index method is used on the basis of the keyword bitmap join index: according to the memory storage quota, the n most frequently used of the K keyword join bitmaps are selected as the DRAM-resident bitmap join index, and the remaining K-n keyword bitmaps are stored in flash as the second-level index.
By adopting the above technical solution, the present invention has the following advantages:
1. The present invention uses DRAM-flash two-level storage as the storage platform for in-memory OLAP. Compared with a fully in-memory storage model, DRAM-flash two-level storage greatly reduces the demand for expensive memory and lowers overall hardware cost. Through the flash-aware storage model and the optimized AIR OLAP algorithm, the large volume of measure attributes stored on flash is accessed with efficient random access, narrowing the performance gap of flash storage.
2. The AIR OLAP query processing algorithm used in the present invention divides a complete OLAP query into three independent processing phases. The dimension table and fact table foreign key phases touch only a small fraction of the data, and each dimension table generates one fixed-length filter-group vector. However many fact table columns a query involves, it needs only one selection vector and one group vector; the initial length of the selection and group vectors is determined by the lowest selection rate among the dimensions, and the length keeps shrinking during the star join over the fact table foreign keys, so memory overhead is bounded. The measure attributes, which account for the largest share of storage space in the database, are stored in high-capacity flash. Traditional pipelined OLAP query processing (including pipelined access in both row stores and column stores) performs pipelined hash joins and hash group-aggregation during a full scan of the fact table. The AIR OLAP query algorithm instead decomposes the join, grouping, and aggregation operations: it does not use the full table scan common in traditional databases, but completes the join and grouping on a limited number of fact table foreign keys, then accesses the specified positions of the specified measure columns by position according to a selection vector with a very low selection rate. Data access to flash is thereby deferred to the final processing phase, which greatly reduces the data access load on flash while exploiting the good random-access performance of flash.
3. The bitmap join index used in the present invention is an incremental index: the index update caused by appending fact table records is a sequential extension of the bitmaps, rather than an update of node contents and structure as in a B+-tree index. This eliminates the rewrite cost of data updates on flash and makes the index better suited to flash storage. The DRAM-flash two-level bitmap join index method builds on the keyword bitmap join index and applies a two-level storage scheme to the keyword bitmaps according to query keyword access frequency and memory storage space: the n most frequently used of the K keyword bitmaps are stored in DRAM and the remaining K-n keyword bitmaps are stored in flash, reducing the storage overhead of the in-memory index.
4. The present invention builds a heterogeneous storage model from the characteristics of the OLAP schema and workload and from the locality of data in the OLAP query processing algorithm (that is, how frequently the data is accessed). Taking as objects the tables, the indexes, the column groups within a table, and the intermediate data structures that queries depend on, each with a different locality strength, it distributes them optimally, subject to memory capacity and data access performance constraints, between the high-performance but relatively small DRAM and the high-capacity but slower flash, improving data storage efficiency.
5. In query processing, the present invention defers data access on flash according to the storage characteristics of DRAM and flash and divides complete OLAP query processing into two phases, in-memory processing and flash computation. This supports pipeline parallelism between OLAP queries across the phases on different storage tiers, improving memory and CPU utilization. The present invention can be widely applied in general-purpose OLAP scenarios.
Brief description of the drawings
Fig. 1 is a schematic diagram of in-memory OLAP storage on DRAM-flash two-level storage according to the present invention;
Fig. 2 is a schematic diagram of in-memory OLAP query processing on DRAM-flash two-level storage in an embodiment of the present invention;
Fig. 3 is a schematic diagram of keyword bitmap join index processing on DRAM-flash two-level storage according to the present invention.
Detailed description of the embodiments
Existing in-memory OLAP technology generally either uses a fully in-memory computing model or uses high-capacity flash as a cache. The former increases the cost of in-memory computing; the latter makes it difficult to optimize OLAP as it extends from memory to flash. The present invention therefore proposes an OLAP query optimization method under a hybrid memory-flash storage model based on DRAM-flash two-level storage. It optimizes the placement of OLAP data across memory and flash according to the characteristics of the OLAP schema, workload, and algorithms, and optimizes the OLAP query algorithm for the characteristics of two-level storage. The present invention is applicable to general-purpose OLAP scenarios. The present invention is described in detail below with reference to the drawings and embodiments.
As shown in Fig. 1, the present invention provides an OLAP query optimization method under a hybrid memory-flash storage model. The method is an in-memory OLAP query optimization method for DRAM-flash two-level storage built from DRAM and high-capacity PCIe flash cards, and comprises the following steps:
1) OLAP storage uses a flash-aware storage model. That is, based on the characteristics of the OLAP star or snowflake schema, in which dimension tables are small and subject to many predicate operations while the fact table consists of foreign keys and measure attributes, data is partitioned by access locality between a relatively small DRAM and a relatively large flash store, and storage is optimized across the heterogeneous two-level storage so that frequently used data in memory is used more efficiently.
Because dimension tables are small and modern OLAP supports updates on dimension tables, a memory storage engine is used and the dimension tables reside in DRAM. The foreign keys of the fact table are the columns accessed most frequently during star joins in multidimensional analysis, so they also reside in DRAM. The fact table has many measure attributes, but an OLAP query usually touches only a few of them and only the measure values at a very low selection rate; the measure attributes are therefore stored in cheap, high-capacity flash, and a positional access API on the measure columns supports positional random access to them.
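The placement rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `place_columns`, the role labels, and the toy schema are hypothetical and do not come from the patent.

```python
def place_columns(schema):
    """Map each column to a storage tier based on its role in the star schema."""
    placement = {}
    for table, columns in schema.items():
        for column, role in columns.items():
            # Only fact table measures go to flash; dimension columns and
            # fact table foreign keys stay resident in DRAM.
            placement[f"{table}.{column}"] = "flash" if role == "measure" else "DRAM"
    return placement

# Hypothetical toy schema: role labels are an assumption of this sketch.
schema = {
    "customer": {"c_nation": "dimension", "c_region": "dimension"},
    "lineorder": {
        "lo_custkey": "foreign_key",
        "lo_suppkey": "foreign_key",
        "l_revenue": "measure",
    },
}
placement = place_columns(schema)
```

A real system would also weigh the memory quota and observed access frequency, which this sketch leaves out.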
2) In-memory OLAP uses array storage. Each attribute column is stored in contiguous array cells, and a table consists of attribute arrays of equal length. Preferably, dimension tables use the array index as the primary key, and a fact table foreign key is the array index of the corresponding record in the dimension table, so a fact table record can locate the corresponding array cell in the dimension table directly from its foreign key value. The traditional join operation is thus reduced to an array index reference (Array Index Reference, AIR), and OLAP queries are processed with the AIR algorithm.
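As a minimal sketch of the idea, assuming 0-based array indexes and invented toy data, a join under array storage is just a subscript lookup. The helper name `air_join` is hypothetical.

```python
# Dimension table "customer" stored as a column array; the array index
# plays the role of the primary key (toy values, invented for illustration).
c_nation = ["BRAZIL", "CHINA", "CANADA"]

# Fact table foreign key column: each value is a subscript into the
# dimension arrays, so no hash table or index probe is needed.
lo_custkey = [2, 0, 0, 1]

def air_join(fk_column, dim_column):
    """Resolve each fact record's dimension value by direct array indexing."""
    return [dim_column[fk] for fk in fk_column]

joined = air_join(lo_custkey, c_nation)
```

Each lookup costs one array access, which is why the text describes the join as reduced to an array index reference.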
3) OLAP query processing based on array storage and the AIR algorithm is decomposed into three sequential data access phases: dimension table access, fact table foreign key access, and fact table measure attribute access. The intermediate data structures produced by the three phases are the dimension table filter-group vectors, the selection vector and group vector, and the grouping multidimensional array. The dimension table filter-group vectors, the selection vector, and the group vector are data structures shared by all queries; different queries only need to update the vector contents. The grouping multidimensional array is generated dynamically for each query. These three kinds of intermediate data structures are reused repeatedly during query processing and form a data set with strong locality, so they need to be stored in DRAM. The fact table measure attributes occupy a large amount of storage, but a query usually accesses only a few measure attributes and only needs to randomly access a very small fraction of the records in each measure column through the selection vector. The measure attribute columns can therefore be stored in flash, reducing the DRAM demand of in-memory OLAP and lowering the hardware cost of the system.
The AIR OLAP algorithm of the present invention needs the selection vector, group vector, dimension table filter-group vectors, grouping multidimensional array, and similar data for OLAP query processing. These data structures can be reused across OLAP queries and occupy a fixed amount of memory, so they reside in DRAM.
4) Data warehouses commonly use bitmap join indexes to optimize or eliminate the join cost between dimension tables and the fact table. On the basis of a keyword-based bitmap join index (that is, the bitmap join index is built for keywords rather than for all members of an attribute), the present invention stores the K keyword join bitmaps in an optimized way across the DRAM-flash two-level storage (each keyword corresponds to one bitmap of the same length as the fact table, recording the positions of the fact table records that correspond to the keyword): the n most frequently accessed keywords are selected and their bitmaps are stored in DRAM, while the remaining bitmaps are stored in flash, forming a two-level join bitmap index structure.
Specifically, the present invention uses a DRAM-flash two-level bitmap join index method on the basis of the keyword bitmap join index: according to the memory storage quota, the n most frequently used of the K keyword join bitmaps are selected as the DRAM-resident bitmap join index, and the remaining K-n keyword bitmaps serve as the second-level index. In detail, each index entry is the fact table position bitmap corresponding to a dimension table attribute value; the size of the bitmap index is the product of the fixed bitmap size and the number of keywords, and data compression can further reduce the memory space the bitmap index occupies. By analyzing the query execution log and the query keywords, the K most frequently used dimension attribute keywords can be determined and bitmap indexes built for them. According to the memory space quota, the n most frequently used keyword bitmaps reside in memory and the remaining K-n keyword bitmaps are stored in flash, forming a two-level join bitmap index structure.
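The keyword selection and two-level placement described above might be sketched as follows. The function name `place_keyword_bitmaps`, its parameters, and the sample query log are assumptions made for illustration; the patent does not specify how frequencies are gathered or how the quota is expressed.

```python
from collections import Counter

def place_keyword_bitmaps(logged_keywords, k, dram_quota_bytes, bitmap_bytes):
    """Pick the K most frequent keywords, then keep as many of the hottest
    bitmaps in DRAM as the quota allows; the rest go to flash."""
    frequency = Counter(logged_keywords)
    top_k = [keyword for keyword, _ in frequency.most_common(k)]
    # Every bitmap has the same fixed length (one bit per fact record),
    # so the quota translates directly into a bitmap count n.
    n = min(len(top_k), dram_quota_bytes // bitmap_bytes)
    return top_k[:n], top_k[n:]

# Hypothetical keyword occurrences extracted from a query execution log.
log = ["ASIA"] * 5 + ["AMERICA"] * 3 + ["1997"] * 2 + ["EUROPE"]
dram, flash = place_keyword_bitmaps(log, k=3, dram_quota_bytes=2500, bitmap_bytes=1000)
```

With a 2500-byte quota and 1000-byte bitmaps, n is 2, so the two hottest keywords stay in DRAM and the third becomes a second-level (flash) entry; "EUROPE" falls outside the top K and gets no bitmap at all.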
In step 2) above, OLAP query processing with the AIR algorithm comprises the following steps:
1. Decompose the OLAP query into filter and group operations on the dimension tables: the selection and grouping operations in the query are partitioned by dimension table, decomposing the query into a subquery on each dimension table.
2. Generate the dimension table filter-group vectors: each dimension table filters its records according to its own WHERE clause and projects the grouping attributes of the records that satisfy the selection condition. These grouping attributes are dictionary-compressed; the dictionary is stored in an array, and the dictionary code is the dictionary array index. In the filter-group vector, which has the same length as the dimension table, positions whose selection condition is false are set to -1 and all other positions are set to the compressed group code.
The filter-group vector is a dynamically generated additional column of the dimension table. It takes the place of the dimension table in the join with the fact table and supplies the codes of the grouping attributes of the joined records on that dimension. When a dimension table has no grouping attribute, the filter-group vector reduces to a bitmap used for join filtering on the fact table foreign key.
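A filter-group vector of the kind just described can be built in one pass over the dimension columns. The sketch below assumes columnar toy data and a hypothetical helper named `build_filter_group_vector`; the temporary `codes` lookup is an implementation convenience of this sketch, not part of the described method.

```python
def build_filter_group_vector(filter_column, predicate, group_column):
    """Return (vector, dictionary): vector[i] is -1 when row i fails the
    predicate, otherwise the dictionary code of its grouping attribute."""
    dictionary = []   # dictionary array: code -> grouping attribute value
    codes = {}        # value -> code, used only while building
    vector = []
    for filter_value, group_value in zip(filter_column, group_column):
        if not predicate(filter_value):
            vector.append(-1)
            continue
        if group_value not in codes:
            codes[group_value] = len(dictionary)
            dictionary.append(group_value)
        vector.append(codes[group_value])
    return vector, dictionary

# Toy customer dimension (invented values).
c_region = ["AMERICA", "ASIA", "AMERICA", "AMERICA"]
c_nation = ["BRAZIL", "CHINA", "CANADA", "BRAZIL"]
vector, dictionary = build_filter_group_vector(
    c_region, lambda region: region == "AMERICA", c_nation
)
```

The vector has the same length as the dimension table, so a fact table foreign key can index it directly to test the filter and fetch the group code in a single access.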
3. Create the selection and group vectors through multiple passes over the fact table foreign keys: the corresponding foreign key columns of the fact table are scanned in turn, in order of the selection rate of the dimension table filter-group vectors from low to high. During each column scan, the foreign key value is mapped to the position it designates in the corresponding dimension table filter-group vector, and when the vector value is non-negative the current fact table record position is inserted into the selection vector. The scan of the next foreign key column uses random access at the record positions in the current selection vector and updates the selection vector through the corresponding dimension table filter-group vector, deleting fact table record positions that fail the subsequent foreign key mapping condition. After all foreign key columns have been scanned, the selection vector holds the positions of the fact table records that satisfy all join conditions. When the overall selection rate is high, the group vector is updated together with the selection vector, and the grouping multidimensional array index is computed incrementally from the current group index in each dimension table filter-group vector. When the overall selection rate is very low, the selection vector is generated first, and the group vector is then generated in one pass by randomly accessing each foreign key column at the positions in the selection vector.
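The multi-pass scan can be sketched as below. This sketch implements only the low-selection-rate variant (selection vector first, group vector generated once at the end) and assumes a row-major layout for the grouping multidimensional array; the function name, argument layout, and toy data are hypothetical.

```python
def build_selection_and_group_vectors(fk_columns, dim_vectors, group_counts):
    """fk_columns and dim_vectors are ordered so that the dimension with the
    lowest selection rate comes first; group_counts[i] is the dictionary
    size of dimension i."""
    # Pass 1: sequential scan of the first foreign key column.
    first_fk, first_vec = fk_columns[0], dim_vectors[0]
    selection = [pos for pos, fk in enumerate(first_fk) if first_vec[fk] >= 0]
    # Later passes: positional access only at the surviving positions.
    for fk_col, vec in zip(fk_columns[1:], dim_vectors[1:]):
        selection = [pos for pos in selection if vec[fk_col[pos]] >= 0]
    # Group vector: flatten the per-dimension group codes into one
    # row-major index of the grouping multidimensional array.
    group = []
    for pos in selection:
        index = 0
        for fk_col, vec, count in zip(fk_columns, dim_vectors, group_counts):
            index = index * count + vec[fk_col[pos]]
        group.append(index)
    return selection, group

# Toy inputs (invented): customer passes 2 of 4 rows, supplier 2 of 3.
customer_vec = [0, -1, 1, -1]
supplier_vec = [-1, 0, 1]
lo_custkey = [0, 1, 2, 3, 0, 2]
lo_suppkey = [1, 1, 2, 2, 0, 1]
selection, group = build_selection_and_group_vectors(
    [lo_custkey, lo_suppkey], [customer_vec, supplier_vec], [2, 2]
)
```

Scanning the most restrictive dimension first keeps the selection vector short, so the later passes touch fewer foreign key positions.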
4. Access the specified measure values of the measure columns stored in flash through the selection vector: the measure attribute values stored in flash are accessed at the positions recorded in the selection vector, and only a very small number of measure values need to be returned for group-aggregation. Flash has good parallel random-access performance, so the random accesses to flash driven by the selection vector are executed in parallel on multiple cores, reducing flash data access latency.
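A rough sketch of positional access driven by a selection vector follows. An ordinary file stands in for the flash-resident measure column, a thread pool stands in for multi-core parallel execution, and the fixed 8-byte value width, file name, and function names are all assumptions of this sketch rather than the patent's API.

```python
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor

WIDTH = 8  # assumed fixed width: one little-endian 64-bit integer per value

def write_measure_column(path, values):
    """Store a measure column as a flat array of fixed-width values."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}q", *values))

def read_positions(path, positions):
    """Positional access: seek to each record position and read one value."""
    out = []
    with open(path, "rb") as f:
        for pos in positions:
            f.seek(pos * WIDTH)
            out.append(struct.unpack("<q", f.read(WIDTH))[0])
    return out

def fetch_measures(path, selection, workers=4):
    """Split the selection vector into contiguous chunks and read in parallel."""
    size = max(1, -(-len(selection) // workers))  # ceiling division
    chunks = [selection[i:i + size] for i in range(0, len(selection), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda chunk: read_positions(path, chunk), chunks))
    return [value for part in parts for value in part]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lo_revenue.col")
    write_measure_column(path, [10, 20, 30, 40, 50, 60])
    values = fetch_measures(path, [0, 2, 5])
```

Because the values have a fixed width, a record position converts to a byte offset by multiplication, and `pool.map` returns the chunks in order so the results line up with the selection vector.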
5. Aggregate the measure values into the grouping multidimensional array through the group vector: according to the group index recorded in the group vector, each measure attribute value returned from flash is "pushed" into the grouping multidimensional array cell indicated by that index for aggregation, completing the OLAP query processing.
6. Map each dimension of the grouping multidimensional array back to the group dictionary to obtain the grouping attribute values, and output the query result.
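Steps 5 and 6 can be sketched together. The sketch assumes a SUM aggregate, a flattened row-major grouping array, and `None` as the marker for cells that received no value; the function names and toy inputs are hypothetical.

```python
def aggregate(group_vector, measures, num_cells):
    """Push each measure value into the array cell named by the group vector."""
    cells = [None] * num_cells  # None marks a group with no fact records
    for cell, value in zip(group_vector, measures):
        cells[cell] = value if cells[cell] is None else cells[cell] + value
    return cells

def decode_groups(cells, dictionaries):
    """Map each non-empty cell index back to its grouping attribute values."""
    rows = []
    for flat, total in enumerate(cells):
        if total is None:
            continue
        keys = []
        # Peel off one dimension at a time, last dimension first (row-major).
        for dictionary in reversed(dictionaries):
            flat, code = divmod(flat, len(dictionary))
            keys.append(dictionary[code])
        rows.append((*reversed(keys), total))
    return rows

# Toy inputs (invented): two dimensions with two groups each.
cells = aggregate([0, 3, 2, 0], [10, 30, 60, 5], num_cells=4)
rows = decode_groups(cells, [["BRAZIL", "CANADA"], ["CHINA", "JAPAN"]])
```

The aggregation loop never looks at dimension data; the group vector alone says where each value belongs, and the dictionaries are consulted only once per output row.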
Embodiment:
As shown in Fig. 2 flash-aware is embodied in storage on flash by the optimization to OLAP query algorithm
The access of metric attribute is shifted onto finally, true table sequential scan is converted to true metric table attribute is performed according to selection vector
The random access of low selection rate, reduces the delay that metric attribute is accessed on flash.
Step 2) is illustrated with the following query:
SELECT c_nation, s_nation, SUM(l_revenue), SUM(l_price)
FROM customer, lineorder, supplier
WHERE lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND c_region = 'AMERICA'
AND s_region = 'ASIA'
GROUP BY c_nation, s_nation;
The query joins the fact table lineorder with the dimension tables customer and supplier, filters on the dimension attributes c_region and s_region, groups by c_nation and s_nation, and sums the fact table measure columns l_revenue and l_price.
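To make the decomposition concrete, the following sketch runs this query end to end over a few invented rows, using 0-based array indexes as keys. All data values and the helper name `filter_group` are assumptions for illustration; in the described method the two measure columns would live on flash and be read by position.

```python
# Toy dimension and fact columns (invented values).
c_region = ["AMERICA", "ASIA", "AMERICA"]
c_nation = ["BRAZIL", "CHINA", "CANADA"]
s_region = ["ASIA", "EUROPE", "ASIA"]
s_nation = ["CHINA", "FRANCE", "JAPAN"]
lo_custkey = [0, 1, 2, 0, 2, 0]
lo_suppkey = [0, 0, 1, 2, 0, 0]
l_revenue = [10, 20, 30, 40, 50, 60]  # measure columns: flash-resident in the method
l_price = [1, 2, 3, 4, 5, 6]

def filter_group(region, nation, wanted):
    """Dimension subquery: filter on region, dictionary-encode nation."""
    dictionary, vector = [], []
    for r, n in zip(region, nation):
        if r != wanted:
            vector.append(-1)
            continue
        if n not in dictionary:
            dictionary.append(n)
        vector.append(dictionary.index(n))
    return vector, dictionary

# Phase 1: one subquery per dimension table.
c_vec, c_dict = filter_group(c_region, c_nation, "AMERICA")
s_vec, s_dict = filter_group(s_region, s_nation, "ASIA")

# Phase 2: foreign key access builds the selection vector.
selection = [p for p in range(len(lo_custkey))
             if c_vec[lo_custkey[p]] >= 0 and s_vec[lo_suppkey[p]] >= 0]

# Phase 3: measures are read only at the selected positions and aggregated.
cells = {}
for p in selection:
    g = c_vec[lo_custkey[p]] * len(s_dict) + s_vec[lo_suppkey[p]]
    revenue, price = cells.get(g, (0, 0))
    cells[g] = (revenue + l_revenue[p], price + l_price[p])

result = [(c_dict[g // len(s_dict)], s_dict[g % len(s_dict)], revenue, price)
          for g, (revenue, price) in sorted(cells.items())]
```

Only four of the six fact rows reach phase 3, so only those positions of l_revenue and l_price are ever touched.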
As shown in Fig. 3, an embodiment of the keyword bitmap join index processing on DRAM-flash two-level storage in step 4) of the present invention is as follows:
In a traditional database, an index is a flat structure that is assumed to live within a single storage level. Index techniques for disk databases, such as the B+-tree index, use a buffer mechanism to exchange missing nodes between disk and memory, but this mechanism is opaque: index access efficiency depends on the buffer replacement algorithm and cannot be tuned by the database for a particular workload. In OLAP applications, the index keys come from dimension table attributes, but the objects being indexed are the corresponding joined records in the fact table. An ordinary B+-tree index can only retrieve data from a single table, and for OLAP, faster index access on the small dimension tables does little to accelerate overall query performance. Indexing the fact table foreign keys can speed up joins between the fact table and the dimension tables, but the storage overhead of such indexes is very large, and because the fact table in modern OLAP applications needs many real-time updates, index maintenance is costly. The present invention therefore uses a bitmap join index: for specified attribute members of a dimension table, fact-table join bitmaps are built through the join operation, one bitmap per member, each indicating the positions of the fact table records that join with that member. A bitmap join index is an incremental index: when large volumes of data are inserted into the fact table, the bitmaps only need to be extended, and the join bitmaps already built need not be reconstructed. The bitmap join indexes used in traditional databases work at attribute granularity and build a join bitmap for every member. This creates a conflict: for a low-cardinality attribute the index is small but the selectivity of each bitmap is too high, whereas for a high-cardinality attribute the selectivity is low but the number of join bitmaps and the storage overhead are excessive. The present invention selects the bitmap join index keywords by the top-K access frequency of the dimension attribute keywords in the query workload, and builds a global bitmap join index for the K globally most frequently accessed keywords. The result is a key/value index structure in which the key is the global name of the keyword (including the table and keyword information) and the value is the join bitmap. Keywords from different dimension tables and different attributes have join bitmaps of the same length, so they can all be stored in the global bitmap join index.
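The keyword-level index described above can be illustrated with a small sketch. This is not the patent's implementation: the function names (`build_join_bitmap`, `build_keyword_index`, `append_fact_row`), the toy tables, and the representation of the query workload as a list of (table, attribute, member) tuples are all assumptions made for illustration. The sketch counts keyword frequencies, keeps the top K, stores one fact-table-length bitmap per keyword in a key/value dictionary, and extends the bitmaps by one bit when a fact row is appended.

```python
from collections import Counter

def build_join_bitmap(dim_column, fact_fk, member):
    """Bit i is 1 when fact row i joins a dimension row whose attribute equals `member`.
    Foreign keys are assumed to be array indexes into the dimension column."""
    return [1 if dim_column[fk] == member else 0 for fk in fact_fk]

def build_keyword_index(workload, dims, fact_fks, k):
    """Build join bitmaps only for the k most frequently queried keywords.
    The key is the global keyword name (table, attribute, member); the value is the bitmap."""
    index = {}
    for (table, attr, member), _count in Counter(workload).most_common(k):
        index[(table, attr, member)] = build_join_bitmap(
            dims[table][attr], fact_fks[table], member)
    return index

def append_fact_row(index, dims, new_fks):
    """Incremental maintenance: a new fact row appends one bit to each existing bitmap."""
    for (table, attr, member), bitmap in index.items():
        bitmap.append(1 if dims[table][attr][new_fks[table]] == member else 0)

# Hypothetical toy data: two dimension tables and the fact table's foreign key columns.
dims = {
    "customer": {"c_region": ["AMERICA", "ASIA", "AMERICA"]},
    "supplier": {"s_region": ["ASIA", "EUROPE"]},
}
fact_fks = {
    "customer": [0, 1, 2, 1, 0],
    "supplier": [0, 0, 1, 1, 0],
}
# Hypothetical query workload: one entry per keyword reference in a query predicate.
workload = [
    ("customer", "c_region", "AMERICA"),
    ("supplier", "s_region", "ASIA"),
    ("customer", "c_region", "AMERICA"),
    ("supplier", "s_region", "ASIA"),
    ("supplier", "s_region", "EUROPE"),
]

index = build_keyword_index(workload, dims, fact_fks, k=2)
append_fact_row(index, dims, {"customer": 2, "supplier": 1})
```

Because every bitmap has the length of the fact table, bitmaps for keywords from different dimension tables can sit side by side in the same dictionary and be combined bit by bit.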
More preferably, building on the keyword bitmap join index, the present invention proposes a DRAM-flash two-level bitmap join index method: according to the memory space quota, the n most frequently used of the K keyword join bitmaps are selected as the DRAM-resident bitmap join index, and the remaining K-n keyword bitmaps serve as the second-level index. Figure 3 shows a scenario in which the bitmaps of the query keywords reside partly in DRAM and partly in flash. The bitmaps in DRAM complete their selection operations first and produce a filter bitmap. When the selectivity of the filter bitmap lies within a threshold range (S_low, S_high), the positions of the "1" bits in the filter bitmap are used to randomly access the corresponding positions of the bitmap stored in flash and determine the final logical result at each position. Flash has good random access capability, so the present invention scans the filter bitmap sequentially while accessing the flash bitmap in parallel, which improves concurrent data access on flash and reduces the latency of the bitmap index computation (see the parallel flash access threads in Figure 3). Let the selectivity of the filter bitmap in DRAM be S_1 and the selectivity of the flash bitmap be S_2 (the selectivity of a bitmap is given exactly by its number of "1" bits), let T_F(S_1) be the bitmap access latency on flash, and let T_P(S_2, S_1) be the difference in query processing time between index selectivities S_2 and S_1. When T_P(S_2, S_1) - T_F(S_1) > 0, accessing the bitmap on flash yields a query performance gain. S_low and S_high are the minimum and maximum selectivity differences (such as S_2 - S_1) that satisfy this gain condition.
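The two-level lookup can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: `read_flash_bit` is a hypothetical positional read function standing in for a flash device, the thresholds `s_low` and `s_high` are simply passed in rather than derived from the cost condition T_P - T_F > 0, and returning the unprobed filter bitmap when the selectivity falls outside the range is an assumption, since the text does not say what happens in that case.

```python
from concurrent.futures import ThreadPoolExecutor

def and_bitmaps(bitmaps):
    """Combine the DRAM-resident bitmaps into one filter bitmap with a bitwise AND."""
    return [int(all(bits)) for bits in zip(*bitmaps)]

def selectivity(bitmap):
    """Fraction of '1' bits in the bitmap."""
    return sum(bitmap) / len(bitmap)

def probe_flash(filter_bitmap, read_flash_bit, workers=4):
    """Scan the filter bitmap sequentially and read the flash bitmap only at the
    positions of its '1' bits; the positional reads are issued from a thread pool."""
    positions = [i for i, bit in enumerate(filter_bitmap) if bit]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flash_bits = list(pool.map(read_flash_bit, positions))
    result = [0] * len(filter_bitmap)
    for pos, bit in zip(positions, flash_bits):
        result[pos] = bit  # the filter bit is 1 here, so the AND equals the flash bit
    return result

def evaluate(dram_bitmaps, read_flash_bit, s_low, s_high):
    """Return (bitmap, used_flash). The flash bitmap is probed only when the filter
    bitmap's selectivity lies inside the open range (s_low, s_high)."""
    filter_bitmap = and_bitmaps(dram_bitmaps)
    if s_low < selectivity(filter_bitmap) < s_high:
        return probe_flash(filter_bitmap, read_flash_bit), True
    return filter_bitmap, False  # assumption: the remaining predicate is checked elsewhere

# Hypothetical toy data: one DRAM-resident bitmap and one bitmap kept on "flash".
dram_bitmap = [1, 0, 1, 0, 1, 1, 0, 0]
flash_bitmap = [1, 1, 0, 0, 1, 0, 1, 0]

probed, used_flash = evaluate([dram_bitmap], flash_bitmap.__getitem__, 0.1, 0.6)
skipped, used_flash_2 = evaluate([dram_bitmap], flash_bitmap.__getitem__, 0.0, 0.3)
```

Only four of the eight flash positions are read in the first call, which is the point of probing by the positions of the "1" bits instead of scanning the whole flash bitmap.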
In summary, compared with the prior art, the present invention adopts an in-memory OLAP query optimization method based on DRAM-flash two-level storage. It supports in-memory OLAP query processing on DRAM-flash two-level storage, reduces the demand of in-memory OLAP for expensive DRAM, and improves its cost-effectiveness. Through the storage optimization method, the OLAP query optimization method, and the index optimization method based on DRAM-flash two-level storage, data access and query processing performance are optimized transparently during OLAP query processing. The present invention thus improves both the data management capacity and the query processing performance of in-memory OLAP on big data.
The above embodiments merely illustrate the present invention. The structure, connection, and manufacturing process of each component may vary, and any equivalent transformation or improvement made on the basis of the technical solution of the present invention should not be excluded from its scope of protection.
Claims (8)
1. An OLAP query optimization method under a memory-flash hybrid storage model, comprising the following steps:
1) OLAP storage uses a flash-aware storage model: because, in an OLAP star or snowflake schema, the dimension tables are small and subject to many predicate operations while the fact table consists of foreign keys and measure attributes, the data is partitioned by access locality between the relatively small DRAM and the relatively large flash storage, so that storage is optimized across the two heterogeneous memory levels;
2) in-memory OLAP uses array storage: each attribute column is stored in contiguous array cells, and a table consists of attribute arrays of equal length; a dimension table uses the array index as its primary key, and a fact table foreign key is the array index of the corresponding record in the dimension table, so that a fact table record can locate the corresponding dimension array cell directly from its foreign key value; the traditional join operation is thereby reduced to an array index access, and OLAP queries are processed with the AIR algorithm, where AIR denotes array index access;
3) OLAP query processing based on array storage and the AIR algorithm is decomposed into three sequential data access phases: dimension table access, fact table foreign key access, and fact table measure attribute access; the intermediate data structures produced by the three phases are the dimension filter-group vectors, the selection vector and group vector, and the multidimensional group array; the dimension filter-group vectors, the selection vector, and the group vector are data structures shared by all queries, so that a different query only needs to update their contents, while the multidimensional group array is generated dynamically for each query;
4) on the basis of the keyword-based bitmap join index, K keyword join bitmaps are stored in an optimized manner across the DRAM-flash two-level storage: the n most frequently accessed keywords are selected and their bitmaps are stored in DRAM, and the remaining bitmaps are stored in flash, forming a two-level join bitmap index structure.
2. The OLAP query optimization method under a memory-flash hybrid storage model according to claim 1, characterized in that: in step 1), an in-memory storage engine is used and the dimension tables reside in DRAM; the foreign keys of the fact table, which are the frequently accessed columns in star joins during multidimensional analysis, also reside in DRAM; the measure attributes of the fact table are stored in flash, and a position-based access API is provided on the measure columns to support random access to them.
3. The OLAP query optimization method under a memory-flash hybrid storage model according to claim 1, characterized in that: in step 3), the dimension filter-group vectors, the selection vector, the group vector, and the multidimensional group array are stored in DRAM; the fact table measure attributes are stored in flash.
4. The OLAP query optimization method under a memory-flash hybrid storage model according to claim 2, characterized in that: in step 3), the dimension filter-group vectors, the selection vector, the group vector, and the multidimensional group array are stored in DRAM; the fact table measure attributes are stored in flash.
5. The OLAP query optimization method under a memory-flash hybrid storage model according to claim 1, 2, 3, or 4, characterized in that: in step 2), OLAP query processing with the AIR algorithm comprises the following steps:
① decompose the OLAP query into filter and group operations on the dimension tables: the selection and grouping operations in the query are partitioned by dimension table, so that the query is decomposed into one subquery per dimension table;
② generate the dimension filter-group vectors: each dimension table filters its records according to its own WHERE clause and projects the grouping attributes of the records that satisfy the selection condition; these grouping attributes are dictionary-compressed, the dictionary is stored in an array, and the dictionary code is the dictionary array index; in the filter-group vector, which has the same length as the dimension table, positions whose selection condition is false are set to -1 and all other positions are set to the compressed group code;
③ scan the fact table foreign keys in multiple passes to create the selection and group vectors: the foreign key columns of the fact table are scanned one after another in ascending order of the selectivity of their dimension filter-group vectors; during each column scan, every foreign key value is mapped to the corresponding position of the dimension filter-group vector, and if the vector value there is non-negative, the current fact record position is inserted into the selection vector; the next foreign key column is accessed randomly at the record positions in the current selection vector, and the selection vector is updated through the corresponding dimension filter-group vector by deleting the positions of fact records that fail the later foreign key mapping condition; after all foreign key columns have been scanned, the selection vector holds the positions of the fact records that satisfy all join conditions; when the overall selectivity is high, the group vector is updated together with the selection vector, and the index into the multidimensional group array is computed incrementally from the group codes in the dimension filter-group vectors; when the overall selectivity is very low, the selection vector is generated first, and the group vector is then generated in a single pass by randomly accessing each foreign key column at the positions in the selection vector;
④ access the required measure values in the flash-stored measure columns through the selection vector: the measure attribute values in flash storage are accessed at the positions recorded in the selection vector; the random accesses to flash driven by the selection vector are executed in parallel on multiple cores;
⑤ aggregate the measure values in the multidimensional group array through the group vector: according to the group index recorded in the group vector, each measure attribute value returned from flash storage is "pushed" to the cell of the multidimensional group array indicated by that index, where the aggregation is computed, completing the OLAP query processing;
⑥ map each dimension of the multidimensional group array back to the group dictionary to obtain the grouping attribute values, and output the query result.
6. The OLAP query optimization method under a memory-flash hybrid storage model according to claim 5, characterized in that: in step ② (generating the dimension filter-group vectors), the filter-group vector is a dynamically generated additional column of the dimension table; it takes the place of the dimension table in the join with the fact table and provides the code of the joined record's grouping attribute on that dimension; when the dimension table has no grouping attribute, the filter-group vector reduces to a bitmap used for join filtering on the fact table foreign key.
7. The OLAP query optimization method under a memory-flash hybrid storage model according to claim 1, 2, 3, 4, or 6, characterized in that: in step 4), a DRAM-flash two-level bitmap join index method is used on the basis of the keyword bitmap join index: according to the memory space quota, the n most frequently used of the K keyword join bitmaps are selected as the DRAM-resident bitmap join index, and the remaining K-n keyword bitmaps are stored in flash as the second-level index.
8. The OLAP query optimization method under a memory-flash hybrid storage model according to claim 5, characterized in that: in step 4), a DRAM-flash two-level bitmap join index method is used on the basis of the keyword bitmap join index: according to the memory space quota, the n most frequently used of the K keyword join bitmaps are selected as the DRAM-resident bitmap join index, and the remaining K-n keyword bitmaps are stored in flash as the second-level index.
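The AIR processing steps recited in claim 5 can be sketched on toy data as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: all function names and table contents are hypothetical, the multidimensional group array is flattened into a one-dimensional list, only the high-selectivity variant (selection and group vectors updated together) is shown, the caller is assumed to pass the foreign key columns already ordered by selectivity, and `read_measure` is a plain positional lookup standing in for the position-based flash access API, without the multi-core parallelism.

```python
def filter_group_vector(filter_col, predicate, group_col):
    """Dimension filter-group vector: -1 where the predicate fails,
    otherwise the dictionary code of the row's grouping attribute."""
    dictionary, vector = [], []
    for flt, grp in zip(filter_col, group_col):
        if not predicate(flt):
            vector.append(-1)
            continue
        if grp not in dictionary:
            dictionary.append(grp)
        vector.append(dictionary.index(grp))
    return vector, dictionary

def air_query(fk_columns, fg_vectors, dictionaries, read_measure):
    """Scan foreign key columns, maintain the selection and group vectors,
    then aggregate measure values into a flattened group array."""
    selection = list(range(len(fk_columns[0])))  # fact row positions still alive
    group = [0] * len(selection)                 # flat group index per alive row
    for fk, fg, dictionary in zip(fk_columns, fg_vectors, dictionaries):
        kept_pos, kept_grp = [], []
        for pos, g in zip(selection, group):
            code = fg[fk[pos]]                   # array index access replaces the join
            if code >= 0:
                kept_pos.append(pos)
                kept_grp.append(g * len(dictionary) + code)  # incremental index
        selection, group = kept_pos, kept_grp
    size = 1
    for dictionary in dictionaries:
        size *= len(dictionary)
    cube = [0] * size
    for pos, g in zip(selection, group):
        cube[g] += read_measure(pos)             # "push" the measure into its group cell
    return cube, set(group)

def decode(cube, touched, dictionaries):
    """Map flat group indexes back to grouping attribute values."""
    rows = {}
    for g in sorted(touched):
        labels, rest = [], g
        for dictionary in reversed(dictionaries):
            rest, code = divmod(rest, len(dictionary))
            labels.append(dictionary[code])
        rows[tuple(reversed(labels))] = cube[g]
    return rows

# Hypothetical toy tables; dimension primary keys are the array indexes.
c_region, c_nation = ["AMERICA", "ASIA", "AMERICA"], ["BRAZIL", "CHINA", "CANADA"]
s_region, s_nation = ["ASIA", "EUROPE"], ["CHINA", "FRANCE"]
lo_custkey, lo_suppkey = [0, 1, 2, 1, 0], [0, 0, 1, 1, 0]
revenue_on_flash = [10, 20, 30, 40, 50]

cust_vec, cust_dict = filter_group_vector(c_region, lambda r: r == "AMERICA", c_nation)
supp_vec, supp_dict = filter_group_vector(s_region, lambda r: r == "ASIA", s_nation)
cube, touched = air_query([lo_custkey, lo_suppkey], [cust_vec, supp_vec],
                          [cust_dict, supp_dict], revenue_on_flash.__getitem__)
result = decode(cube, touched, [cust_dict, supp_dict])
```

In this sketch the measure column is read only at the two fact positions that survive both foreign key scans, which mirrors the idea of keeping foreign keys in DRAM and touching flash only for selected records.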
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410717830.6A CN104361113B (en) | 2014-12-01 | 2014-12-01 | A kind of OLAP query optimization method under internal memory flash memory mixing memory module |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104361113A CN104361113A (en) | 2015-02-18 |
CN104361113B true CN104361113B (en) | 2017-06-06 |
Family
ID=52528373
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410717830.6A Active CN104361113B (en) | 2014-12-01 | 2014-12-01 | A kind of OLAP query optimization method under internal memory flash memory mixing memory module |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104361113B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105930388B (en) * | 2016-04-14 | 2019-04-23 | 中国人民大学 | A kind of OLAP packet aggregation method based on functional dependencies |
CN108255829B (en) * | 2016-12-28 | 2021-10-19 | 腾讯科技(北京)有限公司 | Data searching method and device |
CN108733681B (en) * | 2017-04-14 | 2021-10-22 | 华为技术有限公司 | Information processing method and device |
KR102507140B1 (en) * | 2017-11-13 | 2023-03-08 | 에스케이하이닉스 주식회사 | Data storage device and operating method thereof |
CN109086456B (en) * | 2018-08-31 | 2020-11-03 | 中国联合网络通信集团有限公司 | Data indexing method and device |
US11199991B2 (en) | 2019-01-03 | 2021-12-14 | Silicon Motion, Inc. | Method and apparatus for controlling different types of storage units |
TWI739075B (en) * | 2019-01-03 | 2021-09-11 | 慧榮科技股份有限公司 | Method and computer program product for performing data writes into a flash memory |
CN111782734B (en) * | 2019-04-04 | 2024-04-12 | 华为技术服务有限公司 | Data compression and decompression method and device |
CN110647722B (en) * | 2019-09-20 | 2024-03-01 | 中科寒武纪科技股份有限公司 | Data processing method and device and related products |
US11386089B2 (en) | 2020-01-13 | 2022-07-12 | The Toronto-Dominion Bank | Scan optimization of column oriented storage |
CN112597114B (en) * | 2020-12-23 | 2023-09-15 | 跬云(上海)信息科技有限公司 | OLAP (on-line analytical processing) precomputation engine optimization method and application based on object storage |
CN115309947B (en) * | 2022-08-15 | 2023-03-21 | 北京欧拉认知智能科技有限公司 | Method and system for realizing online analysis engine based on graph |
CN116483831B (en) * | 2023-04-12 | 2024-01-30 | 上海沄熹科技有限公司 | Recommendation index generation method for distributed database |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6651055B1 (en) * | 2001-03-01 | 2003-11-18 | Lawson Software, Inc. | OLAP query generation engine |
CN102663114A (en) * | 2012-04-17 | 2012-09-12 | 中国人民大学 | Database inquiry processing method facing concurrency OLAP (On Line Analytical Processing) |
CN103309958A (en) * | 2013-05-28 | 2013-09-18 | 中国人民大学 | OLAP star connection query optimizing method under CPU and GPU mixing framework |
CN103631911A (en) * | 2013-11-27 | 2014-03-12 | 中国人民大学 | OLAP query processing method based on array storage and vector processing |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9501550B2 (en) * | 2012-04-18 | 2016-11-22 | Renmin University Of China | OLAP query processing method oriented to database and HADOOP hybrid platform |
Also Published As
Publication number | Publication date |
---|---|
CN104361113A (en) | 2015-02-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||