The inverted table format can deliver fast and flexible query capabilities, but is not widely used. ADABAS is probably the most successful implementation, but how often do you see that nowadays? Following is a description of how to implement inverted structures within a relational database. All code run on Oracle Database 12c, release 12.1.0.1.
Consider this table and a few rows, that describe the contents of my larder:
create table food(id number,capacity varchar2(10),container varchar2(10),item varchar2(10));
insert into food values(1,'large','bag','potatoes');
insert into food values(2,'small','box','carrots');
insert into food values(3,'medium','tin','peas');
insert into food values(4,'large','box','potatoes');
insert into food values(5,'small','tin','carrots');
insert into food values(6,'medium','bag','peas');
insert into food values(7,'large','tin','potatoes');
insert into food values(8,'small','bag','carrots');
insert into food values(9,'medium','box','peas');
The queries I run against the table might be "how many large boxes have I?" or "give me all the potatoes, I don't care about how they are packed". The idea is that I do not know in advance what columns I will be using in my predicate: it could be any combination. This is a common issue in a data warehouse.
So how do I index the table to satisfy any possible query? Two obvious possibilities:
First, build an index on each column, and the optimizer can perform an index_combine operation on whatever columns happen to be listed in the predicate. But that means indexing every column - and the table might have hundreds of columns. No way can I do that.
Second, build a concatenated index across all the columns: in effect, use an IOT. That will give me range scan access if any of the predicated columns are in the leading edge of the index key followed by filtering on the rest of the predicate. Or if the predicate does not include the leading column(s), I can get skip scan access and filter. But this is pretty useless, too: there will be wildly divergent performance depending on the predicate.
The answer is to invert the table:
create table inverted(colname varchar2(10),colvalue varchar2(10),id number);
insert into inverted select 'capacity',capacity,id from food;
insert into inverted select 'container',container,id from food;
insert into inverted select 'item',item,id from food;
Now just one index on each table can satisfy all queries:
create index food_i on food(id);
create index inverted_i on inverted(colname,colvalue);
To retrieve all the large boxes:
orclz> set autotrace on explain
orclz> select * from food where id in
2 (select id from inverted where colname='capacity' and colvalue='large'
3 intersect
4 select id from inverted where colname='container' and colvalue='box');
ID CAPACITY CONTAINER ITEM
---------- ---------- ---------- ----------
4 large box potatoes
Execution Plan
----------------------------------------------------------
Plan hash value: 1945359172
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | C
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 141 |
| 1 | MERGE JOIN | | 3 | 141 |
| 2 | TABLE ACCESS BY INDEX ROWID | FOOD | 9 | 306 |
| 3 | INDEX FULL SCAN | FOOD_I | 9 | |
|* 4 | SORT JOIN | | 3 | 39 |
| 5 | VIEW | VW_NSO_1 | 3 | 39 |
| 6 | INTERSECTION | | | |
| 7 | SORT UNIQUE | | 3 | 81 |
| 8 | TABLE ACCESS BY INDEX ROWID BATCHED| INVERTED | 3 | 81 |
|* 9 | INDEX RANGE SCAN | INVERTED_I | 3 | |
| 10 | SORT UNIQUE | | 3 | 81 |
| 11 | TABLE ACCESS BY INDEX ROWID BATCHED| INVERTED | 3 | 81 |
|* 12 | INDEX RANGE SCAN | INVERTED_I | 3 | |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("ID"="ID")
filter("ID"="ID")
9 - access("COLNAME"='capacity' AND "COLVALUE"='large')
12 - access("COLNAME"='container' AND "COLVALUE"='box')
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
orclz>
Or all the potatoes:
orclz> select * from food where id in
2 (select id from inverted where colname='item' and colvalue='potatoes');
ID CAPACITY CONTAINER ITEM
---------- ---------- ---------- ----------
1 large bag potatoes
4 large box potatoes
7 large tin potatoes
Execution Plan
----------------------------------------------------------
Plan hash value: 762525239
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cos
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 183 |
| 1 | NESTED LOOPS | | | |
| 2 | NESTED LOOPS | | 3 | 183 |
| 3 | SORT UNIQUE | | 3 | 81 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| INVERTED | 3 | 81 |
|* 5 | INDEX RANGE SCAN | INVERTED_I | 3 | |
|* 6 | INDEX RANGE SCAN | FOOD_I | 1 | |
| 7 | TABLE ACCESS BY INDEX ROWID | FOOD | 1 | 34 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("COLNAME"='item' AND "COLVALUE"='potatoes')
6 - access("ID"="ID")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- this is an adaptive plan
orclz>
Of course, consideration needs to be given to handling more complex boolean expressions; maintaining the inversion is going to take resources; and a query generator has to construct the inversion code and re-write the queries. But In principle, this structure can deliver indexed access for unpredictable predicates of any number of any columns, with no separate filtering operation. Can you do that with a normalized star schema? I don't think so.
I hope this little thought experiment has stimulated the little grey cells, and made the point that relational structures are not always optimal for all problems.
--
John Watson
Oracle Certified Master DBA
http://skillbuilders.com