table 结构

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
Table data structure:
                                                         + optional
                                                        /
    +--------------+--------------+--------------+------+-------+-----------------+-------------+--------+
    | data block 1 |      ...     | data block n | filter block | metaindex block | index block | footer |
    +--------------+--------------+--------------+--------------+-----------------+-------------+--------+

    Each block followed by a 5-bytes trailer contains compression type and checksum.

Table block trailer:

    +---------------------------+-------------------+
    | compression type (1-byte) | checksum (4-byte) |
    +---------------------------+-------------------+

    The checksum is a CRC-32 computed using Castagnoli's polynomial. Compression
    type also included in the checksum.

Table footer:

      +------------------- 40-bytes -------------------+
     /                                                  \
    +------------------------+--------------------+------+-----------------+
    | metaindex block handle / index block handle / ---- | magic (8-bytes) |
    +------------------------+--------------------+------+-----------------+

    The magic are first 64-bit of SHA-1 sum of "http://code.google.com/p/leveldb/".

block 结构

若干 kv entry 和结尾的 trailer 组成了 block,kv entry做了前缀优化,并且设置restart point,记录完整key值。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
Block data structure:

      + restart point                 + restart point (depends on restart interval)
     /                               /
    +---------------+---------------+---------------+---------------+---------+
    | block entry 1 | block entry 2 |      ...      | block entry n | trailer |
    +---------------+---------------+---------------+---------------+---------+

Key/value entry:

              +---- key len ----+
             /                   \
    +-------+---------+-----------+---------+--------------------+--------------+----------------+
    | shared (varint) | not shared (varint) | value len (varint) | key (varlen) | value (varlen) |
    +-----------------+---------------------+--------------------+--------------+----------------+

    Block entry shares key prefix with its preceding key:
    Conditions:
        restart_interval=2
        entry one  : key=deck,value=v1
        entry two  : key=dock,value=v2
        entry three: key=duck,value=v3
    The entries will be encoded as follow:

      + restart point (offset=0)                                                 + restart point (offset=16)
     /                                                                          /
    +-----+-----+-----+----------+--------+-----+-----+-----+---------+--------+-----+-----+-----+----------+--------+
    |  0  |  4  |  2  |  "deck"  |  "v1"  |  1  |  3  |  2  |  "ock"  |  "v2"  |  0  |  4  |  2  |  "duck"  |  "v3"  |
    +-----+-----+-----+----------+--------+-----+-----+-----+---------+--------+-----+-----+-----+----------+--------+
     \                                   / \                                  / \                                   /
      +----------- entry one -----------+   +----------- entry two ----------+   +---------- entry three ----------+

    The block trailer will contains two restart points:

    +------------+-----------+--------+
    |     0      |    16     |   2    |
    +------------+-----------+---+----+
     \                      /     \
      +-- restart points --+       + restart points length

Block trailer:

      +-- 4-bytes --+
     /               \
    +-----------------+-----------------+-----------------+------------------------------+
    | restart point 1 |       ....      | restart point n | restart points len (4-bytes) |
    +-----------------+-----------------+-----------------+------------------------------+


NOTE: All fixed-length integer are little-endian.

filter block 结构

若干 filter data 和结尾的 trailer 组成了 Filter block。trailer 包含了各个 data 对应的 offset、trailer 自身 offset,还有 1-byte base Lg。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
Filter block data structure:

      + offset 1      + offset 2      + offset n      + trailer offset
     /               /               /               /
    +---------------+---------------+---------------+---------+
    | filter data 1 |      ...      | filter data n | trailer |
    +---------------+---------------+---------------+---------+

Filter block trailer:

      +- 4-bytes -+
     /             \
    +---------------+---------------+---------------+-------------------------------+------------------+
    | data 1 offset |      ....     | data n offset | data-offsets offset (4-bytes) | base Lg (1-byte) |
    +-------------- +---------------+---------------+-------------------------------+------------------+


NOTE: All fixed-length integer are little-endian.

block builder