Bitmap allocator is used for blocks smaller than page size.

The memory is allocated in units and each unit has corresponding bit in the bitmap which is set when the unit is allocated.

Therefore, one page requires sys_page_size / UNIT_SIZE bits.

Typical page size is 4K, and if UNIT_SIZE were 16, the number of bits would be 4096 / 16 = 256, or 32 bytes.

One 4KB bitmap page contains 32768 bits and can manage 32768 / 256 = 128 pages or 512KB of data.

That's not too much. If some text analysis application wanted 16GB heap for tokens, the bitmap would take 32768 pages or 128M.

This raises the question how to quickly find a page to scan. Scanning large bitmap may take too much time and it would be good to quickly find a single page to start from.

The simplest strategy could be counting the number of bits set, but this only tells whether the block contain free slots or not. A better strategy could be maintaining the longest available chain of bits.

Another question is how to get bit index in the bitmap by address of allocated block. Units are too small and we can't allow such a luxiry as storing pointers to bitmap. We don't even store the size of units, leaving this to the user.

We could reserve the first unit on each data page for allocator's purposes. But that's too much?

We could reserve large enough region in virtual address space and commit pages as necessary, but there's aforementioned(?) problem with releasing memory back to the operating system. It's better to avoid munmap/mmap.


4KB bitmap -> 512KB data -- the data region is reserved, pages are committed on demand and released when no longer in use

The first unit may contain back reference to the bitmap. It's easy to get the start address of data region.

The structure should be simple and should allow unlimited amount of memory.

Linked list!

Superblock: one 4KB page may contain up to 512 64-bit pointers. The last one points to the next superblock. What pointers point to?

Again, 256 units per data page. 32768 units per bitmap page. Searching block within single data page, so each data page must contain longest chain. It takes 2 bytes: the counter runs from 0 to 256 inclusive, But given that one unit is reserved, the upper bound would be 255 and one byte is sufficient. But what other 15 bytes are for? They may contain some back reference? -- no, it's too wasteful.

The bitmap should take 4KB page. 128 * 2 = 256 bytes are for longest chain counters That means minus 8 data pages because each takes 32 bytes 120 * 2 = 240.

32768 pages or 128M bitmap -> 16GB data

It's not good to reserve entire bitmap of such size -- mind 32-bit arch.


Upper level bitmap contain bits for lower level bitmap. One page of the upper level bitmap may control 16MB of memory. Third level gives the total amount of 512GB.

So, using three-level scheme:

            Superblock bitmap
              /           \
      Chunk bitmap        ...
        /      \
Unit bitmap    ...
  /     \

Data page ...

Unit bitmap may hold up to UNIT_SIZE pages, i.e. 16: uint8_t data_units[ sys_page_size / UNIT_SIZE ];

Superblock bitmap: single page, each bit means the chunk is full. On 32-bit architectures the size of bitmap is limited by 184 bits, that gives slightly less than 3GB.

Chunk bitmap: single page, each bit means unit bitmap is full. Each chunk bitmap addresses 16GB. Reserved area: 32768 chunk bitmap pages or 128MB on 64-bit; 184 chunk bitmap pages or 736KB on 32-bit

Unit bitmap: single page containing multiple page bitmaps, typically up to 128 data pages or 512KB. Reserved area for bitmap per chunk: 32768 pages or 128MB

4096 bytes * 8 -> 32768 bits

128 data pages per chunk * 4KB -> 512KB 32768 chunks * 512KB -> 16GB


XXX previous

typedef struct _DataBitmapHeader {
    /*
     * This structure takes exactly one page.
     */
    void* data_start;  // start of reserved data area
    void* data_end;    // end of reserved data area
    uint16_t longest_max;  // longest free block among longest free blocks in all data pages
    uint16_t longest_min;  // shortest free block among longest free blocks in all data pages

    /* The following members can't be expressed in terms of C language
       because of their variable length.

       The length is expressed in num_data_pages which is calculated as follows:

       num_data_pages = (sys_page_size - sizeof(DataBitmapHeader)) / (units_per_page / 8 + sizeof(uint_16))

       Use macros XXX which? to get their addresses.
     */

    // uint16_t lfb[ num_data_pages ];  // longest free block for each data page

    // BitmapWord bitmap[ num_data_pages ][ units_per_page / sizeof(BitmapWord) ];

    /*
      Let page_size == 4096, UNIT_SIZE == 16, and sizeof(void*) == 8
      Then:
      units_per_page = 4096 / 16 = 256
      num_data_pages = (4096 - 8 - 8 - 2 - 2) / (256 / 8 + 2) = 4076 / 34 = 119
      Data area size = 119 * 4096 = 476K

      The first unit in the data area is reserved for
     */
} DataBitmapHeader;


/* XXX
 *
 * The following structure is implicit. It takes exactly one page
   and can manage 512KB of data if page_size = 4096 and UNIT_SIZE is 16.

typedef struct {
    BitmapWord bitmap[ bits_per_page / units_per_page       ][ units_per_page / sizeof(BitmapWord) ];
    //                 = number of data pages, typically 128        this is units bitmap
} UnitsBitmap;

*/
// using surrogate type
typedef BitmapWord UnitsBitmap;

typedef struct _ChunkBitmap {
    /*
     * This structure takes up to one page and defines reserved area for units bitmap.
     * The bitmap may contain up to (sys_page_size - 2 * sizeof(UnitsBitmap*)) * 8 pages,
     * which is typically about 32640 (~128MB). This can manage 16GB of data.
     * However, this is too large amount even for virtual address space to reserve,
     * to initial chunk is smaller.
     *
     * First N chunks can fit into one page and the optimal layout for 4096-byte page
     * on 64-bit architecture would be (in bytes):
     *
     * two pointers   bitmap   total
     *   16             128     140
     *   16             256     272
     *   16             512     528
     *   16            1024    1040
     *   16            2048    2064
     *                         ----
     *                         4044
     * Thus, initial chunk reserves 128 * 8 * 4KB = 4MB of virtual address space for the bitmap.
     */
    UnitsBitmap* units_bitmap_start;
    UnitsBitmap* units_bitmap_end;
    BitmapWord  chunk_bitmap[];
} ChunkBitmap;

/*


static unsigned find_small_block(BitmapWord* units_bitmap, unsigned bitmap_size, unsigned block_size)
/*
 * Binary search for free small block that fits into BITMAPWORD_WIDTH.
 *
 * bitmap_size, block_size, and offset (the result) are in bits (i.e. units).
 *
 * Return 1-based offset on success, zero if block not found.
 */
{
    if (bitmap_size == BITMAPWORD_WIDTH) {
        // can't split bitmap further, use bit operations to find the block
        BitmapWord w = units_bitmap[0];
        if (w == 0) {
            // it's entirely free
            return 1;
        }
        if (w == BITMAPWORD_MAX) {
            // it's entirely full
            return 0;
        }
        unsigned offset = 1;
        unsigned leading_zeros = count_leading_zeros(w);
        while (w) {
            unsigned n = count_trailing_zeros(w);
            if (block_size <= n) {
                return offset;
            }
            w >>= n;
            offset += n;

            n = count_trailing_ones(w);
            w >>= n;
            offset += n;
        }
        if (block_size <= leading_zeros) {
            return offset;
        }
        return 0;
    }

    // try middle word
    unsigned middle = bitmap_size >> 1;
    BitmapWord wm = units_bitmap[middle];
    if (wm == 0) {
        // entirely free
        return ++middle;
    }
    // middle word is not free, try left half
    unsigned offset = find_small_block(units_bitmap, middle, block_size, false);
    if (offset) {
        return offset;
    }
    // nothing found on the left half, try right half
    return find_small_block(units_bitmap + middle, middle, block_size, false);
}

/*
 * Alignment simplifies the code: -- not exactly: what if small block is expanded to large?
 *
 * Bitmap is accessed in BitmapWords only. No byte access eliminates endianess issues.
 *
 * Big blocks are allocated in BITMAPWORD_WIDTH * UNITS increment.
 * Corresponding bits in the bitmap are aligned on BitmapWord boundary.
 * If UNIT_SIZE is 16 bytes and BITMAPWORD_WIDTH is 64, allocation will be in 1K increment.
 *
 * Actually, there are two separate implementations of allocator: one for small
 * blocks, another is for big blocks.
 */

static inline unsigned align_to_word_bitwidth(unsigned unit_index)
{
    return (uint_index + BITMAPWORD_WIDTH - 1) & ~(BITMAPWORD_WIDTH - 1);
}

/*
 * Block search functions.
 *
 * block_size and result are in bits, i.e. units.
 *
 * Return offset on success.
 *
 * Given that first units of data page are always in use,
 * offset can never be zero on success.
 * Zero value indicates the block is not found.
 */

Superblock idea: per-LFB (longest free block)

Possible range for LFB: data_page_header_size_in_units ... units_per_page

Let it be simply units per page

For 4K page we have 256 LFB lengths

PageDirectory* pages_per_lfb[ units_per_page ];  // index in this array is page LFB

We can quickly find page directory by scanning pages_per_lfb from requested number of units. But how to quickly move pages across page directories? -- Only if page directory is a doubly-linked list of pointers to data pages! Or, if page has next/prev pointers.

Previous structures:

typedef struct _PageDirectory {
    /*
     * 4K page: ~400 entries
     *
     * Directory level 1: data pages, 4048 * 400 => 1.6M of data
     * Directory level 2: subdirectories, 1.6M * 400 => 640M each
     * Directory level 3: superblock, 640M * 400 => 256GB total
     */
    struct _PageDirectory* parent;
    unsigned self_index;
    unsigned capacity;     // big pages may contain more than one page directory ???
    uint16_t longest_lfb;  // longest free block among longest free blocks in all data pages

    // variable part

    // longest free block for each data page or subdirectory
    uint16_t lfb[ /* capacity */ ];

    // lfb array is followed by array of pointers to data pages or page subdirectories

    // DataPageHeader* data_pages[ /* capacity */ ];
    // PageDirectory*  subdirs[ /* capacity */ ];

} PageDirectory;


static inline DataPageHeader* get_data_pages_ptr(PageDirectory* pdir)
{
    return (DataPageHeader*) (
            ((uint8_t*) pdir) + sizeof(PageDirectory) + sizeof(uint16_t) * pdir->capacity
    );
}

static void update_lfb(PageDirectory* pdir, unsigned index, uint16_t lfb)
/*
 * Update lfb in the page directory and update longest_lfb up to the root directory.
 *
 * This function is called when lfb for particular data page is about to increase.
 * Either because new data page is allocated or a memblock is released.
 */
{
    while (pdir) {
        pdir->lfb[index] = lfb;
        if (pdir->longest_lfb < lfb) {
            pdir->longest_lfb = lfb;
        } else {
            // if lfb is same or less, no point to proceed to the upper level
            break;
        }
        pdir = pdir->parent;
        index = pdir->self_index;
    }
}

static void _lfb(PageDirectory* pdir, uint16_t lfb)
/*
 * Update longest_lfb up to the root directory.
 */
{
    while (pdir) {
        if (pdir->longest_lfb < lfb) {
            pdir->longest_lfb = lfb;
        }
        pdir = pdir->parent;
    }
}

typedef struct _DataPageHeader {
    /*
     * On 4K page this takes three 16-byte units, leaving 4048 bytes for data.
     */

    // Back reference to the page directory
    // This wastes memory but simplifies the code.
    PageDirectory* pdir;
    unsigned self_index;

    // variable part

    // the size of bitmap depends on page size, for 4K it takes 32 bytes
    BitmapWord bitmap[ /* sys_page_size / UNIT_SIZE / BITMAPWORD_WIDTH */ ];

} DataPageHeader;

With different types of sub-allocators shrink/grow make no sense. #pet comes back to realloc.

static inline BitmapWord count_leading_zeros(BitmapWord value)
{
    //return  __builtin_stdc_leading_zeros(value);
    return  __builtin_clz(value);
}

static inline BitmapWord count_leading_zeros(BitmapWord value)
{
    //return  __builtin_stdc_leading_zeros(value);
    return  __builtin_clzl(value);
}