With the memory segments now being dynamically stored in an rbtree(3) data
structure, a new issue was introduced.

Every struct uvm_physseg has various properties describing a segment, such as
the starting and ending page frame numbers, but the most important of these is
the set of pages belonging to that segment, which is effectively an array of
struct vm_page. Previously the segments were of fixed size, i.e. their start
and end page frames did not change throughout a run; when pages were stolen
from the UVM segments by MD (Machine Dependent) code, only the avail_start and
avail_end values were affected. In that case it was fine to have pgs[]
allocated by kmem_alloc(): whenever pages were stolen from pgs[], the
avail_start and avail_end page frame numbers, which double up as indices into
pgs[], would simply be adjusted. When plugging or unplugging segments as a
whole this is not usually an issue, because you are either unplugging all of
pgs[] or plugging in all of pgs[] at once.
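As a rough sketch of that fixed-size behaviour (the field names below are simplified stand-ins, not the real struct uvm_physseg layout):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of a fixed-size segment; the real struct
 * uvm_physseg has more fields (pgs, free_list, ...). */
struct seg_model {
	size_t start, end;             /* fixed PFN range [start, end) */
	size_t avail_start, avail_end; /* free sub-range, shrinks over time */
};

/*
 * MD code stealing n pages from the front only bumps avail_start;
 * start/end, and therefore the pgs[] allocation itself, never change.
 */
static void
steal_front(struct seg_model *s, size_t n)
{
	assert(s->avail_start + n <= s->avail_end);
	s->avail_start += n;
}
```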

But as we all know, only a Sith deals in absolutes, so there are these
interesting cases where an existing memory segment may fragment.

Even though my ASCII art skills are not amazing, here are some illustrations to
make things clearer.

+--------------------------------------------+
|                  Segment A                 |
+--------------------------------------------+

The above example shows the initial allocation of a memory segment’s pages. We
can safely assume that this is a single contiguous pgs[] array.

+---------------------------+----------------+
|           Segment A       |   Segment B    |
+---------------------------+----------------+

This is an example in which Segment A gets fragmented a little past the
middle. It is easy to handle the changes to most of the attributes of the
existing Segment A and to set the appropriate values in Segment B, but how
do you divide the pgs[] array in the memory segment shown above?

+--------------+--------------+--------------+
|   Segment A  |   Segment C  |  Segment B   |
+--------------+--------------+--------------+

A tougher example is one in which the unplugging occurs in the middle of the
segment, fragmenting the existing segment into three.
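To make the bookkeeping problem concrete, here is a hypothetical userland model of such a split (simplified types, not the real NetBSD structures). Unplugging a range from the middle leaves two segments whose pgs pointers alias the same backing array, which is exactly what a single kmem_alloc()'d array cannot cleanly express when part of it must later be freed:

```c
#include <assert.h>
#include <stddef.h>

struct page_model { int pfn; };

struct seg_split_model {
	size_t start, end;        /* PFN range [start, end) */
	struct page_model *pgs;   /* indexed by pfn - start */
};

/*
 * Unplug [u_start, u_end) from the middle of 'a', producing 'b' for
 * the right-hand remainder.  Note that b->pgs aliases into a's array.
 */
static void
split_middle(struct seg_split_model *a, struct seg_split_model *b,
    size_t u_start, size_t u_end)
{
	assert(a->start < u_start && u_start < u_end && u_end < a->end);
	b->start = u_end;
	b->end = a->end;
	b->pgs = a->pgs + (u_end - a->start);	/* same backing array */
	a->end = u_start;
}
```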

So what do you do when memory management gets tough? You hire a new manager,
like the PHB from Dilbert, albeit unlike in Dilbert we actually get a working
manager, not just a purely functional (no pun intended) one, and that is how
we got extent(9).

After some rounds of discussion with me, Cherry (cherry@) finally decided to
use extent(9) to manage the pgs[] array.

It was a close race between vmem(9) and extent(9), but extent(9) won out
eventually because it has no ties to malloc(9) or pool(9) and can be used
early at boot time, which is pretty much critical in our implementation.

Adopting extent(9) was not too hard, and that is one of the good things about
working with someone experienced: they make hard things look easy. So now
struct uvm_physseg has a new member, struct extent *ext, which acts as a slab
of memory from which struct vm_page *pgs get allocated in chunks.

@@ -94,4 +95,5 @@
         paddr_t avail_end;              /* (PF# of last free page in segment) +1  */
         struct  vm_page *pgs;           /* vm_page structures (from start) */
+        struct  extent *ext;            /* extent(9) structure to manage pgs[] */
         struct  vm_page *lastpg;        /* vm_page structure for end */
         int     free_list;              /* which free list they belong on */
@@ -267,4 +269,7 @@
         struct vm_physseg *seg;

ext now acts as an extent map that manages memory over a given start to end
range. Remember, this is NOT yet allocated to pages. Depending on how the ext
is created with extent_create(), it is one of two kinds. A fixed extent is
given a static array of the desired size for its internal bookkeeping; this is
perfect for managing the boot-time allocated slabs and hence is aptly named
"Boot time slab" in the code. The other kind of extent map uses dynamic memory
allocators like malloc(9) to allocate memory as and when required.

void
uvm_physseg_seg_chomp_slab(uvm_physseg_t upm, struct vm_page *pgs, size_t n)
{
        struct uvm_physseg *seg = HANDLE_TO_PHYSSEG_NODE(upm);

        /* max number of pre-boot unplug()s allowed */
#define UVM_PHYSSEG_BOOT_UNPLUG_MAX VM_PHYSSEG_MAX

        static char btslab_ex_storage[EXTENT_FIXED_STORAGE_SIZE(UVM_PHYSSEG_BOOT_UNPLUG_MAX)];

        if (__predict_false(uvm.page_init_done == false)) {
                seg->ext = extent_create("Boot time slab", (u_long) pgs, (u_long) (pgs + n),
                    (void *)btslab_ex_storage, sizeof(btslab_ex_storage), 0);
        } else {
                seg->ext = extent_create("Hotplug slab", (u_long) pgs, (u_long) (pgs + n), NULL, 0, 0);
        }

        KASSERT(seg->ext != NULL);
}

Now when pgs[] needs to be allocated this is done via the extent_alloc()
call which allocates a specific chunk of memory from the extent map.

struct vm_page *
uvm_physseg_seg_alloc_from_slab(uvm_physseg_t upm, size_t pages)
{
        int err;
        struct uvm_physseg *seg;
        struct vm_page *pgs = NULL;

        seg = HANDLE_TO_PHYSSEG_NODE(upm);

        KASSERT(pages > 0);

        if (__predict_false(seg->ext == NULL)) {
                /*
                 * This is a situation unique to boot time.
                 * It shouldn't happen at any point other than from
                 * the first uvm_page.c:uvm_page_init() call
                 * Since we're in a loop, we can get away with the
                 * below.
                 */
                KASSERT(uvm.page_init_done != true);

                seg->ext = HANDLE_TO_PHYSSEG_NODE(uvm_physseg_get_prev(upm))->ext;

                KASSERT(seg->ext != NULL);
        }

        /* We allocate enough for this segment */
        err = extent_alloc(seg->ext, sizeof(*pgs) * pages, 1, 0, EX_BOUNDZERO, (u_long *)&pgs);

        if (err != 0) {
#ifdef DEBUG
                printf("%s: extent_alloc failed with error: %d \n",
                    __func__, err);
#endif
        }

        return pgs;
}

So far so good: everything that was being done by kmem_alloc() is now being
done via extent(9) calls. Next is the tough part, so back to the scenario
where we fragmented a segment into two.

+---------------------------+----------------+
|           Segment A       |   Segment B    |
+---------------------------+----------------+

Segment B would be created just like any new segment: create the slab using
extent_create() and then allocate the pgs[] using extent_alloc(). But for
Segment A we need to free some of pgs[], and for this we use extent_free().

if (__predict_true(uvm.page_init_done == true)) {
        /* XXX: KASSERT() that seg->pgs[] are not on any uvm lists */
        if (extent_free(seg->ext, (u_long)(seg->pgs + off), sizeof(struct vm_page) * pages, EX_MALLOCOK | EX_NOWAIT) != 0)
                return false;
}
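The semantics of that partial extent_free() can be illustrated with a toy userland allocation bitmap (purely a model of the behaviour, not how extent(9) is actually implemented):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NPAGES 100

/* alloc_map[i] models whether slot i of the shared pgs[] backing
 * store is still owned by some segment. */
static bool alloc_map[MODEL_NPAGES];

static void
mark_range(size_t off, size_t n, bool allocated)
{
	for (size_t i = off; i < off + n; i++)
		alloc_map[i] = allocated;
}

/*
 * Plug in the whole range, then free only the tail [60, 100):
 * the head [0, 60) stays allocated -- the partial free that a
 * single kmem_free() of one big array cannot do.
 */
static bool
model_partial_free(void)
{
	mark_range(0, MODEL_NPAGES, true);
	mark_range(60, MODEL_NPAGES - 60, false);
	return (alloc_map[0] && alloc_map[59] &&
	    !alloc_map[60] && !alloc_map[MODEL_NPAGES - 1]);
}
```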

And once all the pages in a memory segment have been unplugged, the segment
ceases to exist; at this point we invoke extent_destroy() to destroy the
extent map, then remove the segment from the rbtree(3) and
uvm_physseg_free() the segment.

#if defined(UVM_HOTPLUG) /* rbtree implementation */
                int segcount = 0;
                struct uvm_physseg *current_ps;
                /* Complete segment */
                if (uvm_physseg_graph.nentries == 1)
                        panic("%s: out of memory!", __func__);

                if (__predict_true(uvm.page_init_done == true)) {
                        RB_TREE_FOREACH(current_ps, &(uvm_physseg_graph.rb_tree)) {
                                if (seg->ext == current_ps->ext)
                                        segcount++;
                        }
                        KASSERT(segcount > 0);

                        if (segcount == 1) {
                                extent_destroy(seg->ext);
                        }

                        /*
                         * We assume that the unplug will succeed from
                         *  this point onwards
                         */
                        uvmexp.npages -= (int) pages;
                }

                rb_tree_remove_node(&(uvm_physseg_graph.rb_tree), upm);
                memset(seg, 0, sizeof(struct uvm_physseg));
                uvm_physseg_free(seg, sizeof(struct uvm_physseg));
                uvm_physseg_graph.nentries--;
#else /* UVM_HOTPLUG */
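The segcount logic above can be distilled into a hypothetical helper (mini_seg and ext_id are illustrative stand-ins for struct uvm_physseg and its ext pointer): since several boot-time segments may share one extent, it is only safe to extent_destroy() when the segment being unplugged is its last user.

```c
#include <assert.h>
#include <stddef.h>

struct mini_seg { int ext_id; };	/* stand-in for seg->ext */

/* Returns nonzero iff the given extent has exactly one remaining
 * user, i.e. destroying it along with that segment is safe. */
static int
is_last_user(const struct mini_seg *segs, size_t n, int ext_id)
{
	size_t segcount = 0;

	for (size_t i = 0; i < n; i++)
		if (segs[i].ext_id == ext_id)
			segcount++;
	return segcount == 1;
}
```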

The extent(9) API is a pretty elegant solution to the memory management
problem we were facing, and reading Cherry's precise handiwork in putting it
to use for uvm_hotplug(9) was quite a nice experience.