So, with balloon(4) as the only current consumer of uvm_hotplug(9), I lacked the
knowledge to use the usual benchmarking tools in this context, and I had not
spent enough time with dtrace(1) to exploit its features to benchmark the
PHYS_TO_VM_PAGE() macro.
The only tool I had some confidence in using was ATF, and after a few rounds
of discussion with Cherry (cherry@) we came up with a plan for a very rough
measure of performance. Here is how it goes.
The most frequent operation affected by our change is the lookup call
uvm_physseg_find(), which now searches through a red-black (R-B) tree instead
of a static array. To simulate this, we copied the PHYS_TO_VM_PAGE() macro and
the related code from uvm_page.c, and then wrote some ATF tests that load some
memory segments and then make many calls to PHYS_TO_VM_PAGE() that look up
random addresses within the plugged-in segments.
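To make the lookup concrete, here is a minimal userland sketch of what a segment lookup does: given a page frame number, find the segment whose [start, end) range contains it. All names here (struct seg, find_linear(), find_bsearch()) are hypothetical and are not the uvm API. A binary search over a sorted array stands in for the R-B tree descent, since both make the same kind of ordered comparisons in O(log n) steps; the tree additionally makes inserting and removing segments cheap, which a fixed array does not.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment descriptor: pages [start, end) as page frame numbers. */
struct seg {
	uint64_t start;
	uint64_t end;
};

/* Linear scan: O(n) comparisons, works for segs in any order. */
static const struct seg *
find_linear(const struct seg *segs, size_t n, uint64_t pfn)
{
	for (size_t i = 0; i < n; i++)
		if (pfn >= segs[i].start && pfn < segs[i].end)
			return &segs[i];
	return NULL;
}

/*
 * Binary search: O(log n) comparisons.  Requires segs to be sorted by
 * start and non-overlapping.  An R-B tree descent makes the same ordered
 * comparisons, one level of the tree per step.
 */
static const struct seg *
find_bsearch(const struct seg *segs, size_t n, uint64_t pfn)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (pfn < segs[mid].start)
			hi = mid;		/* target is to the left */
		else if (pfn >= segs[mid].end)
			lo = mid + 1;		/* target is to the right */
		else
			return &segs[mid];	/* start <= pfn < end */
	}
	return NULL;				/* pfn falls in a hole */
}
```

Both functions return NULL for a frame that lies between segments, which is the "not managed" case PHYS_TO_VM_PAGE() has to handle as well.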
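```c
#include <atf-c.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/*
 * Excerpt: setup(), VALID_START_PFN_1, VALID_END_PFN_1, PF_STEP and the
 * copied PHYS_TO_VM_PAGE() code are defined elsewhere in the test file.
 */
ATF_TC(uvm_physseg_100);
ATF_TC_HEAD(uvm_physseg_100, tc)
{
	atf_tc_set_md_var(tc, "descr", "Load test uvm_phys_to_vm_page() with "
	    "100 calls, VM_PHYSSEG_MAX is 32.");
}
ATF_TC_BODY(uvm_physseg_100, tc)
{
	paddr_t pa;

	setup();

	/* Plug PF_STEP-page segments; the range is sized to fill every slot. */
	for (paddr_t i = VALID_START_PFN_1;
	    i < VALID_END_PFN_1; i += PF_STEP) {
		uvm_page_physload(i, i + PF_STEP, i, i + PF_STEP,
		    VM_FREELIST_DEFAULT);
	}
	ATF_REQUIRE_EQ(VM_PHYSSEG_MAX, uvm_physseg_get_entries());

	srandom((unsigned)time(NULL));
	for (int i = 0; i < 100; i++) {
		/* Random address below the end of the last plugged segment. */
		pa = (paddr_t) random() % (paddr_t) ctob(VALID_END_PFN_1);
		(void)PHYS_TO_VM_PAGE(pa);
	}

	/* Load test only: correctness is checked by other tests. */
	ATF_CHECK_EQ(true, true);
}
```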
The above snippet tests 100 calls, as its description string says. This
methodology is not a perfect load test, since every iteration also calls
random(), whose cost adds to the measured runtime of the function we are trying
to load test. I could have spent a bit more time removing this overhead by
generating the random numbers outside the timed part of the test. After some
tweaking, I wrote tests ranging from 100 calls to 100 million calls and then
measured the time for each. You might also notice the ATF_CHECK_EQ(true, true)
at the bottom of the test, which means the test will never fail. This is
intentional: the test is NOT a correctness check of the function being called;
we assume the function works as expected when this test runs.
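One way to remove the random() overhead mentioned above is to generate every address before the timed loop starts and time only the lookups. The sketch below is a hypothetical userland harness, not the ATF test itself: lookup_fn, pregen_addrs() and time_lookups() are illustrative names, and it measures with clock_gettime(CLOCK_MONOTONIC) rather than relying on the per-test times that atf-run reports.

```c
#define _XOPEN_SOURCE 700	/* for random()/srandom() and clock_gettime() */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical lookup under test; returns nonzero if the address is found. */
typedef int (*lookup_fn)(uint64_t pa);

/* Fill addrs[] with pseudo-random addresses in [lo, hi), outside any timing. */
static void
pregen_addrs(uint64_t *addrs, size_t n, uint64_t lo, uint64_t hi,
    unsigned seed)
{
	srandom(seed);
	for (size_t i = 0; i < n; i++)
		addrs[i] = lo + (uint64_t)random() % (hi - lo);
}

/*
 * Time only the lookup loop.  *hits receives the number of successful
 * lookups, so the compiler cannot discard the calls as dead code.
 */
static double
time_lookups(lookup_fn fn, const uint64_t *addrs, size_t n, size_t *hits)
{
	struct timespec t0, t1;
	size_t found = 0;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (size_t i = 0; i < n; i++)
		found += fn(addrs[i]) != 0;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	*hits = found;
	return (double)(t1.tv_sec - t0.tv_sec) +
	    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Passing the same seed to pregen_addrs() for both the R-B tree and the static array builds would also feed both implementations exactly the same addresses. The trade-off is memory: 100 million 8-byte addresses need about 800 MB, more than the 512 MB VirtualBox instance described below, so the largest runs would have to reuse a smaller buffer.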
This test was written so that it runs against both the R-B tree and the static
array implementations, giving an apples-to-apples comparison of how much they
differ.
A second type of test we came up with performs lookups on highly fragmented
memory segments, created by unplugging pieces of a fixed chunk.
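To see why unplugging produces many small segments, here is a toy model of a sorted segment list in which removing a page range from the middle of a segment splits it in two. It is a hypothetical sketch: struct range, struct range_list, frag_unplug(), total_pages() and MAX_RANGES are illustrative names, not uvm_physseg_unplug(), and the model only supports removals that fall inside a single segment, which is all the fragmentation loop in the following test does (at each PF_STEP boundary it unplugs 1 to PF_STEP - 1 pages).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_RANGES 1024	/* hypothetical capacity of the toy list */

/* A toy segment covering pages [start, end). */
struct range {
	uint64_t start, end;
};

/* Segments kept sorted by start, non-overlapping. */
struct range_list {
	struct range r[MAX_RANGES];
	size_t n;
};

/*
 * Remove pages [pfn, pfn + npages).  Returns 0 on success, -1 if the
 * removal is not contained in one range or a split would overflow.
 */
static int
frag_unplug(struct range_list *l, uint64_t pfn, uint64_t npages)
{
	uint64_t end = pfn + npages;

	for (size_t i = 0; i < l->n; i++) {
		struct range *r = &l->r[i];

		if (pfn < r->start || end > r->end)
			continue;	/* not inside this range */
		if (pfn == r->start && end == r->end) {
			/* Whole range removed: close the gap. */
			memmove(r, r + 1, (l->n - i - 1) * sizeof(*r));
			l->n--;
		} else if (pfn == r->start) {
			r->start = end;		/* trim the front */
		} else if (end == r->end) {
			r->end = pfn;		/* trim the back */
		} else {
			/* Hole in the middle: split into two ranges. */
			if (l->n == MAX_RANGES)
				return -1;
			memmove(r + 2, r + 1, (l->n - i - 1) * sizeof(*r));
			r[1].start = end;
			r[1].end = r->end;
			r->end = pfn;
			l->n++;
		}
		return 0;
	}
	return -1;
}

/* Sum of pages still plugged. */
static uint64_t
total_pages(const struct range_list *l)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < l->n; i++)
		sum += l->r[i].end - l->r[i].start;
	return sum;
}
```

Starting from one range of 1000 pages and unplugging a small chunk at every 100-page boundary leaves 10 ranges: the first unplug only trims the front, and each later one splits off a new range. A fixed array has to reserve a slot for every one of those pieces, which is where VM_PHYSSEG_MAX becomes the limit.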
```c
ATF_TC(uvm_physseg_1MB);
ATF_TC_HEAD(uvm_physseg_1MB, tc)
{
	atf_tc_set_md_var(tc, "descr", "Load test uvm_phys_to_vm_page() with "
	    "10,000,000 calls, VM_PHYSSEG_MAX is 32 on 1 MB Segment.");
}
ATF_TC_BODY(uvm_physseg_1MB, tc)
{
	paddr_t pa = 0;
	paddr_t pf = 0;
	psize_t pf_chunk_size = 0;
	psize_t npages1 = (VALID_END_PFN_1 - VALID_START_PFN_1);
	psize_t npages2 = (VALID_END_PFN_2 - VALID_START_PFN_2);
	struct vm_page *slab = malloc(sizeof(struct vm_page) *
	    (npages1 + npages2));

	ATF_REQUIRE(slab != NULL);
	setup();

	/* We start with zero segments */
	ATF_REQUIRE_EQ(true, uvm_physseg_plug(VALID_START_PFN_1, npages1, NULL));
	ATF_REQUIRE_EQ(1, uvm_physseg_get_entries());

	/* Post boot: Fake all segments and pages accounted for. */
	uvm_page_init_fake(slab, npages1 + npages2);

	ATF_REQUIRE_EQ(true, uvm_physseg_plug(VALID_START_PFN_2, npages2, NULL));
	ATF_REQUIRE_EQ(2, uvm_physseg_get_entries());

	srandom((unsigned)time(NULL));
	/* Fragment segment 2: unplug 1..PF_STEP-1 pages at each PF_STEP step. */
	for (pf = VALID_START_PFN_2; pf < VALID_END_PFN_2; pf += PF_STEP) {
		pf_chunk_size = (psize_t) random() % (psize_t) (PF_STEP - 1) + 1;
		uvm_physseg_unplug(pf, pf_chunk_size);
	}

	for (int i = 0; i < 10000000; i++) {
		/*
		 * Note: when the segment is shorter than its start offset,
		 * this can yield pa >= ctob(VALID_END_PFN_2), i.e. a miss.
		 * pa = ctob(VALID_START_PFN_2) + random() % ctob(npages2)
		 * would keep every lookup inside segment 2.
		 */
		pa = (paddr_t) random() % (paddr_t) ctob(VALID_END_PFN_2);
		if (pa < ctob(VALID_START_PFN_2))
			pa += ctob(VALID_START_PFN_2);
		(void)PHYS_TO_VM_PAGE(pa);
	}

	ATF_CHECK_EQ(true, true);
}
```
This test requires faking the boot process, since we need to invoke
uvm_physseg_unplug() to fragment the memory. After that, 10 million calls are
made to the PHYS_TO_VM_PAGE() macro. The memory segment size was varied from
1 MB to 256 MB, because my VirtualBox instance of NetBSD had a total of 512 MB
to spare. This test is specific to the R-B tree implementation and cannot be
run against the static array implementation, since VM_PHYSSEG_MAX limits the
number of fragments the array can hold; with a value as small as 32, such a
test would not make much sense.
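A rough count shows how quickly the fragmentation loop outgrows the array. This assumes 4 KiB pages (the page size on i386 and amd64) and treats "1 MB" as 2^20 bytes; PF_STEP's actual value is not given here, so the bounds are expressed in terms of it. Each PF_STEP window of the second range ends up as about one segment, and the first plugged range takes one more entry:

$$
n_{\text{entries}} \approx 1 + \left\lceil \frac{\text{npages2}}{\text{PF\_STEP}} \right\rceil \le \text{VM\_PHYSSEG\_MAX} = 32
$$

$$
\text{1 MB: } \text{npages2} = \frac{2^{20}}{2^{12}} = 256 \;\Rightarrow\; \text{PF\_STEP} \ge 9,
\qquad
\text{256 MB: } \text{npages2} = \frac{2^{28}}{2^{12}} = 65536 \;\Rightarrow\; \text{PF\_STEP} \ge 2115
$$

Only very coarse fragmentation would fit in the static array, and the required step grows linearly with the segment size.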
Here is the quick and dirty output from atf-run + atf-report for the series of
tests run with uvm_hotplug(9) enabled.
```
t_uvm_physseg_load (1/1): 11 test cases
    uvm_physseg_100: [0.003286s] Passed.
    uvm_physseg_100K: [0.010982s] Passed.
    uvm_physseg_100M: [8.842482s] Passed.
    uvm_physseg_10K: [0.004398s] Passed.
    uvm_physseg_10M: [0.954270s] Passed.
    uvm_physseg_128MB: [2.176629s] Passed.
    uvm_physseg_1K: [0.002702s] Passed.
    uvm_physseg_1M: [0.094821s] Passed.
    uvm_physseg_1MB: [0.984185s] Passed.
    uvm_physseg_256MB: [2.485398s] Passed.
    uvm_physseg_64MB: [0.914363s] Passed.
[16.478686s]

Summary for 1 test programs:
    11 passed test cases.
    0 failed test cases.
    0 expected failed test cases.
    0 skipped test cases.
```
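Dividing each reported time by its number of calls gives a rough per-call cost. Treat these figures as upper bounds: each one also includes random(), test setup and, for the fragmented runs, the unplug loop. The fragmented runs (1 MB to 256 MB) each make 10 million calls, as described above.

$$
\begin{aligned}
\text{100M: } & 8.842482\ \text{s} / 10^{8} \approx 88\ \text{ns} &
\text{1MB: } & 0.984185\ \text{s} / 10^{7} \approx 98\ \text{ns} \\
\text{10M: } & 0.954270\ \text{s} / 10^{7} \approx 95\ \text{ns} &
\text{64MB: } & 0.914363\ \text{s} / 10^{7} \approx 91\ \text{ns} \\
\text{1M: } & 0.094821\ \text{s} / 10^{6} \approx 95\ \text{ns} &
\text{128MB: } & 2.176629\ \text{s} / 10^{7} \approx 218\ \text{ns} \\
& & \text{256MB: } & 2.485398\ \text{s} / 10^{7} \approx 249\ \text{ns}
\end{aligned}
$$

Once the call count is large enough to amortize setup, the 32-segment runs settle around 90 to 95 ns per call, while the 128 MB and 256 MB fragmented runs come out roughly 2.5 to 2.8 times higher. These totals alone cannot tell how much of that gap is setup rather than lookup cost.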
I will need to spend some time creating an eye-candy report soon to make this
presentable to a wider audience, so keep an eye out for updates.