Tuesday, January 4, 2011

Sparsemem-II

This is the second article of the Sparsemem series. The first describes the logic and usability of Sparsemem. This one focuses on the code flow and implementation of the framework. The third will focus on testing tools and techniques for validating an MM solution.

So first of all, the mem_section array (2-D) is statically allocated; it is declared in include/linux/mmzone.h and defined in mm/sparse.c.
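
To make the 2-D layout concrete, here is a minimal userspace sketch of how a page frame number could be turned into a slot of that array. The constants (4 KiB pages, 64 MiB sections, 32 physical address bits, 16 sections per root) are assumptions picked for illustration, and the helper names only mirror the kernel's naming; the real values and code are architecture and config dependent.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only; real ones are per-architecture. */
#define PAGE_SHIFT         12   /* 4 KiB pages */
#define SECTION_SIZE_BITS  26   /* 64 MiB sections */
#define MAX_PHYSMEM_BITS   32   /* 4 GiB of physical address space */

#define PFN_SECTION_SHIFT  (SECTION_SIZE_BITS - PAGE_SHIFT)
#define NR_MEM_SECTIONS    (1UL << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS))
#define SECTIONS_PER_ROOT  16UL /* hypothetical; chosen to keep the array small */
#define NR_SECTION_ROOTS   (NR_MEM_SECTIONS / SECTIONS_PER_ROOT)

struct mem_section {
    unsigned long section_mem_map;  /* encoded mem_map pointer plus flag bits */
};

/* The statically allocated 2-D array: roots x sections-per-root. */
static struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT];

/* Which section does this page frame number fall into? */
static unsigned long pfn_to_section_nr(unsigned long pfn)
{
    return pfn >> PFN_SECTION_SHIFT;
}

/* Map a section number onto its (root, index) slot in the 2-D array. */
static struct mem_section *nr_to_section(unsigned long nr)
{
    assert(nr < NR_MEM_SECTIONS);
    return &mem_section[nr / SECTIONS_PER_ROOT][nr % SECTIONS_PER_ROOT];
}
```

With these assumed numbers a section covers 0x4000 page frames, so pfn 0x4000 is the first frame of section 1, and section 17 lives at mem_section[1][1].
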
Then the following actions take place during the kernel initialization process.

--   In setup_arch(), prom_init() is called, which initializes different kernel variables.
--   prom_init() will call add_memory_region() with the complete RAM size.
--   If the user has given the "mem=" option on the kernel command line, then
    the previously added region is deleted and a new region is created using the
    memory given.
--   early_parse_mem() records the detected memory areas using add_memory_region().
--   add_memory_region() adds an entry in the boot_mem_map for each separate memory bank.
--   setup_arch() then calls arch_mem_init()
--   arch_mem_init() calls plat_mem_setup() and then calls bootmem_init() where all the available memory is bunched together and its min pfn and max pfn are identified.
--   Then bootmem_init() calls memory_present() for the detected memory.
--   memory_present() marks all the memory sections that lie within the detected memory banks as SECTION_MARKED_PRESENT.
--   The function which allocates mem_map(array of struct page) is sparse_init().
--   Then memory for bootmem map is allocated and the usable memory range is added to the early_node_map[]
--   After sparse_init(), the mem_maps are allocated (depending on the config), but they are not yet initialized. This is because the initialization logic of memmap doesn't depend on FLATMEM/DISCONTIGMEM/SPARSEMEM.
--   Initializing mem_map is done by free_area_init_node(). This function initializes the memory ranges registered by add_active_range() (see mm/page_alloc.c). (*) There are architectures which don't use add_active_range(), but this function is for generic use.
--   After free_area_init_node(), all mem_map entries are initialized as PG_reserved, and NODE_DATA(nid)->node_start_pfn etc. are available.
--   PG_reserved is cleared in free_all_bootmem(). If you want to keep pages Reserved (because of holes, for example), either reserve them in bootmem or don't register the memory hole with bootmem at all; those pages will then be kept as Reserved.
--   memmap_init_zone() doesn't care about valid/invalid sections. Regardless of holes, it initializes page descriptors (including the struct pages which sit on a hole). But if the page descriptors on holes are _Reserved_, then they don't go to the buddy allocator as free pages. To ensure this, free_bootmem_node() marks 0x0 in the bitmap for only the _valid_ pages, bank by bank. Afterwards, free_all_bootmem_core() consults the bitmap, so pages on a hole are rejected and are not inserted into buddy.
--   To free memory, on ARM even the memmap covering a hole is freed, by free_unused_memmap_node().
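
The memory_present() step in the flow above can be sketched in userspace as follows. This is a simplified model, not the kernel function: the section size (64 MiB with 4 KiB pages), the limit of 64 sections, and the plain bool array standing in for the SECTION_MARKED_PRESENT flag are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration only. */
#define PAGE_SHIFT         12
#define SECTION_SIZE_BITS  26
#define PFN_SECTION_SHIFT  (SECTION_SIZE_BITS - PAGE_SHIFT)
#define PAGES_PER_SECTION  (1UL << PFN_SECTION_SHIFT)
#define PAGE_SECTION_MASK  (~(PAGES_PER_SECTION - 1))
#define NR_MEM_SECTIONS    64UL

/* Stand-in for the SECTION_MARKED_PRESENT bit of each mem_section. */
static bool section_present[NR_MEM_SECTIONS];

/* Mark every section that overlaps the bank [start_pfn, end_pfn). */
static void memory_present_sketch(unsigned long start_pfn, unsigned long end_pfn)
{
    unsigned long pfn;

    assert(start_pfn <= end_pfn);
    assert(end_pfn <= NR_MEM_SECTIONS * PAGES_PER_SECTION);

    /* Round down so a bank starting mid-section still marks that section. */
    start_pfn &= PAGE_SECTION_MASK;
    for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION)
        section_present[pfn >> PFN_SECTION_SHIFT] = true;
}
```

Calling memory_present_sketch(0, 0x1000) for a 16 MiB bank marks only section 0, even though the bank fills just a quarter of it.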

Clarification:
 memory_present() ......... prepares section[] and marks sections PRESENT.
 sparse_init() .................. allocates mem_map, but only allocates it.
 free_area_init_node() .... initializes mem_map etc.
 memmap_init_zone() ..... marks all pages as reserved.
 free_all_bootmem() ....... makes pages available and puts them into the buddy allocator.
 pfn_valid() ..................... useful for checking whether a mem_map exists.
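
The Reserved life cycle summarized above can be modelled with a toy zone. This is only a sketch under assumptions: a hypothetical 16-page zone, a bool per page instead of the real bootmem bitmap, and function names that merely echo the kernel ones.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PAGES 16UL   /* hypothetical tiny zone, for illustration */

struct page_sketch {
    bool reserved;      /* stands in for PG_reserved */
    bool in_buddy;      /* true once handed to the buddy allocator */
};

static struct page_sketch mem_map[NR_PAGES];
static bool bootmem_busy[NR_PAGES];     /* true = not available */

/* memmap_init_zone(): every descriptor starts Reserved, holes included. */
static void memmap_init_zone_sketch(void)
{
    unsigned long pfn;

    for (pfn = 0; pfn < NR_PAGES; pfn++) {
        mem_map[pfn].reserved = true;
        mem_map[pfn].in_buddy = false;
    }
}

/* The bootmem bitmap starts with every page marked busy. */
static void init_bootmem_sketch(void)
{
    unsigned long pfn;

    for (pfn = 0; pfn < NR_PAGES; pfn++)
        bootmem_busy[pfn] = true;
}

/* free_bootmem(): clear the bits of real RAM pages in [start, end). */
static void free_bootmem_sketch(unsigned long start, unsigned long end)
{
    unsigned long pfn;

    assert(start <= end && end <= NR_PAGES);
    for (pfn = start; pfn < end; pfn++)
        bootmem_busy[pfn] = false;
}

/* free_all_bootmem(): release only pages whose bit is clear. */
static unsigned long free_all_bootmem_sketch(void)
{
    unsigned long pfn, released = 0;

    for (pfn = 0; pfn < NR_PAGES; pfn++) {
        if (bootmem_busy[pfn])
            continue;   /* hole or reserved: stays PG_reserved */
        mem_map[pfn].reserved = false;
        mem_map[pfn].in_buddy = true;
        released++;
    }
    return released;
}
```

If pages 4..7 are a hole and only [0, 4) and [8, 16) are registered with free_bootmem_sketch(), the hole pages are never released and keep their Reserved mark.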

Notes: If a section contains both valid pages and holes, the section itself is marked as SECTION_MARKED_PRESENT.

The kernel allocates a memmap for a section that mixes valid and invalid (e.g. hole) pages. For example, suppose a memory bank supports 64M but the system has only 16M, and assume the section size is 64M. In this case the section has a hole of 48M, but it will still be a valid section.

Detecting whether a section is valid or invalid is not encouraged, but you can use pfn_valid() to check whether a memmap exists. (*) pfn_valid(pfn) is not for detecting whether there is memory, but for detecting
whether there is a memmap.
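
Here is a small sketch of that distinction, using the 64M section / 16M bank example from above. The helper names, the single hard-coded bank, and the constants (4 KiB pages, 64 sections) are assumptions for illustration; the real pfn_valid() is config and architecture dependent.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration only. */
#define PAGE_SHIFT         12
#define SECTION_SIZE_BITS  26   /* 64 MiB sections */
#define PFN_SECTION_SHIFT  (SECTION_SIZE_BITS - PAGE_SHIFT)
#define NR_MEM_SECTIONS    64UL
#define MIB_TO_PFN(mib)    ((unsigned long)(mib) << (20 - PAGE_SHIFT))

/* True once a memmap has been allocated for the section. */
static bool section_has_mem_map[NR_MEM_SECTIONS];

/* Hypothetical single 16 MiB bank at the start of a 64 MiB section. */
static const unsigned long bank_start_pfn = 0;
static const unsigned long bank_end_pfn = MIB_TO_PFN(16);

/* Stand-in for what sparse_init() does for a present section. */
static void sparse_init_section_sketch(unsigned long nr)
{
    assert(nr < NR_MEM_SECTIONS);
    section_has_mem_map[nr] = true;
}

/* Answers "is there a struct page for this pfn?", nothing more. */
static bool pfn_valid_sketch(unsigned long pfn)
{
    unsigned long nr = pfn >> PFN_SECTION_SHIFT;

    if (nr >= NR_MEM_SECTIONS)
        return false;
    return section_has_mem_map[nr];
}

/* Answers "is there real RAM behind this pfn?" for the one bank. */
static bool pfn_is_ram_sketch(unsigned long pfn)
{
    return pfn >= bank_start_pfn && pfn < bank_end_pfn;
}
```

After sparse_init_section_sketch(0), the pfn at 32 MiB is "valid" because it has a memmap entry, yet it sits in the 48M hole and has no memory behind it.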

How to keep pages Reserved: either reserve them in bootmem, or do not register them with bootmem at all.
All of the above may depend on various CONFIG options.

 An active memory region is simply a memory region that does not contain any holes. add_active_range() must be used to register a region in the global variable early_node_map.

Each region is described by the following data structure in mmzone.h:
struct node_active_region {
        unsigned long start_pfn;
        unsigned long end_pfn;
        int nid;
};
where start_pfn and end_pfn denote the first and last page frame in a contiguous region, and nid is the NUMA ID of the node to which the memory belongs. UMA systems naturally set this to 0.
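
As a rough illustration of how regions end up in early_node_map, here is a simplified userspace sketch. It is not the kernel's add_active_range(): the merge rule is simplified, MAX_ACTIVE_REGIONS is a made-up small limit, and the sketch treats end_pfn as one past the last frame (a half-open range), which is an assumption of this example.

```c
#include <assert.h>

#define MAX_ACTIVE_REGIONS 8    /* hypothetical limit for this sketch */

struct node_active_region {
    unsigned long start_pfn;
    unsigned long end_pfn;      /* assumed exclusive in this sketch */
    int nid;
};

static struct node_active_region early_node_map[MAX_ACTIVE_REGIONS];
static int nr_nodemap_entries;

/*
 * Register [start_pfn, end_pfn) for node nid. A range that overlaps or
 * touches an existing region of the same node is merged into it, so a
 * region never contains a hole. Returns 0 on success, -1 if the map is full.
 */
static int add_active_range_sketch(int nid, unsigned long start_pfn,
                                   unsigned long end_pfn)
{
    int i;

    assert(start_pfn < end_pfn);

    for (i = 0; i < nr_nodemap_entries; i++) {
        struct node_active_region *r = &early_node_map[i];

        if (r->nid != nid)
            continue;
        if (start_pfn <= r->end_pfn && end_pfn >= r->start_pfn) {
            if (start_pfn < r->start_pfn)
                r->start_pfn = start_pfn;
            if (end_pfn > r->end_pfn)
                r->end_pfn = end_pfn;
            return 0;
        }
    }

    if (nr_nodemap_entries >= MAX_ACTIVE_REGIONS)
        return -1;

    early_node_map[nr_nodemap_entries].start_pfn = start_pfn;
    early_node_map[nr_nodemap_entries].end_pfn = end_pfn;
    early_node_map[nr_nodemap_entries].nid = nid;
    nr_nodemap_entries++;
    return 0;
}
```

Two back-to-back banks on node 0 collapse into a single entry, while a bank separated from them by a hole gets an entry of its own.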

So that concludes the second article of the series. It was mostly about the code flow, plus some small tidbits I remember about this topic from discussions on the linux-mm mailing list with Kamezawa Hiroyuki and Minchan Kim. There's still more to come, which will be about testing a sparsemem implementation, or memory management in general.

Please let me know your views on this article and how I can improve it. Waiting for your comments.

Till Next time, Adios.

References:
1) Linux-MM mailing list.
2) Discussions with Kamezawa and Minchan.
3) Linux kernel books.

  

1 comment:

  1. Nice post. I have a question. When does kernel set up the vmemmap page tables?
