Re: [PATCH v8 1/1] mm/page_alloc.c: refactor initialization of struct page for holes in memory layout
From: Andrew Morton
Date: Thu Feb 25 2021 - 19:09:39 EST
On Fri, 26 Feb 2021 00:43:51 +0200 Mike Rapoport <rppt@xxxxxxxxxx> wrote:

> From: Mike Rapoport <rppt@xxxxxxxxxxxxx>
>
> There could be struct pages that are not backed by actual physical memory.
> This can happen when the actual memory bank is not a multiple of
> SECTION_SIZE or when an architecture does not register memory holes
> reserved by the firmware as memblock.memory.
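
(For concreteness: with the default SECTION_SIZE_BITS of 27 on x86_64 a
section covers 128M of physical address space, so a hypothetical 96M
memory bank still gets a full section's worth of memmap, and the struct
pages for the trailing 32M have no memory behind them.)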

> Such pages are currently initialized using the init_unavailable_mem()
> function, which iterates through the PFNs in holes in memblock.memory
> and, if there is a struct page corresponding to a PFN, sets the fields
> of this page to default values and marks it as Reserved.
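
For readers who want the "before" picture, the old path was roughly this
(a simplified sketch of the init_unavailable_mem()/init_unavailable_range()
pair as I remember it, not the verbatim code):

	/* walk the gaps between memblock.memory regions */
	phys_addr_t next = 0, start, end;
	u64 i, pgcnt = 0;

	for_each_mem_range(i, &start, &end) {
		if (next < start)	/* [next, start) is a hole */
			pgcnt += init_unavailable_range(PFN_DOWN(next),
							PFN_UP(start));
		next = end;
	}

where init_unavailable_range() called __init_single_page(page, pfn, 0, 0)
and __SetPageReserved() for every pfn_valid() page in the hole; note the
hardcoded zone 0 / node 0, which is exactly the problem being fixed.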

> init_unavailable_mem() does not take into account the zone and node the
> page belongs to and sets both zone and node links in struct page to zero.
>
> Before commit 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions
> rather that check each PFN") the holes inside a zone were re-initialized
> during memmap_init() and got their zone/node links right. However, after
> that commit nothing updates the struct pages representing such holes.
>
> On a system that has firmware-reserved holes in a zone above ZONE_DMA, for
> instance in the configuration below:
>
> 	# grep -A1 E820 /proc/iomem
> 	7a17b000-7a216fff : Unknown E820 type
> 	7a217000-7bffffff : System RAM
>
> the unset zone link in struct page will trigger
>
> 	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
>
> in set_pfnblock_flags_mask() when it is called with a struct page from a
> range other than E820_TYPE_RAM, because there are pages in the range of
> ZONE_DMA32 but the unset zone link in struct page makes them appear as
> part of ZONE_DMA.
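
For reference, page_zone() derives the zone from the link bits stored in
page->flags (include/linux/mm.h):

	static inline struct zone *page_zone(const struct page *page)
	{
		return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)];
	}

so a never-updated link decodes as node 0 / zone 0, which is ZONE_DMA in
this configuration, and zone_spans_pfn() then fails for a ZONE_DMA32 pfn.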

> Interleave initialization of the unavailable pages with the normal
> initialization of the memory map, so that zone and node information is
> properly set on struct pages that are not backed by actual memory.
>
> With this change the pages for holes inside a zone will get proper
> zone/node links, and the pages that are not spanned by any node will get
> links to the adjacent zone/node. The holes between nodes will be
> prepended to the zone/node above the hole, except for the trailing pages
> in the last section, which will be appended to the zone/node below.
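
(A worked example of that last rule, if I read it right: with
PAGES_PER_SECTION = 0x8000, a zone ending at pfn 0x23a00 with the next
zone starting at pfn 0x28000 first gets the hole [0x23a00, 0x28000)
initialized with its own zone/node ids by the SPARSEMEM tail below, and
those pages are then re-initialized with the higher zone's ids when
memmap_init_zone() runs for that zone.  Only when there is no higher
zone do the trailing pages stay with the zone below.)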

> ...
>
> +#if !defined(CONFIG_FLAT_NODE_MEM_MAP)
> +/*
> + * Only struct pages that correspond to ranges defined by memblock.memory
> + * are zeroed and initialized by going through __init_single_page() during
> + * memmap_init_zone().
> + *
> + * But, there could be struct pages that correspond to holes in
> + * memblock.memory. This can happen because of the following reasons:
> + *  - physical memory bank size is not necessarily the exact multiple of the
> + *    arbitrary section size
> + *  - early reserved memory may not be listed in memblock.memory
> + *  - memory layouts defined with memmap= kernel parameter may not align
> + *    nicely with memmap sections
> + *
> + * Explicitly initialize those struct pages so that:
> + *  - PG_Reserved is set
> + *  - zone and node links point to zone and node that span the page if the
> + *    hole is in the middle of a zone
> + *  - zone and node links point to adjacent zone/node if the hole falls on
> + *    the zone boundary; the pages in such holes will be prepended to the
> + *    zone/node above the hole except for the trailing pages in the last
> + *    section that will be appended to the zone/node below.
> + */

The comment helps a lot.

>  void __meminit __weak memmap_init_zone(struct zone *zone)
>  {
>  	unsigned long zone_start_pfn = zone->zone_start_pfn;
>  	unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
>  	int i, nid = zone_to_nid(zone), zone_id = zone_idx(zone);
> +	static unsigned long hole_pfn = 0;

static implies that pgdat->node_zones[] is always sorted in ascending
pfn order.  Always true?

>  	unsigned long start_pfn, end_pfn;
> +	u64 pgcnt = 0;
>
>  	for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
>  		start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
> @@ -6295,7 +6348,29 @@ void __meminit __weak memmap_init_zone(struct zone *zone)
>  		memmap_init_range(end_pfn - start_pfn, nid,
>  				zone_id, start_pfn, zone_end_pfn,
>  				MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
> +
> +		if (hole_pfn < start_pfn)
> +			pgcnt += init_unavailable_range(hole_pfn, start_pfn,
> +							zone_id, nid);
> +		hole_pfn = end_pfn;
>  	}
> +
> +#ifdef CONFIG_SPARSEMEM
> +	/*
> +	 * Initialize the hole in the range [zone_end_pfn, section_end].
> +	 * If zone boundary falls in the middle of a section, this hole
> +	 * will be re-initialized during the call to this function for the
> +	 * higher zone.
> +	 */
> +	end_pfn = round_up(zone_end_pfn, PAGES_PER_SECTION);
> +	if (hole_pfn < end_pfn)
> +		pgcnt += init_unavailable_range(hole_pfn, end_pfn,
> +						zone_id, nid);
> +#endif
> +
> +	if (pgcnt)
> +		pr_info("  %s zone: %lld pages in unavailable ranges\n",
> +			zone->name, pgcnt);

I'll make that %llu.
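
i.e., assuming pgcnt stays u64:

	pr_info("  %s zone: %llu pages in unavailable ranges\n",
		zone->name, pgcnt);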