[ImgBot] Optimize images #2


Open
imgbot wants to merge 1 commit into master from imgbot

Conversation

@imgbot imgbot bot commented Apr 12, 2020 (edited)
Beep boop. Your images are optimized!

Your image file size has been reduced by 24% 🎉

Details

| File | Before | After | Percent reduction |
| --- | --- | --- | --- |
| /Documentation/RCU/Design/Memory-Ordering/rcu_node-lock.svg | 6.37kb | 1.72kb | 72.95% |
| /Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-invocation.svg | 16.36kb | 8.54kb | 47.80% |
| /Documentation/userspace-api/media/v4l/subdev-image-processing-crop.svg | 8.35kb | 4.64kb | 44.37% |
| /Documentation/RCU/Design/Data-Structures/TreeMapping.svg | 8.99kb | 5.17kb | 42.49% |
| /Documentation/RCU/Design/Data-Structures/BigTreeClassicRCU.svg | 12.41kb | 7.24kb | 41.72% |
| /Documentation/RCU/Design/Data-Structures/TreeMappingLevel.svg | 11.37kb | 6.78kb | 40.33% |
| /Documentation/RCU/Design/Data-Structures/nxtlist.svg | 11.45kb | 6.87kb | 40.04% |
| /Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-registry.svg | 23.37kb | 14.04kb | 39.92% |
| /Documentation/RCU/Design/Data-Structures/TreeLevel.svg | 22.98kb | 14.10kb | 38.65% |
| /Documentation/RCU/Design/Data-Structures/HugeTreeClassicRCU.svg | 24.77kb | 15.24kb | 38.45% |
| /Documentation/userspace-api/media/v4l/nv12mt.svg | 13.79kb | 8.54kb | 38.09% |
| /Documentation/RCU/Design/Data-Structures/blkd_task.svg | 20.36kb | 12.80kb | 37.13% |
| /Documentation/RCU/Design/Expedited-Grace-Periods/Funnel0.svg | 10.33kb | 6.56kb | 36.48% |
| /Documentation/RCU/Design/Expedited-Grace-Periods/Funnel1.svg | 10.33kb | 6.56kb | 36.47% |
| /Documentation/RCU/Design/Expedited-Grace-Periods/ExpRCUFlow.svg | 32.03kb | 20.36kb | 36.44% |
| /Documentation/RCU/Design/Expedited-Grace-Periods/ExpSchedFlow.svg | 31.91kb | 20.31kb | 36.34% |
| /Documentation/RCU/Design/Expedited-Grace-Periods/Funnel2.svg | 10.83kb | 6.91kb | 36.22% |
| /Documentation/RCU/Design/Data-Structures/BigTreePreemptRCUBHdyntickCB.svg | 22.47kb | 14.42kb | 35.80% |
| /Documentation/RCU/Design/Expedited-Grace-Periods/Funnel3.svg | 12.39kb | 7.96kb | 35.80% |
| /Documentation/RCU/Design/Expedited-Grace-Periods/Funnel4.svg | 12.39kb | 7.96kb | 35.79% |
| /Documentation/userspace-api/media/v4l/subdev-image-processing-scaling-multi-source.svg | 14.71kb | 9.45kb | 35.75% |
| /Documentation/RCU/Design/Expedited-Grace-Periods/Funnel8.svg | 11.85kb | 7.62kb | 35.73% |
| /Documentation/RCU/Design/Expedited-Grace-Periods/Funnel5.svg | 12.90kb | 8.31kb | 35.59% |
| /Documentation/RCU/Design/Expedited-Grace-Periods/Funnel6.svg | 12.91kb | 8.32kb | 35.57% |
| /Documentation/RCU/Design/Expedited-Grace-Periods/Funnel7.svg | 13.42kb | 8.67kb | 35.39% |
| /Documentation/RCU/Design/Requirements/GPpartitionReaders1.svg | 16.87kb | 10.93kb | 35.24% |
| /Documentation/i2c/i2c_bus.svg | 54.70kb | 35.52kb | 35.07% |
| /Documentation/RCU/Design/Memory-Ordering/TreeRCU-dyntick.svg | 25.07kb | 16.36kb | 34.74% |
| /Documentation/RCU/Design/Requirements/ReadersPartitionGP1.svg | 29.35kb | 19.33kb | 34.12% |
| /Documentation/userspace-api/media/v4l/subdev-image-processing-full.svg | 20.06kb | 13.24kb | 33.98% |
| /Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg | 28.02kb | 18.64kb | 33.49% |
| /Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-init-1.svg | 23.51kb | 15.75kb | 33.01% |
| /Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-init-3.svg | 22.62kb | 15.48kb | 31.55% |
| /Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-init-2.svg | 23.82kb | 16.59kb | 30.36% |
| /Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-cleanup.svg | 42.46kb | 29.65kb | 30.17% |
| /Documentation/RCU/Design/Memory-Ordering/TreeRCU-qs.svg | 43.21kb | 30.37kb | 29.73% |
| /Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg | 49.63kb | 34.93kb | 29.61% |
| /Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg | 208.62kb | 148.33kb | 28.90% |
| /Documentation/userspace-api/media/v4l/vbi_hsync.svg | 18.03kb | 12.94kb | 28.24% |
| /Documentation/userspace-api/media/v4l/crop.svg | 17.82kb | 12.94kb | 27.38% |
| /Documentation/doc-guide/svg_image.svg | 0.57kb | 0.42kb | 25.34% |
| /Documentation/userspace-api/media/v4l/fieldseq_bt.svg | 169.89kb | 127.26kb | 25.09% |
| /Documentation/userspace-api/media/v4l/nv12mt_example.svg | 44.88kb | 33.62kb | 25.08% |
| /Documentation/userspace-api/media/v4l/fieldseq_tb.svg | 171.23kb | 128.29kb | 25.08% |
| /Documentation/userspace-api/media/v4l/constraints.svg | 7.58kb | 5.93kb | 21.86% |
| /Documentation/userspace-api/media/v4l/vbi_625.svg | 58.37kb | 46.08kb | 21.05% |
| /Documentation/userspace-api/media/v4l/vbi_525.svg | 53.88kb | 42.76kb | 20.64% |
| /Documentation/userspace-api/media/dvb/dvbstb.svg | 10.02kb | 8.30kb | 17.17% |
| /Documentation/admin-guide/blockdev/drbd/DRBD-data-packets.svg | 17.02kb | 14.43kb | 15.24% |
| /Documentation/admin-guide/blockdev/drbd/DRBD-8.3-data-packets.svg | 21.70kb | 18.39kb | 15.21% |
| /Documentation/userspace-api/media/v4l/bayer.svg | 19.40kb | 17.73kb | 8.63% |
| /Documentation/userspace-api/media/typical_media_device.svg | 80.82kb | 73.98kb | 8.46% |
| /Documentation/input/interactive.svg | 3.32kb | 3.22kb | 2.97% |
| /Documentation/userspace-api/media/v4l/selection.svg | 204.45kb | 199.05kb | 2.64% |
| /Documentation/input/shape.svg | 5.55kb | 5.41kb | 2.52% |
| /Documentation/admin-guide/media/ipu3_rcb.svg | 75.50kb | 73.70kb | 2.39% |
| /Documentation/networking/tls-offload-reorder-good.svg | 6.38kb | 6.28kb | 1.55% |
| /Documentation/networking/tls-offload-reorder-bad.svg | 6.38kb | 6.28kb | 1.55% |
| /Documentation/networking/tls-offload-layers.svg | 49.03kb | 48.87kb | 0.33% |
| /Documentation/logo.gif | 15.95kb | 15.92kb | 0.23% |
| **Total** | **2,034.75kb** | **1,545.99kb** | **24.02%** |

Black Lives Matter | 💰 donate | 🎓 learn | ✍🏾 sign

📝 docs | :octocat: repo | 🙋🏾 issues | 🏅 swag | 🏪 marketplace

pull bot pushed a commit that referenced this pull request Apr 26, 2020
FuzzUSB (a variant of syzkaller) found a free-while-still-in-use bug
in the USB scatter-gather library:
BUG: KASAN: use-after-free in atomic_read
include/asm-generic/atomic-instrumented.h:26 [inline]
BUG: KASAN: use-after-free in usb_hcd_unlink_urb+0x5f/0x170
drivers/usb/core/hcd.c:1607
Read of size 4 at addr ffff888065379610 by task kworker/u4:1/27
CPU: 1 PID: 27 Comm: kworker/u4:1 Not tainted 5.5.11 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.10.2-1ubuntu1 04/01/2014
Workqueue: scsi_tmf_2 scmd_eh_abort_handler
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xce/0x128 lib/dump_stack.c:118
 print_address_description.constprop.4+0x21/0x3c0 mm/kasan/report.c:374
 __kasan_report+0x153/0x1cb mm/kasan/report.c:506
 kasan_report+0x12/0x20 mm/kasan/common.c:639
 check_memory_region_inline mm/kasan/generic.c:185 [inline]
 check_memory_region+0x152/0x1b0 mm/kasan/generic.c:192
 __kasan_check_read+0x11/0x20 mm/kasan/common.c:95
 atomic_read include/asm-generic/atomic-instrumented.h:26 [inline]
 usb_hcd_unlink_urb+0x5f/0x170 drivers/usb/core/hcd.c:1607
 usb_unlink_urb+0x72/0xb0 drivers/usb/core/urb.c:657
 usb_sg_cancel+0x14e/0x290 drivers/usb/core/message.c:602
 usb_stor_stop_transport+0x5e/0xa0 drivers/usb/storage/transport.c:937
This bug occurs when cancellation of the S-G transfer races with
transfer completion. When that happens, usb_sg_cancel() may continue
to access the transfer's URBs after usb_sg_wait() has freed them.
The bug is caused by the fact that usb_sg_cancel() does not take any
sort of reference to the transfer, and so there is nothing to prevent
the URBs from being deallocated while the routine is trying to use
them. The fix is to take such a reference by incrementing the
transfer's io->count field while the cancellation is in progress and
decrementing it afterward. The transfer's URBs are not deallocated
until io->complete is triggered, which happens when io->count reaches
zero.
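The refcounting pattern the fix relies on can be modeled in a minimal userspace sketch. This is not the actual driver code: the struct and function names are hypothetical stand-ins for the kernel's usb_sg_request handling, using C11 atomics in place of the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a scatter-gather request: the URBs stay
 * allocated until the reference count drops to zero. */
struct sg_request {
	atomic_int count;	/* outstanding references */
	bool urbs_freed;	/* set when the last reference drops */
};

static void sg_put(struct sg_request *io)
{
	/* "Free" the URBs only when the final reference is released. */
	if (atomic_fetch_sub(&io->count, 1) == 1)
		io->urbs_freed = true;
}

static void sg_cancel(struct sg_request *io)
{
	/* The fix: take a reference for the duration of cancellation,
	 * so a racing completion cannot free the URBs under us. */
	atomic_fetch_add(&io->count, 1);
	assert(!io->urbs_freed);	/* safe to touch the URBs here */
	sg_put(io);			/* drop our reference afterward */
}

/* Run the race scenario: cancel while one completion reference is
 * still outstanding; the URBs must survive the cancellation and be
 * freed exactly when the last reference goes away. */
static bool scenario_holds(void)
{
	struct sg_request io = { .count = 1, .urbs_freed = false };

	sg_cancel(&io);
	if (io.urbs_freed)
		return false;	/* must still be alive here */
	sg_put(&io);		/* final completion drops the last ref */
	return io.urbs_freed;
}
```

The real patch does the equivalent by incrementing io->count around the cancellation, with the URBs freed only once io->count reaches zero and io->complete fires.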
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Kyungtae Kim <kt0755@gmail.com>
CC: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/Pine.LNX.4.44L0.2003281615140.14837-100000@netrider.rowland.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pull bot pushed a commit that referenced this pull request May 3, 2020
...f fs_info::journal_info
[BUG]
One run of btrfs/063 triggered the following lockdep warning:
 ============================================
 WARNING: possible recursive locking detected
 5.6.0-rc7-custom+ #48 Not tainted
 --------------------------------------------
 kworker/u24:0/7 is trying to acquire lock:
 ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs]
 but task is already holding lock:
 ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs]
 other info that might help us debug this:
 Possible unsafe locking scenario:
 CPU0
 ----
 lock(sb_internal#2);
 lock(sb_internal#2);
 *** DEADLOCK ***
 May be due to missing lock nesting notation
 4 locks held by kworker/u24:0/7:
 #0: ffff88817b495948 ((wq_completion)btrfs-endio-write){+.+.}, at: process_one_work+0x557/0xb80
 #1: ffff888189ea7db8 ((work_completion)(&work->normal_work)){+.+.}, at: process_one_work+0x557/0xb80
 #2: ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs]
 #3: ffff888174ca4da8 (&fs_info->reloc_mutex){+.+.}, at: btrfs_record_root_in_trans+0x83/0xd0 [btrfs]
 stack backtrace:
 CPU: 0 PID: 7 Comm: kworker/u24:0 Not tainted 5.6.0-rc7-custom+ #48
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
 Call Trace:
 dump_stack+0xc2/0x11a
 __lock_acquire.cold+0xce/0x214
 lock_acquire+0xe6/0x210
 __sb_start_write+0x14e/0x290
 start_transaction+0x66c/0x890 [btrfs]
 btrfs_join_transaction+0x1d/0x20 [btrfs]
 find_free_extent+0x1504/0x1a50 [btrfs]
 btrfs_reserve_extent+0xd5/0x1f0 [btrfs]
 btrfs_alloc_tree_block+0x1ac/0x570 [btrfs]
 btrfs_copy_root+0x213/0x580 [btrfs]
 create_reloc_root+0x3bd/0x470 [btrfs]
 btrfs_init_reloc_root+0x2d2/0x310 [btrfs]
 record_root_in_trans+0x191/0x1d0 [btrfs]
 btrfs_record_root_in_trans+0x90/0xd0 [btrfs]
 start_transaction+0x16e/0x890 [btrfs]
 btrfs_join_transaction+0x1d/0x20 [btrfs]
 btrfs_finish_ordered_io+0x55d/0xcd0 [btrfs]
 finish_ordered_fn+0x15/0x20 [btrfs]
 btrfs_work_helper+0x116/0x9a0 [btrfs]
 process_one_work+0x632/0xb80
 worker_thread+0x80/0x690
 kthread+0x1a3/0x1f0
 ret_from_fork+0x27/0x50
It's pretty hard to reproduce, only one hit so far.
[CAUSE]
This is because we're calling btrfs_join_transaction() without re-using
the current running one:
btrfs_finish_ordered_io()
|- btrfs_join_transaction()		<<< Call #1
 |- btrfs_record_root_in_trans()
 |- btrfs_reserve_extent()
	 |- btrfs_join_transaction()	<<< Call #2
Normally such btrfs_join_transaction() call should re-use the existing
one, without trying to re-start a transaction.
But the problem is, in btrfs_join_transaction() call #1, we call
btrfs_record_root_in_trans() before initializing current::journal_info.
And in btrfs_join_transaction() call #2, we're relying on
current::journal_info to avoid such deadlock.
[FIX]
Call btrfs_record_root_in_trans() after we have initialized
current::journal_info.
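The reuse logic can be sketched as a tiny userspace model (hypothetical names; the real start_transaction() in fs/btrfs/transaction.c is far more involved). The point is only the ordering: the thread-local journal_info must be set before any work that may re-enter the join path.

```c
#include <assert.h>
#include <stddef.h>

struct transaction { int joiners; };

/* Models current->journal_info: a per-task pointer to the
 * transaction handle this task already holds, if any. */
static struct transaction *journal_info;
static struct transaction running_trans;

static struct transaction *join_transaction(void)
{
	/* A re-entrant join must reuse the existing handle instead of
	 * trying to start (and deadlock on) a second one. */
	if (journal_info)
		return journal_info;

	journal_info = &running_trans;	/* set BEFORE re-entrant work */
	running_trans.joiners++;
	/* ... record_root_in_trans() etc. may call join again here,
	 * and will now take the reuse branch above ... */
	return &running_trans;
}

/* A nested join must return the same handle without starting a
 * second transaction. */
static int nested_join_reuses(void)
{
	struct transaction *a = join_transaction();
	struct transaction *b = join_transaction();

	return a == b && a->joiners == 1;
}
```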
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
pull bot pushed a commit that referenced this pull request May 7, 2020
...kernel/git/kvmarm/kvmarm into kvm-master
KVM/arm fixes for Linux 5.7, take #2
- Fix compilation with Clang
- Correctly initialize GICv4.1 in the absence of a virtual ITS
- Move SP_EL0 save/restore to the guest entry/exit code
- Handle PC wrap around on 32bit guests, and narrow all 32bit
 registers on userspace access
pull bot pushed a commit that referenced this pull request May 8, 2020
Since 5.7-rc1, on btrfs we have a percpu counter initialization for
which we always pass a GFP_KERNEL gfp_t argument (this happens since
commit 2992df7 ("btrfs: Implement DREW lock")).
That is safe in some contexts but not in others where allowing fs
reclaim could lead to a deadlock because we are either holding some
btrfs lock needed for a transaction commit or holding a btrfs
transaction handle open. Because of that we surround the call to the
function that initializes the percpu counter with a NOFS context using
memalloc_nofs_save() (this is done at btrfs_init_fs_root()).
However it turns out that this is not enough to prevent a possible
deadlock because percpu_alloc() determines if it is in an atomic context
by looking exclusively at the gfp flags passed to it (GFP_KERNEL in this
case) and it is not aware that a NOFS context is set.
Because percpu_alloc() thinks it is in a non-atomic context it locks the
pcpu_alloc_mutex. This can result in a btrfs deadlock when
pcpu_balance_workfn() is running, has acquired that mutex and is waiting
for reclaim, while the btrfs task that called percpu_counter_init() (and
therefore percpu_alloc()) is holding either the btrfs commit_root
semaphore or a transaction handle (done in fs/btrfs/backref.c:
iterate_extent_inodes()), which prevents reclaim from finishing as an
attempt to commit the current btrfs transaction will deadlock.
Lockdep reports this issue with the following trace:
 ======================================================
 WARNING: possible circular locking dependency detected
 5.6.0-rc7-btrfs-next-77 #1 Not tainted
 ------------------------------------------------------
 kswapd0/91 is trying to acquire lock:
 ffff8938a3b3fdc8 (&delayed_node->mutex){+.+.}, at: __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
 but task is already holding lock:
 ffffffffb4f0dbc0 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
 which lock already depends on the new lock.
 the existing dependency chain (in reverse order) is:
 -> #4 (fs_reclaim){+.+.}:
 fs_reclaim_acquire.part.0+0x25/0x30
 __kmalloc+0x5f/0x3a0
 pcpu_create_chunk+0x19/0x230
 pcpu_balance_workfn+0x56a/0x680
 process_one_work+0x235/0x5f0
 worker_thread+0x50/0x3b0
 kthread+0x120/0x140
 ret_from_fork+0x3a/0x50
 -> #3 (pcpu_alloc_mutex){+.+.}:
 __mutex_lock+0xa9/0xaf0
 pcpu_alloc+0x480/0x7c0
 __percpu_counter_init+0x50/0xd0
 btrfs_drew_lock_init+0x22/0x70 [btrfs]
 btrfs_get_fs_root+0x29c/0x5c0 [btrfs]
 resolve_indirect_refs+0x120/0xa30 [btrfs]
 find_parent_nodes+0x50b/0xf30 [btrfs]
 btrfs_find_all_leafs+0x60/0xb0 [btrfs]
 iterate_extent_inodes+0x139/0x2f0 [btrfs]
 iterate_inodes_from_logical+0xa1/0xe0 [btrfs]
 btrfs_ioctl_logical_to_ino+0xb4/0x190 [btrfs]
 btrfs_ioctl+0x165a/0x3130 [btrfs]
 ksys_ioctl+0x87/0xc0
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x5c/0x260
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
 -> #2 (&fs_info->commit_root_sem){++++}:
 down_write+0x38/0x70
 btrfs_cache_block_group+0x2ec/0x500 [btrfs]
 find_free_extent+0xc6a/0x1600 [btrfs]
 btrfs_reserve_extent+0x9b/0x180 [btrfs]
 btrfs_alloc_tree_block+0xc1/0x350 [btrfs]
 alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
 __btrfs_cow_block+0x122/0x5a0 [btrfs]
 btrfs_cow_block+0x106/0x240 [btrfs]
 commit_cowonly_roots+0x55/0x310 [btrfs]
 btrfs_commit_transaction+0x509/0xb20 [btrfs]
 sync_filesystem+0x74/0x90
 generic_shutdown_super+0x22/0x100
 kill_anon_super+0x14/0x30
 btrfs_kill_super+0x12/0x20 [btrfs]
 deactivate_locked_super+0x31/0x70
 cleanup_mnt+0x100/0x160
 task_work_run+0x93/0xc0
 exit_to_usermode_loop+0xf9/0x100
 do_syscall_64+0x20d/0x260
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
 -> #1 (&space_info->groups_sem){++++}:
 down_read+0x3c/0x140
 find_free_extent+0xef6/0x1600 [btrfs]
 btrfs_reserve_extent+0x9b/0x180 [btrfs]
 btrfs_alloc_tree_block+0xc1/0x350 [btrfs]
 alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
 __btrfs_cow_block+0x122/0x5a0 [btrfs]
 btrfs_cow_block+0x106/0x240 [btrfs]
 btrfs_search_slot+0x50c/0xd60 [btrfs]
 btrfs_lookup_inode+0x3a/0xc0 [btrfs]
 __btrfs_update_delayed_inode+0x90/0x280 [btrfs]
 __btrfs_commit_inode_delayed_items+0x81f/0x870 [btrfs]
 __btrfs_run_delayed_items+0x8e/0x180 [btrfs]
 btrfs_commit_transaction+0x31b/0xb20 [btrfs]
 iterate_supers+0x87/0xf0
 ksys_sync+0x60/0xb0
 __ia32_sys_sync+0xa/0x10
 do_syscall_64+0x5c/0x260
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
 -> #0 (&delayed_node->mutex){+.+.}:
 __lock_acquire+0xef0/0x1c80
 lock_acquire+0xa2/0x1d0
 __mutex_lock+0xa9/0xaf0
 __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
 btrfs_evict_inode+0x40d/0x560 [btrfs]
 evict+0xd9/0x1c0
 dispose_list+0x48/0x70
 prune_icache_sb+0x54/0x80
 super_cache_scan+0x124/0x1a0
 do_shrink_slab+0x176/0x440
 shrink_slab+0x23a/0x2c0
 shrink_node+0x188/0x6e0
 balance_pgdat+0x31d/0x7f0
 kswapd+0x238/0x550
 kthread+0x120/0x140
 ret_from_fork+0x3a/0x50
 other info that might help us debug this:
 Chain exists of:
 &delayed_node->mutex --> pcpu_alloc_mutex --> fs_reclaim
 Possible unsafe locking scenario:
 CPU0 CPU1
 ---- ----
 lock(fs_reclaim);
 lock(pcpu_alloc_mutex);
 lock(fs_reclaim);
 lock(&delayed_node->mutex);
 *** DEADLOCK ***
 3 locks held by kswapd0/91:
 #0: (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
 #1: (shrinker_rwsem){++++}, at: shrink_slab+0x12f/0x2c0
 #2: (&type->s_umount_key#43){++++}, at: trylock_super+0x16/0x50
 stack backtrace:
 CPU: 1 PID: 91 Comm: kswapd0 Not tainted 5.6.0-rc7-btrfs-next-77 #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
 Call Trace:
 dump_stack+0x8f/0xd0
 check_noncircular+0x170/0x190
 __lock_acquire+0xef0/0x1c80
 lock_acquire+0xa2/0x1d0
 __mutex_lock+0xa9/0xaf0
 __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
 btrfs_evict_inode+0x40d/0x560 [btrfs]
 evict+0xd9/0x1c0
 dispose_list+0x48/0x70
 prune_icache_sb+0x54/0x80
 super_cache_scan+0x124/0x1a0
 do_shrink_slab+0x176/0x440
 shrink_slab+0x23a/0x2c0
 shrink_node+0x188/0x6e0
 balance_pgdat+0x31d/0x7f0
 kswapd+0x238/0x550
 kthread+0x120/0x140
 ret_from_fork+0x3a/0x50
This could be fixed by making btrfs pass GFP_NOFS instead of GFP_KERNEL
to percpu_counter_init() in contexts where it is not reclaim safe,
however that type of approach is discouraged since
memalloc_[nofs|noio]_save() were introduced. Therefore this change
makes pcpu_alloc() check for an existing nofs/noio context before
deciding whether it is in an atomic context or not.
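The idea behind the fix can be modeled in userspace. In this sketch GFP_FS and the nofs flag are simplified stand-ins for the kernel's gfp bits and the memalloc_nofs_save() task state; the kernel's real helper for applying the task's memalloc flags is current_gfp_context(), mimicked here.

```c
#include <assert.h>
#include <stdbool.h>

#define GFP_FS		0x1	/* simplified "fs reclaim allowed" bit */
#define GFP_KERNEL	GFP_FS

/* Models the state set by memalloc_nofs_save() on the current task. */
static bool task_in_nofs_context;

/* Before the fix, the allocator looked only at the flags it was
 * handed; after the fix, it first applies the task's context. */
static unsigned int current_gfp_context(unsigned int gfp)
{
	if (task_in_nofs_context)
		gfp &= ~GFP_FS;	/* fs reclaim is forbidden here */
	return gfp;
}

static bool alloc_may_enter_fs_reclaim(unsigned int gfp)
{
	return (current_gfp_context(gfp) & GFP_FS) != 0;
}

/* GFP_KERNEL normally allows fs reclaim, but not while the task is
 * inside a nofs section. */
static bool nofs_masks_fs_reclaim(void)
{
	bool ok;

	if (!alloc_may_enter_fs_reclaim(GFP_KERNEL))
		return false;
	task_in_nofs_context = true;
	ok = !alloc_may_enter_fs_reclaim(GFP_KERNEL);
	task_in_nofs_context = false;
	return ok;
}
```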
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Link: http://lkml.kernel.org/r/20200430164356.15543-1-fdmanana@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pull bot pushed a commit that referenced this pull request May 10, 2020
abs_vdebt is an atomic_64 which tracks how much over budget a given cgroup
is and controls the activation of use_delay mechanism. Once a cgroup goes
over budget from forced IOs, it has to pay it back with its future budget.
The progress guarantee on debt paying comes from the iocg being active -
active iocgs are processed by the periodic timer, which ensures that as time
passes the debts dissipate and the iocg returns to normal operation.
However, both iocg activation and vdebt handling are asynchronous and a
sequence like the following may happen.
1. The iocg is in the process of being deactivated by the periodic timer.
2. A bio enters ioc_rqos_throttle(), calls iocg_activate() which returns
 without anything because it still sees that the iocg is already active.
3. The iocg is deactivated.
4. The bio from #2 is over budget but needs to be forced. It increases
 abs_vdebt and goes over the threshold and enables use_delay.
5. IO control is enabled for the iocg's subtree and now IOs are attributed
 to the descendant cgroups and the iocg itself no longer issues IOs.
This leaves the iocg with stuck abs_vdebt - it has debt but inactive and no
further IOs which can activate it. This can end up unduly punishing all the
descendants cgroups.
The usual throttling path has the same issue - the iocg must be active while
throttled to ensure that future event will wake it up - and solves the
problem by synchronizing the throttling path with a spinlock. abs_vdebt
handling is another form of overage handling and shares a lot of
characteristics including the fact that it isn't in the hottest path.
This patch fixes the above and other possible races by strictly
synchronizing abs_vdebt and use_delay handling with iocg->waitq.lock.
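A loose sketch of the invariant the patch enforces (hypothetical names; the real code lives in block/blk-iocost.c): abs_vdebt is only touched under the lock, and charging debt keeps the iocg active in the same critical section, so the periodic timer is guaranteed to see and pay down the debt. The "lock" here is a plain flag purely to make the lock discipline checkable.

```c
#include <assert.h>
#include <stdbool.h>

struct iocg {
	bool lock_held;	/* models iocg->waitq.lock */
	bool active;	/* on the periodic timer's list? */
	long abs_vdebt;
};

static void iocg_lock(struct iocg *g)   { assert(!g->lock_held); g->lock_held = true; }
static void iocg_unlock(struct iocg *g) { assert(g->lock_held);  g->lock_held = false; }

/* Charge debt and (re)activate the iocg in one critical section,
 * closing the window where debt is added to an inactive iocg. */
static void iocg_charge_debt(struct iocg *g, long amount)
{
	iocg_lock(g);
	if (!g->active)
		g->active = true;
	g->abs_vdebt += amount;
	iocg_unlock(g);
}

/* After a charge the iocg is active, the debt is recorded, and the
 * lock has been released. */
static int charge_activates_under_lock(void)
{
	struct iocg g = { .lock_held = false, .active = false, .abs_vdebt = 0 };

	iocg_charge_debt(&g, 10);
	return g.active && g.abs_vdebt == 10 && !g.lock_held;
}
```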
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Vlad Dmitriev <vvd@fb.com>
Cc: stable@vger.kernel.org # v5.4+
Fixes: e1518f6 ("blk-iocost: Don't let merges push vtime into the future")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
pull bot pushed a commit that referenced this pull request May 24, 2020
This BUG halt was reported a while back, but the patch somehow got
missed:
PID: 2879 TASK: c16adaa0 CPU: 1 COMMAND: "sctpn"
 #0 [f418dd28] crash_kexec at c04a7d8c
 #1 [f418dd7c] oops_end at c0863e02
 #2 [f418dd90] do_invalid_op at c040aaca
 #3 [f418de28] error_code (via invalid_op) at c08631a5
 EAX: f34baac0 EBX: 00000090 ECX: f418deb0 EDX: f5542950 EBP: 00000000
 DS: 007b ESI: f34ba800 ES: 007b EDI: f418dea0 GS: 00e0
 CS: 0060 EIP: c046fa5e ERR: ffffffff EFLAGS: 00010286
 #4 [f418de5c] add_timer at c046fa5e
 #5 [f418de68] sctp_do_sm at f8db8c77 [sctp]
 #6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp]
 #7 [f418df48] inet_shutdown at c080baf9
 #8 [f418df5c] sys_shutdown at c079eedf
 #9 [f418df7] sys_socketcall at c079fe88
 EAX: ffffffda EBX: 0000000d ECX: bfceea90 EDX: 0937af98
 DS: 007b ESI: 0000000c ES: 007b EDI: b7150ae4
 SS: 007b ESP: bfceea7c EBP: bfceeaa8 GS: 0033
 CS: 0073 EIP: b775c424 ERR: 00000066 EFLAGS: 00000282
It appears that the side effect that starts the shutdown timer was processed
multiple times, which can happen as multiple paths can trigger it. This of
course leads to the BUG halt in add_timer getting called.
The fix seems pretty straightforward: just check, before the timer is
added, whether it has already been started. If it has, mod the timer
instead to min(current expiration, new expiration).
It's been tested but not confirmed to fix the problem, as the issue has
only occurred in production environments where test kernels are enjoined
from being installed. It appears to be a sane fix to me though. Also,
recently, Jere found a reproducer posted on list to confirm that this
resolves the issue.
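The fix described above can be sketched as a minimal userspace model (hypothetical names standing in for the kernel's timer_pending()/add_timer()/mod_timer()):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a kernel timer: add_timer() on an already-pending
 * timer is the BUG the patch avoids. */
struct sctp_timer {
	bool pending;
	unsigned long expires;
};

static void start_shutdown_timer(struct sctp_timer *t, unsigned long expires)
{
	if (t->pending) {
		/* Already started by another path: don't add_timer()
		 * again (that would BUG); keep the earlier deadline,
		 * i.e. min(current expiration, new expiration). */
		if (expires < t->expires)
			t->expires = expires;
		return;
	}
	t->pending = true;
	t->expires = expires;
}

/* Two paths trigger the same side effect: the timer stays pending
 * with the earliest of the requested expirations. */
static bool duplicate_start_keeps_earliest(void)
{
	struct sctp_timer t = { false, 0 };

	start_shutdown_timer(&t, 100);
	start_shutdown_timer(&t, 50);	/* second path, earlier deadline */
	if (!t.pending || t.expires != 50)
		return false;
	start_shutdown_timer(&t, 200);	/* later deadline is ignored */
	return t.expires == 50;
}
```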
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: jere.leppanen@nokia.com
CC: marcelo.leitner@gmail.com
CC: netdev@vger.kernel.org
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pull bot pushed a commit that referenced this pull request May 24, 2020
Ido Schimmel says:
====================
netdevsim: Two small fixes
Fix two bugs observed while analyzing regression failures.
Patch #1 fixes a bug where sometimes the drop counter of a packet trap
policer would not increase.
Patch #2 adds a missing initialization of a variable in a related
selftest.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
pull bot pushed a commit that referenced this pull request May 24, 2020
Ido Schimmel says:
====================
mlxsw: Various fixes
Patch #1 from Jiri fixes a use-after-free discovered while fuzzing mlxsw
/ devlink with syzkaller.
Patch #2 from Amit works around a limitation in new versions of arping,
which is used in several selftests.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
pull bot pushed a commit that referenced this pull request May 24, 2020
...inux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Fix a warning and a leak [ver #2]
Here are a couple of fixes for AF_RXRPC:
 (1) Fix an uninitialised variable warning.
 (2) Fix a leak of the ticket on error in rxkad.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
pull bot pushed a commit that referenced this pull request May 31, 2020
Be there a platform with the following layout:
 Regular NIC
 |
 +----> DSA master for switch port
 |
 +----> DSA master for another switch port
After changing DSA back to static lockdep class keys in commit
1a33e10 ("net: partially revert dynamic lockdep key changes"), this
kernel splat can be seen:
[ 13.361198] ============================================
[ 13.366524] WARNING: possible recursive locking detected
[ 13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted
[ 13.377874] --------------------------------------------
[ 13.383201] swapper/0/0 is trying to acquire lock:
[ 13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[ 13.397879]
[ 13.397879] but task is already holding lock:
[ 13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[ 13.413593]
[ 13.413593] other info that might help us debug this:
[ 13.420140] Possible unsafe locking scenario:
[ 13.420140]
[ 13.426075] CPU0
[ 13.428523] ----
[ 13.430969] lock(&dsa_slave_netdev_xmit_lock_key);
[ 13.435946] lock(&dsa_slave_netdev_xmit_lock_key);
[ 13.440924]
[ 13.440924] *** DEADLOCK ***
[ 13.440924]
[ 13.446860] May be due to missing lock nesting notation
[ 13.446860]
[ 13.453668] 6 locks held by swapper/0/0:
[ 13.457598] #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400
[ 13.466593] #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560
[ 13.474803] #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10
[ 13.483886] #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
[ 13.492793] #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[ 13.503094] #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
[ 13.512000]
[ 13.512000] stack backtrace:
[ 13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988
[ 13.530421] Call trace:
[ 13.532871] dump_backtrace+0x0/0x1d8
[ 13.536539] show_stack+0x24/0x30
[ 13.539862] dump_stack+0xe8/0x150
[ 13.543271] __lock_acquire+0x1030/0x1678
[ 13.547290] lock_acquire+0xf8/0x458
[ 13.550873] _raw_spin_lock+0x44/0x58
[ 13.554543] __dev_queue_xmit+0x84c/0xbe0
[ 13.558562] dev_queue_xmit+0x24/0x30
[ 13.562232] dsa_slave_xmit+0xe0/0x128
[ 13.565988] dev_hard_start_xmit+0xf4/0x448
[ 13.570182] __dev_queue_xmit+0x808/0xbe0
[ 13.574200] dev_queue_xmit+0x24/0x30
[ 13.577869] neigh_resolve_output+0x15c/0x220
[ 13.582237] ip6_finish_output2+0x244/0xb10
[ 13.586430] __ip6_finish_output+0x1dc/0x298
[ 13.590709] ip6_output+0x84/0x358
[ 13.594116] mld_sendpack+0x2bc/0x560
[ 13.597786] mld_ifc_timer_expire+0x210/0x390
[ 13.602153] call_timer_fn+0xcc/0x400
[ 13.605822] run_timer_softirq+0x588/0x6e0
[ 13.609927] __do_softirq+0x118/0x590
[ 13.613597] irq_exit+0x13c/0x148
[ 13.616918] __handle_domain_irq+0x6c/0xc0
[ 13.621023] gic_handle_irq+0x6c/0x160
[ 13.624779] el1_irq+0xbc/0x180
[ 13.627927] cpuidle_enter_state+0xb4/0x4d0
[ 13.632120] cpuidle_enter+0x3c/0x50
[ 13.635703] call_cpuidle+0x44/0x78
[ 13.639199] do_idle+0x228/0x2c8
[ 13.642433] cpu_startup_entry+0x2c/0x48
[ 13.646363] rest_init+0x1ac/0x280
[ 13.649773] arch_call_rest_init+0x14/0x1c
[ 13.653878] start_kernel+0x490/0x4bc
Lockdep keys themselves were added in commit ab92d68 ("net: core:
add generic lockdep keys"), and it's very likely that this splat existed
since then, but I have no real way to check, since this stacked platform
wasn't supported by mainline back then.
From Taehee's own words:
 This patch was considered that all stackable devices have LLTX flag.
 But the dsa doesn't have LLTX, so this splat happened.
 After this patch, dsa shares the same lockdep class key.
 On the nested dsa interface architecture, which you illustrated,
 the same lockdep class key will be used in __dev_queue_xmit() because
 dsa doesn't have LLTX.
 So that lockdep detects deadlock because the same lockdep class key is
 used recursively although actually the different locks are used.
 There are some ways to fix this problem.
 1. using NETIF_F_LLTX flag.
 If possible, using the LLTX flag is a very clear way for it.
 But I'm so sorry I don't know whether the dsa could have LLTX or not.
 2. using dynamic lockdep again.
 It means that each interface uses a separate lockdep class key.
 So, lockdep will not detect recursive locking.
 But this way has a problem that it could consume lockdep class key
 too many.
 Currently, lockdep can have 8192 lockdep class keys.
 - you can see this number with the following command.
 cat /proc/lockdep_stats
 lock-classes: 1251 [max: 8192]
 ...
 The [max: 8192] is the maximum number of lockdep class keys.
 If too many lockdep class keys are registered, lockdep stops working.
 So, using a dynamic(separated) lockdep class key should be considered
 carefully.
 In addition, updating lockdep class key routine might have to be existing.
 (lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key())
 3. Using lockdep subclass.
 A lockdep class key could have 8 subclasses.
 The different subclass is considered different locks by lockdep
 infrastructure.
 But "lock-classes" is not counted by subclasses.
 So, it could avoid stopping lockdep infrastructure by an overflow of
 lockdep class keys.
 This approach should also have an updating lockdep class key routine.
 (lockdep_set_subclass())
 4. Using nonvalidate lockdep class key.
 The lockdep infrastructure supports nonvalidate lockdep class key type.
 It means this lockdep is not validated by lockdep infrastructure.
 So, the splat will not happen but lockdep couldn't detect real deadlock
 case because lockdep really doesn't validate it.
 I think this should be used for really special cases.
 (lockdep_set_novalidate_class())
Further discussion here:
https://patchwork.ozlabs.org/project/netdev/patch/20200503052220.4536-2-xiyou.wangcong@gmail.com/
There appears to be no negative side-effect to declaring lockless TX for
the DSA virtual interfaces, which means they handle their own locking.
So that's what we do to make the splat go away.
Patch tested in a wide variety of cases: unicast, multicast, PTP, etc.
Fixes: ab92d68 ("net: core: add generic lockdep keys")
Suggested-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pull bot pushed a commit that referenced this pull request Jun 1, 2020
Removing the pcrypt module triggers this:
 general protection fault, probably for non-canonical
 address 0xdead000000000122
 CPU: 5 PID: 264 Comm: modprobe Not tainted 5.6.0+ #2
 Hardware name: QEMU Standard PC
 RIP: 0010:__cpuhp_state_remove_instance+0xcc/0x120
 Call Trace:
 padata_sysfs_release+0x74/0xce
 kobject_put+0x81/0xd0
 padata_free+0x12/0x20
 pcrypt_exit+0x43/0x8ee [pcrypt]
padata instances wrongly use the same hlist node for the online and dead
states, so __padata_free()'s second cpuhp remove call chokes on the node
that the first poisoned.
cpuhp multi-instance callbacks only walk forward in cpuhp_step->list and
the same node is linked in both the online and dead lists, so the list
corruption that results from padata_alloc() adding the node to a second
list without removing it from the first doesn't cause problems as long
as no instances are freed.
Avoid the issue by giving each state its own node.
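The shape of the fix can be illustrated with a tiny intrusive-list model (a simplified stand-in for the kernel's hlist; these are not the real padata structures): with one node per cpuhp state, linking the instance into the "dead" list can never clobber its position in the "online" list.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive singly linked list standing in for hlist. */
struct node { struct node *next; };

struct padata_instance {
	/* The fix: one node per cpuhp state, so each list gets its
	 * own link fields and the two states never interfere. */
	struct node cpu_online_node;
	struct node cpu_dead_node;
};

static void list_push(struct node **head, struct node *n)
{
	n->next = *head;
	*head = n;
}

static int list_contains(struct node *head, struct node *n)
{
	for (; head; head = head->next)
		if (head == n)
			return 1;
	return 0;
}

/* One instance sits on both lists at once; each membership is intact
 * and pushing onto the dead list left the online link untouched. */
static int separate_nodes_ok(void)
{
	struct node *online = NULL, *dead = NULL;
	struct padata_instance pinst = { { NULL }, { NULL } };

	list_push(&online, &pinst.cpu_online_node);
	list_push(&dead, &pinst.cpu_dead_node);
	return list_contains(online, &pinst.cpu_online_node) &&
	       list_contains(dead, &pinst.cpu_dead_node) &&
	       pinst.cpu_online_node.next == NULL;
}
```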
Fixes: 894c9ef ("padata: validate cpumask without removed CPU during offline")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
pull bot pushed a commit that referenced this pull request Jun 2, 2020
Realloc of size zero is a free, not an error; avoid letting this cause a
double free. Caught by clang's address sanitizer:
==2634==ERROR: AddressSanitizer: attempting double-free on 0x6020000015f0 in thread T0:
 #0 0x5649659297fd in free llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:123:3
 #1 0x5649659e9251 in __zfree tools/lib/zalloc.c:13:2
 #2 0x564965c0f92c in mem2node__exit tools/perf/util/mem2node.c:114:2
 #3 0x564965a08b4c in perf_c2c__report tools/perf/builtin-c2c.c:2867:2
 #4 0x564965a0616a in cmd_c2c tools/perf/builtin-c2c.c:2989:10
 #5 0x564965944348 in run_builtin tools/perf/perf.c:312:11
 #6 0x564965943235 in handle_internal_command tools/perf/perf.c:364:8
 #7 0x5649659440c4 in run_argv tools/perf/perf.c:408:2
 #8 0x564965942e41 in main tools/perf/perf.c:538:3
0x6020000015f0 is located 0 bytes inside of 1-byte region [0x6020000015f0,0x6020000015f1)
freed by thread T0 here:
 #0 0x564965929da3 in realloc third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
 #1 0x564965c0f55e in mem2node__init tools/perf/util/mem2node.c:97:16
 #2 0x564965a08956 in perf_c2c__report tools/perf/builtin-c2c.c:2803:8
 #3 0x564965a0616a in cmd_c2c tools/perf/builtin-c2c.c:2989:10
 #4 0x564965944348 in run_builtin tools/perf/perf.c:312:11
 #5 0x564965943235 in handle_internal_command tools/perf/perf.c:364:8
 #6 0x5649659440c4 in run_argv tools/perf/perf.c:408:2
 #7 0x564965942e41 in main tools/perf/perf.c:538:3
previously allocated by thread T0 here:
 #0 0x564965929c42 in calloc third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
 #1 0x5649659e9220 in zalloc tools/lib/zalloc.c:8:9
 #2 0x564965c0f32d in mem2node__init tools/perf/util/mem2node.c:61:12
 #3 0x564965a08956 in perf_c2c__report tools/perf/builtin-c2c.c:2803:8
 #4 0x564965a0616a in cmd_c2c tools/perf/builtin-c2c.c:2989:10
 #5 0x564965944348 in run_builtin tools/perf/perf.c:312:11
 #6 0x564965943235 in handle_internal_command tools/perf/perf.c:364:8
 #7 0x5649659440c4 in run_argv tools/perf/perf.c:408:2
 #8 0x564965942e41 in main tools/perf/perf.c:538:3
v2: add a WARN_ON_ONCE when the free condition arises.
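The underlying C pitfall is easy to reproduce outside the kernel. A hedged sketch (the helper name is hypothetical, not perf's code): realloc(p, 0) is allowed to free p and return NULL, so a caller that treats the NULL return as "the old block is still live" will free p a second time later. The caller has to record the implicit free instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not perf's code: shrink a buffer and report
 * whether the old block was released. Size zero is handled explicitly
 * so callers never mistake a freed block for a live one. */
static void *shrink_buf(void *p, size_t n, size_t elem_size, int *was_freed)
{
	if (n == 0) {
		/* realloc(p, 0) may act as free(p); do it explicitly */
		free(p);
		*was_freed = 1;
		return NULL;
	}
	*was_freed = 0;
	return realloc(p, n * elem_size);
}
```

After a zero-size shrink the caller must not free the old pointer again; that second free is the double free the sanitizer reported.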
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: clang-built-linux@googlegroups.com
Link: http://lore.kernel.org/lkml/20200320182347.87675-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
pull bot pushed a commit that referenced this pull request Jun 3, 2020
Allocate a new map and requests for new hardware queues when increasing
the hardware queue count. Before this patch, a warning was shown for
each new hardware queue, but that is not enough: these hctxs have no
maps and no requests, so when a bio is mapped to one of these hardware
queues, the kernel panics while getting a request from that hctx.
Test environment:
 * A NVMe disk supports 128 io queues
 * 96 cpus in system
A corner case can always trigger this panic: 96 io queues are allocated
for the HCTX_TYPE_DEFAULT type, with the corresponding kernel
log: nvme nvme0: 96/0/0 default/read/poll queues. Now we set nvme write
queues to 96, and nvme allocates the other 32 queues for read, but
blk_mq_update_nr_hw_queues does not allocate maps and requests for
these newly added io queues. So when a process reads the nvme disk, the
kernel panics while getting a request from these hardware contexts.
Reproduce script:
nr=$(expr `cat /sys/block/nvme0n1/device/queue_count` - 1)
echo $nr > /sys/module/nvme/parameters/write_queues
echo 1 > /sys/block/nvme0n1/device/reset_controller
dd if=/dev/nvme0n1 of=/dev/null bs=4K count=1
[ 8040.805626] ------------[ cut here ]------------
[ 8040.805627] WARNING: CPU: 82 PID: 12921 at block/blk-mq.c:2578 blk_mq_map_swqueue+0x2b6/0x2c0
[ 8040.805627] Modules linked in: nvme nvme_core nf_conntrack_netlink xt_addrtype br_netfilter overlay xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nft_counter nf_nat_tftp nf_conntrack_tftp nft_masq nf_tables_set nft_fib_inet nft_f
ib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack tun bridge nf_defrag_ipv6 nf_defrag_ipv4 stp llc ip6_tables ip_tables nft_compat rfkill ip_set nf_tables nfne
tlink sunrpc intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel intel_
cstate intel_uncore raid0 joydev intel_rapl_perf ipmi_si pcspkr mei_me ioatdma sg ipmi_devintf mei i2c_i801 dca lpc_ich ipmi_msghandler acpi_power_meter acpi_pad xfs libcrc32c sd_mod ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm d
rm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[ 8040.805637] ahci drm i40e libahci crc32c_intel libata t10_pi wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nvme_core]
[ 8040.805640] CPU: 82 PID: 12921 Comm: kworker/u194:2 Kdump: loaded Tainted: G W 5.6.0-rc5.78317c+ #2
[ 8040.805640] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 08/27/2019
[ 8040.805641] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
[ 8040.805642] RIP: 0010:blk_mq_map_swqueue+0x2b6/0x2c0
[ 8040.805643] Code: 00 00 00 00 00 41 83 c5 01 44 39 6d 50 77 b8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b bb 98 00 00 00 89 d6 e8 8c 81 03 00 eb 83 <0f> 0b e9 52 ff ff ff 0f 1f 00 0f 1f 44 00 00 41 57 48 89 f1 41 56
[ 8040.805643] RSP: 0018:ffffba590d2e7d48 EFLAGS: 00010246
[ 8040.805643] RAX: 0000000000000000 RBX: ffff9f013e1ba800 RCX: 000000000000003d
[ 8040.805644] RDX: ffff9f00ffff6000 RSI: 0000000000000003 RDI: ffff9ed200246d90
[ 8040.805644] RBP: ffff9f00f6a79860 R08: 0000000000000000 R09: 000000000000003d
[ 8040.805645] R10: 0000000000000001 R11: ffff9f0138c3d000 R12: ffff9f00fb3a9008
[ 8040.805645] R13: 000000000000007f R14: ffffffff96822660 R15: 000000000000005f
[ 8040.805645] FS: 0000000000000000(0000) GS:ffff9f013fa80000(0000) knlGS:0000000000000000
[ 8040.805646] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8040.805646] CR2: 00007f7f397fa6f8 CR3: 0000003d8240a002 CR4: 00000000007606e0
[ 8040.805647] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8040.805647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8040.805647] PKRU: 55555554
[ 8040.805647] Call Trace:
[ 8040.805649] blk_mq_update_nr_hw_queues+0x31b/0x390
[ 8040.805650] nvme_reset_work+0xb4b/0xeab [nvme]
[ 8040.805651] process_one_work+0x1a7/0x370
[ 8040.805652] worker_thread+0x1c9/0x380
[ 8040.805653] ? max_active_store+0x80/0x80
[ 8040.805655] kthread+0x112/0x130
[ 8040.805656] ? __kthread_parkme+0x70/0x70
[ 8040.805657] ret_from_fork+0x35/0x40
[ 8040.805658] ---[ end trace b5f13b1e73ccb5d3 ]---
[ 8229.365135] BUG: kernel NULL pointer dereference, address: 0000000000000004
[ 8229.365165] #PF: supervisor read access in kernel mode
[ 8229.365178] #PF: error_code(0x0000) - not-present page
[ 8229.365191] PGD 0 P4D 0
[ 8229.365201] Oops: 0000 [#1] SMP PTI
[ 8229.365212] CPU: 77 PID: 13024 Comm: dd Kdump: loaded Tainted: G W 5.6.0-rc5.78317c+ #2
[ 8229.365232] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 08/27/2019
[ 8229.365253] RIP: 0010:blk_mq_get_tag+0x227/0x250
[ 8229.365265] Code: 44 24 04 44 01 e0 48 8b 74 24 38 65 48 33 34 25 28 00 00 00 75 33 48 83 c4 40 5b 5d 41 5c 41 5d 41 5e c3 48 8d 68 10 4c 89 ef <44> 8b 60 04 48 89 ee e8 dd f9 ff ff 83 f8 ff 75 c8 e9 67 fe ff ff
[ 8229.365304] RSP: 0018:ffffba590e977970 EFLAGS: 00010246
[ 8229.365317] RAX: 0000000000000000 RBX: ffff9f00f6a79860 RCX: ffffba590e977998
[ 8229.365333] RDX: 0000000000000000 RSI: ffff9f012039b140 RDI: ffffba590e977a38
[ 8229.365349] RBP: 0000000000000010 R08: ffffda58ff94e190 R09: ffffda58ff94e198
[ 8229.365365] R10: 0000000000000011 R11: ffff9f00f6a79860 R12: 0000000000000000
[ 8229.365381] R13: ffffba590e977a38 R14: ffff9f012039b140 R15: 0000000000000001
[ 8229.365397] FS: 00007f481c230580(0000) GS:ffff9f013f940000(0000) knlGS:0000000000000000
[ 8229.365415] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8229.365428] CR2: 0000000000000004 CR3: 0000005f35e26004 CR4: 00000000007606e0
[ 8229.365444] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8229.365460] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8229.365476] PKRU: 55555554
[ 8229.365484] Call Trace:
[ 8229.365498] ? finish_wait+0x80/0x80
[ 8229.365512] blk_mq_get_request+0xcb/0x3f0
[ 8229.365525] blk_mq_make_request+0x143/0x5d0
[ 8229.365538] generic_make_request+0xcf/0x310
[ 8229.365553] ? scan_shadow_nodes+0x30/0x30
[ 8229.365564] submit_bio+0x3c/0x150
[ 8229.365576] mpage_readpages+0x163/0x1a0
[ 8229.365588] ? blkdev_direct_IO+0x490/0x490
[ 8229.365601] read_pages+0x6b/0x190
[ 8229.365612] __do_page_cache_readahead+0x1c1/0x1e0
[ 8229.365626] ondemand_readahead+0x182/0x2f0
[ 8229.365639] generic_file_buffered_read+0x590/0xab0
[ 8229.365655] new_sync_read+0x12a/0x1c0
[ 8229.365666] vfs_read+0x8a/0x140
[ 8229.365676] ksys_read+0x59/0xd0
[ 8229.365688] do_syscall_64+0x55/0x1d0
[ 8229.365700] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
Tested-by: Weiping Zhang <zhangweiping@didiglobal.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
pull bot pushed a commit that referenced this pull request Jun 3, 2020
We need to check mddev->del_work before flushing the workqueue, since
the purpose of the flush is to ensure the previous md device has
disappeared. Otherwise a similar deadlock appears when LOCKDEP is
enabled, because md_open holds bdev->bd_mutex before flushing the
workqueue.
kernel: [ 154.522645] ======================================================
kernel: [ 154.522647] WARNING: possible circular locking dependency detected
kernel: [ 154.522650] 5.6.0-rc7-lp151.27-default #25 Tainted: G O
kernel: [ 154.522651] ------------------------------------------------------
kernel: [ 154.522653] mdadm/2482 is trying to acquire lock:
kernel: [ 154.522655] ffff888078529128 ((wq_completion)md_misc){+.+.}, at: flush_workqueue+0x84/0x4b0
kernel: [ 154.522673]
kernel: [ 154.522673] but task is already holding lock:
kernel: [ 154.522675] ffff88804efa9338 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x79/0x590
kernel: [ 154.522691]
kernel: [ 154.522691] which lock already depends on the new lock.
kernel: [ 154.522691]
kernel: [ 154.522694]
kernel: [ 154.522694] the existing dependency chain (in reverse order) is:
kernel: [ 154.522696]
kernel: [ 154.522696] -> #4 (&bdev->bd_mutex){+.+.}:
kernel: [ 154.522704] __mutex_lock+0x87/0x950
kernel: [ 154.522706] __blkdev_get+0x79/0x590
kernel: [ 154.522708] blkdev_get+0x65/0x140
kernel: [ 154.522709] blkdev_get_by_dev+0x2f/0x40
kernel: [ 154.522716] lock_rdev+0x3d/0x90 [md_mod]
kernel: [ 154.522719] md_import_device+0xd6/0x1b0 [md_mod]
kernel: [ 154.522723] new_dev_store+0x15e/0x210 [md_mod]
kernel: [ 154.522728] md_attr_store+0x7a/0xc0 [md_mod]
kernel: [ 154.522732] kernfs_fop_write+0x117/0x1b0
kernel: [ 154.522735] vfs_write+0xad/0x1a0
kernel: [ 154.522737] ksys_write+0xa4/0xe0
kernel: [ 154.522745] do_syscall_64+0x64/0x2b0
kernel: [ 154.522748] entry_SYSCALL_64_after_hwframe+0x49/0xbe
kernel: [ 154.522749]
kernel: [ 154.522749] -> #3 (&mddev->reconfig_mutex){+.+.}:
kernel: [ 154.522752] __mutex_lock+0x87/0x950
kernel: [ 154.522756] new_dev_store+0xc9/0x210 [md_mod]
kernel: [ 154.522759] md_attr_store+0x7a/0xc0 [md_mod]
kernel: [ 154.522761] kernfs_fop_write+0x117/0x1b0
kernel: [ 154.522763] vfs_write+0xad/0x1a0
kernel: [ 154.522765] ksys_write+0xa4/0xe0
kernel: [ 154.522767] do_syscall_64+0x64/0x2b0
kernel: [ 154.522769] entry_SYSCALL_64_after_hwframe+0x49/0xbe
kernel: [ 154.522770]
kernel: [ 154.522770] -> #2 (kn->count#253){++++}:
kernel: [ 154.522775] __kernfs_remove+0x253/0x2c0
kernel: [ 154.522778] kernfs_remove+0x1f/0x30
kernel: [ 154.522780] kobject_del+0x28/0x60
kernel: [ 154.522783] mddev_delayed_delete+0x24/0x30 [md_mod]
kernel: [ 154.522786] process_one_work+0x2a7/0x5f0
kernel: [ 154.522788] worker_thread+0x2d/0x3d0
kernel: [ 154.522793] kthread+0x117/0x130
kernel: [ 154.522795] ret_from_fork+0x3a/0x50
kernel: [ 154.522796]
kernel: [ 154.522796] -> #1 ((work_completion)(&mddev->del_work)){+.+.}:
kernel: [ 154.522800] process_one_work+0x27e/0x5f0
kernel: [ 154.522802] worker_thread+0x2d/0x3d0
kernel: [ 154.522804] kthread+0x117/0x130
kernel: [ 154.522806] ret_from_fork+0x3a/0x50
kernel: [ 154.522807]
kernel: [ 154.522807] -> #0 ((wq_completion)md_misc){+.+.}:
kernel: [ 154.522813] __lock_acquire+0x1392/0x1690
kernel: [ 154.522816] lock_acquire+0xb4/0x1a0
kernel: [ 154.522818] flush_workqueue+0xab/0x4b0
kernel: [ 154.522821] md_open+0xb6/0xc0 [md_mod]
kernel: [ 154.522823] __blkdev_get+0xea/0x590
kernel: [ 154.522825] blkdev_get+0x65/0x140
kernel: [ 154.522828] do_dentry_open+0x1d1/0x380
kernel: [ 154.522831] path_openat+0x567/0xcc0
kernel: [ 154.522834] do_filp_open+0x9b/0x110
kernel: [ 154.522836] do_sys_openat2+0x201/0x2a0
kernel: [ 154.522838] do_sys_open+0x57/0x80
kernel: [ 154.522840] do_syscall_64+0x64/0x2b0
kernel: [ 154.522842] entry_SYSCALL_64_after_hwframe+0x49/0xbe
kernel: [ 154.522844]
kernel: [ 154.522844] other info that might help us debug this:
kernel: [ 154.522844]
kernel: [ 154.522846] Chain exists of:
kernel: [ 154.522846] (wq_completion)md_misc --> &mddev->reconfig_mutex --> &bdev->bd_mutex
kernel: [ 154.522846]
kernel: [ 154.522850] Possible unsafe locking scenario:
kernel: [ 154.522850]
kernel: [ 154.522852] CPU0 CPU1
kernel: [ 154.522853] ---- ----
kernel: [ 154.522854] lock(&bdev->bd_mutex);
kernel: [ 154.522856] lock(&mddev->reconfig_mutex);
kernel: [ 154.522858] lock(&bdev->bd_mutex);
kernel: [ 154.522860] lock((wq_completion)md_misc);
kernel: [ 154.522861]
kernel: [ 154.522861] *** DEADLOCK ***
kernel: [ 154.522861]
kernel: [ 154.522864] 1 lock held by mdadm/2482:
kernel: [ 154.522865] #0: ffff88804efa9338 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x79/0x590
kernel: [ 154.522868]
kernel: [ 154.522868] stack backtrace:
kernel: [ 154.522873] CPU: 1 PID: 2482 Comm: mdadm Tainted: G O 5.6.0-rc7-lp151.27-default #25
kernel: [ 154.522875] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
kernel: [ 154.522878] Call Trace:
kernel: [ 154.522881] dump_stack+0x8f/0xcb
kernel: [ 154.522884] check_noncircular+0x194/0x1b0
kernel: [ 154.522888] ? __lock_acquire+0x1392/0x1690
kernel: [ 154.522890] __lock_acquire+0x1392/0x1690
kernel: [ 154.522893] lock_acquire+0xb4/0x1a0
kernel: [ 154.522895] ? flush_workqueue+0x84/0x4b0
kernel: [ 154.522898] flush_workqueue+0xab/0x4b0
kernel: [ 154.522900] ? flush_workqueue+0x84/0x4b0
kernel: [ 154.522905] ? md_open+0xb6/0xc0 [md_mod]
kernel: [ 154.522908] md_open+0xb6/0xc0 [md_mod]
kernel: [ 154.522910] __blkdev_get+0xea/0x590
kernel: [ 154.522912] ? bd_acquire+0xc0/0xc0
kernel: [ 154.522914] blkdev_get+0x65/0x140
kernel: [ 154.522916] ? bd_acquire+0xc0/0xc0
kernel: [ 154.522918] do_dentry_open+0x1d1/0x380
kernel: [ 154.522921] path_openat+0x567/0xcc0
kernel: [ 154.522923] ? __lock_acquire+0x380/0x1690
kernel: [ 154.522926] do_filp_open+0x9b/0x110
kernel: [ 154.522929] ? __alloc_fd+0xe5/0x1f0
kernel: [ 154.522935] ? kmem_cache_alloc+0x28c/0x630
kernel: [ 154.522939] ? do_sys_openat2+0x201/0x2a0
kernel: [ 154.522941] do_sys_openat2+0x201/0x2a0
kernel: [ 154.522944] do_sys_open+0x57/0x80
kernel: [ 154.522946] do_syscall_64+0x64/0x2b0
kernel: [ 154.522948] entry_SYSCALL_64_after_hwframe+0x49/0xbe
kernel: [ 154.522951] RIP: 0033:0x7f98d279d9ae
md_alloc also flushes the same workqueue, but the situation is different
there: none of the paths that call md_alloc hold bdev->bd_mutex, and the
flush is necessary to avoid a race condition, so leave it as it is.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
pull bot pushed a commit that referenced this pull request Jun 3, 2020
Dave Airlie reported the following lockdep complaint:
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.7.0-0.rc5.20200515git1ae7efb38854.1.fc33.x86_64 #1 Not tainted
> ------------------------------------------------------
> kswapd0/159 is trying to acquire lock:
> ffff9b38d01a4470 (&xfs_nondir_ilock_class){++++}-{3:3},
> at: xfs_ilock+0xde/0x2c0 [xfs]
>
> but task is already holding lock:
> ffffffffbbb8bd00 (fs_reclaim){+.+.}-{0:0}, at:
> __fs_reclaim_acquire+0x5/0x30
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (fs_reclaim){+.+.}-{0:0}:
> fs_reclaim_acquire+0x34/0x40
> __kmalloc+0x4f/0x270
> kmem_alloc+0x93/0x1d0 [xfs]
> kmem_alloc_large+0x4c/0x130 [xfs]
> xfs_attr_copy_value+0x74/0xa0 [xfs]
> xfs_attr_get+0x9d/0xc0 [xfs]
> xfs_get_acl+0xb6/0x200 [xfs]
> get_acl+0x81/0x160
> posix_acl_xattr_get+0x3f/0xd0
> vfs_getxattr+0x148/0x170
> getxattr+0xa7/0x240
> path_getxattr+0x52/0x80
> do_syscall_64+0x5c/0xa0
> entry_SYSCALL_64_after_hwframe+0x49/0xb3
>
> -> #0 (&xfs_nondir_ilock_class){++++}-{3:3}:
> __lock_acquire+0x1257/0x20d0
> lock_acquire+0xb0/0x310
> down_write_nested+0x49/0x120
> xfs_ilock+0xde/0x2c0 [xfs]
> xfs_reclaim_inode+0x3f/0x400 [xfs]
> xfs_reclaim_inodes_ag+0x20b/0x410 [xfs]
> xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
> super_cache_scan+0x190/0x1e0
> do_shrink_slab+0x184/0x420
> shrink_slab+0x182/0x290
> shrink_node+0x174/0x680
> balance_pgdat+0x2d0/0x5f0
> kswapd+0x21f/0x510
> kthread+0x131/0x150
> ret_from_fork+0x3a/0x50
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(fs_reclaim);
> lock(&xfs_nondir_ilock_class);
> lock(fs_reclaim);
> lock(&xfs_nondir_ilock_class);
>
> *** DEADLOCK ***
>
> 4 locks held by kswapd0/159:
> #0: ffffffffbbb8bd00 (fs_reclaim){+.+.}-{0:0}, at:
> __fs_reclaim_acquire+0x5/0x30
> #1: ffffffffbbb7cef8 (shrinker_rwsem){++++}-{3:3}, at:
> shrink_slab+0x115/0x290
> #2: ffff9b39f07a50e8
> (&type->s_umount_key#56){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0
> #3: ffff9b39f077f258
> (&pag->pag_ici_reclaim_lock){+.+.}-{3:3}, at:
> xfs_reclaim_inodes_ag+0x82/0x410 [xfs]
This is a known false positive because inodes cannot simultaneously be
getting reclaimed and the target of a getxattr operation, but lockdep
doesn't know that. We can (selectively) shut up lockdep until either
it gets smarter or we change inode reclaim not to require the ILOCK by
applying a stupid GFP_NOLOCKDEP bandaid.
Reported-by: Dave Airlie <airlied@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Tested-by: Dave Airlie <airlied@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
pull bot pushed a commit that referenced this pull request Jun 4, 2020
It doesn't make sense to update a request that was not
created. So, instead of cpu_latency_qos_update_request(),
let's use cpu_latency_qos_add_request() in the device
probing code.
This should fix this issue:
[ 9.691775] cpu_latency_qos_update_request called for unknown object
[ 9.695279] WARNING: CPU: 3 PID: 523 at kernel/power/qos.c:296 cpu_latency_qos_update_request+0x3a/0xb0
[ 9.698826] Modules linked in: snd_soc_acpi_intel_match snd_rawmidi snd_soc_acpi snd_soc_rl6231 snd_soc_core ath mac80211 snd_compress snd_hdmi_lpe_audio ac97_bus hid_sensor_accel_3d snd_pcm_dmaengine hid_sensor_gyro_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common processor_thermal_device industrialio cfg80211 snd_pcm snd_seq intel_rapl_common atomisp(C+) libarc4 intel_soc_dts_iosf cros_ec_ishtp intel_xhci_usb_role_switch mei_txe cros_ec videobuf_vmalloc mei roles atomisp_ov2680(C) videobuf_core snd_seq_device snd_timer spi_pxa2xx_platform videodev snd mc dw_dmac intel_hid dw_dmac_core 8250_dw soundcore int3406_thermal int3400_thermal intel_int0002_vgpio acpi_pad acpi_thermal_rel soc_button_array int3403_thermal int340x_thermal_zone mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_sensor_custom hid_sensor_hub intel_ishtp_loader intel_ishtp_hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915 mmc_block i2c_algo_bit
[ 9.698885] aesni_intel crypto_simd drm_kms_helper cryptd syscopyarea sysfillrect glue_helper sysimgblt fb_sys_fops cec intel_ish_ipc drm lpc_ich intel_ishtp hid_asus intel_soc_pmic_chtdc_ti asus_wmi i2c_hid sparse_keymap sdhci_acpi wmi video sdhci hid_generic usbhid hid
[ 9.736699] CPU: 3 PID: 523 Comm: systemd-udevd Tainted: G C 5.7.0-rc1+ #2
[ 9.741309] Hardware name: ASUSTeK COMPUTER INC. T101HA/T101HA, BIOS T101HA.305 01/24/2018
[ 9.745962] RIP: 0010:cpu_latency_qos_update_request+0x3a/0xb0
[ 9.750615] Code: 89 e5 41 55 41 54 41 89 f4 53 48 89 fb 48 81 7f 28 e0 7f c6 9e 74 1c 48 c7 c6 60 f3 65 9e 48 c7 c7 e8 a9 99 9e e8 b2 a6 f9 ff <0f> 0b 5b 41 5c 41 5d 5d c3 0f 1f 44 00 00 44 3b 23 74 ef 44 89 e2
[ 9.760065] RSP: 0018:ffffa865404f39c0 EFLAGS: 00010282
[ 9.764734] RAX: 0000000000000000 RBX: ffff9d2aefc84350 RCX: 0000000000000000
[ 9.769435] RDX: ffff9d2afbfa97c0 RSI: ffff9d2afbf99808 RDI: ffff9d2afbf99808
[ 9.774125] RBP: ffffa865404f39d8 R08: 0000000000000304 R09: 0000000000aaaaaa
[ 9.778804] R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffffff
[ 9.783491] R13: ffff9d2afb4640b0 R14: ffffffffc07ecf20 R15: 0000000091000000
[ 9.788187] FS: 00007efe67ff8880(0000) GS:ffff9d2afbf80000(0000) knlGS:0000000000000000
[ 9.792864] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.797482] CR2: 00007ffc6424bdc8 CR3: 0000000178998000 CR4: 00000000001006e0
[ 9.802126] Call Trace:
[ 9.806775] atomisp_pci_probe.cold.19+0x15f/0x116f [atomisp]
[ 9.811441] local_pci_probe+0x47/0x80
[ 9.816085] pci_device_probe+0xff/0x1b0
[ 9.820706] really_probe+0x1c8/0x3e0
[ 9.825247] driver_probe_device+0xd9/0x120
[ 9.829769] device_driver_attach+0x58/0x60
[ 9.834294] __driver_attach+0x8f/0x150
[ 9.838782] ? device_driver_attach+0x60/0x60
[ 9.843205] ? device_driver_attach+0x60/0x60
[ 9.847634] bus_for_each_dev+0x79/0xc0
[ 9.852033] ? kmem_cache_alloc_trace+0x167/0x230
[ 9.856462] driver_attach+0x1e/0x20
Reported-by: Patrik Gfeller <patrik.gfeller@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
pull bot pushed a commit that referenced this pull request Jun 4, 2020
The reclaim code that balances between swapping and cache reclaim tries to
predict likely reuse based on in-memory reference patterns alone. This
works in many cases, but when it fails it cannot detect when the cache is
thrashing pathologically, or when we're in the middle of a swap storm.
The high seek cost of rotational drives under which the algorithm evolved
also meant that mistakes could quickly result in lockups from too
aggressive swapping (which is predominantly random IO). As a result, the
balancing code has been tuned over time to a point where it mostly goes
for page cache and defers swapping until the VM is under significant
memory pressure.
The resulting strategy doesn't make optimal caching decisions - where
optimal is the least amount of IO required to execute the workload.
The proliferation of fast random IO devices such as SSDs, in-memory
compression such as zswap, and persistent memory technologies on the
horizon, has made this undesirable behavior very noticeable: Even in the
presence of large amounts of cold anonymous memory and a capable swap
device, the VM refuses to even seriously scan these pages, and can leave
the page cache thrashing needlessly.
This series sets out to address this. Since commit a528910e12ec ("mm:
thrash detection-based file cache sizing") we have exact tracking of
refault IO - the ultimate cost of reclaiming the wrong pages. This allows
us to use an IO cost based balancing model that is more aggressive about
scanning anonymous memory when the cache is thrashing, while being able to
avoid unnecessary swap storms.
These patches base the LRU balance on the rate of refaults on each list,
times the relative IO cost between swap device and filesystem
(swappiness), in order to optimize reclaim for least IO cost incurred.
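The cost model just described can be sketched as simple arithmetic. The following is an illustrative model only, not the kernel's actual reclaim code, and every name in it is hypothetical: scan pressure on each list falls as that list's observed IO cost rises, weighted by swappiness as the relative cost of swap versus filesystem IO.

```c
#include <assert.h>

/* Illustrative model, NOT the kernel's get_scan_count(); all names are
 * hypothetical. Each list's recent IO cost (refaults for file, swap IO
 * for anon) steers scan pressure toward the cheaper list, with
 * swappiness (0..200) encoding the relative cost of swap IO. */
struct io_cost {
	unsigned long anon;	/* e.g. recent swapouts + swapins */
	unsigned long file;	/* e.g. recent refaults */
};

/* Percentage of scan pressure to aim at the anon list. */
static unsigned int anon_scan_percent(const struct io_cost *c,
				      unsigned int swappiness)
{
	unsigned long total = c->anon + c->file + 1;
	/* Pressure on a list is inversely proportional to its own cost. */
	unsigned long ap = (unsigned long)swappiness * total / (c->anon + 1);
	unsigned long fp = (200UL - swappiness) * total / (c->file + 1);

	return (unsigned int)(100UL * ap / (ap + fp));
}
```

With equal costs and swappiness 100 the split is 50:50; a thrashing cache (high file cost, no swap IO) pushes nearly all pressure onto the anon list, which is the convergence behavior the tests below demonstrate.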
	History
I floated these changes in 2016. At the time they were incomplete and
full of workarounds due to a lack of infrastructure in the reclaim code:
We didn't have PageWorkingset, we didn't have hierarchical cgroup
statistics, and problems with the cgroup swap controller. As swapping
wasn't too high a priority then, the patches stalled out. With all
dependencies in place now, here we are again with much cleaner,
feature-complete patches.
I kept the acks for patches that stayed materially the same :-)
Below is a series of test results that demonstrate certain problematic
behavior of the current code, as well as showcase the new code's more
predictable and appropriate balancing decisions.
	Test #1: No convergence
This test shows an edge case where the VM currently doesn't converge at
all on a new file workingset with a stale anon/tmpfs set.
The test sets up a cold anon set the size of 3/4 RAM, then tries to
establish a new file set half the size of RAM (flat access pattern).
The vanilla kernel refuses to even scan anon pages and never converges.
The file set is perpetually served from the filesystem.
The first test kernel is with the series up to the workingset patch
applied. This allows thrashing page cache to challenge the anonymous
workingset. The VM then scans the lists based on the current
scanned/rotated balancing algorithm. It converges on a stable state where
all cold anon pages are pushed out and the fileset is served entirely from
cache:
			 noconverge/5.7-rc5-mm	noconverge/5.7-rc5-mm-workingset
Scanned			417719308.00 ( +0.00%)		64091155.00 ( -84.66%)
Reclaimed		417711094.00 ( +0.00%)		61640308.00 ( -85.24%)
Reclaim efficiency %	 100.00 ( +0.00%)		 96.18 ( -3.78%)
Scanned file		417719308.00 ( +0.00%)		59211118.00 ( -85.83%)
Scanned anon			0.00 ( +0.00%)	 4880037.00 ( )
Swapouts			0.00 ( +0.00%)	 2439957.00 ( )
Swapins				0.00 ( +0.00%)		 257.00 ( )
Refaults		415246605.00 ( +0.00%)		59183722.00 ( -85.75%)
Restore refaults		0.00 ( +0.00%)	 54988252.00 ( )
The second test kernel is with the full patch series applied, which
replaces the scanned/rotated ratios with refault/swapin rate-based
balancing. It evicts the cold anon pages more aggressively in the
presence of a thrashing cache and the absence of swapins, and so converges
with about 60% of the IO and reclaim activity:
			noconverge/5.7-rc5-mm-workingset	noconverge/5.7-rc5-mm-lrubalance
Scanned				64091155.00 ( +0.00%)		37579741.00 ( -41.37%)
Reclaimed			61640308.00 ( +0.00%)		35129293.00 ( -43.01%)
Reclaim efficiency %		 96.18 ( +0.00%)		 93.48 ( -2.78%)
Scanned file			59211118.00 ( +0.00%)		32708385.00 ( -44.76%)
Scanned anon			 4880037.00 ( +0.00%)		 4871356.00 ( -0.18%)
Swapouts			 2439957.00 ( +0.00%)		 2435565.00 ( -0.18%)
Swapins				 257.00 ( +0.00%)		 262.00 ( +1.94%)
Refaults			59183722.00 ( +0.00%)		32675667.00 ( -44.79%)
Restore refaults		54988252.00 ( +0.00%)		28480430.00 ( -48.21%)
We're triggering this case in host sideloading scenarios: When a host's
primary workload is not saturating the machine (primary load is usually
driven by user activity), we can optimistically sideload a batch job; if
user activity picks up and the primary workload needs the whole host
during this time, we freeze the sideload and rely on it getting pushed to
swap. Frequently that swapping doesn't happen and the completely inactive
sideload simply stays resident while the expanding primary workload is
struggling to gain ground.
	Test #2: Kernel build
This test is a kernel build that is slightly memory-restricted (make -j4
inside a 400M cgroup).
Despite the very aggressive swapping of cold anon pages in test #1, this
test shows that the new kernel carefully balances swap against cache
refaults when both the file and the anon sets are pressured.
It shows the patched kernel to be slightly better at finding the coldest
memory from the combined anon and file set to evict under pressure. The
result is lower aggregate reclaim and paging activity:
				 5.7-rc5-mm	5.7-rc5-mm-lrubalance
Real time		 210.60 ( +0.00%)	 210.97 ( +0.18%)
User time		 745.42 ( +0.00%)	 746.48 ( +0.14%)
System time		 69.78 ( +0.00%)	 69.79 ( +0.02%)
Scanned file		354682.00 ( +0.00%)	293661.00 ( -17.20%)
Scanned anon		465381.00 ( +0.00%)	378144.00 ( -18.75%)
Swapouts		185920.00 ( +0.00%)	147801.00 ( -20.50%)
Swapins			 34583.00 ( +0.00%)	 32491.00 ( -6.05%)
Refaults		212664.00 ( +0.00%)	172409.00 ( -18.93%)
Restore refaults	 48861.00 ( +0.00%)	 80091.00 ( +63.91%)
Total paging IO		433167.00 ( +0.00%)	352701.00 ( -18.58%)
	Test #3: Overload
This next test is not about performance, but rather about the
predictability of the algorithm. The current balancing behavior doesn't
always lead to comprehensible results, which makes performance analysis
and parameter tuning (e.g. swappiness) very difficult.
The test shows the balancing behavior under equivalent anon and file
input. Anon and file sets are created of equal size (3/4 RAM), have the
same access patterns (a hot-cold gradient), and synchronized access rates.
Swappiness is raised from the default of 60 to 100 to indicate equal IO
cost between swap and cache.
With the vanilla balancing code, anon scans make up around 9% of the total
pages scanned, or a ~1:10 ratio. This is a surprisingly skewed ratio, and
it's an outcome that is hard to explain given the input parameters to the
VM.
The new balancing model targets a 1:2 balance: All else being equal,
reclaiming a file page costs one page IO - the refault; reclaiming an anon
page costs two IOs - the swapout and the swapin. In the test we observe a
~1:3 balance.
The scanned and paging IO numbers indicate that the anon LRU algorithm we
have in place right now does a slightly worse job at picking the coldest
pages compared to the file algorithm. There is ongoing work to improve
this, like Joonsoo's anon workingset patches; however, it's difficult to
compare the two aging strategies when the balancing between them is
behaving unintuitively.
The slightly less efficient anon reclaim results in a deviation from the
optimal 1:2 scan ratio we would like to see here - however, 1:3 is much
closer to what we'd want to see in this test than the vanilla kernel's
aging of 10+ cache pages for every anonymous one:
			overload-100/5.7-rc5-mm-workingset	overload-100/5.7-rc5-mm-lrubalance-realfile
Scanned				 533633725.00 ( +0.00%)			 595687785.00 ( +11.63%)
Reclaimed			 494325440.00 ( +0.00%)			 518154380.00 ( +4.82%)
Reclaim efficiency %			92.63 ( +0.00%)				 86.98 ( -6.03%)
Scanned file			 484532894.00 ( +0.00%)			 456937722.00 ( -5.70%)
Scanned anon			 49100831.00 ( +0.00%)			 138750063.00 ( +182.58%)
Swapouts			 8096423.00 ( +0.00%)			 48982142.00 ( +504.98%)
Swapins				 10027384.00 ( +0.00%)			 6232504.00 ( +521.55%)
Refaults			 479819973.00 ( +0.00%)			 451309483.00 ( -5.94%)
Restore refaults		 426422087.00 ( +0.00%)			 399914067.00 ( -6.22%)
Total paging IO			 497943780.00 ( +0.00%)			 562616669.00 ( +12.99%)
	Test #4: Parallel IO
It's important to note that these patches only affect the situation where
the kernel has to reclaim workingset memory, which is usually a
transitional period. The vast majority of page reclaim occurring in a
system is from trimming the ever-expanding page cache.
These patches don't affect cache trimming behavior. We never swap as long
as we only have use-once cache moving through the file LRU; we only
consider swapping when the cache is actively thrashing.
The following test demonstrates this. It has an anon workingset that
takes up half of RAM and then writes a file that is twice the size of RAM
out to disk.
As the cache is funneled through the inactive file list, no anon pages are
scanned (aside from apparently some background noise of 10 pages):
					 5.7-rc5-mm		 5.7-rc5-mm-lrubalance
Scanned			 10714722.00 ( +0.00%)		 10723445.00 ( +0.08%)
Reclaimed		 10703596.00 ( +0.00%)		 10712166.00 ( +0.08%)
Reclaim efficiency %		 99.90 ( +0.00%)			 99.89 ( -0.00%)
Scanned file		 10714722.00 ( +0.00%)		 10723435.00 ( +0.08%)
Scanned anon			 0.00 ( +0.00%)			 10.00 ( )
Swapouts			 0.00 ( +0.00%)			 7.00 ( )
Swapins				 0.00 ( +0.00%)			 0.00 ( +0.00%)
Refaults			 92.00 ( +0.00%)			 41.00 ( -54.84%)
Restore refaults		 0.00 ( +0.00%)			 0.00 ( +0.00%)
Total paging IO			 92.00 ( +0.00%)			 48.00 ( -47.31%)
This patch (of 14):
Currently, THP are counted as single pages until they are split right
before being swapped out. However, at that point the VM is already in the
middle of reclaim, and adjusting the LRU balance then is useless.
Always account THP by the number of basepages, and remove the fixup from
the splitting path.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Link: http://lkml.kernel.org/r/20200520232525.798933-1-hannes@cmpxchg.org
Link: http://lkml.kernel.org/r/20200520232525.798933-2-hannes@cmpxchg.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pull bot pushed a commit that referenced this pull request Jun 4, 2020
It appears that preliminary documentation has a typo in the ID list,
i.e. LPSS UART #2 had been advertised wrongly.
Fix the driver according to the EDS v0.9.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
pull bot pushed a commit that referenced this pull request Jun 4, 2020
.../git/lee/mfd
Pull MFD updates from Lee Jones:
 "Core Frameworks:
 - Constify 'properties' attribute in core header file
 New Drivers:
 - Add support for Gateworks System Controller
 - Add support for MediaTek MT6358 PMIC
 - Add support for Mediatek MT6360 PMIC
 - Add support for Monolithic Power Systems MP2629 ADC and Battery charger
 Fix-ups:
 - Use new I2C API in htc-i2cpld
 - Remove superfluous code in sprd-sc27xx-spi
 - Improve error handling in stm32-timers
 - Device Tree additions/fixes in mt6397
 - Defer probe betterment in wm8994-core
 - Improve module handling in wm8994-core
 - Staticify in stpmic1
 - Trivial (spelling, formatting) in tqmx86
 Bug Fixes:
 - Fix incorrect register/PCI IDs in intel-lpss-pci
 - Fix unbalanced Regulator API calls in wm8994-core
 - Fix double free() in wcd934x
 - Remove IRQ domain on failure in stmfx
 - Reset chip on resume in stmfx
 - Disable/enable IRQs on suspend/resume in stmfx
 - Do not use bulk writes on H/W which does not support them in max77620"
* tag 'mfd-next-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (29 commits)
 mfd: mt6360: Remove duplicate REGMAP_IRQ_REG_LINE() entry
 mfd: Add support for PMIC MT6360
 mfd: max77620: Use single-byte writes on MAX77620
 mfd: wcd934x: Drop kfree for memory allocated with devm_kzalloc
 mfd: stmfx: Disable IRQ in suspend to avoid spurious interrupt
 mfd: stmfx: Fix stmfx_irq_init error path
 mfd: stmfx: Reset chip on resume as supply was disabled
 mfd: wm8994: Silence warning about supplies during deferred probe
 mfd: wm8994: Fix unbalanced calls to regulator_bulk_disable()
 mfd: wm8994: Fix driver operation if loaded as modules
 dt-bindings: mfd: mediatek: Add MT6397 Pin Controller
 mfd: Constify properties in mfd_cell
 mfd: stm32-timers: Use dma_request_chan() instead dma_request_slave_channel()
 mfd: sprd: Remove unnecessary spi_bus_type setting
 mfd: intel-lpss: Update LPSS UART #2 PCI ID for Jasper Lake
 mfd: tqmx86: Fix a typo in MODULE_DESCRIPTION
 mfd: stpmic1: Make stpmic1_regmap_config static
 mfd: htc-i2cpld: Convert to use i2c_new_client_device()
 MAINTAINERS: Add entry for mp2629 Battery Charger driver
 power: supply: mp2629: Add impedance compensation config
 ...
pull bot pushed a commit that referenced this pull request Jun 5, 2020
Implement rtas_call_reentrant() for reentrant rtas-calls:
"ibm,int-on", "ibm,int-off", "ibm,get-xive" and "ibm,set-xive".
On LoPAPR Version 1.1 (March 24, 2016), from 7.3.10.1 to 7.3.10.4,
items 2 and 3 say:
2 - For the PowerPC External Interrupt option: The * call must be
reentrant to the number of processors on the platform.
3 - For the PowerPC External Interrupt option: The * argument call
buffer for each simultaneous call must be physically unique.
So, these rtas-calls can be made in a lockless way, provided each CPU
doing such a call uses a different buffer.
For this, it was suggested to add the buffer (struct rtas_args)
to the PACA struct, so each CPU can have its own buffer.
The PACA struct receives a pointer to an rtas buffer, which is
allocated in the memory range available to 32-bit rtas.
Reentrant rtas calls are useful to avoid deadlocks during a crash,
when rtas-calls are needed but some other thread crashed while
holding the rtas.lock.
This is a backtrace of a deadlock from a kdump testing environment:
 #0 arch_spin_lock
 #1 lock_rtas ()
 #2 rtas_call (token=8204, nargs=1, nret=1, outputs=0x0)
 #3 ics_rtas_mask_real_irq (hw_irq=4100)
 #4 machine_kexec_mask_interrupts
 #5 default_machine_crash_shutdown
 #6 machine_crash_shutdown
 #7 __crash_kexec
 #8 crash_kexec
 #9 oops_end
Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
[mpe: Move under #ifdef PSERIES to avoid build breakage]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200518234245.200672-3-leobras.c@gmail.com 
pull bot pushed a commit that referenced this pull request Jun 6, 2020
For INVADER_SERIES, each set of 8 reply queues (0 - 7, 8 - 15,..), and for
VENTURA_SERIES, each set of 16 reply queues (0 - 15, 16 - 31,..) need to be
within the same 4 GB boundary. The driver uses the VENTURA_SERIES
limitation to manage INVADER_SERIES as well, and allocates the DMA-able
memory for the RDPQs accordingly.
1) At driver load, set DMA mask to 64 and allocate memory for RDPQs
2) Check if allocated resources for RDPQ are in the same 4GB range
3) If #2 is true, continue with 64 bit DMA and go to #6
4) If #2 is false, then free all the resources from #1
5) Set DMA mask to 32 and allocate RDPQs
6) Proceed with driver loading and other allocations
Link: https://lore.kernel.org/r/1587626596-1044-5-git-send-email-suganath-prabu.subramani@broadcom.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
pull bot pushed a commit that referenced this pull request Jun 11, 2020
I made every global per-network-namespace instead, but perhaps doing
that to this slab was a step too far.
The kmem_cache_create call in our net init method also seems to be
responsible for this lockdep warning:
[ 45.163710] Unable to find swap-space signature
[ 45.375718] trinity-c1 (855): attempted to duplicate a private mapping with mremap. This is not supported.
[ 46.055744] futex_wake_op: trinity-c1 tries to shift op by -209; fix this program
[ 51.011723]
[ 51.013378] ======================================================
[ 51.013875] WARNING: possible circular locking dependency detected
[ 51.014378] 5.2.0-rc2 #1 Not tainted
[ 51.014672] ------------------------------------------------------
[ 51.015182] trinity-c2/886 is trying to acquire lock:
[ 51.015593] 000000005405f099 (slab_mutex){+.+.}, at: slab_attr_store+0xa2/0x130
[ 51.016190]
[ 51.016190] but task is already holding lock:
[ 51.016652] 00000000ac662005 (kn->count#43){++++}, at: kernfs_fop_write+0x286/0x500
[ 51.017266]
[ 51.017266] which lock already depends on the new lock.
[ 51.017266]
[ 51.017909]
[ 51.017909] the existing dependency chain (in reverse order) is:
[ 51.018497]
[ 51.018497] -> #1 (kn->count#43){++++}:
[ 51.018956] __lock_acquire+0x7cf/0x1a20
[ 51.019317] lock_acquire+0x17d/0x390
[ 51.019658] __kernfs_remove+0x892/0xae0
[ 51.020020] kernfs_remove_by_name_ns+0x78/0x110
[ 51.020435] sysfs_remove_link+0x55/0xb0
[ 51.020832] sysfs_slab_add+0xc1/0x3e0
[ 51.021332] __kmem_cache_create+0x155/0x200
[ 51.021720] create_cache+0xf5/0x320
[ 51.022054] kmem_cache_create_usercopy+0x179/0x320
[ 51.022486] kmem_cache_create+0x1a/0x30
[ 51.022867] nfsd_reply_cache_init+0x278/0x560
[ 51.023266] nfsd_init_net+0x20f/0x5e0
[ 51.023623] ops_init+0xcb/0x4b0
[ 51.023928] setup_net+0x2fe/0x670
[ 51.024315] copy_net_ns+0x30a/0x3f0
[ 51.024653] create_new_namespaces+0x3c5/0x820
[ 51.025257] unshare_nsproxy_namespaces+0xd1/0x240
[ 51.025881] ksys_unshare+0x506/0x9c0
[ 51.026381] __x64_sys_unshare+0x3a/0x50
[ 51.026937] do_syscall_64+0x110/0x10b0
[ 51.027509] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 51.028175]
[ 51.028175] -> #0 (slab_mutex){+.+.}:
[ 51.028817] validate_chain+0x1c51/0x2cc0
[ 51.029422] __lock_acquire+0x7cf/0x1a20
[ 51.029947] lock_acquire+0x17d/0x390
[ 51.030438] __mutex_lock+0x100/0xfa0
[ 51.030995] mutex_lock_nested+0x27/0x30
[ 51.031516] slab_attr_store+0xa2/0x130
[ 51.032020] sysfs_kf_write+0x11d/0x180
[ 51.032529] kernfs_fop_write+0x32a/0x500
[ 51.033056] do_loop_readv_writev+0x21d/0x310
[ 51.033627] do_iter_write+0x2e5/0x380
[ 51.034148] vfs_writev+0x170/0x310
[ 51.034616] do_pwritev+0x13e/0x160
[ 51.035100] __x64_sys_pwritev+0xa3/0x110
[ 51.035633] do_syscall_64+0x110/0x10b0
[ 51.036200] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 51.036924]
[ 51.036924] other info that might help us debug this:
[ 51.036924]
[ 51.037876] Possible unsafe locking scenario:
[ 51.037876]
[ 51.038556] CPU0 CPU1
[ 51.039130] ---- ----
[ 51.039676] lock(kn->count#43);
[ 51.040084] lock(slab_mutex);
[ 51.040597] lock(kn->count#43);
[ 51.041062] lock(slab_mutex);
[ 51.041320]
[ 51.041320] *** DEADLOCK ***
[ 51.041320]
[ 51.041793] 3 locks held by trinity-c2/886:
[ 51.042128] #0: 000000001f55e152 (sb_writers#5){.+.+}, at: vfs_writev+0x2b9/0x310
[ 51.042739] #1: 00000000c7d6c034 (&of->mutex){+.+.}, at: kernfs_fop_write+0x25b/0x500
[ 51.043400] #2: 00000000ac662005 (kn->count#43){++++}, at: kernfs_fop_write+0x286/0x500
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 3ba7583 "drc containerization"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
pull bot pushed a commit that referenced this pull request Jun 12, 2020
The first Clang release to support -tsan-distinguish-volatile, and
thereby KCSAN, will be Clang 11. This is due to satisfying all of the
following requirements:
1. Never emit calls to __tsan_func_{entry,exit}.
2. __no_kcsan functions should not call anything, not even
 kcsan_{enable,disable}_current(), when using __{READ,WRITE}_ONCE => Requires
 leaving them plain!
3. Support atomic_{read,set}*() with KCSAN, which rely on
 arch_atomic_{read,set}*() using __{READ,WRITE}_ONCE() => Because of
 #2, rely on Clang 11's -tsan-distinguish-volatile support. We will
 double-instrument atomic_{read,set}*(), but that's reasonable given
 it's still lower cost than the data_race() variant due to avoiding 2
 extra calls (kcsan_{en,dis}able_current() calls).
4. __always_inline functions inlined into __no_kcsan functions are never
 instrumented.
5. __always_inline functions inlined into instrumented functions are
 instrumented.
6. __no_kcsan_or_inline functions may be inlined into __no_kcsan functions =>
 Implies leaving 'noinline' off of __no_kcsan_or_inline.
7. Because of #6, __no_kcsan and __no_kcsan_or_inline functions should never be
 spuriously inlined into instrumented functions, causing the accesses of the
 __no_kcsan function to be instrumented.
Older versions of Clang do not satisfy #3. The latest GCC currently
doesn't support at least #1, #3, and #7.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/CANpmjNMTsY_8241bS7=XAfqvZHFLrVEkv_uM4aDUWE_kh3Rvbw@mail.gmail.com
Link: https://lkml.kernel.org/r/20200521142047.169334-7-elver@google.com 
pull bot pushed a commit that referenced this pull request Jun 12, 2020
It is unsafe to traverse kvm->arch.spapr_tce_tables and
stt->iommu_tables without the RCU read lock held. Also, add
cond_resched_rcu() in places with the RCU read lock held that could take
a while to finish.
 arch/powerpc/kvm/book3s_64_vio.c:76 RCU-list traversed in non-reader section!!
 other info that might help us debug this:
 rcu_scheduler_active = 2, debug_locks = 1
 no locks held by qemu-kvm/4265.
 stack backtrace:
 CPU: 96 PID: 4265 Comm: qemu-kvm Not tainted 5.7.0-rc4-next-20200508+ #2
 Call Trace:
 [c000201a8690f720] [c000000000715948] dump_stack+0xfc/0x174 (unreliable)
 [c000201a8690f770] [c0000000001d9470] lockdep_rcu_suspicious+0x140/0x164
 [c000201a8690f7f0] [c008000010b9fb48] kvm_spapr_tce_release_iommu_group+0x1f0/0x220 [kvm]
 [c000201a8690f870] [c008000010b8462c] kvm_spapr_tce_release_vfio_group+0x54/0xb0 [kvm]
 [c000201a8690f8a0] [c008000010b84710] kvm_vfio_destroy+0x88/0x140 [kvm]
 [c000201a8690f8f0] [c008000010b7d488] kvm_put_kvm+0x370/0x600 [kvm]
 [c000201a8690f990] [c008000010b7e3c0] kvm_vm_release+0x38/0x60 [kvm]
 [c000201a8690f9c0] [c0000000005223f4] __fput+0x124/0x330
 [c000201a8690fa20] [c000000000151cd8] task_work_run+0xb8/0x130
 [c000201a8690fa70] [c0000000001197e8] do_exit+0x4e8/0xfa0
 [c000201a8690fb70] [c00000000011a374] do_group_exit+0x64/0xd0
 [c000201a8690fbb0] [c000000000132c90] get_signal+0x1f0/0x1200
 [c000201a8690fcc0] [c000000000020690] do_notify_resume+0x130/0x3c0
 [c000201a8690fda0] [c000000000038d64] syscall_exit_prepare+0x1a4/0x280
 [c000201a8690fe20] [c00000000000c8f8] system_call_common+0xf8/0x278
 ====
 arch/powerpc/kvm/book3s_64_vio.c:368 RCU-list traversed in non-reader section!!
 other info that might help us debug this:
 rcu_scheduler_active = 2, debug_locks = 1
 2 locks held by qemu-kvm/4264:
 #0: c000201ae2d000d8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0xdc/0x950 [kvm]
 #1: c000200c9ed0c468 (&kvm->srcu){....}-{0:0}, at: kvmppc_h_put_tce+0x88/0x340 [kvm]
 ====
 arch/powerpc/kvm/book3s_64_vio.c:108 RCU-list traversed in non-reader section!!
 other info that might help us debug this:
 rcu_scheduler_active = 2, debug_locks = 1
 1 lock held by qemu-kvm/4257:
 #0: c000200b1b363a40 (&kv->lock){+.+.}-{3:3}, at: kvm_vfio_set_attr+0x598/0x6c0 [kvm]
 ====
 arch/powerpc/kvm/book3s_64_vio.c:146 RCU-list traversed in non-reader section!!
 other info that might help us debug this:
 rcu_scheduler_active = 2, debug_locks = 1
 1 lock held by qemu-kvm/4257:
 #0: c000200b1b363a40 (&kv->lock){+.+.}-{3:3}, at: kvm_vfio_set_attr+0x598/0x6c0 [kvm]
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
pull bot pushed a commit that referenced this pull request Jun 13, 2020
xen_failsafe_callback() is invoked from XEN for two cases:
 1. Fault while reloading DS, ES, FS or GS
 2. Fault while executing IRET
 #1 retries the IRET after XEN has fixed up the segments.
 #2 injects a #GP which kills the task
For #1 there is no reason to go through the full exception return path
because the tasks TIF state is still the same. So just going straight to
the IRET path is good enough.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202118.423224507@linutronix.de 
pull bot pushed a commit that referenced this pull request Jun 13, 2020
The commits cd0e00c and 92d7223 broke boot on the Alpha Avanti
platform. Those patches moved memory barriers that followed a write to
before the write. The result is that if an iowrite is followed by an
ioread, there is no barrier between them.
The Alpha architecture allows reordering of accesses to the I/O space,
and the missing barrier between write and read causes a hang with the
serial port and real-time clock.
This patch makes the barriers conform to the specification.
1. We add mb() before readX_relaxed and writeX_relaxed -
 memory-barriers.txt claims that these functions must be ordered w.r.t.
 each other. Alpha doesn't order them, so we need an explicit barrier.
2. We add mb() before reads from the I/O space - so that if there's a
 write followed by a read, there should be a barrier between them.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: cd0e00c ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering")
Fixes: 92d7223 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
Cc: stable@vger.kernel.org # v4.17+
Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Reviewed-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Matt Turner <mattst88@gmail.com>
pull bot pushed a commit that referenced this pull request Jun 19, 2020
In blkdev_get() we call __blkdev_get() to do some internal work, and if
__blkdev_get() fails, bdput() is called, which means we have released
the refcount of the bdev (actually the refcount of the bdev inode).
This means we cannot access bdev after that point. But bdev is in fact
still accessed in blkdev_get() after the call to __blkdev_get(). This
results in a use-after-free if the refcount we released was the last
one. Let's take a look at the following scenario:
 CPU0 CPU1 CPU2
blkdev_open blkdev_open Remove disk
 bd_acquire
		 blkdev_get
		 __blkdev_get del_gendisk
					bdev_unhash_inode
 bd_acquire bdev_get_gendisk
 bd_forget failed because of unhashed
	 bdput
	 bdput (the last one)
		 bdev_evict_inode
	 	 access bdev => use after free
[ 459.350216] BUG: KASAN: use-after-free in __lock_acquire+0x24c1/0x31b0
[ 459.351190] Read of size 8 at addr ffff88806c815a80 by task syz-executor.0/20132
[ 459.352347]
[ 459.352594] CPU: 0 PID: 20132 Comm: syz-executor.0 Not tainted 4.19.90 #2
[ 459.353628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 459.354947] Call Trace:
[ 459.355337] dump_stack+0x111/0x19e
[ 459.355879] ? __lock_acquire+0x24c1/0x31b0
[ 459.356523] print_address_description+0x60/0x223
[ 459.357248] ? __lock_acquire+0x24c1/0x31b0
[ 459.357887] kasan_report.cold+0xae/0x2d8
[ 459.358503] __lock_acquire+0x24c1/0x31b0
[ 459.359120] ? _raw_spin_unlock_irq+0x24/0x40
[ 459.359784] ? lockdep_hardirqs_on+0x37b/0x580
[ 459.360465] ? _raw_spin_unlock_irq+0x24/0x40
[ 459.361123] ? finish_task_switch+0x125/0x600
[ 459.361812] ? finish_task_switch+0xee/0x600
[ 459.362471] ? mark_held_locks+0xf0/0xf0
[ 459.363108] ? __schedule+0x96f/0x21d0
[ 459.363716] lock_acquire+0x111/0x320
[ 459.364285] ? blkdev_get+0xce/0xbe0
[ 459.364846] ? blkdev_get+0xce/0xbe0
[ 459.365390] __mutex_lock+0xf9/0x12a0
[ 459.365948] ? blkdev_get+0xce/0xbe0
[ 459.366493] ? bdev_evict_inode+0x1f0/0x1f0
[ 459.367130] ? blkdev_get+0xce/0xbe0
[ 459.367678] ? destroy_inode+0xbc/0x110
[ 459.368261] ? mutex_trylock+0x1a0/0x1a0
[ 459.368867] ? __blkdev_get+0x3e6/0x1280
[ 459.369463] ? bdev_disk_changed+0x1d0/0x1d0
[ 459.370114] ? blkdev_get+0xce/0xbe0
[ 459.370656] blkdev_get+0xce/0xbe0
[ 459.371178] ? find_held_lock+0x2c/0x110
[ 459.371774] ? __blkdev_get+0x1280/0x1280
[ 459.372383] ? lock_downgrade+0x680/0x680
[ 459.373002] ? lock_acquire+0x111/0x320
[ 459.373587] ? bd_acquire+0x21/0x2c0
[ 459.374134] ? do_raw_spin_unlock+0x4f/0x250
[ 459.374780] blkdev_open+0x202/0x290
[ 459.375325] do_dentry_open+0x49e/0x1050
[ 459.375924] ? blkdev_get_by_dev+0x70/0x70
[ 459.376543] ? __x64_sys_fchdir+0x1f0/0x1f0
[ 459.377192] ? inode_permission+0xbe/0x3a0
[ 459.377818] path_openat+0x148c/0x3f50
[ 459.378392] ? kmem_cache_alloc+0xd5/0x280
[ 459.379016] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 459.379802] ? path_lookupat.isra.0+0x900/0x900
[ 459.380489] ? __lock_is_held+0xad/0x140
[ 459.381093] do_filp_open+0x1a1/0x280
[ 459.381654] ? may_open_dev+0xf0/0xf0
[ 459.382214] ? find_held_lock+0x2c/0x110
[ 459.382816] ? lock_downgrade+0x680/0x680
[ 459.383425] ? __lock_is_held+0xad/0x140
[ 459.384024] ? do_raw_spin_unlock+0x4f/0x250
[ 459.384668] ? _raw_spin_unlock+0x1f/0x30
[ 459.385280] ? __alloc_fd+0x448/0x560
[ 459.385841] do_sys_open+0x3c3/0x500
[ 459.386386] ? filp_open+0x70/0x70
[ 459.386911] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 459.387610] ? trace_hardirqs_off_caller+0x55/0x1c0
[ 459.388342] ? do_syscall_64+0x1a/0x520
[ 459.388930] do_syscall_64+0xc3/0x520
[ 459.389490] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 459.390248] RIP: 0033:0x416211
[ 459.390720] Code: 75 14 b8 02 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83
04 19 00 00 c3 48 83 ec 08 e8 0a fa ff ff 48 89 04 24 b8 02 00 00 00 0f
 05 <48> 8b 3c 24 48 89 c2 e8 53 fa ff ff 48 89 d0 48 83 c4 08 48 3d
 01
[ 459.393483] RSP: 002b:00007fe45dfe9a60 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
[ 459.394610] RAX: ffffffffffffffda RBX: 00007fe45dfea6d4 RCX: 0000000000416211
[ 459.395678] RDX: 00007fe45dfe9b0a RSI: 0000000000000002 RDI: 00007fe45dfe9b00
[ 459.396758] RBP: 000000000076bf20 R08: 0000000000000000 R09: 000000000000000a
[ 459.397930] R10: 0000000000000075 R11: 0000000000000293 R12: 00000000ffffffff
[ 459.399022] R13: 0000000000000bd9 R14: 00000000004cdb80 R15: 000000000076bf2c
[ 459.400168]
[ 459.400430] Allocated by task 20132:
[ 459.401038] kasan_kmalloc+0xbf/0xe0
[ 459.401652] kmem_cache_alloc+0xd5/0x280
[ 459.402330] bdev_alloc_inode+0x18/0x40
[ 459.402970] alloc_inode+0x5f/0x180
[ 459.403510] iget5_locked+0x57/0xd0
[ 459.404095] bdget+0x94/0x4e0
[ 459.404607] bd_acquire+0xfa/0x2c0
[ 459.405113] blkdev_open+0x110/0x290
[ 459.405702] do_dentry_open+0x49e/0x1050
[ 459.406340] path_openat+0x148c/0x3f50
[ 459.406926] do_filp_open+0x1a1/0x280
[ 459.407471] do_sys_open+0x3c3/0x500
[ 459.408010] do_syscall_64+0xc3/0x520
[ 459.408572] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 459.409415]
[ 459.409679] Freed by task 1262:
[ 459.410212] __kasan_slab_free+0x129/0x170
[ 459.410919] kmem_cache_free+0xb2/0x2a0
[ 459.411564] rcu_process_callbacks+0xbb2/0x2320
[ 459.412318] __do_softirq+0x225/0x8ac
Fix this by delaying bdput() to the end of blkdev_get() which means we
have finished accessing bdev.
Fixes: 77ea887 ("implement in-kernel gendisk events handling")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
pull bot pushed a commit that referenced this pull request Jun 19, 2020
Unfortunately, most versions of clang that support BTI are capable of
miscompiling the kernel when converting a switch statement into a jump
table. As an example, attempting to spawn a KVM guest results in a panic:
[ 56.253312] Kernel panic - not syncing: bad mode
[ 56.253834] CPU: 0 PID: 279 Comm: lkvm Not tainted 5.8.0-rc1 #2
[ 56.254225] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[ 56.254712] Call trace:
[ 56.254952] dump_backtrace+0x0/0x1d4
[ 56.255305] show_stack+0x1c/0x28
[ 56.255647] dump_stack+0xc4/0x128
[ 56.255905] panic+0x16c/0x35c
[ 56.256146] bad_el0_sync+0x0/0x58
[ 56.256403] el1_sync_handler+0xb4/0xe0
[ 56.256674] el1_sync+0x7c/0x100
[ 56.256928] kvm_vm_ioctl_check_extension_generic+0x74/0x98
[ 56.257286] __arm64_sys_ioctl+0x94/0xcc
[ 56.257569] el0_svc_common+0x9c/0x150
[ 56.257836] do_el0_svc+0x84/0x90
[ 56.258083] el0_sync_handler+0xf8/0x298
[ 56.258361] el0_sync+0x158/0x180
This is because the switch in kvm_vm_ioctl_check_extension_generic()
is executed as an indirect branch to tail-call through a jump table:
ffff800010032dc8: 3869694c ldrb w12, [x10, x9]
ffff800010032dcc: 8b0c096b add x11, x11, x12, lsl #2
ffff800010032dd0: d61f0160 br x11
However, where the target case uses the stack, the landing pad is elided
due to the presence of a paciasp instruction:
ffff800010032e14: d503233f paciasp
ffff800010032e18: a9bf7bfd stp x29, x30, [sp, #-16]!
ffff800010032e1c: 910003fd mov x29, sp
ffff800010032e20: aa0803e0 mov x0, x8
ffff800010032e24: 940017c0 bl ffff800010038d24 <kvm_vm_ioctl_check_extension>
ffff800010032e28: 93407c00 sxtw x0, w0
ffff800010032e2c: a8c17bfd ldp x29, x30, [sp], #16
ffff800010032e30: d50323bf autiasp
ffff800010032e34: d65f03c0 ret
Unfortunately, this results in a fatal exception because paciasp is
compatible only with branch-and-link (call) instructions and not simple
indirect branches.
A fix is being merged into Clang 10.0.1 so that a 'bti j' instruction is
emitted as an explicit landing pad in this situation. Make in-kernel
BTI depend on that compiler version when building with clang.
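One way such a compiler-version gate can be expressed in Kconfig (a sketch: the symbol names follow the kernel's Kconfig conventions and 100001 encodes 10.0.1, but the exact hunk in the real patch may differ):

```
config ARM64_BTI_KERNEL
	...
	# https://bugs.llvm.org: jump tables may elide the BTI landing pad
	depends on !CC_IS_CLANG || CLANG_VERSION >= 100001
```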
Cc: Tom Stellard <tstellar@redhat.com>
Cc: Daniel Kiss <daniel.kiss@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20200615105524.GA2694@willie-the-truck
Link: https://lore.kernel.org/r/20200616183630.2445-1-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
pull bot pushed a commit that referenced this pull request Jun 25, 2020
destroy_qp_common() is called for flows where the QP has already been
created by HW. When it is called from IB/core, the ibqp.* fields are
fully initialized, but that is not the case when this function is called
during QP creation.
Don't rely on the ibqp fields any more than necessary, and initialize
send_cq/recv_cq as a temporary solution until all drivers are converted
to the IB/core QP allocation scheme.
refcount_t: underflow; use-after-free.
WARNING: CPU: 1 PID: 5372 at lib/refcount.c:28 refcount_warn_saturate+0xfe/0x1a0
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 5372 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
 mlx5_core_put_rsc+0x70/0x80
 destroy_resource_common+0x8e/0xb0
 mlx5_core_destroy_qp+0xaf/0x1d0
 mlx5_ib_destroy_qp+0xeb0/0x1460
 ib_destroy_qp_user+0x2d5/0x7d0
 create_qp+0xed3/0x2130
 ib_uverbs_create_qp+0x13e/0x190
 ? ib_uverbs_ex_create_qp
 ib_uverbs_write+0xaa5/0xdf0
 __vfs_write+0x7c/0x100
 ksys_write+0xc8/0x200
 do_syscall_64+0x9c/0x390
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 08d5397 ("RDMA/mlx5: Copy response to the user in one place")
Link: https://lore.kernel.org/r/20200617130148.2846643-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
pull bot pushed a commit that referenced this pull request Jun 26, 2020
wenxu says:
====================
several fixes for indirect flow_blocks offload
v2:
patch2: store the cb_priv of the representor in flow_block_cb->indr.cb_priv
in the driver, and make the correct check with the statement
this->indr.cb_priv == cb_priv
patch4: delete the driver list entry only in the indirect cleanup callbacks
v3:
add the cover letter and changelogs.
v4:
collapsed 1/4, 2/4, 4/4 in v3 to one fix
Add the prepare patch 1 and 2
v5:
patch1: place flow_indr_block_cb_alloc() right before
flow_indr_dev_setup_offload() to avoid moving flow_block_indr_init()
This series fixes commit 1fac52d ("net: flow_offload: consolidate
indirect flow_block infrastructure") that revists the flow_block
infrastructure.
patch #1 #2: prepare for fix patch #3
add and use flow_indr_block_cb_alloc/remove function
patch #3: fix flow_indr_dev_unregister path
If the representor is removed, then identify the indirect flow_blocks
that need to be removed by the release callback and the port representor
structure. To identify the port representor structure, a new
indr.cb_priv field needs to be introduced. The flow_block also needs to
be removed from the driver list from the cleanup path
patch#4 fixes the block->nooffloaddevcnt warning in the dmesg log.
When an indirect device is added and offload succeeds,
block->nooffloaddevcnt should be 0. After the representor goes away,
the flow_block UNBIND operation for the indirect device fails with
-EOPNOTSUPP, which leads to the warning in the dmesg log.
block->nooffloaddevcnt should always count the indirect block, even
when the indirect block offloads successfully: the representor may
go away, and the ingress qdisc can then work in software mode.
====================
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pull bot pushed a commit that referenced this pull request Jul 4, 2020
The GFP_KERNEL flag specifies a normal kernel allocation: one executed
in process context without any locks held, which can sleep.
mmio_diff takes some time to finish all the diff comparisons and holds
locks, so continuing to use GFP_KERNEL produces the trace below if
LOCKDEP is enabled.
Use GFP_ATOMIC instead.
V2: Rebase.
V2: Rebase.
=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
5.7.0-rc2 #400 Not tainted
-----------------------------------------------------
is trying to acquire:
ffffffffb47bea20 (fs_reclaim){+.+.}-{0:0}, at: fs_reclaim_acquire.part.0+0x0/0x30
 and this task is already holding:
ffff88845b85cc90 (&gvt->scheduler.mmio_context_lock){+.-.}-{2:2}, at: vgpu_mmio_diff_show+0xcf/0x2e0
which would create a new lock dependency:
 (&gvt->scheduler.mmio_context_lock){+.-.}-{2:2} -> (fs_reclaim){+.+.}-{0:0}
 but this new dependency connects a SOFTIRQ-irq-safe lock:
 (&gvt->scheduler.mmio_context_lock){+.-.}-{2:2}
 ... which became SOFTIRQ-irq-safe at:
 lock_acquire+0x175/0x4e0
 _raw_spin_lock_irqsave+0x2b/0x40
 shadow_context_status_change+0xfe/0x2f0
 notifier_call_chain+0x6a/0xa0
 __atomic_notifier_call_chain+0x5f/0xf0
 execlists_schedule_out+0x42a/0x820
 process_csb+0xe7/0x3e0
 execlists_submission_tasklet+0x5c/0x1d0
 tasklet_action_common.isra.0+0xeb/0x260
 __do_softirq+0x11d/0x56f
 irq_exit+0xf6/0x100
 do_IRQ+0x7f/0x160
 ret_from_intr+0x0/0x2a
 cpuidle_enter_state+0xcd/0x5b0
 cpuidle_enter+0x37/0x60
 do_idle+0x337/0x3f0
 cpu_startup_entry+0x14/0x20
 start_kernel+0x58b/0x5c5
 secondary_startup_64+0xa4/0xb0
 to a SOFTIRQ-irq-unsafe lock:
 (fs_reclaim){+.+.}-{0:0}
 ... which became SOFTIRQ-irq-unsafe at:
...
 lock_acquire+0x175/0x4e0
 fs_reclaim_acquire.part.0+0x20/0x30
 kmem_cache_alloc_node_trace+0x2e/0x290
 alloc_worker+0x2b/0xb0
 init_rescuer.part.0+0x17/0xe0
 workqueue_init+0x293/0x3bb
 kernel_init_freeable+0x149/0x325
 kernel_init+0x8/0x116
 ret_from_fork+0x3a/0x50
 other info that might help us debug this:
 Possible interrupt unsafe locking scenario:
 CPU0 CPU1
 ---- ----
 lock(fs_reclaim);
 local_irq_disable();
 lock(&gvt->scheduler.mmio_context_lock);
 lock(fs_reclaim);
 <Interrupt>
 lock(&gvt->scheduler.mmio_context_lock);
 *** DEADLOCK ***
3 locks held by cat/1439:
 #0: ffff888444a23698 (&p->lock){+.+.}-{3:3}, at: seq_read+0x49/0x680
 #1: ffff88845b858068 (&gvt->lock){+.+.}-{3:3}, at: vgpu_mmio_diff_show+0xc7/0x2e0
 #2: ffff88845b85cc90 (&gvt->scheduler.mmio_context_lock){+.-.}-{2:2}, at: vgpu_mmio_diff_show+0xcf/0x2e0
 the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
-> (&gvt->scheduler.mmio_context_lock){+.-.}-{2:2} ops: 31 {
 HARDIRQ-ON-W at:
 lock_acquire+0x175/0x4e0
 _raw_spin_lock_bh+0x2f/0x40
 vgpu_mmio_diff_show+0xcf/0x2e0
 seq_read+0x242/0x680
 full_proxy_read+0x95/0xc0
 vfs_read+0xc2/0x1b0
 ksys_read+0xc4/0x160
 do_syscall_64+0x63/0x290
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
 IN-SOFTIRQ-W at:
 lock_acquire+0x175/0x4e0
 _raw_spin_lock_irqsave+0x2b/0x40
 shadow_context_status_change+0xfe/0x2f0
 notifier_call_chain+0x6a/0xa0
 __atomic_notifier_call_chain+0x5f/0xf0
 execlists_schedule_out+0x42a/0x820
 process_csb+0xe7/0x3e0
 execlists_submission_tasklet+0x5c/0x1d0
 tasklet_action_common.isra.0+0xeb/0x260
 __do_softirq+0x11d/0x56f
 irq_exit+0xf6/0x100
 do_IRQ+0x7f/0x160
 ret_from_intr+0x0/0x2a
 cpuidle_enter_state+0xcd/0x5b0
 cpuidle_enter+0x37/0x60
 do_idle+0x337/0x3f0
 cpu_startup_entry+0x14/0x20
 start_kernel+0x58b/0x5c5
 secondary_startup_64+0xa4/0xb0
 INITIAL USE at:
 lock_acquire+0x175/0x4e0
 _raw_spin_lock_irqsave+0x2b/0x40
 shadow_context_status_change+0xfe/0x2f0
 notifier_call_chain+0x6a/0xa0
 __atomic_notifier_call_chain+0x5f/0xf0
 execlists_schedule_in+0x2c8/0x690
 __execlists_submission_tasklet+0x1303/0x1930
 execlists_submit_request+0x1e7/0x230
 submit_notify+0x105/0x2a4
 __i915_sw_fence_complete+0xaa/0x380
 __engine_park+0x313/0x5a0
 ____intel_wakeref_put_last+0x3e/0x90
 intel_gt_resume+0x41e/0x440
 intel_gt_init+0x283/0xbc0
 i915_gem_init+0x197/0x240
 i915_driver_probe+0xc2d/0x12e0
 i915_pci_probe+0xa2/0x1e0
 local_pci_probe+0x6f/0xb0
 pci_device_probe+0x171/0x230
 really_probe+0x17a/0x380
 driver_probe_device+0x70/0xf0
 device_driver_attach+0x82/0x90
 __driver_attach+0x60/0x100
 bus_for_each_dev+0xe4/0x140
 bus_add_driver+0x257/0x2a0
 driver_register+0xd3/0x150
 i915_init+0x6d/0x80
 do_one_initcall+0xb8/0x3a0
 kernel_init_freeable+0x2b4/0x325
 kernel_init+0x8/0x116
 ret_from_fork+0x3a/0x50
 }
__key.77812+0x0/0x40
 ... acquired at:
 lock_acquire+0x175/0x4e0
 fs_reclaim_acquire.part.0+0x20/0x30
 kmem_cache_alloc_trace+0x2e/0x260
 mmio_diff_handler+0xc0/0x150
 intel_gvt_for_each_tracked_mmio+0x7b/0x140
 vgpu_mmio_diff_show+0x111/0x2e0
 seq_read+0x242/0x680
 full_proxy_read+0x95/0xc0
 vfs_read+0xc2/0x1b0
 ksys_read+0xc4/0x160
 do_syscall_64+0x63/0x290
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
 the dependencies between the lock to be acquired
 and SOFTIRQ-irq-unsafe lock:
-> (fs_reclaim){+.+.}-{0:0} ops: 1999031 {
 HARDIRQ-ON-W at:
 lock_acquire+0x175/0x4e0
 fs_reclaim_acquire.part.0+0x20/0x30
 kmem_cache_alloc_node_trace+0x2e/0x290
 alloc_worker+0x2b/0xb0
 init_rescuer.part.0+0x17/0xe0
 workqueue_init+0x293/0x3bb
 kernel_init_freeable+0x149/0x325
 kernel_init+0x8/0x116
 ret_from_fork+0x3a/0x50
 SOFTIRQ-ON-W at:
 lock_acquire+0x175/0x4e0
 fs_reclaim_acquire.part.0+0x20/0x30
 kmem_cache_alloc_node_trace+0x2e/0x290
 alloc_worker+0x2b/0xb0
 init_rescuer.part.0+0x17/0xe0
 workqueue_init+0x293/0x3bb
 kernel_init_freeable+0x149/0x325
 kernel_init+0x8/0x116
 ret_from_fork+0x3a/0x50
 INITIAL USE at:
 lock_acquire+0x175/0x4e0
 fs_reclaim_acquire.part.0+0x20/0x30
 kmem_cache_alloc_node_trace+0x2e/0x290
 alloc_worker+0x2b/0xb0
 init_rescuer.part.0+0x17/0xe0
 workqueue_init+0x293/0x3bb
 kernel_init_freeable+0x149/0x325
 kernel_init+0x8/0x116
 ret_from_fork+0x3a/0x50
 }
__fs_reclaim_map+0x0/0x60
 ... acquired at:
 lock_acquire+0x175/0x4e0
 fs_reclaim_acquire.part.0+0x20/0x30
 kmem_cache_alloc_trace+0x2e/0x260
 mmio_diff_handler+0xc0/0x150
 intel_gvt_for_each_tracked_mmio+0x7b/0x140
 vgpu_mmio_diff_show+0x111/0x2e0
 seq_read+0x242/0x680
 full_proxy_read+0x95/0xc0
 vfs_read+0xc2/0x1b0
 ksys_read+0xc4/0x160
 do_syscall_64+0x63/0x290
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
 stack backtrace:
CPU: 5 PID: 1439 Comm: cat Not tainted 5.7.0-rc2 #400
Hardware name: Intel(R) Client Systems NUC8i7BEH/NUC8BEB, BIOS BECFL357.86A.0056.2018.1128.1717 11/28/2018
Call Trace:
 dump_stack+0x97/0xe0
 check_irq_usage.cold+0x428/0x434
 ? check_usage_forwards+0x2c0/0x2c0
 ? class_equal+0x11/0x20
 ? __bfs+0xd2/0x2d0
 ? in_any_class_list+0xa0/0xa0
 ? check_path+0x22/0x40
 ? check_noncircular+0x150/0x2b0
 ? print_circular_bug.isra.0+0x1b0/0x1b0
 ? mark_lock+0x13d/0xc50
 ? __lock_acquire+0x1e32/0x39b0
 __lock_acquire+0x1e32/0x39b0
 ? timerqueue_add+0xc1/0x130
 ? register_lock_class+0xa60/0xa60
 ? mark_lock+0x13d/0xc50
 lock_acquire+0x175/0x4e0
 ? __zone_pcp_update+0x80/0x80
 ? check_flags.part.0+0x210/0x210
 ? mark_held_locks+0x65/0x90
 ? _raw_spin_unlock_irqrestore+0x32/0x40
 ? lockdep_hardirqs_on+0x190/0x290
 ? fwtable_read32+0x163/0x480
 ? mmio_diff_handler+0xc0/0x150
 fs_reclaim_acquire.part.0+0x20/0x30
 ? __zone_pcp_update+0x80/0x80
 kmem_cache_alloc_trace+0x2e/0x260
 mmio_diff_handler+0xc0/0x150
 ? vgpu_mmio_diff_open+0x30/0x30
 intel_gvt_for_each_tracked_mmio+0x7b/0x140
 vgpu_mmio_diff_show+0x111/0x2e0
 ? mmio_diff_handler+0x150/0x150
 ? rcu_read_lock_sched_held+0xa0/0xb0
 ? rcu_read_lock_bh_held+0xc0/0xc0
 ? kasan_unpoison_shadow+0x33/0x40
 ? __kasan_kmalloc.constprop.0+0xc2/0xd0
 seq_read+0x242/0x680
 ? debugfs_locked_down.isra.0+0x70/0x70
 full_proxy_read+0x95/0xc0
 vfs_read+0xc2/0x1b0
 ksys_read+0xc4/0x160
 ? kernel_write+0xb0/0xb0
 ? mark_held_locks+0x24/0x90
 do_syscall_64+0x63/0x290
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x7ffbe3e6efb2
Code: c0 e9 c2 fe ff ff 50 48 8d 3d ca cb 0a 00 e8 f5 19 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
RSP: 002b:00007ffd021c08a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ffbe3e6efb2
RDX: 0000000000020000 RSI: 00007ffbe34cd000 RDI: 0000000000000003
RBP: 00007ffbe34cd000 R08: 00007ffbe34cc010 R09: 0000000000000000
R10: 0000000000000022 R11: 0000000000000246 R12: 0000562b6f0a11f0
R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
------------[ cut here ]------------
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200601035556.19999-1-colin.xu@intel.com 
pull bot pushed a commit that referenced this pull request Jul 7, 2020
...kernel/git/kvmarm/kvmarm into kvm-master
KVM/arm fixes for 5.8, take #2
- Make sure a vcpu becoming non-resident doesn't race against the doorbell delivery
- Only advertise pvtime if accounting is enabled
- Return the correct error code if reset fails with SVE
- Make sure that pseudo-NMI functions are annotated as __always_inline
pull bot pushed a commit that referenced this pull request Jul 11, 2020
Jakub Sitnicki says:
====================
This patch set prepares ground for link-based multi-prog attachment for
future netns attach types, with BPF_SK_LOOKUP attach type in mind [0].
Two changes are needed in order to attach and run a series of BPF programs:
 1) a bpf_prog_array of programs to run (patch #2), and
 2) a list of attached links to keep track of attachments (patch #3).
Nothing changes for BPF flow_dissector. Just as before, only one program can
be attached to a netns.
In v3 I've simplified patch #2 that introduces bpf_prog_array to take
advantage of the fact that it will hold at most one program for now.
In particular, I'm no longer using bpf_prog_array_copy. It turned out to be
less suitable for link operations than I thought as it fails to append the
same BPF program.
bpf_prog_array_replace_item is also gone, because we know we always want to
replace the first element in prog_array.
Naturally the code that handles bpf_prog_array will need to change once
more when there is a program type that allows multi-prog attachment. But I
feel it will be better to do it gradually and present it together with
tests that actually exercise multi-prog code paths.
[0] https://lore.kernel.org/bpf/20200511185218.1422406-1-jakub@cloudflare.com/
v2 -> v3:
- Don't check if run_array is null in link update callback. (Martin)
- Allow updating the link with the same BPF program. (Andrii)
- Add patch #4 with a test for the above case.
- Kill bpf_prog_array_replace_item. Access the run_array directly.
- Switch from bpf_prog_array_copy() to bpf_prog_array_alloc(1, ...).
- Replace rcu_deref_protected & RCU_INIT_POINTER with rcu_replace_pointer.
- Drop Andrii's Ack from patch #2. Code changed.
v1 -> v2:
- Show with a (void) cast that bpf_prog_array_replace_item() return value
 is ignored on purpose. (Andrii)
- Explain why bpf-cgroup cannot replace programs in bpf_prog_array based
 on bpf_prog pointer comparison in patch #2 description. (Andrii)
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
pull bot pushed a commit that referenced this pull request Jul 11, 2020
Ido Schimmel says:
====================
mlxsw: Various fixes
Fix two issues found by syzkaller.
Patch #1 removes inappropriate usage of WARN_ON() following memory
allocation failure. Constantly triggered when syzkaller injects faults.
Patch #2 fixes a use-after-free that can be triggered by 'devlink dev
info' following a failed devlink reload.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
pull bot pushed a commit that referenced this pull request Jul 12, 2020
In BRM_status_show(), if the condition "!ioc->is_warpdrive" tested on entry
to the function is true, a "goto out" is executed. This results in unlocking
ioc->pci_access_mutex without the mutex having been locked. This generates
the following splat:
[ 1148.539883] mpt3sas_cm2: BRM_status_show: BRM attribute is only for warpdrive
[ 1148.547184]
[ 1148.548708] =====================================
[ 1148.553501] WARNING: bad unlock balance detected!
[ 1148.558277] 5.8.0-rc3+ torvalds#827 Not tainted
[ 1148.562183] -------------------------------------
[ 1148.566959] cat/5008 is trying to release lock (&ioc->pci_access_mutex) at:
[ 1148.574035] [<ffffffffc070b7a3>] BRM_status_show+0xd3/0x100 [mpt3sas]
[ 1148.580574] but there are no more locks to release!
[ 1148.585524]
[ 1148.585524] other info that might help us debug this:
[ 1148.599624] 3 locks held by cat/5008:
[ 1148.607085] #0: ffff92aea3e392c0 (&p->lock){+.+.}-{3:3}, at: seq_read+0x34/0x480
[ 1148.618509] #1: ffff922ef14c4888 (&of->mutex){+.+.}-{3:3}, at: kernfs_seq_start+0x2a/0xb0
[ 1148.630729] #2: ffff92aedb5d7310 (kn->active#224){.+.+}-{0:0}, at: kernfs_seq_start+0x32/0xb0
[ 1148.643347]
[ 1148.643347] stack backtrace:
[ 1148.655259] CPU: 73 PID: 5008 Comm: cat Not tainted 5.8.0-rc3+ torvalds#827
[ 1148.665309] Hardware name: HGST H4060-S/S2600STB, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[ 1148.678394] Call Trace:
[ 1148.684750] dump_stack+0x78/0xa0
[ 1148.691802] lock_release.cold+0x45/0x4a
[ 1148.699451] __mutex_unlock_slowpath+0x35/0x270
[ 1148.707675] BRM_status_show+0xd3/0x100 [mpt3sas]
[ 1148.716092] dev_attr_show+0x19/0x40
[ 1148.723664] sysfs_kf_seq_show+0x87/0x100
[ 1148.731193] seq_read+0xbc/0x480
[ 1148.737882] vfs_read+0xa0/0x160
[ 1148.744514] ksys_read+0x58/0xd0
[ 1148.751129] do_syscall_64+0x4c/0xa0
[ 1148.757941] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1148.766240] RIP: 0033:0x7f1230566542
[ 1148.772957] Code: Bad RIP value.
[ 1148.779206] RSP: 002b:00007ffeac1bcac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 1148.790063] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f1230566542
[ 1148.800284] RDX: 0000000000020000 RSI: 00007f1223460000 RDI: 0000000000000003
[ 1148.810474] RBP: 00007f1223460000 R08: 00007f122345f010 R09: 0000000000000000
[ 1148.820641] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000000
[ 1148.830728] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
Fix this by returning immediately instead of jumping to the out label.
Link: https://lore.kernel.org/r/20200701085254.51740-1-damien.lemoal@wdc.com
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Acked-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
pull bot pushed a commit that referenced this pull request Jul 14, 2020
In pci_disable_sriov(), i.e.,
 # echo 0 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs
iommu_release_device
 iommu_group_remove_device
 arm_smmu_domain_free
 kfree(smmu_domain)
Later,
iommu_release_device
 arm_smmu_release_device
 arm_smmu_detach_dev
 spin_lock_irqsave(&smmu_domain->devices_lock,
would trigger a use-after-free. Fix it by calling
arm_smmu_release_device() before iommu_group_remove_device().
 BUG: KASAN: use-after-free in __lock_acquire+0x3458/0x4440
 __lock_acquire at kernel/locking/lockdep.c:4250
 Read of size 8 at addr ffff0089df1a6f68 by task bash/3356
 CPU: 5 PID: 3356 Comm: bash Not tainted 5.8.0-rc3-next-20200630 #2
 Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.11 06/18/2019
 Call trace:
 dump_backtrace+0x0/0x398
 show_stack+0x14/0x20
 dump_stack+0x140/0x1b8
 print_address_description.isra.12+0x54/0x4a8
 kasan_report+0x134/0x1b8
 __asan_report_load8_noabort+0x2c/0x50
 __lock_acquire+0x3458/0x4440
 lock_acquire+0x204/0xf10
 _raw_spin_lock_irqsave+0xf8/0x180
 arm_smmu_detach_dev+0xd8/0x4a0
 arm_smmu_detach_dev at drivers/iommu/arm-smmu-v3.c:2776
 arm_smmu_release_device+0xb4/0x1c8
 arm_smmu_disable_pasid at drivers/iommu/arm-smmu-v3.c:2754
 (inlined by) arm_smmu_release_device at drivers/iommu/arm-smmu-v3.c:3000
 iommu_release_device+0xc0/0x178
 iommu_release_device at drivers/iommu/iommu.c:302
 iommu_bus_notifier+0x118/0x160
 notifier_call_chain+0xa4/0x128
 __blocking_notifier_call_chain+0x70/0xa8
 blocking_notifier_call_chain+0x14/0x20
 device_del+0x618/0xa00
 pci_remove_bus_device+0x108/0x2d8
 pci_stop_and_remove_bus_device+0x1c/0x28
 pci_iov_remove_virtfn+0x228/0x368
 sriov_disable+0x8c/0x348
 pci_disable_sriov+0x5c/0x70
 mlx5_core_sriov_configure+0xd8/0x260 [mlx5_core]
 sriov_numvfs_store+0x240/0x318
 dev_attr_store+0x38/0x68
 sysfs_kf_write+0xdc/0x128
 kernfs_fop_write+0x23c/0x448
 __vfs_write+0x54/0xe8
 vfs_write+0x124/0x3f0
 ksys_write+0xe8/0x1b8
 __arm64_sys_write+0x68/0x98
 do_el0_svc+0x124/0x220
 el0_sync_handler+0x260/0x408
 el0_sync+0x140/0x180
 Allocated by task 3356:
 save_stack+0x24/0x50
 __kasan_kmalloc.isra.13+0xc4/0xe0
 kasan_kmalloc+0xc/0x18
 kmem_cache_alloc_trace+0x1ec/0x318
 arm_smmu_domain_alloc+0x54/0x148
 iommu_group_alloc_default_domain+0xc0/0x440
 iommu_probe_device+0x1c0/0x308
 iort_iommu_configure+0x434/0x518
 acpi_dma_configure+0xf0/0x128
 pci_dma_configure+0x114/0x160
 really_probe+0x124/0x6d8
 driver_probe_device+0xc4/0x180
 __device_attach_driver+0x184/0x1e8
 bus_for_each_drv+0x114/0x1a0
 __device_attach+0x19c/0x2a8
 device_attach+0x10/0x18
 pci_bus_add_device+0x70/0xf8
 pci_iov_add_virtfn+0x7b4/0xb40
 sriov_enable+0x5c8/0xc30
 pci_enable_sriov+0x64/0x80
 mlx5_core_sriov_configure+0x58/0x260 [mlx5_core]
 sriov_numvfs_store+0x1c0/0x318
 dev_attr_store+0x38/0x68
 sysfs_kf_write+0xdc/0x128
 kernfs_fop_write+0x23c/0x448
 __vfs_write+0x54/0xe8
 vfs_write+0x124/0x3f0
 ksys_write+0xe8/0x1b8
 __arm64_sys_write+0x68/0x98
 do_el0_svc+0x124/0x220
 el0_sync_handler+0x260/0x408
 el0_sync+0x140/0x180
 Freed by task 3356:
 save_stack+0x24/0x50
 __kasan_slab_free+0x124/0x198
 kasan_slab_free+0x10/0x18
 slab_free_freelist_hook+0x110/0x298
 kfree+0x128/0x668
 arm_smmu_domain_free+0xf4/0x1a0
 iommu_group_release+0xec/0x160
 kobject_put+0xf4/0x238
 kobject_del+0x110/0x190
 kobject_put+0x1e4/0x238
 iommu_group_remove_device+0x394/0x938
 iommu_release_device+0x9c/0x178
 iommu_release_device at drivers/iommu/iommu.c:300
 iommu_bus_notifier+0x118/0x160
 notifier_call_chain+0xa4/0x128
 __blocking_notifier_call_chain+0x70/0xa8
 blocking_notifier_call_chain+0x14/0x20
 device_del+0x618/0xa00
 pci_remove_bus_device+0x108/0x2d8
 pci_stop_and_remove_bus_device+0x1c/0x28
 pci_iov_remove_virtfn+0x228/0x368
 sriov_disable+0x8c/0x348
 pci_disable_sriov+0x5c/0x70
 mlx5_core_sriov_configure+0xd8/0x260 [mlx5_core]
 sriov_numvfs_store+0x240/0x318
 dev_attr_store+0x38/0x68
 sysfs_kf_write+0xdc/0x128
 kernfs_fop_write+0x23c/0x448
 __vfs_write+0x54/0xe8
 vfs_write+0x124/0x3f0
 ksys_write+0xe8/0x1b8
 __arm64_sys_write+0x68/0x98
 do_el0_svc+0x124/0x220
 el0_sync_handler+0x260/0x408
 el0_sync+0x140/0x180
 The buggy address belongs to the object at ffff0089df1a6e00
 which belongs to the cache kmalloc-512 of size 512
 The buggy address is located 360 bytes inside of
 512-byte region [ffff0089df1a6e00, ffff0089df1a7000)
 The buggy address belongs to the page:
 page:ffffffe02257c680 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff0089df1a1400
 flags: 0x7ffff800000200(slab)
 raw: 007ffff800000200 ffffffe02246b8c8 ffffffe02257ff88 ffff000000320680
 raw: ffff0089df1a1400 00000000002a000e 00000001ffffffff ffff0089df1a5001
 page dumped because: kasan: bad access detected
 page->mem_cgroup:ffff0089df1a5001
 Memory state around the buggy address:
 ffff0089df1a6e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff0089df1a6e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 >ffff0089df1a6f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ^
 ffff0089df1a6f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff0089df1a7000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Fixes: a6a4c7e ("iommu: Add probe_device() and release_device() call-backs")
Signed-off-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/r/20200704001003.2303-1-cai@lca.pw
Signed-off-by: Joerg Roedel <jroedel@suse.de>
pull bot pushed a commit that referenced this pull request Jul 16, 2020
devm_gpiod_get_index() doesn't return NULL but ERR_PTR(-ENOENT) when the
requested GPIO doesn't exist, leading to the following messages:
[ 2.742468] gpiod_direction_input: invalid GPIO (errorpointer)
[ 2.748147] can't set direction for gpio #2: -2
[ 2.753081] gpiod_direction_input: invalid GPIO (errorpointer)
[ 2.758724] can't set direction for gpio #3: -2
[ 2.763666] gpiod_direction_output: invalid GPIO (errorpointer)
[ 2.769394] can't set direction for gpio #4: -2
[ 2.774341] gpiod_direction_input: invalid GPIO (errorpointer)
[ 2.779981] can't set direction for gpio #5: -2
[ 2.784545] ff000a20.serial: ttyCPM1 at MMIO 0xfff00a20 (irq = 39, base_baud = 8250000) is a CPM UART
Use devm_gpiod_get_index_optional() instead.
At the same time, handle the error case and properly exit
with an error.
Fixes: 97cbaf2 ("tty: serial: cpm_uart: Convert to use GPIO descriptors")
Cc: stable@vger.kernel.org
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/694a25fdce548c5ee8b060ef6a4b02746b8f25c0.1591986307.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pull bot pushed a commit that referenced this pull request Jul 31, 2020
This patch fixes a race condition that causes a use-after-free during
amdgpu_dm_atomic_commit_tail. This can occur when 2 non-blocking commits
are requested and the second one finishes before the first. Essentially,
this bug occurs when the following sequence of events happens:
1. Non-blocking commit #1 is requested w/ a new dm_state #1 and is
deferred to the workqueue.
2. Non-blocking commit #2 is requested w/ a new dm_state #2 and is
deferred to the workqueue.
3. Commit #2 starts before commit #1, dm_state #1 is used in the
commit_tail and commit #2 completes, freeing dm_state #1.
4. Commit #1 starts after commit #2 completes, uses the freed dm_state #1
and dereferences a freelist pointer while setting the context.
Since this bug has only been spotted with fast commits, this patch fixes
the bug by clearing the dm_state instead of using the old dc_state for
fast updates. In addition, since dm_state is only used for its dc_state
and amdgpu_dm_atomic_commit_tail will retain the dc_state if none is found,
removing the dm_state should not have any consequences in fast updates.
This use-after-free bug has existed for a while now, but only caused a
noticeable issue starting from 5.7-rc1 due to 3202fa6 ("slub: relocate
freelist pointer to middle of object") moving the freelist pointer from
dm_state->base (which was unused) to dm_state->context (which is
dereferenced).
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207383
Fixes: bd200d1 ("drm/amd/display: Don't replace the dc_state for fast updates")
Reported-by: Duncan <1i5t5.duncan@cox.net>
Signed-off-by: Mazin Rezk <mnrzk@protonmail.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
pull bot pushed a commit that referenced this pull request Aug 2, 2020
Ido Schimmel says:
====================
mlxsw fixes
This patch set contains various fixes for mlxsw.
Patches #1-#2 fix two trap related issues introduced in previous cycle.
Patches #3-#5 fix rare use-after-frees discovered by syzkaller. After
over a week of fuzzing with the fixes, the bugs did not reproduce.
Patch #6 from Amit fixes an issue in the ethtool selftest that was
recently discovered after running the test on a new platform that
supports only 1Gbps and 10Gbps speeds.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
pull bot pushed a commit that referenced this pull request Aug 2, 2020
I compiled with AddressSanitizer and I had these memory leaks while I
was using the tep_parse_format function:
 Direct leak of 28 byte(s) in 4 object(s) allocated from:
 #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe)
 #1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985
 #2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140
 #3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206
 #4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291
 #5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299
 #6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849
 #7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161
 #8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207
 #9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786
 #10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285
 #11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369
 #12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335
 #13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389
 #14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431
 #15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251
 #16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284
 #17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593
 #18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727
 #19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048
 #20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127
 #21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152
 #22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252
 #23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347
 #24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461
 #25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673
 #26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
The token variable in the process_dynamic_array_len function is
allocated in the read_expect_type function, but is not freed before
calling the read_token function.
Free the token variable before calling read_token in order to plug the
leak.
Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
pull bot pushed a commit that referenced this pull request Aug 3, 2020
Dave hit this splat during testing btrfs/078:
 ======================================================
 WARNING: possible circular locking dependency detected
 5.8.0-rc6-default+ #1191 Not tainted
 ------------------------------------------------------
 kswapd0/75 is trying to acquire lock:
 ffffa040e9d04ff8 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
 but task is already holding lock:
 ffffffff8b0c8040 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
 which lock already depends on the new lock.
 the existing dependency chain (in reverse order) is:
 -> #2 (fs_reclaim){+.+.}-{0:0}:
	 __lock_acquire+0x56f/0xaa0
	 lock_acquire+0xa3/0x440
	 fs_reclaim_acquire.part.0+0x25/0x30
	 __kmalloc_track_caller+0x49/0x330
	 kstrdup+0x2e/0x60
	 __kernfs_new_node.constprop.0+0x44/0x250
	 kernfs_new_node+0x25/0x50
	 kernfs_create_link+0x34/0xa0
	 sysfs_do_create_link_sd+0x5e/0xd0
	 btrfs_sysfs_add_devices_dir+0x65/0x100 [btrfs]
	 btrfs_init_new_device+0x44c/0x12b0 [btrfs]
	 btrfs_ioctl+0xc3c/0x25c0 [btrfs]
	 ksys_ioctl+0x68/0xa0
	 __x64_sys_ioctl+0x16/0x20
	 do_syscall_64+0x50/0xe0
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
 -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
	 __lock_acquire+0x56f/0xaa0
	 lock_acquire+0xa3/0x440
	 __mutex_lock+0xa0/0xaf0
	 btrfs_chunk_alloc+0x137/0x3e0 [btrfs]
	 find_free_extent+0xb44/0xfb0 [btrfs]
	 btrfs_reserve_extent+0x9b/0x180 [btrfs]
	 btrfs_alloc_tree_block+0xc1/0x350 [btrfs]
	 alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
	 __btrfs_cow_block+0x143/0x7a0 [btrfs]
	 btrfs_cow_block+0x15f/0x310 [btrfs]
	 push_leaf_right+0x150/0x240 [btrfs]
	 split_leaf+0x3cd/0x6d0 [btrfs]
	 btrfs_search_slot+0xd14/0xf70 [btrfs]
	 btrfs_insert_empty_items+0x64/0xc0 [btrfs]
	 __btrfs_commit_inode_delayed_items+0xb2/0x840 [btrfs]
	 btrfs_async_run_delayed_root+0x10e/0x1d0 [btrfs]
	 btrfs_work_helper+0x2f9/0x650 [btrfs]
	 process_one_work+0x22c/0x600
	 worker_thread+0x50/0x3b0
	 kthread+0x137/0x150
	 ret_from_fork+0x1f/0x30
 -> #0 (&delayed_node->mutex){+.+.}-{3:3}:
	 check_prev_add+0x98/0xa20
	 validate_chain+0xa8c/0x2a00
	 __lock_acquire+0x56f/0xaa0
	 lock_acquire+0xa3/0x440
	 __mutex_lock+0xa0/0xaf0
	 __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
	 btrfs_evict_inode+0x3bf/0x560 [btrfs]
	 evict+0xd6/0x1c0
	 dispose_list+0x48/0x70
	 prune_icache_sb+0x54/0x80
	 super_cache_scan+0x121/0x1a0
	 do_shrink_slab+0x175/0x420
	 shrink_slab+0xb1/0x2e0
	 shrink_node+0x192/0x600
	 balance_pgdat+0x31f/0x750
	 kswapd+0x206/0x510
	 kthread+0x137/0x150
	 ret_from_fork+0x1f/0x30
 other info that might help us debug this:
 Chain exists of:
 &delayed_node->mutex --> &fs_info->chunk_mutex --> fs_reclaim
 Possible unsafe locking scenario:
        CPU0                    CPU1
        ----                    ----
   lock(fs_reclaim);
                               lock(&fs_info->chunk_mutex);
                               lock(fs_reclaim);
   lock(&delayed_node->mutex);
 *** DEADLOCK ***
 3 locks held by kswapd0/75:
 #0: ffffffff8b0c8040 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
 #1: ffffffff8b0b50b8 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x54/0x2e0
 #2: ffffa040e057c0e8 (&type->s_umount_key#26){++++}-{3:3}, at: trylock_super+0x16/0x50
 stack backtrace:
 CPU: 2 PID: 75 Comm: kswapd0 Not tainted 5.8.0-rc6-default+ #1191
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
 Call Trace:
 dump_stack+0x78/0xa0
 check_noncircular+0x16f/0x190
 check_prev_add+0x98/0xa20
 validate_chain+0xa8c/0x2a00
 __lock_acquire+0x56f/0xaa0
 lock_acquire+0xa3/0x440
 ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
 __mutex_lock+0xa0/0xaf0
 ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
 ? __lock_acquire+0x56f/0xaa0
 ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
 ? lock_acquire+0xa3/0x440
 ? btrfs_evict_inode+0x138/0x560 [btrfs]
 ? btrfs_evict_inode+0x2fe/0x560 [btrfs]
 ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
 __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
 btrfs_evict_inode+0x3bf/0x560 [btrfs]
 evict+0xd6/0x1c0
 dispose_list+0x48/0x70
 prune_icache_sb+0x54/0x80
 super_cache_scan+0x121/0x1a0
 do_shrink_slab+0x175/0x420
 shrink_slab+0xb1/0x2e0
 shrink_node+0x192/0x600
 balance_pgdat+0x31f/0x750
 kswapd+0x206/0x510
 ? _raw_spin_unlock_irqrestore+0x3e/0x50
 ? finish_wait+0x90/0x90
 ? balance_pgdat+0x750/0x750
 kthread+0x137/0x150
 ? kthread_stop+0x2a0/0x2a0
 ret_from_fork+0x1f/0x30
This is because we're holding the chunk_mutex while adding this device
and adding its sysfs entries. We actually hold different locks in
different places when calling this function, the dev_replace semaphore
for instance in dev replace, so instead of moving this call around,
simply wrap its operations in NOFS.
CC: stable@vger.kernel.org # 4.14+
Reported-by: David Sterba <dsterba@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
pull bot pushed a commit that referenced this pull request Aug 3, 2020
There's long existed a lockdep splat because we open our bdev's under
the ->device_list_mutex at mount time, which acquires the bd_mutex.
Usually this goes unnoticed, but if you do loopback devices at all
suddenly the bd_mutex comes with a whole host of other dependencies,
which results in the splat when you mount a btrfs file system.
======================================================
WARNING: possible circular locking dependency detected
5.8.0-0.rc3.1.fc33.x86_64+debug #1 Not tainted
------------------------------------------------------
systemd-journal/509 is trying to acquire lock:
ffff970831f84db0 (&fs_info->reloc_mutex){+.+.}-{3:3}, at: btrfs_record_root_in_trans+0x44/0x70 [btrfs]
but task is already holding lock:
ffff97083144d598 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x59/0x560 [btrfs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
 -> #6 (sb_pagefaults){.+.+}-{0:0}:
 __sb_start_write+0x13e/0x220
 btrfs_page_mkwrite+0x59/0x560 [btrfs]
 do_page_mkwrite+0x4f/0x130
 do_wp_page+0x3b0/0x4f0
 handle_mm_fault+0xf47/0x1850
 do_user_addr_fault+0x1fc/0x4b0
 exc_page_fault+0x88/0x300
 asm_exc_page_fault+0x1e/0x30
 -> #5 (&mm->mmap_lock#2){++++}-{3:3}:
 __might_fault+0x60/0x80
 _copy_from_user+0x20/0xb0
 get_sg_io_hdr+0x9a/0xb0
 scsi_cmd_ioctl+0x1ea/0x2f0
 cdrom_ioctl+0x3c/0x12b4
 sr_block_ioctl+0xa4/0xd0
 block_ioctl+0x3f/0x50
 ksys_ioctl+0x82/0xc0
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x52/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
 -> #4 (&cd->lock){+.+.}-{3:3}:
 __mutex_lock+0x7b/0x820
 sr_block_open+0xa2/0x180
 __blkdev_get+0xdd/0x550
 blkdev_get+0x38/0x150
 do_dentry_open+0x16b/0x3e0
 path_openat+0x3c9/0xa00
 do_filp_open+0x75/0x100
 do_sys_openat2+0x8a/0x140
 __x64_sys_openat+0x46/0x70
 do_syscall_64+0x52/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
 -> #3 (&bdev->bd_mutex){+.+.}-{3:3}:
 __mutex_lock+0x7b/0x820
 __blkdev_get+0x6a/0x550
 blkdev_get+0x85/0x150
 blkdev_get_by_path+0x2c/0x70
 btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs]
 open_fs_devices+0x88/0x240 [btrfs]
 btrfs_open_devices+0x92/0xa0 [btrfs]
 btrfs_mount_root+0x250/0x490 [btrfs]
 legacy_get_tree+0x30/0x50
 vfs_get_tree+0x28/0xc0
 vfs_kern_mount.part.0+0x71/0xb0
 btrfs_mount+0x119/0x380 [btrfs]
 legacy_get_tree+0x30/0x50
 vfs_get_tree+0x28/0xc0
 do_mount+0x8c6/0xca0
 __x64_sys_mount+0x8e/0xd0
 do_syscall_64+0x52/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
 -> #2 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
 __mutex_lock+0x7b/0x820
 btrfs_run_dev_stats+0x36/0x420 [btrfs]
 commit_cowonly_roots+0x91/0x2d0 [btrfs]
 btrfs_commit_transaction+0x4e6/0x9f0 [btrfs]
 btrfs_sync_file+0x38a/0x480 [btrfs]
 __x64_sys_fdatasync+0x47/0x80
 do_syscall_64+0x52/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
 -> #1 (&fs_info->tree_log_mutex){+.+.}-{3:3}:
 __mutex_lock+0x7b/0x820
 btrfs_commit_transaction+0x48e/0x9f0 [btrfs]
 btrfs_sync_file+0x38a/0x480 [btrfs]
 __x64_sys_fdatasync+0x47/0x80
 do_syscall_64+0x52/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
 -> #0 (&fs_info->reloc_mutex){+.+.}-{3:3}:
 __lock_acquire+0x1241/0x20c0
 lock_acquire+0xb0/0x400
 __mutex_lock+0x7b/0x820
 btrfs_record_root_in_trans+0x44/0x70 [btrfs]
 start_transaction+0xd2/0x500 [btrfs]
 btrfs_dirty_inode+0x44/0xd0 [btrfs]
 file_update_time+0xc6/0x120
 btrfs_page_mkwrite+0xda/0x560 [btrfs]
 do_page_mkwrite+0x4f/0x130
 do_wp_page+0x3b0/0x4f0
 handle_mm_fault+0xf47/0x1850
 do_user_addr_fault+0x1fc/0x4b0
 exc_page_fault+0x88/0x300
 asm_exc_page_fault+0x1e/0x30
other info that might help us debug this:
Chain exists of:
 &fs_info->reloc_mutex --> &mm->mmap_lock#2 --> sb_pagefaults
Possible unsafe locking scenario:
        CPU0                    CPU1
        ----                    ----
   lock(sb_pagefaults);
                                lock(&mm->mmap_lock#2);
                                lock(sb_pagefaults);
   lock(&fs_info->reloc_mutex);
 *** DEADLOCK ***
3 locks held by systemd-journal/509:
 #0: ffff97083bdec8b8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x12e/0x4b0
 #1: ffff97083144d598 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x59/0x560 [btrfs]
 #2: ffff97083144d6a8 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x3f8/0x500 [btrfs]
stack backtrace:
CPU: 0 PID: 509 Comm: systemd-journal Not tainted 5.8.0-0.rc3.1.fc33.x86_64+debug #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Call Trace:
 dump_stack+0x92/0xc8
 check_noncircular+0x134/0x150
 __lock_acquire+0x1241/0x20c0
 lock_acquire+0xb0/0x400
 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs]
 ? lock_acquire+0xb0/0x400
 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs]
 __mutex_lock+0x7b/0x820
 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs]
 ? kvm_sched_clock_read+0x14/0x30
 ? sched_clock+0x5/0x10
 ? sched_clock_cpu+0xc/0xb0
 btrfs_record_root_in_trans+0x44/0x70 [btrfs]
 start_transaction+0xd2/0x500 [btrfs]
 btrfs_dirty_inode+0x44/0xd0 [btrfs]
 file_update_time+0xc6/0x120
 btrfs_page_mkwrite+0xda/0x560 [btrfs]
 ? sched_clock+0x5/0x10
 do_page_mkwrite+0x4f/0x130
 do_wp_page+0x3b0/0x4f0
 handle_mm_fault+0xf47/0x1850
 do_user_addr_fault+0x1fc/0x4b0
 exc_page_fault+0x88/0x300
 ? asm_exc_page_fault+0x8/0x30
 asm_exc_page_fault+0x1e/0x30
RIP: 0033:0x7fa3972fdbfe
Code: Bad RIP value.
Fix this by not holding the ->device_list_mutex at this point. The
device_list_mutex exists to protect us from modifying the device list
while the file system is running.
However, it can also be modified by a device scan, but that
action is specifically protected by the uuid_mutex, which we are holding
here. We cannot race with opening at this point because we hold the
->s_mount lock during the mount. Not taking the
->device_list_mutex here is perfectly safe because we are not going to
change the devices at this point.
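The inverted ordering lockdep reports above can be modelled in a few lines. The sketch below is a hypothetical, heavily simplified lockdep (not kernel code): it records which locks were taken while others were held, and flags an acquisition that closes a cycle. The lock names are just strings standing in for the mutexes in the splat:

```python
class LockOrderTracker:
    """A toy model of lockdep's dependency graph (not kernel code)."""

    def __init__(self):
        self.after = {}  # lock -> set of locks ever taken while it was held

    def _reaches(self, src, dst, seen=None):
        if src == dst:
            return True
        seen = seen or set()
        seen.add(src)
        return any(self._reaches(n, dst, seen)
                   for n in self.after.get(src, ()) if n not in seen)

    def acquire(self, held, new):
        # Taking `new` while holding `h` records an h -> new edge; it is an
        # inversion if `new` already has a recorded path back to some `h`.
        inversion = any(self._reaches(new, h) for h in held)
        for h in held:
            self.after.setdefault(h, set()).add(new)
        return inversion

t = LockOrderTracker()
t.acquire(["reloc_mutex"], "mmap_lock")        # recorded on an earlier path
t.acquire(["mmap_lock"], "sb_pagefaults")
# page-fault write path: reloc_mutex taken under sb_pagefaults closes a cycle
cycle = t.acquire(["sb_pagefaults"], "reloc_mutex")
```

Dropping the edge into reloc_mutex from the page-fault path, as the fix does, is what breaks this cycle.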
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add some comments ]
Signed-off-by: David Sterba <dsterba@suse.com>
pull bot pushed a commit that referenced this pull request Aug 3, 2020
We are currently getting this lockdep splat in btrfs/161:
 ======================================================
 WARNING: possible circular locking dependency detected
 5.8.0-rc5+ #20 Tainted: G E
 ------------------------------------------------------
 mount/678048 is trying to acquire lock:
 ffff9b769f15b6e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: clone_fs_devices+0x4d/0x170 [btrfs]
 but task is already holding lock:
 ffff9b76abdb08d0 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x6a/0x800 [btrfs]
 which lock already depends on the new lock.
 the existing dependency chain (in reverse order) is:
 -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
	 __mutex_lock+0x8b/0x8f0
	 btrfs_init_new_device+0x2d2/0x1240 [btrfs]
	 btrfs_ioctl+0x1de/0x2d20 [btrfs]
	 ksys_ioctl+0x87/0xc0
	 __x64_sys_ioctl+0x16/0x20
	 do_syscall_64+0x52/0xb0
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
 -> #0 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
	 __lock_acquire+0x1240/0x2460
	 lock_acquire+0xab/0x360
	 __mutex_lock+0x8b/0x8f0
	 clone_fs_devices+0x4d/0x170 [btrfs]
	 btrfs_read_chunk_tree+0x330/0x800 [btrfs]
	 open_ctree+0xb7c/0x18ce [btrfs]
	 btrfs_mount_root.cold+0x13/0xfa [btrfs]
	 legacy_get_tree+0x30/0x50
	 vfs_get_tree+0x28/0xc0
	 fc_mount+0xe/0x40
	 vfs_kern_mount.part.0+0x71/0x90
	 btrfs_mount+0x13b/0x3e0 [btrfs]
	 legacy_get_tree+0x30/0x50
	 vfs_get_tree+0x28/0xc0
	 do_mount+0x7de/0xb30
	 __x64_sys_mount+0x8e/0xd0
	 do_syscall_64+0x52/0xb0
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
 other info that might help us debug this:
 Possible unsafe locking scenario:
	 CPU0 CPU1
	 ---- ----
 lock(&fs_info->chunk_mutex);
				 lock(&fs_devs->device_list_mutex);
				 lock(&fs_info->chunk_mutex);
 lock(&fs_devs->device_list_mutex);
 *** DEADLOCK ***
 3 locks held by mount/678048:
 #0: ffff9b75ff5fb0e0 (&type->s_umount_key#63/1){+.+.}-{3:3}, at: alloc_super+0xb5/0x380
 #1: ffffffffc0c2fbc8 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x54/0x800 [btrfs]
 #2: ffff9b76abdb08d0 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x6a/0x800 [btrfs]
 stack backtrace:
 CPU: 2 PID: 678048 Comm: mount Tainted: G E 5.8.0-rc5+ #20
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011
 Call Trace:
 dump_stack+0x96/0xd0
 check_noncircular+0x162/0x180
 __lock_acquire+0x1240/0x2460
 ? asm_sysvec_apic_timer_interrupt+0x12/0x20
 lock_acquire+0xab/0x360
 ? clone_fs_devices+0x4d/0x170 [btrfs]
 __mutex_lock+0x8b/0x8f0
 ? clone_fs_devices+0x4d/0x170 [btrfs]
 ? rcu_read_lock_sched_held+0x52/0x60
 ? cpumask_next+0x16/0x20
 ? module_assert_mutex_or_preempt+0x14/0x40
 ? __module_address+0x28/0xf0
 ? clone_fs_devices+0x4d/0x170 [btrfs]
 ? static_obj+0x4f/0x60
 ? lockdep_init_map_waits+0x43/0x200
 ? clone_fs_devices+0x4d/0x170 [btrfs]
 clone_fs_devices+0x4d/0x170 [btrfs]
 btrfs_read_chunk_tree+0x330/0x800 [btrfs]
 open_ctree+0xb7c/0x18ce [btrfs]
 ? super_setup_bdi_name+0x79/0xd0
 btrfs_mount_root.cold+0x13/0xfa [btrfs]
 ? vfs_parse_fs_string+0x84/0xb0
 ? rcu_read_lock_sched_held+0x52/0x60
 ? kfree+0x2b5/0x310
 legacy_get_tree+0x30/0x50
 vfs_get_tree+0x28/0xc0
 fc_mount+0xe/0x40
 vfs_kern_mount.part.0+0x71/0x90
 btrfs_mount+0x13b/0x3e0 [btrfs]
 ? cred_has_capability+0x7c/0x120
 ? rcu_read_lock_sched_held+0x52/0x60
 ? legacy_get_tree+0x30/0x50
 legacy_get_tree+0x30/0x50
 vfs_get_tree+0x28/0xc0
 do_mount+0x7de/0xb30
 ? memdup_user+0x4e/0x90
 __x64_sys_mount+0x8e/0xd0
 do_syscall_64+0x52/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
This is because btrfs_read_chunk_tree() can come upon DEV_EXTENTs and
then read the device, which takes the device_list_mutex. The
device_list_mutex needs to be taken before the chunk_mutex, so this is a
problem. We only really need the chunk_mutex around adding the chunk,
so move the mutex to just around read_one_chunk.
An argument could be made that we don't even need the chunk_mutex here
as it's during mount, and we are protected by various other locks.
However we already have special rules for ->device_list_mutex, and I'd
rather not have another special case for ->chunk_mutex.
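As an illustration of the fix's shape, here is a hedged, non-btrfs sketch of narrowing a lock's scope: the chunk-level mutex is taken only around the step that inserts a chunk, so reading a device (which needs the device-list mutex) never nests inside it. All names below are hypothetical stand-ins for the kernel code:

```python
import threading

chunk_mutex = threading.Lock()
device_list_mutex = threading.Lock()

def read_chunk_tree(items):
    """Walk chunk-tree items, taking chunk_mutex only around chunk insertion."""
    chunks = []
    for item in items:
        if item["type"] == "DEV_EXTENT":
            # Reading a device takes device_list_mutex. This is only safe
            # because chunk_mutex is NOT held here (the bug held it across
            # the whole walk).
            with device_list_mutex:
                pass  # read_one_dev(item) would go here
        elif item["type"] == "CHUNK_ITEM":
            # Only the insertion itself needs the chunk-level lock.
            with chunk_mutex:
                chunks.append(item["id"])  # read_one_chunk(item)
    return chunks
```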
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
pull bot pushed a commit that referenced this pull request Aug 3, 2020
When running with -o enospc_debug you can get the following splat if one
of the dump_space_info's trip
 ======================================================
 WARNING: possible circular locking dependency detected
 5.8.0-rc5+ #20 Tainted: G OE
 ------------------------------------------------------
 dd/563090 is trying to acquire lock:
 ffff9e7dbf4f1e18 (&ctl->tree_lock){+.+.}-{2:2}, at: btrfs_dump_free_space+0x2b/0xa0 [btrfs]
 but task is already holding lock:
 ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs]
 which lock already depends on the new lock.
 the existing dependency chain (in reverse order) is:
 -> #3 (&cache->lock){+.+.}-{2:2}:
	 _raw_spin_lock+0x25/0x30
	 btrfs_add_reserved_bytes+0x3c/0x3c0 [btrfs]
	 find_free_extent+0x7ef/0x13b0 [btrfs]
	 btrfs_reserve_extent+0x9b/0x180 [btrfs]
	 btrfs_alloc_tree_block+0xc1/0x340 [btrfs]
	 alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
	 __btrfs_cow_block+0x122/0x530 [btrfs]
	 btrfs_cow_block+0x106/0x210 [btrfs]
	 commit_cowonly_roots+0x55/0x300 [btrfs]
	 btrfs_commit_transaction+0x4ed/0xac0 [btrfs]
	 sync_filesystem+0x74/0x90
	 generic_shutdown_super+0x22/0x100
	 kill_anon_super+0x14/0x30
	 btrfs_kill_super+0x12/0x20 [btrfs]
	 deactivate_locked_super+0x36/0x70
	 cleanup_mnt+0x104/0x160
	 task_work_run+0x5f/0x90
	 __prepare_exit_to_usermode+0x1bd/0x1c0
	 do_syscall_64+0x5e/0xb0
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
 -> #2 (&space_info->lock){+.+.}-{2:2}:
	 _raw_spin_lock+0x25/0x30
	 btrfs_block_rsv_release+0x1a6/0x3f0 [btrfs]
	 btrfs_inode_rsv_release+0x4f/0x170 [btrfs]
	 btrfs_clear_delalloc_extent+0x155/0x480 [btrfs]
	 clear_state_bit+0x81/0x1a0 [btrfs]
	 __clear_extent_bit+0x25c/0x5d0 [btrfs]
	 clear_extent_bit+0x15/0x20 [btrfs]
	 btrfs_invalidatepage+0x2b7/0x3c0 [btrfs]
	 truncate_cleanup_page+0x47/0xe0
	 truncate_inode_pages_range+0x238/0x840
	 truncate_pagecache+0x44/0x60
	 btrfs_setattr+0x202/0x5e0 [btrfs]
	 notify_change+0x33b/0x490
	 do_truncate+0x76/0xd0
	 path_openat+0x687/0xa10
	 do_filp_open+0x91/0x100
	 do_sys_openat2+0x215/0x2d0
	 do_sys_open+0x44/0x80
	 do_syscall_64+0x52/0xb0
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
 -> #1 (&tree->lock#2){+.+.}-{2:2}:
	 _raw_spin_lock+0x25/0x30
	 find_first_extent_bit+0x32/0x150 [btrfs]
	 write_pinned_extent_entries.isra.0+0xc5/0x100 [btrfs]
	 __btrfs_write_out_cache+0x172/0x480 [btrfs]
	 btrfs_write_out_cache+0x7a/0xf0 [btrfs]
	 btrfs_write_dirty_block_groups+0x286/0x3b0 [btrfs]
	 commit_cowonly_roots+0x245/0x300 [btrfs]
	 btrfs_commit_transaction+0x4ed/0xac0 [btrfs]
	 close_ctree+0xf9/0x2f5 [btrfs]
	 generic_shutdown_super+0x6c/0x100
	 kill_anon_super+0x14/0x30
	 btrfs_kill_super+0x12/0x20 [btrfs]
	 deactivate_locked_super+0x36/0x70
	 cleanup_mnt+0x104/0x160
	 task_work_run+0x5f/0x90
	 __prepare_exit_to_usermode+0x1bd/0x1c0
	 do_syscall_64+0x5e/0xb0
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
 -> #0 (&ctl->tree_lock){+.+.}-{2:2}:
	 __lock_acquire+0x1240/0x2460
	 lock_acquire+0xab/0x360
	 _raw_spin_lock+0x25/0x30
	 btrfs_dump_free_space+0x2b/0xa0 [btrfs]
	 btrfs_dump_space_info+0xf4/0x120 [btrfs]
	 btrfs_reserve_extent+0x176/0x180 [btrfs]
	 __btrfs_prealloc_file_range+0x145/0x550 [btrfs]
	 cache_save_setup+0x28d/0x3b0 [btrfs]
	 btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs]
	 btrfs_commit_transaction+0xcc/0xac0 [btrfs]
	 btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs]
	 btrfs_check_data_free_space+0x4c/0xa0 [btrfs]
	 btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs]
	 btrfs_file_write_iter+0x3cf/0x610 [btrfs]
	 new_sync_write+0x11e/0x1b0
	 vfs_write+0x1c9/0x200
	 ksys_write+0x68/0xe0
	 do_syscall_64+0x52/0xb0
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
 other info that might help us debug this:
 Chain exists of:
 &ctl->tree_lock --> &space_info->lock --> &cache->lock
 Possible unsafe locking scenario:
	 CPU0 CPU1
	 ---- ----
 lock(&cache->lock);
				 lock(&space_info->lock);
				 lock(&cache->lock);
 lock(&ctl->tree_lock);
 *** DEADLOCK ***
 6 locks held by dd/563090:
 #0: ffff9e7e21d18448 (sb_writers#14){.+.+}-{0:0}, at: vfs_write+0x195/0x200
 #1: ffff9e7dd0410ed8 (&sb->s_type->i_mutex_key#19){++++}-{3:3}, at: btrfs_file_write_iter+0x86/0x610 [btrfs]
 #2: ffff9e7e21d18638 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40b/0x5b0 [btrfs]
 #3: ffff9e7e1f05d688 (&cur_trans->cache_write_mutex){+.+.}-{3:3}, at: btrfs_start_dirty_block_groups+0x158/0x4f0 [btrfs]
 #4: ffff9e7e2284ddb8 (&space_info->groups_sem){++++}-{3:3}, at: btrfs_dump_space_info+0x69/0x120 [btrfs]
 #5: ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs]
 stack backtrace:
 CPU: 3 PID: 563090 Comm: dd Tainted: G OE 5.8.0-rc5+ #20
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011
 Call Trace:
 dump_stack+0x96/0xd0
 check_noncircular+0x162/0x180
 __lock_acquire+0x1240/0x2460
 ? wake_up_klogd.part.0+0x30/0x40
 lock_acquire+0xab/0x360
 ? btrfs_dump_free_space+0x2b/0xa0 [btrfs]
 _raw_spin_lock+0x25/0x30
 ? btrfs_dump_free_space+0x2b/0xa0 [btrfs]
 btrfs_dump_free_space+0x2b/0xa0 [btrfs]
 btrfs_dump_space_info+0xf4/0x120 [btrfs]
 btrfs_reserve_extent+0x176/0x180 [btrfs]
 __btrfs_prealloc_file_range+0x145/0x550 [btrfs]
 ? btrfs_qgroup_reserve_data+0x1d/0x60 [btrfs]
 cache_save_setup+0x28d/0x3b0 [btrfs]
 btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs]
 btrfs_commit_transaction+0xcc/0xac0 [btrfs]
 ? start_transaction+0xe0/0x5b0 [btrfs]
 btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs]
 btrfs_check_data_free_space+0x4c/0xa0 [btrfs]
 btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs]
 ? ktime_get_coarse_real_ts64+0xa8/0xd0
 ? trace_hardirqs_on+0x1c/0xe0
 btrfs_file_write_iter+0x3cf/0x610 [btrfs]
 new_sync_write+0x11e/0x1b0
 vfs_write+0x1c9/0x200
 ksys_write+0x68/0xe0
 do_syscall_64+0x52/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
This is because we're holding the block_group->lock while trying to dump
the free space cache. However we don't need to hold this lock for the
dump itself; we only need it to read the values for the printk, so move
the free space cache dumping outside of the block group lock.
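The pattern behind this fix, copy the values under the spinlock and do the slow printing after dropping it, can be sketched as follows (hypothetical names, not the btrfs implementation):

```python
import threading

class BlockGroup:
    """Stand-in for a block group with a spinlock protecting its counters."""
    def __init__(self, used, reserved):
        self.lock = threading.Lock()   # models block_group->lock
        self.used = used
        self.reserved = reserved

def dump_space_info(cache):
    # Copy the fields quickly while the lock is held...
    with cache.lock:
        used, reserved = cache.used, cache.reserved
    # ...then drop it before the slow dump, so this path can never nest
    # cache->lock with ctl->tree_lock the way the splat above shows.
    return "used=%d reserved=%d" % (used, reserved)
```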
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
pull bot pushed a commit that referenced this pull request Aug 4, 2020
When an SCMI communication failure at probe time prevents querying the
power domain states, such domains should be skipped.
Registering partially initialized SCMI power domains with genpd will
cause a kernel panic.
 arm-scmi timed out in resp(caller: scmi_power_state_get+0xa4/0xd0)
 scmi-power-domain scmi_dev.2: failed to get state for domain 9
 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
 Mem abort info:
 ESR = 0x96000006
 EC = 0x25: DABT (current EL), IL = 32 bits
 SET = 0, FnV = 0
 EA = 0, S1PTW = 0
 Data abort info:
 ISV = 0, ISS = 0x00000006
 CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp=00000009f3691000
 [0000000000000000] pgd=00000009f1ca0003, p4d=00000009f1ca0003, pud=00000009f35ea003, pmd=0000000000000000
 Internal error: Oops: 96000006 [#1] PREEMPT SMP
 CPU: 2 PID: 381 Comm: bash Not tainted 5.8.0-rc1-00011-gebd118c2cca8 #2
 Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Jan 3 2020
 Internal error: Oops: 96000006 [#1] PREEMPT SMP
 pstate: 80000005 (Nzcv daif -PAN -UAO BTYPE=--)
 pc : of_genpd_add_provider_onecell+0x98/0x1f8
 lr : of_genpd_add_provider_onecell+0x48/0x1f8
 Call trace:
 of_genpd_add_provider_onecell+0x98/0x1f8
 scmi_pm_domain_probe+0x174/0x1e8
 scmi_dev_probe+0x90/0xe0
 really_probe+0xe4/0x448
 driver_probe_device+0xfc/0x168
 device_driver_attach+0x7c/0x88
 bind_store+0xe8/0x128
 drv_attr_store+0x2c/0x40
 sysfs_kf_write+0x4c/0x60
 kernfs_fop_write+0x114/0x230
 __vfs_write+0x24/0x50
 vfs_write+0xbc/0x1e0
 ksys_write+0x70/0xf8
 __arm64_sys_write+0x24/0x30
 el0_svc_common.constprop.3+0x94/0x160
 do_el0_svc+0x2c/0x98
 el0_sync_handler+0x148/0x1a8
 el0_sync+0x158/0x180
Do not register any power domain that failed to be queried with genpd.
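The shape of the fix can be sketched as follows. This is an illustrative model, not the driver code; `probe_domains` and `get_state` are hypothetical stand-ins for the probe loop and the SCMI power-state query:

```python
def probe_domains(num_domains, get_state):
    """Build the list of power domains to hand to genpd, skipping failures."""
    registered = []
    for i in range(num_domains):
        state = get_state(i)
        if state is None:  # SCMI query failed (e.g. timed out)
            # Skip: never register a partially initialized domain.
            continue
        registered.append({"id": i, "state": state})
    return registered
```

With this, a domain whose query times out (like domain 9 in the log above) simply never reaches of_genpd_add_provider_onecell().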
Fixes: 898216c ("firmware: arm_scmi: add device power domain support using genpd")
Link: https://lore.kernel.org/r/20200619220330.12217-1-cristian.marussi@arm.com
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
pull bot pushed a commit that referenced this pull request Aug 4, 2020
...Robin Murphy <robin.murphy@arm.com>:
Hi all,
Although Florian was concerned about a trivial inline check to deal with
shared IRQs adding overhead, the reality is that it would be so small as
to not be worth even thinking about unless the driver was already tuned
to squeeze out every last cycle. And a brief look over the code shows
that that clearly isn't the case.
This is an example of some of the easy low-hanging fruit that jumps out
just from code inspection. Based on disassembly and ARM1176 cycle
timings, patch #2 should save the equivalent of 2-3 shared interrupt
checks off the critical path in all cases, and patch #3 possibly up to
about 100x more. I don't have any means to test these patches, let alone
measure performance, so they're only backed by the principle that less
code - and in particular fewer memory accesses - is almost always
better.
There is almost certainly a *lot* more to be had from careful use of
relaxed I/O accessors, not doing a read-modify-write of CS at every
reset, tweaking the loops further to avoid unnecessary writebacks to
variables, and so on. However since I'm not invested in this personally
I'm not going to pursue it any further; I'm throwing these patches out
as more of a demonstration to back up my original drive-by review
comments, so if anyone want to pick them up and run with them then
please do so.
Robin.
Robin Murphy (3):
 spi: bcm2835: Tidy up bcm2835_spi_reset_hw()
 spi: bcm2835: Micro-optimise IRQ handler
 spi: bcm2835: Micro-optimise FIFO loops
 drivers/spi/spi-bcm2835.c | 45 +++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 23 deletions(-)
--
2.23.0.dirty
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel 
pull bot pushed a commit that referenced this pull request Aug 5, 2020
Like syscall entry all architectures have similar and pointlessly different
code to handle pending work before returning from a syscall to user space.
 1) One-time syscall exit work:
 - rseq syscall exit
 - audit
 - syscall tracing
 - tracehook (single stepping)
 2) Preparatory work
 - Exit to user mode loop (common TIF handling).
 - Architecture specific one time work arch_exit_to_user_mode_prepare()
 - Address limit and lockdep checks
 
 3) Final transition (lockdep, tracing, context tracking, RCU). Invokes
 arch_exit_to_user_mode() to handle e.g. speculation mitigations
Provide a generic version based on the x86 code which has all the RCU and
instrumentation protections right.
Provide a variant for interrupt return to user mode as well which shares
the above #2 and #3 work items.
After syscall_exit_to_user_mode() and irqentry_exit_to_user_mode() the
architecture code just has to return to user space. The code after
returning from these functions must not be instrumented.
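A rough model of the common TIF-handling loop in item #2 might look like this (illustrative only; the flag values and handler names are stand-ins, not the kernel's):

```python
# Stand-in work flags; the real values and names live in the arch headers.
TIF_NEED_RESCHED, TIF_SIGPENDING, TIF_NOTIFY_RESUME = 1, 2, 4
EXIT_WORK = TIF_NEED_RESCHED | TIF_SIGPENDING | TIF_NOTIFY_RESUME

def exit_to_user_mode_loop(ti_flags, reread_flags):
    """Process pending work until no exit-work flag remains set."""
    handled = []
    while ti_flags & EXIT_WORK:
        if ti_flags & TIF_NEED_RESCHED:
            handled.append("schedule")
        if ti_flags & TIF_SIGPENDING:
            handled.append("handle_signal_work")
        if ti_flags & TIF_NOTIFY_RESUME:
            handled.append("notify_resume")
        # Handling work may raise new work (e.g. a signal handler setup can
        # set new flags), so re-read the flags and loop rather than
        # returning after a single pass.
        ti_flags = reread_flags()
    return handled
```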
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200722220519.613977173@linutronix.de 
pull bot pushed a commit that referenced this pull request Aug 5, 2020
The following deadlock was captured. The first process holds 'kernfs_mutex'
and is hung on I/O. The I/O was staged in 'r1conf.pending_bio_list' of the
raid1 device; this pending bio list would be flushed by the second process,
'md127_raid1', but that process was itself hung on 'kernfs_mutex'. Using
sysfs_notify_dirent_safe() to replace sysfs_notify() fixes this. The other
sysfs_notify() calls invoked from the I/O path were removed as well.
 PID: 40430 TASK: ffff8ee9c8c65c40 CPU: 29 COMMAND: "probe_file"
 #0 [ffffb87c4df37260] __schedule at ffffffff9a8678ec
 #1 [ffffb87c4df372f8] schedule at ffffffff9a867f06
 #2 [ffffb87c4df37310] io_schedule at ffffffff9a0c73e6
 #3 [ffffb87c4df37328] __dta___xfs_iunpin_wait_3443 at ffffffffc03a4057 [xfs]
 #4 [ffffb87c4df373a0] xfs_iunpin_wait at ffffffffc03a6c79 [xfs]
 #5 [ffffb87c4df373b0] __dta_xfs_reclaim_inode_3357 at ffffffffc039a46c [xfs]
 #6 [ffffb87c4df37400] xfs_reclaim_inodes_ag at ffffffffc039a8b6 [xfs]
 #7 [ffffb87c4df37590] xfs_reclaim_inodes_nr at ffffffffc039bb33 [xfs]
 #8 [ffffb87c4df375b0] xfs_fs_free_cached_objects at ffffffffc03af0e9 [xfs]
 #9 [ffffb87c4df375c0] super_cache_scan at ffffffff9a287ec7
 #10 [ffffb87c4df37618] shrink_slab at ffffffff9a1efd93
 #11 [ffffb87c4df37700] shrink_node at ffffffff9a1f5968
 #12 [ffffb87c4df37788] do_try_to_free_pages at ffffffff9a1f5ea2
 #13 [ffffb87c4df377f0] try_to_free_mem_cgroup_pages at ffffffff9a1f6445
 #14 [ffffb87c4df37880] try_charge at ffffffff9a26cc5f
 #15 [ffffb87c4df37920] memcg_kmem_charge_memcg at ffffffff9a270f6a
 #16 [ffffb87c4df37958] new_slab at ffffffff9a251430
 #17 [ffffb87c4df379c0] ___slab_alloc at ffffffff9a251c85
 #18 [ffffb87c4df37a80] __slab_alloc at ffffffff9a25635d
 #19 [ffffb87c4df37ac0] kmem_cache_alloc at ffffffff9a251f89
 #20 [ffffb87c4df37b00] alloc_inode at ffffffff9a2a2b10
 #21 [ffffb87c4df37b20] iget_locked at ffffffff9a2a4854
 #22 [ffffb87c4df37b60] kernfs_get_inode at ffffffff9a311377
 #23 [ffffb87c4df37b80] kernfs_iop_lookup at ffffffff9a311e2b
 #24 [ffffb87c4df37ba8] lookup_slow at ffffffff9a290118
 #25 [ffffb87c4df37c10] walk_component at ffffffff9a291e83
 #26 [ffffb87c4df37c78] path_lookupat at ffffffff9a293619
 #27 [ffffb87c4df37cd8] filename_lookup at ffffffff9a2953af
 #28 [ffffb87c4df37de8] user_path_at_empty at ffffffff9a295566
 #29 [ffffb87c4df37e10] vfs_statx at ffffffff9a289787
 #30 [ffffb87c4df37e70] SYSC_newlstat at ffffffff9a289d5d
 #31 [ffffb87c4df37f18] sys_newlstat at ffffffff9a28a60e
 #32 [ffffb87c4df37f28] do_syscall_64 at ffffffff9a003949
 #33 [ffffb87c4df37f50] entry_SYSCALL_64_after_hwframe at ffffffff9aa001ad
 RIP: 00007f617a5f2905 RSP: 00007f607334f838 RFLAGS: 00000246
 RAX: ffffffffffffffda RBX: 00007f6064044b20 RCX: 00007f617a5f2905
 RDX: 00007f6064044b20 RSI: 00007f6064044b20 RDI: 00007f6064005890
 RBP: 00007f6064044aa0 R8: 0000000000000030 R9: 000000000000011c
 R10: 0000000000000013 R11: 0000000000000246 R12: 00007f606417e6d0
 R13: 00007f6064044aa0 R14: 00007f6064044b10 R15: 00000000ffffffff
 ORIG_RAX: 0000000000000006 CS: 0033 SS: 002b
 PID: 927 TASK: ffff8f15ac5dbd80 CPU: 42 COMMAND: "md127_raid1"
 #0 [ffffb87c4df07b28] __schedule at ffffffff9a8678ec
 #1 [ffffb87c4df07bc0] schedule at ffffffff9a867f06
 #2 [ffffb87c4df07bd8] schedule_preempt_disabled at ffffffff9a86825e
 #3 [ffffb87c4df07be8] __mutex_lock at ffffffff9a869bcc
 #4 [ffffb87c4df07ca0] __mutex_lock_slowpath at ffffffff9a86a013
 #5 [ffffb87c4df07cb0] mutex_lock at ffffffff9a86a04f
 #6 [ffffb87c4df07cc8] kernfs_find_and_get_ns at ffffffff9a311d83
 #7 [ffffb87c4df07cf0] sysfs_notify at ffffffff9a314b3a
 #8 [ffffb87c4df07d18] md_update_sb at ffffffff9a688696
 #9 [ffffb87c4df07d98] md_update_sb at ffffffff9a6886d5
 #10 [ffffb87c4df07da8] md_check_recovery at ffffffff9a68ad9c
 #11 [ffffb87c4df07dd0] raid1d at ffffffffc01f0375 [raid1]
 #12 [ffffb87c4df07ea0] md_thread at ffffffff9a680348
 #13 [ffffb87c4df07f08] kthread at ffffffff9a0b8005
 #14 [ffffb87c4df07f50] ret_from_fork at ffffffff9aa00344
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
@imgbot imgbot bot force-pushed the imgbot branch 2 times, most recently from 3ea2bb6 to d3c7352 on August 9, 2020 22:27
pull bot pushed a commit that referenced this pull request Aug 11, 2020
https://bugzilla.kernel.org/show_bug.cgi?id=208565
PID: 257 TASK: ecdd0000 CPU: 0 COMMAND: "init"
 #0 [<c0b420ec>] (__schedule) from [<c0b423c8>]
 #1 [<c0b423c8>] (schedule) from [<c0b459d4>]
 #2 [<c0b459d4>] (rwsem_down_read_failed) from [<c0b44fa0>]
 #3 [<c0b44fa0>] (down_read) from [<c044233c>]
 #4 [<c044233c>] (f2fs_truncate_blocks) from [<c0442890>]
 #5 [<c0442890>] (f2fs_truncate) from [<c044d408>]
 #6 [<c044d408>] (f2fs_evict_inode) from [<c030be18>]
 #7 [<c030be18>] (evict) from [<c030a558>]
 #8 [<c030a558>] (iput) from [<c047c600>]
 #9 [<c047c600>] (f2fs_sync_node_pages) from [<c0465414>]
 #10 [<c0465414>] (f2fs_write_checkpoint) from [<c04575f4>]
 #11 [<c04575f4>] (f2fs_sync_fs) from [<c0441918>]
 #12 [<c0441918>] (f2fs_do_sync_file) from [<c0441098>]
 #13 [<c0441098>] (f2fs_sync_file) from [<c0323fa0>]
 #14 [<c0323fa0>] (vfs_fsync_range) from [<c0324294>]
 #15 [<c0324294>] (do_fsync) from [<c0324014>]
 #16 [<c0324014>] (sys_fsync) from [<c0108bc0>]
This can be caused by flush_dirty_inode() in f2fs_sync_node_pages(), where
iput() requires f2fs_lock_op() again, resulting in a livelock.
Reported-by: Zhiguo Niu <Zhiguo.Niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
pull bot pushed a commit that referenced this pull request Aug 11, 2020
... set
We received an error report that perf-record caused 'Segmentation fault'
on a newly system (e.g. on the new installed ubuntu).
 (gdb) backtrace
 #0 __read_once_size (size=4, res=<synthetic pointer>, p=0x14) at /root/0-jinyao/acme/tools/include/linux/compiler.h:139
 #1 atomic_read (v=0x14) at /root/0-jinyao/acme/tools/include/asm/../../arch/x86/include/asm/atomic.h:28
 #2 refcount_read (r=0x14) at /root/0-jinyao/acme/tools/include/linux/refcount.h:65
 #3 perf_mmap__read_init (map=map@entry=0x0) at mmap.c:177
 #4 0x0000561ce5c0de39 in perf_evlist__poll_thread (arg=0x561ce68584d0) at util/sideband_evlist.c:62
 #5 0x00007fad78491609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 #6 0x00007fad7823c103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The root cause is that evlist__add_bpf_sb_event() just returns 0 if
HAVE_LIBBPF_SUPPORT is not defined (inline function path), so it will
not create a valid evsel for the side-band event.
But perf-record still creates the BPF side-band thread to process the
side-band events, and then the error happens.
We can reproduce this issue by removing the libelf-dev. e.g.
1. apt-get remove libelf-dev
2. perf record -a -- sleep 1
 root@test:~# ./perf record -a -- sleep 1
 perf: Segmentation fault
 Obtained 6 stack frames.
 ./perf(+0x28eee8) [0x5562d6ef6ee8]
 /lib/x86_64-linux-gnu/libc.so.6(+0x46210) [0x7fbfdc65f210]
 ./perf(+0x342e74) [0x5562d6faae74]
 ./perf(+0x257e39) [0x5562d6ebfe39]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7fbfdc990609]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7fbfdc73b103]
 Segmentation fault (core dumped)
To fix this issue, either:
1. Install the missing libraries so that HAVE_LIBBPF_SUPPORT is
 defined, e.g. apt-get install libelf-dev and the other related libraries.
2. Use this patch to skip the side-band event setup if HAVE_LIBBPF_SUPPORT
 is not set.
Committer notes:
The side band thread is not used just with BPF, it is also used with
--switch-output-event, so narrow the ifdef to the BPF specific part.
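The resulting guard can be sketched like this (a simplified model, not the perf source; the names are illustrative):

```python
def start_sideband(have_libbpf, switch_output_event):
    """Return True only if a side-band poll thread should be started."""
    sb_evlist = []
    if have_libbpf:
        # evlist__add_bpf_sb_event() only adds an evsel when built with libbpf,
        # so this append happens only in that configuration.
        sb_evlist.append("bpf_sb_event")
    if switch_output_event:
        # The side-band thread is also used for --switch-output-event,
        # independently of BPF support.
        sb_evlist.append(switch_output_event)
    if not sb_evlist:
        # Nothing was registered: do not spawn a thread that would poll a
        # nonexistent mmap (the NULL map crash in the backtrace above).
        return False
    return True  # perf_evlist__start_sb_thread(sb_evlist) would run here
```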
Fixes: 23cbb41 ("perf record: Move side band evlist setup to separate routine")
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200805022937.29184-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
*Total -- 2,034.75kb -> 1,545.99kb (24.02%)
/Documentation/RCU/Design/Memory-Ordering/rcu_node-lock.svg -- 6.37kb -> 1.72kb (72.95%)
/Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-invocation.svg -- 16.36kb -> 8.54kb (47.8%)
/Documentation/userspace-api/media/v4l/subdev-image-processing-crop.svg -- 8.35kb -> 4.64kb (44.37%)
/Documentation/RCU/Design/Data-Structures/TreeMapping.svg -- 8.99kb -> 5.17kb (42.49%)
/Documentation/RCU/Design/Data-Structures/BigTreeClassicRCU.svg -- 12.41kb -> 7.24kb (41.72%)
/Documentation/RCU/Design/Data-Structures/TreeMappingLevel.svg -- 11.37kb -> 6.78kb (40.33%)
/Documentation/RCU/Design/Data-Structures/nxtlist.svg -- 11.45kb -> 6.87kb (40.04%)
/Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-registry.svg -- 23.37kb -> 14.04kb (39.92%)
/Documentation/RCU/Design/Data-Structures/TreeLevel.svg -- 22.98kb -> 14.10kb (38.65%)
/Documentation/RCU/Design/Data-Structures/HugeTreeClassicRCU.svg -- 24.77kb -> 15.24kb (38.45%)
/Documentation/userspace-api/media/v4l/nv12mt.svg -- 13.79kb -> 8.54kb (38.09%)
/Documentation/RCU/Design/Data-Structures/blkd_task.svg -- 20.36kb -> 12.80kb (37.13%)
/Documentation/RCU/Design/Expedited-Grace-Periods/Funnel0.svg -- 10.33kb -> 6.56kb (36.48%)
/Documentation/RCU/Design/Expedited-Grace-Periods/Funnel1.svg -- 10.33kb -> 6.56kb (36.47%)
/Documentation/RCU/Design/Expedited-Grace-Periods/ExpRCUFlow.svg -- 32.03kb -> 20.36kb (36.44%)
/Documentation/RCU/Design/Expedited-Grace-Periods/ExpSchedFlow.svg -- 31.91kb -> 20.31kb (36.34%)
/Documentation/RCU/Design/Expedited-Grace-Periods/Funnel2.svg -- 10.83kb -> 6.91kb (36.22%)
/Documentation/RCU/Design/Data-Structures/BigTreePreemptRCUBHdyntickCB.svg -- 22.47kb -> 14.42kb (35.8%)
/Documentation/RCU/Design/Expedited-Grace-Periods/Funnel3.svg -- 12.39kb -> 7.96kb (35.8%)
/Documentation/RCU/Design/Expedited-Grace-Periods/Funnel4.svg -- 12.39kb -> 7.96kb (35.79%)
/Documentation/userspace-api/media/v4l/subdev-image-processing-scaling-multi-source.svg -- 14.71kb -> 9.45kb (35.75%)
/Documentation/RCU/Design/Expedited-Grace-Periods/Funnel8.svg -- 11.85kb -> 7.62kb (35.73%)
/Documentation/RCU/Design/Expedited-Grace-Periods/Funnel5.svg -- 12.90kb -> 8.31kb (35.59%)
/Documentation/RCU/Design/Expedited-Grace-Periods/Funnel6.svg -- 12.91kb -> 8.32kb (35.57%)
/Documentation/RCU/Design/Expedited-Grace-Periods/Funnel7.svg -- 13.42kb -> 8.67kb (35.39%)
/Documentation/RCU/Design/Requirements/GPpartitionReaders1.svg -- 16.87kb -> 10.93kb (35.24%)
/Documentation/i2c/i2c_bus.svg -- 54.70kb -> 35.52kb (35.07%)
/Documentation/RCU/Design/Memory-Ordering/TreeRCU-dyntick.svg -- 25.07kb -> 16.36kb (34.74%)
/Documentation/RCU/Design/Requirements/ReadersPartitionGP1.svg -- 29.35kb -> 19.33kb (34.12%)
/Documentation/userspace-api/media/v4l/subdev-image-processing-full.svg -- 20.06kb -> 13.24kb (33.98%)
/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg -- 28.02kb -> 18.64kb (33.49%)
/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-init-1.svg -- 23.51kb -> 15.75kb (33.01%)
/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-init-3.svg -- 22.62kb -> 15.48kb (31.55%)
/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-init-2.svg -- 23.82kb -> 16.59kb (30.36%)
/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-cleanup.svg -- 42.46kb -> 29.65kb (30.17%)
/Documentation/RCU/Design/Memory-Ordering/TreeRCU-qs.svg -- 43.21kb -> 30.37kb (29.73%)
/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg -- 49.63kb -> 34.93kb (29.61%)
/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg -- 208.62kb -> 148.33kb (28.9%)
/Documentation/userspace-api/media/v4l/vbi_hsync.svg -- 18.03kb -> 12.94kb (28.24%)
/Documentation/userspace-api/media/v4l/crop.svg -- 17.82kb -> 12.94kb (27.38%)
/Documentation/doc-guide/svg_image.svg -- 0.57kb -> 0.42kb (25.34%)
/Documentation/userspace-api/media/v4l/fieldseq_bt.svg -- 169.89kb -> 127.26kb (25.09%)
/Documentation/userspace-api/media/v4l/nv12mt_example.svg -- 44.88kb -> 33.62kb (25.08%)
/Documentation/userspace-api/media/v4l/fieldseq_tb.svg -- 171.23kb -> 128.29kb (25.08%)
/Documentation/userspace-api/media/v4l/constraints.svg -- 7.58kb -> 5.93kb (21.86%)
/Documentation/userspace-api/media/v4l/vbi_625.svg -- 58.37kb -> 46.08kb (21.05%)
/Documentation/userspace-api/media/v4l/vbi_525.svg -- 53.88kb -> 42.76kb (20.64%)
/Documentation/userspace-api/media/dvb/dvbstb.svg -- 10.02kb -> 8.30kb (17.17%)
/Documentation/admin-guide/blockdev/drbd/DRBD-data-packets.svg -- 17.02kb -> 14.43kb (15.24%)
/Documentation/admin-guide/blockdev/drbd/DRBD-8.3-data-packets.svg -- 21.70kb -> 18.39kb (15.21%)
/Documentation/userspace-api/media/v4l/bayer.svg -- 19.40kb -> 17.73kb (8.63%)
/Documentation/userspace-api/media/typical_media_device.svg -- 80.82kb -> 73.98kb (8.46%)
/Documentation/input/interactive.svg -- 3.32kb -> 3.22kb (2.97%)
/Documentation/userspace-api/media/v4l/selection.svg -- 204.45kb -> 199.05kb (2.64%)
/Documentation/input/shape.svg -- 5.55kb -> 5.41kb (2.52%)
/Documentation/admin-guide/media/ipu3_rcb.svg -- 75.50kb -> 73.70kb (2.39%)
/Documentation/networking/tls-offload-reorder-good.svg -- 6.38kb -> 6.28kb (1.55%)
/Documentation/networking/tls-offload-reorder-bad.svg -- 6.38kb -> 6.28kb (1.55%)
/Documentation/networking/tls-offload-layers.svg -- 49.03kb -> 48.87kb (0.33%)
/Documentation/logo.gif -- 15.95kb -> 15.92kb (0.23%)
Signed-off-by: ImgBotApp <ImgBotHelp@gmail.com>
pull bot pushed a commit that referenced this pull request Aug 14, 2020
Yonghong Song says:
====================
Andrii raised a concern that the current uapi for the bpf iterator map
element is a little restrictive and not suitable for future potential
complex customization. This is a valid suggestion, considering people
may indeed add more complex customization to the iterator, e.g.,
cgroup_id + user_id, etc. for task or task_file. Another example might
be map_id plus additional control so that the bpf iterator may bail
out of a bucket early if the bucket has too many elements, which could
hold the lock too long and impact other parts of the system.
Patch #1 modified uapi with kernel changes. Patch #2
adjusted libbpf api accordingly.
Changelogs:
 v3 -> v4:
 . add a forward declaration of bpf_iter_link_info in
 tools/lib/bpf/bpf.h in case libbpf is built against an
 older uapi bpf.h.
 . target the patch set to "bpf" instead of "bpf-next"
 v2 -> v3:
 . undo "not reject iter_info.map.map_fd == 0" from v1.
 In the future map_fd may become optional, so let us use map_fd == 0
 to indicate that map_fd is not set by user space.
 . add link_info_len to bpf_iter_attach_opts to ensure the
 link_info_len from the user is always correct. Otherwise, libbpf may
 deduce an incorrect link_info_len if it uses a different uapi header
 than the user app.
 v1 -> v2:
 . ensure link_create target_fd/flags == 0 since they are not used. (Andrii)
 . if either of iter_info ptr == 0 or iter_info_len == 0, but not both,
 return error to user space. (Andrii)
 . do not reject iter_info.map.map_fd == 0, go ahead to use it trying to
 get a map reference since the map_fd is required for map_elem iterator.
 . use bpf_iter_link_info in bpf_iter_attach_opts instead of map_fd.
 This way, user space is responsible for setting up bpf_iter_link_info
 and libbpf just passes the data to the kernel, simplifying libbpf's
 design. (Andrii)
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
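The forward/backward-compatibility concern behind passing link_info_len alongside the struct can be sketched with a size-versioned struct, a pattern common in Linux uapi interfaces. Everything below is purely illustrative; the struct and function names are hypothetical, not the actual bpf uapi or libbpf API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical v1 struct: an old binary compiled against an older header. */
struct iter_link_info_v1 {
    uint32_t map_fd;            /* 0 means "not set by user space" */
};

/* Hypothetical v2 struct: a newer header grew an extra field. */
struct iter_link_info_v2 {
    uint32_t map_fd;
    uint32_t bucket_limit;      /* field unknown to v1 callers */
};

/* The receiver copies only the bytes the caller claims to have filled
 * (user_len), zeroing the rest, so a caller built against a different
 * header still gets well-defined behavior. It also enforces the rule
 * from the v2 changelog: ptr and len must be set (or unset) together. */
static int read_link_info(const void *user_buf, uint32_t user_len,
                          struct iter_link_info_v2 *out)
{
    if ((user_buf == NULL) != (user_len == 0))
        return -1;              /* one set but not the other: error */
    memset(out, 0, sizeof(*out));
    if (user_len > sizeof(*out))
        user_len = (uint32_t)sizeof(*out);
    if (user_buf)
        memcpy(out, user_buf, user_len);
    return 0;
}
```

With this shape, an old caller passing `sizeof(struct iter_link_info_v1)` gets `bucket_limit == 0`, and `map_fd == 0` naturally reads as "not set", matching the changelog's choice above.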
pull bot pushed a commit that referenced this pull request Aug 15, 2020
Freeing the chip on error may lead to an Oops the next time
the system resumes. Fix this by removing all
snd_echo_free() calls on the error paths.
Fixes: 47b5d02 ("ALSA: Echoaudio - Add suspend support #2")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Link: https://lore.kernel.org/r/20200813074632.17022-1-dinghao.liu@zju.edu.cn
Signed-off-by: Takashi Iwai <tiwai@suse.de>
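The ownership rule behind this fix can be shown with a minimal sketch (all names here are hypothetical, not the actual ALSA code): a helper that fails during setup should only report the error, leaving the single owner to free the object exactly once. Freeing in both the helper and the owner is what sets up the later Oops.

```c
#include <assert.h>
#include <stdlib.h>

struct chip { int initialized; };

static int free_count;                 /* instrumentation for the sketch */

static void chip_free(struct chip *c)
{
    free_count++;
    free(c);
}

/* After the fix: an init step that fails only reports the error; it
 * never calls chip_free() itself, so ownership stays with the caller. */
static int chip_init(struct chip *c, int should_fail)
{
    if (should_fail)
        return -1;
    c->initialized = 1;
    return 0;
}

static int probe(int should_fail)
{
    struct chip *c = calloc(1, sizeof(*c));
    if (!c)
        return -1;
    if (chip_init(c, should_fail)) {
        chip_free(c);                  /* the one and only free on error */
        return -1;
    }
    /* a real driver would register the chip here and free it on teardown */
    chip_free(c);
    return 0;
}
```

The invariant to check is that every path through `probe()` frees the chip exactly once; the original bug was a helper freeing on error while a later path (resume) still held the pointer.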
pull bot pushed a commit that referenced this pull request Aug 16, 2020
loop_rw_iter() does not check whether the file has a read or
write function. This can lead to a NULL pointer dereference
when the user passes in a file descriptor that does not have
a read or write function.
The crash log looks like this:
[ 99.834071] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 99.835364] #PF: supervisor instruction fetch in kernel mode
[ 99.836522] #PF: error_code(0x0010) - not-present page
[ 99.837771] PGD 8000000079d62067 P4D 8000000079d62067 PUD 79d8c067 PMD 0
[ 99.839649] Oops: 0010 [#2] SMP PTI
[ 99.840591] CPU: 1 PID: 333 Comm: io_wqe_worker-0 Tainted: G D 5.8.0 #2
[ 99.842622] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[ 99.845140] RIP: 0010:0x0
[ 99.845840] Code: Bad RIP value.
[ 99.846672] RSP: 0018:ffffa1c7c01ebc08 EFLAGS: 00010202
[ 99.848018] RAX: 0000000000000000 RBX: ffff92363bd67300 RCX: ffff92363d461208
[ 99.849854] RDX: 0000000000000010 RSI: 00007ffdbf696bb0 RDI: ffff92363bd67300
[ 99.851743] RBP: ffffa1c7c01ebc40 R08: 0000000000000000 R09: 0000000000000000
[ 99.853394] R10: ffffffff9ec692a0 R11: 0000000000000000 R12: 0000000000000010
[ 99.855148] R13: 0000000000000000 R14: ffff92363d461208 R15: ffffa1c7c01ebc68
[ 99.856914] FS: 0000000000000000(0000) GS:ffff92363dd00000(0000) knlGS:0000000000000000
[ 99.858651] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 99.860032] CR2: ffffffffffffffd6 CR3: 000000007ac66000 CR4: 00000000000006e0
[ 99.861979] Call Trace:
[ 99.862617] loop_rw_iter.part.0+0xad/0x110
[ 99.863838] io_write+0x2ae/0x380
[ 99.864644] ? kvm_sched_clock_read+0x11/0x20
[ 99.865595] ? sched_clock+0x9/0x10
[ 99.866453] ? sched_clock_cpu+0x11/0xb0
[ 99.867326] ? newidle_balance+0x1d4/0x3c0
[ 99.868283] io_issue_sqe+0xd8f/0x1340
[ 99.869216] ? __switch_to+0x7f/0x450
[ 99.870280] ? __switch_to_asm+0x42/0x70
[ 99.871254] ? __switch_to_asm+0x36/0x70
[ 99.872133] ? lock_timer_base+0x72/0xa0
[ 99.873155] ? switch_mm_irqs_off+0x1bf/0x420
[ 99.874152] io_wq_submit_work+0x64/0x180
[ 99.875192] ? kthread_use_mm+0x71/0x100
[ 99.876132] io_worker_handle_work+0x267/0x440
[ 99.877233] io_wqe_worker+0x297/0x350
[ 99.878145] kthread+0x112/0x150
[ 99.878849] ? __io_worker_unuse+0x100/0x100
[ 99.879935] ? kthread_park+0x90/0x90
[ 99.880874] ret_from_fork+0x22/0x30
[ 99.881679] Modules linked in:
[ 99.882493] CR2: 0000000000000000
[ 99.883324] ---[ end trace 4453745f4673190b ]---
[ 99.884289] RIP: 0010:0x0
[ 99.884837] Code: Bad RIP value.
[ 99.885492] RSP: 0018:ffffa1c7c01ebc08 EFLAGS: 00010202
[ 99.886851] RAX: 0000000000000000 RBX: ffff92363acd7f00 RCX: ffff92363d461608
[ 99.888561] RDX: 0000000000000010 RSI: 00007ffe040d9e10 RDI: ffff92363acd7f00
[ 99.890203] RBP: ffffa1c7c01ebc40 R08: 0000000000000000 R09: 0000000000000000
[ 99.891907] R10: ffffffff9ec692a0 R11: 0000000000000000 R12: 0000000000000010
[ 99.894106] R13: 0000000000000000 R14: ffff92363d461608 R15: ffffa1c7c01ebc68
[ 99.896079] FS: 0000000000000000(0000) GS:ffff92363dd00000(0000) knlGS:0000000000000000
[ 99.898017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 99.899197] CR2: ffffffffffffffd6 CR3: 000000007ac66000 CR4: 00000000000006e0
Fixes: 3296061 ("io_uring: correctly handle non ->{read,write}_iter() file_operations")
Cc: stable@vger.kernel.org
Signed-off-by: Guoyu Huang <hgy5945@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
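The guard this fix adds boils down to checking a function pointer before dispatching through it, so a missing operation is rejected with an error instead of jumping to address 0 (the `RIP: 0010:0x0` in the log above). A minimal standalone sketch, with hypothetical types rather than the actual io_uring/file_operations code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-file operation table, in the spirit of
 * file_operations: entries may legitimately be NULL. */
struct file_ops {
    int (*read)(const char *buf, int len);
    int (*write)(const char *buf, int len);
};

/* Guarded dispatch: reject with -22 (-EINVAL) instead of calling a
 * NULL pointer when the file has no write function. */
static int do_write(const struct file_ops *ops, const char *buf, int len)
{
    if (!ops->write)
        return -22;
    return ops->write(buf, len);
}

static int fake_write(const char *buf, int len)
{
    (void)buf;
    return len;      /* pretend the whole buffer was written */
}
```

Without the `!ops->write` check, a table with a NULL entry reproduces exactly the class of crash in the log: a supervisor instruction fetch at address 0.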
pull bot pushed a commit that referenced this pull request Aug 21, 2020
Ext4 uses blkdev_get_by_dev() to get the block_device for the journal
device, which does not check whether a read-only block device was
opened for writing.
As a result ext4 will happily proceed mounting the file system with an
external journal on a read-only device. This is bad, as we would not be
able to use the journal, leading to errors later on.
Instead of simply failing to mount the file system in this case, treat
it in a similar way to an internal journal on a read-only device: allow
mounting with -o noload in read-only mode.
This can be reproduced easily like this:
mke2fs -F -O journal_dev $JOURNAL_DEV 100M
mkfs.$FSTYPE -F -J device=$JOURNAL_DEV $FS_DEV
blockdev --setro $JOURNAL_DEV
mount $FS_DEV $MNT
touch $MNT/file
umount $MNT
leading to an error like this:
[ 1307.318713] ------------[ cut here ]------------
[ 1307.323362] generic_make_request: Trying to write to read-only block-device dm-2 (partno 0)
[ 1307.331741] WARNING: CPU: 36 PID: 3224 at block/blk-core.c:855 generic_make_request_checks+0x2c3/0x580
[ 1307.341041] Modules linked in: ext4 mbcache jbd2 rfkill intel_rapl_msr intel_rapl_common isst_if_commd
[ 1307.419445] CPU: 36 PID: 3224 Comm: jbd2/dm-2 Tainted: G W I 5.8.0-rc5 #2
[ 1307.427359] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 2.3.10 08/15/2019
[ 1307.434932] RIP: 0010:generic_make_request_checks+0x2c3/0x580
[ 1307.440676] Code: 94 03 00 00 48 89 df 48 8d 74 24 08 c6 05 cf 2b 18 01 01 e8 7f a4 ff ff 48 c7 c7 50e
[ 1307.459420] RSP: 0018:ffffc0d70eb5fb48 EFLAGS: 00010286
[ 1307.464646] RAX: 0000000000000000 RBX: ffff9b33b2978300 RCX: 0000000000000000
[ 1307.471780] RDX: ffff9b33e12a81e0 RSI: ffff9b33e1298000 RDI: ffff9b33e1298000
[ 1307.478913] RBP: ffff9b7b9679e0c0 R08: 0000000000000837 R09: 0000000000000024
[ 1307.486044] R10: 0000000000000000 R11: ffffc0d70eb5f9f0 R12: 0000000000000400
[ 1307.493177] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
[ 1307.500308] FS: 0000000000000000(0000) GS:ffff9b33e1280000(0000) knlGS:0000000000000000
[ 1307.508396] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1307.514142] CR2: 000055eaf4109000 CR3: 0000003dee40a006 CR4: 00000000007606e0
[ 1307.521273] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1307.528407] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1307.535538] PKRU: 55555554
[ 1307.538250] Call Trace:
[ 1307.540708] generic_make_request+0x30/0x340
[ 1307.544985] submit_bio+0x43/0x190
[ 1307.548393] ? bio_add_page+0x62/0x90
[ 1307.552068] submit_bh_wbc+0x16a/0x190
[ 1307.555833] jbd2_write_superblock+0xec/0x200 [jbd2]
[ 1307.560803] jbd2_journal_update_sb_log_tail+0x65/0xc0 [jbd2]
[ 1307.566557] jbd2_journal_commit_transaction+0x2ae/0x1860 [jbd2]
[ 1307.572566] ? check_preempt_curr+0x7a/0x90
[ 1307.576756] ? update_curr+0xe1/0x1d0
[ 1307.580421] ? account_entity_dequeue+0x7b/0xb0
[ 1307.584955] ? newidle_balance+0x231/0x3d0
[ 1307.589056] ? __switch_to_asm+0x42/0x70
[ 1307.592986] ? __switch_to_asm+0x36/0x70
[ 1307.596918] ? lock_timer_base+0x67/0x80
[ 1307.600851] kjournald2+0xbd/0x270 [jbd2]
[ 1307.604873] ? finish_wait+0x80/0x80
[ 1307.608460] ? commit_timeout+0x10/0x10 [jbd2]
[ 1307.612915] kthread+0x114/0x130
[ 1307.616152] ? kthread_park+0x80/0x80
[ 1307.619816] ret_from_fork+0x22/0x30
[ 1307.623400] ---[ end trace 27490236265b1630 ]---
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20200717090605.2612-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
pull bot pushed a commit that referenced this pull request Aug 26, 2020
We have a number of "uart.port->desc.lock vs desc.lock->uart.port"
lockdep reports coming from the 8250 driver; these cause a bit of
trouble for people, so let's fix it.
The problem is reverse lock order in two different call paths:
chain #1:
 serial8250_do_startup()
 spin_lock_irqsave(&port->lock);
 disable_irq_nosync(port->irq);
 raw_spin_lock_irqsave(&desc->lock)
chain #2:
 __report_bad_irq()
 raw_spin_lock_irqsave(&desc->lock)
 for_each_action_of_desc()
 printk()
 spin_lock_irqsave(&port->lock);
Fix this by changing the order of locks in serial8250_do_startup():
 do disable_irq_nosync() first, which grabs desc->lock, and grab
 uart->port after that, so that chain #1 and chain #2 have the same
 lock order.
Full lockdep splat:
 ======================================================
 WARNING: possible circular locking dependency detected
 5.4.39 #55 Not tainted
 ======================================================
 swapper/0/0 is trying to acquire lock:
 ffffffffab65b6c0 (console_owner){-...}, at: console_lock_spinning_enable+0x31/0x57
 but task is already holding lock:
 ffff88810a8e34c0 (&irq_desc_lock_class){-.-.}, at: __report_bad_irq+0x5b/0xba
 which lock already depends on the new lock.
 the existing dependency chain (in reverse order) is:
 -> #2 (&irq_desc_lock_class){-.-.}:
 _raw_spin_lock_irqsave+0x61/0x8d
 __irq_get_desc_lock+0x65/0x89
 __disable_irq_nosync+0x3b/0x93
 serial8250_do_startup+0x451/0x75c
 uart_startup+0x1b4/0x2ff
 uart_port_activate+0x73/0xa0
 tty_port_open+0xae/0x10a
 uart_open+0x1b/0x26
 tty_open+0x24d/0x3a0
 chrdev_open+0xd5/0x1cc
 do_dentry_open+0x299/0x3c8
 path_openat+0x434/0x1100
 do_filp_open+0x9b/0x10a
 do_sys_open+0x15f/0x3d7
 kernel_init_freeable+0x157/0x1dd
 kernel_init+0xe/0x105
 ret_from_fork+0x27/0x50
 -> #1 (&port_lock_key){-.-.}:
 _raw_spin_lock_irqsave+0x61/0x8d
 serial8250_console_write+0xa7/0x2a0
 console_unlock+0x3b7/0x528
 vprintk_emit+0x111/0x17f
 printk+0x59/0x73
 register_console+0x336/0x3a4
 uart_add_one_port+0x51b/0x5be
 serial8250_register_8250_port+0x454/0x55e
 dw8250_probe+0x4dc/0x5b9
 platform_drv_probe+0x67/0x8b
 really_probe+0x14a/0x422
 driver_probe_device+0x66/0x130
 device_driver_attach+0x42/0x5b
 __driver_attach+0xca/0x139
 bus_for_each_dev+0x97/0xc9
 bus_add_driver+0x12b/0x228
 driver_register+0x64/0xed
 do_one_initcall+0x20c/0x4a6
 do_initcall_level+0xb5/0xc5
 do_basic_setup+0x4c/0x58
 kernel_init_freeable+0x13f/0x1dd
 kernel_init+0xe/0x105
 ret_from_fork+0x27/0x50
 -> #0 (console_owner){-...}:
 __lock_acquire+0x118d/0x2714
 lock_acquire+0x203/0x258
 console_lock_spinning_enable+0x51/0x57
 console_unlock+0x25d/0x528
 vprintk_emit+0x111/0x17f
 printk+0x59/0x73
 __report_bad_irq+0xa3/0xba
 note_interrupt+0x19a/0x1d6
 handle_irq_event_percpu+0x57/0x79
 handle_irq_event+0x36/0x55
 handle_fasteoi_irq+0xc2/0x18a
 do_IRQ+0xb3/0x157
 ret_from_intr+0x0/0x1d
 cpuidle_enter_state+0x12f/0x1fd
 cpuidle_enter+0x2e/0x3d
 do_idle+0x1ce/0x2ce
 cpu_startup_entry+0x1d/0x1f
 start_kernel+0x406/0x46a
 secondary_startup_64+0xa4/0xb0
 other info that might help us debug this:
 Chain exists of:
 console_owner --> &port_lock_key --> &irq_desc_lock_class
 Possible unsafe locking scenario:
 CPU0 CPU1
 ---- ----
 lock(&irq_desc_lock_class);
 lock(&port_lock_key);
 lock(&irq_desc_lock_class);
 lock(console_owner);
 *** DEADLOCK ***
 2 locks held by swapper/0/0:
 #0: ffff88810a8e34c0 (&irq_desc_lock_class){-.-.}, at: __report_bad_irq+0x5b/0xba
 #1: ffffffffab65b5c0 (console_lock){+.+.}, at: console_trylock_spinning+0x20/0x181
 stack backtrace:
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.39 #55
 Hardware name: XXXXXX
 Call Trace:
 <IRQ>
 dump_stack+0xbf/0x133
 ? print_circular_bug+0xd6/0xe9
 check_noncircular+0x1b9/0x1c3
 __lock_acquire+0x118d/0x2714
 lock_acquire+0x203/0x258
 ? console_lock_spinning_enable+0x31/0x57
 console_lock_spinning_enable+0x51/0x57
 ? console_lock_spinning_enable+0x31/0x57
 console_unlock+0x25d/0x528
 ? console_trylock+0x18/0x4e
 vprintk_emit+0x111/0x17f
 ? lock_acquire+0x203/0x258
 printk+0x59/0x73
 __report_bad_irq+0xa3/0xba
 note_interrupt+0x19a/0x1d6
 handle_irq_event_percpu+0x57/0x79
 handle_irq_event+0x36/0x55
 handle_fasteoi_irq+0xc2/0x18a
 do_IRQ+0xb3/0x157
 common_interrupt+0xf/0xf
 </IRQ>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Fixes: 768aec0 ("serial: 8250: fix shared interrupts issues with SMP and RT kernels")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Raul Rangel <rrangel@google.com>
BugLink: https://bugs.chromium.org/p/chromium/issues/detail?id=1114800
Link: https://lore.kernel.org/lkml/CAHQZ30BnfX+gxjPm1DUd5psOTqbyDh4EJE=2=VAMW_VDafctkA@mail.gmail.com/T/#u
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200817022646.1484638-1-sergey.senozhatsky@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
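The principle of the fix, that every code path must acquire the two locks in the same order, can be sketched with plain pthreads (illustrative only, not the actual 8250 code): both paths below take the "desc" lock before the "port" lock, so the ABBA cycle from the lockdep splat above cannot form.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t desc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t port_lock = PTHREAD_MUTEX_INITIALIZER;
static int events;

/* Mirrors the fixed serial8250_do_startup(): disable_irq_nosync()
 * (which grabs desc_lock) runs first, then port_lock is taken. */
static void *startup_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&desc_lock);
    pthread_mutex_lock(&port_lock);
    events++;
    pthread_mutex_unlock(&port_lock);
    pthread_mutex_unlock(&desc_lock);
    return NULL;
}

/* Mirrors __report_bad_irq(): desc_lock first, then the printk path
 * takes port_lock. Same order as above, so no deadlock is possible. */
static void *report_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&desc_lock);
    pthread_mutex_lock(&port_lock);
    events++;
    pthread_mutex_unlock(&port_lock);
    pthread_mutex_unlock(&desc_lock);
    return NULL;
}
```

If one path took port_lock before desc_lock (the pre-fix order), two concurrent callers could each hold one lock while waiting for the other, which is exactly the "Possible unsafe locking scenario" lockdep prints.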