bug-readline
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-readline] Pasting long lines is slow


From: Ole Laursen
Subject: Re: [Bug-readline] Pasting long lines is slow
Date: 2015年5月15日 20:33:37 +0200

2015年04月30日 17:03 GMT+02:00 Ole Laursen <address@hidden>:
> I'm seeing what looks to me like quadratic behaviour on pasting long
> lines in readline applications like bash, python, psql.
>
> To reproduce: open a new terminal with say bash, make a really long
> line (> 20k characters, e.g. with python -c 'print "a" * 20000'), then
> paste it into the terminal. The first characters are inserted quickly,
> but it gradually slows down. With enough data it ends up being so slow
> as to be unusable.
I did a profile now on examples/fileman with perf:
https://perf.wiki.kernel.org/index.php/Tutorial#Sampling_with_perf_record
Here are the results from pasting 5000 x "aaa " with my default locale
da_DK.UTF-8. As is evident most of the time is spent measuring the
widths of the a's:
 21,22% fileman libc-2.19.so [.] __gconv_transform_utf8_internal
 20,60% fileman libc-2.19.so [.] wcwidth
 19,80% fileman libc-2.19.so [.] __mbrtowc
 15,45% fileman fileman [.] rl_redisplay
 12,44% fileman libc-2.19.so [.] __ctype_get_mb_cur_max
 2,74% fileman libc-2.19.so [.] __x86.get_pc_thunk.bx
 2,63% fileman libc-2.19.so [.] _dl_mcount_wrapper_check
 1,42% fileman libc-2.19.so [.] __x86.get_pc_thunk.cx
 1,34% fileman fileman [.] _rl_find_prev_mbchar
 1,00% fileman fileman [.] 0x000014c0
 0,57% fileman fileman [.] 0x00001520
 0,12% fileman fileman [.] _rl_col_width
 0,08% fileman libc-2.19.so [.] __memcmp_sse4_2
 0,05% fileman fileman [.] update_line
 0,03% fileman [kernel.kallsyms] [k] _raw_spin_lock_irqsave
 0,02% fileman [vdso] [.] __kernel_vsyscall
 0,02% fileman libc-2.19.so [.] __memset_sse2_rep
 0,01% fileman fileman [.] address@hidden
 0,01% fileman [kernel.kallsyms] [k] enqueue_task_fair
 0,01% fileman [i915] [k] gen6_read32
 0,01% fileman [kernel.kallsyms] [k] mutex_unlock
 0,01% fileman [kernel.kallsyms] [k] native_write_msr_safe
 0,01% fileman libc-2.19.so [.] _IO_file_sync@@GLIBC_2.1
 0,01% fileman libc-2.19.so [.] new_do_write
 0,01% fileman [kernel.kallsyms] [k] resched_curr
 0,01% fileman [kernel.kallsyms] [k] restore_nocheck
 0,01% fileman [kernel.kallsyms] [k] insert_work
 0,01% fileman [kernel.kallsyms] [k] timekeeping_update.constprop.7
 0,01% fileman [kernel.kallsyms] [k] __queue_work
 0,01% fileman [kernel.kallsyms] [k] __schedule
 0,01% fileman [kernel.kallsyms] [k] n_tty_read
 0,01% fileman [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
 0,01% fileman [kernel.kallsyms] [k] task_waking_fair
 0,01% fileman [kernel.kallsyms] [k] tty_paranoia_check
 0,01% fileman [kernel.kallsyms] [k] sysenter_past_esp
 0,01% fileman [kernel.kallsyms] [k] vfs_read
 0,01% fileman [kernel.kallsyms] [k] queue_work_on
 0,01% fileman [kernel.kallsyms] [k] timerqueue_del
 0,01% fileman libc-2.19.so [.] _IO_fwrite
 0,01% fileman fileman [.] _rl_move_cursor_relative
 0,01% fileman [kernel.kallsyms] [k] __copy_to_user_ll
 0,01% fileman [kernel.kallsyms] [k] update_sd_lb_stats
 0,01% fileman [kernel.kallsyms] [k] down_read
 0,01% fileman [kernel.kallsyms] [k] wake_up_process
 0,01% fileman [kernel.kallsyms] [k] native_sched_clock
 0,01% fileman fileman [.] _rl_internal_char_cleanup
 0,01% fileman [kernel.kallsyms] [k] security_file_permission
 0,01% fileman fileman [.] _rl_dispatch_subseq
 0,00% fileman [kernel.kallsyms] [k] cpuacct_charge
 0,00% fileman [kernel.kallsyms] [k] memcpy
 0,00% fileman [kernel.kallsyms] [k] apic_timer_interrupt
 0,00% fileman libc-2.19.so [.] __GI___mempcpy
 0,00% fileman fileman [.] rl_insert_text
Compare this to a similar profile with LANG=C which is much faster:
 61,71% fileman libc-2.19.so [.] __ctype_get_mb_cur_max
 23,44% fileman fileman [.] rl_redisplay
 5,01% fileman libc-2.19.so [.] __x86.get_pc_thunk.cx
 4,87% fileman fileman [.] update_line
 3,35% fileman fileman [.] 0x000014c0
 0,07% fileman [kernel.kallsyms] [k] enqueue_task_fair
 0,06% fileman [vdso] [.] __kernel_vsyscall
 0,06% fileman [kernel.kallsyms] [k] _raw_spin_lock_irqsave
 0,05% fileman [kernel.kallsyms] [k] insert_work
 0,05% fileman [kernel.kallsyms] [k] native_write_msr_safe
 0,04% fileman [kernel.kallsyms] [k] restore_nocheck
 0,04% fileman [kernel.kallsyms] [k] sysenter_past_esp
 0,03% fileman fileman [.] _rl_dispatch_subseq
 0,03% fileman [kernel.kallsyms] [k] fsnotify
 0,03% fileman [kernel.kallsyms] [k] n_tty_write
 0,03% fileman [kernel.kallsyms] [k] down_read
 0,03% fileman [kernel.kallsyms] [k] _raw_spin_lock
 0,03% fileman [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
 0,03% fileman libc-2.19.so [.] new_do_write
 0,03% fileman fileman [.] readline_internal_char
 0,03% fileman [kernel.kallsyms] [k] atomic64_read_cx8
 0,02% fileman [kernel.kallsyms] [k] wake_up_worker
 0,02% fileman [kernel.kallsyms] [k] __queue_work
 0,02% fileman fileman [.] _rl_internal_char_cleanup
 0,02% fileman [kernel.kallsyms] [k] rb_erase
 0,02% fileman [kernel.kallsyms] [k] cpuacct_charge
 0,02% fileman [kernel.kallsyms] [k] copy_from_read_buf
 0,02% fileman [kernel.kallsyms] [k] apic_timer_interrupt
 0,02% fileman [kernel.kallsyms] [k] update_curr
 0,02% fileman fileman [.] rl_insert_text
I'm assuming that readline does a redisplay on each inserted char? In
this redisplay, it goes through all characters on the line inserted so
far, so the end result is O(n^2).
In a multi-byte case, examining each character is obviously much more expensive.
Notice that in even with LANG=C, __ctype_get_mb_cur_max accounts for
61.7% of the time. As far as I can tell, this comes from MB_CUR_MAX
which despite the naming is not a constant:
http://man7.org/linux/man-pages/man3/MB_CUR_MAX.3.html
As a quickfix for that, I'm attaching a patch to only compute
MB_CUR_MAX once in the top of the function. With this patch, the
LANG=C profile is noticeable faster and looks like this
 87,55% fileman fileman [.] rl_redisplay
 9,54% fileman fileman [.] update_line
 0,41% fileman libc-2.19.so [.] __ctype_get_mb_cur_max
 0,12% fileman [vdso] [.] __kernel_vsyscall
 0,09% fileman [kernel.kallsyms] [k] sysenter_past_esp
 0,07% fileman [kernel.kallsyms] [k] n_tty_read
 0,06% fileman [kernel.kallsyms] [k] try_to_wake_up
 0,06% fileman [kernel.kallsyms] [k] restore_nocheck
 0,06% fileman [kernel.kallsyms] [k] resched_curr
 0,06% fileman [kernel.kallsyms] [k] _raw_spin_lock_irqsave
 0,06% fileman [kernel.kallsyms] [k] atomic64_read_cx8
 0,05% fileman [kernel.kallsyms] [k] tty_paranoia_check
 0,05% fileman fileman [.] rl_insert_text
This helps a bit with the UTF 8 case, but not nearly enough.
I suppose one could try to make the loop content faster somehow, but
to really fix this to not be O(n^2) one should only do the work
necessary to take the single added character into account instead of
starting off from scratch each time. Or perhaps try to tackle this
case of copy-pasting a bucket-load of characters by batching them?
What do you think?
Ole

Attachment: mb-cur-max-fix.patch
Description: Binary data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]

AltStyle によって変換されたページ (->オリジナル) /