KernelNewbies: Last updated at 2017-12-30 01:30:22

Linux 4.2 was released on 30 Aug 2015.

Summary: This release adds a new amdgpu driver for modern AMD Radeon hardware, a virtio GPU driver that lets guests use the host GPU's capabilities, a stable atomic modesetting graphics API, support for stacking security modules, a faster and more scalable spinlock implementation, cgroup writeback support, and the reintroduction of the H8/300 architecture. There are also new drivers and many other small improvements.

1. Prominent features

1.1. New driver amdgpu for modern AMD Radeon hardware

This release includes the amdgpu driver, a new driver for VI+ AMD ASICs. It currently supports Tonga, Iceland, and Carrizo, and also contains an option to build support for CI parts for testing. All major functionality is supported (displays, gfx, compute, dma, video decode/encode, etc.). Power management works on Carrizo, but is still being worked on for Tonga and Iceland.

The purpose of this driver is to unify AMD's Linux offerings: the functionality that is kept as private code in the Catalyst driver will either be ported to this driver or run as a private user-space blob that uses the new driver.

Code: (merge)

1.2. Add virtio gpu driver

Virtio drivers are "fake" drivers that make communication between virtualization guests and the host faster, because emulating real hardware is complicated and inefficient.

This release adds a virtual GPU driver for virtio. It can be used with QEMU-based VMMs (like KVM or Xen). For now it supports kernel modesetting: the xorg modesetting driver can handle the device just fine, and the framebuffer for fbcon is there too. QEMU patches for the host side are currently under review. This initial revision has only 2D support; 3D (virgl) support requires some more work on the QEMU side and will be added later.

Code: commit

1.3. Atomic modesetting API enabled by default

This release finally completes the atomic modesetting API and enables it by default. For details about the atomic modesetting API and why it is necessary, read these recommended LWN articles: Atomic mode setting design overview, part 1, and part 2

Code: commit

1.4. Stacking of security modules

There are several security modules in the Linux kernel, but only one can be used at a time. For a very long time, developers have wanted to be able to use more than one at the same time ("stacking"). This release adds support for stacking Linux security modules.

For more details, read Progress in security module stacking.

Code: commit, commit, commit, commit, commit, commit, commit

1.5. Queued spinlocks become the default spinlock implementation

This release adds support in the x86 architecture for queue-based spinlocks that can replace the default ticket spinlock without increasing the size of the spinlock data structure.

The queue spinlock has slightly better performance than the ticket spinlock in the uncontended case, and its performance can be much better with moderate to heavy contention. It is especially suitable for NUMA machines with at least 2 sockets; even at the 2-socket level there can be a significant speedup, depending on the workload. It can also improve the performance of an I/O and interrupt intensive stress test with a lot of spinlock contention on a 2-socket system by up to 20%.

For more details, read this LWN article: MCS locks and qspinlocks

Code: commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit

1.6. cgroup writeback support

The Linux kernel can throttle processes that are trying to write too many pages to the disk ("writeback"). But this control is global and doesn't allow per-cgroup limits. This release adds support for writeback control of processes inside a cgroup.

For more details, read this recommended LWN article: Writeback and control groups

Documentation: Updates to Documentation/cgroups/blkio-controller.txt

Code: (merge)

1.7. Reintroduction of the H8/300 architecture

Support for the H8/300 architecture was added in Linux 2.5.68, but it was removed in Linux 3.13 due to lack of maintenance.

In this release, the H8/300 architecture is supported again and has been reincorporated into the tree.

Code: arch/h8300

2. Drivers and architectures

3. Core (various)

  • futex: Implement lockless wakeups. At the lowest level, this can reduce the latency of a single thread attempting to acquire hb->lock in highly contended scenarios by up to 2x commit

  • mqueue: Implement lockless pipelined wakeups commit

  • Allow drivers to request that their probe functions be called asynchronously during driver and device registration (manual binding is still synchronous) commit, commit, commit

  • printk: implement support for extended console drivers commit

  • rcu: Provide diagnostic option to slow down grace-period scans commit

  • task scheduler
    • Replace spinlocks with atomics in thread_group_cputimer(), to improve scalability commit

    • debug: Add sum_sleep_runtime to /proc/<pid>/sched when CONFIG_SCHEDSTATS is enabled commit

    • debug: Replace vruntime with wait_sum in /proc/sched_debug commit

    • numa: Show numa_group ID in /proc/sched_debug task listings to see how the numa groups have spread across the system commit

    • Implement lockless wake-queues API commit

  • Export the CPU list that actually got isolated in /sys/devices/system/cpu/isolated. This can be used by system management tools like libvirt, openstack, and others to ensure proper placement of tasks commit

  • Export the CPU list running in nohz_full mode in /sys/devices/system/cpu/nohz_full. This can be used by system management tools like libvirt, openstack, and others to ensure proper placement of tasks commit

  • Allow the watchdog to run by default only on the housekeeping cores when nohz_full is in effect; this seems to be a good compromise short of turning it off completely (since the nohz_full cores can't tolerate a watchdog). To provide customizability, a /proc/sys/kernel/watchdog_cpumask file is added so that the set of cores running the watchdog can be tuned to different values after bootup commit, commit, commit

  • Add an escape sequence to specify the current console's cursor blink interval. The interval is specified as a number of milliseconds until the next cursor display state toggle, from 50 to 65535 commit
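
The two sysfs files mentioned above (/sys/devices/system/cpu/isolated and /sys/devices/system/cpu/nohz_full) can be read by management tools. The following is a minimal sketch, assuming the files hold the usual comma-separated CPU list with plain ranges such as "1-3,8"; the helper names parse_cpu_list and read_cpu_list are hypothetical and not part of any kernel or library API.

```python
def parse_cpu_list(text):
    """Expand a CPU list string such as "1-3,8" into a sorted list of ints."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty file means no CPUs are listed
        if "-" in chunk:
            first, last = chunk.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(chunk))
    return sorted(cpus)


def read_cpu_list(path):
    """Read a sysfs CPU list; return [] if it is missing or not parseable."""
    try:
        with open(path) as handle:
            return parse_cpu_list(handle.read())
    except (OSError, ValueError):
        # Assumption: treat a missing file or unexpected content as "no CPUs".
        return []


if __name__ == "__main__":
    for name in ("isolated", "nohz_full"):
        print(name, read_cpu_list("/sys/devices/system/cpu/" + name))
```

A placement tool could, for example, subtract the isolated set from the set of online CPUs before choosing where to run general-purpose tasks.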

4. File systems

  • BTRFS
    • Let btree defrag work in SSD mode commit

    • Allow unprivileged query of the root id of the containing subvolume with the INO_LOOKUP ioctl commit

    • Show "subvol=" and "subvolid=" in /proc/mounts commit

  • FUSE
    • Allow an open fuse device to be "cloned" with FUSE_DEV_IOC_CLONE ioctl commit

  • CIFS
    • Add (minimal) support for the new protocol dialect, SMB3.1.1 commit, commit, commit, commit

    • Add reflink copy support (cp --reflink) over SMB3.1.1 commit

    • Add ioctl CIFS_IOC_SET_INTEGRITY to set integrity. Set integrity increases reliability of files stored on SMB3 servers commit

  • EXT2
    • Enable cgroup writeback support commit

  • EXT4
    • Add FALLOC_FL_INSERT_RANGE support for fallocate(2), which allows inserting a hole within a file without overwriting any existing data commit

  • F2FS
  • XFS
  • OVERLAYFS
    • Allow distributed filesystems as lower layer commit

  • GFS2
    • Add support for rename2 and RENAME_EXCHANGE commit

  • NFS
    • NFSv4.2 LAYOUTSTATS functionality for pnfs flexfiles (merge)

  • NILFS2
  • UDF
  • FS-Cache
    • Count culled objects and objects rejected due to lack of space, and the number of initialised operations, and show them in /proc/fs/fscache/stats commit, commit
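
To make the effect of the FALLOC_FL_INSERT_RANGE flag listed under EXT4 concrete, here is a small in-memory model of its semantics. It does not call fallocate(2); it only mimics the result on a bytes object. The function name insert_range and the 4096-byte block size are assumptions made for illustration. The constraints it checks (block-aligned offset and length, offset inside the file) follow the fallocate(2) manual page.

```python
def insert_range(data, offset, length, block_size=4096):
    """Return a copy of data with a zero-filled hole of length bytes at offset.

    Existing bytes from offset onwards are shifted towards the end of the
    file instead of being overwritten.
    """
    if offset % block_size or length % block_size:
        raise ValueError("offset and length must be multiples of the block size")
    if offset >= len(data):
        # Extending a file at or past its end is a job for truncate, not insert.
        raise ValueError("offset must lie inside the file")
    return data[:offset] + bytes(length) + data[offset:]


if __name__ == "__main__":
    original = b"A" * 4096 + b"B" * 4096
    result = insert_range(original, 4096, 4096)
    print(len(original), "->", len(result))
```

In this model the file grows by exactly the length of the hole, and the data that followed the insertion point is preserved unchanged after it.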

5. Memory management

  • memcg: add per-cgroup dirty page accounting, which gives each memory cgroup independent dirty/writeback page statistics that can provide information for per-cgroup direct reclaim. The new memcg stat is visible in the per-memcg memory.stat cgroupfs file. The new accounting supports future efforts to add per-cgroup dirty page throttling and writeback commit

  • Try to allocate all boot time kernel data structures from mirrored memory to have a recovery path for unrecoverable memory errors encountered during kernel code execution commit, commit, commit

  • zswap: Support runtime enable/disable of zswap commit

6. Block layer

  • The libnvdimm sub-system introduces, in addition to the libnvdimm-core, 4 drivers / enabling modules (common code to all: (merge))

    • NFIT: Add ACPI NVDIMM Firmware Interface Table (NFIT) support: It adds infrastructure to probe ACPI 6 compliant platforms for NVDIMMs (NFIT) and register a libnvdimm device tree. In addition to storage devices this also enables libnvdimm to pass ACPI._DSM messages for platform/dimm configuration.
    • PMEM: Initially merged in v4.1, this driver for contiguous spans of persistent memory address ranges is reworked to drive PMEM namespaces emitted by the libnvdimm-core. In this update the PMEM driver, on x86, gains the ability to assert that writes to persistent memory have been flushed all the way through the caches and buffers in the platform to persistent media.
    • BLK: This new driver enables access to persistent memory media through "Block Data Windows" as defined by the NFIT. The primary difference between this driver and PMEM is that only a small window of persistent memory is mapped into the system address space at any given point in time. Per-NVDIMM windows are reprogrammed at run time, per-I/O, to access different portions of the media. BLK mode, by definition, does not support DAX.
    • BTT: This is a library, optionally consumed by either PMEM or BLK, that converts a byte-accessible namespace into a disk with atomic sector update semantics (prevents sector tearing on crash or power loss)
  • Make CFQ default to IOPS mode on SSDs commit

  • zram: add dynamic device add/remove functionality commit

  • Addition of policy specific data to blkcg for block cgroups commit

  • Add support for DAX reads/writes to block devices commit

  • UBI: Dynamically allocate minor numbers commit

  • Device mapper
    • dm raid: Add dm-raid access to the MD RAID0 personality to enable single zone striping commit

    • dm cache: Add fail io mode and needs_check flag: If a cache metadata operation fails (e.g. transaction commit) the cache's metadata device will abort the current transaction, set a new needs_check flag, and the cache will transition to "read-only" mode commit

    • dm cache: add stochastic-multi-queue (smq) policy, make it default: The stochastic-multi-queue (smq) policy addresses some of the problems with the current multiqueue (mq) policy (memory usage, level balancing, adaptability, performance) commit, commit

    • dm stats: add support for request-based DM devices (eg. multipath) commit

    • dm stats: add option to dm statistics to collect and report a histogram of IO latencies commit

    • dm stats: Make it possible to use precise timestamps with nanosecond granularity commit

    • dm thin: range discard support commit

  • rbd: queue_depth map option commit

7. Cryptography

  • Add jitterentropy RNG. The CPU Jitter RNG provides a source of good entropy by collecting CPU execution time jitter. The entropy in the CPU execution time jitter is magnified by the CPU Jitter Random Number Generator commit

  • drbg: use Jitter RNG to obtain seed commit

  • rng: Make DRBG the default crypto api RNG commit

  • New chacha20 cipher. ChaCha20 is a 256-bit high-speed stream cipher designed by Daniel J. Bernstein and further specified in RFC7539 for use in IETF protocols commit

  • Add Poly1305 authenticator algorithm. Poly1305 is an authenticator algorithm designed by Daniel J. Bernstein. It is used for the ChaCha20-Poly1305 AEAD, specified in RFC7539 for use in IETF protocols commit, commit, commit

  • rsa: add a new rsa generic implementation commit

  • Added support for SEC1 hardware to talitos commit, commit, commit, commit, commit, commit, commit, commit, commit, commit

  • echainiv: Add Encrypted Chain IV Generator, which generates an IV based on the encryption of a sequence number xored with a salt. This is the default algorithm for CBC commit

  • seqiv: Add a new IV generator, seqniv, which is identical to seqiv except that it skips the IV when authenticating. This is intended to be used by algorithms such as rfc4106 that do the IV authentication implicitly commit
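
For readers unfamiliar with the ChaCha20 cipher listed above, its basic building block is the quarter round defined in RFC7539, section 2.1. The sketch below is an illustration in Python, not the kernel's C implementation; the names rotl32 and quarter_round are chosen here for readability.

```python
MASK32 = 0xFFFFFFFF  # ChaCha20 works on 32-bit unsigned words


def rotl32(value, count):
    """Rotate a 32-bit word left by count bits."""
    return ((value << count) & MASK32) | (value >> (32 - count))


def quarter_round(a, b, c, d):
    """Apply the ChaCha quarter round (add, xor, rotate) to four words."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


if __name__ == "__main__":
    # Input words from the quarter-round test vector in RFC7539, section 2.1.1.
    words = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    print([hex(word) for word in words])
```

The full cipher applies this operation repeatedly to the columns and diagonals of a 4x4 state of 32-bit words; that outer loop is omitted here.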

8. Security

  • selinux: enable genfscon labeling for sysfs and pstore files commit

  • selinux: enable per-file labeling for debugfs files. commit

  • Smack: allow multiple labels in onlycap commit

  • evm: permit the labeling of existing files on pseudo files systems commit

  • ima: add support for new "euid" policy condition commit

  • ima: extend "mask" policy matching support commit

  • ima: update builtin policies commit

9. Tracing and perf tool

  • Allow disabling/enabling events dynamically in 'perf top': a 'perf top' session can instantly become a 'perf report' one, i.e. go from dynamic analysis to static analysis, and returning to dynamic analysis is possible. To toggle between the modes, press 'f' to freeze/unfreeze the sampling commit, commit

  • Add Instruction Tracing support (--itrace) commit, commit, commit, commit, commit

  • perf probe: Accept multiple filter options. The filters are combined with a logical OR, e.g. --filter abc* --filter *def is the same as --filter abc*|*def commit

  • perf kmem: Add --live option for current allocation stat commit

  • perf kmem: Add kmem.default config option to select the default value ('page' or 'slab') commit

  • perf kmem: Implement stat --page --caller, it shows caller statistics for page commit

  • perf kmem: Add new sort keys for page: page, order, migtype, gfp commit

  • perf probe: Allow the user to pass the filter pattern directly to the --funcs option commit, the --list option commit, and the --del option commit

  • perf record: Add AUX area tracing Snapshot Mode support (--snapshot) commit, commit

  • perf bench futex: A new benchmark 'wake-parallel' is added to measure parallel waker threads commit

  • perf probe: Add --no-inlines option to avoid searching inline functions commit

  • perf probe: Support the $params special probe argument. $params is similar to $vars but matches only function parameters, not local variables. This is useful for tracing changes to function parameters or tracing function calls together with their parameters commit

  • perf probe: Support glob wildcards for function name when adding new probes. This will allow us to build caches of function-entry level information with $params commit

  • perf probe: Add --range option to show a variable's location range commit

  • perf sched: Add option to merge like comms to lat output commit

  • perf record: Add support for a new branch sampling type for indirect jumps: perf record -j ind_jmp. It enables analysis of indirect jump targets commit

  • perf tools: Make Ctrl-C stop processing on TUI commit

  • perf annotate: Display total number of samples with --show-total-period commit

  • perf probe: Speed up perf probe --list by caching debuginfo commit

  • perf tools: The timeout that limits the processing of each individual proc map was hard-coded to 500ms. A new option, --proc-map-timeout, makes the time limit configurable commit

  • perf stat: Until now, the values of all tasks given with the -p option were aggregated and printed as single values. Add the --per-thread option to print values per task commit

  • BPF
    • BPF based latency tracing commit

    • Allow BPF programs to access the skb->skb_iif and skb->dev->ifindex fields commit

    • Allow bpf programs to tail-call other bpf programs commit

    • Allow programs to write to certain skb fields commit

    • Disallow bpf tc programs from accessing current->pid, uid commit

  • tracing: add trace event for memory-failure commit
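
The perf probe entry above states that several --filter options are combined with a logical OR, so that --filter abc* --filter *def behaves like --filter abc*|*def. The sketch below illustrates that equivalence using Python's fnmatch module as a stand-in for perf's own glob matcher, which is an assumption; the helper names matches_any and split_combined are hypothetical.

```python
from fnmatch import fnmatchcase


def matches_any(name, patterns):
    """Return True if name matches at least one glob pattern (logical OR)."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def split_combined(pattern):
    """Split a combined 'abc*|*def' style filter into its alternatives."""
    return pattern.split("|")


if __name__ == "__main__":
    separate = ["abc*", "*def"]
    combined = split_combined("abc*|*def")
    for symbol in ("abc_init", "user_def", "xyz"):
        print(symbol, matches_any(symbol, separate), matches_any(symbol, combined))
```

Both forms accept a name as soon as any single alternative matches it.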

10. Virtualization

  • KVM
    • Implement multiple address spaces commit

  • Hyper-V
    • file copy service: full handshake support commit

    • vmbus: Implement NUMA aware CPU affinity for channels commit

    • vmbus: Implement the protocol for tearing down vmbus state commit

    • vss: full handshake support commit

    • Tools: kvp: use misc char device to communicate with kernel commit, vss: use misc char device to communicate with kernel commit

  • user mode linux: Remove hppfs (honeypot procfs), which was an attempt to use UML as a honeypot. It was never stable nor in heavy use commit

  • vhost: add max_mem_regions module parameter to specify the maximum number of memory regions in memory map (default 64) commit

  • vhost: allow vhost to support guests with a different byte ordering from host while using legacy virtio commit

  • vmxnet3: Make the driver understand adapter version 2 commit

  • xen: block: add multi-page ring support, so that more requests can be issued by using more than one page as the request ring between blkfront and the backend. As a result, performance can improve significantly. When using 64 pages as the ring, IOPS increased about 15 times in throughput testing and more than doubled in latency testing commit

11. Networking

  • TCP: Add CAIA Delay-Gradient (CDG) congestion control (disabled by default). CDG modifies the TCP sender in order to: use the delay gradient as a congestion signal; back off with an average probability that is independent of the RTT; coexist with flows that use loss-based congestion control, i.e., flows that are unresponsive to the delay signal; and tolerate packet loss unrelated to congestion. Its FreeBSD implementation was presented to the ICCRG in July 2012 commit

  • Add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations: When an application needs to force a source IP on an active TCP socket, it has to use bind(IP, port=x). As most applications do not want to deal with already used ports, x is often set to 0, meaning the kernel is in charge of finding an available port. But the kernel does not know yet whether this socket is going to be a listener or be connected. This patch adds a new SOL_IP socket option that asks the kernel to ignore the 0 port provided by the application in bind(IP, port=0) and only remember the given IP address. The port will be chosen automatically at connect() time, in a way that allows sharing a source port as long as the 4-tuples are unique commit

  • Introduce programmable flow dissector commit

  • Introduce Flower classifier, which can classify packets based on a configurable combination of packet keys and masks commit

  • TCP: Add the RFC 3168, section 6.1.1.1 fallback for outgoing ECN connections. In other words, on the first timeout the connection is retried with a non-ECN setup SYN packet, as suggested by the RFC. For users who explicitly do not want this, which can be the case in data center deployments, a net.ipv4.tcp_ecn_fallback knob is added that allows disabling the fallback commit

  • switchdev: Add VLAN dump support to switchdev port's bridge_getlink. The iproute2 "bridge vlan show" command already knows how to show the VLANs installed on the bridge and the device, but (until now) no one had implemented the port VLAN part of the netlink message (merge)

  • Add a netdev driver for GENEVE (GEneric NEtwork Virtualization Encapsulation) tunnels. It allows one to create geneve virtual interfaces that provide Layer 2 Networks over Layer 3 Networks. GENEVE is often used to tunnel virtual network infrastructure in virtualized environments. For more information see http://tools.ietf.org/html/draft-gross-geneve-02 (merge)

  • ieee802154: adds transmission power setting support for IEEE-802.15.4 devices via nl802154 commit

  • Export the value of the linkdown sysctl to netconf commit

  • IPv6: IPv6 flow labels have been an unmitigated disappointment thus far. Support in HW devices to use them for ECMP is lacking, and OSes don't turn them on by default. This release splits the flow label space into two ranges: 0-7ffff is reserved for flow label manager, 80000-fffff will be used for creating auto flow labels (per RFC6438). This should give Linux a path to enabling auto flow labels by default for all IPv6 packets. It can be disabled with sysctl flowlabel_state_ranges commit

  • RDS
    • Add getsockopt/setsockopt support for SO_RDS_TRANSPORT commit, commit

  • unix sockets
    • Support SCM_SECURITY for stream sockets commit

    • Implement splice for stream af_unix sockets commit

    • Implement stream sendpage support commit

  • IPv4: sysctl option (ignore_routes_with_linkdown) to ignore routes when nexthop link is down commit

  • net scheduler: run ingress qdisc without locks commit

  • net scheduler: gred: Add a TCA_GRED_LIMIT attribute to set the GRED queue limit, in bytes, during qdisc setup commit

  • Implement extended console support commit

  • packet: add rollover statistics commit

  • vlan: Add GRO support for non hardware accelerated vlan commit

  • netfilter
    • Add netfilter ingress hook, which allows classifying packets from ingress using the Netfilter infrastructure commit

    • nf_tables: add netdev table. It allows creating netdev tables that contain ingress chains, and provides access to the existing nf_tables features from the ingress hook commit

    • xt_MARK: Add ARP support commit

  • nl802154
    • Add support for dumping information about the current cca ed level commit

    • Add support for dump phy capabilities commit

    • Add support to set the cca ed level commit

  • openvswitch: If the new optional attribute OVS_USERSPACE_ATTR_ACTIONS is added to an OVS_ACTION_ATTR_USERSPACE action, then include the datapath actions in the upcall to userspace commit

  • pktgen: introduce xmit_mode '<start_xmit|netif_receive>' commit, add benchmark script pktgen_bench_xmit_mode_netif_receive.sh commit, add sample script pktgen_sample01_simple.sh commit, add sample script pktgen_sample02_multiqueue.sh commit, add sample script pktgen_sample03_burst_single_flow.sh commit

  • tipc: add broadcast link window set/get to nl api commit

  • tipc: improve link congestion algorithm commit

  • bonding
    • Add netlink support for sys prio, actor sys mac, and port key, until now they were only exported via bond's proc entry commit

    • Allow userspace to set actors' macaddr in an AD-system. commit

    • Allow userspace to set actors' system_priority in AD system commit

    • Implement user key part of port_key in an AD system. commit

  • bridge: allow setting hash_max + multicast_router if interface is down commit

  • Adds an optional ce_threshold to codel & fq_codel qdiscs, so that DCTCP can have feedback from queuing in the host commit

  • Add TCPWinProbe and TCPKeepAlive SNMP counters commit

  • wireless
    • Add TX fastpath: add a "fast-xmit" cache that will cache the data frame 802.11 header and other data to be able to build the frame more quickly commit, commit, commit, commit, commit, commit, commit

    • cfg80211: allow the plink state blocking for user managed mesh commit

  • Bluetooth
    • hci_core/mgmt: Introduce multi-adv list commit

    • mgmt: program multi-adv on power on commit

  • NFC: netlink: Implement vendor-specific command support commit
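
The IPv6 entry above splits the 20-bit flow label space into two halves: 0-7ffff for the flow label manager and 80000-fffff for auto flow labels. The following sketch shows that split as plain arithmetic. The function names, and the idea of forcing a hash into the upper half by setting the top bit, are illustrative assumptions rather than a description of the kernel's code.

```python
FLOWLABEL_MASK = 0xFFFFF   # an IPv6 flow label is 20 bits wide
AUTO_RANGE_FLAG = 0x80000  # top bit set: label lies in 80000-fffff


def is_auto_range(label):
    """Return True if a flow label falls in the auto flow label range."""
    if not 0 <= label <= FLOWLABEL_MASK:
        raise ValueError("flow labels are 20-bit values")
    return bool(label & AUTO_RANGE_FLAG)


def make_auto_label(flow_hash):
    """Fold an arbitrary hash into the auto range (illustrative assumption)."""
    return (flow_hash & FLOWLABEL_MASK) | AUTO_RANGE_FLAG


if __name__ == "__main__":
    for label in (0x00000, 0x7FFFF, 0x80000, 0xFFFFF):
        print(hex(label), "auto" if is_auto_range(label) else "manager")
    print(hex(make_auto_label(0x12345)))
```

Because the two ranges differ only in the top bit, a label chosen by the flow label manager can never collide with an automatically generated one.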

12. List of pull requests

13. Other news sites

