Libvirt: Use the virtlogd deamon
If the *serial console* feature is enabled on a compute node with ``[serial_console].enabled = True`` it deactivates the logging of the boot messages. From a REST API perspective, this means that the two APIs ``os-getConsoleOutput`` and ``os-getSerialConsole`` are mutually exclusive. Both APIs can be valuable for cloud operators in the case when something goes wrong during the launch of an instance. This blueprint wants to lift the XOR relationship between those two REST APIs. blueprint libvirt-virtlogd Change-Id: I9a1fbf005b0f48df90093a346579d9ddc64f7846
This commit is contained in:
1 changed files with 310 additions and 0 deletions
310
specs/newton/approved/libvirt-virtlogd.rst
Normal file
310
specs/newton/approved/libvirt-virtlogd.rst
Normal file
@@ -0,0 +1,310 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
================================
Libvirt: Use the virtlogd deamon
================================
https://blueprints.launchpad.net/nova/+spec/libvirt-virtlogd
If the *serial console* feature is enabled on a compute node with
``[serial_console].enabled = True`` it deactivates the logging of the
boot messages. From a REST API perspective, this means that the two APIs
``os-getConsoleOutput`` and ``os-getSerialConsole`` are mutually exclusive.
Both APIs can be valuable for cloud operators in the case when something
goes wrong during the launch of an instance. This blueprint wants to lift
the XOR relationship between those two REST APIs.
Problem description
===================
The problem can be seen in the method ``_create_serial_console_devices``
in the libvirt driver. The simplified logic is::
def _create_serial_console_devices(self, guest, instance, flavor,
image_meta):
if CONF.serial_console.enabled:
console = vconfig.LibvirtConfigGuestSerial()
console.type = "tcp"
guest.add_device(console)
else:
consolelog = vconfig.LibvirtConfigGuestSerial()
consolelog.type = "file"
guest.add_device(consolelog)
This ``if-else`` establishes the XOR relationship between having a log of
the guest's boot messages or getting a handle to the guest's serial console.
From a driver point of view, this means getting valid return values for the
method ``get_serial_console`` or ``get_console_output`` which are used to
satisfy the two REST APIs ``os-getConsoleOutput`` and ``os-getSerialConsole``.
Use Cases
----------
From an end user point of view, this means that, with the current state, it
is possible to get the console output of an instance on host A (serial console
is not enabled) but after a rebuild on host B (serial console is enabled) it
is not possible to get the console output. As an end user is not aware of the
host's configuration, this could be a confusing experience. Written that down
I'm wondering why the serial console was designed with a compute node scope
and not with an instance scope, but that's another discussion I don't want to
do here.
After the implementation, deployers will have both means by hand if there is
something wrong during the launch of an instance. The persisted log in case
the instance crashed AND the serial console in case the instance launched but
has issues, for example a failed establishing of networking so that SSH access
is not possible. Also, they will be impacted with a new dependency on the
hosts (see `Dependencies`_).
Developers won't be impacted.
Proposed change
===============
I'd like to switch from the log file to the ``virtlogd`` deamon. This logging
deamon was announced on the libvirt ML [1] and is available with libvirt
version 1.3.3 and Qemu 2.6.0. This logging deamon handles the output from the
guest's console and writes it into the file
``/var/log/libvirt/qemu/guestname-serial0.log`` on the host but
truncates/rotates that log so that it doesn't exhaust the hosts disk space
(this would solve an old bug [3]).
Nova would generate::
<serial type="tcp">
<source mode="connect" host="0.0.0.0" service="2445"/>
<log file="/var/log/libvirt/qemu/guestname-serial0.log" append="on"/>
<protocol type="raw"/>
<target port="1"/>
</serial>
For providing the console log data, nova would need to read the console
log file from disk directly. As the log file gets rotated automatically
we have to ensure that all necessary rotated log files get read to satisfy
the upper limit of the ``get_console_output`` driver API contract.
FAQ
---
#. How is the migration/rebuild handled? The 4 cases which are possible
(based on the node's patch level):
#. ``N -> N``: Neither source nor target node is patched. That's what
we have today. Nothing to do.
#. ``N -> N+1``: The target node is patched, which means it can make
use of the output from *virtlogd*. Can we "import" the existing log
of the source node into the *virtlogd* logs of the target node?
A: The guest will keep its configuration from the source host
and don't make use of the *virtlogd* service until it gets rebuilt.
#. ``N+1 -> N``: The source node is patched and the instance gets
migrated to a target node which cannot utilize the *virtlogd*
output. If the serial console is enable on the target node, do
we throw away the log because we cannot update it on the target
node
A: In the case of migration to an old host, we try to copy the
existing log file across, and configure the guest with the
``type=tcp`` backend. This provides ongoing support for interactive
console. The log file will remain unchanged if possible. A failed
copy operation should not prevent the migration of the guest.
#. ``N+1 -> N+1``: Source and target node are patched. Will libvirt
migrate the existing log from the source node too, which would
solve another open bug [4].
#. Q: Could a stalling of the guest happen if *nova-compute* is reading the
log file and *virtlogd* tries to write to the file but is blocked?
A: No, *virtlogd* will ensure things are fully parallelized
#. Q: The *virtlogd* deamon has a ``1:1`` relationship to a compute node.
It would be interesting how well it performs when, for example,
hundreds of instances are running on one compute node.
A: We could add a I/O rate limit to *virtlogd* so it refuses to read data
too quickly from a single guest. This prevents a single guest DOS'ing
the host.
#. Q: Are there architecture dependencies? Right now, a nova-compute node on a
s390 architecture depends on the *serial console* feature because it
cannot provide the other console types (VNC, SPICE, RDP). Which means it
would benefit from having both.
A: No architecture dependencies.
#. Q: How are restarts of the *virtlogd* deamon handled? Do we lose
information in the timeframe between stop and start?
A: The *virtlogd* daemon will be able to re-exec() itself while keeping
file handles open. This will ensure no data loss during restart of
*virtlogd*.
#. Q: Do we need a version check of libvirt to detect if the *virtlodg* is
available on the host? Or is it sufficient to check if the folder
``/var/log/virtlogd/`` is present?
A: We will do a version number check on libvirt to figure out if it is
capable to use it.
Alternatives
------------
#. In case where the *serial console* is enabled, we could establish a
connection to the guest with it and execute ``tail /var/log/dmesg.log``
and return that output in the driver's ``get_console_output`` method which
is used to satisfy the ``os-getConsoleOutput`` REST API.
**Counter-arguments:** We would need to save the authentication data to
the guest, which would not be technically challenging but the customers
could be unhappy that Nova can access their guests at any time. A second
argument is, that the serial console access is blocking, which means
if user A uses the serial console of an instance, user B is not able to do
the same.
#. We could remove the ``if-else`` and create both devices.
**Counter-arguments:** This was tried in [2] and stopped because this could
introduce a backwards incompatibility which could prevent the rebuild
of an instance. The root cause for this was, that there is an upper bound
of 4 serial devices on a guest, and this upper bound could be exceeded if
an instance which already has 4 serial devices gets rebuilt on a compute
node which would have patch [2].
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
None
Other deployer impact
---------------------
* The *virtlogd* service has to run for this functionality and should be
monitored.
* This would also solve a long-running bug which can cause a host disc space
exhaustion (see [3]).
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Markus Zoeller (https://launchpad.net/~mzoeller)
Work Items
----------
* (optional) get a gate job running which has the *serial console* activated
* add version check if libvirt supports the *virtlogd* functionality
* add "happy path" which creates a guest device which uses *virtlogd*
* ensure "rebuild" uses the new functionality when migrated from an old host
* add reconfiguration of the guest when migrating from N+1 -> N hosts
to keep backwards compatibility
Dependencies
============
* Libvirt 1.3.3 which brings the *libvirt virtlod logging deamon* as
described in [1].
* Qemu 2.6.0
Testing
=======
The tempest tests which are annotated with
``CONF.compute_feature_enabled.console_output`` will have to work with
a setup which
* has the dependency to the *virtlogd deamon* resolved.
* AND has the serial console feature enabled (AFAIK there is not job right
now which has this enabled)
* A new functional test for the live-migration case has to be added
Documentation Impact
====================
None
References
==========
[1] libvirt ML, "[libvirt] RFC: Building a virtlogd daemon":
http://www.redhat.com/archives/libvir-list/2015-January/msg00762.html
[2] Gerrit; "libvirt: use log file and serial console at the same time":
https://review.openstack.org/#/c/188058/
[3] Launchpad; " console.log grows indefinitely ":
https://bugs.launchpad.net/nova/+bug/832507
[4] Launchpad; "live block migration results in loss of console log":
https://bugs.launchpad.net/nova/+bug/1203193
[5] A set of patches on the libvirt/qemu ML:
* [PATCH 0/5] Initial patches to introduce a virtlogd daemon
* [PATCH 1/5] util: add API for writing to rotating files
* [PATCH 2/5] Import stripped down virtlockd code as basis of virtlogd
* [PATCH 3/5] logging: introduce log handling protocol
* [PATCH 4/5] logging: add client for virtlogd daemon
* [PATCH 5/5] qemu: add support for sending QEMU stdout/stderr to virtlogd
[6] libvirt ML, "[libvirt] [PATCH v2 00/13] Introduce a virtlogd daemon":
https://www.redhat.com/archives/libvir-list/2015-November/msg00412.html
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - Newton
- Introduced
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.