Making Linux a World-Class Enterprise Server OS
Chuck Lever, Netscape Communications Corp.
chuckl@netscape.com
$Id: enterprise.html,v 1.4 1999/11/12 20:12:54 cel Exp $
Abstract
We provide a list of open issues whose resolution
would help Linux become a world-class enterprise
server OS.
These items could help make up Netscape's technical agenda for
Linux.
This document is Copyright © 1999
Netscape Communications Corp.,
all rights reserved.
Trademarked material referenced in this document is copyright
by its respective owner.
Introduction
Linux is an open-source POSIX-compliant operating system that runs
on commodity Intel PC hardware.
Linux also happens to be very stable, crashing far less than other
operating systems in its class.
As such, it is one of the most widely deployed operating systems
supporting network services such as mail and web servers.
Despite high acclaim and ultra-stability, Linux remains a work in progress.
There are several aspects of Linux that can be improved to better
position it for enterprise service.
In this document, we provide a list of open issues whose resolution
would help Linux become a world-class enterprise server OS.
These items could help make up Netscape's technical agenda for
Linux.
The Issues
The general areas in which we are interested are:
- Reliability -
  improving system recovery mechanisms, backup/restore, and fault-tolerance.
- Performance -
  how close applications running on Linux can get to optimal hardware speed,
  especially applications that require high performance; support for
  high-performance hardware like RAID and gigabit networking such as ATM
  and Gbit ethernet.
- Scalability -
  improving system throughput, overload characteristics, relieving
  architectural constraints, enhancing administration of large installations.
- Security -
  improving imperviousness to network and local attacks, reducing or
  eliminating the risk of buffer overflows, continuous security testing
  of all bundled applications and utilities.
- Standards compliance -
  network implementations should be well-behaved; useful and common APIs
  should maintain standards compliance (e.g. POSIX).
- Quality Assurance -
  reducing defect rate and defect re-introduction.
Specific Technical Suggestions
Issue
Explanation
Support for large memory configurations
Proper support for 1G to 4G physical RAM (Intel CPU limitations aside,
the kernel can automatically do the *correct* limited thing);
in the long term, full OS support for 36-bit addresses on hardware
that supports them. This isn't an issue for hardware architectures
that already support 64-bit addresses.
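To put these address widths in concrete terms, here is a small arithmetic sketch. The helper name `addressable_gib` is ours for illustration and does not correspond to any kernel interface; it simply converts a physical address width into the amount of byte-addressable memory.

```python
# Addressable physical memory for a given physical address width.
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_gib(address_bits: int) -> int:
    """Return how many GiB can be reached with byte addresses of this width."""
    return (2 ** address_bits) // GIB


for bits in (32, 36):
    print(f"{bits}-bit physical addresses: {addressable_gib(bits)} GiB")
```

A 32-bit physical address reaches 4 GiB, which is where the "1G to 4G" range above tops out, while 36-bit addresses reach 64 GiB.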
Improved SMP scalability
Scheduler should scale to a large number of threads/processes.
System should scale well with additional CPUs.
64-bit file lengths
Internal and application programming interfaces need to be
ready for 64-bit file lengths. File system administration
and backup tools need to handle 64-bit file lengths.
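As a rough illustration of why every interface and tool in the path matters, the sketch below assumes file offsets are stored in a signed integer field (as POSIX `off_t` is) and shows what a 32-bit field can and cannot represent. The helper names are hypothetical and chosen only for this example.

```python
import struct


def max_offset(offset_bits: int) -> int:
    """Largest byte offset a signed offset field of this width can hold."""
    return 2 ** (offset_bits - 1) - 1


def fits_in_signed_32(offset: int) -> bool:
    """True if the offset can be stored in a signed 32-bit field."""
    try:
        struct.pack("<i", offset)  # "<i" is a little-endian signed 32-bit int
        return True
    except struct.error:
        return False


print("32-bit limit:", max_offset(32), "bytes")
print("64-bit limit:", max_offset(64), "bytes")
print("2 GiB offset fits in 32 bits:", fits_in_signed_32(2 ** 31))
```

Any tool that keeps a file length in such a 32-bit field, for example in an archive header, cannot describe a file of 2 GiB or more even if the kernel can.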
SCSI I/O throughput
Improved reliability for tapes and esoteric devices;
support for a wide range of modern SCSI chipsets and devices.
SCSI drivers need to support a greater level of concurrent I/O.
TCP throughput and standards compliance
TCP is a complex and ever-evolving protocol.
Bugs and performance problems are easily introduced
during the development process.
High-performance asynchronous I/O APIs
The current RT signal API used to support async I/O does not integrate
well with threads.
A queued I/O API or kernel-mediated event dispatching system that
supports UI events as well as file descriptor wakeup would be very cool.
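The dispatching model described here can be sketched with Python's standard `selectors` module, which wraps whatever readiness-notification mechanism the platform offers. This is only an illustration of the programming model (register a descriptor with a handler, then let one loop dispatch wakeups); it is not the RT signal API discussed above, and the names `dispatch_once` and `on_readable` are ours.

```python
import selectors
import socket


def dispatch_once(selector: selectors.BaseSelector, timeout: float = 1.0) -> int:
    """Wait for ready descriptors and call the handler registered with each."""
    handled = 0
    for key, _events in selector.select(timeout):
        handler = key.data  # the callback stored at registration time
        handler(key.fileobj)
        handled += 1
    return handled


received = []


def on_readable(sock: socket.socket) -> None:
    """Handler invoked when the socket has data waiting."""
    received.append(sock.recv(4096))


# A connected socket pair stands in for a client connection.
reader, writer = socket.socketpair()
selector = selectors.DefaultSelector()
selector.register(reader, selectors.EVENT_READ, data=on_readable)

writer.sendall(b"wakeup")
dispatch_once(selector)
print(received)

selector.unregister(reader)
selector.close()
reader.close()
writer.close()
```

Because the handler travels with the registration, one loop can serve many descriptors without a signal handler or a thread per connection.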
Functional and performance validation testing
Regular functional testing and performance validation would
help catch significant bugs early, and would also allow maintenance
of a history of system performance improvements for catching performance
creep.
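A performance history only helps if something compares new results against it. The sketch below shows one simple way to do that, assuming throughput measurements where higher is better; the function name, the 5% tolerance, and the sample numbers are all made up for illustration.

```python
from statistics import median


def regressed(history, latest, tolerance=0.05):
    """Flag a run whose throughput is more than `tolerance` below the median of past runs."""
    baseline = median(history)
    return latest < baseline * (1.0 - tolerance)


# Hypothetical throughput figures (requests per second) from earlier test runs.
throughput_history = [980.0, 1010.0, 1000.0, 995.0]

print(regressed(throughput_history, 990.0))  # within tolerance of the baseline
print(regressed(throughput_history, 940.0))  # well below the baseline
```

Using the median rather than the most recent run keeps one noisy measurement from hiding a slow drift.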
High performance/reliability file system
RAID-enabled support for high data resiliency with high performance
in file systems used by network servers.
System performance monitoring
This includes support for tools like iostat, vtop, and sard.
Fault tolerance
Support for loss of a CPU in an SMP configuration;
proper reconfiguration during card or memory failures.
Large site administration
Support for things such as secure Java consoles,
remote administration like cfengine, security features like Tripwire;
creating coherent and integrated system documentation.
Constraint relief
Support for 32-bit UIDs, large number of file descriptors and fdsets,
large number of threads/processes, size of shared segments and swap areas,
kernel I/O concurrency, and so on.
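Some of these constraints can be inspected from user space. The sketch below reads the per-process file descriptor limit through Python's `resource` module (available on Unix-like systems) and, assuming the comparison is against a 16-bit UID field, shows how much room a 32-bit UID adds. The helper `describe` is ours.

```python
import resource


def describe(limit: int) -> str:
    """Render a resource limit, treating RLIM_INFINITY as unlimited."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


# Soft and hard limits on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open file descriptors: soft =", describe(soft), "hard =", describe(hard))

# Number of distinct user IDs representable at each field width.
uid_space = {bits: 2 ** bits for bits in (16, 32)}
print(uid_space)
```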
Quality assurance and piloting arenas
The process is informal now, but can we demonstrate that a more formal process
can have a positive effect?
What is, for example, Red Hat doing in this regard?
Internationalization
Internationalization of kernel and utilities
(I know, it's a difficult and not very sexy job, but somebody's
got to do it).
Security, security, security
Eliminating buffer overflows, eliminating TCP vulnerabilities;
including ssh in popular Linux distributions;
including good security documentation.
High-end networking and disk subsystems
Support Gigabit ethernet and ATM; explore zero-copy I/O;
look at support for RAID and beyond.
Serviceability enhancements
Support for IPMI and other advanced hardware monitoring;
improving Oops and dump tracing, and configuration snapshotting
for "first-time capture" of significant system problems.
Constraint relief for large number of users
Support for 32-bit UIDs, high efficiency password lookup;
look into quota support, utmp, utmpx, wtmp.
Complete support for serial console
Linux has much of this already, but work needs to be done
to get server hardware ready to support this.
Support for Name Service caching
Domain Name Service performance and reliability are critical
to network server performance. Solaris, for example,
uses nscd to help ensure that DNS and yp/NIS+ perform well.
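The idea behind a name service cache can be sketched in a few lines. This is not how nscd is implemented; it is a minimal in-process illustration in which the class name, the TTL value, and the injected `resolve` and `clock` callables are all assumptions made for the example.

```python
import time


class CachingResolver:
    """Remember lookup answers for a fixed time so repeated queries skip the resolver."""

    def __init__(self, resolve, ttl_seconds=60.0, clock=time.monotonic):
        self._resolve = resolve  # function that performs the real lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # name -> (expiry time, answer)

    def lookup(self, name):
        now = self._clock()
        entry = self._cache.get(name)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: answer from the cache
        answer = self._resolve(name)
        self._cache[name] = (now + self._ttl, answer)
        return answer


# A stand-in resolver that records how often it is actually called.
calls = []


def fake_resolve(name):
    calls.append(name)
    return "192.0.2.1"  # documentation-range address, not a real lookup


now = [0.0]
resolver = CachingResolver(fake_resolve, ttl_seconds=30.0, clock=lambda: now[0])
resolver.lookup("www.example.com")  # goes to the resolver
resolver.lookup("www.example.com")  # served from the cache
now[0] = 31.0
resolver.lookup("www.example.com")  # entry expired, resolver called again
print(len(calls))
```

A shared daemon applies the same idea across all processes on the machine, so one slow or failed lookup is not repeated by every server process.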
Support for "direct I/O"
For example, locking down user pages, then doing scatter/gather DMA I/O
without another copy, or support for PCI->PCI I/O through main memory
without CPU intervention;
closer to zero-copy behavior across kernel and device drivers.
See IO-Lite or McVoy's splice() paper.
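The scatter/gather part of this idea is already visible at the system call level. The sketch below uses `os.writev` and `os.readv` (Unix-only wrappers for the POSIX calls of the same names) to move two separate buffers in one call each. It still copies data through the kernel, so it illustrates only the vectored interface, not page locking, DMA, or true zero-copy behavior.

```python
import os

read_fd, write_fd = os.pipe()

# Gather: send two separate buffers with one system call, without joining them first.
header = b"HDR:"
payload = b"payload"
written = os.writev(write_fd, [header, payload])
os.close(write_fd)

# Scatter: receive into two preallocated buffers with one system call.
header_buf = bytearray(len(header))
payload_buf = bytearray(len(payload))
nread = os.readv(read_fd, [header_buf, payload_buf])
os.close(read_fd)

print(written, nread, bytes(header_buf), bytes(payload_buf))
```

A direct I/O path would hand buffer lists like these to the device driver instead of copying them into kernel memory.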
Prioritization of standards compliance
Standards compliance is important,
but not to the exclusion of alternatives that can co-exist
in the system API and provide better functionality and performance.
Someone Else's Agenda
For completeness, we list some examples of areas that don't affect
the enterprise-worthiness of Linux.
These are on someone else's agenda.
- Small system performance
- Support for embedded systems
  and esoteric processors
- Interactive improvements (e.g. interactive responsiveness, GUI improvements
  like GNOME, KDE, or other alternative window management)
- Sound support
- Support for VFAT, HPFS, and other legacy workstation file systems
- Support for low-bandwidth networking like packet radio, ISDN, PPP, and SLIP
- Support for portable computers (e.g. IrDA, DHCP client, or roaming support)
This document was written as part of the Linux Scalability Project.
For more information, see
our home page.
If you have comments or suggestions, email
linux-scalability@citi.umich.edu