A lot of the most valuable information for engineers comes from experience. The basics are easily found in data sheets, application notes, and textbooks. The hard things are learned by failures. While unfortunate, they will happen. What is critical is that they do not happen again. Indeed, that is part of the motivation for the Office of Logic Design (OLD) and forms the basis for a lot of the work that is published and disseminated.
Below is the link to the NASA Lessons Learned Information System. It is quite general. This web site focuses on the design, analysis, verification, and test of digital systems, a relatively narrow focus.
You can help by submitting the lessons that you have learned on your job, helping to prevent an accident later. When knowledge isn't shared across our organizations, accidents do happen. NASA, in 1999, lost a small satellite mission ($75,000,000.02) because the engineers were not aware of an already learned lesson.
If you can contribute a lesson, please take a few minutes to do so. A formal write-up isn't necessary as we will work with whatever information you supply and produce an application note, white paper, or whatever is appropriate. Of course, care will be taken to respect those who submit the lessons and they will not be identified, unless they wish to be, as the important thing is the what, how, and why, not the who.
Lessons on any digital engineering topic will be accepted. These include hardware such as processors, memories, FPGAs, ASICs, PALs, or other microcircuits and their packaging and application. Also, bugs, "work arounds," or other considerations for various computer aided engineering software tools are valuable. Checklists, review criteria, and similar material to aid in analyses and reviews are also desired. Lastly, DPA results, reliability studies, and related items are of use to this community.
To submit information, please contact:
Richard B. Katz
NASA Goddard Space Flight Center
Head, Office of Logic Design
richard.b.katz@nasa.gov
Tel: (301) 286-9705
Fax: (301) 286-0220
Apollo 13 Guidance, Navigation, and Control Challenges
John L. Goodman, United Space Alliance
AIAA Space 2009 Conference & Exposition
September 14-17, 2009
Pasadena, California
AIAA 2009-6455
Abstract: Combustion and rupture
of a liquid oxygen tank during the Apollo 13 mission provides lessons and
insights for future spacecraft designers and operations personnel who may
never, during their careers, have participated in saving a vehicle and crew
during a spacecraft emergency. Guidance, Navigation, and Control (GNC)
challenges were the reestablishment of attitude control after the oxygen
tank incident, re-establishment of a free return trajectory, resolution of a
ground tracking conflict between the LM and the Saturn V S-IVB stage,
Inertial Measurement Unit (IMU) alignments, maneuvering to burn attitudes,
attitude control during burns, and performing manual GNC tasks with most
vehicle systems powered down. Debris illuminated by the Sun and gaseous
venting from the Service Module (SM) complicated crew attempts to identify
stars and prevented execution of nominal IMU alignment procedures. Sightings
on the Sun, Moon, and Earth were used instead. Near continuous
communications with Mission Control enabled the crew to quickly perform time
critical procedures. Overcoming these challenges required the modification
of existing contingency procedures.
Best Practices for Researching and Documenting Lessons Learned
NASA/CR–2008–214777
John Goodman
United Space Alliance
Houston, Texas 77058
Introduction (excerpt)
Identification, resolution, and avoidance of technical
and programmatic issues are important for ensuring safe and successful space
missions. Although the importance of applying lessons learned to reduce risk
is frequently stressed, there is little material available to help technical
and management personnel research and document lessons learned. Collecting,
researching, identifying, and documenting lessons learned that will be
useful to current and future management and engineering personnel is not
always a straightforward task. This white paper presents lessons learned and
best practices concerning the research and documentation of technical and
organizational lessons learned. It is intended to enable organizations to
initiate or improve lessons learned research and documentation efforts.
The content of this white paper is based on four
technical lessons learned projects:
Learning from Other People's Mistakes
Paul Cheng and Patrick Smith
Abstract
Most satellite mishaps stem from engineering mistakes. To prevent the same
errors from being repeated, Aerospace has compiled lessons that the space
community should heed.
Lessons Learned From Seven Space Shuttle Missions
NASA/CR–2007–213697
John Goodman
United Space Alliance
Houston, Texas 77058
Introduction (excerpt)
Incidents resulting in loss of life or loss of spacecraft drive thorough investigation by independent boards and publication of accident reports. Much can be learned from well-written descriptions of the technical and organizational factors that lead to an accident (Challenger, Columbia). Subsequent analysis by third parties of investigation reports and associated evidence collected during the investigations can lead to additional insight.3-7 Much can also be learned from documented close calls that do not result in loss of life or a spacecraft, such as the Mars Exploration Rover Spirit software anomaly, the SOHO mission interruption, and the NEAR burn anomaly. Seven space shuttle incidents discussed in this paper fall into the latter category:
Rendezvous Target Failure On STS-41B
Rendezvous Radar Anomaly and Trajectory Dispersion On STS-32
Rendezvous Lambert Targeting Anomaly on STS-49
Rendezvous Lambert Targeting Anomaly Before STS-51
Zero Doppler Steering Maneuver Anomaly Before STS-59
Excessive Propellant Consumption During Rendezvous On STS-69
Global Positioning System Receiver and Associated Shuttle Flight Software Anomalies on STS-91
Three Years of Global Positioning System Experience on International Space
Station
Susan Gomez
NASA Johnson Space Center
NASA/TP–2006–213168
August 2006
Abstract
The International Space Station Global Positioning System (GPS)
receiver was activated in April 2002. Since that time, numerous software
anomalies surfaced that had to be worked around. Some of the software
problems required waivers, such as the time function, while others required
extensive operator intervention, such as numerous power cycles. Eventually,
enough anomalies surfaced that the three pieces of code included in the GPS
unit have been rewritten and the GPS units were upgraded. The technical
aspects of the problems are discussed, as well as the underlying causes that
led to the delivery of a product that has had numerous problems. The
technical aspects of the problems included physical phenomena that were not
well understood, such as the effect that the ionosphere would have on the
GPS measurements. The underlying causes were traced to inappropriate use of
legacy software, changing requirements, inadequate software processes,
unrealistic schedules, incorrect contract type, and unclear ownership
responsibilities.
Maintainability of Unmanned Planetary Spacecraft: A JPL Perspective
P. Kobele, JPL
AIAA/NASA Symposium on the Maintainability of Aerospace Systems
July 26-27, 1989, Anaheim, CA
AIAA-89-5070
kobele_1989.pdf
Abstract
The requirements for mission success in unattended environments which do not
allow direct repair of spacecraft faults have posed significant challenges
in the areas of spacecraft design and mission operations. These challenges
have resulted in innovative design requirements and implementation
approaches intended to maximize the likelihood of being able to reconfigure
the spacecraft to accommodate any of a myriad of spacecraft faults.
Autonomous fault detection and correction algorithms and the mission
operations elements of recent JPL interplanetary projects have been able to
utilize these design features in their operational strategies to recover the
spacecraft from what might have been mission terminating occurrences and to
allow continuation of essentially undegraded missions.
NASA Advisory NA-GSFC-2006-01
January 12, 2006
Summary
In both FPGA and EEPROM device
applications, the realization of past parts issues was delayed, since the
failure rate was low. Failures in non-flight parts are not always treated with
the same rigor as failures in flight qualified devices. Additionally,
proprietary and stove-piped information barriers, along with a cultural
resistance to discussing failures, prevent the user community from pooling their
data collectively, observing trends, and “connecting the dots.” Together,
this has led to delays in manufacturers improving their parts, processes, and
software.
NASA GSFC kindly requests other NASA and
non-NASA programs and projects to share with the Advisory Technical Point of
Contact (see block 13) all DPA and Failure Reports on FPGAs and non-volatile
memory devices, from both flight and engineering model usage along with lessons
learned that can benefit the community. Note that prior to dissemination
on the NASA Office
of Logic Design web site, appropriate care (i.e. deleting items such
as contractor names) will be taken.
Jonathan F. Binkley, Paul G. Cheng, Patrick L. Smith, and William F.
Tosney
The Aerospace Corporation
First International Forum on Integrated
System Health Engineering and Management in Aerospace
November 7-10, 2005
Napa, California, USA
Abstract
The Aerospace Corporation extracts lessons learned from launch vehicle
and satellite anomalies to help the space community avoid repetition of mishaps.
Incorporated in reports to industry, program reviews, and journal publications,
the lessons lend themselves to influence new acquisition guidelines and military
specifications. Government and the commercial space communities, which share a
common interest in quality improvement, should work together to establish more
comprehensive and effective approaches to developing and disseminating lessons
learned.
Knowledge Capture and
Management for Space Flight Systems
John L. Goodman
United Space Alliance
October 2005
NASA/CR-2005-213692
Introduction
The incorporation of knowledge capture and knowledge management
strategies early in the development phase of an exploration program is necessary
for safe and successful missions of human and robotic exploration vehicles over
the life of a program. Following the transition from the development to the
flight phase, loss of underlying theory and rationale governing design and
requirements occur through a number of mechanisms. This degrades the quality of
engineering work resulting in increased life cycle costs and risk to mission
success and safety of flight. Due to budget constraints, concerned personnel in
legacy programs often have to improvise methods for knowledge capture and
management using existing, but often sub-optimal, information technology and
archival resources. Application of advanced information technology to perform
knowledge capture and management would be most effective if program wide
requirements are defined at the beginning of a program.
John L. Goodman
United Space Alliance
November 2005
NASA/CR-2005-213693
Preface (excerpt)
This document is a collection of writings concerning the application of Global
Positioning System (GPS) technology to the International Space Station (ISS),
Space Shuttle, and X-38 vehicles. An overview of how GPS technology was applied
is given for each vehicle, including rationale behind the integration
architecture, and rationale governing the use (or non-use) of GPS data during
flight. For the convenience of the reader, who may not be interested in specific
details of the ISS, Shuttle and X-38 applications, the lessons learned chapter
is at the beginning of the document. Most of this material can be understood
without reading the sections specific to the ISS, Shuttle and X-38.
George M. Low
NASA Manned Spacecraft Center
AIAA 6th Annual Meeting and Technical Display
Anaheim, California, October 20-24, 1969
Abstract
The flawless performance of the five manned Apollo flights is attributed to
reliable hardware; thoroughly planned and executed flight operations; and
skilled, superbly trained crews. Major factors contributing to spacecraft
reliability are simplicity and redundancy in design; major emphasis on tests; a
disciplined system of change control; and closeout of all discrepancies.
In the Apollo design, the elimination of complex interfaces between major
hardware elements was also an important consideration. The use of man, in
flying and operating the spacecraft, evolved during the course of the program,
with a tendency to place more reliance on automatic systems; however, the
capability for monitoring and manual takeover was always maintained. The
spacecraft test effort was increased during the 18 months preceding the first
manned flight with emphasis on environmental acceptance testing. This test
method screened out a large number of faulty components prior to installation.
Knowledge Capture and Management - Key To Ensuring Flight Safety and Mission
Success
John L. Goodman
United Space Alliance
AIAA Space 2005 Conference
Long Beach, CA, August 30 - September 1, 2005.
Copyright © 2005 by United Space Alliance,
LLC. These materials are sponsored by the National Aeronautics and Space
Administration under Contract NAS9-20000. The U.S. Government retains a
paid-up, nonexclusive, irrevocable worldwide license in such materials to
reproduce, prepare derivative works, distribute copies to the public, and
perform publicly and display publicly, by or on behalf of the U.S.
Government. All other rights are reserved by the copyright owner.
Abstract
The incorporation of knowledge capture and knowledge management strategies early
in the development phase of an exploration program is necessary for safe and
successful missions of human and robotic exploration vehicles over the life of a
program. Following the transition from the development to the flight phase, loss
of underlying theory and rationale governing design and requirements occur
through a number of mechanisms. This degrades the quality of engineering work
resulting in increased life cycle costs and risk to mission success and safety
of flight. Due to budget constraints, concerned personnel in legacy programs
often have to improvise methods for knowledge capture and management using
existing, but often sub-optimal, information technology and archival resources.
Application of advanced information technology to perform knowledge capture and
management would be most effective if program wide requirements are defined at
the beginning of a program.
Product Assurance Program Planning -
Some Lessons Learned from Apollo
Gerald Sandler, Grumman Aerospace Corporation
AIAA Paper No. 72-247
AIAA Man's Role in Space Conference
Cocoa Beach, Florida, March 27-28, 1972
Summary
Over the past decade we have developed the technical and programmatic approaches needed to provide the levels of reliability required for manned space missions. The combination of design, test and control or assurance programs used on Apollo has proven very effective. In the design approach we have learned how to minimize the number of potential single-point failures that could result in mission failure. In test and product assurance areas, screens and controls were established that effectively prevented a latent defect from filtering through the system and occurring in flight. The cost of these combined efforts, however, has been a large percentage of total program costs. The challenge of this decade, I believe, is how to achieve the same or improved levels of reliability at lower program costs.
The area of primary concentration, at this time, should be failures that are "human oriented" rather than "design oriented". Our engineering techniques have gone a long way in reducing the latter problem area. On Apollo, half or more of the failures that occurred in the test programs were classified as workmanship, procedural or quality-oriented problems. We have learned how to screen them out by test; what we have to do now is to prevent them from occurring or catch them earlier. In addition, recognizing that failures will always occur in our test programs, the cost challenge is to design units and systems for maintainability, rework and proper isolation, so that we can minimize the extent of retesting for adequate confidence.
REVIEW OF LESSONS LEARNED IN THE MERCURY PROGRAM RELATIVE TO SPACECRAFT
DESIGN AND OPERATIONS
F.J. Bailey, Jr., NASA Manned Spacecraft Center
AIAA Space Flight Testing Conference
Cocoa Beach, Florida, March 18-20, 1963
Introduction
The papers presented so far in this session have described specific
measures taken in preparing the launch vehicle and spacecraft for Mercury
missions. The purpose of the present paper is to review, in somewhat
more general terms, some of the more significant lessons learned in the
Mercury program, to see where changes or additional measures may be
desirable in future programs.
The lessons that have been learned fall broadly into two main
areas, the first applying to program planning, the second to detailed
design.
When Spacecraft Won't Point
Christopher D. Hall
2003 AAS/AIAA Astrodynamics Specialists Conference
Big Sky, Montana, August 2003
Paper # AAS 03-505
The Spacecraft Attitude Dynamics and Control course at Virginia Tech is primarily taken by juniors as an alternative to the aircraft stability and control course. Such a course can be taught in many different ways. On one extreme, one could invoke the powerful machinery of geometric mechanics, including the momentum map, so(3), SO(3), cotangent bundles and symplectic manifolds. At the other extreme, one might use a handbook with convenient sizing formulas for designing ADCS hardware. Somewhere in between these extreme approaches lie the approaches used in most courses. In any case, students can better appreciate the significance of the selected topics covered if they are provided with concrete examples. One particularly interesting type of example is the ADCS failure or anomaly, especially where a failure is caused by the same type of error that the students are being asked to understand and not make.
ACTS PYRO Separation Band Anomaly (Shuttle Orbiter)
NASA PLLS #0312
Abstract
Minor damage to the Shuttle was caused when the firing of the primary
explosive cord to deploy the payload from the cargo bay also triggered the
backup cord. End-to-end system tests had validated the erroneous design
rather than the end function. Document electrical-mechanical interfaces,
protect hazardous systems against any possible unintended operation, and
consider use of a single cord configuration.
Collective Knowledge Gained from Gemini
Charles W. Mathews
NASA Manned Spacecraft Center
AIAA Paper No. 66-1027
AIAA Third Annual Meeting, Boston, MA, Nov. 29-Dec. 2, 1966.
Summary
The Gemini Program has comprised 12 space flights, 10 of which were manned
operations. The information gained is difficult to summarize within
a brief paper, but more detailed information has and will continue to be
made available to those who have an interest in it. With minor
exceptions, the objectives of the program were met, having been expanded
well beyond original concepts and examined in considerably more depth than
expected. Gemini leaves a legacy of results that, hopefully, will
further accelerate man's efforts to explore and utilize the frontier of
space.
Summary of Gemini
Rendezvous Experience
Glynn S. Lunney
NASA Manned Spacecraft Center
AIAA Paper No. 67-272
AIAA Flight Test, Simulation and Support Conference
Cocoa Beach, Florida, Feb. 6-8, 1967
Abstract
A significant portion of the Gemini program was devoted to the
rendezvous problem. One of the major objectives was to establish a base
of operational experience and confidence in the required techniques. In
this paper, the planning and flight test cycle is reviewed to provide an
outline of the Gemini results. Many various considerations were studied
and several of the more important factors are discussed as to their
influence on the different choices and subsequent operations. The flight
test results are summarized according to technique and performance such
as propellant costs, satisfaction of conditions, et cetera. Overall, the
conclusion is that the base of experience has been established, the
rendezvous sequence is practical, the systems and the management of
these systems have been satisfactory in accuracy and performance.
Further study and a continued, detailed preparation will be the key to
the future uses of rendezvous.
NASA TECHNICAL MEMORANDUM
NASA TM X-64860
July 1974
Abstract
Key lessons learned during the Skylab Program that could have impact
on on-going and future programs are presented. They present early and
sometimes subjective opinions; however, they give insights into key areas of
concern. These experiences from a complex space program management and space
flight serve as an early assessment to provide the most advantage to
programs underway. References to other more detailed reports are provided
for the individual's specific area of interest.
Lessons learned on the Skylab program (JSC) - 1974
NASA-TM-X-72920
FOREWORD
The lessons learned in the Skylab Program are described in five basic documents prepared by and representing the experience of NASA Headquarters, the Lyndon B. Johnson Space Center, the John F. Kennedy Space Center, and the Skylab and Saturn Program Offices at the George C. Marshall Space Flight Center. The documents are intended primarily for use by persons who are familiar with the disciplines covered and who are involved in other programs. Thus, the individual lessons are brief rather than detailed.
Authors of the lessons have been encouraged to be candid. The reader may detect apparent differences in approach in some areas, illustrating that equally effective management action in a particular area frequently can be accomplished by several approaches.
The recommendations and actions described are not necessarily the only or the best approaches, but they reflect Skylab experience that must be tailored to other situations and should be accepted by the reader as one input to the management decision making process. As such, these recommendations, which are based on approaches that were found to be effective in the Skylab Program, should be used to help identify potential problems of future space programs. Many of the lessons are subjective and represent individual opinions and should not be interpreted as official statements of NASA positions or policies.
In addition to the Skylab Lessons Learned documents, Skylab Mission Evaluation Reports are being issued by the previously mentioned NASA agencies to provide detailed evaluation results. The results of the scientific experiments will be disseminated by the Principal Investigators.
Gemini: Mercury Experience Applied
Jerome B. Hammack and Walter J. Kapryan
NASA - Manned Spacecraft Center
Houston, Texas
gemini_mercury_experience.pdf
Introduction
It is the intent of this paper to show how the Gemini
program has attempted to draw upon and profit from Mercury experience.
The Gemini Project has evolved as a NASA space program
with its prime mission of providing a flexible space system that will enable
us to gain proficiency in manned space flight and to develop new techniques
for advanced flights, including rendezvous. To achieve these
objectives, we must have a space vehicle with substantially greater
capability than the Mercury spacecraft. This increased capability will
include provisions for two men, instead of one, as in the Mercury spacecraft
and for space missions of up to two weeks' duration. It is the intent
of the Gemini Project to build upon the experience gained from Mercury so
that most of the energies of the new program can be devoted to the solution
of the problems associated with achieving its primary mission objectives and
not have to fight its way through a swelter of old problems.
Lessons Learned but Forgotten from the Space Shuttle Challenger Accident
Allan J. McDonald, ATK Thiokol Propulsion (Retired)
Space 2004 Conference and Exhibit
September 28-30, 2004, San Diego, California
AIAA 2004-5830
macdonald_2004.pdf
Abstract
At the time of the Challenger accident, I was the Director of the
Space Shuttle Solid Rocket Motor Project for Morton Thiokol Inc. The cause
of the failure and the controversy surrounding the decision to launch the
Challenger in such cold weather is discussed in detail in the Presidential
Commission's Report on the Challenger Accident. The Challenger was launched
at 16:38:00:010 GMT on January 28th, 1986 from the Kennedy Space Center
(KSC). I was in the Launch Control Center (LCC) at the time of the launch.
The Mission Management Team's (MMT) decision to launch the Challenger was
flawed because of the lack of communication both horizontally and vertically
within the NASA organizational structure. The Columbia accident suffered
from a similar breakdown in communications along with failure to consider
the seriousness of engineers' concerns much like the Challenger. This paper
will discuss the details leading to the failure of the Challenger and the
lessons learned from the accident. The paper will also show how the mistakes
from the Challenger accident in 1986, the 25th flight of the Space Shuttle,
were repeated in the loss of the Columbia in 2003, some 17 years and 88
flights later.
The Fluids and Combustion Facility (FCF) project at the NASA Glenn Research Center subjected the Digital Signal Processor (DSP) based Data Acquisition board to ionizing radiation testing to simulate the International Space Station US Lab radiation environment. Components on the board were irradiated by a 200 MeV proton beam and were exposed to a ten year equivalent dose (600 Rads with a 1.5 Safety margin) of ionizing radiation. All exposures resulted in destructive events in the DSP chips on board.
The Digital Signal Processor (DSP) based Data Acquisition board is a commercial off-the-shelf product that was not designed for space applications. There are four identical DSP chips on the board. The DSP chips are commercial microcircuits, also not intended for space applications. The DSP chips are utilized for image acquisition from FCF Serial Data Link (SDL) supported cameras.
No other devices on the Data Acquisition boards were observed to fail; however, the boards were not tested beyond about 1-2% of the total intended proton fluence when the specific SHARC DSP chips were exposed directly.
MER Spirit Flash Memory Anomaly (2004)
NASA Public Lessons Learned System (PLLS) Database
Abstract:
Shortly after the commencement of science activities on Mars, an MER
rover lost the ability to execute any task that requested memory from the
flight computer. The cause was incorrect configuration parameters in two
operating system software modules that control the storage of files in
system memory and flash memory. Seven recommendations cover enforcing design
guidelines for COTS software, verifying assumptions about software behavior,
maintaining a list of lower priority action items, testing flight software
internal functions, creating a comprehensive suite of tests and automated
analysis tools, providing downlinked data on system resources, and avoiding
the problematic file system and complex directory structure.
Managing the Moon Program:
Lessons Learned From Project Apollo
Monographs in Aerospace History, No. 14, 1999 Moderator John M. Logsdon. Participants: Howard W. Tindall, George E. Mueller, Owen W. Morris, Maxime A. Faget, Robert A. Gilruth, and Christopher C. Kraft.
Lessons Learned From Flights of Off the Shelf Aviation Navigation Units on
the Space Shuttle
John L. Goodman
NASA Johnson Space Center, United Space Alliance, LLC
shuttle_gps_lessons_learned_may_02.pdf
Abstract
The Space Shuttle program began flying atmospheric flight navigation units in
1993, in support of Shuttle avionics upgrades. In the early 1990s, it was anticipated that
proven in-production navigation units would greatly reduce integration, certification and
maintenance costs. However, technical issues arising from ground and flight tests
resulted in a slip in the Shuttle GPS certification date. A number of lessons were
learned concerning the adaptation of atmospheric flight navigation units for use in
low-Earth orbit. They are applicable to any use of a navigation unit in an application
significantly different from the one for which it was originally designed. Flight
experience has shown that atmospheric flight navigation units are not adequate to support
anticipated space applications of GPS, such as autonomous operation, rendezvous, formation
flying and replacement of ground tracking systems.
The Space Shuttle and GPS A Safety-Critical Navigation Upgrade
John L. Goodman
NASA Johnson Space Center, United Space Alliance, LLC
shuttle_gps_n_cots.pdf
Abstract
In 1993, the Space Shuttle Program selected an off-the-shelf Global Positioning
System (GPS) receiver to eventually replace the three Tactical Air Navigation units on
each space shuttle orbiter. A proven, large production base GPS receiver was believed to
be the key to reducing integration, certification, and maintenance costs. More GPS
software changes, shuttle flight software changes, and flight and ground testing were
required than anticipated. This resulted in a 3-year slip in the shuttle GPS certification
date. A close relationship with the GPS vendor, open communication among team members,
Independent Verification and Validation of source code, and GPS receiver design insight
were keys to successful certification of GPS for operational use by the space shuttle.
A Software Perspective on GNSS Receiver Integration and Operation
John L. Goodman
NASA Johnson Space Center, United Space Alliance, LLC
sw_perspective_on_gnss_rcvr_int_op.pdf
Abstract
The GNSS industry is focusing on potential threats to satellite navigation
integrity, such as intentional and unintentional interference, signal-in-space (satellite)
and ground support infrastructure anomalies, shared spectrum issues, and multipath. The
experience of the International Space Station (ISS) program, the Space Shuttle program,
the Crew Return Vehicle (CRV) program and other users of GNSS indicate that navigation
outages due to receiver software issues may pose as great a risk, if not more, to the user
than threats currently under study. The improvement in GNSS receiver tracking
capability and navigation accuracy has been accompanied by an increase in software
quantity and complexity. Current and future GNSS receivers will interface with multiple
systems that will further increase software complexity. Rather than viewing GNSS receivers
as plug and play devices, they should be regarded as complex computers that
interface with other complex computers, sometimes in safety critical applications. The
high cost of meeting strict software quality standards, and the proprietary nature of GNSS
receiver software, makes it more difficult to ensure quality software for safety-critical
applications. Lack of integrator and user insight into GNSS software complicates the
integration and test process, leading to cost and schedule issues.
Beyond Normal Accidents and High Reliability Organizations: The Need for an Alternative
Approach to Safety in Complex Systems
Karen Marais, Nicolas Dulac, and Nancy Leveson
MIT
Engineering Systems Symposium, March 24, 2004
marais-b.pdf
Introduction
Organizational factors play a role in almost all accidents and are a critical part
of understanding and preventing them. Two prominent sociological schools of thought have
addressed the organizational aspects of safety: Normal Accident Theory (NAT) and High
Reliability Organizations (HRO). In this paper, we argue that the conclusions of HRO
researchers (labeled HRO in the rest of this paper) are limited in their applicability and
usefulness for complex, high-risk systems. HRO oversimplifies the problems faced by
engineers and organizations building safety-critical systems, and following some of its
recommendations could lead to accidents. NAT, on the other hand, does recognize the
difficulties involved but is unnecessarily pessimistic about the possibility of
effectively dealing with them. An alternative systems approach to safety is described,
which avoids the limitations of NAT and HRO. While this paper uses the Space Shuttle,
particularly the Columbia accident, as the primary example, the conclusions apply to most
high-tech, complex systems.
Lessons from the Shuttle Independent Assessment
Dr. Tina L. Panontin
Chief Engineer, NASA Ames Research Center
RMC III, September 19, 2002
rmc_siat.pdf Outline
Satellite GN&C Anomaly Trends
Brent Robertson, Eric Stoneking
NASA Goddard Space Flight Center
satellite_anomaly_br.pdf
Abstract
On-orbit anomaly records for satellites launched from 1990 through 2001 are
reviewed to determine recent trends of unmanned space mission critical failures. Anomalies
categorized by subsystems show that Guidance, Navigation and Control (GN&C) subsystems
have a high number of anomalies that result in a mission critical failure when compared to
other subsystems. A mission critical failure is defined as a premature loss of a satellite
or loss of its ability to perform its primary mission during its design life. The majority
of anomalies are shown to occur early in the mission, usually within one year from launch.
GN&C anomalies are categorized by cause and equipment type involved. A statistical
analysis of the data is presented for all anomalies compared with the GN&C anomalies
for various mission types, orbits and time periods. Conclusions and recommendations are
presented for improving mission success and reliability.
Conclusion (excerpt)
A study of past on-orbit anomalies was undertaken to assess how future satellite
program resources might be best spent to ensure mission success. Spacecraft anomaly trends
were surveyed over the last decade, with the hope of learning ways to improve the process
of GN&C system development, to reduce the failure rate of future missions. One
conclusion that was apparent during the data survey was that industry-wide data is not
shared on a routine basis. It is difficult to learn from history if anomaly records are
kept out of the public domain.
Propulsion Lessons Learned from the Loss of Mars Observer
Carl S. Guernsey
Jet Propulsion Laboratory, Pasadena, CA
AIAA 2001-3630
37th AIAA/ASME/SAE/ASEE Joint Propulsion Conference
8-11 July 2001
Salt Lake City, Utah
guernsey_a01-34322.pdf
Abstract
Contact with the Mars Observer (MO) spacecraft was
lost in August 1993, three days before it was to have entered orbit around the planet
Mars. The spacecraft's transmitter had been turned off in preparation for
pressurization of the propulsion system, and no signal was ever detected from the vehicle
again. Due to the lack of telemetry, it was never possible to determine with
certainty what caused the loss of the spacecraft, and review boards from JPL, the Naval
Research Laboratory (NRL), and the spacecraft contractor were only able to narrow the
probable cause of the failure to a handful of credible failure modes. This paper
presents an overview of the potential failure modes identified by the JPL review board and
presents evidence, discovered after the failure reviews were complete, that the loss was
very likely due to the use of an incompatible braze material in the flow restriction
orifice of the pressure regulator. Lessons learned and design practices to avoid
this and other propulsion failure modes considered candidates for the loss of MO are
discussed.
The NEAR Discovery Mission: Lessons Learned
R. H. Maurer and A. G. Santo
JHU/APL
The 10th Annual AIAA/Utah State University Conference on Small Satellites
neardis.pdf
Abstract
Under a contract from NASA, The Johns Hopkins University Applied Physics
Laboratory built and launched a spacecraft that will rendezvous with and orbit the near-Earth
asteroid 433 Eros. The Near Earth Asteroid Rendezvous (NEAR) spacecraft is the first under
NASA's Discovery Program, which is a series of low-cost solar system missions. While
in orbit around Eros the spacecraft will measure the bulk, surface, and internal
properties of the asteroid for 10 months. This paper describes the lessons learned from
design, test, and fabrication that are appropriate to other programs in quick development,
or of an interplanetary nature.
SPACE SYSTEMS ENGINEERING LESSONS LEARNED SYSTEM
Paul G. Cheng, Douglas D. Chism, Wayne H. Goodman, Patrick L. Smith, and William F. Tosney
The Aerospace Corporation
Colonel Michael S. Giblin
USAF Space and Missile Systems Center
AIAA-2001-4796
cheng_2001.doc
Abstract
The space community has long held it vital to learn from past
experiences and avoid the repetition of mishaps. Procedures to collect and disseminate
these "lessons learned" have been set up to serve this need. However, existing
lessons-learned systems have several drawbacks: they are confined to particular technical
areas, are difficult to access, or are not enforceable. The 1999 U.S. Air Force Broad Area
Review (BAR) of launch vehicles recognized these deficiencies, and recommended the
creation of an improved lessons-sharing mechanism. The U.S. Air Force Space and Missile
Systems Center (SMC), with The Aerospace Corporation's support, has implemented a
"Space Systems Engineering Lessons Learned" system based on the BAR
recommendation. This new procedure for information sharing has a broad scope as well as an
active dissemination mechanism. It spans all facets of program development, including
systems engineering, design, software, manufacturing, test, launch, and on-orbit
operations. The lessons are electronically available to the U.S. space community, and
adoption of best practices espoused in these lessons would improve SMC's Operational
Safety, Suitability, and Effectiveness (OSS&E) process.
Qualification by Test: An Example with Clock Skew
clk_skew_and_qual_by_test.htm
Conclusion
Design margins demonstrated by test on the ground cannot be used to predict reliability on orbit for this class of circuit. For other classes of circuits, such as the change of propagation delay between two clock edges of a crystal clock oscillator, margin testing can have some value.
Design margin shown by logic simulation cannot be used to predict reliability on orbit for this class of circuit. Most logic simulators switch models between runs -- min, typ, and max -- and are incapable of performing min-max analysis. The simulation algorithms assume that the variable parameters track. As seen in Figures 4 and 5, for example, showing the effects of life and antifuse resistance, this is not the case. Real radiation environments are also a concern. The "tracking" assumption is simply wrong and is no more than "engineering by arm waving."
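The difference between corner-tracked simulation and min-max analysis can be sketched for a hold-time check on a single shift-register stage. This is only an illustration: the delay values, the `Delay` type, and the function names below are hypothetical and are not taken from the application note. The sketch assumes hold margin is data-path delay minus clock skew minus hold time.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Delay:
    """A delay with minimum, typical, and maximum values, in nanoseconds."""
    lo: float
    typ: float
    hi: float


# Hypothetical delays for one shift-register stage (illustrative only).
CLK_TO_Q = Delay(1.0, 2.0, 3.0)    # source flip-flop clock-to-output delay
DATA_ROUTE = Delay(0.5, 1.0, 2.0)  # routing delay from source Q to destination D
CLK_SKEW = Delay(0.8, 1.5, 2.8)    # extra clock delay to the destination flip-flop
HOLD = 0.5                         # destination hold time, held fixed for simplicity


def hold_margin(clk_to_q, data_route, clk_skew, hold=HOLD):
    """Hold margin in ns; a negative value means a hold-time violation."""
    return clk_to_q + data_route - clk_skew - hold


def tracked_corner_margins():
    """Margins when every parameter is taken from the same corner,
    which is the 'tracking' assumption of corner-switched simulation."""
    corners = {"min": "lo", "typ": "typ", "max": "hi"}
    return {
        name: hold_margin(
            getattr(CLK_TO_Q, field),
            getattr(DATA_ROUTE, field),
            getattr(CLK_SKEW, field),
        )
        for name, field in corners.items()
    }


def min_max_margin():
    """Worst-case margin when each parameter varies independently
    between its own minimum and maximum."""
    return min(
        hold_margin(q, r, s)
        for q, r, s in product(
            (CLK_TO_Q.lo, CLK_TO_Q.hi),
            (DATA_ROUTE.lo, DATA_ROUTE.hi),
            (CLK_SKEW.lo, CLK_SKEW.hi),
        )
    )


if __name__ == "__main__":
    for corner, margin in tracked_corner_margins().items():
        print(f"tracked {corner}: {margin:+.2f} ns")
    print(f"min-max worst case: {min_max_margin():+.2f} ns")
```

With these assumed numbers, all three tracked corners report positive hold margin, while the min-max worst case (fastest data path against largest clock skew) is negative. A simulator that only switches between min, typ, and max models would not expose that case.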
(June 8, 2002)
How Software Errors Contribute to Satellite Failures - Lessons Learned
scsra.pdf (open version)
cheng_nasa_sw_lessons.pdf
cheng_nasa_sw_lessons.ppt
Dr. Paul G. Cheng
The Aerospace Corporation
Risk Assessment & Management Subdivision, Systems Engineering Division
April 24, 2002
The slides (NASA version) are not yet released for open distribution. E-mail me at rich.katz@gsfc.nasa.gov for a copy.
CEO draws quality lessons from design failures
By Peggy Aycinena, Integrated System Design
March 22, 2000
(external link)
In a fast-moving keynote address at the first IEEE International Symposium on Quality Electronic Design (ISQED), John East, president and chief executive officer of Actel Corp., highlighted a number of widely-known design failures from the last several decades and offered a primer on the complexities of design quality. (April 16, 2002)
Lessons Learned from FPGA Developments
FPGA-001-01, Version 0.0
April 2002
Prepared by Sandi Habinc
esa_fpga_001_01-0-0.pdf
Introduction
This document is a compilation of problems encountered and lessons learned from the
usage of Field Programmable Gate Array (FPGA) devices in
European Space Agency (ESA) and National Aeronautics and Space Administration (NASA)
satellite missions. The objective has been to list the most common problems which can be
avoided by careful design and it is therefore not an exhaustive compilation of experienced
problems.
This document can also be seen as a set of guidelines to FPGA design for space flight applications. It provides a development method which outlines a development flow that is commonly considered as sufficient for FPGA design. The document also provides down-to-earth design methods and hints that should be considered by any FPGA designer. Emphasis has also been placed on development tool related problems, especially focusing on Single Event Upset (SEU) hardships in once-only-programmable FPGA devices. Re-programmable FPGA devices are covered only briefly, since they are outside the scope of this document and will become the focus of a separate future technical report. (April 16, 2002)
JPL Common Threads Workshop Summary Report
May 31, 1996
JPL D-13776
Arthur F. Brown and John E. Koch
Introduction
A Common Threads (CT) workshop was held on 31 May 1996 in the
Pasadena Technical Center. The objective of the workshop was to attempt to convey
some of the knowledge of seasoned Project Managers (PMs) to the new generation of PMs.
The premise and theme of the workshop was that "common threads" exist
which appear in program after program, in the form of similar flight and test failures and
failure mechanisms, recurring programmatic issues and sometimes serious oversights.
These problems are understood and often solved in some innovative way on one program, but
the knowledge is frequently not passed to another program with a similar problem, and the
cycle repeats.
The NASA Lessons Learned URL link will take you directly to the LLIS Home Page.
Survey of NASA's Lessons Learned Process
September 5, 2001
Abstract
The National Aeronautics and Space Administration's (NASA) procedures and guidelines require that program and project managers review and apply lessons learned from the past throughout a program's or project's life cycle and to document and submit any significant lessons learned in a timely manner. Lessons learned systems are used by many military, commercial and government organizations to capture, store, disseminate, and share knowledge gained from past experiences. NASA's principal mechanism for collecting and sharing lessons learned from programs, projects, and missions agency wide is the Lessons Learned Information System (LLIS). The goal of LLIS is to ensure that NASA does not have to keep "relearning" the lessons of the past. NASA also shares lessons learned through revisions to its policies and guidance. Further, lessons learned from a mishap or operational event are captured in procedure and process documents. GAO surveyed all of NASA's program and project managers to obtain their perspectives on the mechanisms NASA has in place to ensure that past lessons learned from mission failures are being applied. GAO's survey highlighted fundamental weaknesses in the collection and sharing of lessons learned in NASA by program and project managers as well as in the agency's LLIS. While some lessons learning does take place, lessons are not routinely identified, collected, or shared by program and project managers. In addition, many respondents indicated that they are dissatisfied with NASA's lessons learned processes and systems. Respondents also identified challenges or barriers to the sharing of lessons learned as well as areas of improvement.
NASA: Better Mechanisms Needed for Sharing Lessons Learned
GAO-02-195
January 30, 2002
Executive Summary
Purpose
In the early 1990s, the National Aeronautics and Space Administration (NASA) administrator
challenged the agency to complete projects faster, better, and cheaper. The intent was to
reduce costs, become more efficient, and increase scientific results by conducting more
and smaller missions in less time. Although NASA maintained a high success rate under the
faster, better, and cheaper strategy, a few significant mission failures also
occurred, particularly the loss of the Mars Polar Lander and Climate Orbiter
spacecraft. NASA investigations of these failures, as well as its review of other
programs, raised concern that lessons from past experiences were not being applied to
current programs and projects.
At the request of the Chairman and Ranking Minority Member, Subcommittee on Space and
Aeronautics, House Committee on Science, GAO assessed whether NASA has adequate mechanisms
in place to ensure that past lessons learned from mission failures are being applied.
Specifically, GAO (1) identified the policies, procedures, and systems NASA has in
place for lessons learning, (2) assessed how effectively these policies, procedures, and
systems facilitate lessons learning, and (3) determined whether further efforts are needed
to improve lessons learning.
Lessons Learned At JPL From the HESSI mishap
Abstract
Considerable newspaper and technical publication coverage was given to an overly severe
March 21, 2000 vibration test in Room 144 of Building 100 at the Jet Propulsion
Laboratory, Pasadena, California. The over-test caused significant damage (over
$1,000,000) to the High Energy Solar Spectroscopic Imager (HESSI) satellite built by the
University of California at Berkeley (UCB). A Mishap Investigation Board (MIB) was
convened.
Thanks to Lisa Coe of NASA/MSFC for suggesting this page.
Last Revised: February 03, 2010
Web Grunt: Richard Katz