[Python-checkins] python/nondist/peps pep-0330.txt, NONE, 1.1 pep-0000.txt, 1.272, 1.273

goodger at users.sourceforge.net goodger at users.sourceforge.net
Thu May 27 21:45:37 EDT 2004


Update of /cvsroot/python/python/nondist/peps
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv4467
Modified Files:
	pep-0000.txt 
Added Files:
	pep-0330.txt 
Log Message:
added PEP 330, Python Bytecode Verification, by Michel Pelletier
--- NEW FILE: pep-0330.txt ---
PEP: 330
Title: Python Bytecode Verification
Version: $Revision: 1.1 $
Last-Modified: $Date: 2004年05月28日 01:45:34 $
Author: Michel Pelletier <michel at users.sourceforge.net>
Status: Draft 
Type: Standards Track
Content-Type: text/plain
Created: 17-Jun-2004
Python-Version: 2.6?
Post-History: 
Abstract
 If Python Virtual Machine (PVM) bytecode is not "well-formed" it
 is possible to crash or exploit the PVM by causing various errors
 such as under/overflowing the value stack or reading/writing into
 arbitrary areas of the PVM program space. Most of these kinds of
 errors can be eliminated by verifying that PVM bytecode does not
 violate a set of simple constraints before execution.
 This PEP proposes a set of constraints on the format and structure
 of Python Virtual Machine (PVM) bytecode and provides an
 implementation in Python of this verification process.
Motivation
 The Python Virtual Machine executes Python programs that have been
 compiled from the Python language into a bytecode representation.
 The PVM assumes that any bytecode being executed is "well-formed"
 with regard to a number implicit constraints. Some of these
 constraints are checked at run-time, but most of them are not due
 to the overhead they would create.
 When running in debug mode the PVM does do several run-time checks
 to ensure that any particular bytecode cannot violate these
 constraints that, to a degree, prevent bytecode from crashing or
 exploiting the interpreter. These checks add a measurable
 overhead to the interpreter, and are typically turned off in
 common use.
 Bytecode that is not well-formed and executed by a PVM not running
 in debug mode may create a variety of fatal and non-fatal errors.
 Typically, ill-formed code will cause the PVM to seg-fault and
 cause the OS to immediately and abruptly terminate the
 interpreter.
 Conceivably, ill-formed bytecode could exploit the interpreter and
 allow Python bytecode to execute arbitrary C-level machine
 instructions or to modify private, internal data structures in the
 interpreter. If used cleverly this could subvert any form of
 security policy an application may want to apply to it's objects.
 Practically, it would be difficult for a malicious user to
 "inject" invalid bytecode into a PVM for the purposes of
 exploitation, but not impossible. Buffer overflow and memory
 overwrite attacks are commonly understood, particularly when the
 exploit payload is transmitted unencrypted over a network or when
 a file or network security permission weakness is used as a
 foothold for further attacks.
 Ideally, no bytecode should ever be allowed to read or write
 underlying C-level data structures to subvert the operation of the
 PVM, whether the bytecode was maliciously crafted or not. A
 simple pre-execution verification step could ensure that bytecode
 cannot over/underflow the value stack or access other sensitive
 areas of PVM program space at run-time.
 This PEP proposes several validation steps that should be taken on
 Python bytecode before it is executed by the PVM so that it
 compiles with static and structure constraints on its instructions
 and their operands. These steps are simple and catch a large
 class of invalid bytecode that can cause crashes. There is also
 some possibility that some run-time checks can be eliminated up
 front by a verification pass.
 There is, of course, no way to verify that bytecode is "completely
 safe", for every definition of complete and safe. Even with
 bytecode verification, Python programs can and most likely in the
 future will seg-fault for a variety of reasons and continue to
 cause many different classes of run-time errors, fatal or not.
 The verification step proposed here simply plugs an easy hole that
 can cause a large class of fatal and subtle errors at the bytecode
 level.
 Currently, the Java Virtual Machine (JVM) verifies Java bytecode
 in a way very similar to what is proposed here. The JVM
 Specification version 2 [1], Sections 4.8 and 4.9 were therefore
 used as a basis for some of the constraints explained below. Any
 Python bytecode verification implementation at a minimum must
 enforce these constraints, but may not be limited to them.
Static Constraints on Bytecode Instructions
 1. The bytecode string must not be empty. (len(co_code) > 0).
 2. The bytecode string cannot exceed a maximum size
 (len(co_code) < sizeof(unsigned char) - 1).
 3. The first instruction in the bytecode string begins at index 0.
 4. Only valid byte-codes with the correct number of operands can
 be in the bytecode string.
Static Constraints on Bytecode Instruction Operands
 1. The target of a jump instruction must be within the code
 boundaries and must fall on an instruction, never between an
 instruction and its operands.
 2. The operand of a LOAD_* instruction must be an valid index into
 its corresponding data structure.
 3. The operand of a STORE_* instruction must be an valid index
 into its corresponding data structure.
Structural Constraints between Bytecode Instructions
 1. Each instruction must only be executed with the appropriate
 number of arguments in the value stack, regardless of the
 execution path that leads to its invocation.
 2. If an instruction can be executed along several different
 execution paths, the value stack must have the same depth prior
 to the execution of the instruction, regardless of the path
 taken.
 3. At no point during execution can the value stack grow to a
 depth greater than that implied by co_stacksize.
 4. Execution never falls off the bottom of co_code.
Implementation
 This PEP is the working document for an Python bytecode
 verification implementation written in Python. This
 implementation is not used implicitly by the PVM before executing
 any bytecode, but is to be used explicitly by users concerned
 about possibly invalid bytecode with the following snippet:
 import verify
 verify.verify(object)
 The `verify` module provides a `verify` function which accepts the
 same kind of arguments as `dis.dis`: classes, methods, functions,
 or code objects. It verifies that the object's bytecode is
 well-formed according to the specifications of this PEP.
 If the code is well-formed the call to `verify` returns silently
 without error. If an error is encountered, it throws a
 'VerificationError' whose argument indicates the cause of the
 failure. It is up to the programmer whether or not to handle the
 error in some way or execute the invalid code regardless.
 Phillip Eby has proposed a pseudo-code algorithm for bytecode
 stack depth verification used by the reference implementation.
Verification Issues
 This PEP describes only a small number of verifications. While
 discussion and analysis will lead to many more, it is highly
 possible that future verification may need to be done or custom,
 project-specific verifications. For this reason, it might be
 desirable to add a verification registration interface to the test
 implementation to register future verifiers. The need for this is
 minimal since custom verifiers can subclass and extend the current
 implementation for added behavior.
Required Changes
 Armin Rigo noted that several byte-codes will need modification in
 order for their stack effect to be statically analyzed. These are
 END_FINALLY, POP_BLOCK, and MAKE_CLOSURE. Armin and Guido have
 already agreed on how to correct the instructions. Currently the
 Python implementation punts on these instructions.
 This PEP does not propose to add the verification step to the
 interpreter, but only to provide the Python implementation in the
 standard library for optional use. Whether or not this
 verification procedure is translated into C, included with the PVM
 or enforced in any way is left for future discussion.
References
 [1] The Java Virtual Machine Specification 2nd Edition
 http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html
Copyright
 This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
Index: pep-0000.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0000.txt,v
retrieving revision 1.272
retrieving revision 1.273
diff -C2 -d -r1.272 -r1.273
*** pep-0000.txt	19 May 2004 21:18:54 -0000	1.272
--- pep-0000.txt	28 May 2004 01:45:32 -0000	1.273
***************
*** 123,126 ****
--- 123,127 ----
 S 325 Resource-Release Support for Generators Pedroni
 S 327 Decimal Data Type Batista
+ S 330 Python Bytecode Verification Pelletier
 S 754 IEEE 754 Floating Point Special Values Warnes
 
***************
*** 352,355 ****
--- 353,357 ----
 SA 328 Imports: Multi-Line and Absolute/Relative Aahz
 SR 329 Treating Builtins as Constants in the Standard Library Hettinger 
+ S 330 Python Bytecode Verification Pelletier
 SR 666 Reject Foolish Indentation Creighton
 S 754 IEEE 754 Floating Point Special Values Warnes


More information about the Python-checkins mailing list

AltStyle によって変換されたページ (->オリジナル) /