I've just finished a small function that will reduce the size of code as much as possible without breaking anything. Obviously it makes everything a bit unreadable so it's not really for cleaning the code, more for if you want to send something but make it awkward for people to edit.
Basically, on the highest level it'll attempt to remove as many spaces and group as many lines together as possible without causing syntax errors with the output code. If there's anything I missed let me know.
I also tried to make it easy to change all the parts if needed, so for example you're not just limited to " and ' to define what text is (you may want some other values to not get edited) and so on.
Anyway, I tested it on one bit of code I had that was 170,000 characters and 3500 lines. It reduced the number of characters down to 100,000 and knocked it down to 1200 lines.
import operator
def compactCode(input='',groupMaxSpaces=None,changeIndents=4,indentLevel=4,**kwargs):
    #Check that grouping is not disabled, and set to 50 if it is not a number
    if groupMaxSpaces not in (False, None) and type(groupMaxSpaces) not in (int, float):
        groupMaxSpaces=50
    #Auto set variables to the best efficiency if 'max' is given
    try:
        maxEfficiency=kwargs["max"]
    except:
        pass
    else:
        if maxEfficiency:
            groupMaxSpaces=-1
            changeIndents=1
    #If text should also be affected
    ignoreText = False
    try:
        ignoreText=kwargs["ignoreText"]
    except:
        pass
    #Remove all triple quoted comments
    input=input.replace('"""',"'''").split("'''");input=''.join(input[::2]);
    possibleSuffixes=list("( :")
    #Conditions that may have their contents on the same line
    groupableNames=set(i+j for i in ('if','else','elif','try','except','finally','for','with','while') for j in possibleSuffixes)
    #Conditions which can't be moved up a line
    fixedNames={x:len(x) for x in set(i+j for i in ('class','def') for j in possibleSuffixes)|groupableNames|{'@staticmethod','@classmethod'}}
    input = input.replace('\\','\\\\').replace('\r\n','\\r\\n')
    removeSpace=list('+-*/=!<>%,.()[]{}:') #These items will have all spaces next to them removed
    inLineTextMarker=";txt.{};"
    textSymbols=["'",'"'] #Add to this to preserve text if text is defined by anything other than quotation marks and speech marks
    if ignoreText:
        removeSpace+=textSymbols
        textSymbols=[]
    indentMultiplier=float(changeIndents)/indentLevel
    outputList=[]
    for line in str(input).split('\n')+[';endoflist;']:
        #Remove comments
        line=line.split("#")[0]
        #Replace text as to avoid it being affected
        textStorage={}
        lastSymbolFail=None
        #Loop until all text is replaced
        while True:
            #Find the first symbol
            symbolOccurrances={}
            for symbol in textSymbols:
                placeOfOccurrance = line.find(symbol)
                #Only add to dictionary if there is more than one symbol
                if placeOfOccurrance >= 0 and line.count(symbol)>1:
                    symbolOccurrances[symbol]=placeOfOccurrance
            #Get the first occurance, or break loop if there is none
            try:
                symbol=sorted(symbolOccurrances.items(),key=operator.itemgetter(1))[0][0]
            except:
                break
            textStorage[symbol]=[]
            #Replace the text so it won't be cut down later
            while symbol in line:
                splitByText=line.split(symbol,1)
                line=splitByText[0]+inLineTextMarker
                if symbol in splitByText[1]:
                    textSplit=splitByText[1].split(symbol,1)
                    line+=textSplit[1]
                    textStorage[symbol].append(textSplit[0])
                else:
                    line+=splitByText[1]
                    break
            line=line.replace(inLineTextMarker,inLineTextMarker.format(ord(symbol)))
        #Remove double spaces
        stripLine=line.lstrip(' ')
        leadingSpace=int((len(line)-len(stripLine))*indentMultiplier)
        while '  ' in stripLine:
            stripLine=stripLine.replace('  ',' ')
        if stripLine:
            #Remove unnecessary spaces
            for i in removeSpace:
                stripLine=stripLine.replace(' '+i,i).replace(i+' ',i)
            #Replace the text markers with the actual text again
            while True:
                resultsExist={symbol:True for symbol in textSymbols}
                for symbol in textSymbols:
                    currentTextMarker=inLineTextMarker.format(ord(symbol))
                    while currentTextMarker in stripLine:
                        stripLine=stripLine.replace(currentTextMarker,symbol+textStorage[symbol].pop(0)+symbol,1)
                    if currentTextMarker not in stripLine:
                        resultsExist[symbol]=False
                if not any(x in stripLine for x in (inLineTextMarker.format(ord(symbol)) for symbol in textSymbols)):
                    break
            #Group together lines
            if groupMaxSpaces:
                lastLine=None
                try:
                    lastLine = outputList[-1]
                except:
                    pass
                if lastLine and stripLine!=';endoflist;':
                    lastLineLength = len(lastLine)
                    lastLineStripped = lastLine.lstrip()
                    lastLineStrippedLength = len(lastLineStripped)
                    lastIndent = lastLineLength-lastLineStrippedLength
                    lastLength = lastLineStrippedLength
                    #Make sure the last space is of the same indent, and doesn't mark the start of a loop
                    if leadingSpace == lastIndent:
                        if lastLineStrippedLength+len(stripLine)<groupMaxSpaces or groupMaxSpaces<0:
                            if all(x not in stripLine[:y] for x, y in fixedNames.iteritems()):
                                stripLine=lastLineStripped+';'+stripLine
                                outputList.pop(-1)
                #Group to the conditional statements
                oneLineAgo,twoLinesAgo=None,None
                try:
                    twoLinesAgo,oneLineAgo=outputList[-2:]
                except:
                    pass
                if oneLineAgo and twoLinesAgo:
                    oneLineAgoStrip=oneLineAgo.lstrip()
                    twoLinesAgoStrip=twoLinesAgo.lstrip()
                    oneLineAgoIndentLevel = len(oneLineAgo)-len(oneLineAgoStrip)
                    #Check the current indent is less than the last line, and the last line indent is greater than the 2nd last line
                    if leadingSpace<oneLineAgoIndentLevel:
                        if int(oneLineAgoIndentLevel-indentLevel*indentMultiplier)==len(twoLinesAgo)-len(twoLinesAgoStrip):
                            #Make sure 2 lines ago was a statement, but the latest line wasn't
                            if any(x in twoLinesAgoStrip[:7] for x in groupableNames) and all(x not in oneLineAgoStrip[:7] for x in groupableNames):
                                outputList[-2] = twoLinesAgo+oneLineAgoStrip
                                outputList.pop(-1)
            #Add the indent and repeat
            line=' '*leadingSpace+stripLine
            outputList.append(line.rstrip())
    return '\r\n'.join(outputList[:-1])
Here's an example of how it works:
Messy input code:
'''
Some example code
'''
print "Testing "+ ( str( 1234 ) + '3' )*2
b = 7
c = 46
print ( b + c )/3
def myFunction( x ):
    #Just a function
    outputList = []
    for i in range( x ):
        outputList.append( i % 10 )
    return outputList
print myFunction( b )
Basic:
>>>compactCode(input)
print "Testing "+(str(1234)+'3')*2
b=7
c=46
print(b+c)/3
def myFunction(x):
    outputList=[]
    for i in range(x):
        outputList.append(i%10)
    return outputList
print myFunction(b)
With line grouping (the -1 means lines can be any length, otherwise you pick a maximum number):
>>>compactCode(input,-1)
print "Testing "+(str(1234)+'3')*2;b=7;c=46;print(b+c)/3
def myFunction(x):
    outputList=[]
    for i in range(x):outputList.append(i%10)
    return outputList
print myFunction(b)
With reduced indents but no line grouping:
>>>compactCode(input,0,1)
print "Testing "+(str(1234)+'3')*2
b=7
c=46
print(b+c)/3
def myFunction(x):
 outputList=[]
 for i in range(x):
  outputList.append(i%10)
 return outputList
print myFunction(b)
To avoid messing up print statements and stuff, the text will never be edited unless ignoreText is passed as True. Also, passing max as True will automatically set the lines to infinite length and set all the indents to 1.
For a larger-scale example, here's the original code I was using, and here's the reduced version.
The one thing I didn't use in the examples was indentLevel. It's just in case the code has something other than 4 spaces per indent.
1 Answer
First, stick to PEP 8.
- Spacing
- snake_case
- Line lengths (more than 80 is OK, but 149 is too much)
Don't use kwargs for optional arguments.
You do:
# Check that grouping is not disabled, and set to 50 if it is not a number
if group_max_spaces not in (False, None) and type(group_max_spaces) not in (int, float):
    group_max_spaces=50
This is bad in several ways:
- 0 == False, so your code conflates the two in several places
- You ignore type errors
- You treat None and False the same
- group_max_spaces has another meaning when negative
- There is no documentation!
Instead, I suggest a smaller mapping:
- If None, there is no maximum
- Otherwise, the maximum is the integer given
- Default is 0, for no grouping (not a special case, just a low number)
The check given can be discarded.
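As a rough sketch of that mapping (written for Python 3, with a hypothetical helper name fits_on_line that is not part of the original code), the length test then needs only one special case. Counting the joining semicolon towards the length is my own assumption:

```python
def fits_on_line(previous, current, group_max_spaces=0):
    """Return True if `current` may be joined onto `previous` with a ';'.

    Hypothetical helper: None means "no maximum", any integer is the
    maximum joined length, and the default of 0 simply never fits.
    """
    if group_max_spaces is None:
        return True
    # Assumption: the joining ';' counts towards the line length.
    return len(previous) + 1 + len(current) <= group_max_spaces
```

With this shape there is no need to test for False, negative numbers or non-numeric types; a wrong type just raises a TypeError at the comparison.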
Your next check now looks much simpler, currently as
if max:
    group_max_spaces = None
    change_indents = 1
This is unfortunate; this argument overrides the others. I would personally instead create a separate convenience function:
def compact_code_max(input='', *args, **kwargs):
    return compact_code(input, None, 1, *args, **kwargs)
This means there's no worry about conflicting arguments, like passing group_max_spaces=100 and max at the same time.
You have
input=input.replace('"""',"'''").split("'''");input=''.join(input[::2]);
Honestly, this looks like it's been passed through your own function. Split it up (and remove that trailing semicolon)! Add spacing.
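Split up, the same logic might read something like the sketch below. The function name remove_triple_quoted and the parameter name source are made up for illustration; the behaviour is meant to match the original line exactly, flaws included:

```python
def remove_triple_quoted(source):
    # Treat both triple-quote styles alike, as the original line does.
    source = source.replace('"""', "'''")
    pieces = source.split("'''")
    # Even-numbered pieces lie outside the quotes; the odd-numbered
    # pieces (the quoted text) are dropped.
    return ''.join(pieces[::2])
```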
Now, this isn't correct either. Not all triple quoted strings are doc comments. What about
description = """
MyFoo
Usage:
./my_foo.py --help
./my_foo.py eat <food>
./my_foo.py throw (chair|up)
"""
print description
This compresses to
description=
print description
!
Also this breaks for something like
def foo():
    """
    Here is an example:
        foo('''bar
        bash''')
    """
    ...
This compresses to
def foo():
bar
bash
...
Oops!
There is no simple way of removing this safely, although AST introspection helps. I suggest just not doing this. However, your code seems to crash inside such strings, so this is no good either.
As such, you really should be doing this through an AST and proper parsing.
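To give an idea of what the AST route could look like, here is a minimal sketch that removes only real docstrings and leaves every other string alone. It is my own illustration rather than anything from the question: it targets Python 3.9+ (for ast.unparse) instead of the Python 2 used above, and the name strip_docstrings is hypothetical.

```python
import ast

def strip_docstrings(source):
    """Return `source` re-generated without module/class/function docstrings."""
    tree = ast.parse(source)
    docstring_owners = (ast.Module, ast.ClassDef,
                        ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(tree):
        if not isinstance(node, docstring_owners):
            continue
        if ast.get_docstring(node, clean=False) is None:
            continue
        # A docstring is always the first statement of the body; keep the
        # body non-empty so the result is still valid code.
        node.body = node.body[1:] or [ast.Pass()]
    # ast.unparse also drops comments and normalises spacing as a side effect.
    return ast.unparse(tree)
```

Because the parser decides what is a string and what is a docstring, an assigned triple-quoted string survives, and quotes nested inside a docstring cannot confuse it.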
You do
possible_suffixes = list("( :")
It seems simpler to just
possible_suffixes = ["(", " ", ":"]
You only loop over it, though, so just loop over the string:
possible_suffixes = "( :"
I also think the prefix possible_ is redundant.
You have
groupable_names = set(i + j for i in ('if','else','elif','try','except','finally','for','with','while') for j in suffixes)
The keywords here should be on a separate line and you should use a set comprehension:
block_opening_keywords = 'if', 'else', 'elif', 'try', 'except', 'finally', 'for', 'with', 'while'
groupable_names = {i + j for i in block_opening_keywords for j in suffixes}
Personally, I think this makes the comment redundant, since the comment is less explanatory than the name.
You then have
# Conditions which can't be moved up a line
fixed_names = {x: len(x) for x in {i + j for i in ('class', 'def') for j in suffixes} | groupable_names | {'@staticmethod', '@classmethod'}}
Split it up!
# Conditions which can't be moved up a line
fixed_names = {i + j for i in ('class', 'def') for j in suffixes}
fixed_names |= groupable_names
fixed_names |= {'@staticmethod', '@classmethod'}
fixed_names = {x: len(x) for x in fixed_names}
The cost of calling len is low (constant time lookup of a C attribute), so I'd remove that last line and just call len when need be.
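One way this could look without the stored lengths is sketched below, in Python 3. The helper name can_move_up is made up, and it relies on str.startswith accepting a tuple of prefixes:

```python
suffixes = "( :"
block_opening_keywords = ('if', 'else', 'elif', 'try', 'except',
                          'finally', 'for', 'with', 'while')

fixed_names = {k + s for k in ('class', 'def') + block_opening_keywords
               for s in suffixes}
fixed_names |= {'@staticmethod', '@classmethod'}

def can_move_up(strip_line):
    # Only the start of the line is checked, so an "if " that appears
    # later in the line (for example inside a string) is not a match.
    return not strip_line.startswith(tuple(fixed_names))
```

This keeps fixed_names a plain set and checks only the beginning of each line, which is what the stored lengths were being used for.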
You're mixing ' and " somewhat haphazardly; stick to one. You seem to be using ' more, so I'll adjust others to that.
Now, I'm confused about what this actually does. Why separate out @classmethod and @staticmethod, as opposed to all other @decorator calls?
Now, that's enough analysis of the code for now. However, there are some bugs. Here's one.
if False: pass
print(1)
gets converted to
if False:pass;print(1)
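The difference is easy to confirm by running both versions and comparing what they print. This is a small Python 3 check of my own (the helper name run is hypothetical), not something taken from the question:

```python
import contextlib
import io

def run(source):
    """Execute `source` and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()

original = "if False: pass\nprint(1)\n"
compacted = "if False:pass;print(1)\n"

print(repr(run(original)))   # '1\n'
print(repr(run(compacted)))  # ''
```

After the grouping, print(1) has become part of the if body, so it never runs.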
Really I think this strategy of string replacement is too hard to get right. Look at a proper AST transformer like given in the comments.
- Sorry it took a while, but thanks so much for the feedback, it's been really useful reading through lol. I've started trying to document functions and classes now, and breaking up the longer lines (if they break up nicely - if it's a long string I'll leave it). The len part is needed later on to check only the start of the string, since otherwise, if I had print "if", it could still detect it as an if. With mixing ' and ", it was to try to detect which was first, so it wouldn't affect the other one if it occurred in a string. – Peter, Apr 3, 2015 at 15:01
- With the bugs, thanks for pointing them out, I hadn't realised not all triple quoted bits were just comments, and likewise, I hadn't thought of putting ''' inside """ or vice versa. With pass, I think it'd solve that by adding it to fixedNames (I forgot about it to be fair), plus I've only ever seen @classmethod and @staticmethod so the others didn't cross my mind, all I'd need to do is make sure anything starting with @ doesn't get moved :) As to the first point you made, the mixed capitals seems ok with PEP8, and I think they look a bit neater than underscores. – Peter, Apr 3, 2015 at 15:06
- @Peter For the len part, you don't need it. Just call len in the all. In fact, you can just do all(not strip_line.startswith(x) for x in fixed_names). That's the only place you'd use the len. // "With pass, I think it'd solve that by adding it to fixedNames" → That doesn't solve it; it could be any other single statement instead. // "mixed capitals seems ok with PEP8" → PEP 8 says "mixedCase is allowed only in contexts where that's already the prevailing style (e.g. threading.py), to retain backwards compatibility." I'm certain that the convention is snake_case. – Veedrac, Apr 3, 2015 at 15:46
- Ahh nice one, I wasn't aware of .startswith(), that looks an awful lot more efficient to use. As to the mixed capitals, I think me using them comes from doing all the python inside Maya (and I'd missed the bit that said it's only for backwards compatibility). All the Maya commands are things like pm.polyAverageVertex, pm.defaultNavigation etc, and I've only recently branched out into doing non-Maya things with python. What would your suggestion on that be then, bearing in mind I'll usually finish non Maya code like this, but then build a UI in Maya to make it easier to use? – Peter, Apr 6, 2015 at 16:29
- @Peter In situations like that, it's pretty much whatever you think works best. Personally I would use camelCase when extending Maya (eg. superclasses or wrappers of their stuff) and use snake_case elsewhere but I've never seen Maya, never mind used it. – Veedrac, Apr 6, 2015 at 17:02
- minipy shrinks it from 170k to 108k, and with --rename down to 83k. (Also, minipy always removes comments.) I'd be grateful if you could open an issue with all the relevant details (Python version, operating system etc.) so I can figure out what went wrong.
- minify :) The reduce part is really cool though, if you don't mind me asking, how do you detect what is a variable, and which letters are already taken?