\$\begingroup\$

I've just finished a small function that reduces the size of code as much as possible without breaking anything. Obviously it makes everything a bit unreadable, so it isn't meant for cleaning up code; it's more for when you want to send something but make it awkward for people to edit.

Basically, on the highest level it'll attempt to remove as many spaces and group as many lines together as possible without causing syntax errors with the output code. If there's anything I missed let me know.

I also tried to make it easy to change all the parts if needed, so for example you're not just limited to " and ' to define what text is (you may want some other values to not get edited) and so on.

Anyway, I tested it on one bit of code I had that was 170,000 characters and 3500 lines. It reduced the number of characters down to 100,000 and knocked it down to 1200 lines.

import operator
def compactCode(input='',groupMaxSpaces=None,changeIndents=4,indentLevel=4,**kwargs):
    #Check that grouping is not disabled, and set to 50 if it is not a number
    if groupMaxSpaces not in (False, None) and type(groupMaxSpaces) not in (int, float):
        groupMaxSpaces=50
    #Auto set variables to the best efficiency if 'max' is given
    try:
        maxEfficiency=kwargs["max"]
    except:
        pass
    else:
        if maxEfficiency:
            groupMaxSpaces=-1
            changeIndents=1
    #If text should also be affected
    ignoreText = False
    try:
        ignoreText=kwargs["ignoreText"]
    except:
        pass
    #Remove all triple quoted comments
    input=input.replace('"""',"'''").split("'''");input=''.join(input[::2]);
    possibleSuffixes=list("( :")
    #Conditions that may have their contents on the same line
    groupableNames=set(i+j for i in ('if','else','elif','try','except','finally','for','with','while') for j in possibleSuffixes)
    #Conditions which can't be moved up a line
    fixedNames={x:len(x) for x in set(i+j for i in ('class','def') for j in possibleSuffixes)|groupableNames|{'@staticmethod','@classmethod'}}
    input = input.replace('\\','\\\\').replace('\r\n','\\r\\n')
    removeSpace=list('+-*/=!<>%,.()[]{}:') #These items will have all spaces next to them removed
    inLineTextMarker=";txt.{};"
    textSymbols=["'",'"'] #Add to this to preserve text if text is defined by anything other than quotation marks and speech marks
    if ignoreText:
        removeSpace+=textSymbols
        textSymbols=[]
    indentMultiplier=float(changeIndents)/indentLevel
    outputList=[]
    for line in str(input).split('\n')+[';endoflist;']:
        #Remove comments
        line=line.split("#")[0]
        #Replace text as to avoid it being affected
        textStorage={}
        lastSymbolFail=None
        #Loop until all text is replaced
        while True:
            #Find the first symbol
            symbolOccurrances={}
            for symbol in textSymbols:
                placeOfOccurrance = line.find(symbol)
                #Only add to dictionary if there is more than one symbol
                if placeOfOccurrance >= 0 and line.count(symbol)>1:
                    symbolOccurrances[symbol]=placeOfOccurrance
            #Get the first occurance, or break loop if there is none
            try:
                symbol=sorted(symbolOccurrances.items(),key=operator.itemgetter(1))[0][0]
            except:
                break
            textStorage[symbol]=[]
            #Replace the text so it won't be cut down later
            while symbol in line:
                splitByText=line.split(symbol,1)
                line=splitByText[0]+inLineTextMarker
                if symbol in splitByText[1]:
                    textSplit=splitByText[1].split(symbol,1)
                    line+=textSplit[1]
                    textStorage[symbol].append(textSplit[0])
                else:
                    line+=splitByText[1]
                    break
            line=line.replace(inLineTextMarker,inLineTextMarker.format(ord(symbol)))
        #Remove double spaces
        stripLine=line.lstrip(' ')
        leadingSpace=int((len(line)-len(stripLine))*indentMultiplier)
        while '  ' in stripLine:
            stripLine=stripLine.replace('  ',' ')
        if stripLine:
            #Remove unnecessary spaces
            for i in removeSpace:
                stripLine=stripLine.replace(' '+i,i).replace(i+' ',i)
            #Replace the text markers with the actual text again
            while True:
                resultsExist={symbol:True for symbol in textSymbols}
                for symbol in textSymbols:
                    currentTextMarker=inLineTextMarker.format(ord(symbol))
                    while currentTextMarker in stripLine:
                        stripLine=stripLine.replace(currentTextMarker,symbol+textStorage[symbol].pop(0)+symbol,1)
                        if currentTextMarker not in stripLine:
                            resultsExist[symbol]=False
                if not any(x in stripLine for x in (inLineTextMarker.format(ord(symbol)) for symbol in textSymbols)):
                    break
            #Group together lines
            if groupMaxSpaces:
                lastLine=None
                try:
                    lastLine = outputList[-1]
                except:
                    pass
                if lastLine and stripLine!=';endoflist;':
                    lastLineLength = len(lastLine)
                    lastLineStripped = lastLine.lstrip()
                    lastLineStrippedLength = len(lastLineStripped)
                    lastIndent = lastLineLength-lastLineStrippedLength
                    lastLength = lastLineStrippedLength
                    #Make sure the last space is of the same indent, and doesn't mark the start of a loop
                    if leadingSpace == lastIndent:
                        if lastLineStrippedLength+len(stripLine)<groupMaxSpaces or groupMaxSpaces<0:
                            if all(x not in stripLine[:y] for x, y in fixedNames.iteritems()):
                                stripLine=lastLineStripped+';'+stripLine
                                outputList.pop(-1)
                #Group to the conditional statements
                oneLineAgo,twoLinesAgo=None,None
                try:
                    twoLinesAgo,oneLineAgo=outputList[-2:]
                except:
                    pass
                if oneLineAgo and twoLinesAgo:
                    oneLineAgoStrip=oneLineAgo.lstrip()
                    twoLinesAgoStrip=twoLinesAgo.lstrip()
                    oneLineAgoIndentLevel = len(oneLineAgo)-len(oneLineAgoStrip)
                    #Check the current indent is less than the last line, and the last line indent is greater than the 2nd last line
                    if leadingSpace<oneLineAgoIndentLevel:
                        if int(oneLineAgoIndentLevel-indentLevel*indentMultiplier)==len(twoLinesAgo)-len(twoLinesAgoStrip):
                            #Make sure 2 lines ago was a statement, but the latest line wasn't
                            if any(x in twoLinesAgoStrip[:7] for x in groupableNames) and all(x not in oneLineAgoStrip[:7] for x in groupableNames):
                                outputList[-2] = twoLinesAgo+oneLineAgoStrip
                                outputList.pop(-1)
            #Add the indent and repeat
            line=' '*leadingSpace+stripLine
            outputList.append(line.rstrip())
    return '\r\n'.join(outputList[:-1])

Here's an example of how it works:

Messy input code:

'''
Some example code
'''
print "Testing "+ ( str( 1234 ) + '3' )*2
b = 7
c = 46
print ( b + c )/3
def myFunction( x ):
    #Just a function
    outputList = []
    for i in range( x ):
        outputList.append( i % 10 )
    return outputList
print myFunction( b )

Basic:

>>>compactCode(input)
print "Testing "+(str(1234)+'3')*2
b=7
c=46
print(b+c)/3
def myFunction(x):
    outputList=[]
    for i in range(x):
        outputList.append(i%10)
    return outputList
print myFunction(b)

With line grouping (the -1 means lines can be any length, otherwise you pick a maximum number):

>>>compactCode(input,-1)
print "Testing "+(str(1234)+'3')*2;b=7;c=46;print(b+c)/3
def myFunction(x):
    outputList=[]
    for i in range(x):outputList.append(i%10)
    return outputList
print myFunction(b)

With reduced indents but no line grouping:

>>>compactCode(input,0,1)
print "Testing "+(str(1234)+'3')*2
b=7
c=46
print(b+c)/3
def myFunction(x):
 outputList=[]
 for i in range(x):
  outputList.append(i%10)
 return outputList
print myFunction(b)

To avoid messing up print statements and stuff, the text will never be edited unless ignoreText is passed as True. Also passing max as True will automatically set the lines to infinite length and set all the indents to 1.

For a larger-scale example, using the code I tested it on: here's the original, and here's the reduced version.

The one thing I didn't use in the examples was indentLevel. It's just in case the code has something other than 4 spaces per indent.

Jamal
asked Mar 26, 2015 at 22:27
\$\endgroup\$
  • See github.com/gareth-rees/minipy (Commented Mar 26, 2015 at 22:32)
  • Ah nice, I guessed there might have been something out there but I never managed to find anything (Commented Mar 26, 2015 at 22:36)
  • For the record, I just tried it out. Without renaming variables it was a tiny bit longer (since I removed the comments and that doesn't), however when I tried to let it rename variables, it shot up to 650k characters from the original 170k :p (Commented Mar 26, 2015 at 23:03)
  • What was your test case? Trying pastebin.com/VaWTSZH3, I find that minipy shrinks it from 170k to 108k, and with --rename down to 83k. (Also, minipy always removes comments.) I'd be grateful if you could open an issue with all the relevant details (Python version, operating system etc.) so I can figure out what went wrong. (Commented Mar 26, 2015 at 23:37)
  • Ah nice, didn't realise you made it, and it was just a mistake on my part. I'd passed rename slightly wrong in minify :) The reduce part is really cool though, if you don't mind me asking, how do you detect what is a variable, and which letters are already taken? (Commented Mar 26, 2015 at 23:56)

1 Answer

\$\begingroup\$

First, stick to PEP 8.

  • Spacing
  • snake_case
  • Line lengths (more than 80 is OK, but 149 is too much)

Don't use kwargs for optional arguments.

You do:

# Check that grouping is not disabled, and set to 50 if it is not a number
if group_max_spaces not in (False, None) and type(group_max_spaces) not in (int, float):
    group_max_spaces=50

This is bad in several ways:

  • 0 == False, so your code conflates the two in several places
  • You ignore type errors
  • You treat None and False the same
  • group_max_spaces has another meaning when negative
  • There is no documentation!

Instead, I suggest a smaller mapping:

  • If None, there is no maximum
  • Otherwise, the maximum is the integer given
  • Default is 0, for no grouping (not a special case, just a low number)

The check given can be discarded.
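
As a rough illustration of that mapping, here is a minimal sketch in Python 3. The helper name fits_within_limit is made up for this example and is not part of the original code; it only shows how None and a plain integer could be handled without special-casing False or negative numbers.

```python
def fits_within_limit(combined_length, group_max_spaces=0):
    """Decide whether two lines may be grouped under the suggested mapping.

    Hypothetical helper: None means "no maximum", any integer is the maximum,
    and the default of 0 simply never allows grouping.
    """
    if group_max_spaces is None:
        return True
    return combined_length < group_max_spaces


print(fits_within_limit(500, None))  # no maximum
print(fits_within_limit(10, 50))     # under the limit
print(fits_within_limit(10))         # default 0: nothing is ever short enough
```

With this shape the default needs no dedicated branch: a limit of 0 is just a number that no line length is below.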

With that mapping, your next check becomes much simpler:

if max:
    group_max_spaces = None
    change_indents = 1

This is unfortunate; this argument overrides the others. I would personally instead create a separate convenience function:

def compact_code_max(input='', *args, **kwargs):
    return compact_code(input, None, 1, *args, **kwargs)

This means there's no worry about conflicting arguments, like passing group_max_spaces=100 and max at the same time.

You have

input=input.replace('"""',"'''").split("'''");input=''.join(input[::2]);

Honestly, this looks like it's been passed through your own function. Split it up (and remove that trailing semicolon)! Add spacing.
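
Split up, the same statement might read something like the sketch below. The function name strip_triple_quoted is an assumption made for this example, and the behaviour is deliberately unchanged from the one-liner, so it is no more correct than the original.

```python
def strip_triple_quoted(source):
    """Drop everything between pairs of triple quotes (same logic as the one-liner)."""
    # Treat both triple-quote styles alike, as the original statement does.
    source = source.replace('"""', "'''")
    parts = source.split("'''")
    # Even-indexed parts lie outside the quotes; odd-indexed parts are dropped.
    return ''.join(parts[::2])


print(repr(strip_triple_quoted("'''Some example code'''\nb = 7\n")))
```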

Now, this isn't correct either. Not all triple quoted strings are doc comments. What about

description = """
MyFoo
Usage:
 ./my_foo.py --help
 ./my_foo.py eat <food>
 ./my_foo.py throw (chair|up)
"""
print description

This compresses to

description=
print description

!

Also this breaks for something like

def foo():
    """
    Here is an example:
    foo('''bar
    bash''')
    """
    ...

This compresses to

def foo():
    bar
    bash
    ...

Oops!

There is no simple way of removing this safely, although AST introspection helps. I suggest just not doing this. However, your code seems to crash inside such strings, so this is no good either.

As such, you really should be doing this through an AST and proper parsing.
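
To give an idea of what the parser can tell you, here is a minimal sketch using the standard ast module. It is written for Python 3 (the question's code is Python 2), and find_docstrings is a name invented for this example. It lists only the strings that really are docstrings, so an ordinary triple-quoted string assigned to a variable is left alone.

```python
import ast


def find_docstrings(source):
    """Return (owner name, docstring) pairs for every real docstring in source."""
    owners = (ast.Module, ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, owners):
            doc = ast.get_docstring(node, clean=False)
            if doc is not None:
                # Modules have no name attribute, so fall back to a label.
                found.append((getattr(node, 'name', '<module>'), doc))
    return found


sample = "\n".join([
    'description = """',
    'Usage: ./my_foo.py --help',
    '"""',
    'def foo():',
    '    """Example: foo(\'\'\'bar\'\'\')"""',
    '    return description',
])

# Only the docstring of foo is reported; the description string is not.
print(find_docstrings(sample))
```

Actually removing the docstrings would still mean rewriting the tree or the source text, which this sketch does not attempt.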

You do

possible_suffixes = list("( :")

It seems simpler to just

possible_suffixes = ["(", " ", ":"]

You only loop over it, though, so just loop over the string:

possible_suffixes = "( :"

I also think the prefix possible_ is redundant.

You have

groupable_names = set(i + j for i in ('if','else','elif','try','except','finally','for','with','while') for j in suffixes)

The keywords here should be on a separate line and you should use a set comprehension:

block_opening_keywords = 'if', 'else', 'elif', 'try', 'except', 'finally', 'for', 'with', 'while'
groupable_names = {i + j for i in block_opening_keywords for j in suffixes}

Personally, I think this makes the comment redundant, since the name is more explanatory.

You then have

# Conditions which can't be moved up a line
fixed_names = {x: len(x) for x in {i + j for i in ('class', 'def') for j in suffixes} | groupable_names | {'@staticmethod', '@classmethod'}}

Split it up!

# Conditions which can't be moved up a line
fixed_names = {i + j for i in ('class', 'def') for j in suffixes}
fixed_names |= groupable_names
fixed_names |= {'@staticmethod', '@classmethod'}
fixed_names = {x: len(x) for x in fixed_names}

The cost of calling len is low (constant time lookup of a C attribute), so I'd remove that last line and just call len when need be.
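
If the lengths are only there to test how a line starts, a prefix check makes them unnecessary altogether. The sketch below assumes that is the only use; can_join_previous is a hypothetical name, and the keyword list is copied from the question.

```python
suffixes = '( :'
block_opening_keywords = ('if', 'else', 'elif', 'try', 'except',
                          'finally', 'for', 'with', 'while')

# Same set as before, but without storing any lengths.
fixed_names = {k + s for k in block_opening_keywords + ('class', 'def')
               for s in suffixes}
fixed_names |= {'@staticmethod', '@classmethod'}


def can_join_previous(strip_line):
    """True if the stripped line does not start with any fixed name."""
    # str.startswith accepts a tuple of prefixes, so no slicing is needed.
    return not strip_line.startswith(tuple(fixed_names))


print(can_join_previous('b=7'))
print(can_join_previous('if x:'))
print(can_join_previous('print "if"'))
```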

You're mixing ' and " somewhat haphazardly; stick to one. You seem to be using ' more, so I'll adjust others to that.

Now, I'm confused about what this actually does. Why separate out @classmethod and @staticmethod, as opposed to all other @decorator calls?


Now, that's enough analysis of the code for now. However, there are some bugs. Here's one.

if False: pass
print(1)

gets converted to

if False:pass;print(1)

Really, I think this strategy of string replacement is too hard to get right. Look at a proper AST transformer like the one linked in the comments.
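
Even if you keep the string approach, the parser can at least catch this kind of breakage. The following is a small sketch in Python 3 (same_program is a name made up here) that compares the syntax trees of the original and the compacted source. It assumes both inputs are valid for the running interpreter; ast.parse raises SyntaxError otherwise.

```python
import ast


def same_program(original, compacted):
    """True if both sources parse to the same syntax tree."""
    # ast.dump omits line and column numbers by default, so layout
    # differences alone do not count as changes.
    return ast.dump(ast.parse(original)) == ast.dump(ast.parse(compacted))


original = "if False: pass\nprint(1)\n"
grouped = "if False:pass;print(1)\n"   # print(1) has moved into the if body
spaced = "if False:pass\nprint(1)\n"   # only whitespace was removed

print(same_program(original, grouped))
print(same_program(original, spaced))
```

Running such a check over the output of compactCode for a handful of test files would flag the bug above without reading the result by hand.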

answered Mar 28, 2015 at 13:15
\$\endgroup\$
  • Sorry it took a while, but thanks so much for the feedback, it's been really useful reading through lol. I've started trying to document functions and classes now, and breaking up the longer lines (if they break up nicely - if it's a long string I'll leave it). The len part is needed later on to check only the start of the string, since otherwise, if I had print "if", it could still detect it as an if. With mixing ' and ", it was to try to detect which was first, so it wouldn't affect the other one if it occurred in a string. (Commented Apr 3, 2015 at 15:01)
  • With the bugs, thanks for pointing them out, I hadn't realised not all triple quoted bits were just comments, and likewise, I hadn't thought of putting ''' inside """ or vice versa. With pass, I think it'd solve that by adding it to fixedNames (I forgot about it to be fair), plus I've only ever seen @classmethod and @staticmethod so the others didn't cross my mind, all I'd need to do is make sure anything starting with @ doesn't get moved :) As to the first point you made, the mixed capitals seems ok with PEP8, and I think they look a bit neater than underscores (Commented Apr 3, 2015 at 15:06)
  • @Peter For the len part, you don't need it. Just call len in the all. In fact, you can just do all(not strip_line.startswith(x) for x in fixed_names). That's the only place you'd use the len. // "With pass, I think it'd solve that by adding it to fixedNames" → That doesn't solve it; it could be any other single statement instead. // "mixed capitals seems ok with PEP8" → PEP 8 says "mixedCase is allowed only in contexts where that's already the prevailing style (e.g. threading.py), to retain backwards compatibility." I'm certain that the convention is snake_case. (Commented Apr 3, 2015 at 15:46)
  • Ahh nice one, I wasn't aware of .startswith(), that looks an awful lot more efficient to use. As to the mixed capitals, I think me using them comes from doing all the python inside Maya (and I'd missed the bit that said it's only for backwards compatibility). All the Maya commands are things like, pm.polyAverageVertex, pm.defaultNavigation etc, and I've only recently branched out into doing non-Maya things with python. What would your suggestion on that be then, bearing in mind I'll usually finish non Maya code like this, but then build a UI in Maya to make it easier to use? (Commented Apr 6, 2015 at 16:29)
  • @Peter In situations like that, it's pretty much whatever you think works best. Personally I would use camelCase when extending Maya (eg. superclasses or wrappers of their stuff) and use snake_case elsewhere but I've never seen Maya, never mind used it. (Commented Apr 6, 2015 at 17:02)
