\$\begingroup\$

I have the following code:

# tests/examples
cases = [
    ["getMyID", "get_my_id"],
    ["getMyAlphabetABC", "get_my_alphabet_abc"],
    ["getAlphabet", "get_alphabet"],
    ["simple", "simple"],
    ["getALetter", "get_a_letter"],
    ["getBook1", "get_book1"],
    ["simpleButNotSoSimpleBecauseItIsVeryLong", "simple_but_not_so_simple_because_it_is_very_long"]
]

def camel_case_to_underscore(t):
    start = 0
    parts = []
    for idx, c in enumerate(t):
        if c.isupper():
            parts.append(t[start:idx].lower())
            start = idx
    parts.append(t[start:].lower())
    for i in reversed([idx for idx, (i, j) in enumerate(zip(parts, parts[1:])) if len(i) == len(j) == 1]):
        parts[i] = parts[i] + parts.pop(i + 1)
    return "_".join(parts)

for p in cases:
    print(camel_case_to_underscore(p[0]), camel_case_to_underscore(p[0]) == p[1])  # should be True for all

It seems quite clunky, but it works. Is there a way to optimise this without using RegEx? I feel it could be done in a single for loop, but I have had no luck finding such a method.

EDIT: A small improvement (I think it actually performs worse), but it feels closer to what I'm after.

def camel_case_to_underscore(t):
    upper_idxs = [0] + [idx for idx, c in enumerate(t) if c.isupper()] + [len(t) + 1]
    parts = [t[start:end].lower() for start, end in zip(upper_idxs[:-1], upper_idxs[1:])]
    for i in reversed([idx for idx, (i, j) in enumerate(zip(parts, parts[1:])) if len(i) == len(j) == 1]):
        parts[i] = parts[i] + parts.pop(i + 1)
    return "_".join(parts)
Toby Speight
asked Nov 16, 2022 at 23:20
\$\endgroup\$
  • 2
    \$\begingroup\$ Lacking a specification (as well as an example), what's to happen with 'Dog'? \$\endgroup\$ Commented Nov 17, 2022 at 0:47
  • 1
    \$\begingroup\$ (What's the rule for ["getALetter", "get_a_letter"]?) \$\endgroup\$ Commented Nov 17, 2022 at 1:17
  • \$\begingroup\$ (camel_case_to_snake('anSQLquery')) \$\endgroup\$ Commented Nov 17, 2022 at 2:27
  • \$\begingroup\$ Can you explain why you can't use the re library? It's a standard part of Python (one of the "included batteries", if you like). \$\endgroup\$ Commented Nov 17, 2022 at 5:55
  • 1
    \$\begingroup\$ @TobySpeight I am doing this as interview prep and in an interview I would never be able to think up a RegEx. Perhaps the optimal solution is RegEx (although matching all these scenarios will be difficult), it's just not in my use case. \$\endgroup\$ Commented Nov 17, 2022 at 8:14

3 Answers

\$\begingroup\$

It's good that the code is tested. We can improve that by incorporating the tests into the documentation:

def camel_case_to_underscore(t):
    '''Convert the supplied name to snake_case.

    Examples:
    >>> camel_case_to_underscore('getMyID')
    'get_my_id'
    >>> camel_case_to_underscore('getMyAlphabetABC')
    'get_my_alphabet_abc'
    >>> camel_case_to_underscore('getAlphabet')
    'get_alphabet'
    >>> camel_case_to_underscore('simple')
    'simple'
    >>> camel_case_to_underscore('getALetter')
    'get_a_letter'
    >>> camel_case_to_underscore('getBook1')
    'get_book1'
    >>> camel_case_to_underscore('simpleButNotSoSimpleBecauseItIsVeryLong')
    'simple_but_not_so_simple_because_it_is_very_long'
    '''

We can then run them (when the file is executed as main, but not when it is loaded as a module):

if __name__ == '__main__':
    import doctest
    exit(doctest.testmod()[0] > 0)
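If only one function's examples are of interest, the same check can be run on that function alone. The following is a minimal sketch rather than part of the original answer; `shout` and `count_doctest_failures` are hypothetical names introduced purely for illustration, and the sketch uses the standard `doctest.DocTestFinder` and `doctest.DocTestRunner` classes.

```python
import doctest


def shout(word):
    """Return the word in upper case (hypothetical example function).

    >>> shout('abc')
    'ABC'
    """
    return word.upper()


def count_doctest_failures(func):
    """Run the doctests in func's docstring and return how many failed."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = 0
    # Make the function visible under its own name inside the examples.
    globs = {func.__name__: func}
    for test in finder.find(func, name=func.__name__, globs=globs):
        failed += runner.run(test).failed
    return failed
```

Passing `count_doctest_failures(camel_case_to_underscore) > 0` to `sys.exit` would then give a non-zero exit status on failure, in the same way as the `testmod()` call above.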

We should add some more test cases, including PascalCase and words with initialisms at beginning and middle, not just the end:

    >>> camel_case_to_underscore('AccessHTTPServer')
    'access_http_server'
    >>> camel_case_to_underscore('IDForName')
    'id_for_name'

These fail (with a more useful message and a non-zero exit status):

**********************************************************************
File "/home/tms/stackexchange/review/./281284.py", line 20, in __main__.camel_case_to_underscore
Failed example:
    camel_case_to_underscore('AccessHTTPServer')
Expected:
    'access_http_server'
Got:
    '_access_http_server'
**********************************************************************
File "/home/tms/stackexchange/review/./281284.py", line 22, in __main__.camel_case_to_underscore
Failed example:
    camel_case_to_underscore('IDForName')
Expected:
    'id_for_name'
Got:
    '_id_for_name'
**********************************************************************
1 items had failures:
   2 of   9 in __main__.camel_case_to_underscore
***Test Failed*** 2 failures.

That's something that could be improved.

answered Nov 17, 2022 at 6:12
\$\endgroup\$
  • 1
    \$\begingroup\$ Thanks, I will take a look but in fairness, AccessHTTPServer is not camel case if I am correct as that's pascal case, I would assume accessHTTPServer gives the intended result. \$\endgroup\$ Commented Nov 17, 2022 at 8:16
  • \$\begingroup\$ Yes, if PascalCase is explicitly not converted, then the modified test passes. It might be better to do something different if PascalCase input is detected - return it unchanged, or throw an exception, perhaps? \$\endgroup\$ Commented Nov 17, 2022 at 8:36
  • \$\begingroup\$ To be fair, handling PascalCase is pretty easy: don't explicitly add 0 at the start of the list; only add it if it's not already there. But that's beyond the scope of the question, in my opinion. I would rather have a really good solution that works with camelCase before I think about handling PascalCase. \$\endgroup\$ Commented Nov 17, 2022 at 18:31
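
The idea in the last comment could be sketched roughly as follows, adapting the asker's EDIT version. This is only an illustration, assuming that a leading capital should simply start the first word; the function name `to_underscore_allow_pascal` is made up here and does not appear in the thread.

```python
def to_underscore_allow_pascal(t):
    # Indices where a new part may start: every uppercase letter...
    starts = [idx for idx, c in enumerate(t) if c.isupper()]
    # ...plus index 0, but only if it is not already there (PascalCase input).
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    ends = starts[1:] + [len(t)]
    parts = [t[start:end].lower() for start, end in zip(starts, ends)]
    # Merge neighbouring single-letter parts (initialisms), right to left,
    # using the same rule as the question's code.
    for idx, (i, j) in reversed(list(enumerate(zip(parts, parts[1:])))):
        if len(i) == len(j) == 1:
            parts[idx] += parts.pop(idx + 1)
    return "_".join(parts)
```

With this change the two PascalCase examples from the answer no longer get a leading underscore, and the camelCase cases from the question behave as before.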
\$\begingroup\$

Iterating the string once seems a laudable goal.

"The rule about two capital letters followed by a lower case one" needs an annoying amount of state.

def to_snake_case(name):
    """ Convert a name to snake case:

    Assume a capital letter to start a new word
    to be preceded by an underscore unless at start of name
    or inside a run of capital letters.
    If such a run is followed by a lowercase letter, it is again
    the start of a word.
    A "run" of one capital is converted to lower.
    """
    if not name:
        return name
    if (len(name) <= 1):
        return name.lower()
    # to avoid prepending an underscore before a Pascal case name
    result = name[0].lower() if name[1].islower() else name[0]
    previous = name[1]
    # start from name[1] so that a two-character name keeps its second character
    current = previous
    for current in name[2:]:
        if current.islower() and previous.isupper():
            if '_' != result[-1]:
                if len(result) < 2 or result[-2] == '_':  # backpatching?!
                    result = result[:-1] + result[-1].lower()
                result += '_'
            result += previous.lower()
        elif current.isupper() and previous.islower():
            result += previous + '_'
        else:
            result += previous
        previous = current  # alternatives including zip & pairwise
    return result + (current if '_' != result[-1] else current.lower())
answered Nov 20, 2022 at 22:04
\$\endgroup\$
  • \$\begingroup\$ That's a nice solution albeit confusing. I will try and get my head round and do some comparisons to my solution. I think you want to call a .lower() on the whole of the last line to meet the test cases exactly as I described. \$\endgroup\$ Commented Nov 27, 2022 at 13:01
  • \$\begingroup\$ I'm confused about the need for the backpatching line. I get what the line actually does, but I'm not sure when that if statement would be true. Could you clarify for me please? \$\endgroup\$ Commented Nov 27, 2022 at 13:25
  • \$\begingroup\$ The "backpatching" kicks in with multiple capitals in a row (acronym/initialism) followed by (another capital and) a lower case letter \$\endgroup\$ Commented Nov 27, 2022 at 14:32
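
One way to avoid the backpatching is to look one character ahead instead of patching what has already been written. The sketch below is an alternative under stated assumptions, not the answer's method: it lowercases everything (so it matches the question's test cases rather than this answer's output), and it treats a digit before a capital as the end of a word. The name `to_snake_lookahead` is hypothetical.

```python
def to_snake_lookahead(name):
    out = []
    for i, ch in enumerate(name):
        if ch.isupper() and i > 0:
            prev = name[i - 1]
            nxt = name[i + 1] if i + 1 < len(name) else ""
            # Word boundary: lower/digit -> upper, or the last capital of a
            # run when a lowercase letter follows (the "S" in "HTTPServer").
            if prev.islower() or prev.isdigit() or (prev.isupper() and nxt.islower()):
                out.append("_")
        out.append(ch.lower())
    return "".join(out)
```

The string is still iterated only once. Ambiguous inputs stay ambiguous, though: for `'anSQLquery'` this rule splits before the last capital of the run and yields `'an_sq_lquery'`.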
\$\begingroup\$

On top of the other great answer, I'd like to review the extreme list comprehension you've used:

for i in reversed([idx for idx, (i, j) in enumerate(zip(parts, parts[1:])) if len(i) == len(j) == 1]):
    parts[i] = parts[i] + parts.pop(i + 1)

List comprehensions can be a great tool for expressing things more concisely and sometimes more clearly. In this case we are definitely in the "more concise" camp, but I think the result is very hard to understand.

Reorganising things slightly may help:

for idx, (i, j) in reversed(list(enumerate(zip(parts, parts[1:])))):
    if len(i) == len(j) == 1:
        parts[idx] += parts.pop(idx + 1)
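
If mutating the list while walking it backwards still feels fragile, another option is to build a new list instead. This is a sketch of one possible alternative, assuming the intent is "join every run of consecutive single-letter parts"; `merge_single_letters` is a hypothetical helper name, and `itertools.groupby` is from the standard library.

```python
from itertools import groupby


def merge_single_letters(parts):
    """Join each run of consecutive one-character parts into a single part."""
    merged = []
    for is_single, group in groupby(parts, key=lambda p: len(p) == 1):
        if is_single:
            # A run such as "a", "b", "c" becomes "abc".
            merged.append("".join(group))
        else:
            # Longer parts are kept unchanged, one by one.
            merged.extend(group)
    return merged
```

The function would then finish with `return "_".join(merge_single_letters(parts))`, and no index arithmetic is needed.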
answered Dec 21, 2022 at 21:38
\$\endgroup\$
