Convert camelCase function name to snake_case

Question 1

I have the following code:

# tests/examples
cases = [
 ["getMyID", "get_my_id"],
 ["getMyAlphabetABC", "get_my_alphabet_abc"],
 ["getAlphabet", "get_alphabet"],
 ["simple", "simple"],
 ["getALetter", "get_a_letter"],
 ["getBook1", "get_book1"],
 ["simpleButNotSoSimpleBecauseItIsVeryLong", "simple_but_not_so_simple_because_it_is_very_long"]
 ]
def camel_case_to_underscore(t):
 start = 0
 parts = []
 for idx, c in enumerate(t):
 if c.isupper():
 parts.append(t[start:idx].lower())
 start = idx
 parts.append(t[start:].lower())
 for i in reversed([idx for idx, (i, j) in enumerate(zip(parts, parts[1:])) if len(i) == len(j) == 1]):
 parts[i] = parts[i] + parts.pop(i + 1)
 return "_".join(parts)
for p in cases:
 print(camel_case_to_underscore(p[0]), camel_case_to_underscore(p[0]) == p[1]) # should be True for all

It seems quite clunky, but works. Is there a way that this can be optimised without using RegEx. I feel like it can be done in only one for loop but I have had zero luck finding this method.

EDIT Small improvement (I think it's actually worse performer) but it feels closer to me.

def camel_case_to_underscore(t):
 upper_idxs = [0] + [idx for idx, c in enumerate(t) if c.isupper()] + [len(t) + 1]
 parts = [t[start:end].lower() for start, end in zip(upper_idxs[:-1], upper_idxs[1:])]
 for i in reversed([idx for idx, (i, j) in enumerate(zip(parts, parts[1:])) if len(i) == len(j) == 1]):
 parts[i] = parts[i] + parts.pop(i + 1)
 return "_".join(parts)

Question 2

Lacking a specification (as well as an example), what's to happen with 'Dog'?

Question 3

(What's the rule for ["getALetter", "get_a_letter"]?)

Question 4

(camel_case_to_snake('anSQLquery'))

Question 5

Can you explain why you can't use the re library? It's a standard part of Python (one of the "included batteries", if you like).

Question 6

@TobySpeight I am doing this as interview prep and in an interview I would never be able to think up a RegEx. Perhaps the optimal solution is RegEx (although matching all these scenarios will be difficult), it's just not in my use case.

Question 7

It's good that the code is tested. We can improve that by incorporating the tests into the documentation:

def camel_case_to_underscore(t):
 '''Convert the supplied name to snake_case.
 Examples:
 >>> camel_case_to_underscore('getMyID')
 'get_my_id'
 >>> camel_case_to_underscore('getMyAlphabetABC')
 'get_my_alphabet_abc'
 >>> camel_case_to_underscore('getAlphabet')
 'get_alphabet'
 >>> camel_case_to_underscore('simple')
 'simple'
 >>> camel_case_to_underscore('getALetter')
 'get_a_letter'
 >>> camel_case_to_underscore('getBook1')
 'get_book1'
 >>> camel_case_to_underscore('simpleButNotSoSimpleBecauseItIsVeryLong')
 'simple_but_not_so_simple_because_it_is_very_long'
 '''

We can then run them (when file is executed as main, but not when loaded as a module):

if __name__ == '__main__':
 import doctest
 exit(doctest.testmod()[0] > 0)

We should add some more test cases, including PascalCase and words with initialisms at beginning and middle, not just the end:

 >>> camel_case_to_underscore('AccessHTTPServer')
 'access_http_server'
 >>> camel_case_to_underscore('IDForName')
 'id_for_name'

These ones fail (with more useful message, and non-zero exit status):

**********************************************************************
File "/home/tms/stackexchange/review/./281284.py", line 20, in __main__.camel_case_to_underscore
Failed example:
 camel_case_to_underscore('AccessHTTPServer')
Expected:
 'access_http_server'
Got:
 '_access_http_server'
**********************************************************************
File "/home/tms/stackexchange/review/./281284.py", line 22, in __main__.camel_case_to_underscore
Failed example:
 camel_case_to_underscore('IDForName')
Expected:
 'id_for_name'
Got:
 '_id_for_name'
**********************************************************************
1 items had failures:
 2 of 9 in __main__.camel_case_to_underscore
***Test Failed*** 2 failures.

That's something that could be improved.

Question 8

Thanks, I will take a look but in fairness, AccessHTTPServer is not camel case if I am correct as that's pascal case, I would assume accessHTTPServer gives the intended result.

Question 9

Yes, if PascalCase is explicitly not converted, then the modified test passes. It might be better to do something different if PascalCase input is detected - return it unchanged, or throw an exception, perhaps?

Question 10

To be fair, making it PascalCase is pretty easy, don't explicity add 0 at the start of the list, only add it if it's not there, but beyond the scope of the question imo. I would rather a really good one that works with camelCase before I think about handling PascalCase.

Question 11

Iterating the string once seems a laudable goal.

"The rule about two capital letters followed by a lower case one" needs an annoying amount of state.

def to_snake_case(name):
 """ Convert a name to snake case:
 
 Assume a capital letter to start a new word
 to be preceded by an underscore unless at start of name
 or inside a run of capital letters. 
 If such a run is followed by a lowercase letter, it is again
 the start of a word.
 A "run" of one capital is converted to lower.
 """ 
 if not name:
 return name
 if (len(name) <= 1):
 return name.lower()
 # to avoid prepending an underscore before a Pascal case name
 result = name[0].lower() if name[1].islower() else name[0]
 previous = name[1]
 current = ""
 for current in name[2:]:
 if current.islower() and previous.isupper():
 if '_' != result[-1]:
 if len(result) < 2 or result[-2] == '_': # backpatching?!
 result = result[:-1] + result[-1].lower()
 result += '_'
 result += previous.lower()
 elif current.isupper() and previous.islower():
 result += previous + '_'
 else:
 result += previous
 previous = current # alternatives including zip & pairwise
 return result + (current if '_' != result[-1] else current.lower())

Question 12

That's a nice solution albeit confusing. I will try and get my head round and do some comparisons to my solution. I think you want to call a .lower() on the whole of the last line to meet the test cases exactly as I described.

Question 13

I'm confused for the need of the backpatching line. I get what the line actually does but I'm not sure when that if statement would be true? Could you clarify for me please?

Question 14

The "backpatching" kicks in with multiple capitals in a row (acronym/initialism) followed by (another capital and) a lower case letter

Question 15

On top of the other great answer, I'd like to review the extreme list comprehension you've used:

for i in reversed([idx for idx, (i, j) in enumerate(zip(parts, parts[1:])) if len(i) == len(j) == 1]):
 parts[i] = parts[i] + parts.pop(i + 1)

List comprehensions can be a great tool to express things in a usually more concise and sometimes clearer way. In our case, we are definitly in the "more concise part" but I think it is very hard to understand.

Reorganising things slightly may help:

for idx, (i, j) in reversed(list(enumerate(zip(parts, parts[1:])))):
 if len(i) == len(j) == 1:
 parts[idx] += parts.pop(idx + 1)

Toby Speight Toby Speight 87.9k14 gold badges104 silver badges325 bronze badges · Answer 1 · 2022-11-17 06:12:17Z

It's good that the code is tested. We can improve that by incorporating the tests into the documentation:

def camel_case_to_underscore(t):
 '''Convert the supplied name to snake_case.
 Examples:
 >>> camel_case_to_underscore('getMyID')
 'get_my_id'
 >>> camel_case_to_underscore('getMyAlphabetABC')
 'get_my_alphabet_abc'
 >>> camel_case_to_underscore('getAlphabet')
 'get_alphabet'
 >>> camel_case_to_underscore('simple')
 'simple'
 >>> camel_case_to_underscore('getALetter')
 'get_a_letter'
 >>> camel_case_to_underscore('getBook1')
 'get_book1'
 >>> camel_case_to_underscore('simpleButNotSoSimpleBecauseItIsVeryLong')
 'simple_but_not_so_simple_because_it_is_very_long'
 '''

We can then run them (when file is executed as main, but not when loaded as a module):

if __name__ == '__main__':
 import doctest
 exit(doctest.testmod()[0] > 0)

We should add some more test cases, including PascalCase and words with initialisms at beginning and middle, not just the end:

 >>> camel_case_to_underscore('AccessHTTPServer')
 'access_http_server'
 >>> camel_case_to_underscore('IDForName')
 'id_for_name'

These ones fail (with more useful message, and non-zero exit status):

**********************************************************************
File "/home/tms/stackexchange/review/./281284.py", line 20, in __main__.camel_case_to_underscore
Failed example:
 camel_case_to_underscore('AccessHTTPServer')
Expected:
 'access_http_server'
Got:
 '_access_http_server'
**********************************************************************
File "/home/tms/stackexchange/review/./281284.py", line 22, in __main__.camel_case_to_underscore
Failed example:
 camel_case_to_underscore('IDForName')
Expected:
 'id_for_name'
Got:
 '_id_for_name'
**********************************************************************
1 items had failures:
 2 of 9 in __main__.camel_case_to_underscore
***Test Failed*** 2 failures.

That's something that could be improved.

Thanks, I will take a look but in fairness, AccessHTTPServer is not camel case if I am correct as that's pascal case, I would assume accessHTTPServer gives the intended result.
Yes, if PascalCase is explicitly not converted, then the modified test passes. It might be better to do something different if PascalCase input is detected - return it unchanged, or throw an exception, perhaps?
To be fair, making it PascalCase is pretty easy, don't explicity add 0 at the start of the list, only add it if it's not there, but beyond the scope of the question imo. I would rather a really good one that works with camelCase before I think about handling PascalCase.

greybeard greybeard 7,4313 gold badges21 silver badges55 bronze badges · Answer 2 · 2022-11-20 22:04:08Z

Iterating the string once seems a laudable goal.

"The rule about two capital letters followed by a lower case one" needs an annoying amount of state.

def to_snake_case(name):
 """ Convert a name to snake case:
 
 Assume a capital letter to start a new word
 to be preceded by an underscore unless at start of name
 or inside a run of capital letters. 
 If such a run is followed by a lowercase letter, it is again
 the start of a word.
 A "run" of one capital is converted to lower.
 """ 
 if not name:
 return name
 if (len(name) <= 1):
 return name.lower()
 # to avoid prepending an underscore before a Pascal case name
 result = name[0].lower() if name[1].islower() else name[0]
 previous = name[1]
 current = ""
 for current in name[2:]:
 if current.islower() and previous.isupper():
 if '_' != result[-1]:
 if len(result) < 2 or result[-2] == '_': # backpatching?!
 result = result[:-1] + result[-1].lower()
 result += '_'
 result += previous.lower()
 elif current.isupper() and previous.islower():
 result += previous + '_'
 else:
 result += previous
 previous = current # alternatives including zip & pairwise
 return result + (current if '_' != result[-1] else current.lower())

That's a nice solution albeit confusing. I will try and get my head round and do some comparisons to my solution. I think you want to call a .lower() on the whole of the last line to meet the test cases exactly as I described.
I'm confused for the need of the backpatching line. I get what the line actually does but I'm not sure when that if statement would be true? Could you clarify for me please?
The "backpatching" kicks in with multiple capitals in a row (acronym/initialism) followed by (another capital and) a lower case letter

SylvainD SylvainD 29.8k1 gold badge49 silver badges93 bronze badges · Answer 3 · 2022-12-21 21:38:12Z

On top of the other great answer, I'd like to review the extreme list comprehension you've used:

for i in reversed([idx for idx, (i, j) in enumerate(zip(parts, parts[1:])) if len(i) == len(j) == 1]):
 parts[i] = parts[i] + parts.pop(i + 1)

List comprehensions can be a great tool to express things in a usually more concise and sometimes clearer way. In our case, we are definitly in the "more concise part" but I think it is very hard to understand.

Reorganising things slightly may help:

for idx, (i, j) in reversed(list(enumerate(zip(parts, parts[1:])))):
 if len(i) == len(j) == 1:
 parts[idx] += parts.pop(idx + 1)

Stack Exchange Network

Convert camelCase function name to snake_case

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Convert camelCase function name to snake_case

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions