I have the following code:
# tests/examples
cases = [
    ["getMyID", "get_my_id"],
    ["getMyAlphabetABC", "get_my_alphabet_abc"],
    ["getAlphabet", "get_alphabet"],
    ["simple", "simple"],
    ["getALetter", "get_a_letter"],
    ["getBook1", "get_book1"],
    ["simpleButNotSoSimpleBecauseItIsVeryLong", "simple_but_not_so_simple_because_it_is_very_long"]
]

def camel_case_to_underscore(t):
    start = 0
    parts = []
    for idx, c in enumerate(t):
        if c.isupper():
            parts.append(t[start:idx].lower())
            start = idx
    parts.append(t[start:].lower())
    for i in reversed([idx for idx, (i, j) in enumerate(zip(parts, parts[1:])) if len(i) == len(j) == 1]):
        parts[i] = parts[i] + parts.pop(i + 1)
    return "_".join(parts)

for p in cases:
    print(camel_case_to_underscore(p[0]), camel_case_to_underscore(p[0]) == p[1])  # should be True for all
It seems quite clunky, but it works. Is there a way to optimise this without using regular expressions? I feel it could be done in a single for loop, but I have had no luck finding such a method.
EDIT: A small improvement (I think it actually performs worse), but it feels closer to what I want.
def camel_case_to_underscore(t):
    upper_idxs = [0] + [idx for idx, c in enumerate(t) if c.isupper()] + [len(t) + 1]
    parts = [t[start:end].lower() for start, end in zip(upper_idxs[:-1], upper_idxs[1:])]
    for i in reversed([idx for idx, (i, j) in enumerate(zip(parts, parts[1:])) if len(i) == len(j) == 1]):
        parts[i] = parts[i] + parts.pop(i + 1)
    return "_".join(parts)
3 Answers
It's good that the code is tested. We can improve that by incorporating the tests into the documentation:
def camel_case_to_underscore(t):
    '''Convert the supplied name to snake_case.

    Examples:
    >>> camel_case_to_underscore('getMyID')
    'get_my_id'
    >>> camel_case_to_underscore('getMyAlphabetABC')
    'get_my_alphabet_abc'
    >>> camel_case_to_underscore('getAlphabet')
    'get_alphabet'
    >>> camel_case_to_underscore('simple')
    'simple'
    >>> camel_case_to_underscore('getALetter')
    'get_a_letter'
    >>> camel_case_to_underscore('getBook1')
    'get_book1'
    >>> camel_case_to_underscore('simpleButNotSoSimpleBecauseItIsVeryLong')
    'simple_but_not_so_simple_because_it_is_very_long'
    '''
We can then run them (when the file is executed as the main program, but not when it is loaded as a module):
if __name__ == '__main__':
    import doctest
    exit(doctest.testmod()[0] > 0)
We should add some more test cases, including PascalCase and words with initialisms at beginning and middle, not just the end:
    >>> camel_case_to_underscore('AccessHTTPServer')
    'access_http_server'
    >>> camel_case_to_underscore('IDForName')
    'id_for_name'
These ones fail (with a more useful message, and a non-zero exit status):
**********************************************************************
File "/home/tms/stackexchange/review/./281284.py", line 20, in __main__.camel_case_to_underscore
Failed example:
camel_case_to_underscore('AccessHTTPServer')
Expected:
'access_http_server'
Got:
'_access_http_server'
**********************************************************************
File "/home/tms/stackexchange/review/./281284.py", line 22, in __main__.camel_case_to_underscore
Failed example:
camel_case_to_underscore('IDForName')
Expected:
'id_for_name'
Got:
'_id_for_name'
**********************************************************************
1 items had failures:
2 of 9 in __main__.camel_case_to_underscore
***Test Failed*** 2 failures.
That's something that could be improved.
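One possible fix, offered only as a sketch: skip the split when the capital is the very first character, so that no empty leading part is produced. The function name `camel_or_pascal_to_underscore` is hypothetical, and treating a leading capital as the start of the first word is an assumption about the desired behaviour, not something the question specifies.

```python
def camel_or_pascal_to_underscore(t):
    """Variant of the original loop that tolerates a leading capital.

    Assumption: a capital at index 0 starts the first word, so it must
    not produce an empty part (and hence a leading underscore).
    """
    start = 0
    parts = []
    for idx, c in enumerate(t):
        # Only split on capitals after the first character.
        if c.isupper() and idx > 0:
            parts.append(t[start:idx].lower())
            start = idx
    parts.append(t[start:].lower())
    # Merge runs of single-letter parts (initialisms), as in the original.
    singles = [i for i, (a, b) in enumerate(zip(parts, parts[1:]))
               if len(a) == len(b) == 1]
    for i in reversed(singles):
        parts[i] = parts[i] + parts.pop(i + 1)
    return "_".join(parts)


if __name__ == "__main__":
    for name in ("AccessHTTPServer", "IDForName", "getMyID"):
        print(name, "->", camel_or_pascal_to_underscore(name))
```

The rest of the logic is unchanged, so the original camelCase examples behave as before.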
- Thanks, I will take a look, but in fairness, AccessHTTPServer is not camel case if I am correct, as that's Pascal case; I would assume accessHTTPServer gives the intended result. – TomS, Nov 17, 2022 at 8:16
- Yes, if PascalCase is explicitly not converted, then the modified test passes. It might be better to do something different if PascalCase input is detected: return it unchanged, or throw an exception, perhaps? – Toby Speight, Nov 17, 2022 at 8:36
- To be fair, handling PascalCase is pretty easy: don't explicitly add 0 at the start of the list, only add it if it's not there. But that is beyond the scope of the question, in my opinion. I would rather have a really good version that works with camelCase before I think about handling PascalCase. – TomS, Nov 17, 2022 at 18:31
Iterating the string once seems a laudable goal.
"The rule about two capital letters followed by a lower case one" needs an annoying amount of state.
def to_snake_case(name):
    """ Convert a name to snake case:
    Assume a capital letter to start a new word
    to be preceded by an underscore unless at start of name
    or inside a run of capital letters.
    If such a run is followed by a lowercase letter, it is again
    the start of a word.
    A "run" of one capital is converted to lower.
    """
    if not name:
        return name
    if len(name) <= 1:
        return name.lower()
    # to avoid prepending an underscore before a Pascal case name
    result = name[0].lower() if name[1].islower() else name[0]
    previous = name[1]
    current = ""
    for current in name[2:]:
        if current.islower() and previous.isupper():
            if '_' != result[-1]:
                if len(result) < 2 or result[-2] == '_':  # backpatching?!
                    result = result[:-1] + result[-1].lower()
                result += '_'
            result += previous.lower()
        elif current.isupper() and previous.islower():
            result += previous + '_'
        else:
            result += previous
        previous = current  # alternatives including zip & pairwise
    return result + (current if '_' != result[-1] else current.lower())
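For comparison, here is a sketch of a different single-pass approach that avoids carrying state by looking at the neighbouring characters by index. It is not the code above: the name `to_snake_case_lowered` is hypothetical, and lower-casing every letter (including initialisms) is an assumption taken from the expected outputs in the question.

```python
def to_snake_case_lowered(name):
    """Single pass over the name; output is entirely lower case.

    Assumption: an underscore goes before a capital (other than the first
    character) when the previous character is not a capital, or when the
    next character is a lowercase letter (the capital then ends a run of
    capitals and starts a new word).
    """
    out = []
    for i, c in enumerate(name):
        if c.isupper() and i > 0:
            after_non_capital = not name[i - 1].isupper()
            before_lowercase = i + 1 < len(name) and name[i + 1].islower()
            if after_non_capital or before_lowercase:
                out.append("_")
        out.append(c.lower())
    return "".join(out)


if __name__ == "__main__":
    for name in ("getMyID", "getALetter", "AccessHTTPServer", "IDForName"):
        print(name, "->", to_snake_case_lowered(name))
```

This reproduces the expected outputs for the names listed in the question and in the first answer, but it is not guaranteed to agree with the other versions on every input: for example, it splits a single capital that sits directly before another word differently from the merge-based original.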
- That's a nice solution, albeit a confusing one. I will try to get my head round it and do some comparisons with my solution. I think you want to call .lower() on the whole of the last line to meet the test cases exactly as I described. – TomS, Nov 27, 2022 at 13:01
- I'm confused about the need for the backpatching line. I get what the line actually does, but I'm not sure when that if statement would be true. Could you clarify for me, please? – TomS, Nov 27, 2022 at 13:25
- The "backpatching" kicks in with multiple capitals in a row (acronym/initialism) followed by (another capital and) a lower case letter. – greybeard, Nov 27, 2022 at 14:32
On top of the other great answer, I'd like to review the extreme list comprehension you've used:
for i in reversed([idx for idx, (i, j) in enumerate(zip(parts, parts[1:])) if len(i) == len(j) == 1]):
    parts[i] = parts[i] + parts.pop(i + 1)
List comprehensions can be a great tool to express things in a usually more concise and sometimes clearer way. In our case, we are definitely in the "more concise" part, but I think it is very hard to understand.
Reorganising things slightly may help:
for idx, (i, j) in reversed(list(enumerate(zip(parts, parts[1:])))):
    if len(i) == len(j) == 1:
        parts[idx] += parts.pop(idx + 1)
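Going one step further, the merge could be written as a forward pass that builds a new list instead of popping from the list being iterated. This is only a sketch; the helper name `merge_single_letters` is hypothetical and not part of the original code.

```python
def merge_single_letters(parts):
    """Join each run of adjacent one-character parts into a single part.

    Builds a new list rather than mutating `parts` in place, so there is
    no need to iterate in reverse.
    """
    merged = []
    in_run = False  # True when the previous part was a single character
    for part in parts:
        if in_run and len(part) == 1:
            merged[-1] += part  # extend the current run
        else:
            merged.append(part)
        in_run = len(part) == 1
    return merged


if __name__ == "__main__":
    print(merge_single_letters(["get", "my", "i", "d"]))
    print(merge_single_letters(["access", "h", "t", "t", "p", "server"]))
```

The function would then end with `return "_".join(merge_single_letters(parts))`.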
- […] 'Dog'?
- […] ["getALetter", "get_a_letter"]?)
- […] camel_case_to_snake('anSQLquery'))
- […] re library? It's a standard part of Python (one of the "included batteries", if you like).