How can I make this code faster? the string can contain characters such as ", .?#" and possibly others.
Const Nums = ['0'..'9'];
function CleanNumber(s: String): Int64;
Var z: Cardinal;
begin
for z := length(s) downto 1 do
if not (s[z] in Nums) then Delete(s,z,1);
if s = '' then
Result := 0 else
Result := StrToInt64(s);
end;
2 Answers 2
Practically speaking, this function is probably not the single bottleneck of your application, which would mean it takes more than 30% of the total run time. Therefore you don't need to optimize this code for speed, but for correctness and legibility.
If you want to get the maximum speed out of this little function nevertheless, you should look at the implementation of StrToInt64
and implement the essential from that function yourself. Approximately like this:
function CleanNumber(s: String): Int64;
var
i, num, limit10, limit1: Int64;
begin
limit10 := MaxInt64 div 10;
limit1 := MaxInt64 mod 10;
num := 0;
for i := 1 to Length(s) do
if (s[i] >= '0') and (s[i] <= '9') then begin
if num > limit10 then raise EValue();
if (num = limit10) and (Ord(s[i]) - Ord('0') > limit1) then raise EValue();
num := 10 * num + Ord(s[i]) - Ord('0');
end;
Result := num;
end;
I didn't test the code and I'm not sure about the correct syntax for throwing exceptions, but I expect the basic algorithm to work fine. Some test cases:
''
'0'
'9'
'100'
'1.0.0'
'1234567890123456789'
'9223372036854775807'
'9223372036854775808' // too large
'-9223372036854775808' // also too large
'aaaaaaaaaaaaaa' // => 0
-
\$\begingroup\$ It is a good thinking to get the result without modifying
s
. Although personally I am against reimplementing built-in functions, unless one is buggy or highly inefficient. I am also not sure whether your version is faster, because I believeStrToInt64
is implemented in assembly and your function has quite a few comparisons. \$\endgroup\$W. Chang– W. Chang2019年09月02日 21:48:54 +00:00Commented Sep 2, 2019 at 21:48 -
\$\begingroup\$ Interesting results, 1 million loops with "1234.5678;": 32-bit: Mine = 120-125ms, yours 65-70ms. 64-bit: Mine = 90-95ms, yours... 15-20ms \$\endgroup\$hikari– hikari2019年09月03日 00:25:37 +00:00Commented Sep 3, 2019 at 0:25
-
\$\begingroup\$ Regarding the number of comparisons: the first two comparisons are hopefully optimized by the compiler into a single comparison. The other ones probably don't hurt because branch prediction always predicts them correctly. The two comparisons with
limit10
can also be merged into a singlecmpq
instruction. All in all, I expect the generated machine code to be more efficient than the Delphi source code appears to be. \$\endgroup\$Roland Illig– Roland Illig2019年09月03日 05:27:42 +00:00Commented Sep 3, 2019 at 5:27
Don't over use set
. It is faster when the items are discrete, but slower when the items are continuous. Although, the difference is very small in your case.
The problem here is that you use delete
repeatedly. Each call moves (copies) a potentially large chuck of characters. This is inefficient because many characters are moved multiple times. For example, 5
in string 1 2 3 4 5
is moved 4 times.
Given that string
in Delphi is mutable, it is better to copy individual characters:
function CleanNumber(s: String): Int64;
Var
z, l: Cardinal;
begin
l := 1;
for z := 1 to length(s) do
if (s[z] >= '0') and (s[z] <= '9') then
begin
s[l] := s[z];
inc(l);
end;
SetLength(s, l - 1);
if s = '' then
Result := 0 else
Result := StrToInt64(s);
end;
It can be further improved:
function CleanNumber(s: String): Int64;
Var
z, l: Cardinal;
begin
l := 1;
while (l <= length(s)) and (s[l] >= '0') and (s[l] <= '9') do
inc(l); // Scan for the first non-numeric char
for z := l + 1 to length(s) do // Start from l + 1
if (s[z] >= '0') and (s[z] <= '9') then
begin
s[l] := s[z];
inc(l);
end;
if l = 1 then Result := 0 else
begin
SetLength(s, l - 1);
Result := StrToInt64(s);
end;
end;
Note: code not tested.
-
1\$\begingroup\$ Why did you choose
z
andl
for the variable names? Especially the assignmentl := 1
is hard to read. \$\endgroup\$Roland Illig– Roland Illig2019年09月02日 20:07:41 +00:00Commented Sep 2, 2019 at 20:07 -
\$\begingroup\$ @Roland Illig: I used
z
because that's what hikari used. I usually usei
andj
for loop variables. I used single charl
instead of my usuallen
because it is more consistent to his/her style. I seldom get confused byl
and1
because they are shown in different colors due to the use of theme. I guess I should. \$\endgroup\$W. Chang– W. Chang2019年09月02日 21:34:06 +00:00Commented Sep 2, 2019 at 21:34