2
\$\begingroup\$

How can I make this code faster? the string can contain characters such as ", .?#" and possibly others.

Const Nums = ['0'..'9'];
function CleanNumber(s: String): Int64;
Var z: Cardinal;
begin
 for z := length(s) downto 1 do
 if not (s[z] in Nums) then Delete(s,z,1);
 if s = '' then
 Result := 0 else
 Result := StrToInt64(s);
end;
dfhwze
14.1k3 gold badges40 silver badges101 bronze badges
asked May 19, 2019 at 18:42
\$\endgroup\$

2 Answers 2

3
\$\begingroup\$

Practically speaking, this function is probably not the single bottleneck of your application, which would mean it takes more than 30% of the total run time. Therefore you don't need to optimize this code for speed, but for correctness and legibility.

If you want to get the maximum speed out of this little function nevertheless, you should look at the implementation of StrToInt64 and implement the essential from that function yourself. Approximately like this:

function CleanNumber(s: String): Int64;
var
 i, num, limit10, limit1: Int64;
begin
 limit10 := MaxInt64 div 10;
 limit1 := MaxInt64 mod 10;
 num := 0;
 for i := 1 to Length(s) do
 if (s[i] >= '0') and (s[i] <= '9') then begin
 if num > limit10 then raise EValue();
 if (num = limit10) and (Ord(s[i]) - Ord('0') > limit1) then raise EValue();
 num := 10 * num + Ord(s[i]) - Ord('0');
 end;
 Result := num;
end;

I didn't test the code and I'm not sure about the correct syntax for throwing exceptions, but I expect the basic algorithm to work fine. Some test cases:

''
'0'
'9'
'100'
'1.0.0'
'1234567890123456789'
'9223372036854775807'
'9223372036854775808' // too large
'-9223372036854775808' // also too large
'aaaaaaaaaaaaaa' // => 0
answered Sep 2, 2019 at 20:56
\$\endgroup\$
3
  • \$\begingroup\$ It is a good thinking to get the result without modifying s. Although personally I am against reimplementing built-in functions, unless one is buggy or highly inefficient. I am also not sure whether your version is faster, because I believe StrToInt64 is implemented in assembly and your function has quite a few comparisons. \$\endgroup\$ Commented Sep 2, 2019 at 21:48
  • \$\begingroup\$ Interesting results, 1 million loops with "1234.5678;": 32-bit: Mine = 120-125ms, yours 65-70ms. 64-bit: Mine = 90-95ms, yours... 15-20ms \$\endgroup\$ Commented Sep 3, 2019 at 0:25
  • \$\begingroup\$ Regarding the number of comparisons: the first two comparisons are hopefully optimized by the compiler into a single comparison. The other ones probably don't hurt because branch prediction always predicts them correctly. The two comparisons with limit10 can also be merged into a single cmpq instruction. All in all, I expect the generated machine code to be more efficient than the Delphi source code appears to be. \$\endgroup\$ Commented Sep 3, 2019 at 5:27
4
\$\begingroup\$

Don't over use set. It is faster when the items are discrete, but slower when the items are continuous. Although, the difference is very small in your case.

The problem here is that you use delete repeatedly. Each call moves (copies) a potentially large chuck of characters. This is inefficient because many characters are moved multiple times. For example, 5 in string 1 2 3 4 5 is moved 4 times.

Given that string in Delphi is mutable, it is better to copy individual characters:

function CleanNumber(s: String): Int64;
Var
 z, l: Cardinal;
begin
 l := 1;
 for z := 1 to length(s) do
 if (s[z] >= '0') and (s[z] <= '9') then
 begin
 s[l] := s[z];
 inc(l);
 end;
 SetLength(s, l - 1);
 if s = '' then
 Result := 0 else
 Result := StrToInt64(s);
end;

It can be further improved:

function CleanNumber(s: String): Int64;
Var
 z, l: Cardinal;
begin
 l := 1;
 while (l <= length(s)) and (s[l] >= '0') and (s[l] <= '9') do
 inc(l); // Scan for the first non-numeric char
 for z := l + 1 to length(s) do // Start from l + 1
 if (s[z] >= '0') and (s[z] <= '9') then
 begin
 s[l] := s[z];
 inc(l);
 end;
 if l = 1 then Result := 0 else
 begin
 SetLength(s, l - 1);
 Result := StrToInt64(s);
 end;
end;

Note: code not tested.

Toby Speight
87.2k14 gold badges104 silver badges322 bronze badges
answered Sep 2, 2019 at 18:55
\$\endgroup\$
2
  • 1
    \$\begingroup\$ Why did you choose z and l for the variable names? Especially the assignment l := 1 is hard to read. \$\endgroup\$ Commented Sep 2, 2019 at 20:07
  • \$\begingroup\$ @Roland Illig: I used z because that's what hikari used. I usually use i and j for loop variables. I used single char l instead of my usual len because it is more consistent to his/her style. I seldom get confused by l and 1 because they are shown in different colors due to the use of theme. I guess I should. \$\endgroup\$ Commented Sep 2, 2019 at 21:34

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.