Following Cris Luengo's suggestion to use modern string arrays instead of char vectors, I wrote a code snippet to compare the performance of the two approaches.
The experimental implementation
%% Print system information
system('systeminfo');
%% Setup: test parameters
concatTestTimes = 25;
repetitionTimes = 10;
fprintf("The performance test will run %d times and the count of the concatenation operation in each test iteration is %d.\n", repetitionTimes, concatTestTimes);
%% Setup: preparing the space for storing evaluation results
resultsOfCharVectors = zeros(1, repetitionTimes);
resultsOfString = zeros(1, repetitionTimes);
%% Run tests
for eachIterate = 1:repetitionTimes
    tic;
    initialCharVectors = ['123'];
    for i = 1:concatTestTimes
        initialCharVectors = [initialCharVectors filesep initialCharVectors];
    end
    resultsOfCharVectors(eachIterate) = toc;
    tic;
    initialString = "123";
    for i = 1:concatTestTimes
        initialString = initialString + convertCharsToStrings(filesep) + initialString;
    end
    resultsOfString(eachIterate) = toc;
end
%% Print test results
fprintf("The average execution time of char vectors: %d.\n", mean(resultsOfCharVectors, 'all'));
fprintf("The average execution time of string: %d.\n", mean(resultsOfString, 'all'));
%% Save results (to Excel file)
outputFolderRoot = '.';
outputFilename = fullfile(outputFolderRoot, 'StringAndCharVecConcatComparisonResults.xlsx');
% Write title
outputTitle = ["Test Iteration" "Performance of using CharVectors" "Performance of using String"];
writematrix(outputTitle, outputFilename, 'Sheet', 1, 'Range', 'A1');
indexColumn = 1:repetitionTimes;
outputData = [indexColumn' resultsOfCharVectors' resultsOfString'];
writematrix(outputData, outputFilename, 'Sheet', 1, 'Range', 'A2');
%% Plot results
figure('Renderer', 'painters', 'Position', [10 10 900 600]);
plot(resultsOfCharVectors);
hold;
plot(resultsOfString);
xlabel('Test Iteration');
ylabel('Execution Time (seconds)');
title('String / Char Vector Concatenation Performance Comparison');
legend('Performance of using CharVectors','Performance of using String');
grid on;
%% Save results
saveas(gcf,'StringAndCharVecConcatComparison.png');
Output of the Proposed Experimental Implementation
Host Name: DESKTOP-DFPCSK8
OS Name: Microsoft Windows 10 Pro
OS Version: 10.0.19043 N/A Build 19043
OS Manufacturer: Microsoft Corporation
OS Configuration: Standalone Workstation
OS Build Type: Multiprocessor Free
Registered Owner: user
Registered Organization:
Product ID: 00330-80000-00000-AA846
Original Install Date: 5/24/2022, 4:44:08 AM
System Boot Time: 6/7/2022, 1:53:46 PM
System Manufacturer: Gigabyte Technology Co., Ltd.
System Model: Z490 AORUS MASTER
System Type: x64-based PC
Processor(s): 1 Processor(s) Installed.
[01]: Intel64 Family 6 Model 165 Stepping 5 GenuineIntel ~2904 Mhz
BIOS Version: American Megatrends Inc. F7, 10/27/2020
Windows Directory: C:\Windows
System Directory: C:\Windows\system32
Boot Device: \Device\HarddiskVolume1
System Locale: en-us;English (United States)
Input Locale: en-us;English (United States)
Time Zone: (UTC+08:00) Taipei
Total Physical Memory: 65,460 MB
Available Physical Memory: 44,378 MB
Virtual Memory: Max Size: 94,520 MB
Virtual Memory: Available: 40,532 MB
Virtual Memory: In Use: 53,988 MB
Page File Location(s): C:\pagefile.sys
Domain: WORKGROUP
Logon Server: \\DESKTOP-DFPCSK8
Hotfix(s): 6 Hotfix(s) Installed.
[01]: KB5013887
[02]: KB5000736
[03]: KB5005716
[04]: KB5014023
[05]: KB5014035
[06]: KB5001405
Network Card(s): 3 NIC(s) Installed.
[01]: Intel(R) Wi-Fi 6 AX201 160MHz
Connection Name: Wi-Fi
DHCP Enabled: Yes
DHCP Server: 1.1.1.1
IP address(es)
[01]: 10.174.192.53
[02]: fe80::6de6:48:1954:9405
[02]: Intel(R) Ethernet Controller I225-V
Connection Name: Ethernet
Status: Media disconnected
[03]: Bluetooth Device (Personal Area Network)
Connection Name: Bluetooth Network Connection
Status: Media disconnected
Hyper-V Requirements: VM Monitor Mode Extensions: Yes
Virtualization Enabled In Firmware: Yes
Second Level Address Translation: Yes
Data Execution Prevention Available: Yes
The performance test will run 10 times and the count of the concatenation operation in each test iteration is 25.
The average execution time of char vectors: 1.902807e-01.
The average execution time of string: 2.932196e-01.
Current plot held
[Figure: String / Char Vector Concatenation Performance Comparison, execution time in seconds per test iteration]
The results show that char vectors perform slightly better than strings in this concatenation task. However, I am wondering whether the proposed experimental implementation is a good way to run a performance comparison. Are there any defects in it, or any possible improvements for this kind of task?
All suggestions are welcome.
1 Answer
Interesting experiment!
If you reduce the value of concatTestTimes, you see different behavior. For example, with 5, the string version is faster. Somewhere between 10 and 15 the behavior flips. Maybe the string implementation becomes a bit slower when the string becomes very long?
This is what I would have done to test the speeds:
- Create two small anonymous functions that take a string or char array as input and apply the concatenation.
- Use timeit to see how much time each function takes.
timeit is absolutely fantastic. It was first posted on the MATLAB File Exchange by senior MATLAB developer Steve Eddins (here is his blog post at the time). This tool will run the function passed in as often as needed to get a good estimate of the runtime. It takes care of "warming up" MATLAB, and it takes into account the time it takes to call a function handle as well as the loop overhead (so that it only times the actual function). timeit returns the time in seconds.
fc = @(s) [s s];
fs = @(s) s + s;
timeit(@() fc('123'))
timeit(@() fs("123"))
Output:
Warning: The measured time for F may be inaccurate because it is running too fast. Try measuring
something that takes longer.
> In timeit (line 158)
ans =
6.7071e-07
ans =
5.4690e-07
This is a very small time, so the measurement is not very stable. Repeated runs give me different values in the range 4.5e-7 to 7.7e-7. Still, most of the time the string concatenation is ever so slightly faster.
You can vary things from there, for example by using fs = @(s) s + s + s + s + s + s, or by passing a much longer string as input to the functions.
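One way to try the "much longer string" variation is to sweep the input length and time both functions at each size. The following is only a sketch: the list of lengths and the variable names (lengths, tChar, tStr) are arbitrary choices for illustration, and timeit may still warn about very fast functions at the short lengths.

```matlab
fc = @(s) [s s];   % char vector concatenation
fs = @(s) s + s;   % string concatenation

lengths = 10.^(1:6);             % input lengths to try (arbitrary choice)
tChar = zeros(size(lengths));    % timings for char vectors, in seconds
tStr = zeros(size(lengths));     % timings for strings, in seconds

for k = 1:numel(lengths)
    c = repmat('a', 1, lengths(k));   % char vector of the given length
    s = string(c);                    % same content as a string scalar
    tChar(k) = timeit(@() fc(c));
    tStr(k) = timeit(@() fs(s));
end

% Show the timings side by side, one row per input length
disp(table(lengths', tChar', tStr', ...
    'VariableNames', {'Length', 'CharSeconds', 'StringSeconds'}));
```

If the flip seen in the original script is related to string length, it should show up as a crossover between the two timing columns.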
Note that tic and toc are still very useful to time longer pieces of code. If a portion of code takes several seconds to run, putting tic/toc around it will give you good information about how much time it takes. But anything that takes less than a few seconds to run is much better timed with timeit.
Here are a few comments about your code:
By putting system('systeminfo'); at the top of your script, you limit it to Windows machines. systeminfo is a shell command that does not exist on other platforms. Your script errored out on this line on my machine.
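One possible way to keep this step portable is to branch on the platform. This is a minimal sketch, not a drop-in replacement: using uname -a on Linux and macOS is an assumption about what information is wanted there, and it prints far less than systeminfo does.

```matlab
% Print system information using a command that exists on the current platform.
if ispc
    system('systeminfo');     % Windows only
elseif isunix                 % true on both Linux and macOS
    system('uname -a');       % assumed substitute: kernel and machine info
end

% The MATLAB version and platform string are available everywhere
fprintf('MATLAB %s on %s\n', version, computer);
```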
initialCharVectors = ['123']; is the same as initialCharVectors = '123';. The [ ] operator concatenates things; if you put only one thing in between, it does nothing. The MATLAB Editor even warns you about this.
initialString + convertCharsToStrings(filesep) is the same as initialString + filesep. There is no need to explicitly convert the things you concatenate with a string to strings. This is one of the nice things about strings: conversion is automatic. You can convert a number to a string just by concatenating it with a string!
mean(resultsOfCharVectors, 'all'): computing the mean of a series of timings is not always the best solution. For example, when I ran your script, the first iteration took longer than subsequent iterations, because the first time you call a function, the function is loaded into memory and parsed. median() would provide a better estimate that is not affected by this type of outlier. Also, compute time is affected by other things going on in your machine, which happen at random times and which you cannot control. Typically the shortest run time is the one that had the fewest interruptions by other processes. min() is therefore a common choice to estimate compute times.
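To see how much a single slow first iteration can pull the mean, here is a sketch with made-up timings (the numbers are invented for illustration and are not measurements):

```matlab
% Hypothetical timings in seconds; the first run is slow because of warm-up
timings = [0.50 0.20 0.21 0.19 0.20];

fprintf('mean:   %.2f s\n', mean(timings));     % 0.26, pulled up by the outlier
fprintf('median: %.2f s\n', median(timings));   % 0.20, unaffected by the outlier
fprintf('min:    %.2f s\n', min(timings));      % 0.19, the least-interrupted run
```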
figure('Renderer', 'painters'): I would recommend against explicitly setting the renderer; let MATLAB pick the best one for the graphics you're displaying. You'd explicitly pick "painters" only in very specific situations: a 3D graphic that benefits from OpenGL rendering, being exported to a format that supports vector graphics (SVG, EPS), and you explicitly wanting it as vector graphics rather than a bitmap -- which typically doesn't look good anyway. In your case, and in most other cases, the renderer chosen by MATLAB will be the right one.
You should always time code inside a function. Put function xxx at the top of your script file to turn it into a function. The MATLAB JIT (just-in-time compiler) behaves differently for statements executed at the command line, in scripts, and in functions. Within a function you always get the full benefit of the JIT. You can measure significant timing differences by running code within a function or outside a function.
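A minimal sketch of what the benchmark could look like as a function file. The names concatBenchmark, concatChar and concatString are hypothetical, and nConcat = 15 is an arbitrary choice that keeps the final text at about 131,000 characters.

```matlab
function [tChar, tStr] = concatBenchmark()
% Hypothetical file concatBenchmark.m: times both concatenation loops
% from inside functions, using timeit instead of tic/toc.
nConcat = 15;   % number of concatenations per run (arbitrary choice)
tChar = timeit(@() concatChar(nConcat));
tStr = timeit(@() concatString(nConcat));
fprintf('char vector: %.3e s\nstring:      %.3e s\n', tChar, tStr);
end

function s = concatChar(n)
% Repeatedly join a char vector with itself, separated by filesep
s = '123';
for k = 1:n
    s = [s filesep s]; %#ok<AGROW>
end
end

function s = concatString(n)
% Same operation using a string scalar and the + operator
s = "123";
for k = 1:n
    s = s + filesep + s;
end
end
```

Calling [tChar, tStr] = concatBenchmark() from the command line then runs both loops inside functions, and timeit handles the warm-up and repetition.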
Another interesting thing to compare is putting char arrays into a cell array vs strings into a string array: {'123','123'} vs ["123","123"].
fc = @(s) {s, s, s, s, s};
fs = @(s) [s, s, s, s, s];
timeit(@() fc('123'))
timeit(@() fs("123"))
ans =
2.2004e-06
ans =
1.2305e-06
- "Within a function you always get the full benefit of the JIT". I believe this is only after the first run, since MATLAB must generate native machine level code that is optimized for the function you have written. You save time on subsequent runs. – Stewie Griffin, Jun 16, 2022 at 7:27
- @StewieGriffin: the compilation step is done before the function runs, not while it runs. So timing a part of the function should give results with full benefit of the JIT. The function call itself takes longer the first time. AFAIK, scripts cannot be optimized in the same way because they run in an ever-changing workspace, changing the definition of a variable in the workspace before running a script could change how the script runs. – Cris Luengo, Jun 16, 2022 at 13:24
- "The function call itself takes longer the first time. [...] timing a part of the function should give results with full benefit of the JIT." - Good point. Didn't think about that. I only thought about the time it takes to run it, not the timing part (which is the whole point). – Stewie Griffin, Jun 17, 2022 at 6:57