How can I purge c++ source code?

Question 1

Suppose there is a C++ source code base of millions of lines composed of several hundred *.cpp and *.h files.

There is also a driver program main.cpp that uses several header files from the above source code.

Suppose I want to scan the entire code base and

(1) pull only those files that are used by main.cpp,
and
(2) keep only those classes and functions that are used by main.cpp

I have finished step 1. How can I do step 2?

Comment 1

What is your goal here? C/C++ is completely messed up when it comes to static analyzability, so there is no good way short of instrumenting the compiler. But there might be a couple of ways that are good enough for your specific purposes.

Comment 2

@candied_orange Yes, that was one of my thoughts: compile the compilation unit, then use objdump or readelf to extract the symbol tables from the object code to see which other functions it depends on. This is 100% reliable, but tedious, and it only tells us about entities with linkage (e.g. functions, extern variables). It cannot tell us anything about classes, templates, typedefs, macros, and so on. If OP is just trying to navigate the source code, a language server like clangd might be more appropriate (or a conventional C++ IDE).

Comment 3

Ah, some other strategies that might be useful: (1) Put the code into Git, or create a new branch. Then remove stuff that looks irrelevant and check if it still compiles. If so, commit and repeat. If not, revert to the last commit and try removing something else. With a bit of intuition, this should quickly get from the "millions" into the "thousands" of lines. (2) Run the code under a profiler or coverage tool (e.g. valgrind --tool=callgrind, or a build with gcc --coverage) and use a tool to draw the call graph. Unlike the linker approach this won't be perfect, since it only sees code that actually ran, but it might highlight the most important relationships.

Comment 4

A good "characterization test" that makes it do the most important thing it should be able to do would be handy, on the off chance you break its behavior without breaking the build.

Comment 5

I have to question why understanding an algorithm requires deleting other code not used in the algorithm. You should be able to navigate directly to the algorithm's actual dependencies (and their dependencies) with a correctly set-up IDE, so just do that instead of reading whole files and worrying about which parts are relevant.

gnasher729 · Answer 1 · 2023-10-11 14:36:37Z

This is not easy and will be significant work. You first want to reach a stage where you can pick any set of files and they either compile or they don’t, but if they compile, they behave the same as in the full build. Why wouldn’t they? For example, because header files #define things that other header files or source files use.

Say your source file says

#ifndef PI
 #define PI 3.15
#endif

If you stop including the header file that provides a precise definition of PI, and change nothing else, the code still compiles but silently falls back to 3.15, and your program is suddenly very inaccurate.

So you look for things defined in header files and where they are used. Here you would change the source file: remove the PI fallback and include the header file explicitly instead.

Look for things that depend on context. I hope nobody does this, but someone could open a namespace and include a header file inside it. Then the same header means different things depending on where and in which order it is included. Find and remove all rubbish like that.

Now make all header files standalone. That means a C++ source file containing just #include "myheader.h" must compile. So you comment out all include statements, then try compiling the first header file on its own. If it doesn’t compile, fix it by making it include everything it needs. If a.h needs something from b.h, then a.h includes b.h. If b.h doesn’t compile on its own, then you fix it first, and so on until all your header files compile and act the same no matter how they are included.

Then you do the same with your source files. After that, your main can be built with just the source files you want, plus whatever you need to add to fix linker errors.

Now if you find you have a huge source file that contains some small utility that you need, and a huge bunch of stuff you don’t care about, you split it up. And so on.

The long road to sanity begins with a single step... ;-)

– Deduplicator · Comment · 2023-10-11 18:07:17Z

Steve Mathwig · Answer 2 · 2023-10-11 13:54:05Z

The linker can remove unreferenced code, and the map file it generates shows which functions are still referenced. You can then remove portions of code in main.cpp step by step, compile, link, and check whether the algorithm of interest is still referenced in the map file.

Where can I find this map file? I am using MSYS2, CLion and Visual Studio 2017.

– user366312 · Comment · 2023-10-11 14:07:35Z

In Visual Studio, go to the project properties -> Configuration Properties -> Linker -> Debugging -> Generate Map File to output a map file. There is an option to give the map file a name different from the output file. The generated map file will be located somewhere in the configuration directory (Release, Debug, etc.) with a .map extension.

– Steve Mathwig · Comment · 2023-10-12 14:33:16Z
