Suppose there is a C++ source code base of millions of lines composed of several hundred *.cpp and *.h files.
There is also a driver program main.cpp that uses several header files from the above source code.
Suppose I want to scan the entire code base and
(1) pull only those files that are used by main.cpp, and
(2) keep only those classes and functions that are used by main.cpp.
I have finished step #1. How can I do step #2?
2 Answers
This is not easy and will take significant work. You first want to reach a stage where any set of files you pick either compiles or doesn't, and if it compiles, it gives the same result as before. Why wouldn't it? For example, because header files #define things that other header files or source files use.
Say your source file says
#ifndef PI
#define PI 3.15
#endif
If you stop including a header file with a precise definition of PI and nothing else, your program is suddenly very inaccurate.
So you look for things defined in header files and where they are used. The source file here would be changed, the PI madness removed and the header file included instead.
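The PI hazard described above can be reproduced in a single file. This is only a sketch: the header name "constants.h" and the functions circle_area and fallback_relative_error are hypothetical names introduced for illustration, not anything from the code base in the question.

```cpp
#include <cmath>

// Assumption: a hypothetical header "constants.h" used to supply
//   #define PI 3.14159265358979323846
// but it is no longer included, so the fallback below silently takes over.
#ifndef PI
#define PI 3.15
#endif

// Area computed with whatever value of PI the preprocessor ended up with.
double circle_area(double radius) {
    return PI * radius * radius;
}

// Relative error of the PI in effect, measured against acos(-1).
double fallback_relative_error() {
    const double precise = std::acos(-1.0);
    return std::fabs(PI - precise) / precise;
}
```

Nothing fails to compile here, which is the problem: the only symptom is a wrong number. Replacing the guarded fallback with an unconditional include of the one header that defines the constant turns a silent change into a compile error if that header ever goes missing.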
Look for things with context. I hope nobody does this, but someone could open a namespace and include a header file. So if you change include order things compile differently. Find and remove all rubbish like that.
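A rough illustration of why the surrounding context matters, assuming a hypothetical header "widget.h" without an include guard. Its contents are simulated with a macro so the example fits in one file; Widget, legacy and global_widget_id are made-up names.

```cpp
#include <type_traits>

// Stand-in for the text of a hypothetical header "widget.h".
#define WIDGET_H_CONTENTS struct Widget { int id; };

namespace legacy {
WIDGET_H_CONTENTS  // as if "widget.h" were included inside the namespace
}

WIDGET_H_CONTENTS  // as if "widget.h" were included at global scope

// The same header text has produced two unrelated types.
static_assert(!std::is_same<legacy::Widget, ::Widget>::value,
              "the enclosing namespace changes what the header declares");

// Uses the global-scope type; legacy::Widget would not be accepted in its place.
int global_widget_id() {
    ::Widget w{7};
    return w.id;
}
```

With a real include guard the second inclusion would expand to nothing, so which of the two types exists would depend on which include came first. That is exactly the kind of order dependence that has to be removed before files can be pruned safely.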
Now make all header files standalone. That means a C++ source file containing just #include "myheader.h" must compile. So you comment out all include statements. Then try compiling the first header file. If it doesn't compile, fix it by making it include everything it needs. If a.h needs something from b.h, then a.h includes b.h. If b.h doesn't compile on its own, then you fix it first, and so on, until all your header files compile and behave the same no matter how they are included.
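The standalone check can be automated by generating one tiny translation unit per header and asking the compiler to check it. The helpers below are a minimal sketch, not a finished tool: standalone_tu and check_command are hypothetical names, and the command assumes g++ is on the PATH (as it would be under MSYS2) and uses its -fsyntax-only flag.

```cpp
#include <string>

// Text of a translation unit that includes exactly one header and nothing else.
std::string standalone_tu(const std::string& header) {
    return "#include \"" + header + "\"\n";
}

// Command line that asks g++ to check the file without generating object code.
// include_dir is the directory the header is looked up in (an assumption about
// how the project is laid out).
std::string check_command(const std::string& tu_path,
                          const std::string& include_dir) {
    return "g++ -fsyntax-only -I" + include_dir + " " + tu_path;
}
```

For each header you would write standalone_tu(header) to a temporary .cpp file and run check_command on it; every header for which the command fails is missing at least one include of its own.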
Then you do the same with your source files. And then your main can build with the source files you want after adding everything needed to fix linker errors.
Now if you find you have a huge source file that includes some small utility that you need, and a huge bunch of stuff you don’t care about, you split it up. And so on.
The long road to sanity begins with a single step... ;-) – Deduplicator, Oct 11, 2023 at 18:07
The linker removes unreferenced code. The map file generated by the linker will show what functions are still referenced. You can then begin removing portions of code in main.cpp, compile, link, and verify if the algorithm of interest is still referenced in the map file.
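A small sketch of what this looks like with the GNU toolchain (for example under MSYS2). Note that GNU ld only discards unreferenced functions when asked to, so the build command in the comment is an assumption about how you would set it up; the file name demo.cpp and all function names are hypothetical.

```cpp
// Assumed build command (GCC with GNU ld), once a main() that calls
// run_driver() has been added:
//   g++ -O2 -ffunction-sections -fdata-sections demo.cpp -o demo \
//       -Wl,--gc-sections -Wl,-Map=demo.map
// -ffunction-sections puts each function in its own section, --gc-sections
// lets the linker drop unreferenced sections, and -Map writes the map file.

// Reached from the driver, so it should be kept in the map file.
int used_algorithm(int x) {
    return x * 2;
}

// Never called; a candidate for removal by the linker.
int unused_helper(int x) {
    return x + 1;
}

// Stands in for the body of main.cpp in this sketch.
int run_driver() {
    return used_algorithm(21);
}
```

After linking, searching demo.map for the (mangled) names shows which functions were kept and which were discarded. The exact layout of the map file differs between linkers and versions.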
Where can I find this map file? I am using MSYS2, CLion and Visual Studio 2017. – user366312, Oct 11, 2023 at 14:07
In Visual Studio, go to the project properties -> Configuration Properties -> Linker -> Debugging -> Generate Map File to output a map file. There is an option to give the map file a name different from the output file. The generated map file will be located somewhere in the configuration directory (Release, Debug, etc.) with a .map extension. – Steve Mathwig, Oct 12, 2023 at 14:33
[...] objdump or readelf to extract the linker tables from the object code to see which other functions it depends on. This is 100% reliable, but tedious and only gives us objects with linkage (e.g. functions, extern variables). It cannot tell us anything about classes, templates, typedefs, macros, and so on. If OP is just trying to navigate the source code, a language server like clangd might be more appropriate (or a conventional C++ IDE).
[...] gcc --coverage) and use a tool to draw the call graph. Unlike linker stuff this won't be perfect, but it might highlight the most important relationships.