
Today, I discovered a rather interesting thing about either g++ or nm...constructor definitions appear to have two entries in libraries.

I have a header thing.hpp:

class Thing
{
 Thing();
 Thing(int x);
 void foo();
};

And thing.cpp:

#include "thing.hpp"
Thing::Thing()
{ }
Thing::Thing(int x)
{ }
void Thing::foo()
{ }

I compile this with:

g++ thing.cpp -c -o libthing.a

Then, I run nm on it:

%> nm -gC libthing.a
0000000000000030 T Thing::foo()
0000000000000022 T Thing::Thing(int)
000000000000000a T Thing::Thing()
0000000000000014 T Thing::Thing(int)
0000000000000000 T Thing::Thing()
 U __gxx_personality_v0

As you can see, both of the constructors for Thing are listed with two entries in the generated static library. My g++ is 4.4.3, but the same behavior happens in clang, so it isn't just a gcc issue.

This doesn't cause any apparent problems, but I was wondering:

  • Why are defined constructors listed twice?
  • Why doesn't this cause "multiple definition of symbol __" problems?

EDIT: For Carl, the output without the C argument:

%> nm -g libthing.a
0000000000000030 T _ZN5Thing3fooEv
0000000000000022 T _ZN5ThingC1Ei
000000000000000a T _ZN5ThingC1Ev
0000000000000014 T _ZN5ThingC2Ei
0000000000000000 T _ZN5ThingC2Ev
 U __gxx_personality_v0

As you can see...the same function is generating multiple symbols, which is still quite curious.

And while we're at it, here is a section of generated assembly:

.globl _ZN5ThingC2Ev
 .type _ZN5ThingC2Ev, @function
_ZN5ThingC2Ev:
.LFB1:
 .cfi_startproc
 .cfi_personality 0x3,__gxx_personality_v0
 pushq %rbp
 .cfi_def_cfa_offset 16
 movq %rsp, %rbp
 .cfi_offset 6, -16
 .cfi_def_cfa_register 6
 movq %rdi, -8(%rbp)
 leave
 ret
 .cfi_endproc
.LFE1:
 .size _ZN5ThingC2Ev, .-_ZN5ThingC2Ev
 .align 2
.globl _ZN5ThingC1Ev
 .type _ZN5ThingC1Ev, @function
_ZN5ThingC1Ev:
.LFB2:
 .cfi_startproc
 .cfi_personality 0x3,__gxx_personality_v0
 pushq %rbp
 .cfi_def_cfa_offset 16
 movq %rsp, %rbp
 .cfi_offset 6, -16
 .cfi_def_cfa_register 6
 movq %rdi, -8(%rbp)
 leave
 ret
 .cfi_endproc

So the generated code is...well...the same.


EDIT: To see what constructor actually gets called, I changed Thing::foo() to this:

void Thing::foo()
{
 Thing t;
}

The generated assembly is:

.globl _ZN5Thing3fooEv
 .type _ZN5Thing3fooEv, @function
_ZN5Thing3fooEv:
.LFB550:
 .cfi_startproc
 .cfi_personality 0x3,__gxx_personality_v0
 pushq %rbp
 .cfi_def_cfa_offset 16
 movq %rsp, %rbp
 .cfi_offset 6, -16
 .cfi_def_cfa_register 6
 subq $48, %rsp
 movq %rdi, -40(%rbp)
 leaq -32(%rbp), %rax
 movq %rax, %rdi
 call _ZN5ThingC1Ev
 leaq -32(%rbp), %rax
 movq %rax, %rdi
 call _ZN5ThingD1Ev
 leave
 ret
 .cfi_endproc

So it is invoking the complete object constructor.

osgx
asked Aug 3, 2011 at 3:22
  • You're obfuscating your problem with the -C flag to nm. If you leave it off, you'll see that the constructors that are emitted in fact have different symbols (which is the answer to your second question). I have no idea why two identical constructors are emitted with different symbol names, but I'm trying to read up on that now... more if I figure it out. Commented Aug 3, 2011 at 3:32
  • Your output looks roughly the same as what I get here - so the question, really, is "what's the difference between the mangled name with a C1 in it versus that with a C2 in it?", and I have no answer to that question. I'm surprised the documentation doesn't have more about it.... hrm. Commented Aug 3, 2011 at 3:48
  • It's interesting that the exact same behavior happens in two different compilers. Commented Aug 3, 2011 at 3:52
  • I'd be interested to see which one a subclass calls and which one new calls... Commented Aug 3, 2011 at 3:56
  • Possibly relevant: stackoverflow.com/questions/6613870/… Commented Aug 3, 2011 at 4:17

1 Answer


We'll start by declaring that GCC follows the Itanium C++ ABI.


According to the ABI, the mangled name for your Thing::foo() is easily parsed:

_Z | N | 5Thing | 3foo | E | v
prefix | nested | `Thing` | `foo`| end nested | parameters: `void`

You can read the constructor names similarly, as below. Notice how the constructor "name" isn't given, but instead a C clause:

_Z | N | 5Thing | C1 | E | i
prefix | nested | `Thing` | Constructor | end nested | parameters: `int`

But what's this C1? Your duplicate has C2. What does this mean?

Well, this is quite simple too:

 <ctor-dtor-name> ::= C1 # complete object constructor
 ::= C2 # base object constructor
 ::= C3 # complete object allocating constructor
 ::= D0 # deleting destructor
 ::= D1 # complete object destructor
 ::= D2 # base object destructor

Wait, why is this simple? This class has no base. Why does it have a "complete object constructor" and a "base object constructor" for each?

  • This Q&A implies to me that this is simply a by-product of polymorphism support, even though it's not actually required in this case.

  • Note that c++filt used to include this information in its demangled output, but doesn't any more.

  • This forum post asks the same question, and the only response doesn't do any better at answering it, except for the implication that GCC could avoid emitting two constructors when polymorphism is not involved, and that this behaviour ought to be improved in the future.

  • This newsgroup posting describes a problem with setting breakpoints in constructors due to this dual-emission. It's stated again that the root of the issue is support for polymorphism.

In fact, this is listed as a GCC "known issue":

G++ emits two copies of constructors and destructors.

In general there are three types of constructors (and destructors).

  • The complete object constructor/destructor.
  • The base object constructor/destructor.
  • The allocating constructor/deallocating destructor.

The first two are different, when virtual base classes are involved.


The meaning of these different constructors seems to be as follows:

  • The "complete object constructor". It additionally constructs virtual base classes.

  • The "base object constructor". It creates the object itself, as well as data members and non-virtual base classes.

  • The "allocating object constructor". It does everything the complete object constructor does, plus it calls operator new to actually allocate the memory... but apparently this is not usually seen.

If you have no virtual base classes, [the first two] are identical; GCC will, on sufficient optimization levels, actually alias the symbols to the same code for both.

answered Aug 3, 2011 at 3:59

6 Comments

Hooray for an answer - I think I was closing in on this, but it's good to see the right information.
@Tomalak Geret'kal: +1, for a very detailed research for answering the Q.
This is an awesome answer, but is there documentation for the difference between these constructor types? Mostly: What is an "allocating constructor" and a "deleting destructor"? Are they for overloading operator new and operator delete?
@Travis: I'm not entirely sure yet. bdonlan [argh, SO, quit limiting my notifications in comments FFS] pointed out this highly-related question, and there appears to be lots of pertinent information there.
@Travis: Yes, I think that they are. I don't want this answer to turn into general documentation for the entire construction/destruction process, but I briefly cover that in my latest edit.