Atomic operations using extended inline assembler of C32

Question 1

I'm trying to write atomic code. In the example below I need to perform the simple operation a ^= 1:

 static volatile int a = 0;
 //-- a ^= 1;
 __asm__ __volatile__( "xori %0, %0, 1"
 : "=r"(a)
 : "r"(a)
 );

The generated code is not atomic:

9D0014E8 8F828018 LW V0, -32744(GP)
9D0014EC 38420001 XORI V0, V0, 1
9D0014F0 AF828018 SW V0, -32744(GP)

As I see in the docs, the LL and SC instructions provide an atomic read-modify-write sequence. How can I make the compiler generate code with LL and SC instead of LW and SW? I tried writing such code myself:

 static volatile int a = 0;
 __asm__ __volatile__( "ll $t1, 0(%0)": : "r"(a) );
 __asm__ __volatile__( "xori $t1, $t1, 1" );
 __asm__ __volatile__( "sc $t1, 0(%0)": : "r"(a) );

But this is wrong; the result is not what I need:

140: __asm__ __volatile__( "ll $t1, 0(%0)": : "r"(a) );
9D001454 8F828018 LW V0, -32744(GP) # WRONG! | I need for LL T1, -32744(GP) instead of
9D001458 C0490000 LL T1, 0(V0) # WRONG! | these two LW, LL instructions
141: __asm__ __volatile__( "xori $t1, $t1, 1" );
9D00145C 39290001 XORI T1, T1, 1
142: __asm__ __volatile__( "sc $t1, 0(%0)": : "r"(a) );
9D001460 8F828018 LW V0, -32744(GP) # WRONG! | I need for SC T1, -32744(GP) instead of
9D001464 E0490000 SC T1, 0(V0) # WRONG! | these two LW, SC instructions

How can I do that?

Question 2

Which chip is this?

Question 3

It's a PIC32MX440F512H.

Question 4

Well, this is one of those happy moments when I just need to ask someone, and the solution comes to mind immediately:

 __asm__ __volatile__( "ll $t1, 0(%0)": : "r"(&a) );
 __asm__ __volatile__( "xori $t1, $t1, 1" );
 __asm__ __volatile__( "sc $t1, 0(%0)": : "r"(&a) );

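I.e. I need to pass &a instead of a. With "r"(a) the compiler loads the value of a into a register, so LL ends up using that value as an address; with "r"(&a) the register holds the address itself. Now the generated code is: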

104: __asm__ __volatile__( "ll $t1, 0(%0)": : "r"(&a) );
9D001434 27828018 ADDIU V0, GP, -32744
9D001438 C0490000 LL T1, 0(V0)
105: __asm__ __volatile__( "xori $t1, $t1, 1" );
9D00143C 39290001 XORI T1, T1, 1
106: __asm__ __volatile__( "sc $t1, 0(%0)": : "r"(&a) );
9D001440 E0490000 SC T1, 0(V0)

Which seems to be what I need. Note: to make it better, a "beqz" instruction is needed in order to loop if SC failed (there's an example in the MIPS32 instruction quick reference). But that is another story.
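
The retry loop mentioned above might look like the sketch below. It has not been run on real PIC32 hardware, and the function name atomic_xor_int is made up for illustration. The whole LL/XOR/SC sequence sits in a single asm statement, so the compiler cannot place anything between the instructions, and the scratch registers are declared as operands instead of hard-coding $t1. The non-MIPS branch is only an assumption-laden fallback (GCC or Clang builtins) so the same file also builds on a host compiler.

```c
/* Hypothetical helper, not part of C32 or any library: atomically performs
 * *p ^= mask and returns the value *p held before the update. */
static inline int atomic_xor_int(volatile int *p, int mask)
{
#if defined(__mips__)
    int old, tmp;
    __asm__ __volatile__(
        ".set push          \n\t"
        ".set noreorder     \n\t"   /* the delay slot is filled by hand below */
        "1: ll   %0, 0(%2)  \n\t"   /* old = *p, begin the linked sequence    */
        "   xor  %1, %0, %3 \n\t"   /* tmp = old ^ mask                       */
        "   sc   %1, 0(%2)  \n\t"   /* try to store; tmp = 1 on success, 0 on failure */
        "   beqz %1, 1b     \n\t"   /* store failed: retry from the LL        */
        "   nop             \n\t"   /* branch delay slot                      */
        ".set pop"
        : "=&r"(old), "=&r"(tmp)    /* early-clobber scratch registers        */
        : "r"(p), "r"(mask)         /* p is the address, as with &a above     */
        : "memory");
    return old;
#else
    /* Host fallback (assumption: GCC or Clang): the same retry structure,
     * expressed with a compare-and-swap builtin. */
    int old;
    do {
        old = *p;
    } while (!__sync_bool_compare_and_swap(p, old, old ^ mask));
    return old;
#endif
}
```

With the variable from the question, the call would be atomic_xor_int(&a, 1). If an interrupt hits between LL and SC, the SC is expected to fail and write 0 into its register, so the loop simply starts over.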

Also, on the Microchip forum, user andersm suggested using GCC's atomic builtins instead of reinventing the wheel. (But these builtins add two sync instructions that are useless on PIC32, so it might make sense to write my own macro.)
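
For reference, the builtin route might look like the sketch below. __sync_fetch_and_xor and __sync_xor_and_fetch are documented GCC builtins; whether a particular C32 release supports them for PIC32 is an assumption to check against the compiler's own documentation. The wrapper names toggle_flag and toggle_flag_old are hypothetical.

```c
static volatile int a = 0;

/* Hypothetical wrapper: atomically performs a ^= 1 and returns the new value. */
static int toggle_flag(void)
{
    return __sync_xor_and_fetch(&a, 1);
}

/* Hypothetical variant: same update, but returns the value a held before it. */
static int toggle_flag_old(void)
{
    return __sync_fetch_and_xor(&a, 1);
}
```

Comparing the disassembly of these wrappers with a hand-written LL/SC sequence is probably the quickest way to see what the toolchain actually emits around the update.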

Question 5

Without researching it too deeply, it looks like this instruction pairing is only potentially atomic, i.e. it either works atomically or else it fails to write back and sets a flag. You don't seem to be checking for or handling the failure possibility in the way the example code at your instruction reference link does. It is permissible to accept your own answer if you are fully satisfied with it.

Question 6

@ChrisStratton, I can't understand which instruction pairing you are talking about, or how instruction pairing might make code atomic in general. Atomicity in the code above is achieved by the LL and SC instructions. As to accepting my answer, I'll surely accept it when the system permits me to (I can accept it only after two days).

Question 7

This answer admittedly solves the issue of incorrect assembly generation, but what Chris meant was that three separate assembly instructions cannot ensure a single atomic operation. This code can be interrupted at any point in between these instructions, so if you don't check if SC failed, you don't gain any benefits compared to your initial code in terms of atomicity. The other forum user was right that it's better not to reinvent the wheel and simply use __sync_fetch_and_xor instead (if your compiler/mcu combo permits it).

Dmitry Frank · Answer 1 · 2013-07-05 12:44:19Z

