Bulk HTTP request queue consumer

Question 1

Today I had to write a small tool to help me send HTTP requests in bulk. Rabbit was overloading my server, so I decided to change my consumers to buffer the contents of the requests before sending them. After changing my API, I did this:

#include <amqpcpp.h>
#include <amqpcpp/libev.h>
#include <ev.h>
#include <cpr/cpr.h>
typedef std::unordered_map<std::string, std::ostringstream> HttpBuffer;
const char* RabbitQueue = getenv("RABBIT_NAME");
char QueueAddress[128];
int main()
{
    sprintf(QueueAddress, "amqp://%s:%s", getenv("RABBIT_HOST"), getenv("RABBIT_PORT"));
    int MaxThreads = std::thread::hardware_concurrency();
    std::vector<std::thread> threads;
    for (int i = 0; i < MaxThreads; i++) {
        threads.push_back(std::thread(RabbitThread));
    }
    for (std::thread &thread : threads) {
        thread.join();
    }
    threads.clear();
}
void RabbitThread()
{
    struct ev_loop *loop = ev_loop_new(0);
    AMQP::LibEvHandler handler(loop);
    AMQP::TcpConnection connection(&handler, AMQP::Address(QueueAddress));
    AMQP::TcpChannel channel(&connection);
    AMQP::MessageCallback onMessage = [&channel](const AMQP::Message &message, uint64_t deliveryTag, bool redelivered) {
        HttpBuffer[message.replyTo()] << message.message().c_str() << ",";
        if (HttpBuffer[message.replyTo()].tellp() < 300000) {
            channel.ack(deliveryTag);
            return;
        }
        cpr::PostCallback(handle_response, cpr::Url{"http://localhost:8888"}, cpr::Body{HttpBuffer[message.replyTo()].str()});
        HttpBuffer[message.replyTo()].str("");
        channel.ack(deliveryTag);
    };
    channel.declareQueue(RabbitQueue);
    channel.bindQueue("default", RabbitQueue, "default");
    channel.consume(RabbitQueue).onReceived(onMessage);
    ev_run(loop);
    ev_loop_destroy(loop);
}

What do you guys think? How can I improve this one?

Comment

It seems strange to run an event loop in each thread. Can you give us some more description of what is happening?

Comment (author)

Well, I want to have more consumers in a single process. So I spawn the maximum number of threads that the machine can handle, and I tell them to listen to the queue.

Jan Korous · Answer 1 · 2016-06-16 22:16:53Z

I don't know Rabbit, and have only small nit-picks regarding C++.

emplace

You might consider constructing threads in place instead of moving them into the vector.

Instead of:

threads.push_back(std::thread(RabbitThread));

I would try:

threads.emplace_back(RabbitThread);

clear()

Is threads.clear(); in main() necessary?

struct and null pointer

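This looks like C-style use of the struct keyword, which is rare in C++ (not that it is wrong).

struct ev_loop *loop = ev_loop_new(0);

This might also be OK, and looks more familiar:

ev_loop *loop = ev_loop_new(0);

(Be aware, though, that libev's 3.x compatibility mode, EV_COMPAT3, which is enabled by default, also declares a function named ev_loop. In C++ that function name hides the type name, so the struct keyword may be required unless you build with EV_COMPAT3 set to 0.)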

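One more thing, and this is a wild guess: is this zero really a NULL pointer?

struct ev_loop *loop = ev_loop_new(0);

It isn't: ev_loop_new() takes an unsigned int bitmask of flags, and 0 is EVFLAG_AUTO, so nullptr would not even compile there. Naming the flag documents the intent:

struct ev_loop *loop = ev_loop_new(EVFLAG_AUTO);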

hardware_concurrency()

I really have no idea where the bottleneck lies in your use case, but I suspect it is I/O-bound rather than CPU-bound, so you might be able to use many more threads.

int MaxThreads = std::thread::hardware_concurrency();

I'm not sure I understand the emplace. What is the difference? About clear()... don't I have to free the resources allocated by the threads themselves?

Without hardware_concurrency(), how can I know the maximum number of threads to run?

emplace_back() means only the std::thread constructor is called for each element. push_back() means the std::thread constructor and the std::thread move constructor are called for each element. hardware_concurrency() is definitely not the maximum number of threads! It is the number of hardware-implemented threads, and only a hint (it is not guaranteed to be exact, and returns 0 if the value cannot be determined). On Linux you can have thousands of logical threads on a single core. See stackoverflow.com/questions/344203/…

Toby Speight · Answer 2 · 2016-06-17 09:23:47Z

I've never used the Rabbit libraries, so I'll only comment on the C++.

`#include` what you use

I had to add

#include <unordered_map>
#include <stdlib.h>
#include <string.h>
#include <thread>
#include <vector>

before I could compile. Don't omit the essentials from code reviews!

Error handling

Did you leave out all the error handling to simplify the review? If so, please tell us this in your question. Otherwise, it looks like you've not thought about it at all!

`sprintf`

sprintf(QueueAddress, "amqp://%s:%s", getenv("RABBIT_HOST"), getenv("RABBIT_PORT"));

There's no bounds checking here. You could use snprintf(QueueAddress, sizeof QueueAddress, ...), but I'd lean more to building a std::string (with + or with a std::ostringstream), and use its c_str() if you need a C-style string:

const auto QueueAddress = std::string("amqp://") + getenv("RABBIT_HOST") + ':' + getenv("RABBIT_PORT");
....
auto address = AMQP::Address(QueueAddress.c_str());

`threads.push_back`

Generally, we prefer emplace_back() when creating new objects in a vector. This saves us from having to think about whether the class has an efficient move constructor.

`threads.clear()`

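I'm not sure about clearing the vector immediately before it goes out of scope. I can see that it demonstrates a desire to clean up correctly - but perhaps it shows a lack of confidence in scoped variables. When threads goes out of scope at the end of main(), its destructor destroys each std::thread it holds, and since every thread has already been joined, there is nothing left to leak. It's obviously good practice in garbage-collected environments, but in C++ it just adds noise to the program.

magic constants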

What's that 300000 doing in the middle of the program? I can't tell the significance of it, or how it was chosen; it looks like a policy choice mixed in with the implementation. I'd suggest creating a named constant so that we can understand why that test is there. I think it's the minimum buffered data to trigger a reply, so I'll call it BUFFER_THRESHOLD.

naming

Naming standards can always be contentious, and if you have existing conventions, then you should stick with them. But I find the use of PascalCase for variables jarring - C++ code generally uses snake_case or camelCase here. I note that the Rabbit classes use PascalCase for classes, and camelCase for members, so I recommend doing likewise, especially as your single-word variables (threads, channel, loop) are not capitalised.

This actually is an issue here, as you create a type alias HttpBuffer but then refer to a variable of that name (as written, HttpBuffer[...] won't compile, because HttpBuffer names a type). It's not clear whether the typedef should be a declaration instead, or whether you meant to use the type later to declare a local. It appears that you meant the former, but a consistent naming scheme would have helped.

`HttpBuffer[message.replyTo()]`

It's probably worth keeping a local for the result of this call in onMessage, as it's used four times:

AMQP::MessageCallback onMessage = [&channel](const AMQP::Message &message, uint64_t deliveryTag, bool /*unused*/) {
    auto& reply = HttpBuffer[message.replyTo()];
    reply << message.message().c_str() << ",";
    if (reply.tellp() < BUFFER_THRESHOLD) {
        channel.ack(deliveryTag);
        return;
    }
    cpr::PostCallback(handle_response, cpr::Url{"http://localhost:8888"}, cpr::Body{reply.str()});
    reply.str("");
    channel.ack(deliveryTag);
};

(And what's the type of message.message()? If it's a std::string, then there's no need to convert it to a C string for operator<<.)

inverted condition

In the callback, you have a condition of the form:

if (c) {
    foo();
    return;
}
...;
foo();
}

It's probably clearer to use the opposite condition to decide whether to take action:

if (!c) {
    ...;
}
foo();
}

This gives you

AMQP::MessageCallback onMessage = [&channel](const AMQP::Message &message, uint64_t deliveryTag, bool /*unused*/) {
    const auto& reply_to = message.replyTo();
    auto& reply = HttpBuffer[reply_to];
    reply << message.message().c_str() << ",";
    if (reply.tellp() >= BUFFER_THRESHOLD) {
        cpr::PostCallback(handle_response, cpr::Url{"http://localhost:8888"}, cpr::Body{reply.str()});
        reply.str("");
    }
    channel.ack(deliveryTag);
};

Final flushing

It's not clear to me whether one or more threads may still have unflushed data at program exit. I can't see any code to ensure that all replies get transmitted.

I haven't added error handling. There aren't many errors to handle here, at least not that I've seen.

How can I build a std::string with sprintf?

As for the cleanup code, I was just trying to avoid memory leaks.

Oh, I also just noticed a bug. If I stop getting messages, HttpBuffer will be left holding zombie data. Any ideas on how to solve that?

You don't need sprintf() to build the string - I've edited to show.
