C++ Winsock HTTP Request Library

Question 1

I'm trying to write a HTTP Request library for practice with Sockets and C++ in general. I created a seperate function to setup the struct needed to Connect() however it is now breaking and I cant seem to figure out why even with a debugger. Also looking for any code optimization I can be doing better.

#include <winsock2.h>
#include <ws2tcpip.h>
#include <iostream>
#include <string>
#include <vector>
#pragma comment (lib, "Ws2_32.lib")
/*
 Struct For Holding Data Pre/Post HTTP Request
*/
struct HTTPRequest_s
{
 std::string Host; // Host
 std::string Path; // Path
 short Port; // Port
 short HTTPCode; // 404, 301..etc
 std::string Response; // Holds Data We Get Back
 std::vector<std::pair<std::string, std::string>> PostParams; // Holds Any Data We POST (If Applicable)
};
bool InitWinsock()
{
 WSADATA WsaDat;
 if (WSAStartup(MAKEWORD(2, 2), &WsaDat) != 0)
 {
 std::cout << "Failed To Start Winsocket With WSAStartup()\n";
 WSACleanup();
 return(false);
 }
 return(true);
}
/*
 Recieve Everything Including Headers
 (Note: Sometimes A Server Will Reply Without A 'Content-Length' Property)
*/
std::string loopRecieve(SOCKET Sock)
{
 char recvBuf[256]; // For Transporting Data
 std::string outBuf; // Output String
 unsigned int nret;
 while (true)
 {
 nret = recv(Sock, recvBuf, sizeof(recvBuf), 0);
 if (nret == -1)
 {
 printf("A Socket Error Occured(-1)\n");
 break;
 }
 else if (nret == 0)
 {
 printf("Done Reading!\n");
 break;
 }
 // Append Newly Recieved Bytes To String
 outBuf.append(recvBuf, nret);
 }
 return(outBuf);
}
/*
 Create The Struct We Need For Connection
*/
void CreateAddrInfoStruct(HTTPRequest_s *HTTP_s, sockaddr_in *servAddr)
{
 int nret;
 ADDRINFO *pResult = NULL,
 hints;
 // Strip Out 'http://' && 'https://'
 if (HTTP_s->Host.find("http://") != -1) {
 HTTP_s->Host = HTTP_s->Host.substr(HTTP_s->Host.find("http://") + 7);
 }
 else if (HTTP_s->Host.find("https://") != -1) {
 printf("SSL Not Supported Yet\n");
 return;
 }
 hints = { 0 };
 hints.ai_flags = AI_ALL;
 hints.ai_family = PF_INET;
 hints.ai_protocol = IPPROTO_IPV4;
 nret = getaddrinfo(HTTP_s->Host.c_str(), nullptr, &hints, &pResult);
 if (nret != 0)
 {
 std::cout << "Failed To Do GetAddrInfo()\n";
 return;
 }
 servAddr = { 0 };
 servAddr->sin_family = AF_INET;
 servAddr->sin_addr.S_un.S_addr = *((ULONG*)&(((sockaddr_in*)pResult->ai_addr)->sin_addr));
 servAddr->sin_port = htons(HTTP_s->Port);
}
/*
 Basic GET
*/
void getWebPage(HTTPRequest_s *HTTP_s)
{
 int nret;
 SOCKET theSocket;
 sockaddr_in servAddr;
 // Create Our Struct
 CreateAddrInfoStruct(HTTP_s, &servAddr);
 theSocket = socket(AF_INET, SOCK_STREAM, 0);
 if (theSocket == INVALID_SOCKET)
 {
 std::cout << "Socket Is Invalid, Is Winsock Initialized?\n";
 return;
 }
 nret = connect(theSocket, (sockaddr*)&servAddr, sizeof(servAddr));
 if (nret == SOCKET_ERROR)
 {
 std::cout << "Failed To Connect To Host\n";
 return;
 }
 std::string request;
 request = "GET " + HTTP_s->Path + " HTTP/1.1" + "\r\n";
 request += "Host: " + HTTP_s->Host + "\r\n";
 request += "Accept: */*\r\n";
 request += "Accept-Language: en-us\r\n";
 request += "Connection: close\r\n";
 request += "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0\r\n";
 request += "Referer: http://" + HTTP_s->Host + "\r\n";
 request += "\r\n";
 nret = send(theSocket, request.c_str(), request.length(), 0);
 if (nret == SOCKET_ERROR)
 {
 std::cout << "Failed To Send To Host\n";
 return;
 }
 HTTP_s->Response = loopRecieve(theSocket);
 closesocket(theSocket);
}
/*
 Basic POST
*/
void postWebPage(HTTPRequest_s *HTTP_s)
{
 int nret;
 SOCKET theSocket;
 sockaddr_in servAddr;
 // Create Our Struct
 CreateAddrInfoStruct(HTTP_s, &servAddr);
 theSocket = socket(AF_INET, SOCK_STREAM, 0);
 if (theSocket == INVALID_SOCKET)
 {
 std::cout << "Socket Is Invalid, Is Winsock Initialized?\n";
 return;
 }
 nret = connect(theSocket, (sockaddr*)&servAddr, sizeof(servAddr));
 if (nret == SOCKET_ERROR)
 {
 std::cout << "Failed To Connect To Host\n";
 return;
 }
 // Structure POST Data Properly
 std::string concatPostData;
 for (auto i : HTTP_s->PostParams)
 {
 concatPostData += i.first + "=" + i.second + "&";
 }
 concatPostData.pop_back(); // Pop Off Extra '&' At The End Of Loop (Note: Might Not Need To Pop This Off)
 // Construct HEADER
 std::string header;
 header = "POST " + HTTP_s->Path + " HTTP/1.1\r\n";
 header += "Host: " + HTTP_s->Host + ":" + std::to_string(HTTP_s->Port) + "\r\n";
 header += "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0\r\n";
 header += "Referer: http://" + HTTP_s->Host + "\r\n";
 header += "Content-Type: application/x-www-form-urlencoded\r\n";
 header += "Content-Length: " + std::to_string(concatPostData.length()) + "\r\n";
 header += "Accept-Charset: utf-8\r\n";
 header += "Connection: close\r\n\r\n";
 header += concatPostData + "\r\n";
 header += "\r\n";
 // Need To Check Here That We Sent All Data
 nret = send(theSocket, header.c_str(), header.length(), 0);
 if (nret == SOCKET_ERROR)
 {
 std::cout << "Failed To Send To Host\n";
 return;
 }
 HTTP_s->Response = loopRecieve(theSocket);
 closesocket(theSocket);
}
int main()
{
 // Init Winsock So We Can Use Sockets
 if (InitWinsock() != true)
 {
 printf("Failed To Init Winsock!\n");
 return(0);
 }
 // Setup HTTPRequest Structure
 HTTPRequest_s Http_s;
 Http_s.Host = "http://google.com";
 Http_s.Path = "/";
 Http_s.Port = 80;
 // Make Our Request
 getWebPage(&Http_s);
 // Print The Data From Our Request Result
 printf("Data Dump:\n%s", Http_s.Response.c_str());
 system("pause");
 return(0);
}

Question 2

As glampert pointed out this is a very good example for a class. I took the liberty to start converting it into one (I did not review the POST function at all so it also doesn't appear in that class).

The first step was adding an constructor, which takes the Host & Port arguments needed to make an connection. I left out the Path component so that if you ever want to send multiple requests over the same connection you can do that.
That meant that your original getWebPage() was a member of that new class and has only one argument: Path.
Your CreateAddrInfoStruct() and loopRecieve() functions also will ever only be used in this class, so I also put them into it.
That meant I could make the socket a private member, and remove the sockaddr_in* argument from CreateAddrInfoStruct(). As loopReciev()e also was a member function I removed the return value and made Response also a member.

One of the fundamental rules of coding is: Keep it simple, stupid.
If you do not have a feature yet (In your case e.g. the SSL part) don't add any code to handle that.

CreateAddrInfoStruct() resolved the domain name but if the domain used Round-Robin DNS, it only looked at the first result.
So the connect() from getWebPage() was moved into that function with a nice little loop around it.

As tempting it is to do all debugging via console output, once you reuse that library you need to remove it anyway. If you really think you need console output, print to std::cerr.

The final product is:

#include <winsock2.h>
#include <ws2tcpip.h>
#include <iostream>
#include <string>
#include <vector>
#pragma comment (lib, "Ws2_32.lib")
class HTTPRequest
{
private:
 const std::string Host;
 const short Port;
 SOCKET Sock;
 std::string Response;
 bool loop_recieve();
 bool resolve_and_connect();
public:
 HTTPRequest::HTTPRequest(const std::string& host, const short port)
 : Host(host), Port(port)
 { 
 if((Sock = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
 {
 throw std::exception("Couldn't create socket");
 }
 }
 HTTPRequest::~HTTPRequest()
 {
 closesocket(Sock);
 }
 std::string get_response()
 {
 return Response;
 }
 bool get_request(const std::string& path);
};
bool InitWinsock()
{
 WSADATA WsaDat;
 if(WSAStartup(MAKEWORD(2, 2), &WsaDat) != 0)
 {
 return false;
 }
 return true;
}
/*
Recieve Everything Including Headers
(Note: Sometimes A Server Will Reply Without A 'Content-Length' Property)
*/
bool HTTPRequest::loop_recieve()
{
 while(true)
 {
 char recvBuf[256];
 auto nret = recv(Sock, recvBuf, sizeof(recvBuf), 0);
 if(nret == -1)
 {
 return false;
 }
 else if(nret == 0)
 {
 break;
 }
 // Append Newly Recieved Bytes To String
 Response.append(recvBuf, nret);
 }
 return true;
}
/*
Create The Struct We Need For Connection
*/
bool HTTPRequest::resolve_and_connect()
{
 bool ret = false;
 ADDRINFO hints = {0};
 hints.ai_flags = AI_ALL;
 hints.ai_family = PF_INET;
 hints.ai_protocol = IPPROTO_IPV4;
 ADDRINFO *pResult = nullptr;
 if(getaddrinfo(Host.c_str(), nullptr, &hints, &pResult))
 {
 return false;
 }
 sockaddr_in servAddr = {0};
 servAddr.sin_family = AF_INET; //! todo: IPv6 support
 servAddr.sin_port = htons(Port);
 for(size_t i = 0; i < pResult->ai_addrlen; i++)
 {
 servAddr.sin_addr.S_un.S_addr = (ULONG)((sockaddr_in*)&pResult->ai_addr[i])->sin_addr.S_un.S_addr;
 if(connect(Sock, (sockaddr*)&servAddr, sizeof(servAddr)) != SOCKET_ERROR)
 {
 ret = true;
 break;
 }
 }
 freeaddrinfo(pResult);
 return ret;
}
/*
Basic GET
 \todo Check whether path is legitmate
*/
bool HTTPRequest::get_request(const std::string& path)
{
 if(!resolve_and_connect())
 return false;
 std::string request = "GET " + path + " HTTP/1.1" + "\r\n";
 request += "Host: " + Host + "\r\n";
 request += "Accept: */*\r\n";
 request += "Accept-Language: en-us\r\n";
 request += "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0\r\n";
 request += "Connection: close\r\n";
 request += "\r\n";
 if(send(Sock, request.c_str(), request.length(), 0) == SOCKET_ERROR)
 {
 return false;
 }
 return loop_recieve();
}
int main()
{
 // Init Winsock So We Can Use Sockets
 if(!InitWinsock())
 {
 return(0);
 }
 HTTPRequest Http("google.com", 80);
 // Make Our Request
 if(Http.getWebPage("/"))
 {
 std::cout << Http.get_response();
 }
 WSACleanup();
 std::cin.get();
 return(0);
}

Question 3

Any particular reason for not using a class?

A class would be the "default" way of implementing what you have here. The way it is right now, it seems like you've transcribed a C implementation into C++.

HTTPRequest could be a class, with apparently only getWebPage() as public, since that's all your main calls (probably also postWebPage?). InitWinsock would then be done inside the class constructor.

By the way, a well written program doesn't leak resources, so you should also call WSACleanup at some point, either in a destructor or before exiting main.

`string.find` doesn't return -1

if (HTTP_s->Host.find("http://") != -1) {

It returns std::string::npos if the substring is not found. Comparing against -1 is not guaranteed to be portable, not even across different versions of the same compiler.

Avoid mixing `printf` and `cout`

Keep the code uniform. Avoid mixing both printing models, but also, std::cout and the standard C++ streams are more robust than printf and the C IO library.

Format strings tend to be more readable and there is some value in separating the data from the presentation, but unfortunately, they are a C mechanism that is incompatible with C++'s object model. So you cannot, for instance, pass a C++ class type to one of those functions.

Format strings are also not type safe, since you wave compile-time type inference and rely on manually matching the type of each parameter with the %s in the string's body. In the end, that's a recipe for disaster and these issues outweigh any gains in readability. Having to manually specify the type in the format, when the compiler always knows the type of the variable passed in, is also a violation of the DRY principle.

Send error messages to the error stream

There's a standard stream meant for printing errors, stderr, or in C++ land, std::cerr. Don't send both errors and normal output to stdout. If you do so, users can't easily filter error output from normal program output.

Keep naming uniform

Watch out for names that break uniformity. Use either camelCase or MultiCase for functions/methods, but don't mix both. Users of your code will waste time with typos when trying to remember the names of functions if the naming is not consistent.

`system("pause")` is haphazard

No serious piece of code should use it. Not only it is Windows-specific, but it also presents a major security hole. You're basically trusting in an external program which might be harmful (system will happily attempt to exec any program you ask it to).

You may omit `return 0` at the end of `main`

main is a special function for the compiler, and so it has this additional rule that you don't have to explicitly return 0 at the end. If the program reaches the end of main without encountering a return statement, it defaults to 0, so it is quite common practice and even recommended to just leave out return 0 at the end.

Also, this is just me, but I see no point in writing return as if it was a function:

return(outBuf);

Return is a statement, built into the language, not a function. Also, if you just put a space after the keyword, you don't have to type the two ( ), so it's one less char per return. Think about that for a second ;).

Question 4

InitWinsock() shouldn't be done inside the class constructor - it would be called every time a new HTTP class is created which is unnecessary.

Question 5

@user45891 Well, the documentation states that "An application can call WSAStartup more than once ..." (near the end of the page), so its not like it would be re-initializing the library every time. But having a static method called once at the start of main isn't bad either. My suggestion was towards a simpler interface, that's all.

Question 6

True however a little below that it states An application must call the WSACleanup function for every successful time the WSAStartup function is called... (Admittedly I also just learned that)

user45891 user45891 6306 silver badges10 bronze badges · Answer 1 · 2015-09-12 11:56:40Z

As glampert pointed out this is a very good example for a class. I took the liberty to start converting it into one (I did not review the POST function at all so it also doesn't appear in that class).

The first step was adding an constructor, which takes the Host & Port arguments needed to make an connection. I left out the Path component so that if you ever want to send multiple requests over the same connection you can do that.
That meant that your original getWebPage() was a member of that new class and has only one argument: Path.
Your CreateAddrInfoStruct() and loopRecieve() functions also will ever only be used in this class, so I also put them into it.
That meant I could make the socket a private member, and remove the sockaddr_in* argument from CreateAddrInfoStruct(). As loopReciev()e also was a member function I removed the return value and made Response also a member.

One of the fundamental rules of coding is: Keep it simple, stupid.
If you do not have a feature yet (In your case e.g. the SSL part) don't add any code to handle that.

CreateAddrInfoStruct() resolved the domain name but if the domain used Round-Robin DNS, it only looked at the first result.
So the connect() from getWebPage() was moved into that function with a nice little loop around it.

As tempting it is to do all debugging via console output, once you reuse that library you need to remove it anyway. If you really think you need console output, print to std::cerr.

The final product is:

#include <winsock2.h>
#include <ws2tcpip.h>
#include <iostream>
#include <string>
#include <vector>
#pragma comment (lib, "Ws2_32.lib")
class HTTPRequest
{
private:
 const std::string Host;
 const short Port;
 SOCKET Sock;
 std::string Response;
 bool loop_recieve();
 bool resolve_and_connect();
public:
 HTTPRequest::HTTPRequest(const std::string& host, const short port)
 : Host(host), Port(port)
 { 
 if((Sock = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
 {
 throw std::exception("Couldn't create socket");
 }
 }
 HTTPRequest::~HTTPRequest()
 {
 closesocket(Sock);
 }
 std::string get_response()
 {
 return Response;
 }
 bool get_request(const std::string& path);
};
bool InitWinsock()
{
 WSADATA WsaDat;
 if(WSAStartup(MAKEWORD(2, 2), &WsaDat) != 0)
 {
 return false;
 }
 return true;
}
/*
Recieve Everything Including Headers
(Note: Sometimes A Server Will Reply Without A 'Content-Length' Property)
*/
bool HTTPRequest::loop_recieve()
{
 while(true)
 {
 char recvBuf[256];
 auto nret = recv(Sock, recvBuf, sizeof(recvBuf), 0);
 if(nret == -1)
 {
 return false;
 }
 else if(nret == 0)
 {
 break;
 }
 // Append Newly Recieved Bytes To String
 Response.append(recvBuf, nret);
 }
 return true;
}
/*
Create The Struct We Need For Connection
*/
bool HTTPRequest::resolve_and_connect()
{
 bool ret = false;
 ADDRINFO hints = {0};
 hints.ai_flags = AI_ALL;
 hints.ai_family = PF_INET;
 hints.ai_protocol = IPPROTO_IPV4;
 ADDRINFO *pResult = nullptr;
 if(getaddrinfo(Host.c_str(), nullptr, &hints, &pResult))
 {
 return false;
 }
 sockaddr_in servAddr = {0};
 servAddr.sin_family = AF_INET; //! todo: IPv6 support
 servAddr.sin_port = htons(Port);
 for(size_t i = 0; i < pResult->ai_addrlen; i++)
 {
 servAddr.sin_addr.S_un.S_addr = (ULONG)((sockaddr_in*)&pResult->ai_addr[i])->sin_addr.S_un.S_addr;
 if(connect(Sock, (sockaddr*)&servAddr, sizeof(servAddr)) != SOCKET_ERROR)
 {
 ret = true;
 break;
 }
 }
 freeaddrinfo(pResult);
 return ret;
}
/*
Basic GET
 \todo Check whether path is legitmate
*/
bool HTTPRequest::get_request(const std::string& path)
{
 if(!resolve_and_connect())
 return false;
 std::string request = "GET " + path + " HTTP/1.1" + "\r\n";
 request += "Host: " + Host + "\r\n";
 request += "Accept: */*\r\n";
 request += "Accept-Language: en-us\r\n";
 request += "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0\r\n";
 request += "Connection: close\r\n";
 request += "\r\n";
 if(send(Sock, request.c_str(), request.length(), 0) == SOCKET_ERROR)
 {
 return false;
 }
 return loop_recieve();
}
int main()
{
 // Init Winsock So We Can Use Sockets
 if(!InitWinsock())
 {
 return(0);
 }
 HTTPRequest Http("google.com", 80);
 // Make Our Request
 if(Http.getWebPage("/"))
 {
 std::cout << Http.get_response();
 }
 WSACleanup();
 std::cin.get();
 return(0);
}

glampert glampert 17.3k4 gold badges31 silver badges89 bronze badges · Answer 2 · 2015-09-12 01:56:16Z

Any particular reason for not using a class?

A class would be the "default" way of implementing what you have here. The way it is right now, it seems like you've transcribed a C implementation into C++.

HTTPRequest could be a class, with apparently only getWebPage() as public, since that's all your main calls (probably also postWebPage?). InitWinsock would then be done inside the class constructor.

By the way, a well written program doesn't leak resources, so you should also call WSACleanup at some point, either in a destructor or before exiting main.

`string.find` doesn't return -1

if (HTTP_s->Host.find("http://") != -1) {

It returns std::string::npos if the substring is not found. Comparing against -1 is not guaranteed to be portable, not even across different versions of the same compiler.

Avoid mixing `printf` and `cout`

Keep the code uniform. Avoid mixing both printing models, but also, std::cout and the standard C++ streams are more robust than printf and the C IO library.

Format strings tend to be more readable and there is some value in separating the data from the presentation, but unfortunately, they are a C mechanism that is incompatible with C++'s object model. So you cannot, for instance, pass a C++ class type to one of those functions.

Format strings are also not type safe, since you wave compile-time type inference and rely on manually matching the type of each parameter with the %s in the string's body. In the end, that's a recipe for disaster and these issues outweigh any gains in readability. Having to manually specify the type in the format, when the compiler always knows the type of the variable passed in, is also a violation of the DRY principle.

Send error messages to the error stream

There's a standard stream meant for printing errors, stderr, or in C++ land, std::cerr. Don't send both errors and normal output to stdout. If you do so, users can't easily filter error output from normal program output.

Keep naming uniform

Watch out for names that break uniformity. Use either camelCase or MultiCase for functions/methods, but don't mix both. Users of your code will waste time with typos when trying to remember the names of functions if the naming is not consistent.

`system("pause")` is haphazard

No serious piece of code should use it. Not only it is Windows-specific, but it also presents a major security hole. You're basically trusting in an external program which might be harmful (system will happily attempt to exec any program you ask it to).

You may omit `return 0` at the end of `main`

main is a special function for the compiler, and so it has this additional rule that you don't have to explicitly return 0 at the end. If the program reaches the end of main without encountering a return statement, it defaults to 0, so it is quite common practice and even recommended to just leave out return 0 at the end.

Also, this is just me, but I see no point in writing return as if it was a function:

return(outBuf);

Return is a statement, built into the language, not a function. Also, if you just put a space after the keyword, you don't have to type the two ( ), so it's one less char per return. Think about that for a second ;).

InitWinsock() shouldn't be done inside the class constructor - it would be called every time a new HTTP class is created which is unnecessary.
@user45891 Well, the documentation states that "An application can call WSAStartup more than once ..." (near the end of the page), so its not like it would be re-initializing the library every time. But having a static method called once at the start of main isn't bad either. My suggestion was towards a simpler interface, that's all.
True however a little below that it states An application must call the WSACleanup function for every successful time the WSAStartup function is called... (Admittedly I also just learned that)

Stack Exchange Network

C++ Winsock HTTP Request Library

2 Answers 2

Any particular reason for not using a class?

`string.find` doesn't return -1

Avoid mixing `printf` and `cout`

Send error messages to the error stream

Keep naming uniform

`system("pause")` is haphazard

You may omit `return 0` at the end of `main`

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

C++ Winsock HTTP Request Library

2 Answers 2

Any particular reason for not using a class?

string.find doesn't return -1

Avoid mixing printf and cout

Send error messages to the error stream

Keep naming uniform

system("pause") is haphazard

You may omit return 0 at the end of main

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

`string.find` doesn't return -1

Avoid mixing `printf` and `cout`

`system("pause")` is haphazard

You may omit `return 0` at the end of `main`