I'm trying to write a HTTP Request library for practice with Sockets and C++ in general. I created a seperate function to setup the struct needed to Connect() however it is now breaking and I cant seem to figure out why even with a debugger. Also looking for any code optimization I can be doing better.
#include <winsock2.h>
#include <ws2tcpip.h>
#include <iostream>
#include <string>
#include <vector>
#pragma comment (lib, "Ws2_32.lib")
/*
Struct For Holding Data Pre/Post HTTP Request
*/
struct HTTPRequest_s
{
std::string Host; // Host
std::string Path; // Path
short Port; // Port
short HTTPCode; // 404, 301..etc
std::string Response; // Holds Data We Get Back
std::vector<std::pair<std::string, std::string>> PostParams; // Holds Any Data We POST (If Applicable)
};
bool InitWinsock()
{
WSADATA WsaDat;
if (WSAStartup(MAKEWORD(2, 2), &WsaDat) != 0)
{
std::cout << "Failed To Start Winsocket With WSAStartup()\n";
WSACleanup();
return(false);
}
return(true);
}
/*
Recieve Everything Including Headers
(Note: Sometimes A Server Will Reply Without A 'Content-Length' Property)
*/
std::string loopRecieve(SOCKET Sock)
{
char recvBuf[256]; // For Transporting Data
std::string outBuf; // Output String
unsigned int nret;
while (true)
{
nret = recv(Sock, recvBuf, sizeof(recvBuf), 0);
if (nret == -1)
{
printf("A Socket Error Occured(-1)\n");
break;
}
else if (nret == 0)
{
printf("Done Reading!\n");
break;
}
// Append Newly Recieved Bytes To String
outBuf.append(recvBuf, nret);
}
return(outBuf);
}
/*
Create The Struct We Need For Connection
*/
void CreateAddrInfoStruct(HTTPRequest_s *HTTP_s, sockaddr_in *servAddr)
{
int nret;
ADDRINFO *pResult = NULL,
hints;
// Strip Out 'http://' && 'https://'
if (HTTP_s->Host.find("http://") != -1) {
HTTP_s->Host = HTTP_s->Host.substr(HTTP_s->Host.find("http://") + 7);
}
else if (HTTP_s->Host.find("https://") != -1) {
printf("SSL Not Supported Yet\n");
return;
}
hints = { 0 };
hints.ai_flags = AI_ALL;
hints.ai_family = PF_INET;
hints.ai_protocol = IPPROTO_IPV4;
nret = getaddrinfo(HTTP_s->Host.c_str(), nullptr, &hints, &pResult);
if (nret != 0)
{
std::cout << "Failed To Do GetAddrInfo()\n";
return;
}
servAddr = { 0 };
servAddr->sin_family = AF_INET;
servAddr->sin_addr.S_un.S_addr = *((ULONG*)&(((sockaddr_in*)pResult->ai_addr)->sin_addr));
servAddr->sin_port = htons(HTTP_s->Port);
}
/*
Basic GET
*/
void getWebPage(HTTPRequest_s *HTTP_s)
{
int nret;
SOCKET theSocket;
sockaddr_in servAddr;
// Create Our Struct
CreateAddrInfoStruct(HTTP_s, &servAddr);
theSocket = socket(AF_INET, SOCK_STREAM, 0);
if (theSocket == INVALID_SOCKET)
{
std::cout << "Socket Is Invalid, Is Winsock Initialized?\n";
return;
}
nret = connect(theSocket, (sockaddr*)&servAddr, sizeof(servAddr));
if (nret == SOCKET_ERROR)
{
std::cout << "Failed To Connect To Host\n";
return;
}
std::string request;
request = "GET " + HTTP_s->Path + " HTTP/1.1" + "\r\n";
request += "Host: " + HTTP_s->Host + "\r\n";
request += "Accept: */*\r\n";
request += "Accept-Language: en-us\r\n";
request += "Connection: close\r\n";
request += "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0\r\n";
request += "Referer: http://" + HTTP_s->Host + "\r\n";
request += "\r\n";
nret = send(theSocket, request.c_str(), request.length(), 0);
if (nret == SOCKET_ERROR)
{
std::cout << "Failed To Send To Host\n";
return;
}
HTTP_s->Response = loopRecieve(theSocket);
closesocket(theSocket);
}
/*
Basic POST
*/
void postWebPage(HTTPRequest_s *HTTP_s)
{
int nret;
SOCKET theSocket;
sockaddr_in servAddr;
// Create Our Struct
CreateAddrInfoStruct(HTTP_s, &servAddr);
theSocket = socket(AF_INET, SOCK_STREAM, 0);
if (theSocket == INVALID_SOCKET)
{
std::cout << "Socket Is Invalid, Is Winsock Initialized?\n";
return;
}
nret = connect(theSocket, (sockaddr*)&servAddr, sizeof(servAddr));
if (nret == SOCKET_ERROR)
{
std::cout << "Failed To Connect To Host\n";
return;
}
// Structure POST Data Properly
std::string concatPostData;
for (auto i : HTTP_s->PostParams)
{
concatPostData += i.first + "=" + i.second + "&";
}
concatPostData.pop_back(); // Pop Off Extra '&' At The End Of Loop (Note: Might Not Need To Pop This Off)
// Construct HEADER
std::string header;
header = "POST " + HTTP_s->Path + " HTTP/1.1\r\n";
header += "Host: " + HTTP_s->Host + ":" + std::to_string(HTTP_s->Port) + "\r\n";
header += "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0\r\n";
header += "Referer: http://" + HTTP_s->Host + "\r\n";
header += "Content-Type: application/x-www-form-urlencoded\r\n";
header += "Content-Length: " + std::to_string(concatPostData.length()) + "\r\n";
header += "Accept-Charset: utf-8\r\n";
header += "Connection: close\r\n\r\n";
header += concatPostData + "\r\n";
header += "\r\n";
// Need To Check Here That We Sent All Data
nret = send(theSocket, header.c_str(), header.length(), 0);
if (nret == SOCKET_ERROR)
{
std::cout << "Failed To Send To Host\n";
return;
}
HTTP_s->Response = loopRecieve(theSocket);
closesocket(theSocket);
}
int main()
{
// Init Winsock So We Can Use Sockets
if (InitWinsock() != true)
{
printf("Failed To Init Winsock!\n");
return(0);
}
// Setup HTTPRequest Structure
HTTPRequest_s Http_s;
Http_s.Host = "http://google.com";
Http_s.Path = "/";
Http_s.Port = 80;
// Make Our Request
getWebPage(&Http_s);
// Print The Data From Our Request Result
printf("Data Dump:\n%s", Http_s.Response.c_str());
system("pause");
return(0);
}
2 Answers 2
As glampert pointed out this is a very good example for a class. I took the liberty to start converting it into one (I did not review the POST function at all so it also doesn't appear in that class).
The first step was adding an constructor, which takes the Host
& Port
arguments needed to make an connection. I left out the Path
component so that if you ever want to send multiple requests over the same connection you can do that.
That meant that your original getWebPage()
was a member of that new class and has only one argument: Path
.
Your CreateAddrInfoStruct()
and loopRecieve()
functions also will ever only be used in this class, so I also put them into it.
That meant I could make the socket a private member, and remove the sockaddr_in*
argument from CreateAddrInfoStruct()
. As loopReciev()e
also was a member function I removed the return value and made Response
also a member.
One of the fundamental rules of coding is: Keep it simple, stupid.
If you do not have a feature yet (In your case e.g. the SSL part) don't add any code to handle that.
CreateAddrInfoStruct()
resolved the domain name but if the domain used Round-Robin DNS, it only looked at the first result.
So the connect()
from getWebPage()
was moved into that function with a nice little loop around it.
As tempting it is to do all debugging via console output, once you reuse that library you need to remove it anyway. If you really think you need console output, print to std::cerr.
The final product is:
#include <winsock2.h>
#include <ws2tcpip.h>
#include <iostream>
#include <string>
#include <vector>
#pragma comment (lib, "Ws2_32.lib")
class HTTPRequest
{
private:
const std::string Host;
const short Port;
SOCKET Sock;
std::string Response;
bool loop_recieve();
bool resolve_and_connect();
public:
HTTPRequest::HTTPRequest(const std::string& host, const short port)
: Host(host), Port(port)
{
if((Sock = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
{
throw std::exception("Couldn't create socket");
}
}
HTTPRequest::~HTTPRequest()
{
closesocket(Sock);
}
std::string get_response()
{
return Response;
}
bool get_request(const std::string& path);
};
bool InitWinsock()
{
WSADATA WsaDat;
if(WSAStartup(MAKEWORD(2, 2), &WsaDat) != 0)
{
return false;
}
return true;
}
/*
Recieve Everything Including Headers
(Note: Sometimes A Server Will Reply Without A 'Content-Length' Property)
*/
bool HTTPRequest::loop_recieve()
{
while(true)
{
char recvBuf[256];
auto nret = recv(Sock, recvBuf, sizeof(recvBuf), 0);
if(nret == -1)
{
return false;
}
else if(nret == 0)
{
break;
}
// Append Newly Recieved Bytes To String
Response.append(recvBuf, nret);
}
return true;
}
/*
Create The Struct We Need For Connection
*/
bool HTTPRequest::resolve_and_connect()
{
bool ret = false;
ADDRINFO hints = {0};
hints.ai_flags = AI_ALL;
hints.ai_family = PF_INET;
hints.ai_protocol = IPPROTO_IPV4;
ADDRINFO *pResult = nullptr;
if(getaddrinfo(Host.c_str(), nullptr, &hints, &pResult))
{
return false;
}
sockaddr_in servAddr = {0};
servAddr.sin_family = AF_INET; //! todo: IPv6 support
servAddr.sin_port = htons(Port);
for(size_t i = 0; i < pResult->ai_addrlen; i++)
{
servAddr.sin_addr.S_un.S_addr = (ULONG)((sockaddr_in*)&pResult->ai_addr[i])->sin_addr.S_un.S_addr;
if(connect(Sock, (sockaddr*)&servAddr, sizeof(servAddr)) != SOCKET_ERROR)
{
ret = true;
break;
}
}
freeaddrinfo(pResult);
return ret;
}
/*
Basic GET
\todo Check whether path is legitmate
*/
bool HTTPRequest::get_request(const std::string& path)
{
if(!resolve_and_connect())
return false;
std::string request = "GET " + path + " HTTP/1.1" + "\r\n";
request += "Host: " + Host + "\r\n";
request += "Accept: */*\r\n";
request += "Accept-Language: en-us\r\n";
request += "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0\r\n";
request += "Connection: close\r\n";
request += "\r\n";
if(send(Sock, request.c_str(), request.length(), 0) == SOCKET_ERROR)
{
return false;
}
return loop_recieve();
}
int main()
{
// Init Winsock So We Can Use Sockets
if(!InitWinsock())
{
return(0);
}
HTTPRequest Http("google.com", 80);
// Make Our Request
if(Http.getWebPage("/"))
{
std::cout << Http.get_response();
}
WSACleanup();
std::cin.get();
return(0);
}
Any particular reason for not using a class?
A class would be the "default" way of implementing what you have here. The way it is right now, it seems like you've transcribed a C implementation into C++.
HTTPRequest
could be a class, with apparently only getWebPage()
as public, since that's all your main
calls (probably also postWebPage
?). InitWinsock
would then be done inside the class constructor.
By the way, a well written program doesn't leak resources, so you should also call WSACleanup
at some point, either in a destructor or before exiting main
.
string.find
doesn't return -1
if (HTTP_s->Host.find("http://") != -1) {
It returns std::string::npos
if the substring is not found. Comparing against -1 is not guaranteed to be portable, not even across different versions of the same compiler.
Avoid mixing printf
and cout
Keep the code uniform. Avoid mixing both printing models, but also, std::cout
and the standard C++ streams are more robust than printf
and the C IO library.
Format strings tend to be more readable and there is some value in separating the data from the presentation, but unfortunately, they are a C mechanism that is incompatible with C++'s object model. So you cannot, for instance, pass a C++ class type to one of those functions.
Format strings are also not type safe, since you wave compile-time type inference and rely on manually matching the type of each parameter with the %
s in the string's body. In the end, that's a recipe for disaster and these issues outweigh any gains in readability. Having to manually specify the type in the format, when the compiler always knows the type of the variable passed in, is also a violation of the DRY principle.
Send error messages to the error stream
There's a standard stream meant for printing errors, stderr
, or in C++ land, std::cerr
. Don't send both errors and normal output to stdout
. If you do so, users can't easily filter error output from normal program output.
Keep naming uniform
Watch out for names that break uniformity. Use either camelCase
or MultiCase
for functions/methods, but don't mix both. Users of your code will waste time with typos when trying to remember the names of functions if the naming is not consistent.
system("pause")
is haphazard
No serious piece of code should use it. Not only it is Windows-specific, but it also presents a major security hole. You're basically trusting in an external program which might be harmful (system
will happily attempt to exec any program you ask it to).
You may omit return 0
at the end of main
main
is a special function for the compiler, and so it has this additional rule that you don't have to explicitly return 0
at the end. If the program reaches the end of main without encountering a return statement, it defaults to 0
, so it is quite common practice and even recommended to just leave out return 0
at the end.
Also, this is just me, but I see no point in writing return
as if it was a function:
return(outBuf);
Return is a statement, built into the language, not a function. Also, if you just put a space after the keyword, you don't have to type the two ( )
, so it's one less char per return. Think about that for a second ;)
.
-
\$\begingroup\$
InitWinsock()
shouldn't be done inside the class constructor - it would be called every time a new HTTP class is created which is unnecessary. \$\endgroup\$user45891– user458912015年09月12日 11:58:02 +00:00Commented Sep 12, 2015 at 11:58 -
\$\begingroup\$ @user45891 Well, the documentation states that "An application can call WSAStartup more than once ..." (near the end of the page), so its not like it would be re-initializing the library every time. But having a static method called once at the start of main isn't bad either. My suggestion was towards a simpler interface, that's all. \$\endgroup\$glampert– glampert2015年09月12日 17:38:08 +00:00Commented Sep 12, 2015 at 17:38
-
\$\begingroup\$ True however a little below that it states
An application must call the WSACleanup function for every successful time the WSAStartup function is called...
(Admittedly I also just learned that) \$\endgroup\$user45891– user458912015年09月12日 18:40:27 +00:00Commented Sep 12, 2015 at 18:40