I am a C# developer and have never used C++ or Visual C++. For various reasons I have decided to use a C++ DLL in my app for downloading content from the web.
I don't want to use WebClient.
I want it to download HTML content given the URL of the resource; the DLL will return the response as a string.
This is the code I have so far (source):
#include <iostream>
#include <string>
#include <stdlib.h>
#include <winsock.h>//dont forget to add wsock32.lib to linker dependencies
using namespace std;
#define BUFFERSIZE 1024
void die_with_error(char *errorMessage);
void die_with_wserror(char *errorMessage);
int main(int argc, char *argv[])
{
string request;
string response;
int resp_leng;
char buffer[BUFFERSIZE];
struct sockaddr_in serveraddr;
int sock;
WSADATA wsaData;
const char *ipaddress = "208.109.181.178"; //string literals are const in C++
int port = 80;
request+="GET /test.html HTTP/1.0\r\n";
request+="Host: www.zedwood.com\r\n";
request+="\r\n";
//init winsock
if (WSAStartup(MAKEWORD(2, 0), &wsaData) != 0)
die_with_wserror("WSAStartup() failed");
//open socket
if ((sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0)
die_with_wserror("socket() failed");
//connect
memset(&serveraddr, 0, sizeof(serveraddr));
serveraddr.sin_family = AF_INET;
serveraddr.sin_addr.s_addr = inet_addr(ipaddress);
serveraddr.sin_port = htons((unsigned short) port);
if (connect(sock, (struct sockaddr *) &serveraddr, sizeof(serveraddr)) < 0)
die_with_wserror("connect() failed");
//send request
if (send(sock, request.c_str(), request.length(), 0) != request.length())
die_with_wserror("send() sent a different number of bytes than expected");
//get response
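response = "";
//keep reading until the server closes the connection (recv returns 0);
//a short read only means the rest of the data has not arrived yet
while ((resp_leng = recv(sock, buffer, BUFFERSIZE, 0)) > 0)
{
response.append(buffer, resp_leng); //buffer is not null-terminated, so append by length
}
if (resp_leng < 0)
die_with_wserror("recv() failed");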
//display response
cout << response << endl;
//disconnect
closesocket(sock);
//cleanup
WSACleanup();
return 0;
}
void die_with_error(char *errorMessage)
{
cerr << errorMessage << endl;
exit(1);
}
void die_with_wserror(char *errorMessage)
{
cerr << errorMessage << ": " << WSAGetLastError() << endl;
exit(1);
}
The instructions that came with the code say that it will end an HTTP download 'early' if the download is anything slower than instant.
Please suggest better code, or modifications, that would make it suitable for use in a fast crawler.
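The early termination mentioned above comes from treating a partially filled buffer as the end of the response. A minimal, portable sketch of the usual alternative follows: keep reading until the source reports that the connection is closed. `RecvFn` and `read_all` are hypothetical names used only for illustration; `RecvFn` stands in for a call such as `recv(sock, buf, len, 0)` so that the loop can be tried without a real socket.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for recv(): fills buf with up to len bytes and returns
// the byte count, 0 once the peer has closed the connection, or a negative
// value on error.
using RecvFn = std::function<int(char *buf, int len)>;

// Reads until the source reports end-of-stream (0) or an error (< 0).
// A short read is NOT treated as the end of the response.
inline bool read_all(const RecvFn &recv_some, std::string &out)
{
    char buffer[1024];
    for (;;) {
        const int n = recv_some(buffer, static_cast<int>(sizeof buffer));
        if (n < 0) return false;  // read error
        if (n == 0) return true;  // orderly close: everything has been received
        // Append by length: the buffer is not null-terminated and may hold '\0' bytes.
        out.append(buffer, static_cast<std::size_t>(n));
    }
}
```

With Winsock, the callable could simply wrap `recv` on the connected socket. This relies on the server closing the connection after the response, which is what an HTTP/1.0 request without keep-alive normally leads to; it does not cover persistent connections.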
Comment: Your implementation is completely broken with respect to HTTP compliance. You might be lucky if it works in more than one case. Implementing HTTP is not trivial. Use WebClient or any other well-tested HTTP library. Furthermore, your performance will not magically increase by switching from C# to C++. (dtb, Jul 27, 2011 at 4:36)
1 Answer
The above implementation has little to no chance of working in general. The HTTP protocol is quite complex, and the code handles none of that complexity.
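To give a rough idea of what is missing, here is a minimal sketch (not a compliant HTTP implementation) of only the first step a client has to take after receiving the bytes: separating the status code, the headers, and the body. All names in it (`HttpResponse`, `parse_http_response`, `is_chunked`, and the small helpers) are hypothetical and exist only for this illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

struct HttpResponse {
    int status = 0;
    std::map<std::string, std::string> headers; // header names stored lower-cased
    std::string body;                           // raw bytes after the blank line
};

inline std::string to_lower(std::string s)
{
    for (char &c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

inline std::string trim(const std::string &s)
{
    const std::size_t first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Splits a complete raw response into status code, headers, and body.
// Returns false if the text does not look like an HTTP response.
inline bool parse_http_response(const std::string &raw, HttpResponse &out)
{
    const std::size_t head_end = raw.find("\r\n\r\n"); // blank line ends the headers
    if (head_end == std::string::npos) return false;
    const std::size_t line_end = raw.find("\r\n");     // end of the status line
    const std::size_t sp = raw.find(' ');
    if (raw.compare(0, 5, "HTTP/") != 0 || sp == std::string::npos || sp + 4 > line_end)
        return false;
    int status = 0;                                    // three digits after the first space
    for (std::size_t i = sp + 1; i < sp + 4; ++i) {
        if (!std::isdigit(static_cast<unsigned char>(raw[i]))) return false;
        status = status * 10 + (raw[i] - '0');
    }
    out.status = status;
    std::size_t pos = line_end + 2;
    while (pos < head_end + 2) {                       // one "Name: value" per line
        const std::size_t eol = raw.find("\r\n", pos);
        const std::string line = raw.substr(pos, eol - pos);
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) return false;
        out.headers[to_lower(line.substr(0, colon))] = trim(line.substr(colon + 1));
        pos = eol + 2;
    }
    out.body = raw.substr(head_end + 4);
    return true;
}

// True when the body still has to be decoded chunk by chunk before it is usable.
inline bool is_chunked(const HttpResponse &r)
{
    const auto it = r.headers.find("transfer-encoding");
    return it != r.headers.end() && to_lower(it->second).find("chunked") != std::string::npos;
}
```

Even with this in place, the body may still be chunked or compressed, the status may be a redirect that has to be followed, and the text may be in an encoding other than the one the caller expects. A tested HTTP library already handles these cases.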
My advice is: don't waste time and energy on this. Use a high-level library.
Also, using low-level code will NOT improve the performance of fetching a document via HTTP. Even if your implementation were blazing-fast, 100% polished assembly code, you'd still have to wait centuries (from the CPU's point of view) for the data to arrive from the network.
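A back-of-the-envelope calculation makes the scale concrete. The figures used here (a 3 GHz clock and a 50 ms wait for the network) are assumptions chosen only for illustration, and `cycles_during_wait` is a hypothetical helper.

```cpp
#include <cstdint>

// Rough number of CPU clock cycles that pass while waiting wait_ms milliseconds.
constexpr std::uint64_t cycles_during_wait(std::uint64_t clock_hz, std::uint64_t wait_ms)
{
    return clock_hz / 1000 * wait_ms; // cycles per millisecond times milliseconds
}

// Assumed figures: 3 GHz clock, 50 ms network wait -> 150 million cycles.
static_assert(cycles_during_wait(3000000000ULL, 50) == 150000000ULL,
              "3 GHz for 50 ms is 150 million cycles");
```

Under those assumptions the CPU sits through about 150 million cycles for one request, so shaving a few thousand cycles off the request-handling code makes no measurable difference.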