I created this function to recursively copy an entire directory from an FTP server. It works just fine, except that it is about four times slower than FileZilla doing the same operation: FileZilla downloads the directory in approximately 55 seconds, while this function takes 229 seconds. What can I do to make it download/run faster?
Private Sub CopyEntireDirectory(ByVal directory As String)
    Dim localPath = localDirectory & formatPath(directory)
    'Creates the directory in the destination path
    IO.Directory.CreateDirectory(localPath)
    'Gets the directory details so folders can be separated from files
    Dim fileList As ArrayList = Ftp.ListDirectoryDetails(directory, "")
    For Each item In fileList
        'Checks whether it's a folder or a file: "d" = folder
        If (item.ToString().StartsWith("d")) Then
            'Gets the directory name from the details
            Dim subDirectory As String = item.ToString().Substring(item.ToString().LastIndexOf(" ") + 1)
            CopyEntireDirectory(directory & "/" & subDirectory)
        Else
            Dim remoteFilePath As String = directory & "/" & item.ToString().Substring(item.ToString().LastIndexOf(" ") + 1)
            Dim destinationPath = localPath & "\" & item.ToString().Substring(item.ToString().LastIndexOf(" ") + 1)
            'Downloads the file to the destination directory
            Ftp.DownLoadFile(remoteFilePath, destinationPath)
        End If
    Next
End Sub
This function below is what is actually taking up 97% of the time.
Public Sub DownLoadFile(ByVal fromFilename As String, ByVal toFilename As String)
    Dim files As ArrayList = Me.ListDirectory(fromFilename, "")
    Dim request As FtpWebRequest = Me.CreateRequestObject(fromFilename)
    request.Method = WebRequestMethods.Ftp.DownloadFile
    Dim response As FtpWebResponse = CType(request.GetResponse(), FtpWebResponse)
    If response.StatusCode <> FtpStatusCode.OpeningData AndAlso response.StatusCode <> FtpStatusCode.DataAlreadyOpen Then
        Throw New ApplicationException(Me.BuildCustomFtpErrorMessage(request, response))
    End If
    Dim fromFilenameStream As Stream = response.GetResponseStream()
    Dim toFilenameStream As FileStream = File.Create(toFilename)
    Dim buffer(BLOCK_SIZE) As Byte
    Dim bytesRead As Integer = fromFilenameStream.Read(buffer, 0, buffer.Length)
    Do While bytesRead > 0
        toFilenameStream.Write(buffer, 0, bytesRead)
        Array.Clear(buffer, 0, buffer.Length)
        bytesRead = fromFilenameStream.Read(buffer, 0, buffer.Length)
    Loop
    response.Close()
    fromFilenameStream.Close()
    toFilenameStream.Close()
End Sub
2 Answers
[DownLoadFile] is actually taking up 97% of the time.
Thank you for that profiling observation.
Single responsibility

It would be helpful to lift the Me.ListDirectory(fromFilename, "") call out of the function, so we can focus on the speed of the download loop. It is a somewhat round-trippy call, costing at least one RTT and letting the network channel go idle.
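A minimal sketch of that change; note that the posted code never even reads the files variable the call assigns, so nothing else needs to move:

'Sketch only: the listing call is gone, so the function does one job.
Public Sub DownLoadFile(ByVal fromFilename As String, ByVal toFilename As String)
    Dim request As FtpWebRequest = Me.CreateRequestObject(fromFilename)
    request.Method = WebRequestMethods.Ftp.DownloadFile
    '... response check and copy loop exactly as before ...
End Sub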
Let's assume the files you're downloading are big, so that query's cost is negligible, which leaves us with the download loop.

Download loop
There's not much going on here. Do you really need to Clear() the buffer? I doubt it: Write() only ever touches the first bytesRead bytes, so stale data from a previous pass can never reach the output file. Save a few cycles by omitting the call, though its cost is likely insignificant. Now we just have
Do While bytesRead > 0
    toFilenameStream.Write(buffer, 0, bytesRead)
    bytesRead = fromFilenameStream.Read(buffer, 0, buffer.Length)
Loop
You didn't tell us how big that buffer is; likely it is "too small". Increase it to as big as you can manage, or at a minimum to a megabyte.
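For instance, a one-megabyte buffer might look like this (a sketch; the posted code doesn't show BLOCK_SIZE's current value):

'Assumption: BLOCK_SIZE is the constant DownLoadFile sizes its buffer with.
'VB.NET array bounds are inclusive, so Dim buffer(N) allocates N + 1 bytes;
'subtract 1 to get exactly one megabyte.
Const BLOCK_SIZE As Integer = 1024 * 1024
Dim buffer(BLOCK_SIZE - 1) As Byte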
You also didn't tell us how far away the file server is, so I will assume a WAN speed-of-light delay of 100 msec round-trip ping time. Each Read() call drains the operating system's TCP buffer; after enough of that, the local TCP asks the distant server for more data. An interpreted loop making lots of little read calls will be slower than making one or a handful of calls that pass in a giant buffer. Over several RTTs this gives the receiving TCP a chance to open up its advertised window, and the sender its congestion window, so that multiple segments are in flight at any given instant. This is how TCP decouples latency from bandwidth, discovering the capacity of the bottleneck router on the end-to-end path.
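To put a rough number on that (the 100 Mbit/s figure below is an assumption, just like the 100 msec RTT): the bandwidth-delay product tells you how much data must be in flight to keep the pipe full.

100 Mbit/s × 0.100 s = 12.5 MB/s × 0.100 s ≈ 1.25 MB in flight

That is why a buffer in the one-megabyte range is a sensible floor for this kind of link.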
You can see how far the window has opened by running tcpdump -w packet.trace -c 1000, or by using Wireshark. The advertised window figures appear in that data, and the timestamps reveal when we stall for 100 msec, waiting an RTT for an ACK. You will also be able to see which TCP options were negotiated; hopefully SACK is enabled.
Once you have some bandwidth measurements when using a giant buffer, you will be in a good position to try repeatedly cutting the buffer length in half, until you notice an impact on download times.
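A sketch of that experiment, assuming you add a DownLoadFile overload that accepts a buffer size (the posted code hard-codes BLOCK_SIZE, so this overload is hypothetical, as are the two path variables):

'Needs Imports System.Diagnostics for Stopwatch.
'Halve the buffer until download times start to suffer.
Dim size As Integer = 16 * 1024 * 1024
Do While size >= 4096
    Dim sw As Stopwatch = Stopwatch.StartNew()
    Ftp.DownLoadFile(remoteFilePath, destinationPath, size) 'hypothetical overload
    sw.Stop()
    Console.WriteLine("{0,12:N0} bytes: {1:N0} ms", size, sw.ElapsedMilliseconds)
    size \= 2
Loop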
Check your buffer/BLOCK_SIZE. If it's too small, the loop makes many more reads and writes, and each one has to wait for the network to respond. I'm not given any information as to what block size you're using, so I can't help any more than that.
Please expand on what is harmful about having a too-small buffer. Is there a method to determine the optimal buffer size? – user34073, Jan 16, 2016 at 2:16
Parallel.ForEach: msdn.microsoft.com/en-us/library/dd460720(v=vs.110).aspx

I tried Parallel.ForEach, but there is a hangup when it uses Me.ListDirectory(fromFilename, ""). Somewhere it doesn't allow Parallel Tasks and is blocking the operation.
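For reference, a parallel version of the download loop in CopyEntireDirectory might look roughly like this. It is a sketch only: ServicePointManager.DefaultConnectionLimit defaults to 2, which would throttle the parallel requests, and your Ftp helper must be safe to call from several threads at once, which the posted code doesn't demonstrate.

'Needs Imports System.Net, System.Linq, and System.Threading.Tasks.
'fileList, directory, and localPath are as in CopyEntireDirectory.
ServicePointManager.DefaultConnectionLimit = 8 'the default of 2 would throttle this
Parallel.ForEach(fileList.Cast(Of Object)().Where(Function(i) Not i.ToString().StartsWith("d")),
                 Sub(item)
                     Dim name As String = item.ToString().Substring(item.ToString().LastIndexOf(" ") + 1)
                     Ftp.DownLoadFile(directory & "/" & name, localPath & "\" & name)
                 End Sub)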