11
\$\begingroup\$

I created this function to recursively copy an entire directory from an FTP server. It works just fine except that it is about 4 times slower than using FileZilla to do the same operation. It takes approximately 55 seconds to download the directory in FileZilla but it takes 229 seconds with this function. What can I do to make it download/run faster?

    Private Sub CopyEntireDirectory(ByVal directory As String)
        Dim localPath = localDirectory & formatPath(directory)
        'Creates the directory in the destination path
        IO.Directory.CreateDirectory(localPath)
        'Gets the directory details so folders can be separated from files
        Dim fileList As ArrayList = Ftp.ListDirectoryDetails(directory, "")
        For Each item In fileList
            'Checks whether it's a folder or a file: d = folder
            If (item.ToString().StartsWith("d")) Then
                'Gets the directory name from the details
                Dim subDirectory As String = item.ToString().Substring(item.ToString().LastIndexOf(" ") + 1)
                CopyEntireDirectory(directory & "/" & subDirectory)
            Else
                Dim fileName As String = item.ToString().Substring(item.ToString().LastIndexOf(" ") + 1)
                Dim remoteFilePath As String = directory & "/" & fileName
                Dim destinationPath = localPath & "\" & fileName
                'Downloads the file to the destination directory
                Ftp.DownLoadFile(remoteFilePath, destinationPath)
            End If
        Next
    End Sub

The function below is what actually takes up 97% of the time.

    Public Sub DownLoadFile(ByVal fromFilename As String, ByVal toFilename As String)
        Dim files As ArrayList = Me.ListDirectory(fromFilename, "")
        Dim request As FtpWebRequest = Me.CreateRequestObject(fromFilename)
        request.Method = WebRequestMethods.Ftp.DownloadFile
        Dim response As FtpWebResponse = CType(request.GetResponse(), FtpWebResponse)
        If response.StatusCode <> FtpStatusCode.OpeningData AndAlso response.StatusCode <> FtpStatusCode.DataAlreadyOpen Then
            Throw New ApplicationException(Me.BuildCustomFtpErrorMessage(request, response))
        End If
        Dim fromFilenameStream As Stream = response.GetResponseStream()
        Dim toFilenameStream As FileStream = File.Create(toFilename)
        Dim buffer(BLOCK_SIZE) As Byte
        Dim bytesRead As Integer = fromFilenameStream.Read(buffer, 0, buffer.Length)
        Do While bytesRead > 0
            toFilenameStream.Write(buffer, 0, bytesRead)
            Array.Clear(buffer, 0, buffer.Length)
            bytesRead = fromFilenameStream.Read(buffer, 0, buffer.Length)
        Loop
        response.Close()
        fromFilenameStream.Close()
        toFilenameStream.Close()
    End Sub
asked Dec 2, 2015 at 20:11
\$\endgroup\$
  • 3
    \$\begingroup\$ FileZilla is probably using threads to download multiple files at the same time. You may want to consider first creating the directory structure and a list of files with full paths to download, then using the Task Parallel Library to split that list in to multiple download jobs. Specifically, Parallel.ForEach msdn.microsoft.com/en-us/library/dd460720(v=vs.110).aspx \$\endgroup\$ Commented Dec 2, 2015 at 20:28
  • \$\begingroup\$ It appears that FileZilla sets up a queue and downloads the files one after another. \$\endgroup\$ Commented Dec 2, 2015 at 20:30
  • \$\begingroup\$ I wouldn't know how this compares speedwise, but I've previously found this useful: stackoverflow.com/a/7535879/1316573 \$\endgroup\$ Commented Dec 4, 2015 at 14:59
  • \$\begingroup\$ @BradleyUffner I am using Parallel.ForEach but there is a hangup when it uses Me.ListDirectory(fromFilename, ""). Somewhere it doesn't allow Parallel Tasks and is blocking the operation. \$\endgroup\$ Commented Dec 14, 2015 at 17:14
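A hedged sketch of the Parallel.ForEach suggestion above, assuming the question's `Ftp` helper and a hypothetical `CollectFiles` walker that creates the local directories and builds the flat job list first (each parallel download must go through its own `FtpWebRequest`, since a request object cannot be shared across threads):

    'Phase 1: walk the remote tree once, sequentially, creating local
    'directories and collecting (remoteFilePath, destinationPath) pairs.
    Dim jobs As New List(Of Tuple(Of String, String))()
    CollectFiles(rootDirectory, jobs) 'hypothetical recursive walker

    'Phase 2: download the collected files on multiple threads.
    Parallel.ForEach(jobs,
        New ParallelOptions With {.MaxDegreeOfParallelism = 4},
        Sub(job) Ftp.DownLoadFile(job.Item1, job.Item2))

Most FTP servers cap the number of simultaneous connections per client, so a small `MaxDegreeOfParallelism` is usually the right choice; FileZilla's default is 2 simultaneous transfers.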

2 Answers

3
\$\begingroup\$

[DownLoadFile] is actually taking up 97% of the time.

Thank you for that profiling observation.

single responsibility

It would be helpful to lift the Me.ListDirectory(fromFilename, "") call out of the function, so we could focus on the speed of the download loop. It is a round-trippy call: in the code shown, its result (files) is never even used, yet it costs at least one RTT per file and lets the network channel go idle. Let's assume the files you're downloading are big, so that query's cost is negligible, which leaves us with

download loop

There's not much going on here. Do you really need to Clear() the buffer? I doubt it: Write() only consumes the first bytesRead bytes that Read() just filled, so stale data past that point never reaches the file. Save a few cycles by omitting it; the cost is likely insignificant anyway. Now we just have

    Do While bytesRead > 0
        toFilenameStream.Write(buffer, 0, bytesRead)
        bytesRead = fromFilenameStream.Read(buffer, 0, buffer.Length)
    Loop

You didn't tell us how big that buffer is. Likely it is "too small". Make it as large as you reasonably can; at a minimum, increase it to a megabyte.
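Putting these pieces together, here is a sketch of a slimmed-down DownLoadFile (assuming .NET 4.0+, where Stream.CopyTo with an explicit buffer size is available; the unused ListDirectory call and the Clear() are gone):

    Public Sub DownLoadFile(ByVal fromFilename As String, ByVal toFilename As String)
        Dim request As FtpWebRequest = Me.CreateRequestObject(fromFilename)
        request.Method = WebRequestMethods.Ftp.DownloadFile
        Using response As FtpWebResponse = CType(request.GetResponse(), FtpWebResponse)
            If response.StatusCode <> FtpStatusCode.OpeningData AndAlso
               response.StatusCode <> FtpStatusCode.DataAlreadyOpen Then
                Throw New ApplicationException(Me.BuildCustomFtpErrorMessage(request, response))
            End If
            Using fromStream As Stream = response.GetResponseStream(),
                  toStream As FileStream = File.Create(toFilename)
                'One call with a 1 MB buffer; CopyTo runs the read/write loop internally.
                fromStream.CopyTo(toStream, 1 << 20)
            End Using
        End Using
    End Sub

As a bonus, the Using blocks guarantee the response and both streams are closed even when an exception is thrown, which the original's trailing Close() calls did not.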

You also didn't tell us how far away the fileserver is, so I will assume a WAN round-trip time of 100 msec.

That Read() call drains the operating system's TCP receive buffer; once enough has been drained, the local TCP stack requests more data from the distant server. A tight loop making lots of little Read() calls will be slower than one or a handful of calls where you pass in a giant buffer. Over several RTTs this gives the receiving TCP an opportunity to open up its window, so that the sender has multiple segments in flight at any given instant. This is how TCP decouples latency from bandwidth, discovering the capacity of the bottleneck router on the end-to-end path.

You can see how much the receive window has opened by capturing packets with
tcpdump -w packet.trace -c 1000, or by using Wireshark. The advertised window appears in each TCP header, and the timestamps reveal when the transfer stalls for 100 msec, waiting an RTT for an ACK. You will also be able to see which TCP options were negotiated. Hopefully SACK is among them.

Once you have some bandwidth measurements when using a giant buffer, you will be in a good position to try repeatedly cutting the buffer length in half, until you notice an impact on download times.

answered Dec 3, 2024 at 19:35
\$\endgroup\$
-4
\$\begingroup\$

Check your buffer size (BLOCK_SIZE). If it's too small, the loop makes many more reads and writes, and each read may have to wait for the network to respond.

You haven't given any information about what block size you're using, so I can't offer anything more specific.

Jamal
answered Jan 16, 2016 at 1:45
\$\endgroup\$
  • 2
    \$\begingroup\$ Please expand on what is harmful about having a too-small buffer. Is there a method to determine the optimal buffer size? \$\endgroup\$ Commented Jan 16, 2016 at 2:16
