
This is a parsing system that looks at every user's profile page and determines the job classification based on the class of a div element. I noticed that the class was always consistent for specific types of employees, so I created this script to separate them from the other classes and get a clean list of employees for searches. The script handles 2 to 3 profiles per second, and there are 30,000 of them to sort. That takes a few hours, so I was forced to open several instances of the same program to cut the time down.

What can I do to make this code more efficient and less time-consuming? I have been looking for solutions, but have found none that directly point out the flaws in my coding practice.

Imports System
Imports System.Net
Imports System.Text
Imports System.IO
Public Class Form1
Private Sub btnInput_Click(sender As Object, e As EventArgs) Handles btnInput.Click
    Dim myFileDlog As New OpenFileDialog()
    Dim appPath As String = Application.StartupPath()
    'look for files in the default folder
    myFileDlog.InitialDirectory = appPath.ToString & "\Reports"
    'specifies what type of data files to look for
    myFileDlog.Filter = "Data Files (*.csv)|*.csv"
    'specifies which data type is focused on start up
    myFileDlog.FilterIndex = 1
    'Gets or sets a value indicating whether the dialog box restores the current directory before closing.
    myFileDlog.RestoreDirectory = True
    'separates message outputs for files found or not found
    If myFileDlog.ShowDialog() = DialogResult.OK Then
        If Dir(myFileDlog.FileName) <> "" Then
            'Adds the file directory to the text box
            tbInput.Text = myFileDlog.FileName
            myFileDlog.FileName = Nothing
            myFileDlog.Dispose()
        Else
            MsgBox("File Not Found", MsgBoxStyle.Critical)
        End If
    End If
End Sub
Private Sub btnOutput_Click(sender As Object, e As EventArgs) Handles btnOutput.Click
    Dim SaveFile As New SaveFileDialog()
    Dim appPath As String = Application.StartupPath()
    'look for files in the c drive
    SaveFile.InitialDirectory = appPath.ToString & "\Reports"
    SaveFile.Filter = "Data Files (*.csv)|*.csv"
    SaveFile.Title = "Output"
    If SaveFile.ShowDialog() = DialogResult.OK Then
        Dim Write As New System.IO.StreamWriter(SaveFile.FileName)
        tbOutput.Text = SaveFile.FileName
        SaveFile.FileName = Nothing
        Write.Dispose()
    End If
End Sub
Public Function CheckAddress(ByVal URL As String) As Boolean
    Try
        Dim request As WebRequest = WebRequest.Create(URL)
        request.Credentials = CredentialCache.DefaultCredentials
        Dim response As WebResponse = request.GetResponse()
    Catch ex As Exception
        Return False
    End Try
    Return True
End Function
Private Sub btnRun_Click(sender As Object, e As EventArgs) Handles btnRun.Click
    FileOpen(1, "orange_emps.csv", OpenMode.Output)
    FileOpen(2, tbOutput.Text, OpenMode.Output)
    ' ------- User Table ----------->
    Dim userL As New List(Of String)
    '----------------------------- Read the User Table to Lists --------------------------------->
    Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser(tbInput.Text)
        MyReader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited
        MyReader.Delimiters = New String() {","}
        Dim currentRow As String()
        Dim rowP As Integer = 1
        While Not MyReader.EndOfData
            Try
                currentRow = MyReader.ReadFields()
                Dim cellP As Integer = 0
                For Each currentField As String In currentRow
                    If rowP > 0 Then
                        If Not currentField = "" Then
                            userL.Add(currentField.Replace("""", ""))
                        End If
                        cellP += 1
                    End If
                Next
            Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
                MsgBox("Line " & ex.Message & " is invalid. Skipping")
            End Try
            rowP += 1
        End While
    End Using '----------------------------------------------------------------------------------->
    Dim userLAR As String() = userL.ToArray
    Dim orangeL As New List(Of String)
    Dim curLN As String ' ------- the current user in the row --->
    Dim jobCount = IO.File.ReadAllLines(tbInput.Text).Length
    Dim jobPer As Double = 0
    Dim pBar As Integer = 1
    progBar.Maximum = userLAR.Length
    Dim pBarScale As Decimal = 0
    For i As Integer = 0 To userLAR.Length - 1
        curLN = userLAR(i).ToString
        ' Specify the URL to receive the request.
        Dim request As HttpWebRequest = CType(WebRequest.Create("https://empDB.mysite.com/emps/" & curLN), HttpWebRequest)
        ' Set some reasonable limits on resources used by this request
        request.MaximumAutomaticRedirections = 4
        request.MaximumResponseHeadersLength = 4
        ' Set credentials to use for this request.
        request.Credentials = CredentialCache.DefaultCredentials
        Dim response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
        ' Get the stream associated with the response.
        Dim receiveStream As Stream = response.GetResponseStream()
        ' Pipes the stream to a higher level stream reader with the required encoding format.
        Dim readStream As New StreamReader(receiveStream, Encoding.UTF8)
        ' Store contents in this String.
        Dim line As String
        Dim newURL As String = "https://empDB.mysite.com/emps/" & curLN
        Dim sourceCheck As Boolean = CheckAddress(newURL)
        ' ---- make sure the employee still exists in db -->
        If sourceCheck = True Then
            ' Read first line.
            line = readStream.ReadLine
            Dim lineCount As Integer = 0
            ' Loop over each line in file, While list is Not Nothing.
            Do While (lineCount < 500)
                If lineCount > 400 Then
                    jobPer = Format(((i + 1) / userLAR.Length) * 100, "0.00")
                    labProg.Text = "Progress: " & i + 1 & " of " & userLAR.Length
                    labPer.Text = jobPer & "%"
                    Me.Text = "PhoneTool Scraper " & jobPer & "%"
                    If line.Contains("orange-frame") Then
                        orangeL.Add(userLAR(i))
                        lineCount = 500
                    End If
                End If
                ' Read in the next line.
                line = readStream.ReadLine
                lineCount += 1
            Loop
        End If
        response.Close()
        readStream.Close()
        If progBar.Value + pBar < progBar.Maximum Then
            progBar.Value += pBar
        End If
        Application.DoEvents()
    Next
    PrintLine(1, "----- Orange Emps -----")
    For Each orange In orangeL
        PrintLine(1, orange)
    Next
    PrintLine(1, "")
    PrintLine(1, orangeL.Count)
    progBar.Value = userLAR.Length
    FileClose(1)
    FileClose(2)
    Dim FILE_NAME As String = "orange_emps.csv"
    If System.IO.File.Exists(FILE_NAME) = True Then
        Process.Start(FILE_NAME)
    Else
        MsgBox("File Does Not Exist")
    End If
End Sub
End Class

Quick definitions:

userL = a list of employees pre-compiled for the check

userLAR = an array created from userL

orangeL = the list of employees pulled from the report sheet

The reason I start the lineCount If statement at 400 and end at 500 is that I believed it would save time to skip the comparisons until the range of lines where the value shows up. I don't believe this was correct.

Application.DoEvents() is used only to show the user the current progress of the report and the current count of employees filtered.

Here is a sample of the HTML lines from the profile page that are read by the loop:

<option value="Country">Country</option> 
<option value="City">City</option></select> 
</i> 
</button> 
</div> 
</form> 
</div> 
</div> 
</div> 
</nav> 
<div class='alert-wrapper'> 
</div> 
<div id='content'> 
<!-- / Ring Ring Ring Ring Ring Ring Ring --> 
<div class='container-fluid'> 
<div class='row-fluid emp'> 
<div class='employee-frame'> 
<div class='no-frame-border pull-right worker-frame orange-frame'> 
<div class='hole-wrapper'> 
<div class='hole'></div> 
</div> 
<div class='user'> 
johndoe 
</div> 
<div class='row-fluid picture-frame'> 
<div class='photo'> 
<img alt="John Doe" id="frame-image" src="./?uid=johndoe" style="" /> 
</div> 
</div> 
<div class='name'> 
<p> 
<strong> 
John 
</strong> 
</p> 
<p> 
Doe 
</p> 
</div> 
</div> 
</div> 
<div class='emp-info'> 
<div class='row-fluid'> 
<p class='name'> 
John Doe 
<div class='prefname'> 
</div> 
</p> 
<p class='title'> 
House Cleaning
<a href="#">External (8725)</a> 
</p> 
<div class='row-fluid'> 
<p class='email'> 
<a href="mailto:[email protected]">[email protected]</a> 
</p> 
<p class='display-options pull-right'> 
<i class="icon-cog icon-large muted"></i> 
<a href="#display-options-modal" class="muted" data-toggle="modal">Display options</a> 
<div class='modal hide fade' id='display-options-modal' role='dialog' tabindex='-1'> 
<div class='modal-dialog'> 
<div class='modal-content'> 
<div class='modal-header header-name colored-header'> 
Display Options 
<a href="#" class="pull-right" data-dismiss="modal"><i class="icon-remove- symble"></i></a> 
</div> 
<form accept-charset="UTF-8" action="/users/johndoe/update_user_pref" class="formtastic user_pref" id="edit_user_pref_123456789" method="post" novalidate="novalidate"><div style="margin:0;padding:0;display:inline"><input name="utf8" type="hidden" value="&#x1234;" /><input name="_method" type="hidden" value="put" /><input name="authenticity_token" type="hidden" value="4asdfeadfagtadgfasdg5ad=" /></div><ul class='nav nav-tabs'> 
<li class='active'> 
<a href="#tab-main" data-toggle="tab">Main</a> 
</li> 
<li> 
<a href="#tab-tree" data-toggle="tab">Chart Tab</a> 
</li> 
</ul> 
<div class='tab-content'> 
<div class='tab-pane active' id='tab-main'> 
<div class='modal-header title-header'> 
Frame Image 
</div> 
<div class='row-fluid'> 
<div class='span12'> 
<li class="checkbox boolean input optional" id="user_pref_profile_show_frame_wrap_input"><input name="user_pref[profile_show_frame_wrap]" type="hidden" value="0" /><label class="" for="user_pref_profile_show_frame_wrap"><input checked="checked" id="user_pref_profile_show_frame_wrap" name="user_pref[profile_show_frame_wrap]" type="checkbox" value="1" />Show Frame Wraps (indicates tenure)</label> 
</li> 
<li class="checkbox boolean input optional" id="user_pref_profile_show_custom_image_input"><input name="user_pref[profile_show_custom_iamge]" type="hidden" value="0" /><label class="" for="user_pref_profile_show_custom_iamge"><input checked="checked" id="user_pref_profile_show_custom_image" name="user_pref[profile_show_custom_image]" type="checkbox" value="1" />Show custom (user-uploaded) Image by default</label> 
 </li> 
 </div> 
 </div> 
 <div class='modal-header title-header'> 
info block 
 </div> 
<div class='row-fluid'> 
<div class='span6'> 
<li class="checkbox boolean input optional" id="user_pref_profile_show_local_input"><input name="user_pref[profile_show_local]" type="hidden" value="0" /><label class="" for="user_pref_profile_show_local"><input checked="checked" id="user_pref_profile_show_local" name="user_pref[profile_show_local]" type="checkbox" value="1" /> Show Area (e.g. Building 1 - My Name)</label>

This is just a sample of the profile page. The markup is not relevant other than for parsing purposes. The script looks for a line containing orange-frame and, when it finds one, adds that user's name from userL to orangeL.

asked Jul 1, 2015 at 15:01
  • Can you post the entire method body? And if you do, please include the CheckAddress method. I recommend you read this blog post while waiting for reviews :) "Keeping your UI Responsive and the Dangers of Application.DoEvents" Commented Jul 1, 2015 at 15:28
  • Complete code added! Commented Jul 1, 2015 at 16:31
  • It would be great if you could add some sample data. Commented Jul 1, 2015 at 16:48
  • Let me work on that. It might take a bit of time, because I will have to change the names in the data, as it is company confidential information. I have already changed the web link and names for that reason. The file has roughly 30k lines in a single column of logins. I am sure I can produce something. :) Commented Jul 1, 2015 at 17:20
  • I added the web link page html as a reference for what the parser is actually handling. Commented Jul 1, 2015 at 18:31

1 Answer


In your button click event handler (btnOutput_Click) you should be using a Using statement for your writer here:

Dim Write As New System.IO.StreamWriter(SaveFile.FileName)
tbOutput.Text = SaveFile.FileName
SaveFile.FileName = Nothing
Write.Dispose()

like this:

Using Write As New System.IO.StreamWriter(SaveFile.FileName)
 tbOutput.Text = SaveFile.FileName
 SaveFile.FileName = Nothing
End Using

It will make sure that, no matter what, the StreamWriter is disposed of once execution leaves the Using block, which is very important. Anything that implements the IDisposable interface should be used in conjunction with a Using block.
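
The same pattern would apply to the response and reader in btnRun_Click, which are currently closed by hand at the end of each loop iteration. What follows is a minimal sketch, not the original code: the module name UsingSketch and the helpers ContainsMarker and PageContains are made up for illustration, and it assumes the check only needs to know whether any line of the page contains the marker text.

```vb
Imports System.IO
Imports System.Net
Imports System.Text

Module UsingSketch
    ' Hypothetical helper: True if any line from the reader contains marker.
    ' Stops at the first match or at end of input (ReadLine returns Nothing).
    Function ContainsMarker(reader As TextReader, marker As String) As Boolean
        Dim line As String = reader.ReadLine()
        While line IsNot Nothing
            If line.Contains(marker) Then Return True
            line = reader.ReadLine()
        End While
        Return False
    End Function

    ' Hypothetical helper: downloads one page and checks it for marker.
    Function PageContains(url As String, marker As String) As Boolean
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.Credentials = CredentialCache.DefaultCredentials
        ' Each Using block disposes its object when the block is left,
        ' including on an early Return or an exception.
        Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(response.GetResponseStream(), Encoding.UTF8)
                Return ContainsMarker(reader, marker)
            End Using
        End Using
    End Function
End Module
```

With a helper along these lines, the loop body could shrink to a single call such as PageContains("https://empDB.mysite.com/emps/" & curLN, "orange-frame"), and there would be no response.Close() or readStream.Close() left to forget.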

answered Aug 18, 2015 at 21:53