I know this is largely patched now that YouTube has added captchas to commenting, but I would still like to learn how it works. I have most of the source code set up, but I am stuck on how to get the links for users. Basically, the program searches for a string, then gathers users based on that string. I have it (basically) working with this code:
Code:
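    ' readString is assumed to be a class-level field, since both functions use it.
    Private readString As String = ""

    ' Downloads the page at url and returns its source as text.
    Private Function GetText(ByVal url As String) As String
        Dim request As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(url), System.Net.HttpWebRequest)
        Dim response As System.Net.HttpWebResponse = DirectCast(request.GetResponse(), System.Net.HttpWebResponse)
        Dim sr As System.IO.StreamReader = New System.IO.StreamReader(response.GetResponseStream(), System.Text.Encoding.GetEncoding("windows-1252"))
        readString = sr.ReadToEnd()
        sr.Close()
        Return readString
    End Function

    ' Returns the text between the first searchStart and the searchEnd that follows it.
    Private Function GetLinks(ByVal returnVar As String, ByVal searchStart As String, ByVal searchEnd As String) As String
        Dim i As Integer = 0
        Dim fini As Integer = 0
        Dim f As Integer = 0
        Dim l As Integer = 0
        l = searchStart.Length
        i = readString.IndexOf(searchStart) ' always searches from the start of the page
        i = i + l
        f = readString.IndexOf(searchEnd, i)
        f = f - 1 ' leaves out the character just before searchEnd
        fini = f - i
        returnVar = readString.Substring(i, fini)
        Return returnVar
    End Function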
GetText gets the web page's source, and then GetLinks gets the users. Unfortunately, this only works to a small extent because it only ever gets one user. Even if I loop it, the same user is harvested over and over instead of different users. I'm sure there's a simple solution, but I can't figure it out for the life of me. Any help would be appreciated.
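One likely reason a loop keeps returning the same user is that readString.IndexOf(searchStart) searches from the beginning of the page on every call. Below is a minimal sketch, assuming the page source is already held in a string, that carries the search position forward after each match. The name GetAllBetween is hypothetical and not part of the code above.

```vb
Imports System.Collections.Generic

Module LinkScan
    ' Hypothetical helper: returns every substring found between
    ' searchStart and searchEnd, scanning the source left to right.
    Function GetAllBetween(ByVal source As String, ByVal searchStart As String, ByVal searchEnd As String) As List(Of String)
        Dim results As New List(Of String)
        ' Empty markers would match everywhere and never advance, so bail out.
        If searchStart.Length = 0 OrElse searchEnd.Length = 0 Then Return results

        Dim pos As Integer = 0
        While True
            ' Start each search where the previous match ended.
            Dim i As Integer = source.IndexOf(searchStart, pos)
            If i < 0 Then Exit While
            i += searchStart.Length

            Dim f As Integer = source.IndexOf(searchEnd, i)
            If f < 0 Then Exit While

            results.Add(source.Substring(i, f - i))
            pos = f + searchEnd.Length
        End While
        Return results
    End Function
End Module
```

Calling GetAllBetween("a/user/bob""b/user/amy""", "/user/", """") should collect bob and then amy, because the second search begins after the first closing quote instead of at position 0.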
Hmm... you should do it line by line. It gives you much better control over the data. I managed to retrieve the users of all the comments on a video with this small piece of code:
Code:
    ' Downloads the page at url and returns its source as text.
    Private Function GetText(ByVal url As String) As String
        Dim readstring As String = ""
        Dim request As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(url), System.Net.HttpWebRequest)
        Dim response As System.Net.HttpWebResponse = DirectCast(request.GetResponse(), System.Net.HttpWebResponse)
        Dim sr As System.IO.StreamReader = New System.IO.StreamReader(response.GetResponseStream(), System.Text.Encoding.GetEncoding("windows-1252"))
        readstring = sr.ReadToEnd()
        sr.Close()
        Return readstring
    End Function

    ' Shows the username found on every second line that starts with linestart.
    Private Function GetLinks(ByVal url As String, ByVal linestart As String, ByVal searchstart As String, ByVal searchend As String)
        Dim stringtosearch As String = GetText(url)
        ' GetTempFileName already creates a uniquely named empty file.
        Dim x As String = My.Computer.FileSystem.GetTempFileName()
        My.Computer.FileSystem.WriteAllText(x, stringtosearch, False)
        Dim Reader As New IO.StreamReader(x)
        Dim bool As Boolean = False
        While Not Reader.EndOfStream
            Dim line As String = Reader.ReadLine.Trim
            If line.StartsWith(linestart) Then
                ' Flip the flag on every matching line; skip the line when it becomes True.
                If bool Then
                    bool = False
                Else
                    bool = True
                End If
                If bool Then
                    Continue While
                End If
                ' Mid is 1-based, hence the + 1 on both positions.
                Dim x1 As Integer = line.IndexOf(searchstart, 0) + searchstart.Length + 1
                Dim x2 As Integer = line.IndexOf(searchend, x1) + 1
                Dim result As String = Mid(line, x1, x2 - x1)
                MsgBox(result)
            End If
        End While
        Reader.Dispose()
        My.Computer.FileSystem.DeleteFile(x, FileIO.UIOption.OnlyErrorDialogs, FileIO.RecycleOption.DeletePermanently, FileIO.UICancelOption.DoNothing)
    End Function

    Private Sub Form1Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        GetLinks("http://www.youtube.com/watch?v=tAMSRj7ly-g&feature=related", "<a class=""author", "/user/", """")
    End Sub
Explanation:
GetLinks Function:
Because I am reading the page line by line, I need one more piece of information from the caller: "linestart". The function does not search for the start and end items unless the current line starts with the text given in linestart.
It then loops through every line. When a line starts with linestart, it extracts the username from that line.
So, what's up with the bool variable?
When parsing web pages you need to be careful about duplication. If I did not use the bool variable here, the same username would be returned twice because of the YouTube page structure. To avoid this, bool flips on every matching line, so the loop skips every other match and each comment's user is reported only once.
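The bool flag depends on each author link appearing exactly twice in a row. If the page structure differs, another option is to remember the names already seen and skip repeats. Here is a rough sketch, assuming .NET 3.5 or later for HashSet(Of T). UniqueNames is a hypothetical helper, not part of the code above. Note that this behaves differently from the flag: a user who commented several times is listed only once.

```vb
Imports System
Imports System.Collections.Generic

Module Dedup
    ' Hypothetical helper: keeps the first occurrence of each name,
    ' preserving the original order and ignoring case.
    Function UniqueNames(ByVal names As IEnumerable(Of String)) As List(Of String)
        Dim seen As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)
        Dim unique As New List(Of String)
        For Each name As String In names
            ' HashSet.Add returns False when the item is already present.
            If seen.Add(name) Then unique.Add(name)
        Next
        Return unique
    End Function
End Module
```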
Tweak the code as it fits your needs. I think you can handle it from now on.
Hope this helps.
Thanks for your post, I'm so close to getting it! My only problem is that your function doesn't return a value, and when I try to return "result" (after deleting the temporary file at the end of the code), it says that result isn't declared. Could you tell me how to return the result?
Good job string-slave
Originally Posted by FLAMESABER
Hmm... you should do it line by line. It gives you much better control over the data. I managed to retrieve the users of all the comments on a video with this small piece of code:
sheet nice GJ! +1
Originally Posted by ShadowPwnz
Thanks for your post, I'm so close to getting it! My only problem is that your function doesn't return a value, and when I try to return "result" (after deleting the temporary file at the end of the code), it says that result isn't declared. Could you tell me how to return the result?
You need to learn about variable scope. When you declare a variable inside a loop, Sub, Function, etc., it only exists there and goes out of scope once execution leaves that block. Since you want the users of all the comments, you should use each result right there, inside the loop.
For example, you can add each item to a ListBox:
    Dim x1 As Integer = line.IndexOf(searchstart, 0) + searchstart.Length + 1
    Dim x2 As Integer = line.IndexOf(searchend, x1) + 1
    Dim result As String = Mid(line, x1, x2 - x1)
    ListBox1.Items.Add(result)
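If you would rather have the function hand the names back than fill a ListBox, one way is to declare a list before the loop, add each result to it inside the loop, and return the list at the end. Here is a sketch, assuming the page source has already been downloaded (for example with GetText). ExtractUsers is a hypothetical name, it reads the lines with a StringReader in place of the temporary file, and it leaves out the bool duplicate check.

```vb
Imports System.Collections.Generic
Imports System.IO

Module UserExtract
    ' Hypothetical helper: collects the text between searchStart and searchEnd
    ' on every line that begins with lineStart.
    Function ExtractUsers(ByVal pageSource As String, ByVal lineStart As String, ByVal searchStart As String, ByVal searchEnd As String) As List(Of String)
        ' Declared outside the loop, so it is still in scope at Return.
        Dim users As New List(Of String)
        Using reader As New StringReader(pageSource)
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                line = line.Trim()
                If line.StartsWith(lineStart) Then
                    Dim i As Integer = line.IndexOf(searchStart)
                    If i >= 0 Then
                        i += searchStart.Length
                        Dim f As Integer = line.IndexOf(searchEnd, i)
                        ' Only add a name when both markers were found.
                        If f >= 0 Then users.Add(line.Substring(i, f - i))
                    End If
                End If
                line = reader.ReadLine()
            End While
        End Using
        Return users
    End Function
End Module
```

The caller can then loop over the returned list with For Each and do whatever it needs with each name, such as ListBox1.Items.Add(user).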