Wb Spider

Submitted on: 4/8/2002 9:43:38 PM
By: nfs 
Level: Advanced
User Rating: By 2 Users
Compatibility: 5.0 (all versions), Active Perl specific, 4.0 (all versions)
Views: 16551
 
     Webspider is a Perl script that, given a start page, follows every link it finds and scans the HTML for uses of CGI scripts (form actions). Once it has visited all reachable links, up to 10 layers deep and 100 pages per layer, it reports every CGI used on the web site.
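
The crawl described above can be pictured with a minimal sketch. It is not the author's code: the in-memory `%pages` hash, the `crawl` function and the example.test URLs are all hypothetical stand-ins for real HTTP fetches, and a hash is used for the visited list instead of the array scan in the script below.

```perl
use strict;
use warnings;

# Hypothetical in-memory "site" standing in for real HTTP fetches.
my %pages = (
    'http://example.test/' =>
        '<a href="http://example.test/a.html">A</a> <form action="http://example.test/cgi-bin/search.cgi">',
    'http://example.test/a.html' =>
        '<a href="http://example.test/">home</a> <form action="http://example.test/cgi-bin/mail.pl">',
);

# Breadth-first crawl: returns the sorted list of distinct form actions.
sub crawl {
    my ($start, $max_depth) = @_;
    my (%seen, %cgi);
    my @layer = ($start);
    $seen{$start} = 1;
    for my $depth (1 .. $max_depth) {
        my @next;
        for my $url (@layer) {
            my $html = $pages{$url};
            next unless defined $html;
            # Every form action counts as a CGI.
            $cgi{$1} = 1 while $html =~ /action="([^"]+)"/gi;
            # Queue each link that has not been visited yet.
            while ($html =~ /href="([^"]+)"/gi) {
                push @next, $1 unless $seen{$1}++;
            }
        }
        last unless @next;
        @layer = @next;
    }
    return sort keys %cgi;
}

print "$_\n" for crawl('http://example.test/', 10);
```

Each layer holds the links found on the previous one, which is the same layer-by-layer idea the script uses with @currentlayer and @nextlayer.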
 
code:
 
Terms of Agreement:   
By using this code, you agree to the following terms...   
  1. You may use this code in your own programs (and may compile it into a program and distribute it in compiled format for languages that allow it) freely and with no charge.
  2. You MAY NOT redistribute this code (for example to a web site) without written permission from the original author. Failure to do so is a violation of copyright laws.   
  3. You may link to this code from another website, but ONLY if it is not wrapped in a frame. 
  4. You will abide by any additional copyright restrictions which the author may have placed in the code or code's description.
				
#**************************************
# Name: Wb Spider
# Description: Webspider is a Perl script that, when given a start page, will "follow" every link it finds, scanning the HTML code for the use of CGIs. After Webspider has gone over all available links it will report every CGI used on the web site.
# By: nfs
#
# This code is copyrighted and has limited warranties. Please see
# http://www.Planet-Source-Code.com/vb/scripts/ShowCode.asp?txtCodeId=309&lngWId=6
# for details.
#**************************************

# The spider talks to the proxy over a raw TCP socket, so it needs Socket.
use Socket;
sub preps() {
 # Three arguments are required: proxy host, proxy port and start URL.
 if (@ARGV < 3) {
  print "\n\nUsage: perl webspider_1.1.pl <proxy server> <proxy port> <URL>\n";
  print "Example: perl webspider_1.1.pl proxy.pandora.be 8080 http://www.microsoft.com/\n";
  exit;
 }
 $proxy = $ARGV[0];
 $port = $ARGV[1];
 $currentlayer[0] = $ARGV[2];
 $layer = 10;                   # maximum number of layers (link depth)
 $maxcurrentlayerteller = 100;  # maximum number of pages fetched per layer
 $noname = "WebSpider 1.1";
 # Normalise the start URL: strip the scheme, split it, then put the scheme back.
 $currentlayer[0] =~ s/http:\/\///g ;
 ($server, $dir, $file) = split(/\//, $currentlayer[0]);
 $logfile = "WebSpider_Log.txt";
 $currentlayer[0] = "http://$currentlayer[0]";
 # Extensions the spider should not ignore.
 $dontignore[1] = ".html";
 $dontignore[2] = ".xml";
 $dontignore[3] = ".asp";
 $dontignore[4] = ".php";
 $dontignore[5] = ".htm";
 # Slot 0 is unused, so skip empty entries instead of stopping at the first one.
 foreach $extension (grep { defined $_ && $_ ne '' } @dontignore) { print "Don't Ignore: $extension\n"; }
}
sub LogToFile() {
 # Append one line per new CGI: layer, page index, CGI URL, page URL.
 open(OUTF, ">>$logfile") or return;
 print OUTF "$layerteller $currentlayerteller $foundcgi[$foundcgiteller] http://$currentlayer[$currentlayerteller]\n";
 close(OUTF);
}
sub CheckCGIHistory() {
 # Report and log $foundcgi[$foundcgiteller] unless it was seen before.
 $cgihistoryteller = 0 ;
 $cgiwasinhistory = 0 ;
 while ($cgihistory[$cgihistoryteller] ne '') {
  if ($cgihistory[$cgihistoryteller] eq $foundcgi[$foundcgiteller]) { $cgiwasinhistory = 1; }
  $cgihistoryteller++;
 }
 if ($cgiwasinhistory != 0) {
  $foundcgiteller-- ;  # duplicate: the caller's increment reuses this slot
 } else {
  $cgihistory[$cgihistoryteller] = $foundcgi[$foundcgiteller] ;
  print "$layerteller:$currentlayerteller $foundcgi[$foundcgiteller]\n";
  LogToFile();
 }
}
sub CheckHistory() {
 # Keep $nextlayer[$nextlayerteller] only if that URL was never queued before.
 $historyteller = 0 ;
 $wasinhistory = 0 ;
 while ($history[$historyteller] ne '') {
  if ($history[$historyteller] eq $nextlayer[$nextlayerteller]) {
   $wasinhistory = 1;
   $placeinhistory = $historyteller ;
  }
  $historyteller++;
 }
 if ($wasinhistory == 0) {
  $history[$historyteller] = $nextlayer[$nextlayerteller] ;
 } else {
  # Duplicate: blank the slot and step back so the caller's increment reuses it.
  $nextlayer[$nextlayerteller] = "";
  $nextlayerteller-- ;
 }
}
sub itcontainslocation() {
 # Redirect header of the form "location: <url>".
 ($temp, $link) = split(/ /, $response[$responseteller]);
 if ($link =~ /http:\/\//) {
  $nextlayer[$nextlayerteller] = $link;
 } else {
  # Relative link: resolve it against the current server and directory.
  $nextlayer[$nextlayerteller] = ($dir ne '') ? "http://$server/$dir/$link" : "http://$server/$link";
 }
 CheckHistory() ;
 $nextlayerteller++ ;
}
sub itcontainshref() {
 # Take the text between href=" and the next double quote.
 ($temp, $therest) = split(/href=\"/, $response[$responseteller]);
 ($link,$temp) = split(/\"/, $therest);
 if ($link =~ /http:\/\//) {
  $nextlayer[$nextlayerteller] = $link;
 } else {
  # Relative link: resolve it against the current server and directory.
  $nextlayer[$nextlayerteller] = ($dir ne '') ? "http://$server/$dir/$link" : "http://$server/$link";
 }
 CheckHistory() ;
 $nextlayerteller++ ;
}
sub itcontainsscr() {
 # Frame source: the HTML attribute is src=, not scr=.
 ($temp, $therest) = split(/src=\"/, $response[$responseteller]);
 ($link,$temp) = split(/\"/, $therest);
 if ($link =~ /http:\/\//) {
  $nextlayer[$nextlayerteller] = $link;
 } else {
  # Relative link: resolve it against the current server and directory.
  $nextlayer[$nextlayerteller] = ($dir ne '') ? "http://$server/$dir/$link" : "http://$server/$link";
 }
 CheckHistory() ;
 $nextlayerteller++ ;
}
sub itcontainsaction() {
 # A form's action attribute names the CGI that handles it.
 ($temp, $therest) = split(/action=\"/, $response[$responseteller]);
 ($cgi,$temp) = split(/\"/, $therest);
 if ($cgi =~ /http:\/\//) {
  $tempfoundcgi = $cgi;
 } else {
  $tempfoundcgi = ($dir ne '') ? "http://$server/$dir/$cgi" : "http://$server/$cgi";
 }
 $foundcgi[$foundcgiteller] = $tempfoundcgi ;
 CheckCGIHistory() ;
 $foundcgiteller++ ;
}
sub parse() {
 # Connect to the proxy, which fetches the page on our behalf.
 $serverIP = inet_aton($proxy);
 if (!$serverIP) { print "Could not resolve $proxy, try another proxy server.\n"; exit ; }
 $serverAddr = sockaddr_in($port, $serverIP);
 socket(SOCKET, PF_INET, SOCK_STREAM, getprotobyname('tcp'));
 if (!connect(SOCKET, $serverAddr)) { print "Could not connect, try another proxy server.\n"; exit ; }

 # Send the URL; HTTP lines end in CRLF and a blank line ends the request.
 print "Sending: GET http://$currentlayer[$currentlayerteller] HTTP/1.0\n";
 send(SOCKET,"GET http://$currentlayer[$currentlayerteller] HTTP/1.0\r\n\r\n",0);

 @response=<SOCKET>;
 close(SOCKET);
 $responseteller = 0 ;
 while ($response[$responseteller] ne '') {
  # Strip the line ending (LF or CRLF).
  $response[$responseteller] =~ s/\r?\n$//;
  # Convert everything to lowercase...
  $response[$responseteller] = lc($response[$responseteller]);
  # If we get a 302... (the header name is lowercase by now)
  if ($response[$responseteller] =~ /^location:/) { itcontainslocation() ; }
  # If we get a 200...
  if ($response[$responseteller] =~ /href=/) {
   # $dontignoreit becomes 1 when the line mentions an extension from @dontignore
   $dontignoreit = 0 ;
   foreach $extension (grep { defined $_ && $_ ne '' } @dontignore) {
    if ($response[$responseteller] =~ /\Q$extension\E/) { $dontignoreit = 1 ; }
   }
   # Follow the link only when it is on the don't-ignore list.
   if ($dontignoreit == 1) { itcontainshref(); }
  }
  # Site has frames...
  if ($response[$responseteller] =~ /src=/) { itcontainsscr() ; }
  # CGI found...
  if ($response[$responseteller] =~ /action=/) { itcontainsaction() ; }
  $responseteller++;
 }
}
################
# MAIN PROGGIE #
################
print "\nPreparing...";
preps();
print "Done.\n";
for ($layerteller=0;$layerteller<$layer;$layerteller++) {
 for ($currentlayerteller=0;$currentlayerteller<$maxcurrentlayerteller;$currentlayerteller++) {
  $currentlayer[$currentlayerteller] =~ s/http:\/\///g ;
  ($server, $dir, $file) = split(/\//, $currentlayer[$currentlayerteller]);
  if ($currentlayer[$currentlayerteller] ne '') { parse(); }
 }
 # The links collected on this layer become the pages of the next one.
 @currentlayer = @nextlayer ;
 @nextlayer = () ;  # start empty so stale links from this layer are not crawled again
 $nextlayerteller = 0 ;
}
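
The script always sends its request through a proxy, which is why the request line carries the full URL. As a rough sketch of how the request text would differ for a direct connection, the following hypothetical helper (build_request is not part of the original script, and the example.test host is made up) builds both forms. It assumes plain http:// URLs.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script.
# Returns the HTTP/1.0 request text for $url, either for a proxy
# (absolute URL in the request line) or for a direct connection
# (path only in the request line). A Host header is sent in both cases.
sub build_request {
    my ($url, $via_proxy) = @_;
    (my $rest = $url) =~ s{^http://}{};
    my ($host, $path) = split m{/}, $rest, 2;
    $path = defined $path ? "/$path" : "/";
    my $target = $via_proxy ? "http://$host$path" : $path;
    return "GET $target HTTP/1.0\r\nHost: $host\r\n\r\n";
}

print build_request('http://www.example.test/docs/index.html', 0);
```

For a direct connection the socket would be opened to the web server itself (normally port 80) rather than to the proxy host and port taken from the command line.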


Other User Comments
4/8/2002 10:05:44 PM vsim

It works, thanks.

 
4/10/2002 3:43:32 AM Harry

What if I do not have a proxy server?

 
4/16/2002 5:22:28 PM New User

I was looking for this and it's good, so 5*. Thanks.

 
6/22/2002 4:19:45 PM boujouj

Tried to run it; full of syntax errors.
Does not work for me.

 
3/8/2003 1:21:32 AM

Could not connect , try another proxy server
Could not connect , try another proxy server
Could not connect , try another proxy server
Could not connect , try another proxy server

 
8/14/2003 1:34:29 PM

Wow, looks just like another script I found on PacketStorm... Looks like someone is stealing code?

 
