Writing Internet Clients with Perl - Fast and Easy
Submitted on: 1/1/2002 3:58:06 PM
By: T. E. Geek
Level: Intermediate
User Rating: By 12 Users
Compatibility: 5.0 (all versions)
Views: 31447
A brief introduction to the libwww-perl module. No advanced knowledge assumed. Learn how to request webpages and then parse them within Perl. Easy to learn in under five minutes.
Terms of Agreement:
By using this article, you agree to the following terms...
- You may use this article in your own programs (and may compile it into a program and distribute it in compiled format for languages that allow it) freely and with no charge.
- You MAY NOT redistribute this article (for example to a web site) without written permission from the original author. Failure to do so is a violation of copyright laws.
- You may link to this article from another website, but ONLY if it is not wrapped in a frame.
- You will abide by any additional copyright restrictions which the author may have placed in the article or article's description.
Introduction to writing your very own Web Clients

Welcome to this brief tutorial. It outlines how to write simple Perl scripts that can request HTML data from the Internet, store it in a file, mirror it, print it, and search it.

Perl: you've learned how to write scripts, parse text, blah blah... now what? Sure, it's nice to save a text file to your hard drive... rename it... and... uh, save another text file after that. But surely there has to be something else, besides CGI, that Perl can be used for. Here's one answer to that question, just one of the thousands found at www.cpan.org. Within five minutes you'll be writing Perl scripts capable of retrieving webpages. Cool, eh? Let's begin.

The first step to writing a WWW-aware Perl script is to download the libwww-perl module. It offers a non-object-oriented interface (LWP::Simple) and an object-oriented one (LWP::UserAgent). For your convenience, I'll cover the non-object-oriented one: it's simpler, and it does everything we need here. To download this Perl module, simply go here:
http://www.cpan.org/authors/id/GAAS/libwww-perl-5.63.tar.gz
If you're on Windows, you can find it at the ActiveState website (www.ActiveState.com, I believe). If it's not there, search Google or www.cpan.org for a Windows version of libwww-perl. I wrote this tutorial on a Linux machine, so I can't point you to the exact Windows download location. Once you've downloaded it, unpack the tar/zip archive into a new directory. Open a shell (or a DOS prompt), change to the directory you unpacked it into, and type the following:
perl Makefile.PL
Wait for this program to complete. If Perl can't find the file, type dir (DOS) or ls (Unix) to list the directory's contents, then type perl followed by whichever file ends in .PL (capital letters).
Next, type
make
Wait for this to finish, then type
make test
and then
make install
Once you've run all four commands (perl Makefile.PL, make, make test, make install), libwww-perl is installed where your copy of Perl can find it. On Unix, you may need to run make install as root.
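Before writing any real scripts, it can be handy to check that the install actually worked. Here's a minimal sketch that tries to load LWP and LWP::Simple and reports their versions. The helper name module_version is made up for this example, and the script uses only core Perl, so it runs even when libwww-perl is missing:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a module's $VERSION if it can be loaded,
# or undef if it isn't installed.
sub module_version {
    my ($module) = @_;

    # require() wants a file name, so "LWP::Simple" becomes "LWP/Simple.pm".
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };

    no strict 'refs';    # lets us read e.g. $LWP::Simple::VERSION by name
    my $version = ${"${module}::VERSION"};
    return defined $version ? $version : 'unknown';
}

for my $module (qw(LWP LWP::Simple)) {
    my $version = module_version($module);
    if (defined $version) {
        print "$module is installed (version $version)\n";
    }
    else {
        print "$module is NOT installed; check the install steps\n";
    }
}
```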
Next, open up your favorite text editor and create a new file. In this file, you need to tell Perl: "Hey, Perl, I want to use the libwww-perl thingie I just installed, so stick it in my code." Roughly translated into Perl-speak, that's:
use LWP::Simple;
So, not much of a program yet, eh? LWP stands for libwww-perl (duh), which should help you remember it. What next? Well, you've got a program that knows the module is there, but... how do you use it?
Simply! This package is called "Simple" for just that reason. Go figure. You'll find that your copy of Perl has suddenly been expanded with several brand-spanking-new functions. That's right: simple, old-fashioned functions.
1. get($url);
2. getstore($url, $filename);
3. getprint($url);
4. mirror($url, $filename);
Oohh, ahhh! I'm sure you could memorize those right now. Let me tell you what each of them does and how you're expected to use them. First, you've got get($url). Simply take a scalar variable, let's say $html, and assign it the result of get("http://get_this_url/"). Replace the string inside get() with the URL of whatever website you'd like to fetch. So, here's an example of doing just that:
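use strict;
use warnings;
use LWP::Simple;
my $html = get("http://www.megathink.com");
die "Couldn't fetch the page\n" unless defined $html;   # get() returns undef on failure
print $html;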
A program that gets a webpage's html and crams it into a scalar
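The intro promised searching, too. Once get() has handed you the HTML as a string, ordinary Perl regexes work on it. This is a sketch under a few assumptions: page_title is a made-up helper, its regex is a quick-and-dirty match rather than a real HTML parser, and the page is a tiny temporary file fetched through a file:// URL (built with URI::file, from the URI distribution that libwww-perl depends on) so the script runs without a network connection. Swap in an http:// URL to try it on a live site.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;    # exports get()
use URI::file;      # builds file:// URLs for the offline demo
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: return the text inside <title>...</title>, or undef.
# A regex like this is fine for a quick search but is not a real HTML parser.
sub page_title {
    my ($html) = @_;
    return $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is ? $1 : undef;
}

# Offline demo: write a tiny page to a temp file and fetch it with get().
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'page.html');
open my $fh, '>', $path or die "Can't write $path: $!";
print {$fh} "<html><head><title>Hello, LWP</title></head><body>Hi!</body></html>\n";
close $fh or die "Can't close $path: $!";

my $url  = URI::file->new_abs($path)->as_string;
my $html = get($url);
die "Couldn't fetch $url\n" unless defined $html;    # get() returns undef on failure

my $title = page_title($html);
print defined $title ? "Title: $title\n" : "No <title> found\n";

# Counting matches is another quick search: how many times does "LWP" appear?
my $count = () = $html =~ /LWP/g;
print "\"LWP\" appears $count time(s)\n";
```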
Don't forget the use LWP::Simple; at the top! Leaving it out can fudge everything up. Let's say you want to do this really quickly and eliminate the need for a variable at all. Well, you've got the getprint($url) function for that! Simply type getprint("http://www.whatever.com/");, and the program will automatically print the HTML of that website. What if you want to store the HTML in a file on your hard drive, for backup? Or, let's say you want to start your own cache of websites. That's just as easy! Type getstore("http://url", "stored.htm");, and badda bing, badda boom: you've got a new .htm file in your directory, loaded to the brim with whatever URL you requested. A working example, you say? Sure!
use LWP::Simple;
getstore("http://www.megathink.com", "temp.htm");
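One thing worth knowing: getstore(), getprint() and mirror() don't return the page. They return the HTTP status code, and LWP::Simple also exports the HTTP::Status helpers (is_success(), is_error()) and the RC_* constants so you can check it. Here's a sketch, again using a temporary file:// "website" so it runs offline; the file names are just for illustration. A mirror() result of RC_NOT_MODIFIED (304) means the copy on disk was already current.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;    # also exports is_success(), is_error() and the RC_* constants
use URI::file;      # builds file:// URLs for the offline demo
use File::Temp qw(tempdir);
use File::Spec;

# Offline demo: a one-page "website" in a temporary directory.
my $dir    = tempdir(CLEANUP => 1);
my $source = File::Spec->catfile($dir, 'site.html');
open my $fh, '>', $source or die "Can't write $source: $!";
print {$fh} "<html><body>Hello from a local page</body></html>\n";
close $fh or die "Can't close $source: $!";
my $url = URI::file->new_abs($source)->as_string;

# getstore() returns the HTTP status code, not the page.
my $copy = File::Spec->catfile($dir, 'temp.htm');
my $rc   = getstore($url, $copy);
die "getstore() failed with status $rc\n" unless is_success($rc);
print "Saved $url as $copy (status $rc)\n";

# mirror() returns a status code too. RC_NOT_MODIFIED (304) means the
# copy on disk was already current, so nothing was downloaded.
my $mirror_rc = mirror($url, $copy);
if ($mirror_rc == RC_NOT_MODIFIED) {
    print "Unchanged since the last download\n";
}
elsif (is_success($mirror_rc)) {
    print "Page changed, so $copy was refreshed\n";
}
else {
    die "mirror() failed with status $mirror_rc\n";
}
```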
Well, that's cool, huh? No? You want to store a website only when it has changed, like Google does? No big deal: change the function in the last example from getstore to mirror, leaving the parameters as they are, and the program will only download the page if it has changed since the copy already stored on disk. Cool? Yup!

At this point, you can do a ton of things. Let's say you want to check for dead links. Simply get the HTML into a variable ($html = get("http://www.google.com/");), and use a couple of search strings and splits on it until you find all the <a href=...> tags. Next, pull out each tag's href= attribute and add each URL to an array. Create a loop (foreach my $link (@array_of_links) { ... }) to cycle through each one and attempt to connect to it. If the link is bad, get() returns undef (which is false); otherwise it returns the HTML. I don't want to ruin this for you, since I'm sure you'd love to try it on your own [yeaaaa, riiight].

Another "creative" idea is to write a proxy. If you run Apache on your machine (or IIS, for you Windows folks), you can use these functions in CGI programs. YUP! Think of the bucks you can make writing a proxy to get past that godforsaken NetNanny or Bess software your school/home/office forces onto you. Simply write a script that accepts a URL in its query_string, prints the Content-type header, and uses getprint() to display the page. Cha-ching!

I hope you enjoyed reading this tutorial, and that you continue to contribute to the free Perl community. It was my pleasure writing this lil' file. Please post feedback so I know whether or not I'm actually helping.
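For the dead-link checker, treat this as one possible spoiler rather than the way to do it. It's a sketch under stated assumptions: the helper names extract_links, link_ok and write_file are made up; the href-matching regex is a rough stand-in for a real parser such as HTML::LinkExtor; relative links are resolved with URI->new_abs; and each link is tested with LWP::Simple's head() (which fetches only the headers), falling back to get() for servers that don't handle HEAD requests well. The demo page and its links live in a temporary directory behind file:// URLs, so it runs offline.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;            # get() and head()
use URI;                    # URI->new_abs() resolves relative links
use URI::file;              # builds file:// URLs for the offline demo
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: write a string to a file.
sub write_file {
    my ($path, $text) = @_;
    open my $fh, '>', $path or die "Can't write $path: $!";
    print {$fh} $text;
    close $fh or die "Can't close $path: $!";
}

# Hypothetical helper: find every <a href=...> in $html and return the
# absolute URLs (relative ones resolved against $base), without duplicates.
# A regex is a rough approximation; a real HTML parser is more robust.
sub extract_links {
    my ($html, $base) = @_;
    my (%seen, @links);
    while ($html =~ m{<a\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))}gi) {
        my $href = defined $1 ? $1 : defined $2 ? $2 : $3;
        next unless length $href;
        next if $href =~ /^(?:#|mailto:|javascript:)/i;    # not pages to fetch
        my $abs = URI->new_abs($href, $base)->as_string;
        push @links, $abs unless $seen{$abs}++;
    }
    return @links;
}

# Hypothetical helper: true if the link answers. head() fetches only the
# headers; fall back to get() for servers that mishandle HEAD requests.
sub link_ok {
    my ($url) = @_;
    return 1 if head($url);
    return defined get($url) ? 1 : 0;
}

# Offline demo: one page that links to a file that exists, one that
# doesn't, and an in-page anchor (which is skipped).
my $dir = tempdir(CLEANUP => 1);
write_file(File::Spec->catfile($dir, 'good.html'), "<p>I exist</p>\n");
my $index = File::Spec->catfile($dir, 'index.html');
write_file($index, qq{<a href="good.html">ok</a> <a href='missing.html'>dead</a> <a href="#top">top</a>\n});

my $base = URI::file->new_abs($index)->as_string;
my $html = get($base);
die "Couldn't fetch $base\n" unless defined $html;

my @links = extract_links($html, $base);
for my $link (@links) {
    printf "%-4s %s\n", (link_ok($link) ? 'OK' : 'DEAD'), $link;
}
```

Run as-is, it should report good.html as OK and missing.html as DEAD; the exact paths depend on your temp directory.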
Happy New Year!
|
| Other 3 submission(s) by this author
| |
Report Bad Submission
|
Your Vote
|
| |
Other User Comments
1/2/2002 6:42:03 PM: T. E. Geek
I just wanted to say thank you to those who have voted. I'm always encouraged when someone benefits from something I do. Thank you :-) (If you'd like me to write a tutorial on anything you don't understand, simply post it; I'll write it asap.)
1/18/2002 5:42:05 AM: Scratch Monkey
Very nice tutorial, well written and easy to understand even for a complete Perl lameo such as myself. Keep up the good work, looking forward to more of your tutorials in the future.
1/18/2002 5:23:34 PM: T. E. Geek
Thank you very much for your warm comment :-) I'm open to suggestions: If you guys have anything you'd like a tutorial written for- just post it as a comment. If no one says anything, I'll write a tutorial on writing a POP3 client- and eventually- how to use the AIM module to create chat bots ;-)
Thanks for the support!
1/22/2002 4:54:06 PM: Tr1pX
Write a tutorial for making a client-server application where you describe how to make the client communicate with the server. Please send a reply to this to tr1px@hackermail.com
1/26/2002 9:30:39 AM: kamal
this script is so good. umph!
2/19/2002 3:30:13 PM: Terry Paul
Thanks for sharing your knowledge of the subject. It's really kewl that there are still people like you out there doing good and sharing what you know! Keep up the awesome work bro!
3/3/2002 1:49:03 PM: Taylor
I'd like to know how to connect to AIM using Cold Fusion, ASP, VB, or Perl.
3/10/2002 4:02:53 AM: Flaxus
I got a good laugh and learned something useful at the same time. Thanks a million
3/18/2002 3:59:53 AM: Jay
How to use these functions is explained in the wwwlib manpage... This article was not needed.
3/19/2002 12:11:33 AM: T. E. Geek
That's a good point. It should also be noted that C++, C, php, perl, sql, vb, java, pascal, kylix, delphi, xml, JavaScript, VBScript, Bash, Ata, Lisp, Basic, Cobol, and for you mac users, Applescript are all various other computer-related utilities which come with documentation. Surely, one could master any of these languages quickly and easily with the documentation provided by each. (continued)
3/19/2002 12:11:53 AM: T. E. Geek
(continuation) My only question is why amazon.com offers all of those superfluous books on C++, OpenGL, ad nauseam, when anyone could simply learn what they crave through documentation. Documentation is not always user-friendly. I had hoped that this brief introduction would be a bit friendlier and easier to understand than the documentation provided. Furthermore, had you read the title of this article, you would realize that this posting was intended for those who do not know about LWP to begin with. Although this article was not intended for a user of your level, I still regret that my work was not to your liking.
3/20/2002 12:13:25 AM: T. E. Geek
On a happier note, as soon as life releases me from my utterly boring school-related responsibilities, I am going to write a tutorial on the Net-AIM module; perhaps even some basic chatter-bot theory. Thank you for being so supportive of this article! I promise many more in the future.
4/5/2002 1:44:31 AM: Marcel
First of all, thanks for your manual. I still keep on smiling. To supplement your manual, I want to complete the information for Windows Perl users: since "ActivePerl 5.6.0.613" the LWP module is included (I think it was distributed in the year 2000...). It fits ;-) Last but not least: this article was needed. Keep on going this way.
4/10/2002 7:39:15 PM: Allen
##########################
use LWP::Simple;
use strict;
use LWP::UserAgent;
use CGI qw(:standard);
print "Content-type: text/html\n\n";
my $url = 'http://www.yahoo.com';
my $con = get $url;
print "$con";
########################
Questions:
1) It works fine and gets the whole page of http://www.yahoo.com, but PROBLEM: if I switch to the web page below, I get nothing. Steps: replace the URL with my $url='http://merchantaccount.quickbooks.com/j/mas/signup';
2) When I am on a webpage, such as the Yahoo page, I would like to select a radio button and press "Next" to continue. How could I modify the above to do it? Need help on it. Thanks, Allen...
5/3/2002 3:23:28 PM: Ddl_Smurf
Hey, thanks, that looks simple enough. I do not code in Perl, but it looks like a great language. I have experience with Delphi and VB, and C, and quite a few others, but I'm really having trouble getting through Perl tutorials. Could you direct me to one that is as simple and friendly as yours? Thanks, Best regards.
5/3/2002 8:25:06 PM: T. E. Geek
Well! I did write an introductory tutorial to the perl language itself awhile back ;-) If you're interested, parse the perl planetsourcecode directory, and see if you can find it. It should help you get your hands dirty ;-)
11/4/2002 5:25:01 AM: FunnyPic Huh
nice little tutorial, perl tutorials are harder to understand than any other language i've learned. EXCEPT yours. I wish there were a few chapters worth of this to read. Keep up the good work
4/9/2003 5:44:46 AM:
hehe quite an entertaining read indeed.. good work