Important alert: (current site time 7/15/2013 8:43:17 PM EDT)
 

VB icon

Automate Periodic downloads or released files and packages

Email
Submitted on: 7/29/2000 11:59:37 PM
By: Found on the World Wide Web 
Level: Intermediate
User Rating: By 1 Users
Compatibility: 5.0 (all versions), 4.0 (all versions), 3.0 (all versions), Pre 3.0
Views: 14045
 
     Automate Periodic downloads or released files and packages. This small utility makes it possible to keep a list of URLs in a configuration file and periodically retrieve those pages or files with simple command. This utility is best suited for small batch jobs to download eg. most recent versions of the software files. If you pass an URL that is already on disk, be sure to supply option --overwrite to allow overwriting old files...
 
code:
Can't Copy and Paste this?
Click here for a copy-and-paste friendly version of this code!
				
=**************************************
= Name: Automate Periodic downloads or released files and packages
= Description:Automate Periodic downloads or released files and packages.
This small utility makes it possible to keep a list of URLs in a
configuration file and periodically retrieve those pages or files with
simple command. This utility is best suited for small batch jobs to
download eg. most recent versions of the software files. If you pass an URL
that is already on disk, be sure to supply option --overwrite to allow
overwriting old files...
= By: Found on the World Wide Web
=**************************************

#!/usr/local/bin/perl
#
# @(#) Perl - Batch download ftp or http file with config file settings.
# @(#) $Id: mywebget.pl,v 1.34 1999/03/01 21:37:15 jaalto Exp $
#
# File id
#
#.Copyright (C) 1998 Jari Aalto
#.Created: 1999-02
#.$Contactid: <jari.aalto@poboxes.com> $
#.$URL: ftp://cs.uta.fi/pub/ssjaaa/ $
#.$Keywords: Perl txt html conversion $
#.$Perl: 5.004 $
#
#This program is free software; you can redistribute it and/or
#modify it under the terms of the GNU General Public License as
#published by the Free Software Foundation; either version 2 of
#the License, or (at your option) any later version.
#
#This program is distributed in the hope that it will be useful, but
#WITHOUT ANY WARRANTY; without even the implied warranty of
#MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
#General Public License for more details.
#
#You should have received a copy of the GNU General Public License along
#with this program; if not, write to the Free Software Foundation,
#Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
#About program layout
#
#	Code written with Unix Emacs and indentation controlled with
#	Emacs package tinytab.el, a generic tab minor mode for programming.
#
#The {{ }}} marks you see in this file are party of file "fold"
#control package called folding.el (Unix Emacs lisp package).
#ftp://ftp.csd.uu.se/pub/users/andersl/beta/ to get the latest.
#
#There is also lines that look like # ....... &tag ... and they
#are generated by Emacs Lisp package `tinybm.el', which is also
#document structure tool. You can jump between the blocks with
#Ctrl-up and Ctrl-down keys and create those "bookmarks" with
#Emacs M-x tibm-insert. See www contact site below.
#
#Funny identifiers at the top of file
#
#The GNU RCS ident(1) program can print useful information out
#of all variables that are in format $ IDENTIFIER: text $
#See also Unix man pages for command what(1) which outputs all lines
#matching @( # ). Try commands:
#
#% what PRGNAME
#% ident PRGNAME
#
#Introduction
#
#Please start this perl script with options
#
#--help to get the help page
#
#Www contact site
#
#See http://www.netforward.com/poboxes/?jari.aalto and navigate
#to html pages in the site to get more information about me
#	and my tools (Emacs, Perl, procmail mostly)
#
#Description
#
#	If you retrieve latest versions of certain program blocks
#	periodically, this is the Perl script for you. Run from cron job
#	or once a week to upload newest versions of files around the net.
#
#	_Note:_ This in simple file by file copier and does not offer
#	any date comparing or recursive features like found from C-program
#	wget(1) http://www.ccp14.ac.uk/mirror/wget.htm and
#	ftp://prep.ai.mit.edu/pub/gnu/
#
#Change Log
#
#	(none)
use strict;
BEGIN { require 5.004 }
#A U T O L O A D
#
#The => operator quotes only words, and File::Basename is not
#Perl "word"
use autouse 'Carp' => qw( croak carp cluck confess );
use autouse 'Text::Tabs'=> qw( expand);
use autouse 'Cwd'=> qw( cwd );
use autouse 'Pod::Text' => qw( pod2text );
use autouse 'File::Copy'=> qw( copy move);
use autouse 'File::Path'=> qw( mkpath rmtree);
#Standard perl modules
use Env;
use English;
use File::Basename;
use Getopt::Long;
#Other CPAN modules
use LWP::UserAgent;
use Net::FTP;
use vars qw ( $VERSION );
#This is for use of Makefile.PL and ExtUtils::MakeMaker
#So that it puts the tardist number in format YYYY.MMDD
#The REAL version number is defined later
#The following variable is updated by my Emacs setup whenever
#this file is saved
$VERSION = '1999.0301';
# ****************************************************************************
#
#DESCRIPTION
#
#Set global variables for the program
#
#INPUT PARAMETERS
#
#none
#
#RETURN VALUES
#
#none
#
# ****************************************************************************
sub Initialize ()
{
use vars qw
(
$PROGNAME
$LIB
$RCS_ID
$VERSION
$CONTACT
$URL
	%URL_PATH_DENIED_HASH
);
$LIB	= basename $PROGRAM_NAME;
$PROGNAME= $LIB;
$RCS_ID= '$Id: mywebget.pl,v 1.34 1999/03/01 21:37:15 jaalto Exp $';
$VERSION = (split (' ', $RCS_ID))[2];# version number in format N.NN+
$CONTACT = "<jari.aalto\@poboxes.com>"; # Who is the maintainer
$URL = "ftp://cs.uta.fi/pub/ssjaaa/";
$OUTPUT_AUTOFLUSH = 1;
}
# ***************************************************************** &help ****
#
#DESCRIPTION
#
#Print help and exit.
#
#INPUT PARAMETERS
#
#$msg[optional] Reason why function was called.
#
#RETURN VALUES
#
#none
#
# ****************************************************************************
=pod
=head1 NAME
@(#) mywebget.pl - Perl Web URL retrieve program
=head1 SYNOPSIS
mywebget.pl http://example.com/ [URL] ..
mywebget.pl --config file-with-urls.txt
mywebget.pl --verbose --overwrite http://example.com/
mywebget.pl --verbose --overwrite --Output ~/dir/ http://example.com/
mywebget.pl --new --overwrite http://example.com/kit-1.1.tar.gz
=head1 OPTIONS
=head2 General options
=over 4
=item B<--Create-paths>
Create paths that do not exist in C<lcd:> directives. Normally any
LCD command that fails to find the path would interrupt the program. With
this option local directories are created as needed.
=item B<--config [FILE]>
Read URLs from configuration file. If you leave out the FILE, then
configuration is read from file pointed by enviromnet variable.
See ENVIRONMENT.
File can contain comments starting with #. You can also define tour own
variables that you use in this configuration file with the format
VARIABLE = VALUE
The right hand can refer to previously defined variables or existing
environment variables. This is C<not> Perl code although it may look that,
but just an allowed syntax in the configuration file. Notice that the
C<is no dollar to the left hand> when you define the varible name.
Here is sample:
#@(#) $HOME/.mywebget - Perl mywebget.pl configuration file
#
#	This is comment
#	Another comment
ROOT= $HOME			# define variable
UPDATE = $ROOT/updates
DOWNL = $ROOT/download
lcd: $UPDATE			# <directive> chdir here
tag1: local-copies tag1: local
	file://absolute/dir/file-1.23.tar.gz
lcd: $DOWNL
tag1: external
 tag2: external-http
	http://www.example.com/page.html
	http://www.example.com/page.html save:/dir/dir/page.html
 tag2: external-ftp
	ftp://ftp.com/dir/file.txt.gz save:xx-file.txt.gz login:foo pass:passwd x:
	lcd: $HOME/download-kit
	ftp://ftp.com/dir/kit-1.1.tar.gz new:
# End of configuration example
Possible directives in the B<ftp://> line are
=over 4
C<lcd:DIRECTORY>
Set Local download directory to DIRECTORY. Any environment variables are
substituted in path name. If this tag is found, it replaces setting of
B<--Output>. If path is not a directory, terminate with error. See also
B<--Create-paths> and B<--no-lcd>.
C<login:LOGIN-NAME>
Ftp login name. Default value used is "ftp".
C<new:>
If this is found from a current line, then the newest file will be retrieved.
This variable is reset to the value of C<--new> after the line has been
processed. Neweest means, that an ls() command is run in the ftp directory
and any files that resemble the filename given in the configuration file
is examined, sorted and heurestically determined according to file's
version number which one is latest. Time stamps of the files are not
checked. For example files that have version information in YYYYMMDD format
will most likely to be retrieved right.
The only requirement is that filename C<must> follow the universal version
numbering standard:
FILE-VERSION.extension	# de facto VERSION is defined as [\d.]+
file-19990101.tar.gz	# ok
file-1999.0101.tar.gz	# ok
file-1.2.3.5.tar.gz		# ok
file1234.txt		# not recognized. Must have "-"
file-0.23d.tar.gz		# warning !
Files that have some alphabetic version indication at the end of VERSION
are not handled correctly. Bitch the developer and persuade him to stick
to the de facto standard so that he's files can be retrieved intelligently.
C<pass:PASSWORD>
Default value is generic C<mail\@some.com> email address.
C<regexp:REGEXP>
Get all files in ftp directory matching regexp. Keyword SAVE: is ignored.
C<save:LOCAL-FILE-NAME>
Save file under this name to local disk.
C<tagN:NAME>
You can group files under C<tagN>,, so that eg. option B<--tag1> would start
downloading files from that point on until next C<tagN> is found. There
are currently unlimited number of tag levels: tag1, tag2 and tag3, so that you
can arrange your downlods hierarchially in the configuration file. For
example to download all linux files, you would give option B<--tag linux>.
To download only the NT Emacs latest binary, you would give option
B<--tag emacs-nt>. Notice that you do not give the C<level> in the option,
program will find it out from the configuration file after the tag name
matches.
The downloading stops at next tag of the C<same level>. That is, tag2 stops
only at next tag2, when upper level tag is found (tag1) or or until end of
file.
tag1: linux		# All Linux downlods under this category
	tag2: sunsitetag2: another-name-for-this-spot
	#List of files to download from here
	tag2: ftp.funet.fi
	#List of files to download from here
tag1: emacs-binary
	tag2: emacs-nt
	tag2: xemacs-nt
	tag2: emacs
	tag2: xemacs
C<x:>
Extract (Unpack) the file. See also option B<--unpack> and B<--no-unpack>
=back
=item B<--Firewall FIREWALL>
Use FIREWALL when accessing files via ftp:// protocol.
=item B<--new -n>
Get newest file. This applies to datafiles; which are all that do not end
to extension .asp .html. When new releases are announced, the version
number in filename usually tells which is the current one so getting
harcoded file with:
mtwebget.pl -o -v http://example.com/dir/program-1.3.tar.gz
is not usually practical from automation point of view. Adding B<--new>
option to the command line causes double pass: a) the whole
http://example.com/dir/ is examined for all files. b) files matching
approximately filename program-1.3.tar.gz are examined, heuristically
sorted and file with latest version number in a is retrieved.
=item B<--no-lcd>
Ignore C<lcd:> directives in configuration file.
In the configuration file, any C<lcd:> directives are obeyed as they are seen.
But if you do want to retrieve URL to your current directory, be sure to
supply this option. Otherwise the fil will end to the directory pointer by
C<lcd:> and C<save:>
=item B<--no-save>
Ignore C<save:> directives in configuration file. If the URLs have
C<save:> options, they are ignored during fething. You usually want to
combine B<--no-lcd> with B<--no-save>
=item B<--no-unpack>
Ignore C<unpack:> directives in configuration file.
=item B<--Output DIR>
Before retrieving any files, chdir to DIR.
=item B<--overwrite>
Allow overwriting existing files when retrieving URLs.
Combine this with B<--new> if you periodically update files.
=item B<--Proxy PROXY>
Use PROXY for HTTP. (See B<--Firewall> for FTP.)
=item B<--prefix PREFIX>
Add PREFIX to all retrieved files.
=item B<--Postfix POSTFIX -P POSTFIX>
Add POSTFIX to all retrieved files.
=item B<--prefix-date -D>
Add iso8601 ":YYYY-MM-DD" prefix to all retrived files.
This is added before possible B<--prefix-www> or B<--prefix>.
=item B<--prefix-www -W>
Usually the files are stored with the same name as in the URL dir, but
if you retrieve files that have identical names you can store each
page separately so that the file name is prefixed by the site name.
http://example.com/page.html--> example.com::page.html
http://example2.com/page.html--> example2.com::page.html
=item B<--regexp REGEXP>
Retrieve URLs matching REGEXP from your C<configuration> file. This cancels
B<--tag> option.
=item B<--tag NAME>
Search tag NAME from the config file and download only entries defined
under that tag. Refer to B<--config FILE> option description. You can give
Multiple B<--tag> switches. Option B<--regexp> has no effect if this has been
selected.
=item B<--unpack -x>
Unpack any files after retrieving them. The command to unpack a file
is defined in configuration file. Default Unpacking methosds are defined
as if you had typed them in the configuration file. The left hand is
regexp and right hand is command, where %s is substituted by filename.
The quotes to the left hand are required.
'\.tar\.gz$'= gzip -dc %s | tar xvf -
'\.gz$'	= gzip -d %s
'\.zip$'	= unzip -d %s
=back
=head2 Miscellaneous options
=over 4
=item B<--debug LEVEL -d LEVEL>
Turn on debug with positive LEVEL number. Zero means no debug.
This option turns on B<--verbose> too.
=item B<--help> B<-h>
Print help page.
=item B<--Version -V>
Print program's version information.
=back
=head1 README
Automate Periodic downloads or released files and packages.
This small utility makes it possible to keep a list of URLs in a
configuration file and periodically retrieve those pages or files with
simple command. This utility is best suited for small batch jobs to
download eg. most recent versions of the software files. If you pass an URL
that is already on disk, be sure to supply option B<--overwrite> to allow
overwriting old files.
If the URL ends to slash, then the directory is list on the remote machine
is stored to file name:
!path!000root-file
The content of this file can be either index.html or the directory listing
depending on the used http or ftp protocol.
While you can run this program from command line to retrieve individual
files, it has been designed t use separate configuration file via B<--config>
option. In the configuration file you can control the downloading with
separate directived like C<save:> which tells to save the file under
different name.
The simplest way to retreive a latest version of a kit from FTP site is:
mywebget.pl --new --overwite --verbose \
http://www.example.com/kit-1.00.tar.gz
Don't worry about the filename "kit-1.00.tar.gz". The latest version, say,
kit-3.08.tar.gz will be retrieve. The option B<--new> instructs to find
newer versions of the URLs.
=head1 EXAMPLES
Read a directory and stored it to filename YYYY-MM-DD::!dir!000root-file.
mywebget.pl --prefix-date --overwrite --verbose http://www.example.com/dir/
To update newest version of the kit, but only if there is none in the
disk already. The --new option instructs to find nwer packages and
the filename is used only for guidance what the file look like:
mywebget.pl --overwrite --new --verbose \
	ftp://ftp.example.com/dir/packet-1.23.tar.gz
To overwrite file and add a date prefix to the file name:
mywebget.pl --prefix-date --overwrite --verbose \
http://www.example.com/file.pl
--> YYYY-MM-DD::file.pl
To add date and WWW site prefix to the filenames:
mywebget.pl --prefix-date --prefix-www --overwrite --verbose \
http://www.example.com/file.pl
--> YYYY-MM-DD::www.example.com::file.pl
Get all updated files under KITS and use default configuration file:
mywebget.pl --verbose --config --overwrite --new --tag kits
mywebget.pl -v -c -o -n -t kits
Get files as they read in the configuration file to the current directory,
ignoring any C<lcd:> and C<save:> directives:
mywebget.pl --config $HOME/.mywebget /
--no-lcd --no-save --overwrite --verbose \
	http://www.example.com/file.pl
To check if your're C<lcd:> directives refer to live directories in disk, run the program with non-matching regexp and it parses the file and checks the
lcd's on the way:
mywebget.pl -v -r dummy-regexp
-->
mywebget.pl.DirectiveLcd: LCD [$EUSR/directory ...]
is not a directory at /users/jaalto/bin/mywebget.pl line 889.
=head1 ENVIRONMENT
Variable C<MYWEBGET_CFG> can point to the default configuration file.
=head1 SEE ALSO
C program wget(1) http://www.ccp14.ac.uk/mirror/wget.htm and
Old Perl 4 program webget(1) http://www.wg.omron.co.jp/~jfriedl/perl/
With the Libwww Perl library you find scripts
lwp-download(1) lwp-mirror(1) lwp-request(1) lwp-rget(1)
=head1 AVAILABILITY
Latest version of this file is always at CPAN
http://www.perl.com/CPAN-local//scripts/ Reach author at
jari.aalto@poboxes.com or http://www.netforward.com/poboxes/?jari.aalto
=head1 SCRIPT CATEGORIES
CPAN/Administrative
=head1 PREREQUISITES
Modules C<LWP::UserAgent> and C<use Net::FTP> are required.
=head1 COREQUISITES
No optional CPAN modules needed.
=head1 OSNAMES
C<any>
=head1 VERSION
$Id: mywebget.pl,v 1.34 1999/03/01 21:37:15 jaalto Exp $
=head1 AUTHOR
Copyright (C) 1996-1999 Jari Aalto. All rights reserved. This program is
free software; you can redistribute it and/or modify it under the same
terms as Perl itself or in terms of Gnu General Public licence v2 or later.
=cut
sub Help (;$)
{
my $id = "$LIB.Help";
my $msg = shift; # optional arg, why are we here...
pod2text $PROGRAM_NAME;
exit 1;
}
# ************************************************************** &args *******
#
#DESCRIPTION
#
#Read and interpret command line arguments
#
#INPUT PARAMETERS
#
#none
#
#RETURN VALUES
#
#none
#
# ****************************************************************************
sub HandleCommandLineArgs ()
{
my $id = "$LIB.HandleCommandLineArgs";
my ( $version );
# .......................................... command line options ...
use vars qw
(
	$CHECK_NEWEST
	$debug
	$DIR_DATE
	$CFG_FILE
	$FIREWALL
	$HELP
	$LCD_CREATE
	$NO_SAVE
	$NO_LCD
	$NO_UNPACK
	$OVERWRITE
	$OUT_DIR
	$PREFIX
	$PREFIX_DATE
	$PREFIX_WWW
	$POSTFIX
	$PROXY
	$URL_REGEXP
	$UNPACK
	$TAG_REGEXP
	@TAG_LIST
	$verb
	$MYWEBGET_CFG
);
$FIREWALL = "";
# .................................................... read args ...
Getopt::Long::config( qw
(
require_order
no_ignore_case
no_ignore_case_always
));
GetOptions # Getopt::Long
(
	 "Version"	=> \$version
	, "config:s"	=> \$CFG_FILE
	, "Create-paths"=> \$LCD_CREATE
	, "debug:i"	=> \$debug
, "d"=> \$debug
	, "Firewall=s"	=> \$FIREWALL
, "help"=> \$HELP
	, "n"		=> \$CHECK_NEWEST
	, "new"		=> \$CHECK_NEWEST
	, "no-lcd"	=> \$NO_LCD
	, "no-unpack"	=> \$NO_UNPACK
	, "no-save"	=> \$NO_SAVE
	, "overwrite"	=> \$OVERWRITE
	, "Output:s"	=> \$OUT_DIR
	, "prefix:s"	=> \$PREFIX
	, "D|prefix-date"=> \$PREFIX_DATE
	, "W|prefix-www"=> \$PREFIX_WWW
	, "Postfix:s"	=> \$POSTFIX
	, "Proxy=s"	=> \$PROXY
	, "regexp=s"	=> \$URL_REGEXP
	, "tag=s"	=> \@TAG_LIST
	, "unpack"	=> \$UNPACK
	, "x"		=> \$UNPACK
	, "verbose"	=> \$verb
);
$version	and die "$VERSION $PROGNAME $CONTACT $URL\n";
$HELP	and Help();
$debug	and $verb = 1;
if ( defined $URL_REGEXP and @TAG_LIST )
{
	die "You can't use both --tg and --regexp options.";
}
if ( defined @TAG_LIST )
{
	#-s -t -n tag--> whoopos....
	if ( grep /^-/ , @TAG_LIST )
	{
	die "$id: You have option in TAG_LIST: @TAG_LIST\n";
	}
	$TAG_REGEXP = 'tag(\d+):\s*(\S+)\b'
}
if ( defined $CFG_FILE and $CFG_FILE eq '' )
{
	unless ( defined $MYWEBGET_CFG )
	{
	die "$id: You have not set environment variable MYWEBGET_CFG";
	}
	my $file = $MYWEBGET_CFG;
	unless ( -r $file )
	{
	die "$id: MYWEBGET_CFG is not readable [$file]";
	}
	$verb and print "$id: Using default config file $file\n";
	$CFG_FILE = $file;
}
}
# ****************************************************************************
#
#DESCRIPTION
#
#Return ISO 8601 date YYYY-MM-DD
#
#INPUT PARAMETERS
#
#none
#
#RETURN VALUES
#
#$str
#
# ****************************************************************************
sub DateYYYY_MM_DD ()
{
my $id= "$LIB.DateYYYY_MM_DD";
my (@time)= localtime(time);
my $YY= 1900 + $time[5];
my ($DD, $MM) = @time[3..4];
#my ($mm, $hh) = @time[1..2];
$debug and warn "$id: @time\n";
sprintf "%d-%02d-%02d", $YY, $MM + 1, $DD;
}
# ****************************************************************************
#
#DESCRIPTION
#
#	Return temporary file name.
#
#INPUT PARAMETERS
#
#	$prefix		Prefix to use in front of filenames.
#
#RETURN VALUES
#
#$filename
#
# ****************************************************************************
sub GetTempFile ($)
{
my $id = "$LIB.GetTempFile";
my ( $prefix ) = @ARG;
# See also PerlFaq 5 IO::Handle::new_tmpfile
my $dir;
my $file = $prefix;
if ( -e "/tmp" )
{
	$dir = "/tmp";
}
elsif ( -e "/temp" )
{
	$dir = "/temp";
}
elsif ( -e "$HOME/temp" )
{
	$dir = "$HOME/temp";
}
else
{
	$dir = ".";
}
$file = "$dir/$prefix.$PROCESS_ID";
$debug and warn "$id: RET $file";
$file;
}
# ****************************************************************************
#
#DESCRIPTION
#
#	Expand given PATH by substituting any Environment variables in it.
#
#INPUT PARAMETERS
#
#	$string	Path information, like $HOME/.example
#
#RETURN VALUES
#
#	string	Expanded path.
#
# ****************************************************************************
sub ExpandPath ($)
{
my $id	= "$LIB.ExpandPath";
local ( $ARG ) = @ARG;
my ( $key, $value );
my $orig = $ARG;
while ( ($key, $value) = each %ENV )
{
	s/\$$key/$value/;
}
#	The env variables may contain leading slashes, get rid of them
#
#	[$ENV = /dir/ ]
#
#	$ENV/path--> /dir//path
#
s,//+,/,;
$debug and warn "$id:\t\t$orig ==> $ARG\n";
$ARG;
}
# ****************************************************************************
#
#DESCRIPTION
#
#	Check if the URL has access problems.
#
#INPUT PARAMETERS
#
#	$url		full filename
#	$add		[optional] Flag, if non-zero recor URL as denied.
#
#RETURN VALUES
#
#	0, 1		1 is denied.
#
# ****************************************************************************
{
my %staticDeniedhash;
sub CheckDenied ($ ; $)
{
my$id		 = "$LIB.CheckDenied";
local$ARG	 = shift;
my	 ( $add )	 = @ARG;
my $ret;
my $path = $1 if m,(.*)/,;
if ( $add )
{
	$staticDeniedhash{$path} = 1;
}
elsif ( $staticDeniedhash{$path} )
{
	print " ** Access denied: $ARG\n";
	$ret = 1;
}
$debug and warn "$id: RET $ARG --> $path\n";
$ret;
}}
# ****************************************************************************
#
#DESCRIPTION
#
#	Unpack list of files
#
#INPUT PARAMETERS
#
#	\@array	List of files
#	\%hash	Unpack command hash table: REGEXP => COMMAND, where
#		%s is substituted for filename
#
#
#RETURN VALUES
#
#none
#
# ****************************************************************************
sub Unpack ($ $)
{
my $id = "$LIB.Unpack";
my ( $filesArray, $cmdHash ) = @ARG;
my ( $regexp, $cmd , @array);
local $ARG;
for ( @$filesArray )
{
	$debug and warn "$id: unpacking $ARG\n";
	@array = sort { length $b <=> length $a } keys %$cmdHash;
print "@array\n";
	for $regexp ( @array )
	{
	$cmd = $cmdHash->{ $regexp };
	if ( /$regexp/ )
	{
		$cmd = sprintf $cmd, $ARG;
		$debugand warn "$id: [$regexp] $cmd\n";
		print "$id: ", cwd(), "/$ARG\n";
		system $cmd;
		s/$regexp//;
	}
	}
}
}
# ****************************************************************************
#
#DESCRIPTION
#
#	Read directory content
#
#INPUT PARAMETERS
#
#	$path
#
#RETURN VALUES
#
#	@	list of files
#
# ****************************************************************************
sub DirContent ($)
{
my $id	 = "$LIB.DirContent";
my ( $path ) = @ARG;
$debug and warn "$id: $path\n";
local *DIR;
unless ( opendir DIR, $path )
{
	print "$id: can't read $path $ERRNO";
	next;
}
my @tmp = readdir DIR;
closedir DIR;
$debug > 1 and warn "$id: @tmp";
@tmp;
}
# ****************************************************************************
#
#DESCRIPTION
#
#	Scan until valid tag line shows up. Return line if it is under the
#	TAG
#
#INPUT PARAMETERS
#
#	$line		line content
#	$tag		Tag name to look for
#	$reset		If set, do nothing but reset state variables.
#			You should call with this if you start a new round.
#
#RETURN VALUES
#
#	($LINE, $stop)	The $LINE is non-empty if it belongs to the TAG.
#			The $stop flag is non-zero if TAG has ended.
#
# ****************************************************************************
{
my
(
	 $staticTagLevel
	, $staticTagName
	, $staticTagFound
);
sub TagHandle ($$ ; $)
{
my $id 	= "$LIB.TagHandle";
local $ARG		= shift;
my ( $tag , $reset)	= @ARG;
# ........................................................ reset ...
if ( $reset )
{
	$debugand warn "$id: RESET";
	$staticTagLevel = $staticTagName = $staticTagFound = "";
	return $ARG;
}
# ...................................................... tag ...
my ($tagNbr, $tagN, $stop);
my %choices = /$TAG_REGEXP/go;
unless ( $staticTagFound )
{
for ( ($tagNbr, $tagN) = each %choices )
{
	if ( $debug and $tagNbr )
	{
		warn "$id: [$tagNbr] $tagN eq $tag";
	}
	if ( $tagNbr and$tagN eq $tag )
	{
		$staticTagLevel = $tagNbr;
		$staticTagName= $tagN;
		$staticTagFound = 1;
		$debug and warn "$id: TAG FOUND [$staticTagName] $ARG\n"
	}
	}
	$ARG = "" unless $staticTagFound;	# Read until TAG
}
else
{
	#We're reading lines after the tag was found.
	#Terminate teminate on next found tag name
for ( ($tagNbr, $tagN) = each %choices )
{
	if ( $tagNbr and $tagNbr <= $staticTagLevel )
	{
		$debug and warn "$id: End at [$staticTagName] $ARG\n";
		$stop = 1;
	}
	}
}
($ARG, $stop);
}}
# ****************************************************************************
#
#DESCRIPTION
#
#	Handle Local directory change and die if can't checnge to
#	directory.
#
#INPUT PARAMETERS
#
#	$dir	Where to chdir
#	$make	Flag, if allowed to create directory
#
#RETURN VALUES
#
#none
#
# ****************************************************************************
sub DirectiveLcd ($;$)
{
my $id 	 = "$LIB.DirectiveLcd";
my ( $dir , $mkdir ) = @ARG;
my $lcd = ExpandPath $dir;
unless ( -d $lcd )
{
	not $mkdir and die "$id: LCD [$dir $lcd] is not a directory";
	$verb and warn "$id: Creating directory $lcd";
	mkpath( $lcd, $verb) or die "$id: mkpath $lcd failed $ERRNO";
}
$verb	and print "$id: $lcd\n";
chdir $lcd	ordie"$id: $lcd $ERRNO";
}
# ****************************************************************************
#
#DESCRIPTION
#
#	Examine list of files and return the newest file that match FILE
#	the idea is that we divide the filename into 3 parts
#
#	PREFIX VERSION REST
#
#	So that for example filename
#
#	emacs-20.3.5.1-lisp.tar.gz
#
#	is exploded to parts
#
#	emacs -20.3.5.1- lisp.tar.gz
#
#	After this, the VERSION part is examined and all the numbers from
#	it are read and converted to zero filled hhs keys, so that sorting
#	between versions is possible:
#
#	(20 3 5 1) --> "0020.0003.0005.0001"
#
#	A hash table for each file is build according to this version key
#
#	VERSION-KEY => FILE-NAME
#
#	When we finally sort the has by key, we get the latest version number
#	and the associated file.
#
#INPUT PARAMETERS
#
#	$file	file to use as base
#	\@files	list of files
#
#RETURN VALUES
#
#	$file	File that is newest, based on version number.
#
# ****************************************************************************
sub LatestVersion ( $ $ )
{
my $id 		 = "$LIB.LatestVersion";
my ( $file , $array ) = @ARG;
$debug and warn "$id: $file --> @$array\n\n";
local$ARG= $file;
my	$ret= $file;
my	$regexp = '^(\\D+)(-\\d[-\\d.]*)(\\D+)';
my ( $pfx, $post, $ver, $max );
if ( /$regexp/o )
{
	$pfx	= $1;
	$ver	= '-([-\d.]+)'; 	#	NN.NNYYYY-MM-DD
	$post	= $3;
	$debug and warn "$id: PFX: $pfx POSTFIX: $post\n";
	my ( @try, %hash, %hash2, $key , @v , $version, $file );
# .................................................. arrange ...
	# If there is verison numbers, then sort all according
	# to version.
	for ( @$array )
	{
	unless ( /$pfx.*$post/ and /$regexp/o )
	{
		$debug and warn "$id: REJECTED\t\t$ARG\n";
		next;
	}
	$debug and warn "$id: MATCH: $1 $2 $3\n";
	$key = "";
	@v = ( /(\d+)/g );
	# Record how many separate digits we found.
	$max = @v	if @v > $max;
	while ( @v < 8 )# fill until 8 elements
	{
		push @v, 0;
	}
	for $version ( @v )
	{
		#	1.0 --> 0001.0000.0000.0000.0000.0000
		$key .= sprintf "%015d.", $version;
	}
	$hash { $key } = $ARG;
	$hash2 { $v[0] } = $ARG;
	}
	#If there were date based versions:
	#
	#	wemi-199802080907.tar.gz
	#	wemi-19980804.tar.gz
	#	wemi-199901260856.tar.gz
	#	wemi-199901262204.tar.gz
	#
	#Then sort directly by the %hash2, which only contains direct
	#NUMBER key without prefixed zeroes. For multiple numbers we
	#sort according to %hash
	if ( $max == 1 )
	{
	@try = sort { $b <=> $a } keys %hash2;
	%hash = %hash2;
	}
	else
	{
	@try = sort { $b cmp $a } keys %hash;
	}
	if ( $debug )
	{
	warn "$id: Choices: $ver $pfx.*$post\n";
	my $arg;
	for $arg ( @try )
	{
		warn "$id: $hash{$arg}\n";
	}
	}
	#If nonly one answer, then use that. Or if we grepped versioned
	#files, take the latest one.
	if ( @try )
	{
	$ret = $hash{ $try[0] };
	}
}
$debug and warn "$id: RETURN [$ret]\n";
$ret eq ''	and die "$id: Internal error, Run with debug on.";
$ret;
}
# ****************************************************************************
#
#DESCRIPTION
#
#Get file via FTP
#
#INPUT PARAMETERS
#
#	$site	Dite to connect
#	$path	dir in SITE
#
#	$getFileFile to get
#	$saveFileFile to save on local disk
#
#	$firewall
#
#	$new	Flag, Should only the newest fiel retrieved?
#
#RETURN VALUES
#
#	()	RETURN LIST whose elements are:
#
#	$statError reason or "" => ok
#	@	list of retrieved files
#
# ****************************************************************************
sub UrlFtp ($$ $$$ ; $ $$ $)
{
my $id= "$LIB.UrlFtp";
# ......................................................... args ...
my
(
	$site, $path,
	$getFile, $saveFile, $regexp,
	$firewall,
	$login, $pass,
	$new
) = @ARG;
$login = "ftp"			if $logineq "" ;
$pass = "batch-ftp\@example.com"	if $passeq "" ;
my $url = "$site/$path";
$url =~ s,/$,,;$url .= "/";	# make sure there is terminating /
return	if CheckDenied $url;
return	if CheckDenied $site;
# ............................................ private variables ...
my ( $ftp, $stat, @files);
my $timeout	= 120;
my $singleTransfer;
if ( (not defined $regexp or $regexp eq '') and ! $new )
{
	$singleTransfer = 1;
}
local $ARG;
if ( $debug )
{
	warn "$id:\n\tsingleTransfer: $singleTransfer\n",
	"\t$site, $path [firewall:$firewall] $login,$pass [new:$new]\n"
	;
}
$verb and print
	"$id: Connecting to ftp://$site$getFile --> $saveFile $regexp \n";
#	One file would be transferred, but it already exists and
#	we're not allowed to overwrite --> do nothing.
if ( $singleTransfer and -e $saveFile and not $OVERWRITE )
{
	$verb and print "$id: [ignored, exists] $saveFile\n";
	return;
}
# .................................................. make object ...
if ( $firewall ne '' )
{
$ftp = Net::FTP->new
(
$site,
(
Firewall => $firewall,
Timeout => $timeout
)
);
}
else
{
$ftp = Net::FTP->new
(
$site, ( Timeout => $timeout )
);
}
unless ( defined $ftp )
{
	print "$id: Cannot make route to $site $ERRNO\n";
	CheckDenied $site, "1-record-as-denied";
	return;
}
# ........................................................ login ...
$debug and print "$id: Login to $site ..\n";
unless ( $ftp->login($login, $pass) )
{
print "$id: Login failed $login, $pass\n";
	CheckDenied $url, "1-record-as-denied";
	goto QUIT;
}
$ftp->binary();
my $cd = $path;
$cd = dirname $path	unless $path =~ m,/$, ;
if ( $cd ne '' )
{
	unless ( $ftp->cwd($cd) )
	{
	print "$id: Remote cd $cd failed.\n";
	CheckDenied $url, "1-record-as-denied";
	goto QUIT;
	}
}
# .......................................................... get ...
$ftp->binary();
if ( $singleTransfer )
{
	$verb and print "$id: Getting file... $getFile\n";
	unless ( $ftp->get($getFile, $saveFile) )
	{
	print "$id: ... ** error $getFile\n";
	}
	push @files, $saveFile;
}
else
{
	my ( @list, $i);
	$debug and print "$id: Getting list of files $site ...\n";
	$i= 0;
	$debug and warn "$id: Running ftp dir ls()\n";
# .............................................. select list ...
	@list = $ftp->ls();
	if ( $regexp ne '' )
	{
	@list = sort grep $regexp, @list;
	}
	else
	{
	my $name = basename $getFile;
	my $file = LatestVersion $name, \@list;
	$verb and print "$id: ... Getting latest version: $file\n";
	@list = ( $file );
	}
# ................................................ get files ...
	$debug and warn "$id: List length ", scalar @list, " --> @list\n";
	for ( @list )
	{
	$i++;
	$verb and printf "$id: ... %2d%%", int ( $i * 100 / @list);
	$verb and print " $ARG\n";
	if ( -e )
	{
		if ( $new )
		{
		my $remoteSize = $ftp->size( $ARG );
		my $localSize= (stat)[7];
		if ( $remoteSize == $localSize )
		{
			$verb and print "$id: [SIZE already on disk] $ARG\n";
			next;
		}
		}
		elsif ( not $OVERWRITE )
		{
		$verb and print "$id: [OVWR already on disk] $ARG\n";
		next;
		}
	}
	unless ( $stat = $ftp->get($ARG) )
	{
		print "$id: ... ** error $ARG $ERRNO $stat\n";
		next;
	}
	push @files, $ARG;
	}
}
QUIT:
{
	$ftp->quit() if defined $ftp;
}
$debug and warn "$id: RET $stat, @files\n";
($stat, @files);
}
# ****************************************************************************
#
#DESCRIPTION
#
#	Get content of URL
#
#INPUT PARAMETERS
#
#	$url			The URL pointer
#	\%errUrlHashRef		Hahs where to store the URL-ERROR_CODE
#	\%errExplanationHashRef	Hash where to store ERROR_CODE-EXPLANATION
#
#RETURN VALUES
#
#	()	RETURN LIST whose elements are:
#
#	$statError reason or "" => ok
#	@	list of retrieved files
#
# ****************************************************************************
sub UrlHttp ( $$$ $$$ )
{
my $id = "$LIB.UrlHttp";
my
(
	 $url
	, $file
	, $new
	, $proxy
	, $errUrlHashRef
	, $errExplanationHashRef
) = @ARG;
$verb and print "$id: $url --> $file\n";
my ( $ret , @files , $obj, $tmpFile, $stat);
return	if CheckDenied $url;
# ........................................................ setup ...
my $ua = new LWP::UserAgent;
my $request = new HTTP::Request( 'GET' => $url );
if ( defined $proxy )
{
	$ua->proxy("http", $proxy);
}
# .......................................................... get ...
if ( $new and -e $file )
{
	$debug and warn "$id: Existing file, using temp file.\n";
	$tmpFile = GetTempFile "$PROGNAME.$file";
	$obj	 = $ua->request( $request , $tmpFile );
	$stat= $obj->is_success;
	if ( $stat )
	{
	my $remoteSize = (stat $tmpFile )[7];
	my $localSize= (stat $file)[7];
	if ( $remoteSize == $localSize )
	{
		$verb and print "$id: [SIZE already on disk] $file\n";
	}
	else
	{
		unless ( move( $tmpFile, $file ) )
		{
		$verb and print "$id: MOVE fail $tmpFile --> $file\n";
		}
	}
	}
	-e $tmpFile and unlink $tmpFile;
}
else
{
	$obj	 = $ua->request( $request , $file );
	$stat= $obj->is_success;
}
$debug and print "$id: content-type: ", $obj->content_type, " $stat\n";
unless ( $stat )
{
	CheckDenied $url, "1-record-as-denied";
	$errUrlHashRef->{ $url } = $obj->code;
	# There is new error code, record it.
	if ( not defined $errUrlHashRef->{ $obj->code } )
	{
	$errExplanationHashRef->{ $obj->code } = $obj->message;
	}
	$ret = $errUrlHashRef->{ $obj->code };
	print " ** error: ", $obj->message, "\n";
}
else
{
	push @files, $file;
}
( $ret, @files );
}
# ****************************************************************************
#
#DESCRIPTION
#
#	Copy content of PATH to FILE.
#
#INPUT PARAMETERS
#
#	$path	From where to read. If this is directory, read files
#		in directory. If this is file, copy file.
#
#	$file	Where to put resuts.
#	$prefix	[optional] Filename prefix
#	$postfif[optional] postfix
#
#RETURN VALUES
#
#RETURN VALUES
#
#	()	RETURN LIST whose elements are:
#
#	$statError reason or "" => ok
#	@	list of retrieved files
#
#
# ****************************************************************************
sub UrlFile ($ $ ; $$)
{
my $id = "$LIB.UrlFile";
my ( $path, $file , $prefix, $postfix ) = @ARG;
my ( $stat, @files );
$debug and warn "$id: $path, $file\n";
if ( -f $path and not -d $path )
{
	if ( $CHECK_NEWEST )
	{
	my @dir = DirContent dirname( $path );
	if ( @dir )
	{
		my $base = dirname($path);
		$file = LatestVersion basename($path) , \@dir;
		$path = $base . "/" . $file;
	}
	else
	{
		$verb and print "$id: Can't set newest $file";
	}
	}
	$file = $prefix . $file . $postfix;
	$debug and warn "$id: FileCopy $path => $file\n";
	unless ( copy($path, $file) )
	{
	print "$id: FileCopy $path => $file $ERRNO";
	}
	else
	{
	push @files, $file;
	}
}
else
{
	my @tmp = DirContent $path;
	local *FILE;
	$file =~ s,/,!,g;
	if ( -e $file and not $OVERWRITE )
	{
	print "$id: [ignored, exists] $file\n";
	return;
	}
	unless ( open FILE, ">$file" )
	{
	print "$id: can't write $file $ERRNO\n";
	return;
	}
	print FILE join "\n", @tmp;
	close FILE;
	push @files, $file;
}
( $stat, @files );
}
# ****************************************************************************
#
#DESCRIPTION
#
#
#
#INPUT PARAMETERS
#
#	\@data	Configuration file content
#
#
#RETURN VALUES
#
#none
#
# ****************************************************************************
sub Main ($$)
{
my $id = "$LIB.Main";
my ( $TAG_NAME, $data ) = @ARG;
$debug and warn "$id ********** $TAG_NAME\n";
my $date = DateYYYY_MM_DD();
my ( %URL_ERROR_HASH , %URL_ERROR_REASON_HASH );
my ( $type, $url, $path, $site, $stat , $file , $line);
my ( $origFile, $login, $pass , $sitePath, $regexp, $lcd, $new, $stop );
my ( $count, $var, $val , %variables, @files , $unpack);
my $prefix 	= "";
my $postfix = "";
local $ARG;
my %EXTRACT_HASH =
(
	 '\.tar\.gz\Z' => "gzip -d -c %s | tar xvf -"
	, '\.gz\Z'	=> "gzip -d -c %s"
	, '\.tar\Z'	=> "tar xvf %s"
	, '\.zip\Z'	=> "unzip %s"
);
# ............................................... prepare output ...
if ( $OUT_DIR )
{
	$verb		and print "$id: chdir $OUT_DIR\n";
	chdir $OUT_DIR	ordie"$id: chdir $OUT_DIR $ERRNO";
}
for ( @$data )
{
	$line = $ARG;
	chomp;
	$pass = $login = $regexp = $lcd = $file = "";
	s/[#].*$//;				# Kill comments
	next if /^\s*$/;			# ignore empty lines
# ............................................ Variable defs ...
	%variables = /(\S+)\s*=\s*(\S+)/g;
	while ( ($var, $val) = each %variables )
	{
	$debug and warn "$id:\t\t$var = $val\n";
	$ENV{ $var } = ExpandPath $val;
	}
	%variables = ();
	%variables = /'(\S+)'\s*=\s*(.*)/g;
	while ( ($var, $val) = each %variables )
	{
	$debug and warn "$id:\t\t$var = $val\n";
	$EXTRACT_HASH{ $var } = $val;
	}
# ............................................... directives ...
	$new= $CHECK_NEWEST;
	$unpack = $UNPACK;
	$pass	= $1	if /\bpass:\s*(\S+)/;
	$login	= $1	if /\blogin:\s*(\S+)/;
	$regexp	= $1	if /\bregexp:\s*(\S+)/;
	$new	= 1	if /\bnew:/;
	$unpack	= 1	if /\bx:/;
	$unpack = 0	if $NO_UNPACK;	# cancel if option given
	if ( $NO_LCD == 0 and /lcd:\s*(\S+)/ )
	{
	DirectiveLcd $1, $LCD_CREATE;
	}
# ................................................... regexp ...
	if ( defined $URL_REGEXP )
	{
	if ( not /$URL_REGEXP/o )
	{
		$debug and warn "$id: [regexp ignored] $ARG\n";
		next;
	}
	}
	if ( defined $TAG_REGEXP )
	{
	($ARG, $stop ) = TagHandle $ARG, $TAG_NAME;
	last if $stop;
	next if $ARG eq '';
	}
# ................................................. grab url ...
	m!^\s*((http|ftp|file):/?(/([^/\s]+)(\S*)))!;
	unless ( defined $1 and defined $2 )
	{
	$debug and warn "$id: [skipped] $line\n";
	next;
	}
# ............................................... components ...
	$url = $1;
	$type = $2;
	$path = $3;
	$site = $4;
	$sitePath = $5;
	$origFile = $sitePath;
	( $file = $url ) =~ s,^\s*\S+/,,;
	$file = $path . "000root-file" if $file eq "";
	if ( $NO_SAVE == 0 and /save:(\S+)/ )
	{
	$file = $1;
	}
	$postfix = $POSTFIX		if defined $POSTFIX;
	$prefix	 = $PREFIX . $prefix	if defined $PREFIX;
	$prefix = $site . "::" . $prefixif $PREFIX_WWW;
	$prefix = $date . "::" . $prefixif $PREFIX_DATE;
	$file = $prefix . $file . $postfix;
# .................................................... do-it ...
	$debug and warn "$id: <$type> <$site> <$path> <$url> <$file>\n";
	$ARG= $type;
	@files = ();
	if ( /http/i )
	{
	$count++;
	($stat, @files ) = UrlHttp $url, $file, $new
		, $PROXY
		, \%URL_ERROR_HASH , \%URL_ERROR_REASON_HASH
		;
	}
	elsif ( /ftp/i )
	{
	$count++;
	($stat, @files ) = UrlFtp $site, $sitePath
		, $origFile, $file, $regexp
		, $FIREWALL, $login, $pass
		, $new
		;
	}
	elsif ( /file/i )
	{
	($stat, @files ) = UrlFile $path, $origFile, $prefix, $postfix;
	}
# .................................................. &unpack ...
	$unpack and @files and Unpack \@files, \%EXTRACT_HASH;
}
if ( not $count and $verb)
{
	$CFG_FILE ne ''
	and printf "$id: No matching [%s] items in $CFG_FILE\n",
		$URL_REGEXP ;
	$CFG_FILE eq ''
	and print "$id: Nothing. Use config file or give URL?\n";
}
}
# ............................................................ &main ...
Initialize();
HandleCommandLineArgs();
my ( @data);
my $id = "$LIB.start";
# ......................................................... args ...
if ( defined $CFG_FILE )
{
	$verb and print "$id: Reading [$CFG_FILE]\n";
	local *F;
	open F, $CFG_FILE	or die "$id: [$CFG_FILE] $ERRNO";
	@data = <F>;
	close F;
	unless ( @data )
	{
	die "$id: Nothing found from $CFG_FILE";
	}
}
push @data, @ARGV	if @ARGV;	# Add command line URLs
local $ARG;
if ( @TAG_LIST )
{
	for ( @TAG_LIST )
	{
	TagHandle undef, undef, "1-reset";
	Main $ARG, \@data;
	}
}
else
{
	Main "", \@data;
}
0;
__END__


Other 100 submission(s) by this author

 


Report Bad Submission
Use this form to tell us if this entry should be deleted (i.e contains no code, is a virus, etc.).
This submission should be removed because:

Your Vote

What do you think of this code (in the Intermediate category)?
(The code with your highest vote will win this month's coding contest!)
Excellent  Good  Average  Below Average  Poor (See voting log ...)
 

Other User Comments


 There are no comments on this submission.
 

Add Your Feedback
Your feedback will be posted below and an email sent to the author. Please remember that the author was kind enough to share this with you, so any criticisms must be stated politely, or they will be deleted. (For feedback not related to this particular code, please click here instead.)
 

To post feedback, first please login.