
Use Index Server to do full text extraction on PDF files

Submitted on: 1/5/2015 7:30:00 AM
By: Glenn Cook (from psc cd)  
Level: Beginner
Compatibility: ASP (Active Server Pages), VbScript (browser/client side)

The IFilter is back at Adobe! For the past two months it had been removed from Adobe's site because of trade embargo restrictions. ASP Trencher G. Jurgele informed me that it is available again on the FTP site and will be back on the website soon. Here are the URLs:

  • http://www.adobe.com/prodindex/acrobat/ifilter.html
  • ftp://ftp.adobe.com/pub/adobe/acrobat/win/all/

I call this the $25K solution because that's how much the next-cheapest alternative was going to cost the company I was working for. Actually, we found alternatives costing over $150,000, and those solutions covered only the site licenses: no configuration, no development, no free upgrades! So the next time somebody asks you if you're worth $100 an hour, tell them you're the expert who can help them save money! Then tell them this story:

OBJECTIVE: Provide web-based full-text searching of thousands of PDF files for a major car manufacturer's intranet. Windows NT is the server platform of choice.

PROBLEM: This intranet/extranet was intended to give dealerships nationwide access to important, up-to-date technical bulletins. Over the previous two years the company had compiled thousands of PDF files, which they indexed with Adobe Acrobat and burned to CDs quarterly for all the dealerships. This meant that really important updates had to be sent by FedEx in paper form for next-day delivery, and those papers often ended up misplaced, destroyed, or just plain impossible for the mechanics to find.

SOLUTION: Adobe's "PDF IFilter DLL"! When we found it on Adobe's website, no one knew about it. We talked with Adobe's engineers, who referred us to companies selling $25K-plus solutions for what we needed. Microsoft's engineers also referred us to expensive alternatives. When we finally found this needle in a haystack, we couldn't believe nobody had pointed us to it. Microsoft mentioned it briefly on a couple of web pages and Adobe referred to it on three pages, but it was barely documented and supported by neither company. It was funny when the Adobe salesperson called a week later and we told him we were using one of their DLLs as our solution. He blew his lid because he was in the middle of setting up meetings with other vendors for us. He was also embarrassed that none of THEIR engineers we had talked to knew about it.

Note that Adobe is also working on a DLL that works with IIS 4.0, and a beta of it is already available. Their site now has improved documentation and more references to the filter as well. Unfortunately, PDF IFilter 1.0 works ONLY with Index Server 1.0 and IIS 3.0.

Running ifilter.exe registers the DLL in seconds and, voilà, Index Server amazingly picks apart those darn PDF files. Enjoy!
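
Once the filter is registered, the PDF text is searchable like any other indexed document, and a search page usually wants to limit results to PDFs. The sketch below is a minimal illustration of building such a query string, assuming Index Server's phrase syntax (double quotes), its `AND` operator, and its regular-expression property syntax `#filename *.pdf`. The function name `build_pdf_query` is hypothetical and does not come from the article; it strips the user's double quotes so the input stays inside one phrase.

```python
def build_pdf_query(user_text: str) -> str:
    """Hypothetical helper: build an Index Server query that searches only PDFs.

    Assumptions (not from the article): the user's words are matched as a
    single phrase, and the filename restriction uses the regular-expression
    property syntax "#filename *.pdf".
    """
    # Remove double quotes so user input cannot break out of the phrase.
    phrase = user_text.replace('"', "").strip()
    if not phrase:
        raise ValueError("search text is empty")
    # Quote the phrase and AND it with the PDF-only filename restriction.
    return f'"{phrase}" AND #filename *.pdf'


if __name__ == "__main__":
    print(build_pdf_query('brake "recall" bulletin'))
```

In an ASP page the resulting string would typically be assigned to the Query property of Index Server's query object (created with `Server.CreateObject("ixsso.Query")`); treat that wiring as an assumption to check against your Index Server version.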

Now go learn how to make Index Server work for you! The tutorial is here (in its "draft" stages!).
