Monday, January 25, 2010

Search PDF files in MOSS


Search is one of the most powerful features of the MOSS portal. It lets users retrieve data seamlessly, showing the right data to the right people. Search also extends across the whole Office suite. Office documents are often saved as PDF files and uploaded to the portal.
Adobe provides an IFilter that makes the content of PDF files searchable. The steps below describe how to index the content of PDF files.
Getting Adobe IFilter 9 to work with SharePoint
Download the Adobe IFilter. If you are using Adobe Reader 8, you need to download the IFilter separately from the Adobe site. If you are using version 9.0, the IFilter is already installed on the machine.
Enable PDF File Indexing
  • Download Adobe Reader 9.0, which includes IFilter 9.x.x.x, from http://www.adobe.com/products/acrobat/
  • Download the Acrobat PDF icon image, which will be used to display the PDF file icon: http://www.adobe.com/misc/linking.html
  • Add the PDF file type to the Extensions List for WSS search by editing the registry
    • Start regedit
    • Open the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\{Random GUID}\Gather\Search\Extensions\ExtensionList
    • Add pdf to the list as a new String Value whose name is the next unused number. For example, if 37 is the highest existing value name, create "38" with the data "pdf".
  • Add the Acrobat PDF picture to the SharePoint templates directory. Copy the Acrobat PDF picture, pdficon_small.gif, into the 12 Hive\TEMPLATE\IMAGES folder, e.g. %programfiles%\Common Files\Microsoft Shared\Web Server Extensions\12\TEMPLATE\IMAGES.
  • Bind the Acrobat PDF picture to the PDF file type
    • Open the 12 Hive\TEMPLATE\XML\DOCICON.XML file
    • Find the part
    • Add the following mapping:
  • Set IFilter mapping in registry
    • Start regedit
    • Open the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\
    • Add (or modify) the .pdf key
    • Add a Multi-String value with the value {E8978DA6-047F-4E3D-9C78-CDBE46041603}, or modify it if another GUID value already exists.
    • Open the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\
    • Add (or modify) the .pdf key
    • Add a Multi-String value with the value {E8978DA6-047F-4E3D-9C78-CDBE46041603}, or modify it if another GUID value already exists.
  • Add the Adobe Reader folder to the environment path variable
    • Right-click My Computer
    • Click Properties
    • Open the Advanced tab
    • Click Environment Variables
    • Edit the Path variable
    • Add your Reader folder to the Path list, e.g. C:\Program Files\Adobe\Reader 9.0\Reader
  • Restart the Search service by restarting your server or executing the following commands:
    • Run: net stop osearch
    • Run: net start osearch
  • Crawl the PDF documents
    • PDF documents that were crawled before the Adobe PDF IFilter was installed are not re-indexed by an incremental crawl. To make an incremental crawl pick them up, you would have to edit each existing PDF file. It's easier to run a full crawl after you have installed the Adobe PDF IFilter.
With this, the MOSS crawlers also index the content of PDF files, so users can retrieve data from inside PDF files as well.
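For the ExtensionList registry step, choosing the new value name is just "highest existing number plus one". The Python sketch below models that key's values as a plain dict of value name to data. This is an assumption so that the logic runs anywhere; on the server the values would have to be read from the registry, for example with the standard winreg module. The helper name next_extension_entry is hypothetical.

```python
def next_extension_entry(values, extension="pdf"):
    """Return (value_name, data) for the new String Value, or None if already listed.

    Hypothetical helper: `values` is assumed to be a dict of the ExtensionList
    value names (numeric strings such as "0".."37") mapped to their data.
    """
    ext = extension.lower().lstrip(".")
    # Nothing to add if some existing value already holds this extension.
    if any(str(data).lower() == ext for data in values.values()):
        return None
    # Use the highest numeric value name plus one; non-numeric names are ignored.
    numbers = [int(name) for name in values if name.isdigit()]
    return str(max(numbers, default=-1) + 1), ext


# Usage: with values "0".."37" present, the new value is "38" -> "pdf".
existing = {str(i): f"ext{i}" for i in range(38)}
print(next_extension_entry(existing))
```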
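The DOCICON.XML binding step can also be sketched in code. This minimal example assumes the usual WSS 3.0 layout: a root DocIcons element with a ByExtension child that holds Mapping elements with Key and Value attributes. The step above does not name these elements, so check them against your own file first. The helper add_pdf_mapping is hypothetical. It adds a pdf mapping to pdficon_small.gif, or updates an existing one.

```python
import xml.etree.ElementTree as ET


def add_pdf_mapping(xml_text, icon="pdficon_small.gif"):
    """Return DOCICON-style XML with a pdf -> icon Mapping under ByExtension.

    Hypothetical helper; assumes <DocIcons><ByExtension><Mapping Key=.. Value=../>.
    """
    root = ET.fromstring(xml_text)
    by_ext = root.find("ByExtension")
    if by_ext is None:
        raise ValueError("no <ByExtension> element found")
    for mapping in by_ext.findall("Mapping"):
        if mapping.get("Key", "").lower() == "pdf":
            # A pdf mapping already exists: just point it at the icon.
            mapping.set("Value", icon)
            break
    else:
        # No pdf mapping yet: append a new one.
        ET.SubElement(by_ext, "Mapping", {"Key": "pdf", "Value": icon})
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree discards XML comments when parsing. Writing its output back over the real DOCICON.XML would therefore lose them, so keep a backup or use the sketch only to check what the edited section should contain.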
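Instead of editing the two IFilter mapping keys by hand, you could generate a .reg file and review it before importing it with regedit. The sketch below assumes the GUID belongs in the (Default) value of each .pdf key, which the steps above do not state explicitly. It encodes the Multi-String value the way .reg files write REG_MULTI_SZ data, as hex(7) UTF-16LE bytes. Back up the registry before importing anything. The function names are hypothetical.

```python
# CLSID of the Adobe PDF IFilter, as given in the steps above.
PDF_IFILTER_CLSID = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"

# The two Filters\Extension keys from the steps above, each with a .pdf subkey.
FILTER_KEYS = [
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions"
    r"\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf",
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server"
    r"\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf",
]


def multi_sz_hex(strings):
    """Encode a REG_MULTI_SZ list as .reg text: hex(7) followed by UTF-16LE bytes."""
    # Each string is NUL-terminated, and the list ends with one extra NUL.
    raw = "".join(s + "\0" for s in strings) + "\0"
    return "hex(7):" + ",".join(f"{b:02x}" for b in raw.encode("utf-16-le"))


def build_pdf_filter_reg(clsid=PDF_IFILTER_CLSID):
    """Return .reg text that sets the (Default) value (@) of each .pdf key."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key in FILTER_KEYS:
        lines += [f"[{key}]", f"@={multi_sz_hex([clsid])}", ""]
    return "\r\n".join(lines)
```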
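The Path edit comes down to appending the Reader folder unless it is already listed. The sketch below shows that check. It compares entries case-insensitively and ignores a trailing backslash, as Windows paths allow. The helper name add_to_path is hypothetical, and it only computes the new string; it does not change the system variable.

```python
import ntpath


def add_to_path(path_value, folder):
    """Return a Path string with `folder` appended, unless it is already present."""

    def norm(p):
        # Windows-style comparison: trim, drop trailing slashes, lowercase.
        return ntpath.normcase(p.strip().rstrip("\\/"))

    entries = [e for e in path_value.split(";") if e.strip()]
    if any(norm(e) == norm(folder) for e in entries):
        return ";".join(entries)
    return ";".join(entries + [folder])


# Usage with the Reader folder from the steps above:
print(add_to_path(r"C:\WINDOWS\system32;C:\WINDOWS",
                  r"C:\Program Files\Adobe\Reader 9.0\Reader"))
```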
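If you would rather script the Search service restart than reboot the server, the two net commands can be run in order from Python, from an elevated prompt. The command runner is passed in as a parameter so the sequence can be checked without a Windows server. This is an illustrative design choice, and restart_search_service is a hypothetical name.

```python
import subprocess


def restart_search_service(service="osearch", run=subprocess.run):
    """Run 'net stop <service>' and then 'net start <service>'."""
    for action in ("stop", "start"):
        # check=True raises CalledProcessError if 'net' reports a failure.
        run(["net", action, service], check=True)
```

Because check=True is used, the function stops with CalledProcessError if net stop fails, for example when the service is not running.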
