Searching Portal and WCM (and maybe more) with Google Search Appliance

Wow, what a ride this has been. Mission: expose IBM Web Content Management and WebSphere Portal to Google Search Appliance. Should you choose to accept this mission, beware that you will go from zero to 500 mph in 2 seconds flat, and then slam on the brakes and beg for forgiveness (or puke out the window). From a nuts and bolts perspective, and at a 30k foot level, this mission is possible. Add in all of the constraints and oddities and the dancing kangaroo, and it becomes a train wreck (with some inkling of survival). Here are my rules for pulling this stunt off in all future endeavors:

1. A link is a link is a link, and Google will follow it. Text is text is text, and Google will read it. If you have things you do not want indexed, be prepared for when I sit you down and tell you that your theme/skin/portlets/content is about to be peppered with GoogleOn and GoogleOff. I'll show you how, but I'm not creating the business rules.
2. Google must index your content with an account that has access to the content. I cannot change this (see getting into a car without a key or breaking windows).
3. If you want Google to return only the results a user has access to, be prepared to sit down again while I explain the potential infrastructure changes you are going to need to make. They aren't too bad, but you need to hold on to a ray of hope as I explain it, and re-explain it, and probably explain it 15 more times. No matter how you slice it, your content will take a hit at some point in every search request. It's small, and the world will not come crashing down. Be open to new ideas here and we can get through it.
4. Since I'm making the rules (at least right now): if you use a File Resource component in your WCM content, I will be a happier camper if you use the same authoring template for all content of this type.
5. See #4 about the rules, but if you have fields in your authoring template that you want indexed as content metadata, please use the text component, not option select, etc. (at least until I complete code extension #2 to address this).
6. Be prepared to have someone who can administer your Google appliance at my disposal, and at the drop of a hat (except when I make a coffee run).
7. I'm not very fond of requirements that exclude content from being searchable. You get nothing or you get everything. If, and only if, you execute IA correctly, I can pick where to start indexing, as long as all children will be included.
8. Friendly URLs make this much easier, but I no longer consider them a total dependency.
9. Just a little teaser: I expect to have IBM Connections and Lotus Quickr-J completed shortly, thus expanding the options.
10. Happy enterprise searching.
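
For the curious, here is roughly what the GoogleOn/GoogleOff peppering from rule 1 looks like. The appliance recognizes paired HTML comments of the form `<!--googleoff: index-->` and `<!--googleon: index-->`, with `index`, `anchor`, `snippet`, and `all` as the available scopes. The little helper below is only a sketch of the idea: the function name `google_off` and the sample navigation markup are made up for illustration, and in a real portal you would more likely drop the comments straight into the theme, skin, or presentation template rather than generate them in code.

```python
# Scopes the Google Search Appliance understands for googleoff/googleon.
VALID_SCOPES = {"index", "anchor", "snippet", "all"}


def google_off(fragment: str, scope: str = "index") -> str:
    """Wrap an HTML fragment in a googleoff/googleon comment pair.

    `google_off` is a hypothetical helper name, not part of any IBM or
    Google API. It only builds the comment markers around the fragment.
    """
    if scope not in VALID_SCOPES:
        raise ValueError(f"unknown googleoff scope: {scope!r}")
    return f"<!--googleoff: {scope}-->{fragment}<!--googleon: {scope}-->"


# Example: keep the theme's navigation text out of the index.
# The markup here is invented sample data.
nav_markup = '<ul class="nav"><li><a href="/home">Home</a></li></ul>'
print(google_off(nav_markup))
```

Deciding which fragments get wrapped is the business-rules part, and per rule 1 that part is yours.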
