Working with Enterprise Search Part 1

This is a live-blogging post from the International SharePoint Conference London 2012, so don’t expect well-written prose here.

Session by Neil Hodgkinson

Four key functions

  • Administration
  • Query
  • Crawl
  • Query Processor

Essential choices

Content Sources

  • default setup not recommended
  • split config is best practice
  • ensure permissions on the web apps and the User Profile Application (UPA)
  • starting the crawl takes some time

Crawling

  • crawl data flow
  • crawl rules: powerful, but be careful not to take your own farm down
  • deploy IFilters on the crawl servers to crawl additional file types (icons can go on the web servers too)
  • crawler impact rules: even more dangerous, but they give some control

Queries & results

  • query data flow
  • authoritative pages to influence search ranking (use exact url)
  • non-authoritative is also possible
  • metadata properties (careful with the config)
  • search scopes: powerful, but add to disk footprint and security trimming can cause timeouts
  • search result removal: uses crawl exclusion rules, do not use it to restrict visibility of content

Going Deeper

Stop Word Files

  • very little value in them
  • only affects query
  • just don’t use them

Thesaurus

  • replacement sets: replace words in queries
  • expansion sets: add words to queries
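
The difference between the two thesaurus set types can be illustrated with a small Python sketch. This is not SharePoint's implementation; the function name `apply_thesaurus` and the sample word lists are made up for illustration only.

```python
def apply_thesaurus(query, replacements, expansions):
    """Return one list of alternatives per query term (hypothetical helper)."""
    groups = []
    for term in query.lower().split():
        if term in replacements:
            # Replacement set: the original term is dropped from the query.
            groups.append(list(replacements[term]))
        elif term in expansions:
            # Expansion set: the original term is kept and synonyms are added.
            groups.append([term] + [s for s in expansions[term] if s != term])
        else:
            groups.append([term])
    return groups


# Sample data, invented for this example.
replacements = {"moss": ["sharepoint"]}
expansions = {"car": ["automobile", "vehicle"]}

print(apply_thesaurus("MOSS car rental", replacements, expansions))
# [['sharepoint'], ['car', 'automobile', 'vehicle'], ['rental']]
```

With a replacement set the user's word never reaches the index lookup, while an expansion set widens the lookup around the original word.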

Dictionary

  • custom dictionary affects crawl and query
  • not used much

Query suggestions

  • enabled on the search web parts
  • can be enhanced with PowerShell

Tuning the security descriptor cache

  • query processor fills its security descriptor cache
  • you can adapt it, default is 10,000 after SP1, 1,000 in RTM
  • slow queries? could be slow security trimming (see ULS logs), check your cache setting!
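
Why the cache size matters can be shown with a toy simulation in Python. It assumes a least-recently-used eviction policy, which is an assumption for illustration and not something the session stated about the query processor; the class `DescriptorCache` and the `simulate` helper are hypothetical.

```python
from collections import OrderedDict


class DescriptorCache:
    """Toy LRU cache standing in for a security descriptor cache (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self.entries:
            # Mark the entry as most recently used.
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        # Cache miss: load the descriptor (the slow path) and store it.
        self.misses += 1
        value = load(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry.
            self.entries.popitem(last=False)
        return value


def simulate(capacity, distinct_descriptors, rounds):
    """Cycle through all descriptors several times; return (hits, misses)."""
    cache = DescriptorCache(capacity)
    for _ in range(rounds):
        for descriptor_id in range(distinct_descriptors):
            cache.get(descriptor_id, load=lambda k: f"acl-{k}")
    return cache.hits, cache.misses


print(simulate(1_000, 5_000, 3))   # (0, 15000)
print(simulate(10_000, 5_000, 3))  # (10000, 5000)
```

In this toy model a cache smaller than the working set never gets a hit when the descriptors are touched in a cycle, which is the kind of behaviour that would show up as slow security trimming.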

Crawl policies

  • item deletion controlled by policy
  • delete when access denied: 30 days
  • delete for other errors: 60 days
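
The policy above can be sketched as a simple decision function in Python. It uses the two thresholds from the session; the function name, the error labels and the choice to treat the threshold as inclusive are assumptions, and the real policy may take other factors into account.

```python
from datetime import date, timedelta

ACCESS_DENIED_DAYS = 30  # from the session notes
OTHER_ERROR_DAYS = 60    # from the session notes


def should_delete(error_kind, first_failure, today):
    """Decide whether an item drops out of the index (hypothetical helper).

    error_kind: "access_denied" or any other label for other crawl errors.
    first_failure: date the item first started failing.
    """
    limit = ACCESS_DENIED_DAYS if error_kind == "access_denied" else OTHER_ERROR_DAYS
    # Assumption: the item is deleted once it has failed for at least `limit` days.
    return (today - first_failure) >= timedelta(days=limit)


today = date(2012, 5, 1)
failing_since = date(2012, 3, 25)  # 37 days before `today`

print(should_delete("access_denied", failing_since, today))  # True
print(should_delete("timeout", failing_since, today))        # False
```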

Indexing Performance

  • how many threads can my crawler use?
  • reduced, partly reduced or maximum
  • maximum also sets the crawler to high priority
  • set daemon memory in the registry! DedicatedFilterProcessMemory - default will assume 4 GB of RAM
  • consider doubling this

Crawl db maintenance

  • limit log data retention, default is 90 days
  • consider lowering to e.g. 30 days

Crawl targets

  • use a specific server
  • consider SQL Resource Governor to control resource usage