Category Archives: Search Projects

My collection of Search projects.

Apache Solr + University of Rockies

Search Engine Apache Solr added to rockies.edu.

Search Engine Apache Solr added to rockies.edu.

In the Fall of 2013, my team was tasked with R&D on integrating a search solution within the University of Rockies. Starting from the ground up, we pursued the idea of open-source search server, Apache Solr. After hours vetting out a workflow and experimenting, we were able to create a search product that not only touches base with Rockies, but can be extended to other web properties owned by the Marketing Group.

Some keypoints we put into consideration were the following:

  1. Search results….what type of results should we expose?
  2. Crawling and indexing…how do we crawl our domain and index our results?
  3. Web security…what standards do we need to put in place granted our search server is open-source?
  4. Third party dependencies…can we bring application ownership in-house?
  5. Future maintenance…what is our SOP and response time as the domain’s content changes?
  6. Technology Services protocols…what moving pieces are pertinent to change management guidelines, etc.?

The official release of UoR search went live in December 2013 and continuous improvements are slated throughout the year, so stay tuned. For now, feel free to explore this feature at, www.rockies.edu.

Crawl Metatags with Nutch 1.7

In regards to the Stackoverflow recommendation on enabling the metatag plugin, I came across a roadblock when I had to merge this solution to my integration of AJAX Solr. Unfortunately, taking the recommendation at face value caused a JavaScript error of undefined when accessing the the meta tag key/value pair from the JSON object. Granted the recommendation chained metatag.description together, it interpreted metatag to be an object that did not exist.

Reviewing the key/value structure of JSON, I came across this discussion on Parsing JSON with hyphenated key names, I thought the same would hold true for mine. That said, I’ve augmented the Stackoverflow suggestion slightly to leverage underscores versus dot syntax and came up with the following:


/* For schema.xml on Nutch and Solr */
<field name="metatag_description" type="text_general" stored="true" indexed="true"/>
<field name="metatag_keywords" type="text_general" stored="true" indexed="true"/>

/* For solrindex-mapping.xml on Nutch */
<field dest="metatag_description" source="metatag.serptitle"/>
<field dest="metatag_keywords" source="metatag.serpdescription"/>

This was implemented on Nutch 1.7 on a Solr 4.5.0 instance.

Please refer to the following for context:

  1. Extracting HTML meta tags in Nutch 2.x and having Solr 4 index it
  2. Parsing JSON with hyphenated key names
  3. Nutch – Parse Metatags