In the fall of 2013, my team was tasked with R&D on integrating a search solution for the University of Rockies. Starting from the ground up, we pursued an open-source search server, Apache Solr. After hours of vetting a workflow and experimenting, we created a search product that not only serves Rockies but can also be extended to other web properties owned by the Marketing Group.
Some key points we took into consideration:
Search results: what type of results should we expose?
Crawling and indexing: how do we crawl our domain and index our results?
Web security: what standards do we need to put in place, given that our search server is open source?
Third-party dependencies: can we bring application ownership in-house?
Future maintenance: what is our SOP and response time as the domain's content changes?
Technology Services protocols: which moving pieces fall under change management guidelines, etc.?
The official release of UoR search went live in December 2013, and continuous improvements are slated throughout the year, so stay tuned. For now, feel free to explore this feature at www.rockies.edu.
While reviewing the key/value structure of JSON, I came across this Stack Overflow discussion on parsing JSON with hyphenated key names and figured the same would hold true for my keys. I adapted the suggestion slightly, using underscores instead of dot syntax in the field names, and came up with the following:
<!-- For schema.xml on Nutch and Solr -->
<field name="metatag_description" type="text_general" stored="true" indexed="true"/>
<field name="metatag_keywords" type="text_general" stored="true" indexed="true"/>
<!-- For solrindex-mapping.xml on Nutch -->
<field dest="metatag_description" source="metatag.serptitle"/>
<field dest="metatag_keywords" source="metatag.serpdescription"/>
This was implemented with Nutch 1.7 indexing into a Solr 4.5.0 instance.
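To show why the underscored names are convenient on the client side, here is a minimal sketch of reading these fields from a JSON response. The response fragment and its values are made up for illustration, and the `metatag.serptitle` key is included only as a hypothetical example of a field that was not renamed; it is not something the mapping above would emit.

```typescript
// Hypothetical fragment of a Solr JSON response; all values are invented.
const raw = `{
  "response": {
    "docs": [
      {
        "metatag_description": "Example description",
        "metatag_keywords": "example, keywords",
        "metatag.serptitle": "Example title"
      }
    ]
  }
}`;

// Shape of a document as assumed for this sketch.
interface SolrDoc {
  metatag_description?: string;
  metatag_keywords?: string;
  [key: string]: string | undefined;
}

const docs: SolrDoc[] = JSON.parse(raw).response.docs;
const doc: SolrDoc = docs[0] ?? {};

// Underscored names are valid identifiers, so plain dot syntax works.
const description = doc.metatag_description;
const keywords = doc.metatag_keywords;

// A key containing a dot (or a hyphen) is only reachable with bracket notation.
const dotted = doc["metatag.serptitle"];

console.log(description, keywords, dotted);
```

Bracket notation would also work for the underscored keys, so the rename is a convenience rather than a requirement.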