Homer Federated Search Engine

News - Statistic on EuroVOC Concept - New sort options - Verify EuroVOC Tags - Check Portal Tags - Flags & Logos - Tags Cloud - Logical Operator on Query

Here you can find

A complete search page connected with the production Engine, where you can try all the queries
The documentation to use the Federated Multilanguage Search Engine from your own portal
Some useful features like an API that exposes the list of the EuroVoc Tags
Reports to monitor the datasets inside the federation

What's new

New statistics on EuroVOC concepts in each portal	30/04/2014
How to sort the search results by creation and modification date	22/04/2014
A better way to check the validity of a EuroVOC tag	17/04/2014
Check EuroVOC tags in portals, with statistics and a detailed list of the tagged data	10/04/2014
Portal logos and country flags in search results	31/03/2014
Example code to build a tag cloud	18/03/2014
Detailed specification of how to use the logical operators AND, OR and NOT in search queries	18/03/2014

Search

The search page offers the following input parameters: input text, input language, output language, tags and portal.

The output options are: output format, sort field, and ascending or descending order.

The results show the number of results, the elapsed time, the search URL (also in its proxy form) and the result list, with these columns:

#	Portal	Package ID	Lang	Country	Title	Detail

The Detail column opens the detail of the package.

The Federated Multilanguage Search Engine is based on the open source project Apache Solr. Please refer to the official Solr documentation for a comprehensive guide.

Useful links for documentation and tutorials

SolrWiki.org: Solr Query Syntax (wiki)
SolrWiki.org: Solr Query Parameters (wiki)
LuceneTutorial.com: Tutorial on Query Syntax

Use Case Semantic Multilanguage Search

Below is a quick summary of the parameters needed to construct a search query.

Query

The search URL is composed of three parts:

Base url	Search context	Parameters
http://opendata-federation.csi.it/fed-homer/	context/select/?	parameters

Base URL

The base URL is the root of every query: http://opendata-federation.csi.it/fed-homer/

Search Context

documents	the data indexed	http://opendata-federation.csi.it/fed-homer/documents/select/?q=culture
ontology	the data indexed	http://opendata-federation.csi.it/fed-homer/ontology/select/?q=*:*
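
The composition described above can be sketched in JavaScript as follows. This is only an illustration: `buildSearchUrl` is a hypothetical helper, not part of the engine, and it assumes that every parameter value should be URL-encoded.

```javascript
// Base URL taken from the documentation above.
const BASE_URL = "http://opendata-federation.csi.it/fed-homer/";

// Hypothetical helper: context is "documents" or "ontology",
// params is a plain object holding the query parameters.
function buildSearchUrl(context, params) {
  const query = Object.entries(params)
    .map(([name, value]) => name + "=" + encodeURIComponent(value))
    .join("&");
  return BASE_URL + context + "/select/?" + query;
}

console.log(buildSearchUrl("documents", { q: "culture" }));
// http://opendata-federation.csi.it/fed-homer/documents/select/?q=culture
console.log(buildSearchUrl("ontology", { q: "*:*" }));
// http://opendata-federation.csi.it/fed-homer/ontology/select/?q=*%3A*
```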

Parameters

Each parameter below is listed with its name, its description and its accepted values.

q

This is the main parameter: the search string.

The logical operators AND, OR and NOT are supported. It is enough to put the search terms in parentheses, separated by <space>operator<space>.

Values: free text. To search everything, use *:*

Example

Search input	Parameter q
Search data containing both terms acqua and Piemonte	q=(acqua AND Piemonte)
Search data containing at least one of the terms acqua and Piemonte	q=(acqua OR Piemonte)
Search data containing the term acqua and not Piemonte	q=(acqua NOT Piemonte)

The logical operators AND, OR and NOT must be UPPERCASE.
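
As a rough sketch of the rule above, a small helper can join the search terms with one operator and force it to uppercase. The name `buildQuery` is hypothetical and does not belong to the engine.

```javascript
// Hypothetical helper: joins the terms with a single logical operator.
function buildQuery(terms, operator) {
  const op = operator.toUpperCase(); // the operators must be uppercase
  if (!["AND", "OR", "NOT"].includes(op)) {
    throw new Error("Unsupported operator: " + operator);
  }
  return "(" + terms.join(" " + op + " ") + ")";
}

const q = buildQuery(["acqua", "Piemonte"], "and");
console.log(q);                            // (acqua AND Piemonte)
console.log("q=" + encodeURIComponent(q)); // q=(acqua%20AND%20Piemonte)
```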

lang

The input language. When it is specified, the search engine checks whether the search terms (parameter q) have a correspondence in EuroVoc and, if so, uses it. If the language is not specified, or if there is no correspondence, the search engine performs a literal search.

el - ελληνικά
en - English
es - Español
fr - Français
it - Italiano
sl - Slovenščina
sr - Српски

fq

Filter query: it can be used multiple times in a single query, to filter on multiple fields. The syntax is fq=field%3A(values separated by 'or' or 'and'). For example, to filter on the output language English and on the portals dati.piemonte.it or open-data.me:

fq=language%3A(en)&fq=portal%3A(www.dati.piemonte.it%20or%20www.open-data.me)

The colon between the field and the value must be encoded (%3A).

language
portal
tag
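
The filter syntax can be generated with a small helper like the one below. It is a sketch only: `buildFilter` is a hypothetical name, and it relies on `encodeURIComponent` leaving parentheses untouched while encoding the colon and the spaces.

```javascript
// Hypothetical helper producing one fq parameter for a field.
function buildFilter(field, values, operator = "or") {
  const joined = values.join(" " + operator + " ");
  // ":" becomes %3A and spaces become %20; parentheses are not encoded
  return "fq=" + encodeURIComponent(field + ":(" + joined + ")");
}

const filters = [
  buildFilter("language", ["en"]),
  buildFilter("portal", ["www.dati.piemonte.it", "www.open-data.me"]),
].join("&");
console.log(filters);
// fq=language%3A(en)&fq=portal%3A(www.dati.piemonte.it%20or%20www.open-data.me)
```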

start

The first record, for pagination. Default value: 0 (zero), the first record.

An integer number

rows

The number of records, for pagination. Default value: 10

An integer number
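
Since start is zero-based, the values for a given page can be derived as in this sketch. `pageParams` is a hypothetical helper, and numbering the pages from 1 is an assumption of the example.

```javascript
// Hypothetical helper: pages are numbered from 1, start is zero-based.
function pageParams(page, pageSize = 10) {
  const start = (page - 1) * pageSize;
  return "start=" + start + "&rows=" + pageSize;
}

console.log(pageParams(1));     // start=0&rows=10
console.log(pageParams(3, 20)); // start=40&rows=20
```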

wt

The output format. Default value: xml

json
xml

sort

The sort order. Specify the field and whether the order is ascending or descending. For example, to sort by author descending use &sort=author%20desc. The space between the field and the order must be encoded (%20).

The utility section explains how to retrieve all the possible values.

Example

author
portal
...
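
A minimal sketch of how the sort parameter could be built. `sortParam` is a hypothetical helper; it uses the Solr keywords asc and desc for the order.

```javascript
// Hypothetical helper: field is one of the sortable fields listed above.
function sortParam(field, descending = false) {
  const order = descending ? "desc" : "asc";
  // the space between field and order is encoded as %20
  return "sort=" + encodeURIComponent(field + " " + order);
}

console.log(sortParam("author", true)); // sort=author%20desc
console.log(sortParam("portal"));       // sort=portal%20asc
```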

Some useful features to help in developing the search page

1 - Check EuroVOC tag validity

Insert a tag and check whether it is a EuroVOC tag. The list of tags found shows only the first 1000 records.

2 - Check EuroVOC tags in portals

Choose an open data portal and check how many of its datasets contain EuroVOC tags.

This feature counts how many datasets in an open data portal contain at least one EuroVOC tag. The detail page lists all the datasets with their tags; EuroVOC-compatible datasets and tags are highlighted in green. Only the first 1000 datasets are verified.
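
The count described above could be reproduced on the client side roughly as follows. Everything here is an assumption made for illustration: the helper name `countTaggedDatasets`, the dataset shape ({ id, tags }) and the case-insensitive comparison are not taken from the engine.

```javascript
// Hypothetical helper: counts the datasets with at least one EuroVOC tag.
function countTaggedDatasets(datasets, euroVocTags) {
  const known = new Set(euroVocTags.map((tag) => tag.toLowerCase()));
  return datasets.filter((dataset) =>
    dataset.tags.some((tag) => known.has(tag.toLowerCase()))
  ).length;
}

// Made-up sample data, for illustration only.
const datasets = [
  { id: "a", tags: ["Water", "piemonte"] },
  { id: "b", tags: ["misc"] },
  { id: "c", tags: [] },
];
console.log(countTaggedDatasets(datasets, ["water", "energy"])); // 1
```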

3 - Autocomplete text field with EuroVOC tags

This example shows how to implement a text input with suggestions based on EuroVOC tags. The example loads all the available tags; filters (e.g. on language) can be added to the URL used by the AJAX call.

How it works

This example uses the JavaScript plugin typeahead.js and assumes that the text input has id='suggestText' and the language select has id='suggestLang'.

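// Assumes that jQuery, the typeahead plugin and the URL_PROXY_ONTOLOGY constant
// (the proxy URL for the ontology context) are already available in the page.
var suggestText = $("#suggestText");
suggestText.typeahead({
	source: function( request, response ) {
		// read the current values on every call, so the suggestions follow the typed text
		var suggest = suggestText.val();
		var lang = $("#suggestLang").val();
		var suggestUrl = URL_PROXY_ONTOLOGY + encodeURIComponent("search_text_"+lang+":"+suggest+"*&start=0&rows=20&wt=json");
		$.ajax({ dataType: "json", url: suggestUrl }).done(function( data ) {
			// collect the term of every document returned by the ontology search
			var suggests = [];
			$.each(data.response.docs, function(index, doc) {
				suggests[index] = doc.term;
			});
			response(suggests);
		});
	}
});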
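
The two data steps of the typeahead example above (building the ontology query and extracting the terms from the response) can be isolated in plain functions, which makes them easy to try without a browser. The helper names are hypothetical; the field name search_text_<lang> and the response shape (response.docs[].term) are the ones used in that example.

```javascript
// Hypothetical helper: builds the query string sent to the ontology context.
function buildSuggestQuery(lang, text) {
  return "search_text_" + lang + ":" + text + "*&start=0&rows=20&wt=json";
}

// Hypothetical helper: extracts the suggested terms from a JSON response.
function extractSuggestions(data) {
  const docs = (data && data.response && data.response.docs) || [];
  return docs.map((doc) => doc.term).filter((term) => typeof term === "string");
}

// Made-up sample response, for illustration only.
const sample = { response: { docs: [{ term: "water" }, { term: "water policy" }, {}] } };
console.log(buildSuggestQuery("en", "wat")); // search_text_en:wat*&start=0&rows=20&wt=json
console.log(extractSuggestions(sample));     // [ 'water', 'water policy' ]
```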

4 - Server Proxy

Because of the same-origin policy, JavaScript cannot load the results of a search query directly into the page. This problem can be easily solved by providing a simple server-side proxy that routes the calls.

Below is an example of a proxy written in PHP.

Client - Javascript

// example 1: query URL that searches everything in the documents and returns the first 20 records in JSON format
var urlDocument = "proxy.php?path=" + "documents/select/" + "&params=q=" + encodeURIComponent("*:*&start=0&rows=20&wt=json");

// example 2: query URL that searches the term water in the ontology and returns the first 10 records in JSON format
var urlOntology = "proxy.php?path=" + "ontology/select/" + "&params=q=" + encodeURIComponent("term:water&start=0&rows=10&wt=json");

Server - Proxy PHP

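<?php
  // accept only the two search contexts, so the proxy cannot be pointed at other paths
  $allowedPaths = array('documents/select/', 'ontology/select/');
  $path = isset($_GET['path']) ? $_GET['path'] : '';
  $params = isset($_GET['params']) ? $_GET['params'] : '';
  if (!in_array($path, $allowedPaths, true)) {
    http_response_code(400);
    exit('Invalid path');
  }
  $url = "http://opendata-federation.csi.it/fed-homer/".$path."?".$params;
  $result = file_get_contents($url);
  // choose the output format: the engine answers in xml unless wt=json is requested
  if (strpos($url,'wt=json') !== false) {
    header('Content-type: application/json');
  }
  else {
    header('Content-type: application/xml');
  }
  echo $result;
?>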
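
On the client side, the proxy URL can be assembled by one small function instead of repeating the string concatenation. This is a sketch: `proxyUrl` is a hypothetical helper, while "proxy.php" and the path and params names follow the client example above.

```javascript
// Hypothetical helper: context is "documents" or "ontology",
// query is the search string followed by the other Solr parameters.
function proxyUrl(context, query) {
  return "proxy.php?path=" + context + "/select/" + "&params=q=" + encodeURIComponent(query);
}

console.log(proxyUrl("documents", "*:*&start=0&rows=20&wt=json"));
// proxy.php?path=documents/select/&params=q=*%3A*%26start%3D0%26rows%3D20%26wt%3Djson
```

The whole query is encoded as a single value, so the proxy receives it in one params argument and can append it to the engine URL.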

5 - Tag Cloud

This example shows how to create a tag cloud: JavaScript asks the Federated Search Engine for the statistics on the tags used, then populates the tag cloud.

HTML

<div  id='tagCloudPanel'></div>

Style

#tagCloudPanel{
	width: 500px;
	height: 400px;
}

.tag-cloud-item {
	text-align: center;
	padding: .3em;
	font-weight: bold;
}

Javascript

// Assumes that jQuery and the URL_PROXY_DOCUMENTS constant
// (the proxy URL for the documents context) are already available in the page.
function loadCloudTags(){
  var tagsArray = {};
  // the url to get the statistics on tags, ordered by count descending
  var action ="*:*&rows=0&facet=true&facet.limit=-1&wt=json&facet.sort=count&facet.field=tag";
  var url_facet = URL_PROXY_DOCUMENTS + encodeURIComponent(action);
  $.ajax({ dataType: "json",url: url_facet}).done(function( data ) {
    // the facet is a flat array: [tag1, count1, tag2, count2, ...]
    var facet = data.facet_counts.facet_fields["tag"];
    var maxCount = 0;
    var minCount = 0;
    // loop on the first 30 tags and save the max and the min occurrence
    for (var i = 0; i + 1 < facet.length && i < 60; i += 2) {
      var count = facet[i+1];
      tagsArray[facet[i]] = count;
      if(count>maxCount) maxCount = count;
      if(count<minCount || minCount == 0) minCount = count;
    }
    // more occurrences -> bigger font
    var maxFontSize = 32;
    var minFontSize = 4;

    var tagCloudItems = [];
    // prepare the html items for the cloud
    $.each(tagsArray, function(label, count) {
      // font size proportional to the number of occurrences (max size when all counts are equal)
      var fontSize = maxFontSize;
      if(maxCount > minCount){
        fontSize = minFontSize+(count-minCount)/(maxCount-minCount)*(maxFontSize-minFontSize);
      }
      tagCloudItems.push("<span class='tag-cloud-item' style='font-size:"+Math.round(fontSize)+"px'>"+label+"</span> ");
    });
    // randomize the tags order
    tagCloudItems.sort(function() {
      return .5 - Math.random();
    });

    // finally add the tags to the cloud
    $("#tagCloudPanel").html(tagCloudItems.join(""));
  });
}
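
The two calculations behind the tag cloud (reading the flat facet array and scaling the font size linearly) can also be written as standalone functions, for example to test them outside the page. The helper names are hypothetical and the counts in the sample are made up.

```javascript
// The facet is a flat array: [tag1, count1, tag2, count2, ...].
// Hypothetical helper: returns at most `limit` { tag, count } pairs.
function facetPairs(flat, limit) {
  const pairs = [];
  for (let i = 0; i + 1 < flat.length && pairs.length < limit; i += 2) {
    pairs.push({ tag: flat[i], count: flat[i + 1] });
  }
  return pairs;
}

// Hypothetical helper: linear scaling of the font size between minFont and maxFont.
function fontSize(count, minCount, maxCount, minFont = 4, maxFont = 32) {
  if (maxCount === minCount) return maxFont; // avoid a division by zero
  return Math.round(minFont + ((count - minCount) / (maxCount - minCount)) * (maxFont - minFont));
}

const pairs = facetPairs(["water", 50, "energy", 20, "health", 10], 30);
const counts = pairs.map((p) => p.count);
const min = Math.min(...counts);
const max = Math.max(...counts);
pairs.forEach((p) => console.log(p.tag, fontSize(p.count, min, max) + "px"));
// water 32px
// energy 11px
// health 4px
```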

Some useful features to help in developing the search page

1 - Homer category distribution

Category	Count

2 - Portals distribution

Portal	Count

3 - Tags distribution

Tag	Count

4 - EuroVOC Concepts in Portals distribution

Language EuroVOC Concepts

Portal	Count

Federated Search Engine Page

CSI Piemonte

Open Data Piedmont
