This page is under construction ...
MOSAIC
Modular Search Application based on Index Fractions
MOSAIC lets developers create their own search applications on top of the OpenWebSearch.eu index.
The concept combines an index partition exported from the Open Web Index
with a search service that builds on Apache Lucene and exposes the index through a REST API.
Both the concept and the implementation are intended to enable and encourage developers
to build their own special-purpose search applications.
Source Code and Demonstration
⮞ The source code and instructions are available on the OpenWebSearch.eu
GitLab
⮞ Try out the service via a web interface
Web Interface
Features, characteristics, and components
-
Easy use of one or more OpenWebSearch.eu index partitions.
Index partitions can be created and downloaded from the Open Web Index and integrated in MOSAIC.
The search can be performed in one or more indices at the same time.
-
Modular approach.
Modules can be developed and added that filter search results according to features in the metadata
and provide additional information to the search results.
-
Different usage modes.
MOSAIC can be used in several ways, depending on individual needs:
(a) out of the box with the integrated web interface,
(b) as a service whose REST API is integrated into a search application,
(c) extended with custom metadata modules, or
(d) customized by modifying the source code.
-
Proprietary and OpenSearch protocols.
The REST API returns search results in two formats: a proprietary protocol (JSON) and the OpenSearch protocol (XML).
-
Geo-coordinates.
MOSAIC can filter search results by the geo-coordinates of the locations mentioned
in the content.
-
Dynamic plain text management.
For performance reasons, plain text is stripped from the metadata, which reduces the size of the
metadata database and speeds up the search. Text snippets can be loaded on demand
when the plain text is not fully available in the database.
-
Apache Lucene.
MOSAIC is based on Lucene, a free and open-source search engine library
that takes care of basic search and ranking functionality.
-
DuckDB.
MOSAIC employs the open-source relational database management system DuckDB
that handles the metadata of the web documents and enables filtering search results.
-
Quarkus.
In order to expose MOSAIC via a REST API, the web service framework Quarkus is used.
-
Other features.
Common features such as pagination and an overview of index information are also available.
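The geo-coordinate filtering listed above can be illustrated with a small conceptual sketch. The following Python snippet is not MOSAIC's implementation (MOSAIC itself builds on Lucene, DuckDB, and Quarkus); it only shows the idea of keeping results whose locations fall inside a bounding box. The result layout with `url` and `locations` fields and the function name `filter_by_bounding_box` are assumptions made for illustration.

```python
def filter_by_bounding_box(results, south, west, north, east):
    """Keep results that mention at least one location inside the box.

    Assumption: each result is a dict with a "locations" list of
    (latitude, longitude) pairs. This layout is hypothetical and is
    not taken from the MOSAIC API.
    The box is assumed not to cross the antimeridian (west <= east).
    """
    kept = []
    for result in results:
        for lat, lon in result.get("locations", []):
            if south <= lat <= north and west <= lon <= east:
                kept.append(result)
                break  # one matching location is enough
    return kept


# Hypothetical search results, used only to demonstrate the filter.
results = [
    {"url": "https://example.org/a", "locations": [(47.07, 15.44)]},
    {"url": "https://example.org/b", "locations": [(52.52, 13.40)]},
    {"url": "https://example.org/c", "locations": []},
]

in_box = filter_by_bounding_box(results, south=46.0, west=14.0, north=48.0, east=16.0)
```

Results without any location are dropped by this sketch; whether MOSAIC treats them the same way is not stated here.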
Concept and software architecture
Installation instructions
There are several ways to try out and run MOSAIC. Detailed descriptions can be found
in the README on GitLab and in the
Tutorial
of the OpenWebSearch.eu book. Two quick ways to try out MOSAIC are described below.
a) Download the source code, compile it, and start the service.
This requires git, Java 17, and Maven to be installed on your computer (Linux, macOS, or Windows).
-
Open a terminal and download the source code from GitLab:
git clone https://opencode.it4i.eu/openwebsearcheu-public/mosaic.git
-
Build the project (compiling and packaging the source code and creating the Lucene index):
cd mosaic/scripts/
build.sh
or build.bat
-
Start the service:
start.sh
or start.bat
-
Try out the REST API of the service in your web browser:
http://localhost:8008/search?q=graz
-
Optional: Try out the web interface in your web browser:
file://[path-to-mosaic]/mosaic/front-end/index.html
-
Optional: Include a different index. Download an index (folder with .ciff and .parquet files)
from the
demonstration repository
and save the folder with the two files in the
[path-to-mosaic]/resource
folder, then rebuild and start the service:
build.sh
or build.bat
start.sh
or start.bat
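Before rebuilding with a different index, it can help to confirm that the downloaded folder really contains both file types mentioned above. The following is a minimal sketch, assuming the files end in exactly `.ciff` and `.parquet`; the helper name `missing_index_files` is hypothetical and not part of MOSAIC.

```python
from pathlib import Path


def missing_index_files(index_dir):
    """Return the expected file extensions that are absent from index_dir.

    Assumption: an index folder holds one file ending in ".ciff" and one
    ending in ".parquet". Compressed variants such as ".ciff.gz" would
    not be recognized by this simple check.
    """
    expected = (".ciff", ".parquet")
    present = {p.suffix.lower() for p in Path(index_dir).iterdir() if p.is_file()}
    return [ext for ext in expected if ext not in present]


# Example call (replace the placeholder with the real folder path):
# print(missing_index_files("[path-to-mosaic]/resource/my-index"))
```

An empty list means both file types were found and the build script can be run again.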
b) Use Docker to run and try out MOSAIC.
This requires Docker to be installed on your computer.
-
Download and start the MOSAIC Docker container:
docker run --rm -p 8008:8008 opencode.it4i.eu:5050/openwebsearcheu-public/mosaic
-
Try out the REST API of the service in your web browser:
http://localhost:8008/search?q=graz
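The same request can also be sent from a program instead of a browser. The following Python sketch uses only the standard library and the endpoint shown above (`/search` with the `q` parameter on port 8008). It assumes that the service is running locally and that this endpoint answers with JSON by default; the function names are chosen for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(query, base_url="http://localhost:8008"):
    """Build the search URL; urlencode escapes spaces and special characters."""
    return f"{base_url}/search?{urlencode({'q': query})}"


def search(query, base_url="http://localhost:8008", timeout=10):
    """Send the query to a running MOSAIC service and parse the response.

    Assumption: the response body is JSON. If the service is configured
    to return the OpenSearch (XML) format, json.loads will raise an error.
    """
    with urlopen(build_search_url(query, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires the service from step a) or b) to be running):
# print(json.dumps(search("graz"), indent=2))
```

Building the URL in a separate function keeps the query escaping testable without a running service.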
Development instructions
⮞ Technical details on using MOSAIC are described in the
developer guide
Search application examples
Background information and license
MOSAIC has been designed and developed at CoDiS Lab, a research group
of the Institute of Interactive Systems and Data Science at the Graz University of Technology in Austria.
The people involved are Sebastian Gürtl, Alexander Nussbaumer, Christian Gütl, and previously Rohit Kaushik.
Further help and support was provided by the OpenWebSearch.eu consortium, in particular Gijs Hendriksen and Arjen de Vries from Radboud University.
The software is open source; the license will be specified soon.
Contact
For any questions, please contact Alexander Nussbaumer
or Sebastian Gürtl.
Imprint
... TODO ...